SRT Target Overflow Disconnects: Troubleshooting Guide
What is an SRT target overflow disconnect?

An SRT target overflow disconnect is a condition where a Zixi Broadcaster detects that the output buffer for an SRT pull target has exceeded its capacity, causing the broadcaster to terminate and immediately re-establish the connection.

Overflow on an SRT output is distinct from a general network drop or timeout: the broadcaster is actively cutting the connection in response to buffer pressure, and the underlying transport path may be healthy.

Because the broadcaster reconnects automatically and quickly, the events can be easy to overlook, and the stream appears to recover on its own. In practice, each disconnect/reconnect cycle causes a brief output interruption, and if the underlying cause is not resolved, the disconnects may continue to recur, often at increasing frequency under sustained load or high-motion content.

Why SRT targets overflow and disconnect

Understanding which of the following causes applies is the first step in selecting the correct fix.

Encoder sending excessive UDP packets (packet flooding)

The most common cause of overflow on a Zixi Broadcaster is an encoder sending 1 MPEG-TS packet per UDP packet instead of the recommended 7. This results in a packet rate approximately 7× higher than expected, which overwhelms the broadcaster's SRT output buffer regardless of the stream bitrate.

Expected vs actual packet rate for a 27 Mbps stream (1316 bytes is the payload of 7 × 188-byte MPEG-TS packets):

    Expected: 27,000,000 ÷ 8 ÷ 1316 ≈ 2,565 packets/sec
    Flooding: 18,000 packets/sec (approx. 7× too high)

If the observed packet rate is significantly higher than the expected value, packet flooding is the cause.

Fix: configure the encoder to send 7 MPEG-TS packets per UDP packet. This eliminates the packet flood and removes the overflow condition.

Network path differences causing output buffer overflow

When some SRT targets on the same source are stable while others are not, the cause is typically a difference
in the network path to each destination. The SRT latency buffer must be sized to accommodate the round-trip time (RTT) of the specific path. A latency setting that is adequate for a low-RTT destination (e.g. a nearby data centre) will be insufficient for a high-RTT destination (e.g. a distant cloud region), causing buffer overflow on the longer path while the shorter-path targets remain stable.

Recommendation: set SRT latency to at least 3–4× the measured RTT to the destination. Too low a latency setting is a frequent cause of SRT overflow on long-distance paths.

Broadcaster migration: host configuration differences

Overflow issues that occur immediately after migration to a new broadcaster instance are often caused by differences in the host environment rather than the stream itself. The stream and network may be identical, but the new host might be less capable of handling the same load. Contributing factors include:

• OS-level socket buffer sizes smaller than on the previous host
• NIC settings or driver configuration that limit throughput or increase latency
• Firewall rules on the new host that asymmetrically rate-limit traffic to specific target destinations
• Software version differences between the old and new broadcaster

Encoder bitrate spikes exceeding SRT buffer headroom

Even with correctly configured packet rates and adequate SRT latency headroom, a VBR encoder that produces large bitrate spikes (e.g. on scene changes, fast motion or cut-heavy content) can momentarily exceed the SRT output buffer's capacity. The spike fills the buffer faster than it can drain to the target destination, triggering an overflow disconnect. This cause is particularly common on long destination paths where the drain rate is constrained by bandwidth or RTT.

See the encoder bitrate spike troubleshooting guide (docid 9ci3oqw8sz lqibokrqoo) for diagnosis and fixes specific to this cause.

Insufficient max bitrate configuration in Zixi

The max bitrate parameter on Zixi Broadcaster inputs and failover groups
controls internal buffer allocation. If max bitrate is set below the recommended 1.5–2× the stream's bitrate, the allocated buffer has no headroom to absorb short bursts, including retransmission overhead, bitrate spikes or congestion. When the actual bitrate (including overhead) exceeds the allocated ceiling, overflow results.

How overflow disconnects cause problems downstream

Stream instability and viewer impact

Each disconnect/reconnect causes a brief interruption in the output stream delivered to the SRT pull target. This appears as a black screen, audio dropout or momentary freeze at the receiving end. Even if the reconnect is fast, the interruption is visible and disruptive for live content.

Cascading failures in redundant architectures

In failover configurations where multiple SRT targets are receiving from the same source, an overflow condition on one target does not always stay isolated. The processing overhead generated by repeated disconnect/reconnect cycles on affected targets can increase load on the broadcaster, degrading performance for stable targets on the same instance. In high-density deployments, a single flooding encoder or misconfigured target can trigger overflow events across multiple unrelated streams on the same host.

Masking of upstream issues

Because the broadcaster reconnects automatically and quickly, repeated overflow disconnects can mask a more serious upstream problem. If overflow is caused by encoder packet flooding, the actual stream quality issue (malformed TS packaging, excessive PID rate, or encoder misconfiguration) goes unaddressed while the symptom (brief drops) appears to self-resolve. Without checking the system log, the overflow trigger may never be identified.

Impact during broadcaster migration

Host configuration differences can cause overflow disconnects that are incorrectly attributed to the migration process itself. The underlying socket buffer or NIC configuration difference then persists and continues to cause
overflows on the new host indefinitely.

What to check

Step 1: Confirm overflow is the trigger

What to check: before investigating possible causes, confirm that overflow is the actual reason for the disconnect rather than a network timeout, authentication failure, or unrelated error.

How:
• Check the broadcaster system log for overflow events coinciding with the disconnect timestamps. The log should show an overflow condition on the specific output stream immediately before the disconnect.
• Confirm the reconnect occurs immediately. A brief disconnect followed by an instant reconnect is characteristic of an overflow-triggered cut rather than a network timeout, which typically has a longer reconnect delay.
• In ZEN Master, check the event log for the affected target and note the exact timestamps of disconnect events. Cross-reference these against the broadcaster log to confirm overflow entries at the same timestamps.

Root cause confirmed if: an overflow event is logged immediately before each disconnect, and the reconnect is near-instantaneous.

Step 2: Calculate and compare packet rate

What to check: determine whether the encoder is flooding the broadcaster with an excessive number of UDP packets.

How: on the encoder or using a stream analyzer, measure the actual UDP packet rate on the source stream. Then calculate the expected packet rate:

    Expected packet rate = bitrate (bps) ÷ 8 ÷ 1316
    Example, 27 Mbps stream: 27,000,000 ÷ 8 ÷ 1316 ≈ 2,565 packets/sec expected

If the observed rate is 7–9× higher (e.g. 18,000 packets/sec), the encoder is flooding the broadcaster.

You can also capture a short packet trace on the broadcaster's inbound interface using tcpdump and count UDP packets per second from the encoder source IP to confirm. Capturing for a fixed window (10 seconds here) and dividing the line count by the window length gives packets per second:

    timeout 10 tcpdump -l -n -i <interface> udp and src host <encoder-ip> 2>/dev/null | wc -l

Root cause: the observed packet rate is 7–9× higher than the calculated expected rate, which means the encoder is sending 1 MPEG-TS packet per UDP packet.

Fix: configure the encoder to encapsulate 7 MPEG-TS packets per UDP packet. This is a standard setting on most broadcast encoders and
is the correct configuration for MPEG-TS over UDP/IP delivery to Zixi. Confirm the packet rate drops to expected levels after the change.

Step 3: Compare stable vs affected targets

What to check: if some targets on the same source are stable while others are overflowing, the cause is a network path difference between destinations.

How:
• Identify which targets on the same source are stable and which are overflowing. If stable targets route to one destination and affected targets route to another, proceed with path comparison.
• Measure RTT to both destinations from the broadcaster host and compare. A significantly higher RTT to the affected destination confirms the path is the differentiating factor.
• Check the SRT latency setting on both sets of targets. If the affected targets have the same latency setting as the stable targets but a longer path RTT, the latency is insufficient. Latency should be at least 3–4× the measured RTT to that specific destination.
• Check for any firewall rules, QoS policies, or routing differences that apply to the affected destination path but not the stable one.
• If all targets on the source are affected, path difference is not the cause. Proceed to steps 4 and 5.

Root cause: the SRT latency setting is insufficient for the RTT of the affected destination path.

Fix: increase the SRT latency on the affected targets to at least 3–4× the measured RTT. For example, a path with 80 ms RTT requires an SRT latency of at least 240 ms (3×), with 320 ms (4×) as the safer setting. Apply the change and monitor for recurrence. Do not apply the increased latency to stable targets on shorter paths: unnecessarily high latency increases buffer size and end-to-end delay without benefit.

Step 4: Check SRT latency configuration

What to check: even when all targets appear to be on similar paths, an undersized SRT latency setting is a common and easily overlooked cause of overflow on any destination with non-trivial RTT.

How:
• On the affected targets in Zixi Broadcaster, navigate to Outputs > [target] > Edit and review the SRT latency setting.
• Measure the current RTT to the destination from the broadcaster host using ping or mtr. Confirm the latency setting meets the 3–4× RTT minimum. If the latency is set to a default value (commonly 120 ms or 200 ms) and the actual RTT is above 50 ms, the buffer is likely undersized for the path.
• Also check whether the latency setting on the receiving end (if it is a Zixi device or SRT-capable decoder) matches or is compatible with the broadcaster's output latency setting. A mismatch can cause silent buffer underruns at the receiver that feed back as pressure on the broadcaster output buffer.

Root cause: SRT latency is set to a generic default that does not reflect the actual path RTT to the destination.

Fix: increase SRT latency on the affected output to 4× RTT as a conservative starting point. Monitor overflow events in the broadcaster log after the change. If the overflow resolves and lower latency is desired, reduce incrementally, keeping 3× RTT as the minimum floor.

Step 5: Check max bitrate

What to check: confirm that max bitrate on the affected input and failover group is set high enough to provide adequate buffer headroom for the stream's peak bitrate.

How:
• In Zixi Broadcaster, navigate to Inputs > [stream] > Edit and check the max bitrate value. Also check max bitrate on the associated failover group if one is in use.
• Compare max bitrate against the stream's observed peak bitrate from the ZEN Master bitrate graph or broadcaster input statistics, not just the nominal average. If max bitrate is at or close to the average bitrate, the buffer has no headroom for retransmission overhead, bitrate spikes, or burst events.
• The recommended value is 1.5–2× the highest expected peak bitrate. For example, for a 20 Mbps average stream that peaks at 30 Mbps, max bitrate should be set to 45–60 Mbps.

Root cause: max bitrate is set too low, leaving no internal buffer headroom for peak traffic.

Fix: set max bitrate to 1.5–2× the highest expected peak bitrate on both the input and the failover group. Restart the affected
input after the change if required for the new setting to take effect.

Step 6: Check for host-level issues after migration

What to check: if overflow appeared after migrating to a new broadcaster host, compare the host environment against the previous instance to identify configuration differences that reduce the new host's ability to handle the same load.

How: SSH into the new broadcaster host and check the following.

OS socket buffer sizes:

    sysctl net.core.rmem_max
    sysctl net.core.wmem_max
    # Compare against values from the previous host.
    # Low values (< 8 MB) can cause SRT buffer pressure under high-bitrate conditions.

    # To increase (non-persistent; test first):
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216

    # To persist across reboots, add to /etc/sysctl.conf:
    # net.core.rmem_max = 16777216
    # net.core.wmem_max = 16777216

NIC settings and driver version:

    ethtool -i <interface>   # check driver version
    ethtool -g <interface>   # check ring buffer sizes
    ethtool -S <interface>   # check for rx drops
    # Look for rx_dropped, rx_queue_0_drops, nic_rx_dropped (counter names vary by driver).
    # Non-zero rx drops indicate packets being discarded at the NIC level.

Software version and firewall:
• Confirm the broadcaster software version on the new host matches the previous host, or review the release notes for any SRT-related changes between versions.
• Check firewall rules on the new host for any asymmetric rate limiting or egress rules that target specific destination IPs or ports and were not present on the previous host.

Root cause: the new broadcaster host has smaller socket buffers, different NIC configuration, or additional firewall rules compared to the previous host.

Fix: align the new host's OS socket buffer sizes, NIC ring buffer settings, and firewall rules with the previous host configuration. If the broadcaster software version differs, review the release notes for SRT changes and test after aligning the version.

Summary decision tree

SRT target(s) disconnecting and reconnecting repeatedly

Overflow event logged immediately before disconnect?
    > No > Not an overflow disconnect. Check for network timeouts or auth issues.
    > Yes > Continue below.

Observed packet rate >> (bitrate ÷ 8 ÷ 1316)?
    > Yes > Encoder flooding the broadcaster with 1 MPEG-TS packet per UDP packet.
      Fix: configure the encoder to send 7 MPEG-TS packets per UDP packet (step 2).

Some targets on same source stable; others overflowing?
    > Yes > Network path difference between destinations.
      Check: RTT to each destination; SRT latency on affected targets.
      Fix: increase SRT latency to ≥4× RTT on affected targets (step 3).

Issue appeared after migrating to a new broadcaster host?
    > Yes > Host configuration difference.
      Check: OS socket buffers, NIC settings, firewall rules, software version.
      Fix: align new host config with previous host (step 6).

Max bitrate set at or close to nominal average bitrate?
    > Yes > Insufficient internal buffer headroom.
      Fix: set max bitrate to 1.5–2× highest expected peak bitrate (step 5).

Overflow only during high-motion content or scene changes?
    > Yes > Encoder bitrate spike exceeding SRT buffer headroom.
      Fix: see the encoder bitrate spike troubleshooting guide (docid 9ci3oqw8sz lqibokrqoo). Increase max bitrate and review encoder VBV/CBR settings.

Cause unconfirmed after above steps?
    > Yes > Action: enable network logging (info level) on the affected broadcaster, collect logs and a packet capture, and open a support ticket with RTT data and overflow event timestamps.

Escalation checklist

If disconnects continue after applying the fixes above, escalate to Zixi support with the following:

| Item | Details / location |
|---|---|
| Broadcaster version and OS | Shown in broadcaster UI header or /etc/zixi/version |
| Stream name and affected target name | Push/pull, protocol, SRT mode (caller/listener) |
| System log with overflow events visible | Network logging set to info level on the broadcaster; covering the period of disconnects |
| Measured RTT to affected and stable destinations | From the broadcaster host using ping or mtr |
| Encoder packet rate and MPEG-TS per UDP configuration | From stream analyser or tcpdump capture on the inbound interface |
| SRT latency settings for affected vs stable targets | From Outputs → [target] → Edit |
| Max bitrate configuration for the affected input and failover group | From Inputs → [stream] → Edit |
| OS socket buffer sizes and NIC settings on broadcaster host | sysctl net.core.rmem_max / net.core.wmem_max; ethtool -g and -S |
| Packet capture on the broadcaster's inbound interface | pcap covering the overflow period (tcpdump on the interface) |
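
Worked example: packet rate check (step 2)

The packet rate comparison from step 2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this guide, not a Zixi tool: the function names are hypothetical, and the observed rate is assumed to come from your own stream analyzer or tcpdump measurement.

```python
TS_PACKET_BYTES = 188  # size of one MPEG-TS packet


def expected_packet_rate(bitrate_bps: float, ts_per_udp: int = 7) -> float:
    """Expected UDP packets/sec = bitrate (bps) / 8 / payload bytes per UDP packet."""
    payload_bytes = TS_PACKET_BYTES * ts_per_udp  # 7 x 188 = 1316 bytes
    return bitrate_bps / 8 / payload_bytes


def flooding_ratio(observed_pps: float, bitrate_bps: float) -> float:
    """How many times higher the observed rate is than the expected rate."""
    return observed_pps / expected_packet_rate(bitrate_bps)


if __name__ == "__main__":
    bitrate = 27_000_000  # 27 Mbps, the example stream used in this guide
    observed = 18_000     # packets/sec, measured externally (assumed input)

    print(f"expected at 7 TS per UDP: {expected_packet_rate(bitrate):,.0f} packets/sec")
    print(f"expected at 1 TS per UDP: {expected_packet_rate(bitrate, 1):,.0f} packets/sec")
    print(f"observed / expected: {flooding_ratio(observed, bitrate):.1f}x")
```

A ratio close to 1 suggests the encoder is already packing 7 MPEG-TS packets per UDP packet; a ratio around 7 matches the flooding pattern described in step 2.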

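
Worked example: SRT latency and max bitrate sizing (steps 3 to 5)

The sizing rules from steps 3 to 5 (SRT latency of 3–4× RTT, max bitrate of 1.5–2× peak bitrate) can be sketched the same way. The helper names below are hypothetical and the multipliers are simply the ones recommended in this guide; the RTT and peak bitrate are assumed to be values you have measured yourself.

```python
from typing import Tuple


def srt_latency_range_ms(rtt_ms: float) -> Tuple[float, float]:
    """Return (minimum floor, conservative start) for SRT latency: 3x and 4x RTT."""
    return (3 * rtt_ms, 4 * rtt_ms)


def latency_meets_floor(latency_ms: float, rtt_ms: float) -> bool:
    """True if the configured latency is at least the 3x RTT floor."""
    return latency_ms >= 3 * rtt_ms


def max_bitrate_range_mbps(peak_mbps: float) -> Tuple[float, float]:
    """Return the recommended max bitrate range: 1.5x to 2x the peak bitrate."""
    return (1.5 * peak_mbps, 2.0 * peak_mbps)


if __name__ == "__main__":
    rtt = 80  # ms, measured with ping or mtr (assumed input)
    low, high = srt_latency_range_ms(rtt)
    print(f"RTT {rtt} ms: SRT latency between {low} and {high} ms")
    print(f"120 ms default sufficient? {latency_meets_floor(120, rtt)}")

    peak = 30  # Mbps, observed peak for a 20 Mbps average stream (guide example)
    lo_mbps, hi_mbps = max_bitrate_range_mbps(peak)
    print(f"Peak {peak} Mbps: max bitrate between {lo_mbps} and {hi_mbps} Mbps")
```

Starting at the upper end of each range and reducing only after overflow events stop follows the same conservative approach described in steps 4 and 5.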