Abstract
This publication discusses the spectrum of problems associated with transporting Constant Bit Rate (CBR) circuits over packet networks, specifically focusing VoIP services. It provides guidance on practical calculation for voice bandwidth allocation in IP networks, including the maximum bandwidth proportion allocation and LLQ queue settings. Lastly, the publication discusses the benefits and drawbacks of transporting CBR flows over packet switched networks and demonstrates some effectiveness criteria.Introduction
Historically, the main design goal of Packet Switched Networks (PSNs) was optimum bandwidth utilization for low-speed links. Compared to their counterpart, circuit-switched networks (CSNs such as SONET/SDH networks), PSNs use statistical as opposed to deterministic (synchronous) multiplexing. This feature allows PSNs to be very effective for bursty traffic sources, i.e. those that send traffic sporadically. Indeed, with many sources this allows the transmission channel to be optimally utilized by sending traffic only when necessary. Statistical multiplexing is only possible if every node in the network implements packet queueing, because PSNs introduce link contention. One good historical example is ARPANET: the network theoretical foundation has been developed in Kleinrock’s work on distributed queueing systems.In PSNs, it is common for the traffic from multiple sources to be scheduled for sending out the same link at the same moment. In such case of contention for the shared resource, exceeding packets are buffered, delayed and possibly dropped. In addition to this, packets could be re-ordered, i.e. packets sent earlier may arrive behind packets that have been sent after them. The latter is normally a result of packets taking different paths in the PSN as a due to routing decisions. Such behavior is OK with bursty, delay insensitive data traffic, but completely inconsistent with the behavior of constant bit rate (CBR), delay/jitter sensitive traffic sources, such as emulated TDM traffic. Indeed, transporting CBR flows over PSNs poses significant challenges. Firstly, emulating a circuit service requires that every node should not buffer the CBR packets (i.e. should not introduce delay or packet drops) and be “flow-aware” to avoid re-ordering. The other challenge is the “packet overhead” tax imposed on emulated CBR circuits. Per their definition, CBR sources produce relatively small burst of data at regular periodic intervals. The more frequent are the intervals, the typically smaller are the bursts. In turn, PSNs apply a header to every transmitted burst of information to implement network addressing and routing, with the header size being often comparable to the CBR payload. This significantly decreases link utilization efficiency when transporting CBRtraffic.
Emulating CBR services over PSN
At first, it may seem that changing queuing discipline in every node will resolve the buffering problem. Obviously, if we distinguish CBR flow packets and service them ahead of all other packets using priority-queue then they would never get buffered. This assumes that link speed is fast enough so that serialization delay is negligible in the context of the given CBR flow. Such delay may vary depending on the CBR source: for example, voice flows typically produce one codec sample every 10ms and based on this, serialization delay at every node should not exceed 10ms, or preferably be less than that (otherwise, the next produced packet will “catch up” the previous one). Serialization problem on slow links could be solved using fragmentation and interleaving mechanics, e.g. as demonstrated in [6]. Despite of priority queueing and fragmentation, situation becomes more complicated with multiple CBR flows transported over the same PSN. The reason is that there is now contention among the CBR flows, since all of them should be serviced on priority basis. This creates queueing issues and intolerable delays. There is only one answer to reduce resource contention PSNs – over-provisioning.Following the work [2], let’s review how the minimum over-provisioning rate could be calculated. We first define the CBR flows as those that cannot tolerate a single delay of their packet in the link queue. Assume there are r equally behaving traffic flows contending for the same link. Pick up a designated flow out of the set. When we randomly “look” at the link, the probability that we see a packet from the designated flow is 1/r, since we assume that all flows are serviced equally by the connected PSN node. Then, the probability that the selected packet does NOT belong to our flow is 1-1/r respectively. If the link can accept at maximum t packets per millisecond, then during an interval of k milliseconds the probability that our designated flow may send a packet over the link without blocking is: P=1-(1-1/r)^tk, where (1-1/r)^tk is the probability of NOT seeing our flows designated packet amount the tk packets. The value P is the probability of any given packet NOT being delayed due to contention. It is important to understand that every delayed packet will cause flow behavior deviation from CBR. Following [2], we define u=tk as the “over-provisioning ratio” where u=1 when the channel can only send one flow packet during a time unit that take a flow to generate the same packet, i.e. when channel rate = flow rate. When u=2 the link is capable of sending twice as much packets during a unit of time compared to the number of packets sent by a single flow during the same interval. With the new variable, the formula becomes P=1-(1-1/r)^u. Fixing the value of P in this equation we obtain:
u=ln(1-P)/ln(1-1/r). (*)which the minimum over-provision ratio to achieve the desired P probability of successfully transmitting the designated flow’s packet when r equal flows are contending for the link. For example, with P=99.9% and r=10 we end up having u=ln(0.001)/ln(0.9)=65.5. That is, in order to provide the guarantee of not delaying 99.9% packets for 10 CBR flows we need to have at least 66 times more bandwidth than a single flow requires. Lowering P to 99% results in u=44 over-provisioning coefficient. It is interesting to look at the r/u ratio, which demonstrate what portion of minimally over-provisioned link bandwidth would be occupied by the “sensitive” flows when they all are transmitted in parallel. If we take the ratio r/u=ln((1-1/r)^r)/ln(1-P) then with large number of r we can replace (1-1/r)^r with 1/e and the link utilization ratio becomes approximated by:
r/u=-1/ln(1-P). (**)For P=99% we get the ratio of 21%, for P=99.9% the ratio is 14% and for P=90% the ratio becomes 43%. In practice, this means that for moderately large amount of concurrent CBR flows, e.g. over 30, you may allocate no more than specified percentage of the link’s bandwidth to CBR traffic based on the target QoS requirement.
Now that we are done with buffering delays, what about packet reordering and packet overhead tax? Reordering problem could be solved at the routing level in PSNs: if every routing node is aware of the flow state it may ensure that all packets belonging to the same flow are sent across the same path. This is typically implemented by deep packet inspection (which by the way violates the end-to-end principle as stated in RFC 1958) and classifying the packets based on the higher-level information. Such implementations are, however, rather effective as inspection and flow classification is typically performed in forwarding path using hardware acceleration. The overhead tax problem has two solution. The first one is, again, over-provisioning. By using high-capacity, low-utilized link we may ignore the bandwidth wastage due to the overhead. The second solution requires adding some state to network nodes: by performing flow inspection at the both ends of a single link we may strip the header information and replace it with a small flow ID. The other end of the link will reconstruct the original headers by matching the flow ID to the locally stored state information. This solution violates the end-to-end principle and has poor scalability as the number of flows grow. Typically it is used on slow-speed links. For example, VoIP services utilize IP/RTP/UDP and possibly TCP header compression to reduce the packet overhead tax.
Practical Example: VoIP Bandwidth Consumption
Let’s say we have a 2Mbps link and we want to know how to provision priority-queue settings for G.729 calls. Firstly, we need to know per-flow bandwidth consumption. You may find enough information on this topic referring to [3]. Assuming we are using header compression over Frame-Relay, the per-flow bandwidth is 11.6Kbps and maximum link capacity is roughly 2000Kbps the theoretical maximum over-subscription rate is 2000/11.6=172. We can find the maximum number of flows allowed under the condition of P asr=1/(1-exp(ln(1-p)/u)) = 1/(1-(1-P)^(1/u)) (***)setting u=170 and P=0.99. This yields the theoretical limit of 37 concurrent flows. The total bandwidth for that many flows is 37*11.6=429Kbps or about 21% of the link capacity, as predicted by the asymptotic formula (**) above. The remaining bandwidth could be used by other non-CBR applications, as it should be expected from a PSN exhibiting high link utilization efficiency.
Knowing the aggregate bandwidth and maximum number of flows provides us the parameters for admission control tools (e.g. policer rate, RSVP bandwidth and so froth). However, what is left to define yet are the burst settings and the queue depth for LLQ. The maximum theoretical burst size equals to the maximum number of flows multiplied by the voice frame size. From [3] we readily obtain that a compressed G.729 frame size for Frame-Relay connection is 29 bytes. This gives us the burst of 1073 bytes, which we could round up to 1100 bytes for safety. The maximum queue depth could be defined as number of flows minus one, since in the worst case at least one priority packet would be scheduled for serializing while others held in the priority queue for processing. This means the queue depth would be at maximum 36 packets. The resulting IOS configuration would look like:
policy-map TEST class TEST priority 430 1100 queue-limit 36 packets
Circuits vs Packets
It is interesting to compare voice call capacity for a digital TDM circuit vs the same circuit being used for packet mode transport. Talking of a E1 circuit, we can transport as many as 30 calls, if one channel is used for associated signaling (e.g. ISDN). Compare this to the 37 G.729 VoIP calls we may obtain if the same circuit is channelized and runs IP – about 20% increase in call capacity. However, it is important to point out the quality of G.729 calls is degraded as compared to digital 64Kbps bearer channels, not to mention that other services could not be delivered over a compressed emulated voice channel. It might be more fair comparing the digital E1 to the packetized E1 circuit carrying G.711 VoIP calls. In this case, the bit rate for a single call running over Frame-Relay encapsulation with IP/RTP/UDP header compression would be (160+2+7)*50*8=67600bps or 67.6Kbps. Maximum over-provision rate is 29 in such case, which ends up with only six (6) VoIP calls allowed for the packetized link with P=99% in-time delivery! Therefore, if you try providing digital call quality over a packet network you end up with extremely inefficient implementation. Finally, consider an intermediate case – G.729 calls without IP/RTP/UDP header compression. This case assumes that complexity belongs to the network edge, as any transit links are not required to implement the header compression procedure. We end up with the following: uncompressed G.729 call over Frame-Relay generates (20+40+7)*50*8=26.8Kbps which results in over-provisioning coefficient of u=74 and r=16 flows – slightly over the half of the number that a E1 could carry.Using the asymptotic formula (**) we see that for P=99% no more than 21% of the packetized link could be used for CBR services. This implies that the packet compression scheme should reduce the bandwidth of pure CBR flow by more than 5 times to be effectively compared with circuit-switched transport. Based on this, we conclude that PSNs could be more efficient for CBR transportation compared to “native” circuit-networks only if they utilize advanced processing features such as payload/header compression yielding compression coefficient over 5 times. However, we should keep in mind that such compression is not possible for all CBR services, e.g. relaying T1/E1 over IP has to maintain the full bandwidth of the original TDM channels, which is extremely inefficient in terms of resource utilization. Furthermore, the advanced CODEC features require complex equipment at the network edge and possibly additional complexity in the other parts of the network, e.g. in order to implement link header compression.
It could be argued that the remaining 79% of the packetized link could be used for data transmission, but the same is possible with circuit switched networks, provided that packet routers are attached to the edges. All the data packet switching routers need to do is dynamically request transmission circuit from the CSN based on traffic demands and used them for packet transmissions. This approach has been implemented, among others, in GMLPS ([5]).
Conclusions
The above logic demonstrates that PSNs were not really designed to be good at emulating true CBR services. Naturally, as the original intent of PSNs was maximizing the use of scarce link bandwidth. Transporting CBR services not only requires complex queueing disciplines but also ultimately over-provisioning the link bandwidth, thus somewhat defeating the main purpose of PSNs. Indeed if all that a PSN is used for is CBR service emulation, the under-utilization remains very significant. Some cases, like VoIP, allows for effective payload transformation and significant bandwidth reduction, which allows for more efficient use of network resources. On the other hand, such payload transformation requires introducing extra complexity to networking equipment. All this in addition to the fact that packet-switching equipment is inherently more complex and expensive compared to circuit-switched networks especially for very high-speed links. Indeed, packet switching logic requires complex dynamic lookups, large buffer memory and internal interconnection fabric. Memory requirements and dynamic state grow proportionally to the link speed, making high-speed packet-switching routers extremely expensive not only in hardware but also in software, due to advanced control plane requirement and proliferating services. More on this subject could be found in [4]. It is worth mentioning that the packet-switching inefficiency in network core has been realized long time ago, and there have been attempts for integrating circuit-switching core networks with packet-switching networks, most notable being GMPS ([5]). However, so far, the industry inertia did not make any of the proposed integration solutions viable.Despite of all arguments, VoIP implementations have been highly successful so far, most likely akin to the effectiveness of VoIP codecs. Of course, no one can yet say than VoIP over Internet provides quality comparable to digital phone lines, but at least it is cheap, and that’s what market is looking for. VoIP has been highly successful in enterprises, so far, mainly due to the fact that enterprise campus networks are mostly high-speed based on ethernet switched technology, that demonstrates very low general link utilization ratio, within 1-3% of available bandwidth. In such over-provisioned conditions, deploying VoIP should not pose major QoS challenges.