Traffic engineering in BGP: a small-scale overview of a large-scale topic
After our first blog was released, we were asked by a reader, ‘How do I optimise traffic flow over a large-scale network?’ The answer is not short or simple, so we thought we would release a blog series about Traffic Engineering in BGP (or TE for short!) over the coming months.
Put simply, traffic engineering is about getting packets from their place of origin to their destination across the Internet as efficiently as possible because, let’s face it, no-one wants their CoD game lagging, especially when you’ve just lined up that headshot!
As demand for Internet usage continues to grow, TE is becoming more critical: end users expect higher-quality service and network costs are starting to bite. Performance optimisation is therefore a particular concern for large-scale IP networks. Network performance requirements are multifaceted and complex, which makes TE very challenging. This blog post provides a simple overview of TE and looks at 2 critical aspects: capacity management and traffic management.
What is traffic engineering?
TE is an essential aspect of network engineering that attempts to optimise network performance through path optimisation and capacity balancing in operational networks. In our context, it means measurement, modelling and control of Internet traffic to ensure traffic types use the paths best suited to their efficient throughput.
Why do we need to ‘engineer’ Internet traffic?
Most of you packet experts will already have answered the question in the heading above. Still, for those who haven’t, TE is used to control spend, reduce congestion and latency, make better use of available bandwidth and tune other factors that improve overall network performance!
In the beginning …
The principles of traffic routing can be seen in classical telephone networks, which used static hierarchical routing with fixed routing patterns. The hierarchy sought to accommodate overflow traffic, improve reliability through alternate routes and prevent call looping. But, in short, this wasn’t the most effective way to manage traffic, so dynamic routing was introduced. By periodically recalculating and updating routes, dynamic routing eased the inflexibility of fixed patterns and let the network operate more efficiently. It brought significant economic gains, including a lower overall loss probability and improved network resilience. There’s more we could go into here, but who really wants to be tortured by the recall of Erlang calculations!
Historically, Internet routing protocols applied only rudimentary traffic engineering principles in their ‘distance vector’ or ‘link state’ forms. With the evolution of BGP, more parameters have been included to allow calculation and prioritisation of the ‘best’ path.
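To make the idea of a ‘best’ path a little more concrete, here is a minimal Python sketch of how a BGP speaker might rank candidate routes to the same prefix. It is a deliberately simplified illustration, not a full implementation: it compares only local preference (higher wins), AS path length (shorter wins), origin code and MED (lower wins), and it assumes MED is always comparable, whereas by default it is only compared between routes learned from the same neighbouring AS. Real routers apply further tie-breakers (eBGP over iBGP, IGP metric to the next hop, router ID and so on). The Route class, the function names, the addresses and the ASNs are all made up for the example.

```python
from dataclasses import dataclass

# Origin codes in order of preference: IGP beats EGP beats INCOMPLETE.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Route:
    """A hypothetical, cut-down view of one BGP route to a prefix."""
    next_hop: str
    local_pref: int
    as_path: tuple
    origin: str = "igp"
    med: int = 0


def preference_key(route):
    """Build a sort key where the smallest tuple is the most preferred route.

    Local preference is negated because a HIGHER value is preferred,
    while for the other attributes a LOWER value is preferred.
    """
    return (
        -route.local_pref,
        len(route.as_path),
        ORIGIN_RANK[route.origin],
        route.med,
    )


def best_path(routes):
    """Return the most preferred route under the simplified ordering above."""
    return min(routes, key=preference_key)


# Three candidate routes to the same prefix (documentation addresses and ASNs).
candidates = [
    Route("192.0.2.1", local_pref=100, as_path=(64500, 64510)),
    Route("192.0.2.2", local_pref=200, as_path=(64501, 64505, 64510)),
    Route("192.0.2.3", local_pref=200, as_path=(64502, 64510)),
]

best = best_path(candidates)
print(best.next_hop)
```

In this example the first route loses on local preference even though its AS path is short, and the remaining two are separated by AS path length. That ordering is exactly why local preference is such a common knob for steering outbound traffic.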
TE today – capacity management and traffic management
There are many facets of TE, and each network must assess its own network needs and plan accordingly. In this blog, we cover 2 important aspects of TE: capacity management and traffic management, both of which require continual monitoring and improvement – there’s no option to just set and forget here!
Capacity management focuses on capacity planning, routing control and resource management. It requires engineers to monitor the performance and throughput of their networks, analyse measurement data, tune performance, and understand and forecast end-user requirements so they can plan the network and its capacity. In short, we need to understand your requirements based on your network’s usage patterns to provide a quality, reliable and robust peering fabric!
At IAA, our basic stack comprises SNMP for monitoring interface stats from switches, Observium and Grafana to help visualise the SNMP data – the graphs from Grafana are a thing of beauty!
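For anyone curious about what sits behind those graphs, the arithmetic is simple. SNMP exposes interface octet counters (for example ifHCInOctets, a 64-bit counter in the IF-MIB), and a monitoring tool polls them at regular intervals and turns the difference between two readings into a rate. The Python below is a rough sketch of that calculation, not how Observium or Grafana implement it; the function name, the poll interval and the sample numbers are assumptions for illustration.

```python
COUNTER64_MODULUS = 2 ** 64  # 64-bit SNMP counters wrap back to zero here


def utilisation_percent(octets_before, octets_after, interval_seconds, link_speed_bps):
    """Estimate average link utilisation between two counter readings.

    octets_before / octets_after: two successive readings of an octet counter
    interval_seconds: time between the two polls
    link_speed_bps: nominal interface speed in bits per second
    """
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # The modulo handles a single counter wrap between the two polls.
    delta_octets = (octets_after - octets_before) % COUNTER64_MODULUS
    bits_per_second = delta_octets * 8 / interval_seconds
    return 100.0 * bits_per_second / link_speed_bps


# Example: a 100 Gbps port polled 300 seconds apart (made-up numbers).
print(utilisation_percent(0, 1_500_000_000_000, 300, 100_000_000_000))
```

With these sample numbers, 1.5 trillion octets in 5 minutes works out to 40 Gbps, or 40% of a 100 Gbps port. Keep in mind this is an average over the poll interval, so short bursts within the interval are smoothed out.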
Further, our portal data allows us to see total VLL and extended reach provisioned capacity, and we use a set of policies that help us determine when capacity upgrades should occur. If you didn’t see our annual report article, we’re waiting on gorgeous 400 Gbps Aristas to arrive so we can upgrade our Sydney ring. Lastly, we’ve started using Akvorado to monitor sFlow data on our transit network (AS10084) to help identify any congested links and provide further information on traffic flows, such as highest-traffic ASNs.
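As a rough illustration of how sampled flow data can surface the highest-traffic ASNs, the Python sketch below scales each sampled packet by its sampling rate to estimate total bytes, then sums the estimates per source ASN. This is not Akvorado’s API or data model; the field names (src_asn, bytes, sampling_rate), the function name and the sample records are hypothetical, and the ASNs come from the range reserved for documentation.

```python
from collections import Counter


def top_asns(samples, n=3):
    """Return the n source ASNs with the highest estimated byte counts.

    Each sample is a dict describing one sampled packet. Because only one
    packet in every `sampling_rate` is sampled, multiplying the packet size
    by the sampling rate gives an estimate of the traffic it represents.
    """
    totals = Counter()
    for sample in samples:
        totals[sample["src_asn"]] += sample["bytes"] * sample["sampling_rate"]
    return totals.most_common(n)


# Made-up samples taken at a 1-in-1024 sampling rate.
samples = [
    {"src_asn": 64496, "bytes": 1500, "sampling_rate": 1024},
    {"src_asn": 64497, "bytes": 500, "sampling_rate": 1024},
    {"src_asn": 64496, "bytes": 1500, "sampling_rate": 1024},
    {"src_asn": 64498, "bytes": 1000, "sampling_rate": 1024},
]

ranking = top_asns(samples, n=2)
print(ranking)
```

The estimates are statistical, so they get more trustworthy as the number of samples grows. A handful of samples like this would be far too few to act on in practice.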
Traffic management focuses on node-based traffic control functions such as traffic conditioning, queue management, scheduling and arbitrating access to network resources between different packets or between different traffic streams. Some examples of traffic management techniques include class-of-service policies (prioritising certain types of traffic over others, such as voice), shaping (using packet buffers to avoid loss on heavily utilised links), policing (enforcing maximum traffic rates on circuits with an agreed peak rate lower than that of the physical interfaces and dropping excess packets) and firewalls/access control lists. Traffic management policies typically only apply within an individual network, as each network operator usually has a different view of which traffic types are higher priority and what service classes those traffic types should fall into, although exceptions to this rule do exist.
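Policing is commonly described in terms of a token bucket, and a small Python sketch may help show the idea. Tokens (here measured in bytes) accumulate at the agreed rate up to a burst limit; a packet is forwarded if enough tokens are available and dropped otherwise. This is a simplified single-rate model with made-up class and parameter names, not any vendor’s implementation. A shaper would differ by queueing the excess packet until tokens become available instead of dropping it.

```python
class TokenBucketPolicer:
    """A minimal single-rate token bucket policer (illustrative only)."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8  # agreed rate, converted to bytes/s
        self.burst_bytes = burst_bytes        # bucket depth: largest allowed burst
        self.tokens = float(burst_bytes)      # start with a full bucket
        self.last_time = 0.0                  # time of the previous packet

    def allow(self, packet_bytes, now):
        """Return True to forward the packet, False to drop it."""
        # Refill the bucket for the time elapsed, capped at the burst size.
        elapsed = max(0.0, now - self.last_time)
        self.last_time = now
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # excess traffic is dropped, not queued


# 8 kbps (1000 bytes/s) with a 1500-byte burst: deliberately tiny numbers.
policer = TokenBucketPolicer(rate_bps=8000, burst_bytes=1500)
arrivals = [(0.0, 1500), (0.0, 1500), (1.0, 1500), (1.5, 1500)]
results = [policer.allow(size, now) for now, size in arrivals]
print(results)
```

In this run the first packet uses the whole burst allowance, the second arrives with an empty bucket and is dropped, the third arrives when only 1000 bytes of tokens have built up and is also dropped, and the fourth is forwarded once the bucket has refilled to 1500 bytes.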
IX Australia doesn’t employ any traffic classification or arbitration mechanisms. The Internet is, by nature, a ‘best-effort’ network in which all traffic types are treated equally, although we do throttle broadcast and multicast traffic for obvious reasons.
Coming up next month
If you’ve enjoyed this blog article, stay tuned for next month. We’ll continue our TE in BGP series looking at the difference between outbound and inbound traffic.