It’s no secret that the public Internet is a quagmire of latency and packet-loss problems. No wonder many clients are reluctant to trust Internet-based SD-WANs with VoIP and business-critical applications. After all, how can an SD-WAN running over the Internet provide a predictable user experience if the underlying transport is so unpredictable?
To answer that question, SD-WAN Experts recently evaluated the performance and stability of long-distance Internet connections. Our goal: to determine the source of the Internet’s performance problems by measuring variability and latency in the last and middle miles.
What we found was that swapping out the Internet core for a managed middle mile makes an enormous difference. Case in point: Amazon. Latency and variation between our AWS workloads were significantly better across Amazon’s network than across the public Internet (see figure). Why that’s the case, and how we tested, is explained below and in greater depth in this post on our site.
The why and how we tested
If I were an old MPLS hand (which I am) clinging to my MPLS past (which I’m not), I might be inclined to dismiss our research from the outset. After all, if my MPLS performance declines, what difference does it make where in my service provider’s network the problem occurs? The service has a problem, and either the provider fixes it or I ultimately switch MPLS services.
But because of the way the Internet works, we have more options when it comes to selecting providers. An Internet connection consists of three components: the last mile from the customer’s premises to the ISP’s premises (sometimes called the “first mile”), the middle mile (sometimes called the Internet core), and the last mile from the destination ISP’s premises to the destination customer’s premises. That structure gives us additional flexibility. If a last mile is the source of our problems, we can select a different last-mile provider. If the problem is in the middle mile, we can choose a different middle-mile provider, either explicitly or by switching to an ISP connected to a different middle mile.
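To make that decision logic concrete, here is a minimal Python sketch, assuming you already have latency samples for each segment of a path (collecting them is out of scope). The segment names, the `noisiest_segment` helper, the `ACTIONS` table, and the sample values are all hypothetical; the sketch simply flags the segment with the largest absolute variation and maps it to the corresponding fix.

```python
from statistics import pstdev

# Hypothetical remedy for each segment, following the options described above.
ACTIONS = {
    "first mile": "select a different last-mile ISP at the source site",
    "middle mile": "choose a different middle-mile provider, or an ISP attached to one",
    "last mile": "select a different last-mile ISP at the destination site",
}

def noisiest_segment(per_segment: dict[str, list[float]]) -> str:
    """Return the segment whose latency samples (ms) vary most in absolute terms."""
    return max(per_segment, key=lambda seg: pstdev(per_segment[seg]))

# Illustrative numbers only; these are not measurements from our tests.
samples = {
    "first mile": [3.1, 2.9, 3.4, 3.0, 9.0],
    "middle mile": [140.0, 210.0, 95.0, 180.0, 120.0],
    "last mile": [4.0, 4.2, 3.9, 4.1, 4.0],
}

segment = noisiest_segment(samples)
print(segment, "->", ACTIONS[segment])
```

The comparison uses absolute variation (population standard deviation in milliseconds) on purpose: as the results below show, a segment can swing by a large percentage while contributing very little to the connection as a whole.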
To pinpoint the source of the Internet’s performance problems, we measured time to first byte (TTFB) when sending test files between AWS workloads and from those workloads to locations in six cities. TTFB is a more accurate measure of latency than a simple ping: it looks at the time needed to send a packet and wait for an acknowledgment, excluding the time spent on connection setup and negotiation. Tests were conducted with several Internet measurement tools: Catchpoint, SpeedTest, and Cedexis. We repeated the tests, in part for verification, and reported each path’s median latency and variation (the standard deviation from the median latency). Last-mile and middle-mile performance were compared both in absolute numbers and relative to the segment’s or the overall latency.
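The summary statistics can be sketched in Python as follows. This is an illustration under an assumption, not the tooling we used: the exact formula behind “standard deviation from the median latency” isn’t spelled out here, so the hypothetical `summarize` function reads it as the root-mean-square deviation of TTFB samples around their median, and also reports that variation as a percentage of the median, which is how relative figures such as 196% are expressed later in this post.

```python
from math import sqrt
from statistics import median

def summarize(ttfb_ms: list[float]) -> tuple[float, float, float]:
    """Return (median, variation, variation as % of median) for TTFB samples in ms.

    Assumption: "standard deviation from median" is read as the root-mean-square
    deviation of the samples around their median rather than around their mean.
    """
    med = median(ttfb_ms)
    variation = sqrt(sum((x - med) ** 2 for x in ttfb_ms) / len(ttfb_ms))
    return med, variation, 100 * variation / med

print(summarize([90, 110, 90, 110]))  # (100.0, 10.0, 10.0)
```

Applied to the last-mile figures reported below (a variation of 5.88 ms around a 3 ms median), the percentage works out to 196%.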
Middle mile vs. last mile: which is more stable?
We’ve often spoken about the unpredictability of the Internet middle mile. A major source of the problem is that ISPs route packets based on economics, not application performance. They dump traffic on one another to get the most out of their infrastructure investments. As a result, Internet traffic might take the fewest possible hops to reach a destination one day and bounce around the world the next.
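A toy model shows why economics-first routing hurts applications. In this hypothetical Python sketch, an ISP has three candidate routes with made-up costs and latencies; picking the cheapest route, as an ISP optimizing its infrastructure spend might, yields twice the latency of the fastest one. Real inter-ISP routing is expressed through BGP policies and is far more involved; the `Route` class and all numbers here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Route:
    via: str           # hypothetical next-hop network
    cost: float        # relative cost to the ISP (e.g. free peering vs. paid transit)
    latency_ms: float  # latency the application would experience on this route

routes = [
    Route("peer-A", cost=0.0, latency_ms=180.0),
    Route("transit-B", cost=1.0, latency_ms=95.0),
    Route("transit-C", cost=1.5, latency_ms=90.0),
]

cheapest = min(routes, key=lambda r: r.cost)        # what an economics-driven ISP picks
fastest = min(routes, key=lambda r: r.latency_ms)   # what the application would want
print(cheapest.via, cheapest.latency_ms)  # peer-A 180.0
print(fastest.via, fastest.latency_ms)    # transit-C 90.0
```

If the costs change (a new peering arrangement, say), the cheapest route, and with it the latency, can change overnight without any signal to the application.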
And while most people I speak with think the middle mile is less stable than the last mile, that’s not exactly true. Our testing shows that last-mile latency fluctuates by a far greater percentage than middle-mile latency, reaching 196% in some cases. By contrast, middle-mile latency fluctuated by no more than 143%.
So why focus on the middle mile? Simple: the middle mile’s impact on the overall connection is far more significant. The 196% figure mentioned above, for example, described the last-mile variation for four paths to Bangalore from San Jose, London, Tokyo, and Sydney. In absolute terms, though, that amounted to just 5.88 ms (against a median last-mile latency of 3 ms). By contrast, the middle miles varied by 36% to 85%, or 92 ms to 125 ms, roughly 20 times the last mile’s impact on the connection.
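The arithmetic behind these figures, using only the numbers given above: the last mile’s relative variation looks dramatic because its median is so small, while in absolute terms the middle-mile variation is roughly 16 to 21 times larger, which rounds to the 20x cited.

```latex
\frac{5.88\,\text{ms}}{3\,\text{ms}} = 1.96 = 196\%,
\qquad
\frac{92\,\text{ms}}{5.88\,\text{ms}} \approx 15.6,
\qquad
\frac{125\,\text{ms}}{5.88\,\text{ms}} \approx 21.3
```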
That aligns with what you probably knew intuitively all along. Last miles are comparatively short, extending from the customer premises to the local ISP’s network. The middle miles in our case, though, stretched across North America, the Atlantic, or the Pacific, depending on the path. Latency should be far greater across the middle mile if only because of distance, not to mention routing issues.
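A back-of-the-envelope Python sketch shows how much distance alone contributes. It assumes light in optical fiber covers roughly 200 km per millisecond (about two-thirds of its speed in a vacuum) and that the fiber runs in a straight line; the distances are illustrative, not the actual routes in our tests, and `min_rtt_ms` is a hypothetical helper.

```python
# Assumption: light in optical fiber travels at roughly 200,000 km/s,
# i.e. about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0

def min_rtt_ms(fiber_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone.

    Ignores queuing, serialization, and processing delay, and assumes a
    straight fiber run, so real round-trip times will be higher.
    """
    return 2 * fiber_km / FIBER_KM_PER_MS

# Hypothetical distances, for illustration only.
print(min_rtt_ms(20))     # a 20 km last mile: 0.2
print(min_rtt_ms(5_600))  # roughly New York to London: 56.0
```

Real paths are longer than the straight-line distance and add queuing and processing delay, so these numbers are floors, but a gap of more than two orders of magnitude explains why the middle mile dominates.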
Moreover, while middle-mile issues may be more prominent on international Internet routes, they’re not limited to them. Middle-mile issues occur even within well-developed Internet regions such as the US. In our testing, for example, the middle mile from Virginia to San Francisco showed the highest latency variation of all paths (103 ms, or 143% of the path’s median latency).
The answer: privatize the middle mile
Back to our original question: Can you trust an SD-WAN to deliver a predictable user experience if the basis of that SD-WAN is the unpredictable public Internet? Based on our testing, the answer is a qualified “yes.”
Let’s break the problem into its two parts: the middle mile and the last mile. As we’ve seen, any latency concerns lie in the Internet core, at least when long distances separate source and destination. In those cases, fluctuations in the last mile become negligible relative to those in the middle mile.
As such, latency and variability problems will likely arise when delivering VoIP and other applications that require low-latency, predictable connections. It’s a crapshoot: some days VoIP will run fine over the Internet middle mile, other days it won’t. (To better understand why running enterprise VoIP across the Internet is less than ideal, see this in-depth blog by longtime VoIP expert Phil Edholm.) From a latency standpoint, though, the last mile’s impact is nominal. You can therefore use the last mile to access a middle mile without worrying that last-mile latency will be a problem. (Packet loss and availability are concerns in the last mile, which we haven’t covered here, but they can be largely addressed by combining SD-WAN features with the right choice of ISP.)
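The “some days fine, other days not” problem can be expressed as a simple daily check. This Python sketch is hypothetical: the `DailyPath` records and their values are made up, and the thresholds are assumptions, namely the 150 ms one-way delay commonly cited from ITU-T G.114 and a 30 ms jitter limit that is a rule of thumb rather than a standard. Our tests measured TTFB rather than one-way delay, so their numbers would not plug in directly.

```python
from dataclasses import dataclass

# Assumed thresholds: 150 ms one-way delay per the widely cited ITU-T G.114
# guidance; 30 ms jitter is a common rule of thumb, not a standard.
MAX_ONE_WAY_MS = 150.0
MAX_JITTER_MS = 30.0

@dataclass
class DailyPath:
    day: str
    one_way_ms: float  # median one-way latency measured that day
    jitter_ms: float   # latency variation measured that day

def voip_ok(p: DailyPath) -> bool:
    """True if the day's measurements stay within both assumed thresholds."""
    return p.one_way_ms <= MAX_ONE_WAY_MS and p.jitter_ms <= MAX_JITTER_MS

# Hypothetical measurements for one Internet middle-mile path.
week = [
    DailyPath("Mon", 95.0, 12.0),
    DailyPath("Tue", 160.0, 18.0),  # latency too high
    DailyPath("Wed", 100.0, 45.0),  # latency fine, but too much variation
]
for p in week:
    print(p.day, "ok" if voip_ok(p) else "unsuitable")
```

An SD-WAN that steers traffic per path would run a check like this continuously rather than once a day, but the point stands: over the Internet middle mile, the answer can change from one day to the next.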
All of which is why providers of global SD-WAN services can claim “MPLS-like” performance even though customers must reach their global networks over the Internet last mile. Swapping the Internet core for a managed network significantly reduces latency and latency variation, even when that managed network is not MPLS. How much better? Consider this: the median variation of the tested paths across Amazon’s network was just 9.91 ms vs. 83.6 ms for the public Internet, more than an 8x difference.
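For reference, the ratio of the two median variations quoted above:

```latex
\frac{83.6\,\text{ms}}{9.91\,\text{ms}} \approx 8.4
```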