It is important to us to give potential customers a measure of the latency of the MetaConnect 16 that is reliable, and transparent in how it was calculated, to help them in their assessment. Having maintained a healthy scepticism of others' benchmarks in the past, we hold ourselves to the same high standard. Based on the design, we expected a latency of around 5 nanoseconds. We believe we have arrived at a methodology that gives believable results in which the industry can place its trust. Nevertheless, if you are sceptical of our benchmarks after reading this article, please get in touch with us at firstname.lastname@example.org.
Sources of error
As the latency of network technology has decreased, the technology required to measure it accurately has had to improve. There are a number of challenges with measuring low-latency devices.
Latency measurements are usually based on a timestamped packet capture – comparing the arrival times of two packets. If the timestamps are applied in software, there is inevitable non-determinism from shared resources (processors, caches, buses, etc.). Network cards from several manufacturers include hardware timestamping and time-synchronisation hardware (notably those from Solarflare and Myricom, which are popular in HFT – http://www.solarflare.com and http://www.myricom.com/). Dedicated packet capture cards, like those from Emulex (http://www.emulex.com/products/network-visibility-products/) and Napatech (http://www.napatech.com/), provide line-rate capture with hardware timestamping. Hardware timestamping, implemented well, gives accurate and deterministic results – it solves the packet capture problem, but not the experimental errors that often remain.
The next problem is one of resolution – hardware timestamping cards usually have a resolution of around 4ns, which corresponds to the maximum useful clock rate of the FPGAs generally used to implement them. The accuracy of the cards is generally much worse than their precision – you can't necessarily believe what the cards tell you. Lastly, the synchronisation accuracy between cards is usually worse again.
Signals propagate through fibre (and copper) at roughly 5ns per metre. That means that variations in the amount of fibre in the test system can change the test results.
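To see why this matters at these scales, here is a minimal sketch of the arithmetic (the helper name and the 5 ns/m figure are illustrative, taken from the rule of thumb above):

```python
# Rough propagation-delay model: ~5 ns of delay per metre of fibre.
# Illustrative only -- the exact figure depends on the fibre's refractive index.

NS_PER_METRE = 5.0

def fibre_delay_ns(length_m: float) -> float:
    """Delay contributed by a fibre of the given length, in nanoseconds."""
    return length_m * NS_PER_METRE

# A 1 m difference between two test fibres shifts a measurement by ~5 ns --
# larger than the ~4 ns device latency we are trying to measure.
error_ns = fibre_delay_ns(2.0) - fibre_delay_ns(1.0)
print(error_ns)  # 5.0
```

This is why the test setup must keep fibre lengths controlled and accounted for.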
The problem for us (Metamako) in measuring the MetaConnect 16's latency is that the latency we're measuring (around 4ns) is smaller than the precision of the packet capture cards, and is also about the same as the delay of the fibres used to connect it to the test equipment.
To overcome these problems we used the following setup:
Two machines with two network cards (NIC 1 and NIC 2) are connected to each other directly in one direction, and via the system under test in the other. OM1 multimode fibre is used for connections throughout. The input and output of the system under test are tapped using optical taps, and the result is fed into an Emulex DAG9.2X2 packet capture card. The capture card has two ports on it which can timestamp using the same clock – i.e. there is no synchronisation error. The published accuracy for the Emulex 9.2X2 is 7.5ns.
In these tests, NIC 1 sends ping packets – ICMP requests – to NIC 2, which responds with ICMP replies. The rate and size of the ping packets can be adjusted, but for this test we simply used a ping flood. Other traffic patterns could be generated but, since the MetaConnect forwards at the bit level, the Layer 2 contents of the packets do not affect its performance. We pass over 1 million packets through the system under test (SUT) and capture them, with timestamps, using the DAG card. Comparing the timestamp of a packet going into the SUT with the timestamp of the same packet coming out gives us the latency of the SUT for that configuration. We then run an analysis over the million packets for each test in order to obtain statistics about that run.
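The per-run analysis can be sketched as follows. This is a minimal illustration, not our actual analysis code; the function name and the three sample timestamp pairs are invented for the example, whereas a real run uses over a million packets:

```python
# Sketch: given matched (t_in, t_out) timestamp pairs from the capture,
# one pair per packet, derive latency statistics for the run.

from statistics import mean, median, stdev

def latency_stats(pairs):
    """pairs: iterable of (t_in_ns, t_out_ns) for each packet through the SUT."""
    latencies = [t_out - t_in for t_in, t_out in pairs]
    return {
        "mean": mean(latencies),
        "median": median(latencies),
        "stdev": stdev(latencies),
        "min": min(latencies),
        "max": max(latencies),
    }

# Three illustrative packets, each with ~4 ns through the SUT.
sample = [(100.0, 104.0), (200.0, 203.9), (300.0, 304.1)]
stats = latency_stats(sample)
print(round(stats["mean"], 2))  # 4.0
```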
To overcome the effects of resolution and quantisation error, we tested a large number of configurations in which the only change to the latency was the number of times the signal passed through the MetaConnect 16. To do this we set up a baseline system like this:
Here we have used fibre couplers to join a number of separate fibres together. We use 16 joiners in this configuration, and they are connected by 17 fibres.
To determine the latency of the MetaConnect device, the 16 joiners are replaced by connections to the MetaConnect – the MetaConnect is configured to transmit anything received on a port back to the same port – a loopback. We have no reason to expect that the loopback latency would be different to the latency to any other port – the data is taking the same path through the matrix switch (i.e. the loopback is not a shortcut).
For this test, we used Finisar FLTX8571 SFP+ modules in the MetaConnect and Mellanox ConnectX3 cards running at 10G. Note that for the purposes of this test we have no way of measuring the latency through the SFP+ modules. We would expect this to be on the order of a few hundred picoseconds for the round-trip.
By replacing the joiners, one by one, with a "hop" through the MetaConnect, we are able to determine the difference in latency for one hop, two hops, three hops, and so on, up to sixteen hops through the MetaConnect. This diagram shows two of the couplers having been replaced with hops through the MetaConnect.
To get an idea of what this involved, here are some pictures taken during this series of tests:
The mean results for each run are recorded here:
Plotting this data gives us the following:
This is interesting! We have fitted two lines to the data to show that it is bi-modal: two groups of measurements, each with a very similar gradient but a different offset. The R^2 is high for each linear model, and the gradient of each line is very similar – a bit over 3.9ns per hop. Our explanation for these results is counter-intuitive. We believe this is caused by aliasing – because there is very little jitter, we are exposed to the quantisation error in the Emulex DAG card. Because the DAG has an accuracy of 7.5ns and the timestamps are quantised to this grid, the timestamps are rounded; depending on the latency, they are rounded either up or down, which produces the bi-modal result. We are seeing the network-latency equivalent of moiré patterns.
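The aliasing effect can be illustrated with a small, self-contained simulation. This is toy data, not our capture data: the quantum, the fixed 1000 ns phase, and the 23.95 ns "true" latency are all invented for the demonstration. The point is that with negligible jitter, quantisation error does not average out over a run, so a whole run sits at a fixed offset from the truth:

```python
# Toy demonstration of aliasing in a quantising timestamp capture.
import random

QUANTUM_NS = 7.5  # illustrative grid, matching the DAG's published accuracy

def quantise(t_ns: float) -> float:
    """Round a timestamp onto the capture card's time grid."""
    return round(t_ns / QUANTUM_NS) * QUANTUM_NS

TRUE_LATENCY_NS = 23.95

# No jitter: every packet hits the grid at the same phase, so every
# measurement carries the identical rounding error -- the run mean is biased.
no_jitter = [quantise(1000.0 + TRUE_LATENCY_NS) - quantise(1000.0)
             for _ in range(10_000)]
bias_no_jitter = sum(no_jitter) / len(no_jitter) - TRUE_LATENCY_NS

# Jitter comparable to the quantum acts as dither: the phase varies packet to
# packet, rounding errors cancel, and the run mean converges on the truth.
rng = random.Random(0)
with_jitter = []
for _ in range(10_000):
    t0 = 1000.0 + rng.uniform(0.0, QUANTUM_NS)
    with_jitter.append(quantise(t0 + TRUE_LATENCY_NS) - quantise(t0))
bias_with_jitter = sum(with_jitter) / len(with_jitter) - TRUE_LATENCY_NS

print(round(bias_no_jitter, 2))     # 6.05 -- a fixed offset, not noise
print(abs(bias_with_jitter) < 0.5)  # True
```

Because our measurements have very little jitter, each hop count lands consistently on one side of the grid or the other – hence the two parallel lines.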
So, by fitting two linear functions, observing that the coefficient of determination (R^2) is high for each, and that the two fits have an extremely similar gradient, we can be confident in our measurements.
The gradient of each fit indicates the latency per hop for that dataset – i.e. the MetaConnect 16 has a latency per hop of 3.95ns.
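The gradient extraction itself is an ordinary least-squares line fit. The sketch below uses synthetic run means (a 3.95 ns/hop slope plus an arbitrary 120 ns fixed offset standing in for fibre and NIC delays) rather than the real data, but the mechanics are the same:

```python
# Ordinary least-squares fit of latency vs. hop count; the slope is the
# per-hop latency and the intercept absorbs the fixed fibre/NIC delays.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

hops = list(range(1, 17))
means = [120.0 + 3.95 * h for h in hops]  # synthetic run means, in ns

slope, intercept = fit_line(hops, means)
print(round(slope, 2))  # 3.95
```

In the real analysis this fit is done separately for each of the two groups, and the two slopes agree.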
We have discussed a methodology for measuring the latency of an extremely low-latency device – the MetaConnect 16. By taking a number of measurements and fitting linear models to them, we have been able to identify and compensate for several sources of error.
We have conclusively shown that the latency of the MetaConnect 16 is very close to 3.95ns per hop.