Avoiding the network tap dance

Posted on August 20, 2015 by David Snowdon

 

In mission-critical architectures it often becomes necessary to analyze data at different points within the network. Common reasons include troubleshooting and problem diagnosis, latency analysis, capturing data to meet regulatory obligations, and backtesting strategies. The mechanisms for doing this are usually implemented as an afterthought using outdated technologies, or assembled from ad-hoc components applied as and when a problem occurs, which is often too late.
Taps (Test Access Points) should not be an operational burden, nor should they affect network performance. The following guidelines help enable useful tapping architectures:
 

1. Don’t wait until there is a problem to implement taps.

Most firms under-invest in the infrastructure needed to add taps. As a result, a tap is often not available when it is needed and has to be installed after hours or within an approved change window. Quite often this is simply too late to diagnose the problem, or the tap goes in and the issue does not recur. To add to the delay, different media types may need different tap infrastructure: what's needed to tap a 10G LR optical signal may differ from what's needed to tap a Twinax Direct Attach Copper cable. Infrastructure should be built so that every media type in the network can be tapped on demand, using the same technology.
 

2. Carefully consider the value of passive optical taps.

Aside from the inherent costs of passive optical taps, which we discussed in a previous blog post, there are a number of reasons why they are not the optimal solution for mission-critical architectures.

Passive optical taps divert some of the light in the fiber from the network receiving device to the capture/analysis device, which reduces the optical signal strength at the receiver. If you have put a passive optical tap in place to diagnose a packet drop issue, there is no way to be sure that the tap itself is not going to exacerbate the issue. The tap is also likely to degrade the quality of the optical signal, increasing jitter and reducing the eye margin.
Because passive optical taps are “passive”, you have no visibility into the data coming through the tap, nor any ability to perform value-added functions on that stream, such as timestamping. In short, what passive optical taps gain by being passive is massively outweighed by their lack of visibility and predictability.
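To put a number on the signal loss: the ideal loss of a splitter output is -10·log10 of the share of light it receives. The Python sketch below estimates how much optical margin a network receiver keeps after a passive tap is inserted. The function names, the 70/30 split ratio and every power figure in the example are hypothetical illustrations, not vendor or standard specifications, and real taps add some excess loss on top of the ideal figure.

```python
import math


def split_loss_db(fraction: float) -> float:
    """Ideal loss in dB for the share of light a splitter sends to one output."""
    return -10 * math.log10(fraction)


def receiver_margin_db(tx_power_dbm: float,
                       rx_sensitivity_dbm: float,
                       link_loss_db: float,
                       pass_fraction: float = 1.0,
                       excess_loss_db: float = 0.0) -> float:
    """Optical margin (dB) left at the network receiver.

    pass_fraction is the share of light the tap passes through to the network
    (1.0 means no tap). All inputs are illustrative, not vendor figures.
    """
    rx_power_dbm = (tx_power_dbm - link_loss_db
                    - split_loss_db(pass_fraction) - excess_loss_db)
    return rx_power_dbm - rx_sensitivity_dbm


# Hypothetical link: -3 dBm transmitter, -14 dBm receiver sensitivity, 2 dB of
# fibre and connector loss, then a 70/30 tap with 0.5 dB excess loss.
without_tap = receiver_margin_db(-3.0, -14.0, 2.0)
with_tap = receiver_margin_db(-3.0, -14.0, 2.0, pass_fraction=0.7,
                              excess_loss_db=0.5)
print(f"margin without tap: {without_tap:.2f} dB")  # 9.00 dB
print(f"margin with tap:    {with_tap:.2f} dB")     # 6.95 dB
```

In this made-up example the tap costs about 2 dB of margin on a link that was already healthy; on a marginal link that is exactly the kind of drop that can turn an intermittent fault into a constant one.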
 

3. Don't use SPAN ports

SPAN (Switched Port Analyzer) ports copy the RX and TX data from each source port to a single switch port, where a capture or analysis device can receive it. This can be extremely convenient: source ports can be added to a SPAN port dynamically, and all of the data is aggregated to a single location for analysis. However, SPAN ports come with a number of challenges, specifically when applied to mission-critical architectures:
  • If the aggregate data stream exceeds the SPAN port's capacity, packets will be dropped. Not only does this make problem diagnosis more difficult; it also makes this method wholly unsuitable for any type of historic data capture (for compliance or backtesting).
  • Frames with media/hardware errors are dropped rather than copied, making it even harder to determine where the root cause of a problem lies.
  • Copying the data to the SPAN port places an extra burden on the switch's CPU, so enabling a SPAN port often introduces latency.
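A simple fluid model shows why the first point bites even on modestly loaded networks: a SPAN port receives both the RX and TX traffic of every source port, so a single full-duplex 10G link can offer up to 20 Gb/s to a 10G SPAN port. The Python sketch below assumes sustained rates and ignores switch buffering; the function name and the traffic figures are illustrative only.

```python
from typing import Sequence


def span_drop_fraction(rx_gbps: Sequence[float],
                       tx_gbps: Sequence[float],
                       span_capacity_gbps: float) -> float:
    """Fraction of mirrored traffic lost when a SPAN port is oversubscribed.

    Fluid model: RX and TX of every source port are copied to the SPAN port,
    rates are sustained, and anything above the SPAN line rate is dropped
    (switch buffering is ignored).
    """
    offered_gbps = sum(rx_gbps) + sum(tx_gbps)
    if offered_gbps <= span_capacity_gbps:
        return 0.0
    return (offered_gbps - span_capacity_gbps) / offered_gbps


# One 10G link at line rate in both directions, mirrored to a 10G SPAN port.
print(span_drop_fraction([10.0], [10.0], 10.0))  # 0.5
# Four source ports, each at 5 Gb/s in each direction, into the same SPAN port.
print(span_drop_fraction([5.0] * 4, [5.0] * 4, 10.0))  # 0.75
```

Buffers can absorb short bursts, but any sustained excess is lost, and the capture device has no way to know which packets are missing.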

4. Aggregate your taps to a single analysis or capture point

Despite the challenges with SPAN ports, the logic of aggregating all RX and TX streams to a single port for delivery to a capture or analytics device is sound. Capture/analytics devices usually account for a high proportion of the infrastructure budget, and needing multiple devices can be extremely expensive. It is also important to ensure that the streams are aggregated deterministically and in the order received.
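As a software sketch of what aggregating “deterministically and in the order received” means (not a description of how any particular device does it in hardware), per-port streams that are each already in arrival order can be merged by ingress timestamp, with a fixed tie-break so identical inputs always produce identical output. The `TappedPacket` type and its fields are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class TappedPacket(NamedTuple):
    ingress_ns: int  # hypothetical ingress timestamp, nanoseconds
    port: int        # tap port the packet was captured on
    payload: bytes


def aggregate(streams: Iterable[Iterable[TappedPacket]]) -> Iterator[TappedPacket]:
    """Merge per-port streams into one stream in arrival order.

    Each input stream must already be sorted by ingress_ns. Ties on the
    timestamp are broken by port number, so replaying the same inputs always
    yields the same output order.
    """
    return heapq.merge(*streams, key=lambda p: (p.ingress_ns, p.port))


port1 = [TappedPacket(100, 1, b"a1"), TappedPacket(250, 1, b"a2")]
port2 = [TappedPacket(100, 2, b"b1"), TappedPacket(180, 2, b"b2")]
print([p.payload for p in aggregate([port2, port1])])
# [b'a1', b'b1', b'b2', b'a2']
```

The two packets stamped at 100 ns come out in port order regardless of which stream is passed first; without an explicit tie-break, the aggregated order could depend on incidental details of how the inputs were presented.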
 

5. Rely on accurate and well synchronized timestamps

Recording all of the data from separate points of interest on a network is extremely useful. Knowing when that data actually arrives at each point is far more useful, for both problem diagnosis and latency measurement. The device acting as the tap should always be able to timestamp the data at nanosecond resolution and synchronize with whatever master time source the capture/analytics platform uses. Knowing that one size doesn’t fit all, at Metamako we support PTP, NTP and PPS on all of our devices. ESMA (European Securities and Markets Authority) has recommended that timestamping be at the nanosecond level, which makes nanosecond-resolution timestamps a hard requirement for taps used in any compliance capture in financial services.
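The value of synchronized timestamps is easiest to see in a latency measurement between two tap points. The Python sketch below assumes each tap yields (packet key, nanosecond timestamp) pairs, where the key is a hypothetical unique identifier such as a hash of the packet's invariant bytes. Because each result is the difference of timestamps taken by two different clocks, any offset between those clocks adds directly to every measured latency.

```python
from typing import Dict, Iterable, List, Tuple


def one_way_latencies_ns(upstream: Iterable[Tuple[bytes, int]],
                         downstream: Iterable[Tuple[bytes, int]]) -> List[int]:
    """Per-packet latency between two tap points, in nanoseconds.

    Each input yields (packet_key, timestamp_ns). packet_key is assumed to
    identify a packet uniquely at both points. Packets not seen upstream are
    skipped; results follow downstream order.
    """
    first_seen: Dict[bytes, int] = {}
    for key, ts in upstream:
        first_seen.setdefault(key, ts)
    return [ts - first_seen[key] for key, ts in downstream if key in first_seen]


upstream = [(b"p1", 1_000), (b"p2", 1_500)]
downstream = [(b"p1", 1_850), (b"p2", 2_420), (b"p3", 3_000)]
print(one_way_latencies_ns(upstream, downstream))  # [850, 920]
```

If the downstream tap's clock ran 200 ns ahead of the upstream one, both results would read 200 ns too high, and nothing in the captured data would reveal it.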
 

Conclusion

Adding the ability to tap shouldn’t be an afterthought in network design. For mission-critical environments, tap requirements should be designed in from the outset and align with the guidelines above. Our MetaMux device allows 16 bi-directional taps to be aggregated, on demand, with minimal impact on the pass-through stream. Adding a second MetaMux as an aggregation tier allows many taps to be aggregated to a single point for capture or analytics.
 
 
MetaMux addresses the challenges of traditional taps described above by providing:
  • The ability to enable taps on demand for bi-directional connections passing through the device
  • One infrastructure to add taps to multiple media types
  • Fully regenerated signal of all data passed through the device
  • Buffering of aggregated data
  • Complete visibility (including eye diagrams) and statistics of all data passed through the device
  • Minimal (5 ns) and predictable (within 100 ps) added latency for data streams passing through the device: the equivalent of a metre of fibre!
  • Information and statistics on hardware and media packet drops
  • Full visibility into light levels and eye diagrams for optical signals
  • The ability to diagnose issues at a basic level using inbuilt statistics and tcpdump before activating a tap
  • Aggregation of all data from taps to a single point for delivery to capture or analytics devices
  • The ability to add ingress timestamps for all data 
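The “metre of fibre” comparison follows from the propagation speed of light in glass, roughly c divided by the fibre's group index. The sketch below assumes a group index of about 1.468, a typical figure for standard single-mode fibre; the exact value varies with fibre type and wavelength.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition


def fibre_delay_ns(length_m: float, group_index: float = 1.468) -> float:
    """One-way propagation delay through optical fibre, in nanoseconds.

    The default group index (~1.468) is an assumed typical value for standard
    single-mode fibre; real fibre varies with type and wavelength.
    """
    return length_m * group_index / SPEED_OF_LIGHT_M_PER_S * 1e9


print(f"{fibre_delay_ns(1.0):.2f} ns per metre")  # 4.90 ns per metre
```

At just under 5 ns per metre, a 5 ns device latency is comparable to adding about one metre of patch fibre to the path.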
Adding network taps shouldn’t need to be complicated or time-consuming. Contact us today to evaluate MetaMux: info@metamako.com