Taxonomy of Network and Service Monitoring Approaches

Monday, 23-Nov-2009 01:52:47 EST

Author: Russ Clark (Russ.Clark@oit.gatech.edu)

Over the years I've been working on this taxonomy of network monitoring techniques. My interest is not just in monitoring the network itself but also in measuring the user experience of the network from diverse locations, in order to identify problems before the user calls the helpdesk. To do this, we have to monitor layers 1 through 3 with classic network monitoring and measurement tools. However, we also need to look up into the application layer and monitor what users are actually trying to do. In addition to getting information to the network manager before the user calls, this also greatly benefits the task of root cause analysis.

We describe the family of network monitoring approaches in two categories: passive monitors and active monitors.

Passive Monitors

These don't add traffic to the network; they just watch what goes by. The major advantage, of course, is that we don't generate "extra" load on the network and servers. If the monitoring is done in enough detail, one can calculate actual user-perceived performance for things like TCP connections, DNS lookups and file transfers. This is a big deal. While many of the active approaches (described later) claim to be measuring what the user is actually seeing, they don't really get all the way to the user.

The disadvantage of passive monitoring is that it becomes more and more difficult to do right as the volume of data increases. Also, the movement away from true broadcast networks to switching means that more monitoring points are required in order to "see" all of the traffic. Another growing problem is that the increasing use of encryption hides the actual application details that we want to monitor.

Passive monitoring generally relies on a promiscuous-mode tap that can see all network traffic. This is the classic RMON approach. It can also be found in commercial products like TrafficDirector, as well as the current GOAT and many other publicly available tools and appliances such as NTOP.
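
As a rough illustration of how a passive monitor might turn observed traffic into a user-perceived number, the sketch below computes TCP connection setup time from packets that have already been captured and parsed. The `Packet` record and the `handshake_times` function are hypothetical names invented for this example, not part of RMON, NTOP or any tool mentioned here, and the timings are as seen at the tap rather than at the user's machine.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, simplified packet record; a real capture tool exposes far more.
@dataclass(frozen=True)
class Packet:
    ts: float        # capture timestamp at the tap, in seconds
    src: str
    sport: int
    dst: str
    dport: int
    syn: bool = False
    ack: bool = False

# (client ip, client port, server ip, server port)
FlowKey = Tuple[str, int, str, int]

def handshake_times(packets: Iterable[Packet]) -> Dict[FlowKey, float]:
    """Return seconds from the first SYN to the handshake-completing ACK, per flow."""
    first_syn: Dict[FlowKey, float] = {}
    synack_seen: Set[FlowKey] = set()
    setup: Dict[FlowKey, float] = {}
    for p in packets:
        fwd = (p.src, p.sport, p.dst, p.dport)
        rev = (p.dst, p.dport, p.src, p.sport)
        if p.syn and not p.ack:
            # Client SYN: keep the earliest one so retransmits count as delay.
            first_syn.setdefault(fwd, p.ts)
        elif p.syn and p.ack and rev in first_syn:
            # Server SYN-ACK answering a SYN we have seen.
            synack_seen.add(rev)
        elif p.ack and not p.syn and fwd in synack_seen and fwd not in setup:
            # First client ACK after the SYN-ACK completes the handshake.
            setup[fwd] = p.ts - first_syn[fwd]
    return setup
```

Feeding this a time-ordered stream of parsed packets yields one setup time per completed connection; flows whose handshake never completes are simply absent, which is itself a useful signal for a monitor.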

These passive monitoring tools are typically deployed at one or more locations on a network (e.g. at the border gateway, or one per subnet). The data is gathered and often brought back to a central server for correlation and analysis.

In addition to the dedicated monitoring device, there are a number of passive client-based tools that have been developed. These tools focus on the network performance experienced by a single user. A passive monitor is installed on the user's computer; it watches the user's network applications as they are used and reports the performance to a central collection point. From the point of view of the network and service manager this is ideal, as all of the users become "free" network probes. Of course, nothing is ever really free, and some performance degradation will be seen by the user. The more successful attempts at this have worked to limit the pain. Some examples I know of in this arena are a commercial product called FirstSense and GT's very own NETI@home.

Active Monitors

Unlike passive monitors, active monitors generate traffic to perform a measurement. This includes traditional network tests like ping and traceroute but also application tests like file transfers and DNS lookups. The primary advantages of this approach are that it is somewhat easier to implement than the passive scanner and that it is possible for the network administrator to see a problem even before a user would see it. For instance, we can discover that the mail server went down at 4am and get it back up and running before the users ever notice the problem.

The primary disadvantages of active monitoring are the additional load on both network and servers and the fact that we don't actually observe the real user experience but something designed to look like a user. These techniques can be divided into two groups: the real tests and the synthetic tests. In a sense, this is a measure of how close we are really getting to measuring the real user experience.

A Real Test Active Monitor is a probe that sits out on the network, either in a dedicated box or on a user's computer, and performs operations against an on-line, production server. This tests not only the network performance but the complete end-to-end service. The goal is to get as close to the real user's experience as possible. If the probe can do a DNS lookup or get a DHCP lease, then there's a good chance that the user can too. There are several publicly available tools that do this, including Nagios. Commercial tools include those from Micromuse and one that I originally developed for Concord.

An Active Synthetic Test is very similar to a real test in that it performs some real application operation such as a file transfer. However, this is not done against the production server but against a collection of dedicated performance testing boxes. There are several of these on the Internet today. Tools such as AMP and PingER fit here, as do many others. The Iperf tool is often used in this manner. The Ganymede tool is one of the commercial offerings I am familiar with that operates in this way.

Most implementations of active monitors break the test down into components. For instance, a web server measurement will include timings for the DNS lookup, the TCP connection and then detailed application transaction timings (complete order, process credit card, etc.).

Reporting

Of course, the real value of all of this monitoring is limited unless there is adequate work on the data gathering, correlation and reporting tools. This is where the real analysis is done to determine first whether or not a problem exists and then who to contact to get it fixed.
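
To make the component breakdown used by active monitors concrete, here is a minimal sketch of a probe that times the DNS lookup and the TCP connection separately and returns a small report that a central collector could correlate. The `timed` and `probe` names, the report fields and the 5-second default timeout are assumptions made for this example; this is not how Nagios or any of the tools named above are implemented. The resolver and connector are passed in as parameters so the probe can be exercised without touching a production server.

```python
import socket
import time
from typing import Callable, Dict

def timed(fn: Callable, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def probe(host: str, port: int,
          resolve: Callable = socket.getaddrinfo,
          connect: Callable = socket.create_connection,
          timeout: float = 5.0) -> Dict[str, object]:
    """Time the DNS lookup and the TCP connect as separate components."""
    report: Dict[str, object] = {"host": host, "port": port, "ok": False}
    try:
        # Component 1: name resolution.
        infos, report["dns_s"] = timed(resolve, host, port, 0, socket.SOCK_STREAM)
        address = infos[0][4][:2]   # (ip, port) of the first answer
        # Component 2: TCP connection setup to the resolved address.
        conn, report["connect_s"] = timed(connect, address, timeout)
        conn.close()
        report["ok"] = True
    except OSError as exc:
        # Covers resolution failures, refused connections and timeouts.
        report["error"] = str(exc)
    return report
```

A failed step leaves its timing out of the report, so the fields that are present show how far the test got before the error. Application-level steps (a mail transaction, a web order) would be added as further timed components in the same way.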
This page was last modified on Tuesday, 23-Jan-2007 15:22:38 EST.