CPR: Design

Sunday, 22-Nov-2009 23:22:25 EST

Measurements

The set of tools deployed on the CPR boxes should provide insight into network and application problems. Active monitoring should be fully meshed between the hosts, and certain CPR hosts will take on additional tasks, such as selective monitoring of offsite nodes. Home-grown testing tools and related software will largely be developed by willing students.
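
As a rough illustration of what "fully meshed" implies for scheduling, the sketch below enumerates every ordered pair of hosts, so that each host probes every other host in both directions. The host names are hypothetical placeholders, not actual CPR hosts, and this is only one possible way to build the probe list.

```python
from itertools import permutations


def full_mesh(hosts):
    """Return every ordered (source, target) pair of distinct hosts.

    Ordered pairs are used because some measurements, such as one-way
    latency, are directional: a -> b is a different test from b -> a.
    """
    return list(permutations(hosts, 2))


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hosts = ["cpr-a", "cpr-b", "cpr-c"]
    for source, target in full_mesh(hosts):
        print(f"{source} -> {target}")
```

With n hosts this yields n * (n - 1) probe pairs, which is worth keeping in mind when deciding how often each test runs.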

The measurements should include:

  • one-way latency (using the owamp program)
  • round-trip latency (using ping)
  • layer 3 routing (using traceroute with UDP and ICMP probes)
  • throughput (using iperf)
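
A minimal sketch of how the round-trip measurement might be collected, assuming a Unix-like ping that accepts `-c` and prints a `min/avg/max` summary line (the Linux and BSD/macOS forms are both matched). The function names are illustrative and not part of any existing CPR tool.

```python
import re
import subprocess

# Matches the summary line printed by common Unix ping implementations,
# e.g. "rtt min/avg/max/mdev = ..." or "round-trip min/avg/max/stddev = ...".
RTT_SUMMARY = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(output):
    """Extract min/avg/max/deviation (in ms) from ping output, or None."""
    match = RTT_SUMMARY.search(output)
    if match is None:
        return None
    keys = ("min", "avg", "max", "dev")
    return dict(zip(keys, map(float, match.groups())))


def ping(host, count=5):
    """Run the system ping and return parsed statistics (None on failure)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_ping_summary(result.stdout)
```

Returning None when no summary line is found lets the caller treat an unreachable host as a missing sample rather than as a crash.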
In addition, the CPR box should monitor applications:
  • HTTP (telnet to port 80 on the main Georgia Tech web server and time the SYN/ACK response)
  • DHCP (a client could be written to test the leasing of addresses without actually using the leased address)
  • DNS (telnet to port 53 on the main Georgia Tech name servers and check the responses to selected requests)
  • FTP (upload and download files to determine real throughput)
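
The port 80 and port 53 checks above amount to timing a TCP handshake. The sketch below swaps the interactive telnet session for a scripted connect, which returns once the SYN/ACK has been received; the function name and timeout are assumptions, and the target host is left to the caller.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to complete a TCP handshake, or None.

    connect() returns once the server's SYN/ACK has arrived, so the
    elapsed time approximates one round trip plus server accept latency.
    For host names it also includes name resolution time.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return None
```

This only confirms that the port accepts connections. Checking DNS responses to selected requests, as described above, would need an actual query on top of it.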
The CPR team will also explore passive monitoring techniques. These might include:
  • Correlating with the existing GOAT tool
  • Sniffing the traffic (perhaps using tcpdump)
  • Analyzing logs to correlate changes
  • Working with other groups such as NETI or NDT to analyze user performance and recommend improvements
The CPR team will also work with the wireless team to obtain LAWN diagnostics such as:
  • SSID association testing
  • Monitoring RF load
  • Rogue AP detection
  • P2P/Bluetooth/interference monitoring
  • DHCP and authentication testing
  • Performance and capacity monitoring

Visualization and Analysis

Visualization of the data should include current (last 24 hours, last 1 hour) and historic histograms of the responses. All the data should be logged locally and centrally. If local data fails to upload to the central archive, the failure itself should generate an alarm.

A summary of the results could perhaps be overlaid on a layer 3 map, similar to the OpenView front end. John Merrit has previously suggested constructing a layer 2 map. A student project could explore using SNMP or CDP to construct a map of each VLAN, showing the location of the root bridge and other important information.

A summary page with a color-coded status bar should be used so that problems can be flagged quickly.

What about Current Monitoring?

The network monitoring tools used by the helpdesk, such as OpenView and SPAM, do not necessarily reflect the users' experience or help the backbone team to resolve problems. CPR will extend the ability of the existing tools to detect problems and provide new ways to resolve them. In addition, the SWARM tool could be used to monitor the performance of the CPR hosts.
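
Returning to the visualization requirements, the response histograms and the color-coded status bar might be prototyped along the following lines. The bin width, the warning and critical thresholds, and the use of None for a lost probe are all assumptions made for illustration; the design does not specify them.

```python
from collections import Counter


def latency_histogram(samples_ms, bin_width_ms=10):
    """Count samples per bin; each key is the lower edge of a bin in ms."""
    counts = Counter(
        int(sample // bin_width_ms) * bin_width_ms for sample in samples_ms
    )
    return dict(sorted(counts.items()))


def status_color(samples_ms, warn_ms=50.0, crit_ms=200.0):
    """Map recent samples to a status color; None entries are lost probes.

    The thresholds are placeholder values, not measured baselines.
    """
    answered = [s for s in samples_ms if s is not None]
    if not answered:
        return "red"      # nothing answered at all
    worst = max(answered)
    if worst >= crit_ms:
        return "red"
    if worst >= warn_ms or len(answered) < len(samples_ms):
        return "yellow"   # slow responses or some probes lost
    return "green"
```

The same histogram function could serve both the last-hour and last-24-hours views by passing in a different window of samples.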
This page was last modified on Tuesday, 23-Jan-2007 13:37:54 EST.