Summary: Reportable Metrics

Here is a summary of a recent query I posted to the list.

Reportable Metrics:

Original query: Identify a list of network metrics to compile and report
to management on a monthly basis. (The emphasis is on the metrics, not the
tools used to gather them.)

My original list:

1. Uptime per WAN or Internet circuit
2. # and average length of outages
3. Bandwidth utilization per WAN/Internet circuit and "important" VLANs
4. Overall network latency: RTT measured from various parts of the network
to various other parts (Cisco IPM)
5. Top talkers per WAN circuit
6. Top destinations per WAN circuit
7. Top 10 most utilized WAN circuits (% burst above CIR, etc.)
8. Protocol distribution per WAN circuit
9. Syslog/Sniffer alarms by severity
10. Application response time for key apps (e.g., SAP, HTTP)
11. Security incidents
12. TACACS reports on number of logins, changes, etc.
13. Bandwidth/latency trending
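As a rough illustration of how a few of these numbers (uptime per circuit, number and average length of outages, and % burst above CIR) might be computed for a monthly report, here is a minimal Python sketch. The `Outage` record format, the function names, and the assumption that utilization arrives as a list of bit/s polling samples are all hypothetical; real data would come from whatever monitoring tool collects it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One outage on one circuit (hypothetical record format)."""
    circuit: str
    start: datetime
    end: datetime


def circuit_report(outages, circuit, period_start, period_end):
    """Return (uptime %, outage count, average outage length) for one circuit.

    Outages are clipped to the reporting period. Overlapping outages on the
    same circuit are assumed not to occur (they would be double-counted).
    """
    period = period_end - period_start
    durations = []
    for o in outages:
        if o.circuit != circuit:
            continue
        start = max(o.start, period_start)
        end = min(o.end, period_end)
        if end > start:  # outage falls (at least partly) inside the period
            durations.append(end - start)
    downtime = sum(durations, timedelta(0))
    uptime_pct = 100.0 * (1 - downtime / period)  # timedelta / timedelta -> float
    avg = downtime / len(durations) if durations else timedelta(0)
    return uptime_pct, len(durations), avg


def pct_samples_above_cir(samples_bps, cir_bps):
    """Percentage of utilization samples (bit/s) that burst above the CIR."""
    if not samples_bps:
        return 0.0
    above = sum(1 for s in samples_bps if s > cir_bps)
    return 100.0 * above / len(samples_bps)
```

For example, a circuit with one 2-hour and one 1-hour outage during a 30-day month would report 2 outages, an average length of 1.5 hours, and about 99.58% uptime.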

Vince Mulhollan's additions:

1.	Acceptable use policy violations
2.	Number/severity of externally reported abuse complaints
3.	IOS deployments: upgrades, schedules, and the risks/rewards of the IOS
versions in use
4.	Installations completed (hardware and circuit changes) and the time to
implement each
5.	RFO: reasons for outages and remedies employed
6.	Percentage of employee time spent on upgrades, installs, security
incidents, etc. Is headcount in line with workload?
7.	Hardware-related trends, e.g., particular devices burning out
frequently. Establish a rough figure for the likelihood of failure per
device type.
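One way to get a loose likelihood-of-failure figure per device type is to express it as failures per unit per year. The Python sketch below assumes a constant number of units of each type in service over the observation window; the function name and the device-type labels in the example are hypothetical.

```python
from collections import Counter


def failure_rate_per_type(inventory, failures, months_observed):
    """Rough annualized failure rate per device type.

    inventory: device type -> number of units in service (assumed constant)
    failures: iterable of device types, one entry per hardware failure logged
    months_observed: length of the observation window in months
    Returns device type -> failures per unit per year.
    """
    counts = Counter(failures)
    years = months_observed / 12
    rates = {}
    for dev_type, units in inventory.items():
        if units == 0:
            continue  # no units in service, so no meaningful rate
        rates[dev_type] = counts[dev_type] / (units * years)
    return rates
```

With 40 branch routers and 6 failures over 12 months, for instance, the rate is 6 / 40 = 0.15 failures per unit per year. Sorting the result by rate gives a quick list of device types to watch.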


Joe St Sauver's recommendations:
1.	Don't overwhelm management with a large quantity of data.
2.	Implement "management by exception" by tracking/reporting "material
statistical deviations from expected values wherever possible."
3.	"The other key concept is to give management gauges that will help
them drive the plane, rather than historical data that will tell them
when/where/how badly they crashed (last month). E.G., make the data timely
and operationally relevant."
	a.	What's broken?
	b.	Where am I vulnerable?
	c.	Where am I running out of capacity?
	d.	Where do I have performance problems?
	e.	What are we doing really well?
	f.	Where can I increase my return on already deployed assets?
(e.g., where do I have underutilized capacity?)
	Look longitudinally (over time), geographically (spatially), and at
snapshots (cross-sectionally).
4.	Focus on downtime rather than uptime, and report only outages that
exceed some acceptable threshold. Focus on the causes of outages, the
responses to them, and whether there are ongoing problems in resolving them.
5.	Tie all measurements to the realities of the business: statistics that
bear out billing expenses, that might help marketing, or that help in
planning, etc.
6.	Dial-ins
7.	A list of URLs for more ideas:
	a.	Compare and contrast:
		http://hydra.uits.iu.edu/~abilene/traffic/
		http://monon.uits.iupui.edu/abilene/dnvr.html
		http://monon.uits.iupui.edu/abilene/dnvr/index.html
		http://monon.uits.iupui.edu/abilene/dnvr/uoreg-bits.html
		http://www.itec.oar.net/abilene-netflow/
	b.	Latency, packet loss, route changes:
		i.	http://amp.nlanr.net/active/amp-uoregon/HPC/body.html
		ii.
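Joe's recommendations 2 and 4 (report material deviations from expected values, and report only outages above an acceptable threshold) could be sketched in Python as below. This is an illustration, not his method: it assumes the "expected value" is the mean of previous months, uses a k-standard-deviation (z-score) test as one possible definition of a "material statistical deviation", and the function names, the k = 2 default, and the data layout are hypothetical.

```python
import math
import statistics


def exceptions(history, current, k=2.0):
    """Return {metric: z-score} for metrics whose current value lies more
    than k standard deviations from the mean of their history.

    history: metric name -> list of past monthly values (needs >= 2 values)
    current: metric name -> this month's value
    """
    flagged = {}
    for name, value in current.items():
        past = history.get(name, [])
        if len(past) < 2:
            continue  # too little history to define an "expected value"
        mean = statistics.mean(past)
        sd = statistics.stdev(past)
        if sd == 0:
            # Perfectly flat history: any change at all counts as a deviation.
            if value != mean:
                flagged[name] = math.copysign(math.inf, value - mean)
            continue
        z = (value - mean) / sd
        if abs(z) > k:
            flagged[name] = z
    return flagged


def outages_over_threshold(outages, threshold_minutes):
    """Keep only (circuit, minutes) outages longer than the threshold."""
    return [(c, m) for c, m in outages if m > threshold_minutes]
```

Only the flagged metrics and the over-threshold outages would go into the monthly report; everything within normal variation stays out of management's view.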