The implementation of sFlow on virtual switches places an sFlow agent in an 
ideal location to monitor the performance of physical and virtual machines, 
unifying network and system performance monitoring.

The scalability of sFlow's counter push mechanism provides an efficient way to
monitor the large number of physical and virtual switches and servers in a data
center. The number of virtual machines per server is growing; 20-40 virtual
machines per physical machine is not unusual. Monitoring a data center with
10,000 physical switch ports might involve monitoring as many as 5,000 physical
servers, 10,000 virtual switches, 200,000 virtual switch ports and 100,000
virtual servers. sFlow has the scalability needed to monitor the traffic and
performance of all the physical and virtual switches and physical and virtual 
servers in this environment. Extending sFlow to add server performance 
monitoring is straightforward and would simplify management by providing a 
single, unified measurement system.

There are relatively few metrics that are typically used to monitor system
performance. The following set is exported by Ganglia, a widely used, open
source system for monitoring cluster and grid performance:
http://ganglia.sourceforge.net/

         Metric        Description                              Platforms
         bytes_in      Number of bytes in per second            l,f
         bytes_out     Number of bytes out per second           l,f
         cpu_aidle     Percent of time since boot idle CPU      l
         cpu_idle      Percent CPU idle                         l,f
         cpu_intr
         cpu_nice      Percent CPU nice                         l,f
         cpu_num       Number of CPUs                           l,f
         cpu_rm
         cpu_speed     Speed in MHz of CPU                      l,f
         cpu_ssys
         cpu_system    Percent CPU system                       l,f
         cpu_user      Percent CPU user                         l,f
         cpu_vm
         cpu_wait
         cpu_wio
         disk_free     Total free disk space                    l,f
         disk_total    Total available disk space               l,f
         load_fifteen  Fifteen minute load average              l,f
         load_five     Five minute load average                 l,f
         load_one      One minute load average                  l,f
         location      GPS coordinates for host                 e
         machine_type
         mem_buffers   Amount of buffered memory                l,f
         mem_cached    Amount of cached memory                  l,f
         mem_free      Amount of available memory               l,f
         mem_shared    Amount of shared memory                  l,f
         mem_total     Total amount of memory                   l,f
         part_max_used Maximum percent used for all partitions  l,f
         pkts_in       Packets in per second                    l,f
         pkts_out      Packets out per second                   l,f
         proc_run      Total number of running processes        l,f
         proc_total    Total number of processes                l,f
         swap_free     Amount of available swap memory          l,f
         swap_total    Total amount of swap memory              l,f

Note: Ganglia defines a common set of metrics that can be collected from a wide 
variety of operating systems. Basing the sFlow counters on the Ganglia metrics 
builds on 10 years of work in defining a set of metrics that has proven to be 
effective in a wide range of systems. In addition, the Ganglia project has 
built a library for obtaining these metrics on different platforms. Basing the
sFlow specification on the Ganglia metrics would allow an sFlow agent to
leverage this library, greatly simplifying the task of collecting host
statistics.

For virtual machines, the xenstat library defines a similar set of counters in
a Xen environment, and VMware maintains comparable performance counters.

It is relatively easy to come up with a set of sFlow counter structures to
export this data. However, to unify network and system monitoring (i.e., to
associate the network traffic generated by a host with its performance
counters), a common key is needed.

Each physical/virtual machine is associated with one or more physical or 
virtual network adapters. Defining an sFlow structure to associate the adapter 
MAC addresses with the host performance counters provides the needed common key.

/* Physical or virtual network adapter NIC/vNIC */
struct host_adapter {
   unsigned int ifIndex;     /* ifIndex associated with adapter
                                Must match ifIndex of vSwitch
                                port if vSwitch is exporting sFlow
                                0 = unknown */
   mac mac_address<>;        /* Adapter MAC address(es) */
}

/* Set of adapters associated with entity.
   A physical server will identify the physical network adapters
   associated with it and a virtual server will identify its virtual
   adapters. */
/* opaque = counter_data; enterprise = 0; format = 2001 */

struct host_adapters {
   host_adapter adapters<>;         /* adapter(s) associated with entity */
}

The basic mechanisms of sFlow (counter polling and random packet sampling) are 
extremely scalable and can be extended to collect the data needed to manage 
converged data center infrastructures. An integrated, end-to-end
instrumentation system is needed to manage complex operations, such as virtual
machine migration, that affect system, network and storage performance.
Extending sFlow provides a multi-vendor solution to data center monitoring that 
addresses these requirements.

Peter
