On Wed, Feb 11, 2009 at 4:33 PM, Brendan Gregg - Sun Microsystems
<bren...@sun.com> wrote:
> On Tue, Feb 10, 2009 at 11:56:10PM -0600, Jason King wrote:
>> On Tue, Feb 10, 2009 at 11:08 PM, Brendan Gregg - Sun Microsystems
>> <bren...@sun.com> wrote:
>> > G'Day Folks,
>> >
>> > On Tue, Feb 10, 2009 at 08:03:17PM +0000, Peter Tribble wrote:
>> > [...]
>> >> Create a net-snmp module that exposes well known Solaris performance
>> >> metrics via SNMP.  If possible, this will include presenting kstat
>> >> metrics in a  generic fashion via SNMP.
> [...]
>> >
>> > ... and if we start with what's needed instead of what Solaris provides,
>> > then we may have a generic enough performance MIB to port to other
>> > systems. :)
>>
>> Well what I wanted to start with is the data presented by the *stat
>> commands (vmstat, mpstat, etc.).  In most cases, they are just showing
>> the difference between a number of kstats over a chosen time interval,
>> which just happens to make the implementation a bit easier.   Or to
>> think of it another way, for the initial piece at least, the interface
>> is the same metrics seen using the *stat commands (so to speak), the
>> fact kstats are used in obtaining the numbers is an implementation
>> detail.
>>
>> I think that would address most of the stability concerns.
>>
>> If (as was suggested) we add the ability to present kstats in a more
>> generic fashion (I think that would be in addition to the above
>> piece), it would need to be in a way that if new kstats are added, or
>> old ones deleted, the MIB would not need to be updated.  I think
>> everyone here knows that kstats are subject to change without notice.
>> However if that's all that's available at the time, a working 'wrong'
>> solution is better than a non-existent 'right' solution.
>
> Why would the right solution be non-existent?  It's not hard to add kstats.
> The world of performance has too many wrong solutions - it confuses customers
> and can lead to purchases based on bad information.  For a recent example,
> read Bryan's article on the commonly requested SPEC SFS benchmark:
> http://blogs.sun.com/bmc/entry/eulogy_for_a_benchmark

Adding kstats is not the issue.  The problem is that today there is
essentially no way to record and aggregate performance data (whether
it's kstats, sar, or DTrace) from multiple systems.  Doing it manually
or rolling your own is a needless waste of time (and quite tedious).

So from that, what performance data exists?  Essentially all that's
out there is vmstat, mpstat, iostat, fsstat, intrstat, etc.  They are
well known, and even if the numbers aren't all perfect, they're better
than no data and seem to have done the job well enough for the past
10+ years.  So I think those are a good starting point.
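
As noted earlier in the thread, the *stat commands mostly report the
difference between kstat counters over a chosen interval.  A minimal
sketch of that calculation, assuming the counters have already been
read into dictionaries (the counter names and values below are made up
for illustration and are not real kstat names):

```python
def interval_rates(prev, curr, seconds):
    """Return per-second rates from two snapshots of cumulative counters."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    rates = {}
    for name, new in curr.items():
        old = prev.get(name)
        # Skip counters that are new in this snapshot or that went
        # backwards (for example after a reset); no rate can be derived.
        if old is None or new < old:
            continue
        rates[name] = (new - old) / seconds
    return rates


# Hypothetical snapshots taken 5 seconds apart.
before = {"pgfault": 1000, "syscall": 50000}
after = {"pgfault": 1250, "syscall": 56000, "newstat": 7}
print(interval_rates(before, after, 5))
```

Only the two counters present in both snapshots get a rate, so a
counter that appears or disappears between samples does not break the
calculation.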

Now what happens if what you need to know isn't in the output of one of
those commands (again, with the whole multi-system collection and
aggregation requirement)?  Well, if a kstat happens to have the data
we need, being able to _easily_ grab it is damn useful, stable
output or not.  Saying 'well no, you have to wait until it's all
perfected and put into a form that's been stabilized and fully ARCed,
then released when someone finds time and resources to write it' isn't
much of an answer.
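
To illustrate grabbing kstats without a fixed schema, here is a rough
sketch that parses text in the parseable form printed by `kstat -p`,
assuming each line looks like module:instance:name:statistic, a tab,
then the value.  The function name and the sample values are
hypothetical; nothing is hard-coded about which kstats exist, so added
or removed kstats simply change the keys that come back.

```python
def parse_kstat_p(text):
    """Parse 'module:instance:name:statistic<TAB>value' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("\t")
        parts = key.split(":", 3)
        if len(parts) != 4:
            continue  # not in the assumed four-part form; skip it
        value = value.strip()
        # Keep numbers as numbers and leave everything else as a string.
        try:
            parsed = int(value)
        except ValueError:
            try:
                parsed = float(value)
            except ValueError:
                parsed = value
        stats[tuple(parts)] = parsed
    return stats


# Made-up sample in the assumed format, not captured from a real system.
sample = (
    "unix:0:system_misc:nproc\t87\n"
    "unix:0:system_misc:snaptime\t1234.5\n"
    "cpu_info:0:cpu_info0:brand\tExample CPU\n"
)
for key, value in parse_kstat_p(sample).items():
    print(":".join(key), "=", value)
```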

>
>> As it is, the initial impetus for this was trying to do a basic
>> compare box A to box B for work, to be able to do at least rudimentary
>> evaluation for consolidation for zones.  This means looking at
>> historic data.  Today the only bundled option is to parse the sar
>> data.  That is painful for a number of reasons (group that admins box
>> B has sar collecting over different intervals, just manipulating the
>> data in general is rather annoying and time consuming, etc.)  Going
>> forward, one could write a bunch of custom scripts to run vmstat,
>> mpstat, etc. and write them to a log or a database or a central
>> server, or one could just avoid reinventing the wheel and just make
>> them available via snmp.  One round wheel is as good as another, so I
>> don't feel the need to make another one :)
>
> This is touching on a different issue - yes, we need a better performance
> archive solution than sar.  Fishworks has bundled a kstat/DTrace based
> one called Analytics in the new storage products (which outright kills
> any need for sar), although that doesn't help us on [Open]Solaris right now.

Can it aggregate data across multiple systems?  And yes, unless it's
available on OpenSolaris, it's not of much use.

> This may be an opportunity to create new and more useful perf statistics
> that we export via kstat and SNMP - which I think has more value than
> reheating what's already there.  Consider the following:
>
>  $ sysperfstat 1
>             ------ Utilisation ------     ------ Saturation ------
>     Time    %CPU   %Mem  %Disk   %Net     CPU    Mem   Disk    Net
>  23:07:10    0.85  44.11   2.40   0.19    0.01   0.00   0.00   0.00
>  23:07:11    7.00  95.17   0.00   0.00    0.00   0.00   0.00   0.00
>  23:07:12    4.00  95.63   0.00   0.00    0.00   0.00   0.00   0.00
>  23:07:13    5.00  96.09   0.00   0.00    0.00   0.00   0.00   0.00
>  23:07:14    5.00  96.55   0.00   0.00    0.00   0.00   0.00   0.00
>  23:07:15    5.00  97.01   0.00   0.00    0.00   0.00   0.00   0.00
>  23:07:16    6.00  97.47   0.00   0.00    0.00   0.00   0.00   0.00
>  23:07:17    5.00  97.92   0.00   0.00    0.00   0.00   0.00   0.00
>  23:07:18    9.00  97.84   2.00   0.00    0.00  20.51   0.04   0.00
>  23:07:19    6.00  97.92   2.75   0.00    0.00  13.04   0.04   0.00
>  23:07:20    6.00  97.91   2.85   0.00    0.00  18.22   0.04   0.00
>  [...]
>
> I wrote this as a solution to the problem of system wide observability (and
> in this case, one that fits in an 80-char wide format - SNMP doesn't have that
> restriction, so should serve a better and more detailed selection of
> statistics.)  Serving out vmstat style metrics without addressing why is a
> solution in search of a problem - and a solution that's about 25 years old.

vmstat and the like have apparently been useful enough that no one
(including Sun) has so far created replacements.  They are still used
by Sun support for solving customer performance problems.


> But, if you are wedded to the idea of re-serving vmstat and what not, I'd make
> that clear in the MIB - that this is the SNMP view of vmstat etc - which
> hopefully doesn't confuse anyone more than the existing tools.  Customers can
> also use existing documentation to understand vmstat's ancient metrics.  And
> so, 'perfmib' may be a bad name - it's not the best perf MIB we could possibly
> do; perhaps 'perftoolsmib' - as the SNMP view of common perf tools...  Which
> I'd agree does have real value. :)

Well (to pick one example), if the number of major faults on a cpu (as
seen from mpstat) doesn't mean just that, I think there are larger
problems in OpenSolaris.

One thing I think gets forgotten: when there's a problem, being able
to observe certain behavior in real time is very useful, but it's also
useful to be able to compare against historical data (which dtrace
can't help with).  A lot of times problems pop up, and even if the
solution is found, the question of 'why did it happen now' gets asked
a lot.  Historical data goes a long way towards helping answer that.
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
