On Thu, Feb 12, 2009 at 1:06 AM, adrian cockcroft
<adrian.cockcr...@gmail.com> wrote:
> There is an endless number of free performance monitoring tools; it would
> make more sense to me to build something more portable, distributed, and
> high-level, like ganglia or xetoolkit, into OpenSolaris.

Perhaps I didn't look closely enough, but all of those require yet
another agent to be installed on the system, as well as a rather large
amount of agent configuration to get it to the point where it can
actually collect anything more than the bare minimum of information
from an OpenSolaris system.

Also, ganglia's model assumes the whole world is nothing but HPC
clusters.  While it can apparently be shoehorned into use for the
rest of the world, you're still subject to the confines of the
original model, something past experience has made me leery of because
of the number of awkward workarounds one usually has to do.

>
> A highly extended Solaris snmpd full of performance stats was created years
> ago for the SunMC product.  No point reinventing that wheel; the code must
> still be kicking around somewhere, even if SunMC isn't.

That implies that the code can be open-sourced.  We don't know that,
and given the complexity of the agents (as I recall, even something as
simple as changing the port number the agent listened on was a rather
involved and error-prone process), I'm not sure how useful it'd be even
if it were.

>
> I'm solving this problem for myself on a few machines right now using orca
> (orcaware.com), although I live in Linux land nowadays, so I'm using
> procallator.pl rather than orcallator.se to get the data.  The key is to
> have one timebase for collection, and collate all the data you could ever
> want aligned against it.

And then you're left with a proliferation of agents running on the
box.  Having been in environments where such things were taken to
absurd extremes, I'd rather have one information-collection agent
instead of 10 or more, all with different installation and update
procedures, operating quirks, or worse (one in particular would force
itself to always run in the RT scheduling class!).  Don't
underestimate the amount of administrative overhead that can generate.
SMA is supposedly the SNMP solution for OpenSolaris; let's stop
treating it like a third-class citizen.
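
To make that a bit more concrete, here's a rough sketch (not working,
production-quality code) of what a tiny SMA/net-snmp module serving one
kstat-backed value could look like.  The OID arc under 99999, the
"perfmib" name, the choice of unix:0:system_pages:freemem, and the
compile line are all purely illustrative, and this assumes the agent
was built with dlmod support:

/*
 * perfmib.c - illustrative sketch only.  Serves the unix:0:system_pages
 * "freemem" kstat as a single read-only SNMP gauge.  The OID below sits
 * under a made-up arc; a real module would use an assigned one plus a
 * proper MIB definition.
 *
 * cc -Kpic -G -o perfmib.so perfmib.c -lnetsnmpagent -lnetsnmphelpers \
 *     -lnetsnmp -lkstat            (link flags will vary by build)
 *
 * If snmpd was built with dlmod support, something like
 *     dlmod perfmib /path/to/perfmib.so
 * in snmpd.conf should load it and call init_perfmib().
 */
#include <net-snmp/net-snmp-config.h>
#include <net-snmp/net-snmp-includes.h>
#include <net-snmp/agent/net-snmp-agent-includes.h>
#include <kstat.h>

/* made-up OID, for illustration only */
static oid freemem_oid[] = { 1, 3, 6, 1, 4, 1, 99999, 1, 1 };

static int
handle_freemem(netsnmp_mib_handler *handler,
    netsnmp_handler_registration *reginfo,
    netsnmp_agent_request_info *reqinfo,
    netsnmp_request_info *requests)
{
	u_long pages = 0;

	if (reqinfo->mode == MODE_GET) {
		/* opening the kstat chain on every GET is lazy, but fine
		 * for a sketch */
		kstat_ctl_t *kc = kstat_open();

		if (kc != NULL) {
			kstat_t *ksp = kstat_lookup(kc, "unix", 0,
			    "system_pages");

			if (ksp != NULL && kstat_read(kc, ksp, NULL) != -1) {
				kstat_named_t *kn =
				    kstat_data_lookup(ksp, "freemem");

				if (kn != NULL) {
					/* the data type varies by release */
					switch (kn->data_type) {
					case KSTAT_DATA_UINT64:
						pages = (u_long)kn->value.ui64;
						break;
					case KSTAT_DATA_UINT32:
						pages = kn->value.ui32;
						break;
					default:
						pages = kn->value.ul;
						break;
					}
				}
			}
			(void) kstat_close(kc);
		}
		snmp_set_var_typed_value(requests->requestvb, ASN_GAUGE,
		    &pages, sizeof (pages));
	}
	return (SNMP_ERR_NOERROR);
}

void
init_perfmib(void)
{
	netsnmp_register_scalar(netsnmp_create_handler_registration(
	    "freemem", handle_freemem, freemem_oid, OID_LENGTH(freemem_oid),
	    HANDLER_CAN_RONLY));
}

Not much by itself, but the point is that once something like this loads
into the one agent that's already on the box, any existing SNMP poller
can pick the value up with no new agents or collection frameworks.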

>
> Adrian
>
> On Wed, Feb 11, 2009 at 3:32 PM, Jason King <ja...@ansipunx.net> wrote:
>>
>> On Wed, Feb 11, 2009 at 4:33 PM, Brendan Gregg - Sun Microsystems
>> <bren...@sun.com> wrote:
>> > On Tue, Feb 10, 2009 at 11:56:10PM -0600, Jason King wrote:
>> >> On Tue, Feb 10, 2009 at 11:08 PM, Brendan Gregg - Sun Microsystems
>> >> <bren...@sun.com> wrote:
>> >> > G'Day Folks,
>> >> >
>> >> > On Tue, Feb 10, 2009 at 08:03:17PM +0000, Peter Tribble wrote:
>> >> > [...]
>> >> >> Create a net-snmp module that exposes well-known Solaris performance
>> >> >> metrics via SNMP.  If possible, this will include presenting kstat
>> >> >> metrics in a generic fashion via SNMP.
>> > [...]
>> >> >
>> >> > ... and if we start with what's needed instead of what Solaris
>> >> > provides, then we may have a generic enough performance MIB to port
>> >> > to other systems. :)
>> >>
>> >> Well, what I wanted to start with is the data presented by the *stat
>> >> commands (vmstat, mpstat, etc.).  In most cases, they are just showing
>> >> the difference between a number of kstats over a chosen time interval,
>> >> which just happens to make the implementation a bit easier.  Or to
>> >> think of it another way, for the initial piece at least, the interface
>> >> is the same metrics seen using the *stat commands (so to speak); the
>> >> fact that kstats are used in obtaining the numbers is an implementation
>> >> detail.
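
To illustrate what I mean by that, the rough sketch below is
essentially all a tool like vmstat or mpstat does each interval: read a
kstat counter, sleep, read it again, and print the difference divided
by the interval.  The particular kstat used here (cpu:0:sys, statistic
"syscall") is just an example; names and availability vary between
releases.

/* kdelta.c - illustrative sketch: the *stat pattern in miniature */
/* cc -o kdelta kdelta.c -lkstat */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <kstat.h>

static uint64_t
read_counter(kstat_ctl_t *kc, char *module, int inst, char *name, char *stat)
{
	kstat_t *ksp = kstat_lookup(kc, module, inst, name);
	kstat_named_t *kn;

	if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1)
		exit(1);
	if ((kn = kstat_data_lookup(ksp, stat)) == NULL)
		exit(1);
	return (kn->value.ui64);
}

int
main(void)
{
	kstat_ctl_t *kc = kstat_open();
	int interval = 5;
	uint64_t before, after;

	if (kc == NULL)
		exit(1);
	before = read_counter(kc, "cpu", 0, "sys", "syscall");
	(void) sleep(interval);
	(void) kstat_chain_update(kc);	/* pick up any kstat chain changes */
	after = read_counter(kc, "cpu", 0, "sys", "syscall");
	(void) printf("syscalls/sec on cpu0: %llu\n",
	    (unsigned long long)((after - before) / interval));
	(void) kstat_close(kc);
	return (0);
}

Most of what the *stat tools print is, give or take formatting and some
per-CPU summing, that same pattern applied to a handful of kstats.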
>> >>
>> >> I think that would address most of the stability concerns.
>> >>
>> >> If (as was suggested) we add the ability to present kstats in a more
>> >> generic fashion (I think that would be in addition to the above
>> >> piece), it would need to be in a way that if new kstats are added, or
>> >> old ones deleted, the MIB would not need to be updated.  I think
>> >> everyone here knows that kstats are subject to change without notice.
>> >> However, if that's all that's available at the time, a working 'wrong'
>> >> solution is better than a non-existent 'right' solution.
>> >
>> > Why would the right solution be non-existent?  It's not hard to add
>> > kstats.  The world of performance has too many wrong solutions - it
>> > confuses customers and can lead to purchases based on bad information.
>> > For a recent example, read Bryan's article on the commonly requested
>> > SPEC SFS benchmark:
>> > http://blogs.sun.com/bmc/entry/eulogy_for_a_benchmark
>>
>> Adding kstats is not the issue.
>> The problem is, today there is essentially no way to record and
>> aggregate any performance data, whether it's kstats, sar, or DTrace
>> output, from multiple systems.
>> Doing it manually or rolling your own is a needless waste of time (and
>> quite tedious).
>>
>> So from that, what performance data exists?  Essentially all that's
>> out there is vmstat, mpstat, iostat, fsstat, intrstat, etc.  They are
>> well known; even if all the numbers aren't perfect, they're better
>> than no data, and they seem to have done the job well enough for the
>> past 10+ years.  So I think those are a good starting point.
>>
>> Now what happens if what you need to know isn't in the data from one
>> of those commands?  (Again with the whole multi-system collection and
>> aggregation requirement.)  Well, if a kstat happens to have the data
>> we need, being able to _easily_ grab it is damn useful, stable
>> output or not.  Saying 'well no, you have to wait until it's all
>> perfected and put into a form that's been stabilized and fully ARCed,
>> then released when someone finds time and resources to write it' isn't
>> much of an answer.
>>
>> >
>> As it is, the initial impetus for this was trying to do a basic
>> comparison of box A to box B for work, to be able to do at least a
>> rudimentary evaluation for consolidation onto zones.  This means
>> looking at historic data.  Today the only bundled option is to parse
>> the sar data.  That is painful for a number of reasons (the group that
>> admins box B has sar collecting over different intervals, manipulating
>> the data in general is rather annoying and time-consuming, etc.).
>> Going forward, one could write a bunch of custom scripts to run vmstat,
>> mpstat, etc. and write them to a log or a database or a central
>> server, or one could avoid reinventing the wheel and just make
>> them available via SNMP.  One round wheel is as good as another, so I
>> don't feel the need to make another one :)
>> >
>> > This is touching on a different issue - yes, we need a better
>> > performance archive solution than sar.  Fishworks has bundled a
>> > kstat/DTrace-based one called Analytics in the new storage products
>> > (which outright kills any need for sar), although that doesn't help us
>> > on [Open]Solaris right now.
>>
>> Can it aggregate data across multiple systems?  And yes, unless it's
>> available on OpenSolaris, it's not of much use.
>>
>> > This may be an opportunity to create new and more useful perf statistics
>> > that we export via kstat and SNMP - which I think has more value than
>> > reheating what's already there.  Consider the following:
>> >
>> >  $ sysperfstat 1
>> >             ------ Utilisation ------     ------ Saturation ------
>> >     Time    %CPU   %Mem  %Disk   %Net     CPU    Mem   Disk    Net
>> >  23:07:10    0.85  44.11   2.40   0.19    0.01   0.00   0.00   0.00
>> >  23:07:11    7.00  95.17   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:12    4.00  95.63   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:13    5.00  96.09   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:14    5.00  96.55   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:15    5.00  97.01   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:16    6.00  97.47   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:17    5.00  97.92   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:18    9.00  97.84   2.00   0.00    0.00  20.51   0.04   0.00
>> >  23:07:19    6.00  97.92   2.75   0.00    0.00  13.04   0.04   0.00
>> >  23:07:20    6.00  97.91   2.85   0.00    0.00  18.22   0.04   0.00
>> >  [...]
>> >
>> > I wrote this as a solution to the problem of system-wide observability
>> > (and in this case, one that fits in an 80-char wide format - SNMP
>> > doesn't have that restriction, so it should serve a better and more
>> > detailed selection of statistics).  Serving out vmstat-style metrics
>> > without addressing why is a solution in search of a problem - and a
>> > solution that's about 25 years old.
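
For what it's worth, that Utilisation column is also the kind of thing
that could be served over SNMP using nothing more than what's already
exported; roughly something like the sketch below, which sums the
per-CPU cpu::sys nanosecond counters over an interval.  The cpu_nsec_*
statistic names are what newer releases provide, so treat them as an
assumption.

/* cpuutil.c - illustrative sketch: system-wide %CPU from per-CPU kstats */
/* cc -o cpuutil cpuutil.c -lkstat */
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <kstat.h>

static void
sum_cpu_nsec(kstat_ctl_t *kc, uint64_t *busy, uint64_t *total)
{
	kstat_t *ksp;
	kstat_named_t *u, *k, *i;

	*busy = *total = 0;
	/* walk the kstat chain and total up every cpu::sys instance */
	for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
		if (strcmp(ksp->ks_module, "cpu") != 0 ||
		    strcmp(ksp->ks_name, "sys") != 0)
			continue;
		if (kstat_read(kc, ksp, NULL) == -1)
			continue;
		u = kstat_data_lookup(ksp, "cpu_nsec_user");
		k = kstat_data_lookup(ksp, "cpu_nsec_kernel");
		i = kstat_data_lookup(ksp, "cpu_nsec_idle");
		if (u == NULL || k == NULL || i == NULL)
			continue;
		*busy += u->value.ui64 + k->value.ui64;
		*total += u->value.ui64 + k->value.ui64 + i->value.ui64;
	}
}

int
main(void)
{
	kstat_ctl_t *kc = kstat_open();
	uint64_t busy0, total0, busy1, total1;

	if (kc == NULL)
		return (1);
	sum_cpu_nsec(kc, &busy0, &total0);
	(void) sleep(1);
	(void) kstat_chain_update(kc);	/* CPUs may come and go */
	sum_cpu_nsec(kc, &busy1, &total1);
	(void) printf("%%CPU: %.2f\n",
	    100.0 * (double)(busy1 - busy0) / (double)(total1 - total0));
	(void) kstat_close(kc);
	return (0);
}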
>>
>> Apparently vmstat and the like have been useful enough that no one
>> (including Sun) has so far created replacements.  They are still used
>> by Sun support for solving customer performance problems.
>>
>>
>> > But, if you are wedded to the idea of re-serving vmstat and what not,
>> > I'd make that clear in the MIB - that this is the SNMP view of vmstat
>> > etc. - which hopefully doesn't confuse anyone more than the existing
>> > tools.  Customers can also use existing documentation to understand
>> > vmstat's ancient metrics.  And so, 'perfmib' may be a bad name - it's
>> > not the best perf MIB we could possibly do; perhaps 'perftoolsmib' - as
>> > the SNMP view of common perf tools...  Which I'd agree does have real
>> > value. :)
>>
>> Well (to pick one example), if the number of major faults on a cpu (as
>> seen from mpstat) doesn't mean just that, I think there are larger
>> problems in OpenSolaris.
>>
>> One thing I think gets forgotten: when there's a problem, while being
>> able to observe certain behavior in real time is very useful, it's
>> also useful to be able to compare against historical data (which
>> DTrace can't help with).  A lot of times problems pop up, and even if
>> the solution is found, the question of 'why did it happen now?' gets
>> asked a lot.  Historical data goes a long way towards helping answer
>> that.
>
>
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
