On Thu, Feb 12, 2009 at 6:26 PM, Brendan Gregg - Sun Microsystems
<bren...@sun.com> wrote:
> On Wed, Feb 11, 2009 at 05:32:37PM -0600, Jason King wrote:
>> On Wed, Feb 11, 2009 at 4:33 PM, Brendan Gregg - Sun Microsystems
>> <bren...@sun.com> wrote:
> [..]
>> >> However if that's all that's available at the time, a working 'wrong'
>> >> solution is better than a non-existent 'right' solution.
>> >
>> > Why would the right solution be non-existent?  It's not hard to add kstats.
>> > The world of performance has too many wrong solutions - it confuses 
>> > customers
>> > and can lead to purchases based on bad information.  For a recent example,
>> > read Bryan's article on the commonly requested SPEC SFS benchmark:
>> > http://blogs.sun.com/bmc/entry/eulogy_for_a_benchmark
>>
>> Adding kstats is not the issue.
>> The problem is, today there is essentially no way to record and
>> aggregate any performance data -- whether it's kstats, sar, or dtrace --
>> from multiple systems.
>> Doing it manually or rolling your own is a needless waste of time (and
>> quite tedious).
>>
>> So from that, what performance data exists?  Essentially all that's
>> out there is vmstat, mpstat, iostat, fsstat, intrstat, etc.  They are
>> well known, even if all the numbers aren't perfect, they're better
>> than 0 data,
>
> No.  Stop.  Do not assume any data is better than no data.  Wrong or 
> misleading
> data is *worse* than no data.

If the tools are presenting wrong data, that is a bug in the tool.  If
the data is misleading, that suggests a documentation gap.  But having
no data when management is pressuring you for an answer, and
responding 'well, that's the way it is', is not likely to be well
received.

>
> I'm proposing either:
>
> A) 'perftoolsmib' - export vmstat, iostat, mpstat, etc.
>
> B) 'perfmib' - make an effort to export useful performance metrics, to meet
>   stated needs.  Examine what's there and keep what is good (I think
>   "iostat -xne" output is great), drop what's bad (some of vmstat), and add
>   what is missing - which means adding kstats to the kernel.

vmstat is probably a bad example -- the initial discussions (see
perfmib-dev @ opensolaris.org -- please chime in if you have good
ideas) have so far proposed exposing some of the data from mpstat and
fsstat, mostly based on experience as to which statistics have proven
useful for figuring out 'what's going on'.

>
>> and seem to have done the job well enough for the past
>> 10+ years.  So I think those are a good starting point.
>
> I disagree - I've met many Joe sysadmins who believe performance is a black
> art.  It shouldn't be.  Part of the problem is the tools available.
>
>> Now what happens if what you need to know isn't in the data in one of
>> those commands? (again with the whole multi-system collection &
>> aggregation requirement).  Well if a kstat happens to have the data
>> we need, being able to _easily_ grab that is damn useful, stable
>> output or not.   Saying 'well no, you have to wait until it's all
>> perfected and put into a form that's been stabilized and fully ARCed,
>> then released when someone finds time and resources to write it' isn't
>> much of an answer.
>
> If you dump kstats into SNMP - who is going to maintain the MIB file?

Well _if_ it's done (that's still up in the air, and I think it would
be addressed after a suitable initial module exists), I would imagine
it'd be something like this table:
   module     octetstring
   name       octetstring
   instance   integer
   statname   octetstring
   strval     octetstring
   intval     integer    (0 if N/A for the particular stat)

I would think it would be more of a last resort, for when a more
suitably structured value isn't available.  Possibly even in a
separate module, so that, much like the kstat command itself, you use
it only if you have to, and you know the risks going in.
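For what it's worth, here's a rough sketch (in Python, with entirely
hypothetical names -- this is not a real kstat or SNMP API) of how a raw
kstat statistic might be flattened into one row of that catch-all table,
including the 'intval is 0 if N/A' convention:

```python
# Hypothetical sketch: flatten one kstat statistic into a row matching
# the proposed generic table (module, name, instance, statname,
# strval, intval).  Not a real kstat binding or MIB implementation.

def kstat_to_row(module, instance, name, statname, value):
    """Return a dict whose keys match the proposed table columns."""
    if isinstance(value, int):
        # Numeric stat: intval carries the value, strval is empty.
        return {"module": module, "name": name, "instance": instance,
                "statname": statname, "strval": "", "intval": value}
    # String stat: strval carries the value, intval is 0 (N/A).
    return {"module": module, "name": name, "instance": instance,
            "statname": statname, "strval": str(value), "intval": 0}

row = kstat_to_row("cpu", 0, "sys", "cpu_ticks_idle", 123456)
print(row["intval"], repr(row["strval"]))   # 123456 ''
```

The point is only that one generic row shape can carry any kstat an
agent walks across, at the cost of losing all type and semantic
information -- which is exactly why it should stay a last resort.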

>
> [...]
>> > This may be an opportunity to create new and more useful perf statistics
>> > that we export via kstat and SNMP - which I think has more value than
>> > reheating what's already there.  Consider the following:
>> >
>> >  $ sysperfstat 1
>> >             ------ Utilisation ------     ------ Saturation ------
>> >     Time    %CPU   %Mem  %Disk   %Net     CPU    Mem   Disk    Net
>> >  23:07:10    0.85  44.11   2.40   0.19    0.01   0.00   0.00   0.00
>> >  23:07:11    7.00  95.17   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:12    4.00  95.63   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:13    5.00  96.09   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:14    5.00  96.55   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:15    5.00  97.01   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:16    6.00  97.47   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:17    5.00  97.92   0.00   0.00    0.00   0.00   0.00   0.00
>> >  23:07:18    9.00  97.84   2.00   0.00    0.00  20.51   0.04   0.00
>> >  23:07:19    6.00  97.92   2.75   0.00    0.00  13.04   0.04   0.00
>> >  23:07:20    6.00  97.91   2.85   0.00    0.00  18.22   0.04   0.00
>> >  [...]
>> >
>> > I wrote this as a solution to the problem of system wide observability (and
>> > in this case, one that fits in an 80-char wide format - SNMP doesn't have 
>> > that
>> > restriction, so should serve a better and more detailed selection of
>> > statistics.)  Serving out vmstat style metrics without addressing why is a
>> > solution in search of a problem - and a solution that's about 25 years old.
>>
>> vmstat and the like have been useful enough apparently that no one
>> (including Sun) has so far created replacements.  They are still used
>> by Sun support for solving customer performance problems.
>
> I just gave you an example of a new perf tool (sysperfstat - which can be
> improved further) - and you've returned to vmstat, because  ... everyone
> else sticks with vmstat?

Because the instant someone says 'we'd like to use this kstat X:Y:Z
in programX', there's a thundering chorus of 'NO! YOU CANNOT DO THAT!
DO NOT USE KSTATS (EVER)!  THEY ARE UNSTABLE!'.  So what good is
creating _more_ kstats if they can never be used (except by Sun, of
course)?  What choices are left?  Since vmstat, mpstat, iostat, etc.
have been presenting the same metrics for a very long time, they're
the hardest ones to argue against (given the above objection).  I
don't know that _all_ of the data presented by those tools would be
exposed.  In fact, I'd guess it'll end up being a subset of those
metrics that people have found useful in the past.

>
> People have been creating new tools inside and outside of Sun, but there is so
> much inertia behind what is there.  Just like there is for SPEC SFS - which is
> why I included Bryan's post for something technical to chew on.

Well, a big part of the inertia is what is delivered with the OS.  At
some places third-party tools are frowned upon, or there's already a
cross-platform 'standard' program of questionable utility that you
cannot deviate from (OS-bundled tools are often exempt from such
restrictions).

Even when Sun has add-on tools, Sun (as a company) has done a
_terrible_ job of marketing them (and not just performance tools).  If
you want, offline I can tell you all sorts of horror stories about
SunMC, patch management, etc., as experienced by a large F500 company.
A lot of the time it's 'well, here's something, but we don't support
it', 'OK, we're not going to use it then'.

>
> ...
>
> It's a choice between (A) and (B).
>
> Brendan
>
> --
> Brendan Gregg, Sun Microsystems Fishworks.    http://blogs.sun.com/brendan
>
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
