Nathan Kroenert - Server ESG wrote:

> I also *believe* (though am not certain - Perhaps someone else on the 
> list might be?) it would be possible to have each *event* (so - the 
> individual events that lead to a Fault Diagnosis) generate a message if 
> it was required, though I have never taken the time to do that one...

If this is possible, it's entirely undocumented... Actually, fmd's 
documentation is generally terrible. The sum total of configuration 
information is:

FILES
      /etc/fm/fmd             Fault manager configuration directory

Which is empty... It does look like I could write code to copy the 
output of "fmdump -f" somewhere useful if I had to.
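
For what it's worth, here is a minimal sketch of what that code might look like. It is untested against a real fmd, and it assumes a syslog collector is listening on UDP on some other host. The loghost argument, the logger name "fmdump-forward" and the helper forward_lines are all made up for illustration. It just runs "fmdump -f" (which follows the fault log as it grows) and ships each line off-box:

```python
import logging
import logging.handlers
import subprocess
import sys


def forward_lines(stream, logger):
    """Send each non-empty line read from stream to logger; return the count."""
    count = 0
    for line in stream:
        line = line.rstrip("\n")
        if line:
            logger.info(line)
            count += 1
    return count


def main(loghost, port=514):
    # Assumption: a syslog collector accepts UDP on loghost:port.
    logger = logging.getLogger("fmdump-forward")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.SysLogHandler(address=(loghost, port)))

    # "fmdump -f" keeps running and prints new fault log entries as they arrive.
    proc = subprocess.Popen(["fmdump", "-f"], stdout=subprocess.PIPE, text=True)
    try:
        forward_lines(proc.stdout, logger)
    finally:
        proc.terminate()


if __name__ == "__main__":
    main(sys.argv[1])
```

If I read fmdump(1M) right, adding -e to the command should select the error log (the individual error reports) instead of the fault log, which is closer to the raw data I actually want. Note that fmdump may buffer its output when writing to a pipe, so lines could arrive late.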

> All of this said, I understand if you feel things are being 'hidden' 
> from you until it's *actually* busted that you are having some of your 
> forward vision obscured 'in the name of a quiet logfile'. I felt much 
> the same way for a period of time. (Though, I live more in the CPU / 
> Memory camp...)
> 
> But - Once I realised what I could do with fmstat and fmdump, I was not 
> the slightest bit unhappy (Actually, that's not quite true... Even once 
> I knew what they could do, it still took me a while to work out the 
> options I cared about for fmdump / fmstat), but I now trust FMA to look 
> after my CPU / Memory issues better than I would in real life. I can 
> still get what I need when I want to, and the data is actually more 
> accessible and interesting. I just needed to know where to go looking.
> 
> All this being said, I was not actually aware that many of our disk / 
> target drivers were actually FMA'd up yet. heh - Shows what I know.
> 
> Does any of this make you feel any better (or worse)?

Hiding the raw data isn't helping. Log it at debug if you want, but log 
it off-box. The local logs won't be available when your server is dead 
and you want to figure out why.

A real-world example: sometimes the only host-side sign of FC 
storage issues is a retryable error (since everything is redundant). Now 
I'm sure the storage folks can get other errors out of their side, but 
sadly I can't. That retryable error is our canary in the coal mine 
warning us that we may have just lost redundancy. We don't want fmd to 
take any action, but we do want to know...

-- 
Carson
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
