I'm realizing I never sent the answer to this story, which is that the
server needed more RAM. We knew the ARC was implicated but had
missed just how much RAM ZFS needs for the ARC, and this server
had a LOT of file systems. THOUSANDS. We missed it partly because a lot of
this information wasn't in the ZETG until recently.

This all came to light when we crossed some sort of line and the
server started hanging intermittently, at seemingly random but
increasingly frequent intervals.

Taking the server from 2G to 10G made the problem disappear. 6G would
have been sufficient (possibly 4G), but that week the price was the
same for 4G vs 8G.
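
For anyone who wants to see how much RAM the ARC is actually holding
before buying memory, here is a minimal sketch in Python. It is not what
we ran at the time. It assumes your release exposes the ARC counters via
`kstat -p zfs:0:arcstats` and that the `size`, `c` and `c_max` statistics
are present; the helper names (parse_kstat, arc_summary, read_arcstats)
are made up for illustration.

```python
import subprocess

GIB = 1024 ** 3


def parse_kstat(text):
    """Parse `kstat -p` output (module:instance:name:stat<TAB>value).

    Returns {stat: int}; non-integer values (class, crtime) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        stat = key.rsplit(":", 1)[-1]
        try:
            stats[stat] = int(value)
        except ValueError:
            continue
    return stats


def arc_summary(stats):
    """Return the ARC size, target (c) and ceiling (c_max) in GiB."""
    return {k: stats[k] / GIB for k in ("size", "c", "c_max") if k in stats}


def read_arcstats():
    """Run kstat on the local host (assumes Solaris with ZFS loaded)."""
    out = subprocess.run(
        ["kstat", "-p", "zfs:0:arcstats"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_kstat(out)
```

Calling arc_summary(read_arcstats()) on the server and comparing `size`
against installed RAM should show whether the ARC is what is squeezing
everything else out.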

I omit the part of the story where we became mired in ARC
variable changes, because that's probably only relevant to u3/u4
users. I did take my replacement servers up to u6/u7.


On Tue, Feb 17, 2009 at 11:56 PM, Elizabeth
Schwartz<betsy.schwa...@gmail.com> wrote:
> I've got a server that freezes when I run a zpool scrub from cron.
> Zpool scrub runs fine from the command line, no errors.
> The freeze happens within 30 seconds of the zpool scrub starting.
> The one core dump I succeeded in taking showed the ARC cache eating up
> all the RAM.
> The server's running Solaris 10 u3, kernel patch 127727-11 but it's
> been patched and seems to have some u4 features (particularly, the
> ARC variables)
>
> The only bug report I could find shows a similar bug patched in
> 120011-14, a patch which I installed many months ago.
>
> Sun support threw up their hands and said to install Solaris 10 u6,
> which I'm not really happy about doing as a bug fix to a production
> server running a supported version of Sun OS. Once Upon a Time, Sun
> used to offer *patches* to paying customers for operating system bugs.
> I quote the latest ticket note in disgust: "I really don't know what
> to tell you. S10u6 has many enhancements and improvments to zfs, but
> most can be gained though patchs with the exception of new features."
>
> I'm trying to escalate the ticket, but really, I'm angry. I've been a
> big champion of staying with Sun/Solaris over Linux and one of the
> reasons has been that traditionally Sun had really good tech support,
> and you could *get* patches if you needed them. If the answer is going
> to be "we don't know what the bug is but maybe a later release will
> fix it - or not " that's not very reassuring.
>
> Any thoughts - besides upgrading? Which we'll do, but it's a
> production server so I don't want to rush it.
>
> --
> Unix Systems Administrator
> Harvard Graduate School of Design
>