Are you running compression on the file systems that you're rsync'ing to? That'll drive up the load average pretty high, and it's in the kernel (from what I can tell).
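If so, it's quick to confirm from the shell. A rough sketch (the pool and dataset names below are just placeholders, not from your setup):

    # show the compression setting for every dataset in the pool
    zfs get -r compression tank

    # if gzip is the culprit, the default lzjb algorithm is much cheaper on CPU
    # (at the cost of a lower compression ratio)
    zfs set compression=lzjb tank/export/home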
In particular, I've seen gzip compression on ZFS file systems bump the load average over 60 when running multiple parallel rsyncs over SSH, while prstat/top show little userland CPU usage. We're running on 2 cores (8 threads per core) of an UltraSPARC T2 (using LDOMs) and it handles the load nicely - the domain is still acceptably responsive. I can see how a dual-core x86 machine would get swamped by such a load. We're running Solaris 10, not OpenSolaris, so it's also possible there's a regression somewhere in there.

Scott Duckworth, Systems Programmer II
Clemson University School of Computing

On Tue, May 12, 2009 at 10:10 PM, Rince <rincebr...@gmail.com> wrote:
> Hi world,
> I have a 10-disk RAID-Z2 system with 4 GB of DDR2 RAM and a 3 GHz Core 2 Duo.
>
> It's exporting ~280 filesystems over NFS to about half a dozen machines.
>
> Under some loads (in particular, any attempts to rsync between another
> machine and this one over SSH), the machine's load average sometimes
> goes insane (27+), and it appears to all be in kernel-land (as nothing
> in userland reports more than 5% CPU usage, and top reports 50%+ CPU usage).
>
> I say 27+ because when the load spikes this high, the machine stops
> responding to any meaningful commands. Console login will take a
> username and password then hang forever without printing anything. SSH
> login will block forever without prompting for user or password.
> Machine responds to ping. NFS drops.
>
> snv_113, this has occurred since the RAID-Z2 was created (b102).
>
> I have no idea how to instrument this, as it doesn't appear to be
> panicking, or running out of RAM (as far as I can see from the last
> responses of top and prstat), and I don't know how to ask dtrace about
> where I'm mostly spending my time. I read one or two guides, but I
> don't follow how the output of it is meaningful.
>
> I'm sending this to zfs-discuss as I can't replicate this problem
> unless I'm doing heavy I/O on ZFS.
>
> (Final note - this 10-disk pool is serviced by an ARC 1280ML, and
> during the time the kernel is heavily under load, zpool iostat -v is
> reporting no more than 1 MB/s per disk, and almost always to the tune
> of 128 KB/s.)
>
> - Rich
>
> --
>
> The generation of random numbers is too important to be left to chance.
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
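On the dtrace question in the quoted message: when the time is going to the kernel rather than userland, the profile provider is usually the easiest way to see where. A rough one-liner to try as root (the sampling rate and 30-second window are arbitrary choices, nothing specific to your setup):

    # sample kernel stacks ~997 times/sec per CPU for 30s, then print the hottest ones
    dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-30s { exit(0); }'

If gzip compression is what's eating the box, the gzip/deflate routines should dominate the stacks it prints.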