I saw this _exact_ problem after I bumped RAM from 48GB to 192GB.  Low
memory pressure seemed to be the culprit.  It usually happened during
storage vMotions or similar operations, which effectively invalidated the
data in the ARC (sometimes 50GB of data would be purged from the ARC at
once).  The system would get so busy that it would drop 10Gbit LACP
port-channels from our Nexus 5k stack.  I never found a good solution to
this other than to set arc_c_min to something close to what I wanted the
system to use - I settled on setting it at ~160GB.  The ARC size (arcsz)
still dropped, but the system no longer tried to adjust arc_c, which
resulted in significantly fewer xcalls.

-Matt Breitbach

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Sašo Kiselkov
Sent: Tuesday, June 12, 2012 10:14 AM
To: Richard Elling
Cc: zfs-discuss
Subject: Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

On 06/12/2012 03:57 PM, Sašo Kiselkov wrote:
> Seems the problem is somewhat more egregious than I thought. The xcall
> storm causes my network drivers to stop receiving IP multicast packets,
> and subsequently my recording applications record bad data, so
> ultimately this isn't workable... I need to somehow resolve this. I'm
> running four on-board Broadcom NICs in an LACP aggregation. Any ideas
> on why this might be a side effect? I'm really out of ideas here...
> 
> Cheers,
> --
> Saso

Just as another data point, though I'm not sure how much use it will be:
I found (via arcstat.pl) that the storms always start when ARC
downsizing begins. E.g. I would see the following in "./arcstat.pl 1":

    Time  read    dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
16:29:45    21       0    0     0    0     0    0   111G  111G
16:29:46     0       0    0     0    0     0    0   111G  111G
16:29:47     1       0    0     0    0     0    0   111G  111G
16:29:48     0       0    0     0    0     0    0   111G  111G
16:29:49    5K       0    0     0    0     0    0   111G  111G
  (this is where the problem starts)
16:29:50    36       0    0     0    0     0    0   109G  107G
16:29:51    51       0    0     0    0     0    0   107G  107G
16:29:52    10       0    0     0    0     0    0   107G  107G
16:29:53   148       0    0     0    0     0    0   107G  107G
16:29:54    5K       0    0     0    0     0    0   107G  107G
  (and after a while, around 10-15 seconds, it stops)

(I omitted the miss and miss% columns to make the rows fit).

During these episodes, the network stack is dropping inbound IP multicast
UDP packets like crazy, so I see my network input drop by about 30-40%.
Truly strange behavior...
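If anyone wants to verify it's the same pathology, I've been catching the
storms with a DTrace one-liner along these lines (a sketch, assuming the
stock sysinfo provider; it aggregates cross-calls by originating kernel
stack, so segkmem_zio_free shows up directly):

    # run during a storm; prints the hottest xcall stacks after 10s
    dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); } tick-10s { exit(0); }'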

Cheers,
--
Saso
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

