I saw this _exact_ problem after I bumped RAM from 48GB to 192GB. Low memory pressure seemed to be the culprit. It usually happened during storage vMotions or similar operations that effectively nullified the data in the ARC (sometimes 50GB of data would be purged from the ARC at once). The system got so busy that it would drop 10Gbit LACP port-channels from our Nexus 5k stack. I never found a good solution to this other than to set arc_c_min to something close to what I wanted the system to use - I settled on setting it at ~160GB. It still dropped the arcsz, but it no longer tried to adjust arc_c, which resulted in significantly fewer xcalls.
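[Editor's sketch, not part of the original mail: one common way to pin an ARC floor like the ~160GB above on illumos/Solaris-era ZFS. The tunable names (`zfs_arc_min` in /etc/system, `arc_c_min` in the live kernel) and the 0x2800000000 (160 GiB) value are assumptions to verify against your build before applying.]

```
* /etc/system -- persistent setting, takes effect at next boot
* 0x2800000000 bytes = 160 GiB; pick a value that fits your box
set zfs:zfs_arc_min = 0x2800000000

# Or poke the running kernel with mdb (illumos/Solaris);
# /Z writes a 64-bit value to the arc_c_min symbol
echo "arc_c_min/Z 0x2800000000" | mdb -kw
```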
-Matt Breitbach

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Sašo Kiselkov
Sent: Tuesday, June 12, 2012 10:14 AM
To: Richard Elling
Cc: zfs-discuss
Subject: Re: [zfs-discuss] Occasional storm of xcalls on segkmem_zio_free

On 06/12/2012 03:57 PM, Sašo Kiselkov wrote:
> Seems the problem is somewhat more egregious than I thought. The xcall
> storm causes my network drivers to stop receiving IP multicast packets
> and subsequently my recording applications record bad data, so
> ultimately, this kind of isn't workable... I need to somehow resolve
> this... I'm running four on-board Broadcom NICs in an LACP
> aggregation. Any ideas on why this might be a side-effect? I'm really
> kind of out of ideas here...
>
> Cheers,
> --
> Saso

Just as another datapoint, though I'm not sure it's going to be much use: I found (via arcstat.pl) that the storms always start when ARC downsizing starts. E.g. I would see the following in "./arcstat.pl 1":

    Time      read  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
    16:29:45    21     0    0     0    0     0    0   111G  111G
    16:29:46     0     0    0     0    0     0    0   111G  111G
    16:29:47     1     0    0     0    0     0    0   111G  111G
    16:29:48     0     0    0     0    0     0    0   111G  111G
    16:29:49    5K     0    0     0    0     0    0   111G  111G   (this is where the problem starts)
    16:29:50    36     0    0     0    0     0    0   109G  107G
    16:29:51    51     0    0     0    0     0    0   107G  107G
    16:29:52    10     0    0     0    0     0    0   107G  107G
    16:29:53   148     0    0     0    0     0    0   107G  107G
    16:29:54    5K     0    0     0    0     0    0   107G  107G   (and after a while, around 10-15 seconds, it stops)

(I omitted the miss and miss% columns to make the rows fit.) During that time the network stack is dropping inbound IP multicast UDP packets like crazy, so I see my network input drop by about 30-40%. Truly strange behavior...
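[Editor's sketch, not part of the original mail: the usual way an xcall source such as the segkmem_zio_free in the subject line gets identified is by aggregating kernel stacks on the DTrace sysinfo provider's cross-call probe. Run during a storm; the exact stack contents will vary by build.]

```
# count xcalls by kernel stack; Ctrl-C to print the aggregation
dtrace -n 'sysinfo:::xcalls { @[stack()] = count(); }'
```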
Cheers,
--
Saso
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss