On Tue, May 3, 2011 at 9:24 PM, Claudio Jeker <[email protected]> wrote:
> On Tue, May 03, 2011 at 07:53:56PM +0200, Mike Belopuhov wrote:
>> hi,
>>
>> recently jsg and i figured out that a traffic burst can kill
>> your interface (deplete the rx ring) if you're using ipsec.  the
>> problem is that there's no limit on the number of outstanding crypto
>> operations.  every job contains a pointer to a cluster that comes
>> from the nic driver.
>>
>> under high load the number of jobs on the crypto queue reaches the
>> limit for the mclpool, and the network card fails to allocate clusters
>> for the rx ring and stalls (both bnx and ix don't recover).
>>
>> we experimented with large watermarks (>20000 clusters) but that
>> just delays the problem a bit.  the real solution is to prevent an
>> excessive number of operations from being enqueued to the crypto
>> queue.  the problem is as usual: how to define a limit?  in this
>> diff i've used 1/3 of the cluster pool hard limit, which behaves
>> pretty well with different kern.maxclusters values.
>>
>> opinions?  OKs?
>
> While reading this text I only had one thought.  That is why we have
> ifqueues.  Like for example the IP input queue.  Sure, the crypto
> thread works differently but it should behave similarly.
true.  that's the first thing i thought of.  but here it's really
tied to the cluster pool, and it makes sense to fail crypto
operations for clusters only, because the others can wait.

> Does it actually make sense to queue 2000+ packets on the crypto
> thread?  Aren't the packets delayed for a very long time?

that depends on how fast your machine is, but more or less yes.  we
had 4000 packets on the queue w/o any problems on the receiver.
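for the archives, a minimal sketch of the back-pressure check being
discussed: refuse a new crypto job once the queue already holds a
third of the cluster pool hard limit.  the names here (crypto_qlen,
mclpool_hardlimit, crypto_enqueue_ok) are illustrative only, not the
actual identifiers from the diff:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: cap the number of outstanding crypto jobs at
 * one third of the cluster pool hard limit, so a traffic burst can
 * never pin enough clusters to starve the nic rx ring.  Real code
 * would check this under the crypto queue's lock before enqueueing.
 */
static int
crypto_enqueue_ok(size_t crypto_qlen, size_t mclpool_hardlimit)
{
	/* allow the enqueue only while below the 1/3 threshold */
	return crypto_qlen < mclpool_hardlimit / 3;
}
```

with, say, a hard limit of 6144 clusters the queue would be capped at
2048 jobs; a caller that gets 0 back would drop (or fail) the packet
instead of queueing it, letting the driver keep its rx ring filled.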
