On Tue, May 3, 2011 at 9:24 PM, Claudio Jeker <[email protected]>
wrote:
> On Tue, May 03, 2011 at 07:53:56PM +0200, Mike Belopuhov wrote:
>> hi,
>>
>> recently jsg and i figured out that a traffic burst can kill
>> your interface (deplete the rx ring) if you're using ipsec.  the
>> problem is that there's no limit on the number of outstanding
>> crypto operations.  every job contains a pointer to a cluster
>> that comes from the nic driver.
>>
>> under high load the number of jobs on the crypto queue reaches
>> the mclpool limit, and the network card then fails to allocate
>> clusters for the rx ring and stalls (neither bnx nor ix recovers).
>>
>> we experimented with large watermarks (>20000 clusters) but that
>> just delays the problem a bit.  the real solution is to prevent an
>> excessive number of operations from being enqueued to the crypto
>> queue.  the problem is, as usual: how to define the limit?  in this
>> diff i've used 1/3 of the cluster pool hardlimit, which behaves
>> pretty well with different kern.maxclusters values.
>>
>> opinions?  OKs?
>
> While reading this text I only had one thought: this is why we have
> ifqueues, like for example the IP input queue. Sure, the crypto thread
> works differently, but it should behave similarly.

true.  that's the first thing i thought of.  but here it's really tied
to the cluster pool, and it makes sense to fail only the crypto
operations that hold clusters, because the others can wait.

> Does it actually make sense to queue 2000+ packets on the crypto
> thread? Aren't the packets delayed for a very long time?

that depends on how fast your machine is, but more or less yes.  we had
4000 packets on the queue without any problems on the receiver.
