IPsec crash in TCP, also NFS DRC patches (was: Re: Limits on jumbo mbuf cluster allocation)

2013-03-29 Thread Garrett Wollman
< said: > The patch includes a lot of drc2.patch and drc3.patch, so don't try > and apply it to a patched kernel. Hopefully it will apply cleanly to > vanilla sources. > The patch has been minimally tested. Well, it's taken a long time, but I was finally able to get some testing. The user whos

Re: Limits on jumbo mbuf cluster allocation

2013-03-19 Thread Rick Macklem
I wrote: > Garrett Wollman wrote: > > < > said: > > > > > I've attached a patch that has assorted changes. > > > > So I've done some preliminary testing on a slightly modified form of > > this patch, and it appears to have no major issues. However, I'm > > still waiting for my user with 500 VMs to

Re: Limits on jumbo mbuf cluster allocation

2013-03-19 Thread Rick Macklem
Garrett Wollman wrote: > < said: > > > I've attached a patch that has assorted changes. > > So I've done some preliminary testing on a slightly modified form of > this patch, and it appears to have no major issues. However, I'm > still waiting for my user with 500 VMs to have enough free to be a

Re: Limits on jumbo mbuf cluster allocation

2013-03-19 Thread Andre Oppermann
On 19.03.2013 05:29, Garrett Wollman wrote: < said: I've attached a patch that has assorted changes. So I've done some preliminary testing on a slightly modified form of this patch, and it appears to have no major issues. However, I'm still waiting for my user with 500 VMs to have enough fr

Re: Limits on jumbo mbuf cluster allocation

2013-03-18 Thread Garrett Wollman
< said: > I've attached a patch that has assorted changes. So I've done some preliminary testing on a slightly modified form of this patch, and it appears to have no major issues. However, I'm still waiting for my user with 500 VMs to have enough free to be able to run some real stress tests fo

Re: Limits on jumbo mbuf cluster allocation

2013-03-13 Thread Rick Macklem
Garrett Wollman wrote: > < said: > > > Basically, this patch: > > - allows setting of the tcp timeout via vfs.nfsd.tcpcachetimeo > > (I'd suggest you go down to a few minutes instead of 12hrs) > > - allows TCP caching to be disabled by setting vfs.nfsd.cachetcp=0 > > - does the above 2 things y

Re: Limits on jumbo mbuf cluster allocation

2013-03-12 Thread Garrett Wollman
< said: > Basically, this patch: > - allows setting of the tcp timeout via vfs.nfsd.tcpcachetimeo > (I'd suggest you go down to a few minutes instead of 12hrs) > - allows TCP caching to be disabled by setting vfs.nfsd.cachetcp=0 > - does the above 2 things you describe to try and avoid the live

Re: Limits on jumbo mbuf cluster allocation

2013-03-12 Thread Rick Macklem
Garrett Wollman wrote: > < said: > > > To be honest, I'd consider seeing a lot of non-empty receive queues > > for TCP connections to the NFS server to be an indication that it is > > near/at its load limit. (Sure, if you do netstat a lot, you will > > occasionally > > see a non-empty queue here

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Garrett Wollman
< said: > To be honest, I'd consider seeing a lot of non-empty receive queues > for TCP connections to the NFS server to be an indication that it is > near/at its load limit. (Sure, if you do netstat a lot, you will occasionally > see a non-empty queue here or there, but I would not expect to see

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Rick Macklem
Garrett Wollman wrote: > In article <513db550.5010...@freebsd.org>, an...@freebsd.org writes: > > >Garrett's problem is receive side specific and NFS can't do much > >about it. > >Unless, of course, NFS is holding on to received mbufs for a longer > >time. The NFS server only holds onto receive mb

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Rick Macklem
Andre Oppermann wrote: > On 11.03.2013 17:05, Garrett Wollman wrote: > > In article <513db550.5010...@freebsd.org>, an...@freebsd.org writes: > > > >> Garrett's problem is receive side specific and NFS can't do much > >> about it. > >> Unless, of course, NFS is holding on to received mbufs for a lo

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Garrett Wollman
In article <513e3d75.7010...@freebsd.org>, an...@freebsd.org writes: >On 11.03.2013 17:05, Garrett Wollman wrote: >> Well, I have two problems: one is running out of mbufs (caused, we >> think, by ixgbe requiring 9k clusters when it doesn't actually need >> them), and one is livelock. Allowing pot

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Andre Oppermann
On 11.03.2013 17:05, Garrett Wollman wrote: In article <513db550.5010...@freebsd.org>, an...@freebsd.org writes: Garrett's problem is receive side specific and NFS can't do much about it. Unless, of course, NFS is holding on to received mbufs for a longer time. Well, I have two problems: one

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Jack Vogel
Then you are using the default ring size, which is 2K descriptors. You might try reducing it to 1K and see how that works. Jack On Mon, Mar 11, 2013 at 10:09 AM, Garrett Wollman < woll...@hergotha.csail.mit.edu> wrote: > In article > , > jfvo...@gmail.com writes: > > >How large are you configuring

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Garrett Wollman
In article , jfvo...@gmail.com writes: >How large are you configuring your rings Garrett? Maybe if you tried >reducing them? I'm not configuring them at all. (Well, hmmm, I did limit the number of queues to 6 (per interface, it appears, so that's 12 in all).) There's a limit to how much experim
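For a rough sense of what the ring-size suggestion means in memory terms, here is a minimal sketch of the arithmetic. It assumes one cluster per receive descriptor and fully populated rings, which is a simplification; the function name is hypothetical and not part of the ixgbe driver. The inputs used in the note below are the ones mentioned in this thread (12 queues in all, a default ring of 2K descriptors, 9k clusters of 9216 bytes).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of cluster memory held by fully populated receive rings.
 * Assumption (not from the driver source): every descriptor in every
 * ring has exactly one cluster of cluster_bytes attached to it.
 */
static uint64_t
sketch_rx_ring_bytes(uint64_t queues, uint64_t descs_per_ring,
    uint64_t cluster_bytes)
{
	assert(cluster_bytes > 0);
	return (queues * descs_per_ring * cluster_bytes);
}
```

Under these assumptions, 12 queues of 2048 descriptors with 9216-byte clusters come to 226,492,416 bytes (216 MiB). Halving the rings to 1024 descriptors halves that to 108 MiB, and keeping 2048 descriptors but using 4096-byte clusters gives 96 MiB.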

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Jack Vogel
How large are you configuring your rings Garrett? Maybe if you tried reducing them? Jack On Mon, Mar 11, 2013 at 9:05 AM, Garrett Wollman < woll...@hergotha.csail.mit.edu> wrote: > In article <513db550.5010...@freebsd.org>, an...@freebsd.org writes: > > >Garrett's problem is receive side specif

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Garrett Wollman
In article <513db550.5010...@freebsd.org>, an...@freebsd.org writes: >Garrett's problem is receive side specific and NFS can't do much about it. >Unless, of course, NFS is holding on to received mbufs for a longer time. Well, I have two problems: one is running out of mbufs (caused, we think, by

Re: Limits on jumbo mbuf cluster allocation

2013-03-11 Thread Andre Oppermann
On 11.03.2013 00:46, Rick Macklem wrote: Andre Oppermann wrote: On 10.03.2013 03:22, Rick Macklem wrote: Garrett Wollman wrote: Also, it occurs to me that this strategy is subject to livelock. To put backpressure on the clients, it is far better to get them to stop sending (by advertising a sma

Re: Limits on jumbo mbuf cluster allocation

2013-03-10 Thread Rick Macklem
Andre Oppermann wrote: > On 09.03.2013 01:47, Rick Macklem wrote: > > Garrett Wollman wrote: > >> < >> said: > >> > >>> [stuff I wrote deleted] > >>> You have an amd64 kernel running HEAD or 9.x? > >> > >> Yes, these are 9.1 with some patches to reduce mutex contention on > >> the > >> NFS server'

Re: Limits on jumbo mbuf cluster allocation

2013-03-10 Thread Rick Macklem
Andre Oppermann wrote: > On 10.03.2013 07:04, Garrett Wollman wrote: > > < > > said: > > > >> Yes, in the past the code was in this form, it should work fine > >> Garrett, > >> just make sure > >> the 4K pool is large enough. > > > > [Andre Oppermann's patch:] > >>> if (adapter->max_frame_size <= 2

Re: Limits on jumbo mbuf cluster allocation

2013-03-10 Thread Rick Macklem
Andre Oppermann wrote: > On 10.03.2013 03:22, Rick Macklem wrote: > > Garrett Wollman wrote: > >> Also, it occurs to me that this strategy is subject to livelock. To > >> put backpressure on the clients, it is far better to get them to > >> stop > >> sending (by advertising a small receive window) t

Re: Limits on jumbo mbuf cluster allocation

2013-03-10 Thread Andre Oppermann
On 10.03.2013 03:22, Rick Macklem wrote: Garrett Wollman wrote: Also, it occurs to me that this strategy is subject to livelock. To put backpressure on the clients, it is far better to get them to stop sending (by advertising a small receive window) than to accept their traffic but queue it for a

Re: Limits on jumbo mbuf cluster allocation

2013-03-10 Thread Andre Oppermann
On 10.03.2013 07:04, Garrett Wollman wrote: < said: Yes, in the past the code was in this form, it should work fine Garrett, just make sure the 4K pool is large enough. [Andre Oppermann's patch:] if (adapter->max_frame_size <= 2048) adapter->rx_mbuf_sz = MCLBYTES; - else if (adapter

Re: Limits on jumbo mbuf cluster allocation

2013-03-10 Thread Andre Oppermann
On 09.03.2013 01:47, Rick Macklem wrote: Garrett Wollman wrote: < said: [stuff I wrote deleted] You have an amd64 kernel running HEAD or 9.x? Yes, these are 9.1 with some patches to reduce mutex contention on the NFS server's replay "cache". The cached replies are copies of the mbuf list d

Re: Limits on jumbo mbuf cluster allocation

2013-03-09 Thread Garrett Wollman
< said: > Yes, in the past the code was in this form, it should work fine Garrett, > just make sure > the 4K pool is large enough. [Andre Oppermann's patch:] >> if (adapter->max_frame_size <= 2048) adapter->rx_mbuf_sz = MCLBYTES; >> - else if (adapter->max_frame_size <= 4096) >> + el
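The idea discussed here (use the 4K pool for every frame size above a standard cluster) can be sketched as follows. This is not the patch itself, which is truncated in the preview above; it is a userland illustration with hypothetical SKETCH_-prefixed names standing in for the kernel's MCLBYTES and MJUMPAGESIZE, and it assumes 4 KB pages.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-ins for the kernel's cluster sizes, redefined here so the
 * example builds outside the kernel. Values assume 4 KB pages.
 */
enum {
	SKETCH_MCLBYTES     = 2048,	/* standard mbuf cluster */
	SKETCH_MJUMPAGESIZE = 4096	/* page-sized jumbo cluster */
};

/* Pick a receive buffer size that is never larger than one page. */
static size_t
sketch_rx_mbuf_size(size_t max_frame_size)
{
	if (max_frame_size <= SKETCH_MCLBYTES)
		return (SKETCH_MCLBYTES);
	return (SKETCH_MJUMPAGESIZE);
}

/* Number of receive buffers a single frame of frame_size occupies. */
static size_t
sketch_buffers_per_frame(size_t frame_size, size_t max_frame_size)
{
	size_t bufsz = sketch_rx_mbuf_size(max_frame_size);

	assert(frame_size > 0);
	return ((frame_size + bufsz - 1) / bufsz);	/* round up */
}
```

With this choice a 9000-byte frame spans three 4K buffers instead of one 9k cluster, which is why the 4K pool has to be sized generously.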

Re: Limits on jumbo mbuf cluster allocation

2013-03-09 Thread Rick Macklem
Garrett Wollman wrote: > < said: > > > I suspect this indicates that it isn't mutex contention, since the > > threads would block waiting for the mutex for that case, I think? > > No, because our mutexes are adaptive, so each thread spins for a while > before blocking. With the current implement

Re: Limits on jumbo mbuf cluster allocation

2013-03-09 Thread Rick Macklem
Garrett Wollman wrote: > In article <20795.29370.194678.963...@hergotha.csail.mit.edu>, I > wrote: > >< > said: > >> I've thought about this. My concern is that the separate thread > >> might > >> not keep up with the trimming demand. If that occurred, the cache > >> would > >> grow veryyy laarrggge

Re: Limits on jumbo mbuf cluster allocation

2013-03-09 Thread Garrett Wollman
In article <20795.29370.194678.963...@hergotha.csail.mit.edu>, I wrote: >< said: >> I've thought about this. My concern is that the separate thread might >> not keep up with the trimming demand. If that occurred, the cache would >> grow veryyy laarrggge, with effects like running out of mbuf cluste

Re: Limits on jumbo mbuf cluster allocation

2013-03-09 Thread Garrett Wollman
< said: > I suspect this indicates that it isn't mutex contention, since the > threads would block waiting for the mutex for that case, I think? No, because our mutexes are adaptive, so each thread spins for a while before blocking. With the current implementation, all of them end up doing this

Re: Limits on jumbo mbuf cluster allocation

2013-03-09 Thread Rick Macklem
Garrett Wollman wrote: > < said: > > > If reducing the size to 4K doesn't fix the problem, you might want > > to > > consider shrinking the tunable vfs.nfsd.tcphighwater and suffering > > the increased CPU overhead (and some increased mutex contention) of > > calling nfsrv_trimcache() more freque

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Garrett Wollman
< said: > If reducing the size to 4K doesn't fix the problem, you might want to > consider shrinking the tunable vfs.nfsd.tcphighwater and suffering > the increased CPU overhead (and some increased mutex contention) of > calling nfsrv_trimcache() more frequently. Can't do that -- the system beco

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Rick Macklem
Garrett Wollman wrote: > < said: > > > [stuff I wrote deleted] > > You have an amd64 kernel running HEAD or 9.x? > > Yes, these are 9.1 with some patches to reduce mutex contention on the > NFS server's replay "cache". > The cached replies are copies of the mbuf list done via m_copym(). As such

Re: UNS: Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Jack Vogel
Yes, the write-back descriptor has a bit in the status field that says whether it's EOP (end of packet) or not. Jack On Fri, Mar 8, 2013 at 12:28 PM, Garrett Wollman wrote: > < said: > > > Yes, in the past the code was in this form, it should work fine Garrett, > > just make sure > > the 4K pool is larg
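To illustrate the EOP behaviour described here, the sketch below walks a receive ring and sums buffer lengths until it reaches a descriptor marked end-of-packet. The descriptor layout, the bit values, and the "descriptor done" bit are all assumptions made for the example and are not taken from the driver or the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STAT_DD	0x01u	/* hypothetical: hardware finished this buffer */
#define SKETCH_STAT_EOP	0x02u	/* hypothetical: last buffer of the packet */

/* Hypothetical, simplified write-back descriptor. */
struct sketch_rx_desc {
	uint32_t status;
	uint16_t length;	/* bytes written into this buffer */
};

/*
 * Starting at *idx, add up buffer lengths through the descriptor that
 * carries EOP. Returns the packet length and advances *idx past the
 * packet, or returns 0 and leaves *idx alone if the packet has not
 * been completely written yet.
 */
static size_t
sketch_rx_packet_len(const struct sketch_rx_desc *ring, size_t ring_size,
    size_t *idx)
{
	size_t i = *idx, total = 0;

	assert(ring_size > 0);
	for (size_t n = 0; n < ring_size; n++) {
		const struct sketch_rx_desc *d = &ring[i];

		if ((d->status & SKETCH_STAT_DD) == 0)
			return (0);		/* hardware still owns it */
		total += d->length;
		i = (i + 1) % ring_size;	/* ring wraps around */
		if (d->status & SKETCH_STAT_EOP) {
			*idx = i;
			return (total);
		}
	}
	return (0);				/* no EOP anywhere in the ring */
}
```

A 9000-byte frame received into 4K buffers would show up as three descriptors of 4096, 4096, and 808 bytes, with EOP set only on the third.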

UNS: Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Garrett Wollman
< said: > Yes, in the past the code was in this form, it should work fine Garrett, > just make sure > the 4K pool is large enough. I take it then that the hardware works in the traditional way, and just keeps on using buffers until the packet is completely written, then sets a field on the ring d

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Jack Vogel
Yes, in the past the code was in this form, it should work fine Garrett, just make sure the 4K pool is large enough. I've actually been thinking about making the ring mbuf allocation sparse, and what type of strategy could be used. Right now I'm thinking of implementing a tunable threshold, and as

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Andre Oppermann
On 08.03.2013 18:04, Garrett Wollman wrote: < said: I am not strongly opposed to trying the 4k mbuf pool for all larger sizes, Garrett maybe if you would try that on your system and see if that helps you, I could envision making this a tunable at some point perhaps? If you can provide a patch

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Garrett Wollman
< said: > [stuff I wrote deleted] > You have an amd64 kernel running HEAD or 9.x? Yes, these are 9.1 with some patches to reduce mutex contention on the NFS server's replay "cache". > Jumbo pages come directly from the kernel_map which on amd64 is 512GB. > So KVA shouldn't be a problem. Your pr

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Garrett Wollman
< said: > I am not strongly opposed to trying the 4k mbuf pool for all larger sizes, > Garrett maybe if you would try that on your system and see if that helps > you, I could envision making this a tunable at some point perhaps? If you can provide a patch I can certainly build it in to our kernel

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread YongHyeon PYUN
On Fri, Mar 08, 2013 at 12:27:37AM -0800, Jack Vogel wrote: > On Thu, Mar 7, 2013 at 11:54 PM, YongHyeon PYUN wrote: > > > On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote: > > > I have a machine (actually six of them) with an Intel dual-10G NIC on > > > the motherboard. Two of th

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Jack Vogel
On Thu, Mar 7, 2013 at 11:54 PM, Andre Oppermann wrote: > On 08.03.2013 08:10, Garrett Wollman wrote: > >> I have a machine (actually six of them) with an Intel dual-10G NIC on >> the motherboard. Two of them (so far) are connected to a network >> using jumbo frames, with an MTU a little under 9

Re: Limits on jumbo mbuf cluster allocation

2013-03-08 Thread Jack Vogel
On Thu, Mar 7, 2013 at 11:54 PM, YongHyeon PYUN wrote: > On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote: > > I have a machine (actually six of them) with an Intel dual-10G NIC on > > the motherboard. Two of them (so far) are connected to a network > > using jumbo frames, with an

Re: Limits on jumbo mbuf cluster allocation

2013-03-07 Thread YongHyeon PYUN
On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote: > I have a machine (actually six of them) with an Intel dual-10G NIC on > the motherboard. Two of them (so far) are connected to a network > using jumbo frames, with an MTU a little under 9k, so the ixgbe driver > allocates 32,000 9k

Re: Limits on jumbo mbuf cluster allocation

2013-03-07 Thread Andre Oppermann
On 08.03.2013 08:10, Garrett Wollman wrote: I have a machine (actually six of them) with an Intel dual-10G NIC on the motherboard. Two of them (so far) are connected to a network using jumbo frames, with an MTU a little under 9k, so the ixgbe driver allocates 32,000 9k clusters for its receive r

Limits on jumbo mbuf cluster allocation

2013-03-07 Thread Garrett Wollman
I have a machine (actually six of them) with an Intel dual-10G NIC on the motherboard. Two of them (so far) are connected to a network using jumbo frames, with an MTU a little under 9k, so the ixgbe driver allocates 32,000 9k clusters for its receive rings. I have noticed, on the machine that is