Re: [PATCH v10 0/3] Generic IOMMU pooled allocator

2015-04-16 Thread David Miller
From: Sowmini Varadhan Date: Thu, 9 Apr 2015 15:33:29 -0400 > Investigation of network performance on Sparc shows a high > degree of locking contention in the IOMMU allocator, and it > was noticed that the PowerPC code has a better locking model. > > This patch series tries to extract the gener

Re: [PATCH v10 0/3] Generic IOMMU pooled allocator

2015-04-09 Thread David Miller
From: Sowmini Varadhan Date: Thu, 9 Apr 2015 15:33:29 -0400 > v10: resend patchv9 without RFC tag, and a new mail Message-Id, > (previous non-RFC attempt did not show up on the patchwork queue?) Yes, if the patch is identical the patch postings hash to the same value as the RFC ones, and ther

[PATCH v10 0/3] Generic IOMMU pooled allocator

2015-04-09 Thread Sowmini Varadhan
Investigation of network performance on Sparc shows a high degree of locking contention in the IOMMU allocator, and it was noticed that the PowerPC code has a better locking model. This patch series tries to extract the generic parts of the PowerPC code so that it can be shared across multiple P

[PATCH v9 RFC 0/3] Generic IOMMU pooled allocator

2015-04-05 Thread Sowmini Varadhan
Addresses latest BenH comments: need_flush checks, add support for dma mask and align_order. Sowmini Varadhan (3): Break up monolithic iommu table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Make LDC use common iommu pool management f

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread Benjamin Herrenschmidt
On Fri, 2015-04-03 at 07:22 +1100, Benjamin Herrenschmidt wrote: > On Thu, 2015-04-02 at 12:21 -0400, David Miller wrote: > > From: Sowmini Varadhan > > Date: Thu, 2 Apr 2015 08:51:52 -0400 > > > > > do I need to resubmit this without the RFC tag? Perhaps I should > > > have dropped that some tim

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread Benjamin Herrenschmidt
On Thu, 2015-04-02 at 12:21 -0400, David Miller wrote: > From: Sowmini Varadhan > Date: Thu, 2 Apr 2015 08:51:52 -0400 > > > do I need to resubmit this without the RFC tag? Perhaps I should > > have dropped that some time ago. > > I want to hear from the powerpc folks whether they can positively

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread David Miller
From: Sowmini Varadhan Date: Thu, 2 Apr 2015 08:51:52 -0400 > do I need to resubmit this without the RFC tag? Perhaps I should > have dropped that some time ago. I want to hear from the powerpc folks whether they can positively adopt the new generic code or not.

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread Sowmini Varadhan
On (03/31/15 23:12), David Miller wrote: > > It's much more amortized with smart buffering strategies, which are > common on current generation networking cards. > > There you only eat one map/unmap per "PAGE_SIZE / rx_pkt_size". > > Maybe the infiniband stuff is doing things very suboptimally,

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread David Miller
From: Sowmini Varadhan Date: Tue, 31 Mar 2015 21:08:18 -0400 > I'm starting to wonder if some approximation of dma premapped > buffers may be needed. Doing a map/unmap on each packet is expensive. It's much more amortized with smart buffering strategies, which are common on current generation n
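
The amortization argument in this message (one map/unmap per "PAGE_SIZE / rx_pkt_size" packets, as quoted in the follow-up) can be put into numbers. The sketch below is only an illustration of that arithmetic; the helper names `pkts_per_map` and `maps_needed` are hypothetical and do not come from the thread or from any driver.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many receive packets can share one mapped
 * page when the driver carves each page into rx_pkt_size buffers.
 * Falls back to 1 (one map per packet) for degenerate sizes.
 */
static size_t pkts_per_map(size_t page_size, size_t rx_pkt_size)
{
    if (rx_pkt_size == 0 || rx_pkt_size > page_size)
        return 1;
    return page_size / rx_pkt_size;
}

/*
 * Hypothetical helper: number of DMA map operations needed to receive
 * npkts packets, rounding up when the last page is only partly used.
 */
static size_t maps_needed(size_t npkts, size_t page_size, size_t rx_pkt_size)
{
    size_t per = pkts_per_map(page_size, rx_pkt_size);

    return (npkts + per - 1) / per;
}
```

With an 8 KB page and 2 KB receive buffers, for example, four packets share each mapping, so the IOMMU allocator is entered a quarter as often as with per-packet mapping.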

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
On 03/31/2015 09:01 PM, Benjamin Herrenschmidt wrote: On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote: Having bravely said that.. the IB team informs me that they see a 10% degradation using the spin_lock as opposed to the trylock. one path going forward is to continue processing thi

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Benjamin Herrenschmidt
On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote: > Having bravely said that.. > > the IB team informs me that they see a 10% degradation using > the spin_lock as opposed to the trylock. > > one path going forward is to continue processing this patch-set > as is. I can investigate this

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread David Miller
From: Sowmini Varadhan Date: Tue, 31 Mar 2015 14:06:42 -0400 > Having bravely said that.. > > the IB team informs me that they see a 10% degradation using > the spin_lock as opposed to the trylock. > > one path going forward is to continue processing this patch-set > as is. I can investigate

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
On (03/31/15 10:40), Sowmini Varadhan wrote: > > I've not heard back from the IB folks, but I'm going to make > a judgement call here and go with the spin_lock. *If* they > report some significant benefit from the trylock, probably > need to revisit this (and then probably start by re-examining >

[PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
Addresses BenH comments with one exception: I've left the IOMMU_POOL_HASH as is, so that powerpc can tailor it to their convenience. I've not heard back from the IB folks, but I'm going to make a judgement call here and go with the spin_lock. *If* they report some significant benefit from the try

Re: [PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-27 Thread Sowmini Varadhan
On (03/26/15 08:05), Benjamin Herrenschmidt wrote: > > PowerPC folks, what do you think? > > I'll give it another look today. > > Cheers, > Ben. Hi Ben, did you have a chance to look at this? --Sowmini

Re: Generic IOMMU pooled allocator

2015-03-26 Thread Benjamin Herrenschmidt
On Thu, 2015-03-26 at 16:00 -0700, David Miller wrote: > From: casca...@linux.vnet.ibm.com > Date: Wed, 25 Mar 2015 21:43:42 -0300 > > > On Mon, Mar 23, 2015 at 10:15:08PM -0400, David Miller wrote: > >> From: Benjamin Herrenschmidt > >> Date: Tue, 24 Mar 2015 13:08:10 +1100 > >> > >> > For the

Re: Generic IOMMU pooled allocator

2015-03-26 Thread David Miller
From: casca...@linux.vnet.ibm.com Date: Wed, 25 Mar 2015 21:43:42 -0300 > On Mon, Mar 23, 2015 at 10:15:08PM -0400, David Miller wrote: >> From: Benjamin Herrenschmidt >> Date: Tue, 24 Mar 2015 13:08:10 +1100 >> >> > For the large pool, we don't keep a hint so we don't know it's >> > wrapped, in

Re: Generic IOMMU pooled allocator

2015-03-26 Thread Sowmini Varadhan
On (03/25/15 21:43), casca...@linux.vnet.ibm.com wrote: > However, when using large TCP send/recv (I used uperf with 64KB > writes/reads), I noticed that on the transmit side, largealloc is not > used, but on the receive side, cxgb4 almost only uses largealloc, while > qlge seems to have a 1/1 usag

Re: Generic IOMMU pooled allocator

2015-03-25 Thread Benjamin Herrenschmidt
On Wed, 2015-03-25 at 21:43 -0300, casca...@linux.vnet.ibm.com wrote: > On Mon, Mar 23, 2015 at 10:15:08PM -0400, David Miller wrote: > > From: Benjamin Herrenschmidt > > Date: Tue, 24 Mar 2015 13:08:10 +1100 > > > > > For the large pool, we don't keep a hint so we don't know it's > > > wrapped,

Re: Generic IOMMU pooled allocator

2015-03-25 Thread cascardo
On Mon, Mar 23, 2015 at 10:15:08PM -0400, David Miller wrote: > From: Benjamin Herrenschmidt > Date: Tue, 24 Mar 2015 13:08:10 +1100 > > > For the large pool, we don't keep a hint so we don't know it's > > wrapped, in fact we purposefully don't use a hint to limit > > fragmentation on it, but the

Re: [PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Benjamin Herrenschmidt
On Wed, 2015-03-25 at 14:12 -0400, David Miller wrote: > From: Sowmini Varadhan > Date: Wed, 25 Mar 2015 13:34:45 -0400 > > > Changes from patchv6: moved pool_hash initialization to > > lib/iommu-common.c and cleaned up code duplication from > > sun4v/sun4u/ldc. > > Looks good to me. > > Powe

Re: [PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread David Miller
From: Sowmini Varadhan Date: Wed, 25 Mar 2015 13:34:45 -0400 > Changes from patchv6: moved pool_hash initialization to > lib/iommu-common.c and cleaned up code duplication from > sun4v/sun4u/ldc. Looks good to me. PowerPC folks, what do you think?

Re: [PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan
On (03/24/15 18:16), David Miller wrote: > Generally this looks fine to me. > > But about patch #2, I see no reason to have multiple iommu_pool_hash > tables. Even from a purely sparc perspective, we can always just do > with just one of them. > > Furthermore, you can even probably move it down

[PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan
Changes from patchv6: moved pool_hash initialization to lib/iommu-common.c and cleaned up code duplication from sun4v/sun4u/ldc. Sowmini (2): Break up monolithic iommu table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Sowmini Varadha

Re: [PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-24 Thread David Miller
From: Sowmini Varadhan Date: Tue, 24 Mar 2015 13:10:27 -0400 > Deltas from patchv5: > - removed iommu_tbl_ops, and instead pass the ->flush_all as > an indirection to iommu_tbl_pool_init() > - only invoke ->flush_all when there is no large_pool, based on > the assumption that large-pool usage

[PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-24 Thread Sowmini Varadhan
Deltas from patchv5: - removed iommu_tbl_ops, and instead pass the ->flush_all as an indirection to iommu_tbl_pool_init() - only invoke ->flush_all when there is no large_pool, based on the assumption that large-pool usage is infrequently encountered. Sowmini (2): Break up monolithic iommu t

Re: Generic IOMMU pooled allocator

2015-03-23 Thread David Miller
From: Benjamin Herrenschmidt Date: Tue, 24 Mar 2015 13:08:10 +1100 > For the large pool, we don't keep a hint so we don't know it's > wrapped, in fact we purposefully don't use a hint to limit > fragmentation on it, but then, it should be used rarely enough that > flushing always is, I suspect, a

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Benjamin Herrenschmidt
On Mon, 2015-03-23 at 21:44 -0400, David Miller wrote: > From: Benjamin Herrenschmidt > Date: Tue, 24 Mar 2015 09:21:05 +1100 > > > Dave, what's your feeling there ? Does anybody around still have > > some HW that we can test with ? > > I don't see what the actual problem is. > > Even if you us

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
benh> It might be sufficient to add a flush counter and compare it between runs benh> if actual wall-clock benchmarks are too hard to do (especially if you benh> don't have things like very fast network cards at hand). benh> benh> Number of flush / number of packets might be a sufficient metric, it

Re: Generic IOMMU pooled allocator

2015-03-23 Thread David Miller
From: Benjamin Herrenschmidt Date: Tue, 24 Mar 2015 09:21:05 +1100 > Dave, what's your feeling there ? Does anybody around still have > some HW that we can test with ? I don't see what the actual problem is. Even if you use multiple pools, which we should for scalability on sun4u too, just do t

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 11:47), Benjamin Herrenschmidt wrote: > > Yes, pass a function pointer argument that can be NULL or just make it a > member of the iommu_allocator struct (or whatever you call it) passed to > the init function and that can be NULL. My point is we don't need a > separate "ops" structur
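
The suggestion quoted here (pass a flush function pointer that may be NULL to the init function, rather than a separate "ops" structure) might look roughly like the sketch below. All names (`demo_tbl`, `demo_tbl_init`, `demo_tbl_flush`, `demo_count_flush`) are hypothetical stand-ins for illustration, not the identifiers or signatures used in the actual patch series.

```c
#include <stddef.h>

/* Hypothetical callback type: flush every cached IOMMU translation. */
typedef void (*demo_flush_all_fn)(void *ctx);

/* Hypothetical table: the optional hook is a plain member, no ops struct. */
struct demo_tbl {
    demo_flush_all_fn flush_all; /* NULL when the platform needs no flush */
    void *ctx;                   /* opaque argument handed to flush_all */
};

/* Store the (possibly NULL) hook at init time. */
static void demo_tbl_init(struct demo_tbl *t, demo_flush_all_fn fn, void *ctx)
{
    t->flush_all = fn;
    t->ctx = ctx;
}

/* Invoke the hook if one was supplied; returns 1 if a flush ran, else 0. */
static int demo_tbl_flush(struct demo_tbl *t)
{
    if (t->flush_all == NULL)
        return 0;
    t->flush_all(t->ctx);
    return 1;
}

/* Example hook for illustration: counts calls in an unsigned long. */
static void demo_count_flush(void *ctx)
{
    (*(unsigned long *)ctx)++;
}
```

A platform that needs no flush simply passes NULL, and the generic code skips the call.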

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Benjamin Herrenschmidt
On Mon, 2015-03-23 at 19:19 -0400, Sowmini Varadhan wrote: > What I've tried to do is to have a bool large_pool arg passed > to iommu_tbl_pool_init. In my observation (instrumented for scsi, ixgbe), > we never allocate more than 4 pages at a time, so I pass in > large_pool == false for all the s

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Benjamin Herrenschmidt
On Mon, 2015-03-23 at 19:08 -0400, Sowmini Varadhan wrote: > > Sowmini, I see various options for the second choice. We could stick to > > 1 pool, and basically do as before, ie, if we fail on the first pass of > > alloc, it means we wrap around and do a flush, I don't think that will > > cause a

Re: Generic IOMMU pooled allocator

2015-03-23 Thread chase rayfield
On Mar 23, 2015 7:13 PM, "Sowmini Varadhan" wrote: > > On (03/24/15 09:21), Benjamin Herrenschmidt wrote: > > > > So we have two choices here that I can see: > > > > - Keep that old platform use the old/simpler allocator > > Problem with that approach is that the base "struct iommu" structure > f

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 09:36), Benjamin Herrenschmidt wrote: > > - One pool only > > - Whenever the allocation is before the previous hint, do a flush, that > should only happen if a wrap around occurred or in some cases if the > device DMA mask forced it. I think we always update the hint whenever we >

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 09:21), Benjamin Herrenschmidt wrote: > > So we have two choices here that I can see: > > - Keep that old platform use the old/simpler allocator Problem with that approach is that the base "struct iommu" structure for sparc gets a split personality: the older one is used with the o

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Benjamin Herrenschmidt
On Mon, 2015-03-23 at 15:05 -0400, David Miller wrote: > From: Sowmini Varadhan > Date: Mon, 23 Mar 2015 12:54:06 -0400 > > > If it was only an optimization (i.e., removing it would not break > > any functionality), and if this was done for older hardware, > > and *if* we believe that the directi

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Benjamin Herrenschmidt
On Mon, 2015-03-23 at 12:54 -0400, Sowmini Varadhan wrote: > If it was only an optimization (i.e., removing it would not break > any functionality), and if this was done for older hardware, > and *if* we believe that the direction of most architectures is to > follow the sun4v/HV model, then, giv

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Benjamin Herrenschmidt
On Mon, 2015-03-23 at 15:05 -0400, David Miller wrote: > From: Sowmini Varadhan > Date: Mon, 23 Mar 2015 12:54:06 -0400 > > > If it was only an optimization (i.e., removing it would not break > > any functionality), and if this was done for older hardware, > > and *if* we believe that the directi

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/23/15 15:05), David Miller wrote: > > Why add performance regressions to old machines who already are > suffering too much from all the bloat we are constantly adding to the > kernel? I have no personal opinion on this- it's a matter of choosing whether we want to have some extra baggage

Re: Generic IOMMU pooled allocator

2015-03-23 Thread David Miller
From: Sowmini Varadhan Date: Mon, 23 Mar 2015 12:54:06 -0400 > If it was only an optimization (i.e., removing it would not break > any functionality), and if this was done for older hardware, > and *if* we believe that the direction of most architectures is to > follow the sun4v/HV model, then,

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Arnd Bergmann
On Monday 23 March 2015, Benjamin Herrenschmidt wrote: > On Mon, 2015-03-23 at 07:04 +0100, Arnd Bergmann wrote: > > > > My guess is that the ARM code so far has been concerned mainly with > > getting things to work in the first place, but scalability problems > > will only be seen when there are

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/23/15 12:29), David Miller wrote: > > In order to elide the IOMMU flush as much as possible, I implemented > a scheme for sun4u wherein we always allocated from low IOMMU > addresses to high IOMMU addresses. > > In this regime, we only need to flush the IOMMU when we rolled over > back to
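
The sun4u scheme quoted above (always allocate from low to high IOMMU addresses, and flush only when the allocator rolls over) can be sketched as a toy single-pool allocator. This is a minimal illustration under stated assumptions, not the kernel code: the arena size, the `arena_*` names, and the flush counter that stands in for a real hardware flush are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARENA_ENTRIES 8 /* hypothetical tiny arena, for illustration only */

struct arena {
    bool used[ARENA_ENTRIES]; /* one flag per IOMMU entry */
    size_t hint;              /* next index to try, only moves upward */
    unsigned long flushes;    /* stands in for real IOMMU flushes */
};

static void arena_init(struct arena *a)
{
    memset(a, 0, sizeof(*a));
}

/* Find npages contiguous free entries in [start, end); -1 if none. */
static long arena_scan(const struct arena *a, size_t start, size_t end,
                       size_t npages)
{
    size_t run = 0;

    for (size_t i = start; i < end; i++) {
        run = a->used[i] ? 0 : run + 1;
        if (run == npages)
            return (long)(i + 1 - npages);
    }
    return -1;
}

/* Allocate upward from the hint; flush only when wrapping to the start. */
static long arena_alloc(struct arena *a, size_t npages)
{
    long idx;

    if (npages == 0 || npages > ARENA_ENTRIES)
        return -1;

    idx = arena_scan(a, a->hint, ARENA_ENTRIES, npages);
    if (idx < 0) {
        /* Rolled over: entries below the hint may be stale, so flush. */
        a->flushes++;
        idx = arena_scan(a, 0, ARENA_ENTRIES, npages);
        if (idx < 0)
            return -1;
    }
    for (size_t i = 0; i < npages; i++)
        a->used[(size_t)idx + i] = true;
    a->hint = (size_t)idx + npages;
    return idx;
}

/* Freeing does not flush; the flush is deferred to the next rollover. */
static void arena_free(struct arena *a, size_t idx, size_t npages)
{
    assert(idx + npages <= ARENA_ENTRIES);
    for (size_t i = 0; i < npages; i++)
        a->used[idx + i] = false;
}
```

Because freed entries are never handed out again until the hint wraps, a stale translation cannot be reused between flushes, which is what lets the free path skip the flush.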

Re: Generic IOMMU pooled allocator

2015-03-23 Thread David Miller
From: Sowmini Varadhan Date: Sun, 22 Mar 2015 15:27:26 -0400 > That leaves only the odd iommu_flushall() hook, I'm trying > to find the history behind that (needed for sun4u platforms, > afaik, and not sure if there are other ways to achieve this). In order to elide the IOMMU flush as much as po

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Benjamin Herrenschmidt
On Mon, 2015-03-23 at 07:04 +0100, Arnd Bergmann wrote: > > My guess is that the ARM code so far has been concerned mainly with > getting things to work in the first place, but scalability problems > will only be seen when faster CPU cores become available. In any case, I think this is

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Arnd Bergmann
On Sunday 22 March 2015, Benjamin Herrenschmidt wrote: > On Sun, 2015-03-22 at 18:07 -0400, Sowmini Varadhan wrote: > > On (03/23/15 09:02), Benjamin Herrenschmidt wrote: > > > > How does this relate to the ARM implementation? There is currently > > > > an effort going on to make that one shared wi

Re: [PATCH v5 RFC 0/3] Generic IOMMU pooled allocator

2015-03-22 Thread Benjamin Herrenschmidt
On Sun, 2015-03-22 at 15:22 -0400, Sowmini Varadhan wrote: > Follows up on the feedback in the thread at > http://www.spinics.net/lists/sparclinux/msg13493.html > > - removed ->cookie_to_index and ->demap indirection from the iommu_tbl_ops > The caller needs to call these functions as needed,

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Benjamin Herrenschmidt
On Sun, 2015-03-22 at 18:07 -0400, Sowmini Varadhan wrote: > On (03/23/15 09:02), Benjamin Herrenschmidt wrote: > > > How does this relate to the ARM implementation? There is currently > > > an effort going on to make that one shared with ARM64 and possibly > > > x86. Has anyone looked at both the

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
On (03/23/15 09:02), Benjamin Herrenschmidt wrote: > > How does this relate to the ARM implementation? There is currently > > an effort going on to make that one shared with ARM64 and possibly > > x86. Has anyone looked at both the PowerPC and ARM ways of doing the > > allocation to see if we could

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Benjamin Herrenschmidt
On Sun, 2015-03-22 at 20:36 +0100, Arnd Bergmann wrote: > How does this relate to the ARM implementation? There is currently > an effort going on to make that one shared with ARM64 and possibly > x86. Has anyone looked at both the PowerPC and ARM ways of doing the > allocation to see if we could p

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Arnd Bergmann
On Thursday 19 March 2015, David Miller wrote: > PowerPC folks, we're trying to kill the locking contention in our > IOMMU allocators and noticed that you guys have a nice solution to > this in your IOMMU code. > > Sowmini put together a patch series that tries to extract out the > generic parts o

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
Turned out that I was able to iterate over it, and remove both the ->cookie_to_index and the ->demap indirection from iommu_tbl_ops. That leaves only the odd iommu_flushall() hook, I'm trying to find the history behind that (needed for sun4u platforms, afaik, and not sure if there are other ways t

[PATCH v5 RFC 0/3] Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
Follows up on the feedback in the thread at http://www.spinics.net/lists/sparclinux/msg13493.html - removed ->cookie_to_index and ->demap indirection from the iommu_tbl_ops The caller needs to call these functions as needed, before invoking the generic arena allocator functions. - added the

Re: Generic IOMMU pooled allocator

2015-03-19 Thread Sowmini Varadhan
On 03/19/2015 02:01 PM, Benjamin Herrenschmidt wrote: Ben> One thing I noticed is the asymmetry in your code between the alloc Ben> and the free path. The alloc path is similar to us in that the lock Ben> covers the allocation and that's about it, there's no actual mapping to Ben> the HW done, it'

Re: Generic IOMMU pooled allocator

2015-03-18 Thread Alexey Kardashevskiy
On 03/19/2015 02:01 PM, Benjamin Herrenschmidt wrote: On Wed, 2015-03-18 at 22:25 -0400, David Miller wrote: PowerPC folks, we're trying to kill the locking contention in our IOMMU allocators and noticed that you guys have a nice solution to this in your IOMMU code. .../... Adding Alexei to

Re: Generic IOMMU pooled allocator

2015-03-18 Thread Benjamin Herrenschmidt
On Wed, 2015-03-18 at 22:25 -0400, David Miller wrote: > PowerPC folks, we're trying to kill the locking contention in our > IOMMU allocators and noticed that you guys have a nice solution to > this in your IOMMU code. .../... Adding Alexei too who is currently doing some changes to our iommu co

Re: Generic IOMMU pooled allocator

2015-03-18 Thread David Miller
From: Benjamin Herrenschmidt Date: Thu, 19 Mar 2015 13:46:15 +1100 > Sounds like a good idea ! CC'ing Anton who wrote the pool stuff. I'll > try to find somebody to work on that here & will let you know asap. Thanks a lot Ben.

Re: Generic IOMMU pooled allocator

2015-03-18 Thread Benjamin Herrenschmidt
On Wed, 2015-03-18 at 22:25 -0400, David Miller wrote: > PowerPC folks, we're trying to kill the locking contention in our > IOMMU allocators and noticed that you guys have a nice solution to > this in your IOMMU code. > > Sowmini put together a patch series that tries to extract out the > generic

Generic IOMMU pooled allocator

2015-03-18 Thread David Miller
PowerPC folks, we're trying to kill the locking contention in our IOMMU allocators and noticed that you guys have a nice solution to this in your IOMMU code. Sowmini put together a patch series that tries to extract out the generic parts of your code and place it in lib/iommu-common.c so that bot