Fwd: Re: sparc64: Build failure due to commit f1600e549b94 (sparc: Make sparc64 use scalable lib/iommu-common.c functions)

2015-04-19 Thread Sowmini Varadhan
> On (04/19/15 14:09), David Miller wrote: > > > On (04/18/15 21:23), Guenter Roeck wrote: > >> lib/built-in.o:(.discard+0x1): multiple definition of > >> `__pcpu_unique_iommu_pool_hash' > >> arch/powerpc/kernel/built-in.o:(.discard+0x18): first defined here > >> .. I get a similar failure in the

[PATCH v10 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-04-09 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool: drops badly, to 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan Acked-by: Benjamin Herrenschmidt --- v2: moved

[PATCH v10 1/3] Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-09 Thread Sowmini Varadhan
ion infrastructure. Signed-off-by: Sowmini Varadhan Acked-by: Benjamin Herrenschmidt --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map

[PATCH v10 3/3] sparc: Make LDC use common iommu poll management functions

2015-04-09 Thread Sowmini Varadhan
typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan Acked-by: Benjamin Herrenschmidt --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls in

[PATCH v10 0/3] Generic IOMMU pooled allocator

2015-04-09 Thread Sowmini Varadhan
mask and align_order. v10: resend without RFC tag, and new mail Message-Id. Sowmini Varadhan (3): Break up monolithic iommu table/lock into finer graularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Make LDC use common iommu poll management functions arch/

Re: [PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-08 Thread Sowmini Varadhan
On (04/08/15 18:30), Benjamin Herrenschmidt wrote: > > I'm happy with your last version, feel free to add my > > Acked-by: Benjamin Herrenschmidt sounds good, I'll do this and resend a non-RFC version today. Thanks for all the feedback - it was very useful to me, and I'm much happier with the en

Re: [PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-05 Thread Sowmini Varadhan
On (04/05/15 22:26), Benjamin Herrenschmidt wrote: > > So you decided to keep the logic here that updates the hint instead of > just getting rid of need_flush altogether? > > Out of curiosity, what's the rationale? Did you find a reason why > resetting the hint in those two cases (rather than

[PATCH v9 RFC 0/3] Generic IOMMU pooled allocator

2015-04-05 Thread Sowmini Varadhan
Addresses latest BenH comments: need_flush checks, add support for dma mask and align_order. Sowmini Varadhan (3): Break up monolithic iommu table/lock into finer graularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Make LDC use common iommu poll management

[PATCH v9 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-04-05 Thread Sowmini Varadhan
typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before calling into iommu-

[PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-05 Thread Sowmini Varadhan
ion infrastructure. Signed-off-by: Sowmini Varadhan --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not ve

[PATCH v9 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-04-05 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool: drops badly, to 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan --- v2: moved sparc specific fields into iommu_sparc

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-04 Thread Sowmini Varadhan
One last question before I spin out v9.. the dma_mask code is a bit confusing to me, so I want to make sure... the code is > if (limit + tbl->it_offset > mask) { > limit = mask - tbl->it_offset + 1; > /* If we're constrained on address range, first try >
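
One way to read the clamp quoted above, as a minimal sketch rather than the code under review. It assumes `limit` is an exclusive upper bound expressed as a table index, `it_offset` is the I/O page number of the table's first entry, and `mask` is the highest page number the device can address. The helper name `clamp_limit` is hypothetical and does not appear in the patch.

```c
#include <assert.h>

/*
 * Clamp the exclusive upper table index so that no allocation can end
 * beyond the highest page number the device can address.
 * Assumption: limit is a table index, it_offset and mask are page numbers.
 */
static unsigned long clamp_limit(unsigned long limit,
                                 unsigned long it_offset,
                                 unsigned long mask)
{
    /* If the table starts above the mask, nothing is reachable at all. */
    assert(mask >= it_offset);

    if (limit + it_offset > mask)
        limit = mask - it_offset + 1;   /* last reachable index, plus one */
    return limit;
}
```

Under that reading, a table that starts at page 256 and spans 1024 entries, used by a device limited to page 511, may only hand out indices 0 through 255.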

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-03 Thread Sowmini Varadhan
On (04/04/15 08:06), Benjamin Herrenschmidt wrote: > > No, I meant "n < pool->hint", ie, the start of the newly allocated > block. ah, got it. I'll do my drill with patchset and get back, probably by Monday. --Sowmini

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-03 Thread Sowmini Varadhan
Just want to confirm: > > + again: > > + if (pass == 0 && handle && *handle && > > + (*handle >= pool->start) && (*handle < pool->end)) > > + start = *handle; > > + else > > + start = pool->hint; > > Now this means "handle" might be < pool->hint, in that case you a
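
The start-selection logic quoted in this message can be isolated as a small function. This is a sketch under assumptions: `struct pool_range` and `pick_start` are hypothetical names, and only the `start`, `end`, and `hint` fields visible in the quoted diff are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pool fields used in the quoted diff. */
struct pool_range {
    unsigned long start;    /* first index owned by the pool */
    unsigned long end;      /* one past the last index owned by the pool */
    unsigned long hint;     /* where the previous allocation left off */
};

/*
 * On the first pass, honor a caller-supplied handle if it is non-zero
 * and falls inside the pool; otherwise start searching at the hint.
 */
static unsigned long pick_start(const struct pool_range *pool,
                                const unsigned long *handle, int pass)
{
    assert(pool != NULL);

    if (pass == 0 && handle && *handle &&
        *handle >= pool->start && *handle < pool->end)
        return *handle;
    return pool->hint;
}
```

Note that a valid handle may be lower than `pool->hint`, which is the case the reply in this thread is asking about.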

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 08:57), Benjamin Herrenschmidt wrote: > > > > I only just noticed too, you completely dropped the code to honor > > > the dma mask. Why that ? Some devices rely on this. /* Sowmini's comment about this coming from sparc origins.. */ > Probably, not that many devices have limits

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Sowmini Varadhan
the other question that comes to my mind is: the whole lazy_flush optimization probably works best when there is exactly one pool, and no large pools. In most other cases, we'd end up doing a lazy_flush when we wrap within our pool itself, losing the benefit of that optimization. Given that the

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 07:54), Benjamin Herrenschmidt wrote: > > + limit = pool->end; > > + > > + /* The case below can happen if we have a small segment appended > > +* to a large, or when the previous alloc was at the very end of > > +* the available space. If so, go back to the beginning and f

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread Sowmini Varadhan
On (03/31/15 23:12), David Miller wrote: > > It's much more amortized with smart buffering strategies, which are > common on current generation networking cards. > > There you only eat one map/unmap per "PAGE_SIZE / rx_pkt_size". > > Maybe the infiniband stuff is doing things very suboptimally,

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
On 03/31/2015 09:01 PM, Benjamin Herrenschmidt wrote: On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote: Having bravely said that.. the IB team informs me that they see a 10% degradation using the spin_lock as opposed to the trylock. one path going forward is to continue processing

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
On (03/31/15 10:40), Sowmini Varadhan wrote: > > I've not heard back from the IB folks, but I'm going to make > a judgement call here and go with the spin_lock. *If* they > report some significant benefit from the trylock, probably > need to revisit this (and then proba

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-31 Thread Sowmini Varadhan
On (03/31/15 15:15), David Laight wrote: > > I've wondered whether the iommu setup for ethernet receive (in particular) > could be made much more efficient if there were a function that > would unmap one buffer and map a second buffer? > My thought is that iommu pte entry used by the old buffer co

[PATCH v8 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-31 Thread Sowmini Varadhan
typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before calling into iommu-

[PATCH v8 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-31 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool: drops badly, to 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan --- v2: moved sparc specific fields into iommu_sparc

[PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-31 Thread Sowmini Varadhan
ion infrastructure. Signed-off-by: Sowmini Varadhan --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not ve

[PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
enefit from the trylock, probably need to revisit this (and then probably start by re-examining the hash function to avoid collisions, before resorting to trylock). Sowmini Varadhan (3): Break up monolithic iommu table/lock into finer graularity pools and lock Make sparc64 use scalable lib

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/31/15 08:28), Benjamin Herrenschmidt wrote: > > Provided that the IB test doesn't come up with a significant difference, > I definitely vote for the simpler version of doing a normal spin_lock. sounds good. let me wait for the confirmation from IB, and I'll send out patchv8 soon after. F

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/30/15 09:01), Sowmini Varadhan wrote: > > So I tried looking at the code, and perhaps there is some arch-specific > subtlety here that I am missing, but where does spin_lock itself > do the cpu_relax? afaict, LOCK_CONTENDED() itself does not have this. To answer my question

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/30/15 21:55), Benjamin Herrenschmidt wrote: > > No that's not my point. The lock is only taken for a short time but > might still collide, the bouncing in that case will probably (at least > that's my feeling) hurt more than help. > > However, I have another concern with your construct. Es

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/30/15 14:24), Benjamin Herrenschmidt wrote: > > + > > +#define IOMMU_POOL_HASHBITS 4 > > +#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS) > > I don't like those macros. You changed the value from what we had on > powerpc. It could be that the new values are as good for us but
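
For context on what these macros control, here is a rough sketch of how a CPU number might be mapped to one of the pools. The two macro values are taken from the quoted diff; the multiplicative hash, its constant, and the names `hash_u32` and `pool_for_cpu` are assumptions made for illustration and are not claimed to match the patch.

```c
#include <assert.h>
#include <stdint.h>

#define IOMMU_POOL_HASHBITS 4
#define IOMMU_NR_POOLS      (1 << IOMMU_POOL_HASHBITS)

/* Hypothetical multiplicative hash: keep the top `bits` bits of the product. */
static uint32_t hash_u32(uint32_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)(val * UINT32_C(0x9E3779B1)) >> (32 - bits);
}

/*
 * Map a CPU number to a pool index in [0, npools).
 * npools must be a power of two no larger than IOMMU_NR_POOLS, so a
 * caller that wants a single pool (npools == 1) always gets index 0.
 */
static unsigned int pool_for_cpu(unsigned int cpu, unsigned int npools)
{
    assert(npools >= 1 && npools <= IOMMU_NR_POOLS);
    assert((npools & (npools - 1)) == 0);
    return hash_u32(cpu, IOMMU_POOL_HASHBITS) & (npools - 1);
}
```

Changing `IOMMU_POOL_HASHBITS` changes how many distinct pools CPUs can spread across, which is why a different value from the existing powerpc one could affect lock contention there.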

Re: [PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-27 Thread Sowmini Varadhan
On (03/26/15 08:05), Benjamin Herrenschmidt wrote: > > PowerPC folks, what do you think? > > I'll give it another look today. > > Cheers, > Ben. Hi Ben, did you have a chance to look at this? --Sowmini

Re: Generic IOMMU pooled allocator

2015-03-26 Thread Sowmini Varadhan
On (03/25/15 21:43), casca...@linux.vnet.ibm.com wrote: > However, when using large TCP send/recv (I used uperf with 64KB > writes/reads), I noticed that on the transmit side, largealloc is not > used, but on the receive side, cxgb4 almost only uses largealloc, while > qlge seems to have a 1/1 usag

Re: [PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan
On (03/24/15 18:16), David Miller wrote: > Generally this looks fine to me. > > But about patch #2, I see no reason to have multiple iommu_pool_hash > tables. Even from a purely sparc perspective, we can always just do > with just one of them. > > Furthermore, you can even probably move it down

[PATCH v7 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-25 Thread Sowmini Varadhan
typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before calling into iommu-

[PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-25 Thread Sowmini Varadhan
ure. Signed-off-by: Sowmini Varadhan --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not very large - cookie_

[PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan
Changes from patchv6: moved pool_hash initialization to lib/iommu-common.c and cleaned up code duplication from sun4v/sun4u/ldc. Sowmini (2): Break up monolithic iommu table/lock into finer graularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Sowmini

[PATCH v7 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-25 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool: drops badly, to 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan --- v2: moved sparc specific fields into iommu_sparc

[PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-24 Thread Sowmini Varadhan
iommu table/lock into finer graularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Sowmini Varadhan (1): Make LDC use common iommu poll management functions arch/sparc/include/asm/iommu_64.h |7 +- arch/sparc/kernel/iommu.c

[PATCH v6 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-24 Thread Sowmini Varadhan
ure. Signed-off-by: Sowmini Varadhan --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not very large - cookie_

[PATCH v6 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-24 Thread Sowmini Varadhan
typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before calling into iommu-

[PATCH v6 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-24 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool: drops badly, to 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan --- v2: moved sparc specific fields into iommu_sparc

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
benh> It might be sufficient to add a flush counter and compare it between runs benh> if actual wall-clock benchmarks are too hard to do (especially if you benh> don't have things like very fast network cards at hand). benh> benh> Number of flush / number of packets might be a sufficient metric, it

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 11:47), Benjamin Herrenschmidt wrote: > > Yes, pass a function pointer argument that can be NULL or just make it a > member of the iommu_allocator struct (or whatever you call it) passed to > the init function and that can be NULL. My point is we don't need a > separate "ops" structur

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 09:36), Benjamin Herrenschmidt wrote: > > - One pool only > > - Whenever the allocation is before the previous hint, do a flush, that > should only happen if a wrap around occurred or in some cases if the > device DMA mask forced it. I think we always update the hint whenever we >
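
The single-pool scheme proposed here (flush only when an allocation lands before the previous hint) can be sketched as a toy allocator. Everything in it is an assumption for illustration: the table size, the `lazy_pool` structure, and the `flushes` counter that stands in for a real IOMMU flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TBL_ENTRIES 8   /* hypothetical table size, for illustration only */

struct lazy_pool {
    bool used[TBL_ENTRIES]; /* true while an entry is mapped */
    size_t hint;            /* index where the next search starts */
    unsigned long flushes;  /* count of simulated IOMMU flushes */
};

/* Return the first free index in [from, to), or TBL_ENTRIES if none. */
static size_t find_free(const struct lazy_pool *p, size_t from, size_t to)
{
    for (size_t i = from; i < to; i++)
        if (!p->used[i])
            return i;
    return TBL_ENTRIES;
}

/* Allocate one entry; returns its index, or -1 if the table is full. */
static long lazy_alloc(struct lazy_pool *p)
{
    assert(p->hint <= TBL_ENTRIES);

    size_t n = find_free(p, p->hint, TBL_ENTRIES);
    if (n == TBL_ENTRIES)
        n = find_free(p, 0, p->hint);   /* wrap around to the beginning */
    if (n == TBL_ENTRIES)
        return -1;

    if (n < p->hint)
        p->flushes++;   /* allocation landed before the hint: flush now */

    p->used[n] = true;
    p->hint = n + 1;
    return (long)n;
}

/* Release an entry without flushing; the flush is deferred until a wrap. */
static void lazy_free(struct lazy_pool *p, size_t n)
{
    assert(n < TBL_ENTRIES);
    p->used[n] = false;
}
```

Counting flushes this way also gives the kind of flush-per-allocation metric suggested elsewhere in the thread for comparing allocator variants without a wall-clock benchmark.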

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 09:21), Benjamin Herrenschmidt wrote: > > So we have two choices here that I can see: > > - Keep that old platform use the old/simpler allocator Problem with that approach is that the base "struct iommu" structure for sparc gets a split personality: the older one is used with the o

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/23/15 15:05), David Miller wrote: > > Why add performance regressions to old machines who already are > suffering too much from all the bloat we are constantly adding to the > kernel? I have no personal opinion on this- it's a matter of choosing whether we want to have some extra baggage

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/23/15 12:29), David Miller wrote: > > In order to elide the IOMMU flush as much as possible, I implemented > a scheme for sun4u wherein we always allocated from low IOMMU > addresses to high IOMMU addresses. > > In this regime, we only need to flush the IOMMU when we rolled over > back to

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
On (03/23/15 09:02), Benjamin Herrenschmidt wrote: > > How does this relate to the ARM implementation? There is currently > > an effort going on to make that one shared with ARM64 and possibly > > x86. Has anyone looked at both the PowerPC and ARM ways of doing the > > allocation to see if we could

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
Turned out that I was able to iterate over it, and remove both the ->cookie_to_index and the ->demap indirection from iommu_tbl_ops. That leaves only the odd iommu_flushall() hook, I'm trying to find the history behind that (needed for sun4u platforms, afaik, and not sure if there are other ways t

[PATCH v5 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-22 Thread Sowmini Varadhan
typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before calling into iommu-

[PATCH v5 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer graularity pools and lock

2015-03-22 Thread Sowmini Varadhan
ure. Signed-off-by: Sowmini Varadhan --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not very large - cookie_

[PATCH v5 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-22 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool: drops badly, to 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan --- v2: moved sparc specific fields into iommu_sparc

[PATCH v5 RFC 0/3] Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
added the "skip_span_boundary" argument to iommu_tbl_pool_init() for those callers like LDC which do not care about span boundary checks. Sowmini (2): Break up monolithic iommu table/lock into finer graularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions

Re: Generic IOMMU pooled allocator

2015-03-19 Thread Sowmini Varadhan
On 03/19/2015 02:01 PM, Benjamin Herrenschmidt wrote: Ben> One thing I noticed is the asymmetry in your code between the alloc Ben> and the free path. The alloc path is similar to us in that the lock Ben> covers the allocation and that's about it, there's no actual mapping to Ben> the HW done, it'