benefit from the trylock, probably
need to revisit this (and then probably start by re-examining
the hash function to avoid collisions, before resorting to
trylock).
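As an aside for readers following the locking discussion: the point of the hash is to spread CPUs across pools so that concurrent mappings rarely contend on the same pool lock. Below is a minimal user-space sketch of that idea; the names (pick_pool, NPOOLS) and the hash constant are illustrative, not the patch's actual code.

    /*
     * Illustrative sketch only: spread callers across pools with a per-CPU
     * hash so that concurrent mappings rarely contend on one lock. A weak
     * hash here means more collisions, which is what "re-examining the
     * hash function" would address.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define NPOOLS 16                       /* must be a power of two */

    struct pool {
        pthread_spinlock_t lock;
        unsigned long start, end;           /* [start, end) entries owned by the pool */
        unsigned long hint;                 /* next candidate entry */
    };

    static struct pool pools[NPOOLS];

    static struct pool *pick_pool(unsigned int cpu)
    {
        /* Knuth multiplicative hash of the CPU id, masked to a pool index */
        return &pools[(cpu * 2654435761u >> 16) & (NPOOLS - 1)];
    }

    int main(void)
    {
        for (unsigned int cpu = 0; cpu < 8; cpu++)
            printf("cpu %u -> pool %ld\n", cpu,
                   (long)(pick_pool(cpu) - pools));
        return 0;
    }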
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
ion infrastructure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inlined these calls into ldc before calling into iommu-
On (03/31/15 15:15), David Laight wrote:
>
> I've wondered whether the iommu setup for ethernet receive (in particular)
> could be made much more efficient if there were a function that
> would unmap one buffer and map a second buffer?
> My thought is that the iommu pte entry used by the old buffer co
On (03/31/15 10:40), Sowmini Varadhan wrote:
>
> I've not heard back from the IB folks, but I'm going to make
> a judgement call here and go with the spin_lock. *If* they
> report some significant benefit from the trylock, probably
> need to revisit this (and then proba
On 03/31/2015 09:01 PM, Benjamin Herrenschmidt wrote:
On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote:
Having bravely said that..
the IB team informs me that they see a 10% degradation using
the spin_lock as opposed to the trylock.
one path going forward is to continue processing
On (03/31/15 23:12), David Miller wrote:
>
> It's much more amortized with smart buffering strategies, which are
> common on current generation networking cards.
>
> There you only eat one map/unmap per "PAGE_SIZE / rx_pkt_size".
>
> Maybe the infiniband stuff is doing things very suboptimally,
On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
> > + limit = pool->end;
> > +
> > + /* The case below can happen if we have a small segment appended
> > + * to a large, or when the previous alloc was at the very end of
> > + * the available space. If so, go back to the beginning and f
the other question that comes to my mind is: the whole lazy_flush
optimization probably works best when there is exactly one pool,
and no large pools. In most other cases, we'd end up doing a lazy_flush
when we wrap within our pool itself, losing the benefit of that
optimization.
Given that the
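To make that concern concrete, here is a small self-contained sketch (names and structure are illustrative, not the series' code) in which the IOMMU is only flushed when a pool's allocation cursor wraps: one large pool wraps rarely, while an equally loaded small pool wraps, and therefore flushes, far more often.

    /* Illustrative sketch: lazy flush only when the allocation cursor wraps.
     * With one large pool, wraps (and flushes) are rare; with many small
     * pools, each pool wraps sooner and the optimization loses its value. */
    #include <stdio.h>

    struct pool {
        unsigned long start, end;   /* [start, end) entries owned by the pool */
        unsigned long hint;         /* next candidate entry */
        unsigned long flushes;      /* how often we had to flush */
    };

    static void iommu_flush_hw(struct pool *p) { p->flushes++; }

    static long alloc_entry(struct pool *p)
    {
        unsigned long n = p->hint;

        if (n >= p->end) {          /* wrapped: restart and flush lazily */
            n = p->start;
            iommu_flush_hw(p);
        }
        p->hint = n + 1;
        return (long)n;
    }

    int main(void)
    {
        struct pool one   = { .start = 0, .end = 1024, .hint = 0 };
        struct pool small = { .start = 0, .end = 64,   .hint = 0 };

        for (int i = 0; i < 1024; i++) {
            alloc_entry(&one);
            alloc_entry(&small);
        }
        printf("flushes: pool of 1024 = %lu, pool of 64 = %lu\n",
               one.flushes, small.flushes);
        return 0;
    }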
On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
>
> > > I only just noticed too, you completely dropped the code to honor
> > > the dma mask. Why that ? Some devices rely on this.
/* Sowmini's comment about this coming from sparc origins.. */
> Probably, not that many devices have limits
Just want to confirm:
> > + again:
> > + if (pass == 0 && handle && *handle &&
> > + (*handle >= pool->start) && (*handle < pool->end))
> > + start = *handle;
> > + else
> > + start = pool->hint;
>
> Now this means "handle" might be < pool->hint, in that case you a
On (04/04/15 08:06), Benjamin Herrenschmidt wrote:
>
> No, I meant "n < pool->hint", ie, the start of the newly allocated
> block.
ah, got it. I'll do my drill with the patchset and get back, probably by
Monday.
--Sowmini
One last question before I spin out v9... the dma_mask code
is a bit confusing to me, so I want to make sure... the code is
> if (limit + tbl->it_offset > mask) {
> limit = mask - tbl->it_offset + 1;
> /* If we're constrained on address range, first try
>
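For what it's worth, a tiny worked example of that clamp (an illustration, not the kernel code; the page-shift scaling of the mask is omitted): with it_offset as the first entry backed by the table, the test shrinks limit so that no handed-out entry plus it_offset can exceed the device's mask.

    /* Sketch of the dma_mask clamp being discussed; variable names mirror
     * the quoted snippet but this is an illustration, not the kernel code. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long it_offset = 0x100;    /* first entry backed by the table */
        unsigned long limit     = 0x10000;  /* pool->end: one past last entry  */
        unsigned long mask      = 0x7fff;   /* device can address 0..mask      */

        /* If the pool extends past what the device can address, clamp it so
         * that (entry + it_offset) never exceeds mask. */
        if (limit + it_offset > mask)
            limit = mask - it_offset + 1;

        printf("usable entries: [0, %#lx)\n", limit);  /* 0x7f00 here */
        return 0;
    }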
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
ion infrastructure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inlined these calls into ldc before calling into iommu-
Addresses latest BenH comments: need_flush checks, add support
for dma mask and align_order.
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Make LDC use common iommu poll management
On (04/05/15 22:26), Benjamin Herrenschmidt wrote:
>
> So you decided to keep the logic here that updates the hint instead of
> just getting rid of need_flush altogether?
>
> Out of curiosity, what's the rationale ? Did you find a reason why
> resetting the hint in those two cases (rather than
On (04/08/15 18:30), Benjamin Herrenschmidt wrote:
>
> I'm happy with your last version, feel free to add my
>
> Acked-by: Benjamin Herrenschmidt
sounds good, I'll do this and resend a non-RFC version today.
Thanks for all the feedback - it was very useful to me, and
I'm much happier with the en
ion infrastructure.
Signed-off-by: Sowmini Varadhan
Acked-by: Benjamin Herrenschmidt
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
Acked-by: Benjamin Herrenschmidt
---
v2: moved sparc specific fields into iommu_sparc
typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
Acked-by: Benjamin Herrenschmidt
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inlined these calls into ldc before calling into iommu-
mask and align_order.
v10: resend without RFC tag, and new mail Message-Id.
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Make LDC use common iommu poll management functions
arch/
> On (04/19/15 14:09), David Miller wrote:
>
> > On (04/18/15 21:23), Guenter Roeck wrote:
> >> lib/built-in.o:(.discard+0x1): multiple definition of
> >> `__pcpu_unique_iommu_pool_hash'
> >> arch/powerpc/kernel/built-in.o:(.discard+0x18): first defined here
> >> .. I get a similar failure in the
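That link error is the classic symptom of a DEFINE_PER_CPU() living in a header that several object files include. A sketch of the usual remedy follows; whether it matches the fix actually applied in the series is an assumption on my part.

    #include <linux/percpu.h>

    /* lib/iommu-common.c: exactly one definition, private to this file */
    static DEFINE_PER_CPU(unsigned int, iommu_pool_hash);

    /*
     * If other files needed the variable, the alternative is a non-static
     * DEFINE_PER_CPU() in one .c file plus
     *     DECLARE_PER_CPU(unsigned int, iommu_pool_hash);
     * in the shared header; never DEFINE_PER_CPU() in the header itself.
     */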
On 03/19/2015 02:01 PM, Benjamin Herrenschmidt wrote:
Ben> One thing I noticed is the asymmetry in your code between the alloc
Ben> and the free path. The alloc path is similar to us in that the lock
Ben> covers the allocation and that's about it, there's no actual mapping to
Ben> the HW done, it'
added the "skip_span_boundary" argument to iommu_tbl_pool_init() for
those callers like LDC which do not care about span boundary checks.
Sowmini (2):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
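For reference, a sketch of what the skipped check guards against, built on the existing iommu_is_span_boundary() helper from include/linux/iommu-helper.h; the surrounding allocator loop and the exact plumbing of the skip flag are assumptions here.

    #include <linux/iommu-helper.h>
    #include <linux/types.h>

    /*
     * Sketch only: callers that care reject a candidate range
     * [n, n + npages) that would cross a segment boundary; a caller like
     * LDC that asks to skip the check simply bypasses it.
     */
    static bool range_ok(unsigned long n, unsigned long npages,
                         unsigned long shift, unsigned long boundary_size,
                         bool skip_span_boundary)
    {
        if (skip_span_boundary)
            return true;
        return !iommu_is_span_boundary(n, npages, shift, boundary_size);
    }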
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
ure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
- cookie_
typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inlined these calls into ldc before calling into iommu-
Turned out that I was able to iterate over it, and remove
both the ->cookie_to_index and the ->demap indirection from
iommu_tbl_ops.
That leaves only the odd iommu_flushall() hook; I'm trying
to find the history behind that (needed for sun4u platforms,
afaik, and not sure if there are other ways t
On (03/23/15 09:02), Benjamin Herrenschmidt wrote:
> > How does this relate to the ARM implementation? There is currently
> > an effort going on to make that one shared with ARM64 and possibly
> > x86. Has anyone looked at both the PowerPC and ARM ways of doing the
> > allocation to see if we could
On (03/23/15 12:29), David Miller wrote:
>
> In order to elide the IOMMU flush as much as possible, I implemented
> a scheme for sun4u wherein we always allocated from low IOMMU
> addresses to high IOMMU addresses.
>
> In this regime, we only need to flush the IOMMU when we rolled over
> back to
On (03/23/15 15:05), David Miller wrote:
>
> Why add performance regressions to old machines who already are
> suffering too much from all the bloat we are constantly adding to the
> kernel?
I have no personal opinion on this - it's a matter of choosing
whether we want to have some extra baggage
On (03/24/15 09:21), Benjamin Herrenschmidt wrote:
>
> So we have two choices here that I can see:
>
> - Keep that old platform use the old/simpler allocator
The problem with that approach is that the base "struct iommu" structure
for sparc gets a split personality: the older one is used with
the o
On (03/24/15 09:36), Benjamin Herrenschmidt wrote:
>
> - One pool only
>
> - Whenever the allocation is before the previous hint, do a flush, that
> should only happen if a wrap around occurred or in some cases if the
> device DMA mask forced it. I think we always update the hint whenever we
>
On (03/24/15 11:47), Benjamin Herrenschmidt wrote:
>
> Yes, pass a function pointer argument that can be NULL or just make it a
> member of the iommu_allocator struct (or whatever you call it) passed to
> the init function and that can be NULL. My point is we don't need a
> separate "ops" structur
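A minimal sketch of that "NULL-able function pointer instead of a separate ops structure" shape, with made-up names rather than whatever the series finally settles on:

    #include <stddef.h>

    struct iommu_table_sketch {
        unsigned long hint;
        /* Optional hook: platforms (e.g. sun4u) that need a flush on wrap
         * supply it; everyone else leaves it NULL. */
        void (*lazy_flush)(struct iommu_table_sketch *tbl);
    };

    static void maybe_flush(struct iommu_table_sketch *tbl)
    {
        if (tbl->lazy_flush)            /* NULL means "no flush needed" */
            tbl->lazy_flush(tbl);
    }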
benh> It might be sufficient to add a flush counter and compare it between runs
benh> if actual wall-clock benchmarks are too hard to do (especially if you
benh> don't have things like very fast network cards at hand).
benh>
benh> Number of flush / number of packets might be a sufficient metric, it
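A throwaway sketch of that metric (counter names invented here): count maps and flushes and compare the ratio between runs, rather than relying on wall-clock benchmarks.

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_ulong nr_flushes;
    static atomic_ulong nr_maps;

    static void count_map(void)   { atomic_fetch_add(&nr_maps, 1); }
    static void count_flush(void) { atomic_fetch_add(&nr_flushes, 1); }

    int main(void)
    {
        /* Pretend workload: one flush every 64 mapped buffers. */
        for (int i = 0; i < 1000; i++) {
            count_map();
            if (i % 64 == 63)
                count_flush();
        }
        printf("flushes/map = %lu/%lu (%.4f)\n",
               atomic_load(&nr_flushes), atomic_load(&nr_maps),
               (double)atomic_load(&nr_flushes) / atomic_load(&nr_maps));
        return 0;
    }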
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inlined these calls into ldc before calling into iommu-
ure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
- cookie_
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Sowmini Varadhan (1):
Make LDC use common iommu poll management functions
arch/sparc/include/asm/iommu_64.h | 7 +-
arch/sparc/kernel/iommu.c
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
Changes from patch v6: moved pool_hash initialization to
lib/iommu-common.c and cleaned up code duplication from
sun4v/sun4u/ldc.
Sowmini (2):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Sowmini
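A hedged sketch of what a shared pool_hash setup in lib/iommu-common.c could look like, using only well-known kernel helpers (DEFINE_PER_CPU, for_each_possible_cpu, hash_32); the series' actual code may differ in names and details.

    #include <linux/percpu.h>
    #include <linux/cpumask.h>
    #include <linux/hash.h>

    #define IOMMU_POOL_HASHBITS 4

    static DEFINE_PER_CPU(unsigned int, iommu_pool_hash);

    /* Give every possible CPU a fixed, well-spread pool index once, so the
     * sun4v, sun4u and LDC paths can all share the same setup. */
    static void setup_iommu_pool_hash(void)
    {
        unsigned int i;

        for_each_possible_cpu(i)
            per_cpu(iommu_pool_hash, i) = hash_32(i, IOMMU_POOL_HASHBITS);
    }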
ure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
- cookie_
typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inlined these calls into ldc before calling into iommu-
On (03/24/15 18:16), David Miller wrote:
> Generally this looks fine to me.
>
> But about patch #2, I see no reason to have multiple iommu_pool_hash
> tables. Even from a purely sparc perspective, we can always just do
> with just one of them.
>
> Furthermore, you can even probably move it down
On (03/25/15 21:43), casca...@linux.vnet.ibm.com wrote:
> However, when using large TCP send/recv (I used uperf with 64KB
> writes/reads), I noticed that on the transmit side, largealloc is not
> used, but on the receive side, cxgb4 almost only uses largealloc, while
> qlge seems to have a 1/1 usag
On (03/26/15 08:05), Benjamin Herrenschmidt wrote:
> > PowerPC folks, what do you think?
>
> I'll give it another look today.
>
> Cheers,
> Ben.
Hi Ben,
did you have a chance to look at this?
--Sowmini
On (03/30/15 14:24), Benjamin Herrenschmidt wrote:
> > +
> > +#define IOMMU_POOL_HASHBITS 4
> > +#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS)
>
> I don't like those macros. You changed the value from what we had on
> powerpc. It could be that the new values are as good for us but
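For what it's worth, one way to keep the macro purely as an upper bound and let callers choose, sketched with invented names (per the v2 notes earlier in the thread, the series' actual answer is to make npools an input parameter):

    /* Sketch: treat IOMMU_NR_POOLS as an upper bound and let the caller
     * pass the pool count, rounded down to a power of two. */
    #include <stdio.h>

    #define IOMMU_POOL_HASHBITS 4
    #define IOMMU_NR_POOLS      (1 << IOMMU_POOL_HASHBITS)

    static unsigned int clamp_npools(unsigned int requested)
    {
        unsigned int n = 1;

        if (requested == 0 || requested > IOMMU_NR_POOLS)
            requested = IOMMU_NR_POOLS;
        while (n * 2 <= requested)      /* largest power of two <= requested */
            n *= 2;
        return n;
    }

    int main(void)
    {
        printf("%u %u %u\n", clamp_npools(1), clamp_npools(6), clamp_npools(64));
        /* prints: 1 4 16 */
        return 0;
    }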
On (03/30/15 21:55), Benjamin Herrenschmidt wrote:
>
> No that's not my point. The lock is only taken for a short time but
> might still collide, the bouncing in that case will probably (at least
> that's my feeling) hurt more than help.
>
> However, I have another concern with your construct. Es
On (03/30/15 09:01), Sowmini Varadhan wrote:
>
> So I tried looking at the code, and perhaps there is some arch-specific
> subtlety here that I am missing, but where does spin_lock itself
> do the cpu_relax? afaict, LOCK_CONTENDED() itself does not have this.
To answer my question
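For readers following along, the relax normally lives in the architecture's contended spin loop, not in LOCK_CONTENDED() itself. A generic illustration (emphatically not sparc's, or any architecture's, actual implementation):

    #include <stdatomic.h>
    #include <immintrin.h>      /* _mm_pause(): x86 stand-in for cpu_relax() */

    static atomic_flag lock_word = ATOMIC_FLAG_INIT;

    static void spin_lock_sketch(void)
    {
        /* Contended path: spin, pausing between attempts the way an
         * arch's cpu_relax() would. */
        while (atomic_flag_test_and_set_explicit(&lock_word,
                                                 memory_order_acquire))
            _mm_pause();
    }

    static void spin_unlock_sketch(void)
    {
        atomic_flag_clear_explicit(&lock_word, memory_order_release);
    }

    int main(void)
    {
        spin_lock_sketch();
        spin_unlock_sketch();
        return 0;
    }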
On (03/31/15 08:28), Benjamin Herrenschmidt wrote:
>
> Provided that the IB test doesn't come up with a significant difference,
> I definitely vote for the simpler version of doing a normal spin_lock.
sounds good. let me wait for the confirmation from IB,
and I'll send out patchv8 soon after.
F