> On (04/19/15 14:09), David Miller wrote:
>
> > On (04/18/15 21:23), Guenter Roeck wrote:
> >> lib/built-in.o:(.discard+0x1): multiple definition of
> >> `__pcpu_unique_iommu_pool_hash'
> >> arch/powerpc/kernel/built-in.o:(.discard+0x18): first defined here
> >> .. I get a similar failure in the
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - throughput drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
Acked-by: Benjamin Herrenschmidt
---
v2: moved sparc specific fields into iommu_sparc
ion infrastructure.
Signed-off-by: Sowmini Varadhan
Acked-by: Benjamin Herrenschmidt
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
typical request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
Acked-by: Benjamin Herrenschmidt
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before calling into iommu-
mask and align_order.
v10: resend without RFC tag, and new mail Message-Id.
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Make LDC use common iommu poll management functions
arch/
On (04/08/15 18:30), Benjamin Herrenschmidt wrote:
>
> I'm happy with your last version, feel free to add my
>
> Acked-by: Benjamin Herrenschmidt
sounds good, I'll do this and resend a non-RFC version today.
Thanks for all the feedback - it was very useful to me, and
I'm much happier with the en
On (04/05/15 22:26), Benjamin Herrenschmidt wrote:
>
> So you decided to keep the logic here that updates the hint instead of
> just getting rid of need_flush altogether?
>
> Out of curiosity, what's the rationale ? Did you find a reason why
> resetting the hint in those two cases (rather than
Addresses latest BenH comments: need_flush checks, add support
for dma mask and align_order.
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Make LDC use common iommu poll management functions
typical request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before calling into iommu-
ion infrastructure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - throughput drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
One last question before I spin out v9.. the dma_mask code
is a bit confusing to me, so I want to make sure... the code is
> if (limit + tbl->it_offset > mask) {
> limit = mask - tbl->it_offset + 1;
> /* If we're constrained on address range, first try
>
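For reference, a simplified, standalone model of the clamping step quoted above (variable names mirror the snippet; the surrounding allocator is omitted, so this is an illustration rather than the actual kernel code):

#include <stdio.h>

/* Clamp the allocator's search limit so no entry maps above the device's
 * DMA mask. "it_offset" is the first entry number the table covers and
 * "limit" is the (exclusive) upper bound of the search, both in entries.
 */
static unsigned long clamp_limit(unsigned long limit,
                                 unsigned long it_offset,
                                 unsigned long mask)
{
	if (limit + it_offset > mask)
		limit = mask - it_offset + 1;
	return limit;
}

int main(void)
{
	/* Example: table offset 0x1000, device mask 0xffffff: the usable
	 * window shrinks from 0x2000000 entries to 0xfff000. */
	printf("%lx\n", clamp_limit(0x2000000, 0x1000, 0xffffff));
	return 0;
}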
On (04/04/15 08:06), Benjamin Herrenschmidt wrote:
>
> No, I meant "n < pool->hint", ie, the start of the newly allocated
> block.
ah, got it. I'll do my drill with patchset and get back, probably by
Monday.
--Sowmini
Just want to confirm:
> > + again:
> > + if (pass == 0 && handle && *handle &&
> > + (*handle >= pool->start) && (*handle < pool->end))
> > + start = *handle;
> > + else
> > + start = pool->hint;
>
> Now this means "handle" might be < pool->hint, in that case you a
On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
>
> > > I only just noticed too, you completely dropped the code to honor
> > > the dma mask. Why that ? Some devices rely on this.
/* Sowmini's comment about this coming from sparc origins.. */
> Probably, not that many devices have limits
the other question that comes to my mind is: the whole lazy_flush
optimization probably works best when there is exactly one pool,
and no large pools. In most other cases, we'd end up doing a lazy_flush
when we wrap within our pool itself, losing the benefit of that
optimization.
Given that the
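To make the trade-off concrete, here is a minimal toy model of the lazy-flush idea being discussed (not the patch itself): the IOTLB flush is deferred until an allocation wraps back to the start of its pool, so splitting the table into many small pools means more wraps, and therefore more flushes, than a single pool covering the same range:

#include <stddef.h>

/* Toy model of one pool: [start, end) with a rotating allocation hint. */
struct toy_pool {
	unsigned long start, end, hint;
};

static unsigned long flush_count;

/* Stand-in for the hardware IOTLB flush; only counts invocations. */
static void toy_flushall(void)
{
	flush_count++;
}

/* Allocate npages entries from the pool, flushing lazily: only when the
 * hint wraps back to the start of the pool, not on every free. */
static unsigned long toy_alloc(struct toy_pool *p, unsigned long npages)
{
	unsigned long n = p->hint;

	if (n + npages > p->end) {	/* wrap within this pool */
		n = p->start;
		toy_flushall();
	}
	p->hint = n + npages;
	return n;
}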
On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
> > + limit = pool->end;
> > +
> > + /* The case below can happen if we have a small segment appended
> > + * to a large, or when the previous alloc was at the very end of
> > + * the available space. If so, go back to the beginning and f
On (03/31/15 23:12), David Miller wrote:
>
> It's much more amortized with smart buffering strategies, which are
> common on current generation networking cards.
>
> There you only eat one map/unmap per "PAGE_SIZE / rx_pkt_size".
>
> Maybe the infiniband stuff is doing things very suboptimally,
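For a rough sense of that amortization (numbers assumed for illustration, not taken from the thread): with 4 KiB pages and ~1500-byte receive buffers, PAGE_SIZE / rx_pkt_size = 4096 / 1500 ≈ 2.7, so a driver that packs received frames into shared pages pays roughly one IOMMU map/unmap per two to three packets instead of one per packet.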
On 03/31/2015 09:01 PM, Benjamin Herrenschmidt wrote:
On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote:
Having bravely said that..
the IB team informs me that they see a 10% degradation using
the spin_lock as opposed to the trylock.
one path going forward is to continue processing
On (03/31/15 10:40), Sowmini Varadhan wrote:
>
> I've not heard back from the IB folks, but I'm going to make
> a judgement call here and go with the spin_lock. *If* they
> report some significant benefit from the trylock, probably
> need to revisit this (and then proba
On (03/31/15 15:15), David Laight wrote:
>
> I've wondered whether the iommu setup for ethernet receive (in particular)
> could be made much more efficient if there were a function that
> would unmap one buffer and map a second buffer?
> My thought is that iommu pte entry used by the old buffer co
typical request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before calling into iommu-
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - throughput drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
ion infrastructure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
benefit from the trylock, probably
need to revisit this (and then probably start by re-examining
the hash function to avoid collisions, before resorting to
trylock).
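As a sketch of what the pool hash referred to here does (a simplified standalone model, with made-up array sizes, not the kernel code): each CPU is assigned a random pool index once at init time, so concurrent mapping requests from different CPUs tend to land on different pools and different locks:

#include <stdlib.h>

#define IOMMU_POOL_HASHBITS	4
#define IOMMU_NR_POOLS		(1 << IOMMU_POOL_HASHBITS)
#define TOY_NR_CPUS		64

/* One fixed, random pool index per CPU, computed once at init so the
 * CPU-to-pool mapping stays stable afterwards. */
static unsigned int pool_hash[TOY_NR_CPUS];

static void setup_pool_hash(void)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		pool_hash[cpu] = rand() & (IOMMU_NR_POOLS - 1);
}

/* Pool selection at allocation time: CPUs hashed to different pools
 * never contend on the same pool lock; collisions between CPUs are
 * what a better hash function would try to reduce. */
static unsigned int pool_for_cpu(int cpu)
{
	return pool_hash[cpu];
}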
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
On (03/31/15 08:28), Benjamin Herrenschmidt wrote:
>
> Provided that the IB test doesn't come up with a significant difference,
> I definitely vote for the simpler version of doing a normal spin_lock.
sounds good. let me wait for the confirmation from IB,
and I'll send out patchv8 soon after.
On (03/30/15 09:01), Sowmini Varadhan wrote:
>
> So I tried looking at the code, and perhaps there is some arch-specific
> subtlety here that I am missing, but where does spin_lock itself
> do the cpu_relax? afaict, LOCK_CONTENDED() itself does not have this.
To answer my question
On (03/30/15 21:55), Benjamin Herrenschmidt wrote:
>
> No that's not my point. The lock is only taken for a short time but
> might still collide, the bouncing in that case will probably (at least
> that's my feeling) hurt more than help.
>
> However, I have another concern with your construct. Es
On (03/30/15 14:24), Benjamin Herrenschmidt wrote:
> > +
> > +#define IOMMU_POOL_HASHBITS 4
> > +#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS)
>
> I don't like those macros. You changed the value from what we had on
> powerpc. It could be that the new values are as good for us but
On (03/26/15 08:05), Benjamin Herrenschmidt wrote:
> > PowerPC folks, what do you think?
>
> I'll give it another look today.
>
> Cheers,
> Ben.
Hi Ben,
did you have a chance to look at this?
--Sowmini
On (03/25/15 21:43), casca...@linux.vnet.ibm.com wrote:
> However, when using large TCP send/recv (I used uperf with 64KB
> writes/reads), I noticed that on the transmit side, largealloc is not
> used, but on the receive side, cxgb4 almost only uses largealloc, while
> qlge seems to have a 1/1 usag
On (03/24/15 18:16), David Miller wrote:
> Generally this looks fine to me.
>
> But about patch #2, I see no reason to have multiple iommu_pool_hash
> tables. Even from a purely sparc perspective, we can always just do
> with just one of them.
>
> Furthermore, you can even probably move it down
typical request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before calling into iommu-
ure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
- cookie_
Changes from patchv6: moved pool_hash initialization to
lib/iommu-common.c and cleaned up code duplication from
sun4v/sun4u/ldc.
Sowmini (2):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Sowmini
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - throughput drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Sowmini Varadhan (1):
Make LDC use common iommu poll management functions
arch/sparc/include/asm/iommu_64.h | 7 +-
arch/sparc/kernel/iommu.c
ure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
- cookie_
typical request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before calling into iommu-
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - throughput drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
benh> It might be sufficient to add a flush counter and compare it between runs
benh> if actual wall-clock benchmarks are too hard to do (especially if you
benh> don't have things like very fast network cards at hand).
benh>
benh> Number of flush / number of packets might be a sufficient metric, it
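A sketch of the kind of bookkeeping BenH is suggesting (counter and function names are made up for illustration): count flushes and mapping requests and compare the ratio across runs, which avoids needing wall-clock benchmarks with very fast NICs:

#include <stdio.h>

/* Hypothetical counters, bumped from the allocator's flush and map paths. */
static unsigned long long nr_flushes;
static unsigned long long nr_maps;

static void account_flush(void) { nr_flushes++; }
static void account_map(void)   { nr_maps++; }

/* Flushes per mapping: lower is better; compare before and after a patch. */
static double flush_ratio(void)
{
	return nr_maps ? (double)nr_flushes / (double)nr_maps : 0.0;
}

int main(void)
{
	int i;

	for (i = 0; i < 1000; i++) {
		account_map();
		if (i % 100 == 99)	/* pretend every 100th map wrapped */
			account_flush();
	}
	printf("flushes/map = %f\n", flush_ratio());
	return 0;
}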
On (03/24/15 11:47), Benjamin Herrenschmidt wrote:
>
> Yes, pass a function pointer argument that can be NULL or just make it a
> member of the iommu_allocator struct (or whatever you call it) passed to
> the init function and that can be NULL. My point is we don't need a
> separate "ops" structur
On (03/24/15 09:36), Benjamin Herrenschmidt wrote:
>
> - One pool only
>
> - Whenever the allocation is before the previous hint, do a flush, that
> should only happen if a wrap around occurred or in some cases if the
> device DMA mask forced it. I think we always update the hint whenever we
>
On (03/24/15 09:21), Benjamin Herrenschmidt wrote:
>
> So we have two choices here that I can see:
>
> - Keep that old platform use the old/simpler allocator
Problem with that approach is that the base "struct iommu" structure
for sparc gets a split personality: the older one is used with
the o
On (03/23/15 15:05), David Miller wrote:
>
> Why add performance regressions to old machines who already are
> suffering too much from all the bloat we are constantly adding to the
> kernel?
I have no personal opinion on this- it's a matter of choosing
whether we want to have some extra baggage
On (03/23/15 12:29), David Miller wrote:
>
> In order to elide the IOMMU flush as much as possible, I implemented
> a scheme for sun4u wherein we always allocated from low IOMMU
> addresses to high IOMMU addresses.
>
> In this regime, we only need to flush the IOMMU when we rolled over
> back to
On (03/23/15 09:02), Benjamin Herrenschmidt wrote:
> > How does this relate to the ARM implementation? There is currently
> > an effort going on to make that one shared with ARM64 and possibly
> > x86. Has anyone looked at both the PowerPC and ARM ways of doing the
> > allocation to see if we could
Turned out that I was able to iterate over it, and remove
both the ->cookie_to_index and the ->demap indirection from
iommu_tbl_ops.
That leaves only the odd iommu_flushall() hook, I'm trying
to find the history behind that (needed for sun4u platforms,
afaik, and not sure if there are other ways t
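For context, a sketch of what remains once ->cookie_to_index and ->demap are gone (structure and member names here are illustrative, not the final kernel API): the table keeps only an optional flush-all callback, which platforms such as sun4u supply and others leave NULL:

/* Illustrative only: per-table state with a single optional hook. */
struct toy_iommu_table {
	unsigned long	table_size;	/* number of entries */
	unsigned long	*map;		/* allocation bitmap */
	void		(*flushall)(struct toy_iommu_table *tbl);
};

/* Callers that never need a global IOTLB flush simply leave ->flushall
 * NULL; the allocator only invokes it when it is set. */
static inline void toy_flush(struct toy_iommu_table *tbl)
{
	if (tbl->flushall)
		tbl->flushall(tbl);
}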
typical request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before calling into iommu-
ure.
Signed-off-by: Sowmini Varadhan
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very large
- cookie_
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - throughput drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan
---
v2: moved sparc specific fields into iommu_sparc
added the "skip_span_boundary" argument to iommu_tbl_pool_init() for
those callers like LDC which do not care about span boundary checks.
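As an illustration of that changelog entry (the exact prototype changed across versions of the series, so the argument names and order below are assumptions, not the merged signature), an LDC-style caller would ask for a single pool and opt out of span-boundary checking:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for iommu_tbl_pool_init(); the real
 * function lives in lib/iommu-common.c and takes more arguments. */
struct toy_iommu_map_table {
	unsigned long	num_entries;
	unsigned int	table_shift;
	unsigned int	npools;
	bool		skip_span_boundary;
};

static void toy_iommu_tbl_pool_init(struct toy_iommu_map_table *tbl,
				    unsigned long num_entries,
				    unsigned int table_shift,
				    unsigned int npools,
				    bool skip_span_boundary)
{
	tbl->num_entries = num_entries;
	tbl->table_shift = table_shift;
	tbl->npools = npools;
	tbl->skip_span_boundary = skip_span_boundary;
}

int main(void)
{
	struct toy_iommu_map_table ldc_tbl;

	/* LDC: requests are typically 1-4 pages, so one pool is enough and
	 * span-boundary checks are unnecessary for this caller. */
	toy_iommu_tbl_pool_init(&ldc_tbl, 8192, 13, 1, true);
	printf("npools=%u\n", ldc_tbl.npools);
	return 0;
}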
Sowmini (2):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
On 03/19/2015 02:01 PM, Benjamin Herrenschmidt wrote:
Ben> One thing I noticed is the asymetry in your code between the alloc
Ben> and the free path. The alloc path is similar to us in that the lock
Ben> covers the allocation and that's about it, there's no actual mapping to
Ben> the HW done, it'