On Sun, Mar 12, 2017 at 6:49 PM, Eric Dumazet wrote:
> On Sun, 2017-03-12 at 17:49 +0200, Saeed Mahameed wrote:
>> On Sun, Mar 12, 2017 at 5:29 PM, Eric Dumazet wrote:
>> > On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote:
>> >
>> >> Problem is XDP TX :
>> >>
>> >> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI
>> >> RX is owned by current cpu.
On Sun, 2017-03-12 at 17:49 +0200, Saeed Mahameed wrote:
> On Sun, Mar 12, 2017 at 5:29 PM, Eric Dumazet wrote:
> > On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote:
> >
> >> Problem is XDP TX :
> >>
> >> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI
> >> RX is owned by current cpu.
On Sun, Mar 12, 2017 at 5:29 PM, Eric Dumazet wrote:
> On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote:
>
>> Problem is XDP TX :
>>
>> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI
>> RX is owned by current cpu.
>>
>> Since TX completion is using a different NAPI,
On Sun, 2017-03-12 at 07:57 -0700, Eric Dumazet wrote:
> Problem is XDP TX :
>
> I do not see any guarantee mlx4_en_recycle_tx_desc() runs while the NAPI
> RX is owned by current cpu.
>
> Since TX completion is using a different NAPI, I really do not believe
> we can avoid an atomic operation, l
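A hedged sketch of the synchronization Eric is pointing at (all names invented, not the mlx4 code): the TX-completion NAPI that recycles XDP pages and the RX NAPI that refills from the cache can run on different CPUs, so the shared cache needs an atomic primitive such as a spinlock:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/spinlock.h>

/* Illustration only: a recycle cache shared between the RX NAPI (consumer)
 * and the TX-completion NAPI (producer), which may run on different CPUs.
 * All names are invented for this sketch.
 */
struct xdp_page_cache {
	spinlock_t lock;
	unsigned int count;
	struct page *pages[256];
};

/* TX completion side: give a transmitted XDP page back to the cache. */
static bool cache_recycle(struct xdp_page_cache *c, struct page *page)
{
	bool ok = false;

	spin_lock(&c->lock);
	if (c->count < ARRAY_SIZE(c->pages)) {
		c->pages[c->count++] = page;
		ok = true;
	}
	spin_unlock(&c->lock);
	return ok;
}

/* RX NAPI side: try to refill a descriptor from the cache. */
static struct page *cache_get(struct xdp_page_cache *c)
{
	struct page *page = NULL;

	spin_lock(&c->lock);
	if (c->count)
		page = c->pages[--c->count];
	spin_unlock(&c->lock);
	return page;
}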
On Wed, 2017-02-22 at 18:06 -0800, Eric Dumazet wrote:
> On Wed, 2017-02-22 at 17:08 -0800, Alexander Duyck wrote:
>
> >
> > Right but you were talking about using both halves one after the
> > other. If that occurs you have nothing left that you can reuse. That
> > was what I was getting at.
On Wed, 22 Feb 2017 18:06:58 -0800
Eric Dumazet wrote:
> On Wed, 2017-02-22 at 17:08 -0800, Alexander Duyck wrote:
>
> >
> > Right but you were talking about using both halves one after the
> > other. If that occurs you have nothing left that you can reuse. That
> > was what I was getting at.
On 23/02/2017 4:18 AM, Alexander Duyck wrote:
On Wed, Feb 22, 2017 at 6:06 PM, Eric Dumazet wrote:
On Wed, 2017-02-22 at 17:08 -0800, Alexander Duyck wrote:
Right but you were talking about using both halves one after the
other. If that occurs you have nothing left that you can reuse. That
was what I was getting at.
On Wed, Feb 22, 2017 at 6:06 PM, Eric Dumazet wrote:
> On Wed, 2017-02-22 at 17:08 -0800, Alexander Duyck wrote:
>
>>
>> Right but you were talking about using both halves one after the
>> other. If that occurs you have nothing left that you can reuse. That
>> was what I was getting at. If you use up both halves you end up
>> having to unmap the page.
On Wed, 2017-02-22 at 17:08 -0800, Alexander Duyck wrote:
>
> Right but you were talking about using both halves one after the
> other. If that occurs you have nothing left that you can reuse. That
> was what I was getting at. If you use up both halves you end up
> having to unmap the page.
>
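For readers following the "two halves" discussion, here is a minimal sketch (invented names, not the igb or mlx4 code) of the reuse test being described: the page can only be flipped to its other half and handed back to the NIC if the stack has already released the previous half; otherwise it has to be unmapped and replaced.

#include <linux/mm.h>

/* Hedged sketch of half-page reuse.  struct rx_buffer and rx_reuse_page()
 * are invented for illustration.
 */
struct rx_buffer {
	struct page *page;
	unsigned int page_offset;	/* 0 or PAGE_SIZE / 2 */
};

static bool rx_reuse_page(struct rx_buffer *buf, unsigned int truesize)
{
	/* If anyone else (the stack) still holds a reference, the other
	 * half is still in flight and the page cannot be recycled.
	 */
	if (page_count(buf->page) != 1)
		return false;

	/* Flip to the other half for the next received frame... */
	buf->page_offset ^= truesize;
	/* ...and take a reference for the half just handed to the stack. */
	page_ref_inc(buf->page);
	return true;
}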
On Wed, Feb 22, 2017 at 10:21 AM, Eric Dumazet wrote:
> On Wed, 2017-02-22 at 09:23 -0800, Alexander Duyck wrote:
>> On Wed, Feb 22, 2017 at 8:22 AM, Eric Dumazet wrote:
>> > On Mon, 2017-02-13 at 11:58 -0800, Eric Dumazet wrote:
>> >> Use of order-3 pages is problematic in some cases.
>> >>
>> >
On Wed, 2017-02-22 at 09:23 -0800, Alexander Duyck wrote:
> On Wed, Feb 22, 2017 at 8:22 AM, Eric Dumazet wrote:
> > On Mon, 2017-02-13 at 11:58 -0800, Eric Dumazet wrote:
> >> Use of order-3 pages is problematic in some cases.
> >>
> >> This patch might add three kinds of regression :
> >>
> >> 1) a CPU performance regression, but we will add later page
> >> recycling and performance should be back.
From: Alexander Duyck
> Sent: 22 February 2017 17:24
...
> So there is a problem that is being overlooked here. That is the cost
> of the DMA map/unmap calls. The problem is many PowerPC systems have
> an IOMMU that you have to work around, and that IOMMU comes at a heavy
> cost for every map/unmap call.
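To make the cost concrete, a recycled buffer only needs a cache sync, while a non-recycled one pays for a full unmap/map through the IOMMU on every frame. A hedged sketch (illustrative helper, not taken from any driver):

#include <linux/dma-mapping.h>

/* Illustration only: an RX buffer that stays DMA mapped for the lifetime
 * of the ring is re-armed with a sync, avoiding dma_unmap_page() +
 * dma_map_page() per frame (expensive when an IOMMU is involved).
 */
static void rx_rearm_buffer(struct device *dev, dma_addr_t dma,
			    unsigned int offset, unsigned int len)
{
	/* Hand the buffer back to the device; no IOMMU teardown/setup. */
	dma_sync_single_range_for_device(dev, dma, offset, len,
					 DMA_FROM_DEVICE);
}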
On Wed, Feb 22, 2017 at 8:22 AM, Eric Dumazet wrote:
> On Mon, 2017-02-13 at 11:58 -0800, Eric Dumazet wrote:
>> Use of order-3 pages is problematic in some cases.
>>
>> This patch might add three kinds of regression :
>>
>> 1) a CPU performance regression, but we will add later page
>> recycling and performance should be back.
On Mon, 2017-02-13 at 11:58 -0800, Eric Dumazet wrote:
> Use of order-3 pages is problematic in some cases.
>
> This patch might add three kinds of regression :
>
> 1) a CPU performance regression, but we will add later page
> recycling and performance should be back.
>
> 2) TCP receiver could grow its receive window slightly slower,
On Thu, Feb 16, 2017 at 9:03 PM, David Miller wrote:
> From: Tom Herbert
> Date: Thu, 16 Feb 2017 09:05:26 -0800
>
>> On Thu, Feb 16, 2017 at 5:08 AM, Tariq Toukan wrote:
>>>
>>> On 15/02/2017 6:57 PM, Eric Dumazet wrote:
On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan wrote:
>
On Thu, Feb 16, 2017 at 7:11 PM, Eric Dumazet wrote:
>> You're admitting that Eric's patches improve driver quality,
>> stability, and performance but you're not allowing this in the kernel
>> because "we know what benchmarks our customers are going to run".
>
> Note that I do not particularly care if these patches go in 4.11 or 4.12 really.
From: Tom Herbert
Date: Thu, 16 Feb 2017 09:05:26 -0800
> On Thu, Feb 16, 2017 at 5:08 AM, Tariq Toukan wrote:
>>
>> On 15/02/2017 6:57 PM, Eric Dumazet wrote:
>>>
>>> On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan wrote:
Isn't it the same principle in page_frag_alloc() ?
It is called from __netdev_alloc_skb()/__napi_alloc_skb().
> You're admitting that Eric's patches improve driver quality,
> stability, and performance but you're not allowing this in the kernel
> because "we know what benchmarks our customers are going to run".
Note that I do not particularly care if these patches go in 4.11 or 4.12 really.
I already bac
On Thu, Feb 16, 2017 at 5:08 AM, Tariq Toukan wrote:
>
> On 15/02/2017 6:57 PM, Eric Dumazet wrote:
>>
>> On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan wrote:
>>>
>>> Isn't it the same principle in page_frag_alloc() ?
>>> It is called from __netdev_alloc_skb()/__napi_alloc_skb().
>>>
>>> Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
On Thu, 2017-02-16 at 15:08 +0200, Tariq Toukan wrote:
> On 15/02/2017 6:57 PM, Eric Dumazet wrote:
> > On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan wrote:
> >> Isn't it the same principle in page_frag_alloc() ?
> >> It is called from __netdev_alloc_skb()/__napi_alloc_skb().
> >>
> >> Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
On 15/02/2017 6:57 PM, Eric Dumazet wrote:
On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan wrote:
Isn't it the same principle in page_frag_alloc() ?
It is called from __netdev_alloc_skb()/__napi_alloc_skb().
Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
This is not ok.
On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan wrote:
>
>
> Isn't it the same principle in page_frag_alloc() ?
> It is called from __netdev_alloc_skb()/__napi_alloc_skb().
>
> Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
This is not ok.
This is a very well known problem,
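For context on the page_frag_alloc() comparison: that allocator only uses order-3 opportunistically and always falls back to order-0, roughly like the simplified sketch below (see __page_frag_cache_refill() in mm/ for the real code):

#include <linux/gfp.h>
#include <linux/mm.h>

/* Simplified sketch of the refill logic behind page_frag_alloc();
 * not the exact mm code.
 */
static struct page *frag_cache_refill_sketch(gfp_t gfp)
{
	struct page *page = NULL;

#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
	/* Opportunistic high-order allocation: cheap when memory is not
	 * fragmented, but never a hard requirement...
	 */
	page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY,
			   PAGE_FRAG_CACHE_MAX_ORDER);
#endif
	/* ...because there is always a fallback to a single order-0 page. */
	if (!page)
		page = alloc_pages(gfp, 0);

	return page;
}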
On 14/02/2017 7:29 PM, Tom Herbert wrote:
On Tue, Feb 14, 2017 at 7:51 AM, Eric Dumazet wrote:
On Tue, 2017-02-14 at 16:56 +0200, Tariq Toukan wrote:
As the previous series caused hangs, we must run functional regression
tests over this series as well.
Run has already started, and results will be available tomorrow morning.
>
> This obviously does not work for the case I'm talking about
> (transmitting out another device with XDP).
>
XDP_TX does not handle this yet.
When XDP_TX was added, it was very clear that the transmit _had_ to be
done on the same port.
Since all this discussion happened in this thread ( mlx4:
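To make the constraint concrete: at this point the only transmit verdict an XDP program has is XDP_TX, which bounces the frame out of the very port it arrived on; forwarding to another device needed infrastructure that did not exist yet. A minimal hedged example:

/* Minimal XDP program: XDP_TX sends the frame back out the same port it
 * was received on; there is no way to pick another device here.
 */
#include <linux/bpf.h>

#define SEC(name) __attribute__((section(name), used))

SEC("xdp")
int xdp_reflect(struct xdp_md *ctx)
{
	/* A real reflector would at least swap the Ethernet source and
	 * destination MAC addresses before transmitting.
	 */
	return XDP_TX;
}

char _license[] SEC("license") = "GPL";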
On Tue, 14 Feb 2017 11:02:01 -0800
Eric Dumazet wrote:
> On Tue, Feb 14, 2017 at 10:46 AM, Jesper Dangaard Brouer
> wrote:
> >
>
> >
> > With this Intel driver page count based recycle approach, the recycle
> > size is tied to the size of the RX ring. As Eric and Tariq discovered.
> > And for other performance reasons (memory footprint of walking RX ring data-structures)
From: Jesper Dangaard Brouer
Date: Tue, 14 Feb 2017 20:38:22 +0100
> On Tue, 14 Feb 2017 12:04:26 -0500 (EST)
> David Miller wrote:
>
>> One path I see around all of this is full integration. Meaning that
>> we can free pages into the page allocator which are still DMA mapped.
>> And future allocations from that device are prioritized to take still
>> DMA mapped objects.
On Tue, 14 Feb 2017 11:06:25 -0800
Alexander Duyck wrote:
> On Tue, Feb 14, 2017 at 10:46 AM, Jesper Dangaard Brouer
> wrote:
> > On Tue, 14 Feb 2017 09:29:54 -0800
> > Alexander Duyck wrote:
> >
> >> On Tue, Feb 14, 2017 at 6:56 AM, Tariq Toukan
> >> wrote:
> >> >
> >> >
> >> > On 14/02/2017 3:45 PM, Eric Dumazet wrote:
On Tue, 14 Feb 2017 12:04:26 -0500 (EST)
David Miller wrote:
> From: Tariq Toukan
> Date: Tue, 14 Feb 2017 16:56:49 +0200
>
> > Internally, I already implemented "dynamic page-cache" and
> > "page-reuse" mechanisms in the driver, and together they totally
> > bridge the performance gap.
It s
On Tue, Feb 14, 2017 at 10:46 AM, Jesper Dangaard Brouer
wrote:
> On Tue, 14 Feb 2017 09:29:54 -0800
> Alexander Duyck wrote:
>
>> On Tue, Feb 14, 2017 at 6:56 AM, Tariq Toukan
>> wrote:
>> >
>> >
>> > On 14/02/2017 3:45 PM, Eric Dumazet wrote:
>> >>
>> >> On Tue, Feb 14, 2017 at 4:12 AM, Jesper Dangaard Brouer
On Tue, Feb 14, 2017 at 10:46 AM, Jesper Dangaard Brouer
wrote:
>
>
> With this Intel driver page count based recycle approach, the recycle
> size is tied to the size of the RX ring. As Eric and Tariq discovered.
> And for other performance reasons (memory footprint of walking RX ring
> data-structures)
On Tue, 14 Feb 2017 09:29:54 -0800
Alexander Duyck wrote:
> On Tue, Feb 14, 2017 at 6:56 AM, Tariq Toukan wrote:
> >
> >
> > On 14/02/2017 3:45 PM, Eric Dumazet wrote:
> >>
> >> On Tue, Feb 14, 2017 at 4:12 AM, Jesper Dangaard Brouer
> >> wrote:
> >>
> >>> It is important to understand that there are two cases for the cost of
> >>> an atomic op, which depend on the cache-coherency state of the cacheline.
On Tue, Feb 14, 2017 at 6:56 AM, Tariq Toukan wrote:
>
>
> On 14/02/2017 3:45 PM, Eric Dumazet wrote:
>>
>> On Tue, Feb 14, 2017 at 4:12 AM, Jesper Dangaard Brouer
>> wrote:
>>
>>> It is important to understand that there are two cases for the cost of
>>> an atomic op, which depend on the cache-coherency state of the cacheline.
On Tue, Feb 14, 2017 at 7:51 AM, Eric Dumazet wrote:
> On Tue, 2017-02-14 at 16:56 +0200, Tariq Toukan wrote:
>
>> As the previous series caused hangs, we must run functional regression
>> tests over this series as well.
>> Run has already started, and results will be available tomorrow morning.
>
From: David Laight
Date: Tue, 14 Feb 2017 17:17:22 +
> From: David Miller
>> Sent: 14 February 2017 17:04
> ...
>> One path I see around all of this is full integration. Meaning that
>> we can free pages into the page allocator which are still DMA mapped.
>> And future allocations from that device are prioritized to take still
>> DMA mapped objects.
From: David Miller
> Sent: 14 February 2017 17:04
...
> One path I see around all of this is full integration. Meaning that
> we can free pages into the page allocator which are still DMA mapped.
> And future allocations from that device are prioritized to take still
> DMA mapped objects.
...
For
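As a hedged sketch of that "full integration" direction (every name below is invented, this is not an existing kernel API): a per-device pool that prefers pages whose DMA mapping is still valid, and only falls back to the page allocator plus a fresh, potentially expensive, mapping when empty:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/* Illustration of a per-device pool of still-DMA-mapped pages; no locking
 * shown (single consumer assumed).
 */
struct mapped_page_pool {
	struct device *dev;
	unsigned int count;
	struct page *pages[128];
	dma_addr_t dma[128];
};

static struct page *pool_get(struct mapped_page_pool *pool, dma_addr_t *dma)
{
	struct page *page;

	if (pool->count) {
		/* Fast path: the page kept its DMA mapping from last time. */
		pool->count--;
		*dma = pool->dma[pool->count];
		return pool->pages[pool->count];
	}
	/* Slow path: fresh page plus a full (IOMMU) mapping. */
	page = alloc_page(GFP_ATOMIC);
	if (!page)
		return NULL;
	*dma = dma_map_page(pool->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(pool->dev, *dma)) {
		put_page(page);
		return NULL;
	}
	return page;
}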
From: Tariq Toukan
Date: Tue, 14 Feb 2017 16:56:49 +0200
> Internally, I already implemented "dynamic page-cache" and
> "page-reuse" mechanisms in the driver, and together they totally
> bridge the performance gap.
I worry about a dynamically growing page cache inside of drivers
because it is in
On Tue, 2017-02-14 at 16:56 +0200, Tariq Toukan wrote:
> As the previous series caused hangs, we must run functional regression
> tests over this series as well.
> Run has already started, and results will be available tomorrow morning.
>
> In general, I really like this series. The re-factorization
> Anything _relying_ on order-3 pages being available to impress
> friends/customers is a lie.
>
BTW, you do understand that on PowerPC right now, an Ethernet frame
holds 65536*8 = half a MByte, right?
So any PowerPC host using mlx4 NIC can easily be brought down, by
using a few TCP flows and
On 14/02/2017 3:45 PM, Eric Dumazet wrote:
On Tue, Feb 14, 2017 at 4:12 AM, Jesper Dangaard Brouer
wrote:
It is important to understand that there are two cases for the cost of
an atomic op, which depend on the cache-coherency state of the
cacheline.
Measured on Skylake CPU i7-6700K CPU @ 4.00GHz
On Tue, Feb 14, 2017 at 5:45 AM, Eric Dumazet wrote:
>
> Could we now please Ack this v3 and merge it ?
>
BTW I found the limitation on sender side.
After doing :
lpaa23:~# ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: on TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-lo
On Tue, Feb 14, 2017 at 4:12 AM, Jesper Dangaard Brouer
wrote:
> It is important to understand that there are two cases for the cost of
> an atomic op, which depend on the cache-coherency state of the
> cacheline.
>
> Measured on Skylake CPU i7-6700K CPU @ 4.00GHz
>
> (1) Local CPU atomic op : 2
On Mon, 13 Feb 2017 15:16:35 -0800
Alexander Duyck wrote:
[...]
> ... As I'm sure Jesper will probably point out the atomic op for
> get_page/page_ref_inc can be pretty expensive if I recall correctly.
It is important to understand that there are two cases for the cost of
an atomic op, which depend on the cache-coherency state of the cacheline.
On Mon, Feb 13, 2017 at 4:57 PM, Eric Dumazet wrote:
>
>> Alex, be assured that I implemented the full thing, of course.
>
> Patch was :
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index
> aa074e57ce06fb2842fa1faabd156c3cd2fe10f5..0ae1b544668d26c24044dbdefdd9b12253596ff9
> 100644
> Alex, be assured that I implemented the full thing, of course.
Patch was :
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index
aa074e57ce06fb2842fa1faabd156c3cd2fe10f5..0ae1b544668d26c24044dbdefdd9b12253596ff9
100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
On Mon, 2017-02-13 at 16:46 -0800, Eric Dumazet wrote:
> Alex, be assured that I implemented the full thing, of course.
>
> ( The pagecnt_bias field, .. refilled every 16K rounds )
Correction, USHRT_MAX is ~64K, not 16K
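The bookkeeping being corrected here works roughly as in this hedged sketch (field and function names invented; compare the igb commit mentioned later in the thread): pre-pay USHRT_MAX page references with one atomic operation, then hand out fragments with a plain local decrement, paying again only every ~64K fragments. It assumes the driver keeps the reference it got from the allocator, so the page stays alive even when all pre-paid references are out with the stack.

#include <linux/kernel.h>
#include <linux/mm.h>

/* Hedged sketch of the pagecnt_bias idea; not the mlx4 or igb code. */
struct rx_page_info {
	struct page *page;		/* driver owns one base reference */
	unsigned int pagecnt_bias;	/* pre-paid references left */
};

static void rx_page_prepay(struct rx_page_info *info)
{
	/* One atomic operation buys ~64K future fragment references. */
	page_ref_add(info->page, USHRT_MAX);
	info->pagecnt_bias = USHRT_MAX;
}

static void rx_page_take_frag(struct rx_page_info *info)
{
	/* Handing a fragment to the stack consumes one pre-paid reference;
	 * the common case is a cheap, non-atomic decrement.
	 */
	if (likely(--info->pagecnt_bias))
		return;
	/* Only once every ~64K fragments do we touch the atomic refcount. */
	rx_page_prepay(info);
}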
On Mon, 2017-02-13 at 16:34 -0800, Alexander Duyck wrote:
> On Mon, Feb 13, 2017 at 4:22 PM, Eric Dumazet wrote:
> > On Mon, Feb 13, 2017 at 3:47 PM, Alexander Duyck
> > wrote:
> >
> >> Actually it depends on the use case. In the case of pktgen packets
> >> they are usually dropped pretty early in the receive path.
On Mon, Feb 13, 2017 at 4:22 PM, Eric Dumazet wrote:
> On Mon, Feb 13, 2017 at 3:47 PM, Alexander Duyck
> wrote:
>
>> Actually it depends on the use case. In the case of pktgen packets
>> they are usually dropped pretty early in the receive path. Think
>> something more along the lines of a TCP syn flood versus something
>> that would be loading up a socket.
On Mon, Feb 13, 2017 at 3:47 PM, Alexander Duyck
wrote:
> Actually it depends on the use case. In the case of pktgen packets
> they are usually dropped pretty early in the receive path. Think
> something more along the lines of a TCP syn flood versus something
> that would be loading up a socket
On Mon, Feb 13, 2017 at 3:29 PM, Eric Dumazet wrote:
> On Mon, Feb 13, 2017 at 3:26 PM, Alexander Duyck
> wrote:
>
>>
>> Odds are for a single TCP flow you won't notice. This tends to be
>> more of a small packets type performance issue. If you hammer on the
>> Rx using pktgen you would be more likely to see it.
On Mon, Feb 13, 2017 at 3:26 PM, Alexander Duyck
wrote:
>
> Odds are for a single TCP flow you won't notice. This tends to be
> more of a small packets type performance issue. If you hammer on the
> Rx using pktgen you would be more likely to see it.
>
> Anyway patch looks fine from a functional
On Mon, Feb 13, 2017 at 3:22 PM, Eric Dumazet wrote:
> On Mon, Feb 13, 2017 at 3:16 PM, Alexander Duyck
> wrote:
>
>> Any plans to add the bulk page count updates back at some point? I
>> just got around to adding it for igb in commit bd4171a5d4c2 ("igb:
>> update code to better handle incrementing page count").
On Mon, Feb 13, 2017 at 3:16 PM, Alexander Duyck
wrote:
> Any plans to add the bulk page count updates back at some point? I
> just got around to adding it for igb in commit bd4171a5d4c2 ("igb:
> update code to better handle incrementing page count"). I should have
> patches for ixgbe, i40e, an
On Mon, Feb 13, 2017 at 1:09 PM, Eric Dumazet wrote:
> On Mon, Feb 13, 2017 at 12:51 PM, Alexander Duyck
> wrote:
>> On Mon, Feb 13, 2017 at 11:58 AM, Eric Dumazet wrote:
>
>>> + PAGE_SIZE, priv->dma_dir);
>>> page = page_alloc->page;
>>>
On Mon, Feb 13, 2017 at 12:51 PM, Alexander Duyck
wrote:
> On Mon, Feb 13, 2017 at 11:58 AM, Eric Dumazet wrote:
>> + PAGE_SIZE, priv->dma_dir);
>> page = page_alloc->page;
>> /* Revert changes done by mlx4_alloc_pages */
>> -
On Mon, Feb 13, 2017 at 11:58 AM, Eric Dumazet wrote:
> Use of order-3 pages is problematic in some cases.
>
> This patch might add three kinds of regression :
>
> 1) a CPU performance regression, but we will add later page
> recycling and performance should be back.
>
> 2) TCP receiver could grow its receive window slightly slower,
Use of order-3 pages is problematic in some cases.
This patch might add three kinds of regression :
1) a CPU performance regression, but we will add later page
recycling and performance should be back.
2) TCP receiver could grow its receive window slightly slower,
because skb->len/skb->truesize