On Tue, Jul 12, 2016 at 09:52:52PM +0200, Jesper Dangaard Brouer wrote:
> >
> > >> Also unconditionally doing batch of 8 may also hurt depending on what
> > >> is happening either with the stack, bpf afterwards or even cpu version.
> > >
> > > See this as software DDIO, if the unlikely case that
On Tue, 12 Jul 2016 09:46:26 -0700
Alexander Duyck wrote:
> On Tue, Jul 12, 2016 at 5:45 AM, Jesper Dangaard Brouer
> wrote:
> > On Mon, 11 Jul 2016 16:05:11 -0700
> > Alexei Starovoitov wrote:
> >
> >> On Mon, Jul 11, 2016 at 01:09:22PM +0200, Jesper Dangaard Brouer wrote:
> >> > > - /* Pr
On Tue, Jul 12, 2016 at 5:45 AM, Jesper Dangaard Brouer
wrote:
> On Mon, 11 Jul 2016 16:05:11 -0700
> Alexei Starovoitov wrote:
>
>> On Mon, Jul 11, 2016 at 01:09:22PM +0200, Jesper Dangaard Brouer wrote:
>> > > - /* Process all completed CQEs */
>> > > + /* Extract and prefetch completed CQEs */
On Mon, 11 Jul 2016 16:05:11 -0700
Alexei Starovoitov wrote:
> On Mon, Jul 11, 2016 at 01:09:22PM +0200, Jesper Dangaard Brouer wrote:
> > > - /* Process all completed CQEs */
> > > + /* Extract and prefetch completed CQEs */
> > > while (XNOR(cqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK,
> > >
On Mon, Jul 11, 2016 at 01:09:22PM +0200, Jesper Dangaard Brouer wrote:
> > - /* Process all completed CQEs */
> > + /* Extract and prefetch completed CQEs */
> > while (XNOR(cqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK,
> > cq->mcq.cons_index & cq->size)) {
> > + vo
On Mon, Jul 11, 2016 at 01:09:22PM +0200, Jesper Dangaard Brouer wrote:
[...]
> This patch is based on top of Brenden's patch 11/12, and is meant to
> replace patch 12/12.
>
> Prefetching is very important for XDP, especially when using a CPU
> without DDIO (here i7-4790K CPU @ 4.00GHz).
>
> Progr
On Fri, 08 Jul 2016 18:02:20 +0200
Jesper Dangaard Brouer wrote:
> This patch is about prefetching without being opportunistic.
> The idea is only to start prefetching on packets that are marked as
> ready/completed in the RX ring.
>
> This is achieved by splitting the napi_poll call mlx4_en_pro
This patch is about prefetching without being opportunistic.
The idea is only to start prefetching on packets that are marked as
ready/completed in the RX ring.
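
To illustrate the idea, here is a minimal user-space sketch, not the mlx4
code. The struct desc / struct ring layout, the done flag, RING_SIZE and
BATCH are all hypothetical stand-ins for the CQE owner bit and the RX ring,
and __builtin_prefetch is the GCC/Clang builtin, used here in place of the
kernel's prefetch helpers.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16   /* hypothetical ring size, must be a power of two */
#define BATCH     8    /* hypothetical limit on descriptors per batch */

/* Hypothetical completion descriptor; this is NOT the mlx4 CQE layout. */
struct desc {
    uint8_t  done;     /* set by the "device" once the packet is ready */
    uint8_t *data;     /* packet buffer */
    size_t   len;      /* packet length in bytes */
};

struct ring {
    struct desc d[RING_SIZE];
    unsigned int cons; /* consumer index, free running */
};

/* Loop 1: collect only descriptors already marked done, prefetching each. */
static int extract_completed(struct ring *r, struct desc **out, int budget)
{
    int n = 0;

    while (n < budget && n < BATCH) {
        struct desc *d = &r->d[(r->cons + n) & (RING_SIZE - 1)];

        if (!d->done)
            break;                     /* stop at the first unready packet */
        __builtin_prefetch(d->data);   /* hint only; does not fault */
        out[n++] = d;
    }
    return n;
}

/* Loop 2: process the extracted batch and return the bytes consumed. */
static size_t process_batch(struct ring *r, struct desc **batch, int n)
{
    size_t bytes = 0;

    for (int i = 0; i < n; i++) {
        bytes += batch[i]->len;        /* a real driver would parse data here */
        batch[i]->done = 0;            /* hand the descriptor back */
    }
    r->cons += (unsigned int)n;
    return bytes;
}
```

Stopping at the first descriptor that is not done is what keeps the
prefetch from being opportunistic: no cache line is requested for a packet
that has not arrived yet.
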
This is achieved by splitting the napi_poll call mlx4_en_process_rx_cq()
loop into two. The first loop extracts completed CQEs and starts