On 18 Apr 2019, at 3:10, Björn Töpel wrote:

On Wed, 17 Apr 2019 at 21:58, Jonathan Lemon <jonathan.le...@gmail.com> wrote:

When the XDP program attached to a zero-copy AF_XDP socket returns XDP_TX, queue the umem frame on the XDP TX ring, and arrange for it to be released
via the ZCA free routine, which should place it back onto the reuseq.


There are a bunch of compiler errors, but I'll leave them out from the comments!

Yes, sigh - generated from the wrong tree. I did compile it, honest!


+static int i40e_xmit_rcvd_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
+{
+       struct i40e_ring *xdp_ring;
+       struct i40e_tx_desc *tx_desc;
+       struct i40e_tx_buffer *tx_bi;
+       struct xdp_frame *xdpf;
+       dma_addr_t dma;
+
+       xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
+
+       if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
+               xdp_ring->tx_stats.tx_busy++;
+               return I40E_XDP_CONSUMED;
+       }
+       xpdf = convert_to_xdp_frame_keep_zc(xdp);
+       if (unlikely(!xpdf)
+               return I40E_XDP_CONSUMED;
+       xpdf->handle = xdp->handle;
+
+       dma = xdp_umem_get_dma(rx_ring->xsk_umem, xdp->handle);
+       tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
+       tx_bi->bytecount = xpdf->len;
+       tx_bi->gso_segs = 1;
+       tx_bi->xdpf = xdpf;
+       tx_bi->tx_flags = I40E_TX_FLAGS_ZC_FRAME;
+
+       tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use);
+       tx_desc->buffer_addr = cpu_to_le64(dma);
+ tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC | + I40E_TX_DESC_CMD_EOP,
+                                                 0, xpdf->len, 0);
+       smp_wmb();
+
+       xdp_ring->next_to_use++;
+       if (xdp_ring->next_to_use == xdp_ring->count)
+               xdp_ring->next_to_use = 0;
+
+       tx_bi->next_to_watch = tx_desc;
+
+       return I40E_XDP_TX;
+}

What you're basically doing here is a AF_XDP Tx, but triggered from
the Rx path, and instead of completion (after the packet is sent) to
the completion ring, it's recycled to the Rx HW ring (via zca_free). I
like the idea but we need more plumbing first. Let me expand;
Unfortunately, the current recycle mechanism requires that at the
point of recycling, there has to be space in Rx ring. In the XDP_TX
case, there's no completion ring, and we cannot guarantee that there's
space on Rx ring (since Rx and Tx are asynchronous).

Yes - this is why it's an RFC. Your current design keeps the RX reuse
context in the driver itself, instead of using the reuse queue provided
by umem. I believe the driver only uses it when the ring is torn down,
so this needs to be addressed.

Since we're transmitting something that was just received, on TX completion the buffer should logically be placed back into the fill queue, but with the current SPSC mechanism, this isn't possible. Hence the ReuseQ. I think this should be transparently be made part of the FQ mechanism, so the RQ is always
checked before using the FQ. I believe xsk_umem_peek_addr_rq() was added
for this purpose, but should really be the main API.


IOW, Rx recycling
can currently *only* be done in an Rx context.

I was assuming that the Tx was part of the same VSI, so the Rx ring would
be accessible from the napi_poll() routine.


What I would like to do, is moving i40e-xsk to Jesper's page-pool,
instead of the existing recycle mechanism. Then we could return the
descriptor to the pool, if the Rx ring doesn't have space for the
completed/sent buffer.

Hm, not sure how this would help? The xdp_return() goes to the ZCA free
routine for zero-copy buffers, bypassing the page pool, doesn't it?


TL;DR version: Passing zc-frames in XDP_TX cannot be done properly
until the Rx recycle mechanism is more robust. :-(

Agree. So i40e_zca_free() would look more like:

  if (space in rx_bi)
    /* locally cache buffer */
    i40e_reuse_rx_buffer_zc(rx, buf);
  else
    xsk_umem_fq_reuse(rx->umem, buf)

and the the two alloc_rx_buffers_*_zc() would need to be adjusted?
--
Jonathan

Reply via email to