On Fri, Jan 29, 2021 at 2:56 PM Sven Van Asbroeck <thesve...@gmail.com> wrote:
>
> From: Sven Van Asbroeck <thesve...@gmail.com>
>
> Multi-buffer packets enable us to use rx ring buffers smaller than
> the mtu. This will allow us to change the mtu on-the-fly, without
> having to stop the network interface in order to re-size the rx
> ring buffers.
>
> This is a big change touching a key driver function (process_packet),
> so care has been taken to test this extensively:
>
> Tests with debug logging enabled (add #define DEBUG).
>
> 1. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
>    Ping to chip, verify correct packet size is sent to OS.
>    Ping large packets to chip (ping -s 1400), verify correct
>    packet size is sent to OS.
>    Ping using packets around the buffer size, verify number of
>    buffers is changing, verify correct packet size is sent
>    to OS:
>    $ ping -s 472
>    $ ping -s 473
>    $ ping -s 992
>    $ ping -s 993
>    Verify that each packet is followed by extension processing.
>
> 2. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
>    Run iperf3 -s on chip, verify that packets come in 3 buffers
>    at a time.
>    Verify that packet size is equal to mtu.
>    Verify that each packet is followed by extension processing.
>
> 3. Set chip and host mtu to 2000.
>    Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
>    Run iperf3 -s on chip, verify that packets come in 4 buffers
>    at a time.
>    Verify that packet size is equal to mtu.
>    Verify that each packet is followed by extension processing.
>
> Tests with debug logging DISabled (remove #define DEBUG).
>
> 4. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
>    Run iperf3 -s on chip, note sustained rx speed.
>    Set chip and host mtu to 2000, so mtu takes 4 buffers.
>    Run iperf3 -s on chip, note sustained rx speed.
>    Verify no packets are dropped in both cases.
>
> Tests with DEBUG_KMEMLEAK on:
>    $ mount -t debugfs nodev /sys/kernel/debug/
>    $ echo scan > /sys/kernel/debug/kmemleak
>
> 5. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
>    Run the following tests concurrently for at least one hour:
>    - iperf3 -s on chip
>    - ping -> chip
>    Monitor reported memory leaks.
>
> 6. Set chip and host mtu to 2000.
>    Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
>    Run the following tests concurrently for at least one hour:
>    - iperf3 -s on chip
>    - ping -> chip
>    Monitor reported memory leaks.
>
> 7. Simulate low-memory in lan743x_rx_allocate_skb(): fail every
>    100 allocations.
>    Repeat (5) and (6).
>    Monitor reported memory leaks.
>
> 8. Simulate low-memory in lan743x_rx_allocate_skb(): fail 10
>    allocations in a row in every 100.
>    Repeat (5) and (6).
>    Monitor reported memory leaks.
>
> 9. Simulate low-memory in lan743x_rx_trim_skb(): fail 1 allocation
>    in every 100.
>    Repeat (5) and (6).
>    Monitor reported memory leaks.
>
> Signed-off-by: Sven Van Asbroeck <thesve...@gmail.com>
> ---
>
> Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 46eb3c108fe1
>
> To: Bryan Whitehead <bryan.whiteh...@microchip.com>
> To: unglinuxdri...@microchip.com
> To: "David S. Miller" <da...@davemloft.net>
> To: Jakub Kicinski <k...@kernel.org>
> Cc: Andrew Lunn <and...@lunn.ch>
> Cc: Alexey Denisov <rtg...@gmail.com>
> Cc: Sergej Bauer <sba...@blackbox.su>
> Cc: Tim Harvey <thar...@gateworks.com>
> Cc: Anders Rønningen <and...@ronningen.priv.no>
> Cc: netdev@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org (open list)
>
>  drivers/net/ethernet/microchip/lan743x_main.c | 321 ++++++++----------
>  drivers/net/ethernet/microchip/lan743x_main.h |   2 +
>  2 files changed, 143 insertions(+), 180 deletions(-)
> +static struct sk_buff *
> +lan743x_rx_trim_skb(struct sk_buff *skb, int frame_length)
> +{
> +    if (skb_linearize(skb)) {

Is this needed? skb_linearize() will be quite expensive here.
(A rough alternative is sketched at the end of this mail.)

> +        dev_kfree_skb_irq(skb);
> +        return NULL;
> +    }
> +    frame_length = max_t(int, 0, frame_length - RX_HEAD_PADDING - 2);
> +    if (skb->len > frame_length) {
> +        skb->tail -= skb->len - frame_length;
> +        skb->len = frame_length;
> +    }
> +    return skb;
> +}
> +
>  static int lan743x_rx_process_packet(struct lan743x_rx *rx)
>  {
> -    struct skb_shared_hwtstamps *hwtstamps = NULL;
> +    struct lan743x_rx_descriptor *descriptor, *desc_ext;
>      int result = RX_PROCESS_RESULT_NOTHING_TO_DO;
>      int current_head_index = le32_to_cpu(*rx->head_cpu_ptr);
>      struct lan743x_rx_buffer_info *buffer_info;
> -    struct lan743x_rx_descriptor *descriptor;
> +    struct skb_shared_hwtstamps *hwtstamps;
> +    int frame_length, buffer_length;
> +    struct sk_buff *skb;
>      int extension_index = -1;
> -    int first_index = -1;
> -    int last_index = -1;
> +    bool is_last, is_first;
>
>      if (current_head_index < 0 || current_head_index >= rx->ring_size)
>          goto done;
> @@ -2068,170 +2075,126 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
>      if (rx->last_head < 0 || rx->last_head >= rx->ring_size)
>          goto done;
>
> -    if (rx->last_head != current_head_index) {
> -        descriptor = &rx->ring_cpu_ptr[rx->last_head];
> -        if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
> -            goto done;
> +    if (rx->last_head == current_head_index)
> +        goto done;

Is it possible to avoid the large indentation change, or else do that in a
separate patch? It makes it harder to follow the functional change.

>
> -        if (!(le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_))
> -            goto done;
> +    descriptor = &rx->ring_cpu_ptr[rx->last_head];
> +    if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
> +        goto done;
> +    buffer_info = &rx->buffer_info[rx->last_head];
>
> -        first_index = rx->last_head;
> -        if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
> -            last_index = rx->last_head;
> -        } else {
> -            int index;
> -
> -            index = lan743x_rx_next_index(rx, first_index);
> -            while (index != current_head_index) {
> -                descriptor = &rx->ring_cpu_ptr[index];
> -                if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
> -                    goto done;
> -
> -                if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
> -                    last_index = index;
> -                    break;
> -                }
> -                index = lan743x_rx_next_index(rx, index);
> -            }
> -        }
> -        if (last_index >= 0) {
> -            descriptor = &rx->ring_cpu_ptr[last_index];
> -            if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
> -                /* extension is expected to follow */
> -                int index = lan743x_rx_next_index(rx,
> -                                                  last_index);
> -                if (index != current_head_index) {
> -                    descriptor = &rx->ring_cpu_ptr[index];
> -                    if (le32_to_cpu(descriptor->data0) &
> -                        RX_DESC_DATA0_OWN_) {
> -                        goto done;
> -                    }
> -                    if (le32_to_cpu(descriptor->data0) &
> -                        RX_DESC_DATA0_EXT_) {
> -                        extension_index = index;
> -                    } else {
> -                        goto done;
> -                    }
> -                } else {
> -                    /* extension is not yet available */
> -                    /* prevent processing of this packet */
> -                    first_index = -1;
> -                    last_index = -1;
> -                }
> -            }
> -        }
> +    is_last = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_;
> +    is_first = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_;
> +
> +    if (is_last && le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
> +        /* extension is expected to follow */
> +        int index = lan743x_rx_next_index(rx, rx->last_head);
> +
> +        if (index == current_head_index)
> +            /* extension not yet available */
> +            goto done;
> +        desc_ext = &rx->ring_cpu_ptr[index];
> +        if (le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_OWN_)
> +            /* extension not yet available */
> +            goto done;
> +        if (!(le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_EXT_))
> +            goto move_forward;
> +        extension_index = index;
>      }
> -    if (first_index >= 0 && last_index >= 0) {
> -        int real_last_index = last_index;
> -        struct sk_buff *skb = NULL;
> -        u32 ts_sec = 0;
> -        u32 ts_nsec = 0;
> -
> -        /* packet is available */
> -        if (first_index == last_index) {
> -            /* single buffer packet */
> -            struct sk_buff *new_skb = NULL;
> -            int packet_length;
> -
> -            new_skb = lan743x_rx_allocate_skb(rx);
> -            if (!new_skb) {
> -                /* failed to allocate next skb.
> -                 * Memory is very low.
> -                 * Drop this packet and reuse buffer.
> -                 */
> -                lan743x_rx_reuse_ring_element(rx, first_index);
> -                goto process_extension;
> -            }
>
> -            buffer_info = &rx->buffer_info[first_index];
> -            skb = buffer_info->skb;
> -            descriptor = &rx->ring_cpu_ptr[first_index];
> -
> -            /* unmap from dma */
> -            packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
> -                (descriptor->data0);
> -            if (buffer_info->dma_ptr) {
> -                dma_sync_single_for_cpu(&rx->adapter->pdev->dev,
> -                                        buffer_info->dma_ptr,
> -                                        packet_length,
> -                                        DMA_FROM_DEVICE);
> -                dma_unmap_single_attrs(&rx->adapter->pdev->dev,
> -                                       buffer_info->dma_ptr,
> -                                       buffer_info->buffer_length,
> -                                       DMA_FROM_DEVICE,
> -                                       DMA_ATTR_SKIP_CPU_SYNC);
> -                buffer_info->dma_ptr = 0;
> -                buffer_info->buffer_length = 0;
> -            }
> -            buffer_info->skb = NULL;
> -            packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
> -                (le32_to_cpu(descriptor->data0));
> -            skb_put(skb, packet_length - 4);
> -            skb->protocol = eth_type_trans(skb,
> -                                           rx->adapter->netdev);
> -            lan743x_rx_init_ring_element(rx, first_index, new_skb);
> -        } else {
> -            int index = first_index;
> +    /* Only the last buffer in a multi-buffer frame contains the total frame
> +     * length. All other buffers have a zero frame length. The chip
> +     * occasionally sends more buffers than strictly required to reach the
> +     * total frame length.
> +     * Handle this by adding all buffers to the skb in their entirety.
> +     * Once the real frame length is known, trim the skb.
> +     */
> +    frame_length =
> +        RX_DESC_DATA0_FRAME_LENGTH_GET_(le32_to_cpu(descriptor->data0));
> +    buffer_length = buffer_info->buffer_length;
>
> -            /* multi buffer packet not supported */
> -            /* this should not happen since buffers are allocated
> -             * to be at least the mtu size configured in the mac.
> -             */
> +    netdev_dbg(rx->adapter->netdev, "%s%schunk: %d/%d",
> +               is_first ? "first " : " ",
> +               is_last ? "last " : " ",
> +               frame_length, buffer_length);
>
> -            /* clean up buffers */
> -            if (first_index <= last_index) {
> -                while ((index >= first_index) &&
> -                       (index <= last_index)) {
> -                    lan743x_rx_reuse_ring_element(rx,
> -                                                  index);
> -                    index = lan743x_rx_next_index(rx,
> -                                                  index);
> -                }
> -            } else {
> -                while ((index >= first_index) ||
> -                       (index <= last_index)) {
> -                    lan743x_rx_reuse_ring_element(rx,
> -                                                  index);
> -                    index = lan743x_rx_next_index(rx,
> -                                                  index);
> -                }
> -            }
> -        }
> +    /* unmap from dma */
> +    if (buffer_info->dma_ptr) {
> +        dma_unmap_single(&rx->adapter->pdev->dev,
> +                         buffer_info->dma_ptr,
> +                         buffer_info->buffer_length,
> +                         DMA_FROM_DEVICE);
> +        buffer_info->dma_ptr = 0;
> +        buffer_info->buffer_length = 0;
> +    }
> +    skb = buffer_info->skb;
>
> -process_extension:
> -        if (extension_index >= 0) {
> -            descriptor = &rx->ring_cpu_ptr[extension_index];
> -            buffer_info = &rx->buffer_info[extension_index];
> -
> -            ts_sec = le32_to_cpu(descriptor->data1);
> -            ts_nsec = (le32_to_cpu(descriptor->data2) &
> -                       RX_DESC_DATA2_TS_NS_MASK_);
> -            lan743x_rx_reuse_ring_element(rx, extension_index);
> -            real_last_index = extension_index;
> -        }
> +    /* allocate new skb and map to dma */
> +    if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
> +        /* failed to allocate next skb.
> +         * Memory is very low.
> +         * Drop this packet and reuse buffer.
> +         */
> +        lan743x_rx_reuse_ring_element(rx, rx->last_head);
> +        goto process_extension;
> +    }
> +
> +    /* add buffers to skb via skb->frag_list */
> +    if (is_first) {
> +        skb_reserve(skb, RX_HEAD_PADDING);
> +        skb_put(skb, buffer_length - RX_HEAD_PADDING);
> +        if (rx->skb_head)
> +            dev_kfree_skb_irq(rx->skb_head);
> +        rx->skb_head = skb;
> +    } else if (rx->skb_head) {
> +        skb_put(skb, buffer_length);
> +        if (skb_shinfo(rx->skb_head)->frag_list)
> +            rx->skb_tail->next = skb;
> +        else
> +            skb_shinfo(rx->skb_head)->frag_list = skb;

Instead of chaining skbs into frag_list, you could perhaps delay the skb
allocation until after reception, allocate the buffers stand-alone, and
link them into the skb as skb frags? That might avoid a few skb allocs +
frees (rough sketch at the end of this mail). Though it's a bigger change,
so I'm not sure how feasible it is.

> +        rx->skb_tail = skb;
> +        rx->skb_head->len += skb->len;
> +        rx->skb_head->data_len += skb->len;
> +        rx->skb_head->truesize += skb->truesize;
> +    } else {
> +        rx->skb_head = skb;
> +    }
>
> -        if (!skb) {
> -            result = RX_PROCESS_RESULT_PACKET_DROPPED;
> -            goto move_forward;
> +process_extension:
> +    if (extension_index >= 0) {
> +        u32 ts_sec;
> +        u32 ts_nsec;
> +
> +        ts_sec = le32_to_cpu(desc_ext->data1);
> +        ts_nsec = (le32_to_cpu(desc_ext->data2) &
> +                   RX_DESC_DATA2_TS_NS_MASK_);
> +        if (rx->skb_head) {
> +            hwtstamps = skb_hwtstamps(rx->skb_head);
> +            if (hwtstamps)

This is always true, so the check is unnecessary. You can just call:

    skb_hwtstamps(rx->skb_head)->hwtstamp = ktime_set(ts_sec, ts_nsec);

Though I see that this is existing code which was only moved due to the
aforementioned indentation change.