> -----Original Message-----
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, October 9, 2019 7:25 PM
> 
> On Wed, 9 Oct 2019 17:20:58 +0200
> Morten Brørup <m...@smartsharesystems.com> wrote:
> 
> > > -----Original Message-----
> > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > Sent: Wednesday, October 9, 2019 5:15 PM
> > >
> > > On Wed, 9 Oct 2019 17:06:24 +0200
> > > Morten Brørup <m...@smartsharesystems.com> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > > Sent: Wednesday, October 9, 2019 5:02 PM
> > > > >
> > > > > On Wed, 9 Oct 2019 11:11:46 +0000
> > > > > "Ananyev, Konstantin" <konstantin.anan...@intel.com> wrote:
> > > > >
> > > > > > Hi Morten,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin and Stephen,
> > > > > > >
> > > > > > > I just noticed the same bug in your bpf and pcap libraries:
> > > > > > >
> > > > > > > You are using rte_pktmbuf_mtod(), but should be using
> > > > > > > rte_pktmbuf_read(). Otherwise you cannot read data across
> > > > > > > multiple segments.
> > > > > >
> > > > > > In plain data buffer mode expected input for BPF program is
> > > > > > start of first segment packet data.
> > > > > > Other segments are simply not available to BPF program in
> > > > > > that mode.
> > > > > > AFAIK, cBPF uses the same model.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Med venlig hilsen / kind regards
> > > > > > > - Morten Brørup
> > > > > >
> > > > >
> > > > > For packet capture, the BPF program is only allowed to look at
> > > > > first segment.
> > > > > pktmbuf_read is expensive and can cause a copy.
> > > >
> > > > It is only expensive if going beyond the first segment:
> > > >
> > > > static inline const void *rte_pktmbuf_read(const struct rte_mbuf *m,
> > > >         uint32_t off, uint32_t len, void *buf)
> > > > {
> > > >         if (likely(off + len <= rte_pktmbuf_data_len(m)))
> > > >                 return rte_pktmbuf_mtod_offset(m, char *, off);
> > > >         else
> > > >                 return __rte_pktmbuf_read(m, off, len, buf);
> > > > }
> > >
> > > But it would mean potentially big buffer on the stack (in case)
> >
> > No, the buffer only needs to be the size of the accessed data. I use
> > it like this:
> >
> > char buffer[sizeof(uint32_t)];
> >
> > for (;; pc++) {
> >     switch (pc->code) {
> >         case BPF_LD_ABS_32:
> >             p = rte_pktmbuf_read(m, pc->k, sizeof(uint32_t), buffer);
> >             if (unlikely(p == NULL))
> >                 return 0; /* Attempting to read beyond packet. Bail out. */
> >             a = rte_be_to_cpu_32(*(const uint32_t *)p);
> >             continue;
> >         case BPF_LD_ABS_16:
> >             p = rte_pktmbuf_read(m, pc->k, sizeof(uint16_t), buffer);
> >             if (unlikely(p == NULL))
> >                 return 0; /* Attempting to read beyond packet. Bail out. */
> >             a = rte_be_to_cpu_16(*(const uint16_t *)p);
> >             continue;
> >
> 
> Reading down the chain of mbuf segments to find a uint32_t (and that
> potentially crosses) seems like a waste.
> 
        
Slow and painful is the only way to read beyond the first segment, I agree.

But when reading from the first segment, rte_pktmbuf_read() does essentially the 
same as your code, so there shouldn't be any performance penalty from supporting 
both single-segment and multi-segment packets by using rte_pktmbuf_read() instead.
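
To illustrate (a minimal, untested sketch): a helper like the one below works on 
both single-segment and multi-segment packets, and in the common case where the 
data lies within the first segment, rte_pktmbuf_read() simply returns a pointer 
into that segment; the copy into the stack buffer only happens for a 
cross-segment read:

#include <rte_branch_prediction.h>
#include <rte_byteorder.h>
#include <rte_mbuf.h>

/* Load a 32-bit word in network byte order at offset 'off' of the packet.
 * Sets *err and returns 0 if the offset lies beyond the packet. */
static inline uint32_t
pkt_load_be32(const struct rte_mbuf *m, uint32_t off, int *err)
{
        uint32_t buf;
        const void *p = rte_pktmbuf_read(m, off, sizeof(buf), &buf);

        if (unlikely(p == NULL)) {
                *err = 1;
                return 0;
        }
        *err = 0;
        return rte_be_to_cpu_32(*(const uint32_t *)p);
}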

I think the modification in the pdump library is simple, as you already pass 
the mbuf. But the bpf library requires more work, as it passes a pointer to the 
data in the first segment to the processing function instead of passing the 
mbuf.
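
For reference, this is roughly how the bpf library is driven today (a sketch, 
assuming the current rte_bpf_exec() prototype where ctx is a plain data pointer); 
a segment-aware variant would have to receive the mbuf itself, so the interpreter 
could fall back to rte_pktmbuf_read() for loads that cross a segment boundary:

#include <rte_bpf.h>
#include <rte_mbuf.h>

/* Illustrative helper: the program only sees the first segment, because the
 * caller hands rte_bpf_exec() a raw pointer to the first segment's data. */
static inline uint64_t
run_bpf_on_first_seg(const struct rte_bpf *bpf, struct rte_mbuf *m)
{
        return rte_bpf_exec(bpf, rte_pktmbuf_mtod(m, void *));
}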

> The purpose of the filter is to look at packet headers.

Some might look deeper. So why prevent it? E.g. our StraightShaper appliance 
sometimes looks deeper, but for performance reasons we stopped using BPF for 
this a long time ago.

> Any driver making mbufs that are dripples of data is broken.

I agree very much with you on this regarding general-purpose NICs! Although I 
know of an exception that confirms the rule... a few years ago we worked with a 
component vendor whose very clever performance optimizations did exactly this 
for specific purposes. Unfortunately it's under NDA, so I can't go into 
details.

> chaining is really meant for case of jumbo or tso.
> 

Try thinking beyond PMD ingress. There are multiple use cases in egress. Here 
are a couple:

- IP Multicast to multiple subnets on a Layer 3 switch. The VLAN ID and Source 
MAC must be replaced in each packet; this can be done using segments.
- Tunnel encapsulation. E.g. putting a packet into a VXLAN tunnel could be done 
using segments; a rough sketch of this follows below.
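
As an illustration of the tunnel case, here is a rough, untested sketch that 
chains a separately allocated header segment in front of the original packet 
instead of copying the payload (hdr_pool and vxlan_hdr_len are placeholders, 
and filling in the actual outer headers is left out):

#include <rte_mbuf.h>

static struct rte_mbuf *
vxlan_encap(struct rte_mempool *hdr_pool, struct rte_mbuf *pkt,
            uint16_t vxlan_hdr_len)
{
        struct rte_mbuf *hdr = rte_pktmbuf_alloc(hdr_pool);

        if (hdr == NULL)
                return NULL;

        /* Reserve room for the outer Ethernet/IPv4/UDP/VXLAN headers in the
         * new first segment. */
        if (rte_pktmbuf_append(hdr, vxlan_hdr_len) == NULL) {
                rte_pktmbuf_free(hdr);
                return NULL;
        }
        /* ... fill in the outer headers here ... */

        /* Link the original packet as the following segment(s); its data is
         * neither copied nor touched. */
        if (rte_pktmbuf_chain(hdr, pkt) != 0) {
                rte_pktmbuf_free(hdr);
                return NULL;
        }
        return hdr;
}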
