> -----Original Message-----
> From: Morten Brørup [mailto:m...@smartsharesystems.com]
> Sent: Thursday, October 10, 2019 8:30 AM
> To: Stephen Hemminger <step...@networkplumber.org>
> Cc: Ananyev, Konstantin <konstantin.anan...@intel.com>; dpdk-dev 
> <dev@dpdk.org>; Jerin Jacob <jer...@marvell.com>
> Subject: RE: [dpdk-dev] packet data access bug in bpf and pdump libs
> 
> > -----Original Message-----
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
> > Sent: Wednesday, October 9, 2019 7:25 PM
> >
> > On Wed, 9 Oct 2019 17:20:58 +0200
> > Morten Brørup <m...@smartsharesystems.com> wrote:
> >
> > > > -----Original Message-----
> > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > Sent: Wednesday, October 9, 2019 5:15 PM
> > > >
> > > > On Wed, 9 Oct 2019 17:06:24 +0200
> > > > Morten Brørup <m...@smartsharesystems.com> wrote:
> > > >
> > > > > > -----Original Message-----
> > > > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > > > Sent: Wednesday, October 9, 2019 5:02 PM
> > > > > >
> > > > > > On Wed, 9 Oct 2019 11:11:46 +0000
> > > > > > "Ananyev, Konstantin" <konstantin.anan...@intel.com> wrote:
> > > > > >
> > > > > > > Hi Morten,
> > > > > > >
> > > > > > > >
> > > > > > > > Hi Konstantin and Stephen,
> > > > > > > >
> > > > > > > > I just noticed the same bug in your bpf and pcap libraries:
> > > > > > > >
> > > > > > > > You are using rte_pktmbuf_mtod(), but should be using
> > > > > > > > rte_pktmbuf_read(). Otherwise you cannot read data across
> > > > > > > > multiple segments.
> > > > > > >
> > > > > > > In plain data buffer mode, the expected input for a BPF program
> > > > > > > is the start of the first segment's packet data.
> > > > > > > Other segments are simply not available to the BPF program in
> > > > > > > that mode.
> > > > > > > AFAIK, cBPF uses the same model.
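> > > > > > >
> > > > > > > For illustration, a minimal sketch of that mode (bpf and mbuf
> > > > > > > are placeholder variables here):
> > > > > > >
> > > > > > > /* The program is handed the first segment's contiguous data
> > > > > > >  * only; anything in mbuf->next is invisible to it. */
> > > > > > > uint64_t rc = rte_bpf_exec(bpf, rte_pktmbuf_mtod(mbuf, void *));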
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Med venlig hilsen / kind regards
> > > > > > > > - Morten Brørup
> > > > > > >
> > > > > >
> > > > > > For packet capture, the BPF program is only allowed to look at the
> > > > > > first segment.
> > > > > > pktmbuf_read is expensive and can cause a copy.
> > > > >
> > > > > It is only expensive if going beyond the first segment:
> > > > >
> > > > > static inline const void *rte_pktmbuf_read(const struct rte_mbuf *m,
> > > > >       uint32_t off, uint32_t len, void *buf)
> > > > > {
> > > > >       if (likely(off + len <= rte_pktmbuf_data_len(m)))
> > > > >               return rte_pktmbuf_mtod_offset(m, char *, off);
> > > > >       else
> > > > >               return __rte_pktmbuf_read(m, off, len, buf);
> > > > > }
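> > > > >
> > > > > And the slow path only copies when the requested bytes actually span
> > > > > a segment boundary. Roughly (a simplified sketch of what
> > > > > __rte_pktmbuf_read does, not the verbatim DPDK source):
> > > > >
> > > > > const void *__rte_pktmbuf_read(const struct rte_mbuf *m, uint32_t off,
> > > > >       uint32_t len, void *buf)
> > > > > {
> > > > >       const struct rte_mbuf *seg = m;
> > > > >       uint32_t buf_off = 0, copy_len;
> > > > >
> > > > >       if (off + len > rte_pktmbuf_pkt_len(m))
> > > > >               return NULL; /* read beyond end of packet */
> > > > >
> > > > >       /* Skip whole segments before the requested offset. */
> > > > >       while (off >= rte_pktmbuf_data_len(seg)) {
> > > > >               off -= rte_pktmbuf_data_len(seg);
> > > > >               seg = seg->next;
> > > > >       }
> > > > >
> > > > >       /* Contiguous within one segment: no copy, just a pointer. */
> > > > >       if (off + len <= rte_pktmbuf_data_len(seg))
> > > > >               return rte_pktmbuf_mtod_offset(seg, char *, off);
> > > > >
> > > > >       /* Spans segments: gather the bytes into the caller's buffer. */
> > > > >       while (len > 0) {
> > > > >               copy_len = rte_pktmbuf_data_len(seg) - off;
> > > > >               if (copy_len > len)
> > > > >                       copy_len = len;
> > > > >               rte_memcpy((char *)buf + buf_off,
> > > > >                       rte_pktmbuf_mtod_offset(seg, char *, off),
> > > > >                       copy_len);
> > > > >               off = 0;
> > > > >               buf_off += copy_len;
> > > > >               len -= copy_len;
> > > > >               seg = seg->next;
> > > > >       }
> > > > >       return buf;
> > > > > }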
> > > >
> > > > But it would mean a potentially big buffer on the stack (in that case).
> > >
> > > No, the buffer only needs to be the size of the accessed data. I use
> > > it like this:
> > >
> > > char buffer[sizeof(uint32_t)];
> > >
> > > for (;; pc++) {
> > >     switch (pc->code) {
> > >         case BPF_LD_ABS_32:
> > >             p = rte_pktmbuf_read(m, pc->k, sizeof(uint32_t), buffer);
> > >             if (unlikely(p == NULL))
> > >                 return 0; /* Attempt to read beyond packet. Bail out. */
> > >             a = rte_be_to_cpu_32(*(const uint32_t *)p);
> > >             continue;
> > >         case BPF_LD_ABS_16:
> > >             p = rte_pktmbuf_read(m, pc->k, sizeof(uint16_t), buffer);
> > >             if (unlikely(p == NULL))
> > >                 return 0; /* Attempt to read beyond packet. Bail out. */
> > >             a = rte_be_to_cpu_16(*(const uint16_t *)p);
> > >             continue;
> > >
> >
> > Reading down the chain of mbuf segments to find a uint32_t (one that
> > potentially crosses a segment boundary)
> > seems like a waste.

+1
Again, just imagine how painful it would be to support that in the JIT...

Another thing - in librte_bpf, if RTE_BPF_ARG_PTR is used,
it means that the input parameter for exec/jit is just a plain data buffer,
and the library doesn't make any assumptions about it (mbuf or not, etc.).

There is, though, a possibility to access mbuf metadata
and call rte_pktmbuf_read() or any other function from your eBPF code,
but for that you need to specify RTE_BPF_ARG_PTR_MBUF at load stage.
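
For illustration, the load stage would look something like this (a minimal
sketch: the instruction array "ins" and the mbuf are assumed to already
exist, and registration of external functions via the xsym table is
omitted):

#include <rte_bpf.h>
#include <rte_mbuf.h>

/* Tell the library the program's argument is an mbuf, not a raw buffer. */
struct rte_bpf_prm prm = {
	.ins = ins,
	.nb_ins = RTE_DIM(ins),
	.prog_arg = {
		.type = RTE_BPF_ARG_PTR_MBUF,
		.size = sizeof(struct rte_mbuf),
		.buf_size = RTE_MBUF_DEFAULT_BUF_SIZE,
	},
};

struct rte_bpf *bpf = rte_bpf_load(&prm);
uint64_t rc = rte_bpf_exec(bpf, mbuf); /* pass the mbuf itself */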

> >
> 
> Slow and painful is the only way to read beyond the first segment, I agree.
> 
> But when reading from the first segment, rte_pktmbuf_read() basically does 
> the same as your code. So there shouldn't be any performance
> penalty from supporting both by using rte_pktmbuf_read() instead.
> 
> I think the modification in the pdump library is simple, as you already pass 
> the mbuf. But the bpf library requires more work, as it passes a
> pointer to the data in the first segment to the processing function instead 
> of passing the mbuf.
> 
> > The purpose of the filter is to look at packet headers.
> 
> Some might look deeper. So why prevent it? E.g. our StraightShaper appliance
> sometimes looks deeper, but for performance reasons we
> stopped using BPF for this a long time ago.
> 
> > Any driver making mbufs that
> > are dribbles of data is broken.
> 
> I agree very much with you on this regarding general-purpose NICs! Although I
> know of an exception that confirms the rule... a few years
> ago we worked with a component vendor whose very clever performance
> optimizations did exactly this for specific purposes.
> Unfortunately it's under NDA, so I can't go into details.
> 
> > chaining is really meant for the case of jumbo frames or TSO.
> >
> 
> Try thinking beyond PMD ingress. There are multiple use cases in egress. Here 
> are a couple:
> 
> - IP Multicast to multiple subnets on a Layer 3 switch. The VLAN ID and 
> Source MAC must be replaced in each packet; this can be done using
> segments.
> - Tunnel encapsulation. E.g. putting a packet into a VXLAN tunnel could be
> done using segments, as in the sketch below.
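>
> A minimal sketch of that encapsulation case (hdr_pool, ENCAP_LEN and
> fill_vxlan_headers() are hypothetical placeholders; error handling
> omitted):
>
> /* Build the outer headers in a fresh, small mbuf... */
> struct rte_mbuf *hdr = rte_pktmbuf_alloc(hdr_pool);
> void *outer = rte_pktmbuf_append(hdr, ENCAP_LEN); /* Eth+IP+UDP+VXLAN */
>
> fill_vxlan_headers(outer, pkt);
>
> /* ...and chain the original packet behind it. The payload is never
>  * copied, so the encapsulated packet now spans multiple segments. */
> rte_pktmbuf_chain(hdr, pkt);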
