On 2/28/2023 1:17 PM, Jerin Jacob wrote: > On Tue, Feb 28, 2023 at 6:16 PM Ferruh Yigit <ferruh.yi...@amd.com> wrote: >> >> On 2/28/2023 11:29 AM, David Marchand wrote: >>> On Tue, Feb 28, 2023 at 12:05 PM Ferruh Yigit <ferruh.yi...@amd.com> wrote: >>>> >>>> On 2/23/2023 12:30 PM, Ankur Dwivedi wrote: >>>>> The speed_fec_capa pointer can be null. So dereferencing the pointer is >>>>> removed and only the pointer is captured in trace function. >>>>> Fixed few more trace functions in which null pointer can be dereferenced. >>>>> >>>>> Coverity issue: 383238 >>>>> Bugzilla ID: 1162 >>>>> Fixes: 6679cf21d608 ("ethdev: add trace points") >>>>> Fixes: ed04fd4072e9 ("ethdev: add trace points for flow") >>>>> >>>>> Signed-off-by: Ankur Dwivedi <adwiv...@marvell.com> >>>> >>>> Hi Ankur, >>>> >>>> There is another bug report: https://bugs.dpdk.org/show_bug.cgi?id=1167 >>>> >>>> >>>> As far as I can see that is caused by '__rte_trace_point_register()' is >>>> calling 'register_fn()' [1]. >>>> >>>> At registering trace point stage, most of the pointers can be invalid, >>>> and this can crash other locations too. >>> >>> I remember hitting this issue when running with UBsan. >>> >>>> >>>> Why 'register_fn()' called withing the trace point register? Can we >>>> remove it? >>> >>> IIRC, this is used to evaluate the size of the trace point event. >>> >>> >> >> Yes, as checked with Jerin, it is used to evaluate size and some sanity >> checks fro size. >> >> We need either find a way to calculate size without really reading the >> pointer content during register phase, all convert all pointer tracing >> to emit_ptr(). >> >> I prefer first option if we can. > > Looking at the root of the issue. > > rte_trace_point_emit_* has two variant > a)trace point version - Which will emit the trace > b)trace register version - Which will find the size of trace > automatically with single definition of trace point to make life easy > for defining the trace point > > In this case, conf value is junk, as we are referencing the value at > registration time. Looks like in PPC arch, the stack content comes as > junk at this point and got this issue. > And other arch or other environment that adders is OK and since we're > just _reading_ the value. It is not making any issue. But it is a bug. > > RTE_TRACE_POINT was designed to just use > rte_trace_point_emit_u16(conf->my_value) so that in registration scope > "conf->my_value" will be omitted by compiler. > But there as issue in using bitfiled[1] as trace is not supporting > bitfield. Ankur added a local variable to work around the bitfiled > tracing. > > So couple of options, I can think of > > 1)Remove the bitfiled trace point( There are only some trace point > uses that, Just we need to remove bitfield items from that) > 2)It is possible to have anonymous union of type like u16, u32 for > tracing the the bitfields[2], but that would need change in public > structure > 3) Another option is to have a #define to define the scope of > registration. Something like [3] which is noisy. > > I think, we can just do 1 now for rc2 and (2) or (3) or some other > ideas we can think in next release. >
+1 to go for option 1, specially at this phase of the release. Only limited number of traces are affected by this bitfield related issue anyway. btw, this is for the Bug 1167, I thought 1167 & 1162 share same root cause but they don't, so 1162 is still open. > > [1] > ../lib/ethdev/ethdev_trace.h: In function > ‘rte_eth_trace_tx_hairpin_queue_setup’: > ../lib/eal/include/rte_trace_point.h:378:21: error: cannot take > address of bit-field ‘peer_count’ > 378 | memcpy(mem, &(in), sizeof(in)); \ > | ^ > > > [2] > struct abc { > union { > uint32_t val; > struct { > val_5_bit:5 > > } > } > } > > [3] > > [main]dell[dpdk.org] $ git diff > diff --git a/lib/eal/include/rte_trace_point_register.h > b/lib/eal/include/rte_trace_point_register.h > index a9682d3f22..266350b29c 100644 > --- a/lib/eal/include/rte_trace_point_register.h > +++ b/lib/eal/include/rte_trace_point_register.h > @@ -16,6 +16,8 @@ extern "C" { > #include <rte_per_lcore.h> > #include <rte_trace_point.h> > > +#define RTE_TRACE_SCOPE_IS_REGISTRATION > + > RTE_DECLARE_PER_LCORE(volatile int, trace_point_sz); > > #define RTE_TRACE_POINT_REGISTER(trace, name) \ > diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h > index 53d1a71ff0..ba42c3d10a 100644 > --- a/lib/ethdev/ethdev_trace.h > +++ b/lib/ethdev/ethdev_trace.h > @@ -237,13 +237,23 @@ RTE_TRACE_POINT( > RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t rx_queue_id, > uint16_t nb_rx_desc, const struct rte_eth_hairpin_conf *conf, > int ret), > - uint16_t peer_count = conf->peer_count; > - uint8_t tx_explicit = conf->tx_explicit; > - uint8_t manual_bind = conf->manual_bind; > - uint8_t use_locked_device_memory = conf->use_locked_device_memory; > - uint8_t use_rte_memory = conf->use_rte_memory; > - uint8_t force_memory = conf->force_memory; > > + uint16_t peer_count; > + uint8_t tx_explicit; > + uint8_t manual_bind; > + uint8_t use_locked_device_memory; > + uint8_t use_rte_memory; > + uint8_t force_memory; > +#ifdef RTE_TRACE_SCOPE_IS_REGISTRATION > + RTE_SET_USED(conf); > +#else > + peer_count = conf->peer_count; > + tx_explicit = conf->tx_explicit; > + manual_bind = conf->manual_bind; > + use_locked_device_memory = conf->use_locked_device_memory; > + use_rte_memory = conf->use_rte_memory; > + force_memory = conf->force_memory; > +#endif > rte_trace_point_emit_u16(port_id); > rte_trace_point_emit_u16(rx_queue_id); > rte_trace_point_emit_u16(nb_rx_desc); > [main]dell[dpdk.org] $