On 2020-03-23 16:08, Jerin Jacob wrote: > On Mon, Mar 23, 2020 at 8:13 PM Mattias Rönnblom > <mattias.ronnb...@ericsson.com> wrote: >> On 2020-03-23 14:37, Jerin Jacob wrote: >>>>> + } >>>>> + >>>>> + /* Initialize the trace point */ >>>>> + if (rte_strscpy(tp->name, name, TRACE_POINT_NAME_SIZE) < 0) { >>>>> + trace_err("name is too long"); >>>>> + rte_errno = E2BIG; >>>>> + goto free; >>>>> + } >>>>> + >>>>> + /* Copy the field data for future use */ >>>>> + if (rte_strscpy(tp->ctf_field, field, TRACE_CTF_FIELD_SIZE) < 0) { >>>>> + trace_err("CTF field size is too long"); >>>>> + rte_errno = E2BIG; >>>>> + goto free; >>>>> + } >>>>> + >>>>> + /* Clear field memory for the next event */ >>>>> + memset(field, 0, TRACE_CTF_FIELD_SIZE); >>>>> + >>>>> + /* Form the trace handle */ >>>>> + *handle = sz; >>>>> + *handle |= trace.nb_trace_points << __RTE_TRACE_FIELD_ID_SHIFT; >>>>> + *handle |= (uint64_t)level << __RTE_TRACE_FIELD_LEVEL_SHIFT; >>>> If *handle would be a struct, you could use a bitfield instead, and much >>>> simplify this code. >>> I thought that initially, Two reasons why I did not do that >>> 1) The flags have been used in fastpath, I prefer to work with flags >>> in fastpath so that >> Is it really that obvious that flags are faster than bitfield >> operations? I think most modern architectures have machine instructions >> for bitfield manipulation. > Add x86 maintainers. > > There were comments in ml about bitfield inefficiency usage with x86. > > https://protect2.fireeye.com/v1/url?k=2bd2d3ad-7706d931-2bd29336-8631fc8bdea5-8a1bf17ed26f6ce6&q=1&e=0c620ac5-c028-44d9-a4e8-e04057940075&u=http%3A%2F%2Fpatches.dpdk.org%2Fpatch%2F16482%2F > > Search for: Bitfileds are efficient on Octeon. What's about other CPUs > you have in > mind? x86 is not as efficient.
I thought both ARM and x86 had bitfield access instructions, but it looks like I was wrong about x86. x86_64 GCC seems to convert bitfield read to 'shr' and 'and', just like an open-coded bitfield. Bitfield write requires more instructions. > Thoughts from x86 folks. > >>> there is no performance impact using bitfields from the compiler _if any_. >>> 2) In some of the places, I can simply operate on APIs like >>> __atomic_and_fetch() with flags. >> I think you may still use such atomic operations. Just convert the >> struct to a uint64_t, which will essentially be a no-operation, and fire >> away. > Not sure, We think about the atomic "and" and fetch here. > That memcpy may translate additional load/store based on the compiler > optimization level.(say compiled with -O0) I would be surprised if that happened on anything but -O0. At least modern GCC on ARM and x86_64 don't seem to add any loads or stores. I assume you are not suggesting we should optimize for -O0. >> >> static uint64_t >> >> __rte_trace_raw(struct trace *t) >> >> { >> >> uint64_t raw; >> >> memcpy(&raw, t, sizeof(struct trace)); >> >> return raw; >> >> } >> >>