trace: implement trace register API

Mattias Rönnblom Mon, 23 Mar 2020 09:44:48 -0700

On 2020-03-23 16:08, Jerin Jacob wrote:
> On Mon, Mar 23, 2020 at 8:13 PM Mattias Rönnblom
> <[email protected]> wrote:
>> On 2020-03-23 14:37, Jerin Jacob wrote:
>>>>> +     }
>>>>> +
>>>>> +     /* Initialize the trace point */
>>>>> +     if (rte_strscpy(tp->name, name, TRACE_POINT_NAME_SIZE) < 0) {
>>>>> +             trace_err("name is too long");
>>>>> +             rte_errno = E2BIG;
>>>>> +             goto free;
>>>>> +     }
>>>>> +
>>>>> +     /* Copy the field data for future use */
>>>>> +     if (rte_strscpy(tp->ctf_field, field, TRACE_CTF_FIELD_SIZE) < 0) {
>>>>> +             trace_err("CTF field size is too long");
>>>>> +             rte_errno = E2BIG;
>>>>> +             goto free;
>>>>> +     }
>>>>> +
>>>>> +     /* Clear field memory for the next event */
>>>>> +     memset(field, 0, TRACE_CTF_FIELD_SIZE);
>>>>> +
>>>>> +     /* Form the trace handle */
>>>>> +     *handle = sz;
>>>>> +     *handle |= trace.nb_trace_points << __RTE_TRACE_FIELD_ID_SHIFT;
>>>>> +     *handle |= (uint64_t)level << __RTE_TRACE_FIELD_LEVEL_SHIFT;
>>>> If *handle would be a struct, you could use a bitfield instead, and much
>>>> simplify this code.
>>> I thought that initially, Two reasons why I did not do that
>>> 1) The flags have been used in fastpath, I prefer to work with flags
>>> in fastpath so that
>> Is it really that obvious that flags are faster than bitfield
>> operations? I think most modern architectures have machine instructions
>> for bitfield manipulation.
> Add x86 maintainers.
>
> There were comments in ml about bitfield inefficiency usage with x86.
>
> http://patches.dpdk.org/patch/16482/
>
> Search for: Bitfileds are efficient on Octeon. What's about other CPUs
> you have in
> mind? x86 is not as efficient.



I thought both ARM and x86 had bitfield access instructions, but it
looks like I was wrong about x86. GCC on x86_64 seems to compile a
bitfield read to 'shr' and 'and', just like an open-coded bitfield. A
bitfield write requires more instructions.
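
To make the comparison concrete, here is a minimal sketch of the two styles side by side. The field widths, shift values and all names below are hypothetical (they are not the __RTE_TRACE_FIELD_* values from the patch), and uint64_t bitfields rely on GCC/Clang behaviour rather than strict ISO C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the one from the patch:
 * size in bits 0-15, id in bits 16-31, level in bits 32-39.
 */
#define HANDLE_ID_SHIFT    16
#define HANDLE_LEVEL_SHIFT 32
#define HANDLE_SIZE_MASK   UINT64_C(0xffff)
#define HANDLE_ID_MASK     UINT64_C(0xffff)
#define HANDLE_LEVEL_MASK  UINT64_C(0xff)

/* Open-coded variant: explicit shifts and masks on a plain uint64_t. */
static inline uint64_t
handle_pack(uint16_t sz, uint16_t id, uint8_t level)
{
	return (uint64_t)sz |
	       ((uint64_t)id << HANDLE_ID_SHIFT) |
	       ((uint64_t)level << HANDLE_LEVEL_SHIFT);
}

/* Read: one shift and one mask. */
static inline uint16_t
handle_id(uint64_t handle)
{
	return (uint16_t)((handle >> HANDLE_ID_SHIFT) & HANDLE_ID_MASK);
}

/* Write: clear the old field, then OR in the new value. */
static inline uint64_t
handle_set_level(uint64_t handle, uint8_t level)
{
	handle &= ~(HANDLE_LEVEL_MASK << HANDLE_LEVEL_SHIFT);
	return handle | ((uint64_t)level << HANDLE_LEVEL_SHIFT);
}

/*
 * Bitfield variant: the compiler generates the shifts and masks.
 * The bit order inside the 64-bit unit is implementation-defined,
 * so this layout need not match the open-coded one bit for bit.
 */
struct handle_bits {
	uint64_t sz : 16;
	uint64_t id : 16;
	uint64_t level : 8;
	uint64_t unused : 24;
};

static_assert(sizeof(struct handle_bits) == sizeof(uint64_t),
	      "handle must occupy exactly 64 bits");

static inline uint16_t
handle_bits_id(const struct handle_bits *h)
{
	return (uint16_t)h->id;
}

static inline void
handle_bits_set_level(struct handle_bits *h, uint8_t level)
{
	h->level = level;
}
```

Compiling both accessors at -O2 and comparing the assembly is a quick way to check what a given compiler and target actually emit for the read and write paths.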


> Thoughts from x86 folks.
>
>>> there is no performance impact using bitfields from the compiler _if any_.
>>> 2) In some of the places, I can simply operate on APIs like
>>> __atomic_and_fetch() with flags.
>> I think you may still use such atomic operations. Just convert the
>> struct to a uint64_t, which will essentially be a no-operation, and fire
>> away.
> Not sure, We think about the atomic "and" and fetch here.
> That memcpy may translate additional load/store based on the compiler
> optimization level.(say compiled with -O0)


I would be surprised if that happened on anything but -O0. At least
modern GCC on ARM and x86_64 doesn't seem to add any loads or stores.


I assume you are not suggesting we should optimize for -O0.


>>
>> static uint64_t
>> __rte_trace_raw(struct trace *t)
>> {
>>       uint64_t raw;
>>
>>       memcpy(&raw, t, sizeof(struct trace));
>>
>>       return raw;
>> }
>>
>>
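
A slightly fuller sketch of the same idea, showing how a bitfield struct could still be combined with __atomic_and_fetch(). The struct layout and every name here are hypothetical and only serve as illustration; the only assumption that matters is that the struct is exactly 64 bits wide. __atomic_and_fetch() is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handle layout, for illustration only. */
struct handle_bits {
	uint64_t sz : 16;
	uint64_t id : 16;
	uint64_t level : 8;
	uint64_t unused : 24;
};

static_assert(sizeof(struct handle_bits) == sizeof(uint64_t),
	      "handle must occupy exactly 64 bits");

/* View the struct as a raw 64-bit word; memcpy avoids aliasing issues. */
static inline uint64_t
handle_raw(const struct handle_bits *h)
{
	uint64_t raw;

	memcpy(&raw, h, sizeof(raw));
	return raw;
}

/* The reverse conversion, from raw word back to the struct view. */
static inline struct handle_bits
handle_from_raw(uint64_t raw)
{
	struct handle_bits h;

	memcpy(&h, &raw, sizeof(h));
	return h;
}

/*
 * Atomically clear the level field of a handle stored as uint64_t.
 * The mask is built through the bitfield view (all bits set, level
 * zeroed), so no shift or mask constants are needed.
 */
static inline uint64_t
handle_clear_level(uint64_t *handle)
{
	struct handle_bits keep;

	memset(&keep, 0xff, sizeof(keep));
	keep.level = 0;

	return __atomic_and_fetch(handle, handle_raw(&keep),
				  __ATOMIC_RELAXED);
}
```

The handle itself stays a uint64_t in memory, so the atomic builtins apply directly; the struct is only used as a typed view when building masks or reading fields.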

Re: [dpdk-dev] [PATCH v1 03/32] eal/trace: implement trace register API
