+1 - thanks Dave

On 20/01/2020 04:48, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: d...@barachs.net <d...@barachs.net>
>> Sent: Saturday, January 18, 2020 8:45 PM
>> To: 'Ray Kinsella' <m...@ashroe.eu>; Jerin Jacob Kollanukkaran
>> <jer...@marvell.com>; 'dpdk-dev' <dev@dpdk.org>
>> Subject: [EXT] RE: [RFC] [dpdk-dev] DPDK Trace support
>> It would be well worth considering one of the vpp techniques to minimize
>> trace impact:
>> static inline ring_handler_inline (..., int is_traced)
>> {
>>   for (i = 0; i < vector_size; i++)
>>     {
>>       if (is_traced)
>>         {
>>           do_trace_work;
>>         }
>>       normal_packet_processing;
>>     }
>> }
>> ring_handler (...)
>> {
>>   if (PREDICT_FALSE(global_trace_flag != 0))
>>     return ring_handler_inline (..., 1 /* is_traced */);
>>   else
>>     return ring_handler_inline (..., 0 /* is_traced */);
>> }
>> This reduces the runtime tax to the absolute minimum, but costs space.
>> Please consider it.
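
The technique Dave describes relies on the compiler specializing an always-inlined worker for each constant value of is_traced. A minimal compilable sketch, assuming GCC or Clang (for __builtin_expect and the always_inline attribute); the packet array, the counter standing in for trace work, and all names other than those in the pseudocode above are hypothetical and are not VPP or DPDK API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical globals; a control thread would set the flag. */
static volatile int global_trace_flag;
static uint64_t traced_packets; /* stand-in for real trace work */

/* Always-inlined worker: is_traced is a literal constant at each call
 * site, so the compiler can drop the branch from the untraced copy. */
static inline __attribute__((always_inline)) uint64_t
ring_handler_inline(const uint32_t *pkts, size_t n, const int is_traced)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (is_traced)
            traced_packets++; /* do_trace_work */
        sum += pkts[i];       /* normal_packet_processing */
    }
    return sum;
}

uint64_t ring_handler(const uint32_t *pkts, size_t n)
{
    assert(pkts != NULL || n == 0);

    /* The flag is tested once per burst instead of once per packet. */
    if (__builtin_expect(global_trace_flag != 0, 0))
        return ring_handler_inline(pkts, n, 1 /* is_traced */);
    return ring_handler_inline(pkts, n, 0 /* is_traced */);
}
```

The space cost mentioned above comes from the loop body being emitted twice, once per specialization.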
> Thanks Dave for your thoughts.
>> HTH... Dave
>> -----Original Message-----
>> From: Ray Kinsella <m...@ashroe.eu>
>> Sent: Monday, January 13, 2020 6:00 AM
>> To: Jerin Jacob Kollanukkaran <jer...@marvell.com>; dpdk-dev
>> <dev@dpdk.org>; d...@barachs.net
>> Subject: Re: [RFC] [dpdk-dev] DPDK Trace support
>> Hi Jerin,
>> Any idea why lttng performance is so poor?
>> I would have naturally gone there to benefit from the existing toolchain.
>> Have you looked at the FD.io logging/tracing infrastructure for inspiration?
>> https://wiki.fd.io/view/VPP/elog
>> Ray K
>> On 13/01/2020 10:40, Jerin Jacob Kollanukkaran wrote:
>>> Hi All,
>>> I would like to add tracing support for DPDK.
>>> I am planning to add this support in v20.05 release.
>>> This RFC attempts to get feedback from the community on
>>> a) Tracing Use cases.
>>> b) Tracing Requirements.
>>> c) Implementation choices.
>>> d) Trace format.
>>> Use-cases
>>> ---------
>>> - In most cases, the DPDK provider will not have access to the DPDK
>>> customer applications. To debug/analyze the slow path and fast path
>>> DPDK API usage from the field, we need integrated trace support in DPDK.
>>> - We need a low-overhead, fast path, multi-core PMD driver
>>> debugging/analysis infrastructure in DPDK to fix the functional and
>>> performance issue(s) of PMDs.
>>> - Post-trace analysis tools can provide various status information
>>> across the system, such as cpu_idle(), using the timestamps added in
>>> the trace.
>>> Requirements:
>>> -------------
>>> - Support for Linux, FreeBSD and Windows OS
>>> - Open trace format
>>> - Multi-platform Open source trace viewer
>>> - Absolute minimum overhead trace API for DPDK fast path tracing/debugging.
>>> - Dynamic enable/disable of trace events
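
The last two requirements interact: a trace point that can be switched at run time still has to cost almost nothing when it is off. One way to picture this is a per-trace-point flag read with a relaxed atomic load. This is only a sketch under that assumption; struct trace_point and both functions are hypothetical names, not an existing DPDK API, and the hit counter stands in for emitting a record:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace point: an enable flag plus a counter that stands
 * in for actually writing a trace record. */
struct trace_point {
    atomic_bool enabled;
    uint64_t hits;
};

/* Control path: switch the trace point on or off at run time. */
static inline void trace_point_set(struct trace_point *tp, bool on)
{
    assert(tp != NULL);
    atomic_store_explicit(&tp->enabled, on, memory_order_relaxed);
}

/* Fast path: one relaxed load and a branch when the point is disabled. */
static inline void trace_point_emit(struct trace_point *tp)
{
    if (!atomic_load_explicit(&tp->enabled, memory_order_relaxed))
        return;
    tp->hits++; /* a real emitter would write the event here */
}
```

Relaxed ordering is used because the flag only gates tracing; a writer flipping it does not need to synchronize other data with the fast path.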
>>> To enable trace support in DPDK, the following items need to be worked out:
>>> a) Add the DPDK trace points in the DPDK source code.
>>> - This includes updating DPDK functions such as
>>> rte_eth_dev_configure(), rte_eth_dev_start(), and rte_eth_rx_burst() to
>>> emit the trace.
>>> b) Choosing suitable serialization-format
>>> - Common Trace Format, CTF, is an open format and language to describe
>>> trace formats.
>>> This enables tool reuse, of which line-textual (babeltrace) and
>>> graphical (TraceCompass) variants already exist.
>>> CTF should look familiar to C programmers but adds stronger typing.
>>> See CTF - A Flexible, High-performance Binary Trace Format.
>>> https://diamon.org/ctf/
>>> c) Writing the on-target serialization code,
>>> See the section below.(Lttng CTF trace emitter vs DPDK specific CTF
>>> trace emitter)
>>> d) Deciding on and writing the I/O transport mechanics,
>>> For performance reasons, the trace buffer should be backed by hugepages
>>> and written out using file IO.
>>> e) Writing the PC-side deserializer/parser,
>>> Both babeltrace (CLI tool) and Trace Compass (GUI tool) support CTF.
>>> See:
>>> https://lttng.org/viewers/
>>> f) Writing tools for filtering and presentation.
>>> See item (e)
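
Items (c) and (d) can be pictured as a fixed-size record appended to a preallocated buffer on the fast path and written to a file on the slow path. The sketch below is an illustration only: the record layout is a made-up timestamp plus event id pair and is not the CTF wire format, the caller supplies ordinary memory where the proposal would use hugepage-backed memory, and every name is hypothetical rather than DPDK API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size record (not the CTF wire format). */
struct trace_rec {
    uint64_t timestamp;
    uint32_t event_id;
    uint32_t reserved;
};

/* Trace buffer over caller-supplied memory. */
struct trace_buf {
    uint8_t *mem;
    size_t size;   /* capacity in bytes */
    size_t offset; /* next free byte */
};

static inline void trace_buf_init(struct trace_buf *tb, void *mem, size_t size)
{
    assert(tb != NULL && mem != NULL);
    tb->mem = mem;
    tb->size = size;
    tb->offset = 0;
}

/* Fast path: bounds check plus a fixed-size copy.
 * Returns 0 on success, -1 when the buffer is full. */
static inline int trace_emit(struct trace_buf *tb, uint64_t ts, uint32_t id)
{
    struct trace_rec rec = { .timestamp = ts, .event_id = id, .reserved = 0 };

    if (tb->size - tb->offset < sizeof(rec))
        return -1;
    memcpy(tb->mem + tb->offset, &rec, sizeof(rec));
    tb->offset += sizeof(rec);
    return 0;
}

/* Slow path: write the filled part of the buffer to a file.
 * Returns the number of bytes actually written; the buffer is reset
 * either way. */
static inline size_t trace_buf_flush(struct trace_buf *tb, FILE *out)
{
    size_t n = fwrite(tb->mem, 1, tb->offset, out);

    tb->offset = 0;
    return n;
}
```

Keeping the file write out of trace_emit() is the point of the split: the fast path touches only memory, and the flush can run from a control thread or at exit.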
>>> Lttng CTF trace emitter vs DPDK specific CTF trace emitter
>>> ----------------------------------------------------------
>>> I have written a performance evaluation application to measure the
>>> overhead of the Lttng CTF emitter (the fastpath infrastructure used by
>>> the https://lttng.org/ library to emit the trace):
>>> https://github.com/jerinjacobk/lttng-overhead
>>> https://github.com/jerinjacobk/lttng-overhead/blob/master/README
>>> I could improve the performance by 30% by adding a "DPDK"-based
>>> plugin for get_clock() and get_cpu(). Here are the performance
>>> numbers after adding the plugin on x86 and the various arm64 boards
>>> that I have access to.
>>> On high-end x86, it comes to around 236 cycles/~100ns @ 2.4GHz (see the
>>> last line in the log (ZERO_ARG)). On arm64, it varies from 312 cycles to
>>> 1100 cycles (based on the class of CPU).
>>> In short, based on the "IPC capabilities", the cost would be around
>>> 100ns to 400ns for a single void trace (a trace without any argument).
>>> [lttng-overhead-x86] $ sudo ./calibrate/build/app/calibrate -c 0xc0
>>> make[1]: Entering directory '/export/lttng-overhead-x86/calibrate'
>>> make[1]: Leaving directory '/export/lttng-overhead-x86/calibrate'
>>> EAL: Detected 56 lcore(s)
>>> EAL: Detected 2 NUMA nodes
>>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>> EAL: Selected IOVA mode 'PA'
>>> EAL: Probing VFIO support...
>>> EAL: PCI device 0000:01:00.0 on NUMA socket 0
>>> EAL:   probe driver: 8086:1521 net_e1000_igb
>>> EAL: PCI device 0000:01:00.1 on NUMA socket 0
>>> EAL:   probe driver: 8086:1521 net_e1000_igb
>>> CPU Timer freq is 2600.000000MHz
>>> NOP: cycles=0.194834 ns=0.074936
>>> GET_CLOCK: cycles=47.854658 ns=18.405638
>>> GET_CPU: cycles=30.995892 ns=11.921497
>>> ZERO_ARG: cycles=236.945113 ns=91.132736
>>> We will have only 16.75ns per packet to process 59.2 Mpps (40Gbps). So
>>> IMO, the Lttng CTF emitter may not fit the DPDK fast path purpose due to
>>> the cost associated with generic Lttng features.
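
The budget argument above is plain arithmetic: convert the measured cycle count to nanoseconds using the timer frequency from the log, and compare it with the time available per packet at the quoted packet rate. A small sketch, with function names made up for illustration and the inputs taken from the log and the paragraph above:

```c
#include <assert.h>
#include <stdio.h>

/* Convert a cycle count to nanoseconds for a timer running at freq_mhz. */
static double cycles_to_ns(double cycles, double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return cycles * 1000.0 / freq_mhz;
}

/* Time available per packet, in nanoseconds, at a rate given in Mpps. */
static double ns_per_packet(double mpps)
{
    assert(mpps > 0.0);
    return 1000.0 / mpps;
}

/* Print the ZERO_ARG cost from the log next to the per-packet budget. */
void print_budget(void)
{
    double cost = cycles_to_ns(236.945113, 2600.0); /* ZERO_ARG, 2600 MHz */
    double budget = ns_per_packet(59.2);            /* 59.2 Mpps */

    printf("trace cost  : %.2f ns\n", cost);
    printf("pkt budget  : %.2f ns\n", budget);
    printf("cost/budget : %.1fx\n", cost / budget);
}
```

With these inputs a single argument-less trace costs about 91 ns against a budget of a little under 17 ns per packet, so one trace call alone is more than five times the per-packet budget.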
>>> One option could be to have a native CTF emitter in EAL/DPDK to emit
>>> the trace into a hugepage. I think it would cost a handful of cycles if
>>> we limit the features to the requirements above.
>>> The upside of using the Lttng CTF emitter:
>>> a) No need to write a new CTF trace emitter (item (c)).
>>> The downsides of the Lttng CTF emitter (item (c)):
>>> a) Performance issue (see above).
>>> b) Lack of Windows OS support. It looks like it has basic FreeBSD support.
>>> c) DPDK library dependency on lttng for trace.
>>> So it is probably good to have a native CTF emitter in DPDK and reuse all
>>> the open-source trace viewer (babeltrace and TraceCompass) and format (CTF)
>>> infrastructure.
>>> I think it would be the best of both worlds.
>>> Any thoughts on this subject? Based on the community feedback, I can work
>>> on the patch for v20.05.
