> -----Original Message-----
> From: d...@barachs.net <d...@barachs.net>
> Sent: Saturday, January 18, 2020 8:45 PM
> To: 'Ray Kinsella' <m...@ashroe.eu>; Jerin Jacob Kollanukkaran
> <jer...@marvell.com>; 'dpdk-dev' <dev@dpdk.org>
> Subject: [EXT] RE: [RFC] [dpdk-dev] DPDK Trace support
>
> It would be well worth considering one of the vpp techniques to minimize
> trace impact:
>
> static inline ring_handler_inline (..., int is_traced)
> {
>   for (i = 0; i < vector_size; i++)
>     {
>       if (is_traced)
>         {
>           do_trace_work;
>         }
>       normal_packet_processing;
>     }
> }
>
> ring_handler (...)
> {
>   if (PREDICT_FALSE(global_trace_flag != 0))
>     return ring_handler_inline (..., 1 /* is_traced */);
>   else
>     return ring_handler_inline (..., 0 /* is_traced */);
> }
>
> This reduces the runtime tax to the absolute minimum, but costs space.
>
> Please consider it.

Thanks Dave for your thoughts.
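For illustration, the constant-propagation pattern Dave describes could map
onto a DPDK-style burst handler roughly as below. This is only a sketch with
made-up names (ring_handler, do_trace_work and global_trace_flag are
placeholders, not existing VPP or DPDK symbols); the point is that each
instantiation of the inlined body is specialized at compile time, so the
untraced path pays only one well-predicted branch.

#include <stdint.h>
#include <rte_branch_prediction.h> /* likely()/unlikely() */

static volatile int global_trace_flag; /* toggled from the control path */

/* Placeholder per-packet hooks, for illustration only. */
static inline void do_trace_work(void *pkt) { (void)pkt; /* emit a trace record */ }
static inline void normal_packet_processing(void *pkt) { (void)pkt; /* real work */ }

static inline uint16_t
ring_handler_inline(void **pkts, uint16_t nb_pkts, const int is_traced)
{
    uint16_t i;

    for (i = 0; i < nb_pkts; i++) {
        /* 'is_traced' is a compile-time constant in each copy, so the
         * branch below vanishes entirely from the untraced variant. */
        if (is_traced)
            do_trace_work(pkts[i]);
        normal_packet_processing(pkts[i]);
    }
    return nb_pkts;
}

uint16_t
ring_handler(void **pkts, uint16_t nb_pkts)
{
    /* A single predictable branch selects the specialized copy. */
    if (unlikely(global_trace_flag != 0))
        return ring_handler_inline(pkts, nb_pkts, 1 /* is_traced */);
    return ring_handler_inline(pkts, nb_pkts, 0 /* is_traced */);
}

The obvious cost is code size: the handler body is duplicated once per value
of is_traced, which is the space/time trade-off Dave points out.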
> HTH... Dave
>
> -----Original Message-----
> From: Ray Kinsella <m...@ashroe.eu>
> Sent: Monday, January 13, 2020 6:00 AM
> To: Jerin Jacob Kollanukkaran <jer...@marvell.com>; dpdk-dev
> <dev@dpdk.org>; d...@barachs.net
> Subject: Re: [RFC] [dpdk-dev] DPDK Trace support
>
> Hi Jerin,
>
> Any idea why lttng performance is so poor?
> I would have naturally gone there to benefit from the existing toolchain.
>
> Have you looked at the FD.io logging/tracing infrastructure for inspiration?
> https://wiki.fd.io/view/VPP/elog
>
> Ray K
>
> On 13/01/2020 10:40, Jerin Jacob Kollanukkaran wrote:
> > Hi All,
> >
> > I would like to add tracing support for DPDK.
> > I am planning to add this support in the v20.05 release.
> >
> > This RFC attempts to get feedback from the community on
> >
> > a) Tracing use cases.
> > b) Tracing requirements.
> > c) Implementation choices.
> > d) Trace format.
> >
> > Use-cases
> > ---------
> > - In most cases, the DPDK provider will not have access to the DPDK
> >   customer's applications. To debug/analyze slow-path and fast-path DPDK
> >   API usage from the field, we need integrated trace support in DPDK.
> >
> > - We need a low-overhead, fast-path, multi-core PMD debugging/analysis
> >   infrastructure in DPDK to fix functional and performance issue(s) in PMDs.
> >
> > - Post-trace analysis tools can report various states across the system,
> >   such as cpu_idle(), using the timestamps added in the trace.
> >
> > Requirements:
> > -------------
> > - Support for Linux, FreeBSD and Windows OS.
> > - Open trace format.
> > - Multi-platform open-source trace viewer.
> > - Very low overhead trace API for DPDK fast-path tracing/debugging.
> > - Dynamic enable/disable of trace events.
> >
> > To enable trace support in DPDK, the following items need to be worked out:
> >
> > a) Adding the DPDK trace points in the DPDK source code.
> >
> >    This includes updating DPDK functions such as rte_eth_dev_configure(),
> >    rte_eth_dev_start() and rte_eth_dev_rx_burst() to emit the trace.
> >
> > b) Choosing a suitable serialization format.
> >
> >    The Common Trace Format (CTF) is an open format and language to describe
> >    trace formats. This enables tool reuse, of which line-textual (babeltrace)
> >    and graphical (Trace Compass) variants already exist.
> >
> >    CTF should look familiar to C programmers but adds stronger typing.
> >    See "CTF - A Flexible, High-performance Binary Trace Format":
> >    https://diamon.org/ctf/
> >
> > c) Writing the on-target serialization code.
> >
> >    See the section below (Lttng CTF trace emitter vs DPDK-specific CTF
> >    trace emitter).
> >
> > d) Deciding on and writing the I/O transport mechanics.
> >
> >    For performance reasons, the trace buffer should be backed by huge pages
> >    and then written out to a file.
> >
> > e) Writing the PC-side deserializer/parser.
> >
> >    Both babeltrace (CLI tool) and Trace Compass (GUI tool) support CTF.
> >    See: https://lttng.org/viewers/
> >
> > f) Writing tools for filtering and presentation.
> >
> >    See item (e).
> >
> > Lttng CTF trace emitter vs DPDK-specific CTF trace emitter
> > ----------------------------------------------------------
> >
> > I have written a performance evaluation application to measure the overhead
> > of the Lttng CTF emitter (the fast-path infrastructure used by the
> > https://lttng.org/ library to emit the trace):
> >
> > https://github.com/jerinjacobk/lttng-overhead
> > https://github.com/jerinjacobk/lttng-overhead/blob/master/README
> >
> > I could improve the performance by 30% by adding a "DPDK"-based plugin for
> > get_clock() and get_cpu(). Here are the performance numbers after adding
> > the plugin, on x86 and the various arm64 boards that I have access to:
> >
> > On high-end x86, it comes to around 236 cycles / ~91 ns @ 2.6 GHz (see the
> > last line of the log, ZERO_ARG). On arm64, it varies from 312 cycles to
> > 1100 cycles, based on the class of CPU. In short, based on the IPC
> > capabilities, the cost would be around 100 ns to 400 ns for a single void
> > trace (a trace without any argument).
> >
> > [lttng-overhead-x86] $ sudo ./calibrate/build/app/calibrate -c 0xc0
> > make[1]: Entering directory '/export/lttng-overhead-x86/calibrate'
> > make[1]: Leaving directory '/export/lttng-overhead-x86/calibrate'
> > EAL: Detected 56 lcore(s)
> > EAL: Detected 2 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > EAL: Selected IOVA mode 'PA'
> > EAL: Probing VFIO support...
> > EAL: PCI device 0000:01:00.0 on NUMA socket 0
> > EAL:   probe driver: 8086:1521 net_e1000_igb
> > EAL: PCI device 0000:01:00.1 on NUMA socket 0
> > EAL:   probe driver: 8086:1521 net_e1000_igb
> > CPU Timer freq is 2600.000000MHz
> > NOP: cycles=0.194834 ns=0.074936
> > GET_CLOCK: cycles=47.854658 ns=18.405638
> > GET_CPU: cycles=30.995892 ns=11.921497
> > ZERO_ARG: cycles=236.945113 ns=91.132736
> >
> > We will have only 16.75 ns to process 59.2 Mpps (40 Gbps), so IMO the Lttng
> > CTF emitter may not fit the DPDK fast-path purpose due to the cost
> > associated with generic Lttng features.
> >
> > One option could be to have a native CTF emitter in EAL/DPDK that emits the
> > trace into a hugepage-backed buffer. I think it would be a handful of cycles
> > if we limit the features to the requirements above.
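To make that concrete, here is a rough, untested sketch of what such a
hugepage-backed fast-path emitter could boil down to. Every identifier below
(trace_global_enable, TRACE_POINT, the record layout, the buffer sizing) is a
placeholder for discussion, not an existing or proposed DPDK API; allocation
of the per-lcore buffers from a hugepage memzone, wrap-around and flushing to
a file are all omitted.

#include <stdint.h>
#include <string.h>
#include <rte_branch_prediction.h>
#include <rte_common.h>
#include <rte_cycles.h>
#include <rte_lcore.h>

/* One writer per lcore, so no locking or atomics on the fast path. */
struct trace_lcore_buf {
    uint8_t *base;   /* per-lcore slice of a hugepage-backed memzone */
    uint32_t offset; /* current write position */
};

struct trace_hdr {
    uint64_t tsc;      /* rte_rdtsc() timestamp, converted to ns offline */
    uint16_t event_id; /* index into CTF metadata generated at startup */
    uint16_t len;      /* payload length in bytes */
} __rte_packed;

static volatile int trace_global_enable;                 /* flipped at runtime */
static struct trace_lcore_buf trace_bufs[RTE_MAX_LCORE]; /* init omitted */

static inline void
__trace_emit(uint16_t event_id, const void *payload, uint16_t len)
{
    struct trace_lcore_buf *tb = &trace_bufs[rte_lcore_id()];
    struct trace_hdr *hdr = (struct trace_hdr *)(tb->base + tb->offset);

    hdr->tsc = rte_rdtsc();
    hdr->event_id = event_id;
    hdr->len = len;
    memcpy(hdr + 1, payload, len);
    tb->offset += sizeof(*hdr) + len; /* wrap/flush handling omitted */
}

/* What a trace point in e.g. rte_eth_dev_rx_burst() could reduce to. */
#define TRACE_POINT(id, payload, len)                         \
    do {                                                      \
        if (unlikely(trace_global_enable))                    \
            __trace_emit((id), (payload), (len));             \
    } while (0)

With something along these lines, a disabled trace point costs one
well-predicted branch (similar to the VPP pattern above), and the record
header plus payload is what the CTF metadata would describe for
babeltrace/Trace Compass to decode offline.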
> > The upside of using the Lttng CTF emitter:
> > a) No need to write a new CTF trace emitter (item (c) above).
> >
> > The downside of the Lttng CTF emitter (for item (c)):
> > a) Performance issue (see above).
> > b) Lack of Windows OS support. It looks like it has only basic FreeBSD
> >    support.
> > c) A DPDK library dependency on lttng for trace.
> >
> > So, probably it is good to have a native CTF emitter in DPDK and reuse all
> > the open-source trace viewer (babeltrace and Trace Compass) and format (CTF)
> > infrastructure. I think it would be the best of both worlds.
> >
> > Any thoughts on this subject? Based on the community feedback, I can work
> > on the patch for v20.05.