+1 - thanks Dave
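
For anyone skimming the thread, here is a minimal compilable sketch of the technique Dave describes in the quoted message below. It is only an illustration under stated assumptions: the packet array, the trace_count counter and ring_handler_demo() are hypothetical stand-ins, PREDICT_FALSE is defined locally, and __builtin_expect / always_inline assume GCC or Clang. None of this is VPP or DPDK code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC/Clang builtins; names below are hypothetical. */
#define PREDICT_FALSE(x) __builtin_expect(!!(x), 0)

static volatile int global_trace_flag; /* runtime on/off switch */
static unsigned long trace_count;      /* stand-in for real trace output */

/* Forced inline so that each call site gets its own copy of the loop,
 * with is_traced folded to a compile-time constant. */
static inline __attribute__((always_inline)) unsigned long
ring_handler_inline(const unsigned *pkts, size_t n, int is_traced)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (is_traced)      /* removed entirely in the untraced copy */
            trace_count++;  /* do_trace_work */
        sum += pkts[i];     /* normal_packet_processing */
    }
    return sum;
}

/* One flag test per burst instead of one per packet. */
unsigned long
ring_handler(const unsigned *pkts, size_t n)
{
    if (PREDICT_FALSE(global_trace_flag != 0))
        return ring_handler_inline(pkts, n, 1 /* is_traced */);
    return ring_handler_inline(pkts, n, 0 /* is_traced */);
}

/* Hypothetical self-check: same result either way, trace work only when on. */
int
ring_handler_demo(void)
{
    const unsigned pkts[4] = {1, 2, 3, 4};

    trace_count = 0;
    global_trace_flag = 0;
    assert(ring_handler(pkts, 4) == 10 && trace_count == 0);

    global_trace_flag = 1;
    assert(ring_handler(pkts, 4) == 10 && trace_count == 4);
    return 0;
}
```

The space cost Dave mentions comes from the loop body being emitted twice, once per constant value of is_traced.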
On 20/01/2020 04:48, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: d...@barachs.net <d...@barachs.net>
>> Sent: Saturday, January 18, 2020 8:45 PM
>> To: 'Ray Kinsella' <m...@ashroe.eu>; Jerin Jacob Kollanukkaran
>> <jer...@marvell.com>; 'dpdk-dev' <dev@dpdk.org>
>> Subject: [EXT] RE: [RFC] [dpdk-dev] DPDK Trace support
>>
>> It would be well worth considering one of the vpp techniques to minimize
>> trace impact:
>>
>> static inline ring_handler_inline (..., int is_traced) {
>>   for (i = 0; i < vector_size; i++)
>>     {
>>       if (is_traced)
>>         {
>>           do_trace_work;
>>         }
>>       normal_packet_processing;
>>     }
>> }
>>
>> ring_handler (...)
>> {
>>   if (PREDICT_FALSE(global_trace_flag != 0))
>>     return ring_handler_inline (..., 1 /* is_traced */);
>>   else
>>     return ring_handler_inline (..., 0 /* is_traced */);
>> }
>>
>> This reduces the runtime tax to the absolute minimum, but costs space.
>>
>> Please consider it.
>
> Thanks Dave for your thoughts.
>
>>
>> HTH... Dave
>>
>> -----Original Message-----
>> From: Ray Kinsella <m...@ashroe.eu>
>> Sent: Monday, January 13, 2020 6:00 AM
>> To: Jerin Jacob Kollanukkaran <jer...@marvell.com>; dpdk-dev
>> <dev@dpdk.org>; d...@barachs.net
>> Subject: Re: [RFC] [dpdk-dev] DPDK Trace support
>>
>> Hi Jerin,
>>
>> Any idea why lttng performance is so poor?
>> I would have naturally gone there to benefit from the existing toolchain.
>>
>> Have you looked at the FD.io logging/tracing infrastructure for inspiration?
>> https://wiki.fd.io/view/VPP/elog
>>
>> Ray K
>>
>> On 13/01/2020 10:40, Jerin Jacob Kollanukkaran wrote:
>>> Hi All,
>>>
>>> I would like to add tracing support for DPDK.
>>> I am planning to add this support in v20.05 release.
>>>
>>> This RFC attempts to get feedback from the community on
>>>
>>> a) Tracing Use cases.
>>> b) Tracing Requirements.
>>> c) Implementation choices.
>>> d) Trace format.
>>>
>>> Use-cases
>>> ---------
>>> - In most cases, the DPDK provider will not have access to the DPDK
>>> customer applications.
>>> To debug/analyze the slow path and fast path DPDK API usage from the
>>> field, we need to have integrated trace support in DPDK.
>>>
>>> - Need a low overhead fast path multi-core PMD driver
>>> debugging/analysis infrastructure in DPDK to fix the functional and
>>> performance issue(s) of PMD.
>>>
>>> - Post trace analysis tools can provide various status across the
>>> system such as cpu_idle() using the timestamp added in the trace.
>>>
>>>
>>> Requirements:
>>> -------------
>>> - Support for Linux, FreeBSD and Windows OS
>>> - Open trace format
>>> - Multi-platform Open source trace viewer
>>> - Absolute low overhead trace API for DPDK fast path tracing/debugging.
>>> - Dynamic enable/disable of trace events
>>>
>>>
>>> To enable trace support in DPDK, the following items need to be worked out:
>>>
>>> a) Add the DPDK trace points in the DPDK source code.
>>>
>>> - This includes updating DPDK functions such as
>>> rte_eth_dev_configure(), rte_eth_dev_start(), rte_eth_rx_burst() to emit
>>> the trace.
>>>
>>> b) Choosing a suitable serialization format
>>>
>>> - Common Trace Format, CTF, is an open format and language to describe
>>> trace formats.
>>> This enables tool reuse, of which line-textual (babeltrace) and
>>> graphical (TraceCompass) variants already exist.
>>>
>>> CTF should look familiar to C programmers but adds stronger typing.
>>> See CTF - A Flexible, High-performance Binary Trace Format.
>>>
>>> https://diamon.org/ctf/
>>>
>>> c) Writing the on-target serialization code
>>>
>>> See the section below (LTTng CTF trace emitter vs DPDK specific CTF
>>> trace emitter).
>>>
>>> d) Deciding on and writing the I/O transport mechanics
>>>
>>> For performance reasons, it should be backed by a huge-page and write to
>>> file IO.
>>>
>>> e) Writing the PC-side deserializer/parser
>>>
>>> Both babeltrace (CLI tool) and Trace Compass (GUI tool) support CTF.
>>> See:
>>> https://lttng.org/viewers/
>>>
>>> f) Writing tools for filtering and presentation.
>>>
>>> See item (e)
>>>
>>>
>>> LTTng CTF trace emitter vs DPDK specific CTF trace emitter
>>> ----------------------------------------------------------
>>>
>>> I have written a performance evaluation application to measure the
>>> overhead of the LTTng CTF emitter (the fast path infrastructure used by the
>>> https://lttng.org/ library to emit the trace):
>>>
>>> https://github.com/jerinjacobk/lttng-overhead
>>> https://github.com/jerinjacobk/lttng-overhead/blob/master/README
>>>
>>> I could improve the performance by 30% by adding the "DPDK"
>>> based plugin for get_clock() and get_cpu(). Here are the performance
>>> numbers after adding the plugin on
>>> x86 and the various arm64 boards that I have access to:
>>>
>>> On high-end x86, it comes to around 236 cycles/~100ns @ 2.4GHz (see the
>>> last line in the log (ZERO_ARG)).
>>> On arm64, it varies from 312 cycles to 1100 cycles (based on the class
>>> of CPU).
>>> In short, based on the "IPC capabilities", the cost would be around
>>> 100ns to 400ns for a single void trace (a trace without any argument).
>>>
>>>
>>> [lttng-overhead-x86] $ sudo ./calibrate/build/app/calibrate -c 0xc0
>>> make[1]: Entering directory '/export/lttng-overhead-x86/calibrate'
>>> make[1]: Leaving directory '/export/lttng-overhead-x86/calibrate'
>>> EAL: Detected 56 lcore(s)
>>> EAL: Detected 2 NUMA nodes
>>> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
>>> EAL: Selected IOVA mode 'PA'
>>> EAL: Probing VFIO support...
>>> EAL: PCI device 0000:01:00.0 on NUMA socket 0
>>> EAL: probe driver: 8086:1521 net_e1000_igb
>>> EAL: PCI device 0000:01:00.1 on NUMA socket 0
>>> EAL: probe driver: 8086:1521 net_e1000_igb
>>> CPU Timer freq is 2600.000000MHz
>>> NOP: cycles=0.194834 ns=0.074936
>>> GET_CLOCK: cycles=47.854658 ns=18.405638
>>> GET_CPU: cycles=30.995892 ns=11.921497
>>> ZERO_ARG: cycles=236.945113 ns=91.132736
>>>
>>>
>>> We will have only 16.75ns to process 59.2 mpps (40Gbps). So IMO, the LTTng
>>> CTF emitter may not fit the DPDK fast path purpose due to the cost
>>> associated with generic LTTng features.
>>>
>>> One option could be to have a native CTF emitter in EAL/DPDK to emit
>>> the trace in a hugepage. I think it would be a handful of cycles if we
>>> limit the features to the requirements above.
>>>
>>> The upside of using the LTTng CTF emitter:
>>> a) No need to write a new CTF trace emitter (item (c)).
>>>
>>> The downside of the LTTng CTF emitter (item (c)):
>>> a) Performance issue (see above).
>>> b) Lack of Windows OS support. It looks like it has basic FreeBSD support.
>>> c) DPDK library dependency on LTTng for trace.
>>>
>>> So it is probably good to have a native CTF emitter in DPDK and reuse all
>>> the open-source trace viewer (babeltrace and TraceCompass) and format (CTF)
>>> infrastructure.
>>> I think it would be the best of both worlds.
>>>
>>> Any thoughts on this subject? Based on the community feedback, I can work
>>> on the patch for v20.05.
>>>
>
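
As a footnote to the per-packet budget quoted above, the arithmetic can be sketched as below. This is a rough illustration, assuming minimum-size 64-byte Ethernet frames and 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap); the function names are hypothetical. Under those assumptions 40Gbps works out to about 59.5 Mpps and about 16.8ns per packet, close to the figures in the RFC.

```c
#include <assert.h>
#include <stdio.h>

/* Packets per second at line rate.
 * Assumption: 20 bytes of wire overhead per frame (preamble + SFD + IFG). */
double
line_rate_pps(double link_bps, unsigned frame_bytes)
{
    const unsigned wire_overhead_bytes = 20;

    return link_bps / ((frame_bytes + wire_overhead_bytes) * 8.0);
}

/* Time available per packet, in nanoseconds, at a given packet rate. */
double
ns_per_packet(double pps)
{
    return 1e9 / pps;
}

/* Hypothetical helper: print the budget next to a measured trace cost. */
void
print_budget(double link_bps, unsigned frame_bytes, double trace_cost_ns)
{
    double pps = line_rate_pps(link_bps, frame_bytes);
    double budget_ns = ns_per_packet(pps);

    printf("rate=%.2f Mpps budget=%.2f ns trace=%.2f ns (%.1fx budget)\n",
           pps / 1e6, budget_ns, trace_cost_ns, trace_cost_ns / budget_ns);
}
```

Calling print_budget(40e9, 64, 91.132736) with the ZERO_ARG figure from the x86 log shows a single trace call costing several packets' worth of time, which is the point the RFC makes about fast path use.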