Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
---
v2:
- Addressed ARM64 specific review comments (suggested by Thomas)
  http://dpdk.org/dev/patchwork/patch/16362/
---
 doc/guides/prog_guide/profile_app.rst | 58 +++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/doc/guides/prog_guide/profile_app.rst b/doc/guides/prog_guide/profile_app.rst
index 3226187..9f1b7ee 100644
--- a/doc/guides/prog_guide/profile_app.rst
+++ b/doc/guides/prog_guide/profile_app.rst
@@ -31,6 +31,14 @@
 Profile Your Application
 ========================
 
+Introduction
+------------
+
+The following sections describe the methods to profile DPDK applications on
+different architectures.
+
+x86
+~~~
 Intel processors provide performance counters to monitor events.
 Some tools provided by Intel can be used to profile and benchmark an application.
 See the *VTune Performance Analyzer Essentials* publication from Intel Press for more information.
@@ -50,3 +58,53 @@ The main situations that should be monitored through event counters are:
 Refer to the
 `Intel Performance Analysis Guide <http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf>`_
 for details about application profiling.
+
+ARM64
+~~~~~
+
+Perf
+^^^^
+The ARM64 architecture provides performance counters to monitor events.
+The Linux perf tool can be used to profile and benchmark an application.
+In addition to the standard events, perf can be used to profile arm64 specific
+PMU events through raw events (-e rXX).
+
+Refer to the
+`ARM64 specific PMU events enumeration <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100095_0002_04_en/way1382543438508.html>`_.
+
+High-resolution cycle counter
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The default cntvct_el0 based rte_rdtsc() provides a portable means to get a
+wall clock counter in user space. Typically it runs at <= 100MHz.
+
+The alternative method to enable rte_rdtsc() for a high resolution
+wall clock counter is through the armv8 PMU subsystem.
+The PMU cycle counter runs at CPU frequency. However, access to the PMU cycle
+counter from user space is not enabled by default in the arm64 Linux kernel.
+It is possible to enable user space access to the cycle counter
+by configuring the PMU from privileged mode (kernel space).
+
+By default, the rte_rdtsc() implementation uses the portable cntvct_el0 scheme.
+An application can choose the PMU based implementation with
+CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
+
+Find below the example steps to configure the PMU based cycle counter on an
+armv8 machine.
+
+.. code-block:: console
+
+    git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
+    cd armv8_pmu_cycle_counter_el0
+    make
+    sudo insmod pmu_el0_cycle_counter.ko
+    cd $DPDK_DIR
+    make config T=arm64-armv8a-linuxapp-gcc
+    echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
+    make
+
+.. warning::
+
+   The PMU based scheme is useful for high accuracy performance profiling with
+   rte_rdtsc(). However, this method cannot be used in conjunction with Linux
+   userspace profiling tools like perf, as this scheme alters the PMU register
+   state.
-- 
2.5.5
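
To illustrate why the counter frequency matters for the "High-resolution cycle counter" section of the patch, the following is a minimal sketch of the arithmetic only. The helpers `tick_period_ns()` and `ticks_to_ns()` are hypothetical names introduced here for illustration; they are not part of DPDK or of this patch. The 2 GHz figure used in the note below is an assumed example CPU frequency, not a value from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): duration of one counter tick in
 * nanoseconds for a counter running at counter_hz.
 */
static inline double tick_period_ns(uint64_t counter_hz)
{
	assert(counter_hz > 0);
	return 1e9 / (double)counter_hz;
}

/*
 * Hypothetical helper (not a DPDK API): convert a difference between two
 * counter readings into nanoseconds.
 */
static inline double ticks_to_ns(uint64_t ticks, uint64_t counter_hz)
{
	return (double)ticks * tick_period_ns(counter_hz);
}
```

With a 100 MHz counter, which is the upper bound the patch gives for the cntvct_el0 scheme, `tick_period_ns(100000000)` is 10 ns, so anything shorter than 10 ns cannot be resolved. With a PMU cycle counter on an assumed 2 GHz core, `tick_period_ns(2000000000)` is 0.5 ns.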