Re: [PATCH v3 1/1] x86/elf: Add a new .note section containing xfeatures buffer layout info to x86 core files

2024-07-13 Thread Thomas Gleixner
On Fri, Jul 12 2024 at 15:16, Vignesh Balasubramanian wrote: > diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h > index 1fb83d477..cad37090b 100644 > --- a/arch/x86/include/asm/elf.h > +++ b/arch/x86/include/asm/elf.h > @@ -13,6 +13,15 @@ > #include > #include > > +struct

[powerpc:merge] BUILD SUCCESS 582b0e554593e530b1386eacafee6c412c5673cc

2024-07-13 Thread kernel test robot
lmodconfig gcc-13.2.0 arc allnoconfig gcc-13.2.0 arc allyesconfig gcc-13.2.0 arc axs103_defconfig gcc-13.2.0 arc defconfig gcc-13.2.0 arc randconfig-00

[powerpc:next] BUILD SUCCESS 90e812ac40c4b813fdbafab22f426fe4cdf840a8

2024-07-13 Thread kernel test robot
arc randconfig-001-20240713 gcc-13.2.0 arc randconfig-002-20240713 gcc-13.2.0 arm allmodconfig gcc-13.2.0 arm allnoconfig gcc-13.2.0 arm allyesconfig gcc-13.2.0 arm

Re: [PATCH V6 17/18] tools/perf: Update data_type_cmp and sort__typeoff_sort function to include var_name in comparison

2024-07-13 Thread Namhyung Kim
On Sat, Jul 13, 2024 at 11:52:30AM +0530, Athira Rajeev wrote: > > > > On 13 Jul 2024, at 2:55 AM, Namhyung Kim wrote: > > > > On Sun, Jul 07, 2024 at 08:14:18PM +0530, Athira Rajeev wrote: > >> Currently data_type_cmp() only compares size and type name. > >> But in cases where the type name of

[PATCH V7 00/18] Add data type profiling support for powerpc

2024-07-13 Thread Athira Rajeev
The patchset from Namhyung added support for data type profiling in perf tool. This enabled support to associate PMU samples with the data types they refer to, using DWARF debug information. With upstream perf, it is currently possible to run perf report or perf annotate to view the data type information on

[PATCH V7 03/18] tools/perf: Update TYPE_STATE_MAX_REGS to include max of regs in powerpc

2024-07-13 Thread Athira Rajeev
TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined to be 16. While checking if reg is valid using has_reg_type, max value is checked using TYPE_STATE_MAX_REGS value. Define this conditionally for powerpc. Signed-off-by: Athira Rajeev --- tools/perf/util/annotate-data.h | 4 1 f

[PATCH V7 01/18] tools/perf: Move the data structures related to register type to header file

2024-07-13 Thread Athira Rajeev
Data type profiling uses instruction tracking by checking each instruction and updating the register type state in some data structures. This is useful to find the data type in cases when the register state gets transferred from one reg to another. Example, in x86, "mov" instruction and in powerpc,

[PATCH V7 04/18] tools/perf: Add disasm_line__parse to parse raw instruction for powerpc

2024-07-13 Thread Athira Rajeev
Currently, the perf tool infrastructure uses the disasm_line__parse function to parse a disassembled line. Example snippet from objdump: objdump --start-address= --stop-address= -d --no-show-raw-insn -C c10224b4: lwz r10,0(r9) This line "lwz r10,0(r9)" is parsed to extract instruction

[PATCH V7 05/18] tools/perf: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility

2024-07-13 Thread Athira Rajeev
Add support to capture and parse raw instruction in powerpc. Currently, the perf tool infrastructure uses two ways to disassemble and understand the instruction. One is objdump and other option is via libcapstone. Currently, the perf tool infrastructure uses "--no-show-raw-insn" option with "objdu

[PATCH V7 02/18] tools/perf: Add "update_insn_state" callback function to handle arch specific instruction tracking

2024-07-13 Thread Athira Rajeev
Add "update_insn_state" callback to "struct arch" to handle instruction tracking. Currently updating instruction state is handled by static function "update_insn_state_x86" which is defined in "annotate-data.c". Make this as a callback for specific arch and move to archs specific file "arch/x86/ann

[PATCH V7 06/18] tools/perf: Update parameters for reg extract functions to use raw instruction on powerpc

2024-07-13 Thread Athira Rajeev
Use the raw instruction code and macros to identify memory instructions, extract register fields and also offset. The implementation addresses the D-form, X-form, DS-form instructions. Adds "mem_ref" field to check whether source/target has memory reference. Add function "get_powerpc_regs" which wi

[PATCH V7 07/18] tools/perf: Add parse function for memory instructions in powerpc

2024-07-13 Thread Athira Rajeev
Use the raw instruction code and macros to identify memory instructions, extract register fields and also offset. The implementation addresses the D-form, X-form, DS-form instructions. Two main functions are added. New parse function "load_store__parse" as instruction ops parser for memory instruct
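
As a rough illustration of the field extraction described above, a D-form load such as "lwz r10,0(r9)" could be decoded from its raw 32-bit word along the following lines. The struct and function names here are hypothetical and are not taken from the patch; only the field positions (primary opcode in bits 0-5, RT in 6-10, RA in 11-15, a signed 16-bit displacement in 16-31, IBM bit numbering) follow the Power ISA D-form layout.

```c
#include <stdint.h>

/* Hypothetical container for decoded D-form fields (not from the patch). */
struct dform_fields {
	unsigned int opcode;	/* bits 0-5: primary opcode */
	unsigned int rt;	/* bits 6-10: target register */
	unsigned int ra;	/* bits 11-15: base register */
	int offset;		/* bits 16-31: signed displacement */
};

/* Split a raw 32-bit instruction word into its D-form fields. */
struct dform_fields decode_dform(uint32_t insn)
{
	struct dform_fields f;
	int d = (int)(insn & 0xffff);

	/* Sign-extend the 16-bit displacement by hand. */
	if (d & 0x8000)
		d -= 0x10000;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.offset = d;
	return f;
}
```

For example, the word 0x81490000 is the encoding of "lwz r10,0(r9)": primary opcode 32, RT = 10, RA = 9 and a displacement of 0.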

[PATCH V7 08/18] tools/perf: Add support to identify memory instructions of opcode 31 in powerpc

2024-07-13 Thread Athira Rajeev
There are memory instructions in powerpc with opcode as 31. Example: "ldx RT,RA,RB". Its X form is as below:

 ______________________________
| 31 | RT | RA | RB | 21 | / |
 ------------------------------
0    6    11   16   21   30  31

The opcode for "ldx" i
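
To make the X-form layout concrete, the sketch below pulls the primary opcode, the three register fields and the extended opcode out of a raw instruction word, then checks for "ldx" (primary opcode 31, extended opcode 21). The names are hypothetical and are not the macros used in the patch; the bit positions follow the layout quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for decoded X-form fields (not from the patch). */
struct xform_fields {
	unsigned int opcode;	/* bits 0-5: primary opcode */
	unsigned int rt;	/* bits 6-10 */
	unsigned int ra;	/* bits 11-15 */
	unsigned int rb;	/* bits 16-20 */
	unsigned int xo;	/* bits 21-30: extended opcode */
};

/* Split a raw 32-bit instruction word into its X-form fields. */
struct xform_fields decode_xform(uint32_t insn)
{
	struct xform_fields f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.rb = (insn >> 11) & 0x1f;
	f.xo = (insn >> 1) & 0x3ff;
	return f;
}

/* "ldx" is primary opcode 31 with extended opcode 21. */
bool is_ldx(uint32_t insn)
{
	struct xform_fields f = decode_xform(insn);

	return f.opcode == 31 && f.xo == 21;
}
```

For instance, 0x7c64282a is the encoding of "ldx r3,r4,r5", so decode_xform() yields RT = 3, RA = 4, RB = 5 for it.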

[PATCH V7 09/18] tools/perf: Add some of the arithmetic instructions to support instruction tracking in powerpc

2024-07-13 Thread Athira Rajeev
Data type profiling has the concept of instruction tracking. Example sequence in powerpc: ld r10,264(r3) mr r31,r3 < ld r9,312(r31) or differently lwz r10,264(r3) add r31, r3, RB lwz r9, 0(r31) If a sample is hit at

[PATCH V7 10/18] tools/perf: Add more instructions for instruction tracking

2024-07-13 Thread Athira Rajeev
Add a few more instructions and use opcode as search key to find if it is supported by the architecture. Added ones are: addi, addic, addic., addis, subfic and mulli Signed-off-by: Athira Rajeev --- tools/perf/arch/powerpc/annotate/instructions.c | 14 ++ 1 file changed, 14 insertions(

[PATCH V7 11/18] tools/perf: Update instruction tracking for powerpc

2024-07-13 Thread Athira Rajeev
Add instruction tracking function "update_insn_state_powerpc" for powerpc. Example sequence in powerpc: ld r10,264(r3) mr r31,r3 < ld r9,312(r31) Consider the sample is pointing to: "ld r9,312(r31)". Here the memory reference is hit at "312(r31)" where 312 is the offset and r31 is
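
The idea behind the "mr r31,r3" step can be sketched as a per-register table whose entry is copied from the source register to the destination register, so that a later access through r31 can still be resolved to the type that was known for r3. This is a simplified stand-in, not the perf implementation: the structures, the NUM_GPRS constant and the function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_GPRS 32	/* powerpc has 32 general purpose registers */

/* Hypothetical per-register state: is a type known, and which one. */
struct reg_state {
	bool known;
	const char *type_name;
};

/* Hypothetical tracking table, one slot per general purpose register. */
struct type_state {
	struct reg_state regs[NUM_GPRS];
};

/* Record that register "reg" currently holds a pointer of the given type. */
void track_set(struct type_state *st, int reg, const char *type_name)
{
	st->regs[reg].known = true;
	st->regs[reg].type_name = type_name;
}

/* "mr dst,src": the destination inherits whatever is known about src. */
void track_mr(struct type_state *st, int dst, int src)
{
	st->regs[dst] = st->regs[src];
}

/* Any untracked write to a register invalidates what was known about it. */
void track_clobber(struct type_state *st, int reg)
{
	st->regs[reg].known = false;
	st->regs[reg].type_name = NULL;
}
```

With r3 marked as holding a known type, calling track_mr(&st, 31, 3) leaves r31 carrying the same type, which is what lets "ld r9,312(r31)" be attributed to offset 312 of that type.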

[PATCH V7 12/18] tools/perf: Make capstone_init non-static so that it can be used during symbol disassemble

2024-07-13 Thread Athira Rajeev
symbol__disassemble_capstone in util/disasm.c calls function open_capstone_handle to open/init the capstone. We already have a capstone_init function in "util/print_insn.c". But capstone_init is defined as a static function in util/print_insn.c. Change this and also add the function in print_insn.h

[PATCH V7 13/18] tools/perf: Use capstone_init and remove open_capstone_handle from disasm.c

2024-07-13 Thread Athira Rajeev
capstone_init is made available for all archs to use and updated to enable support for CS_ARCH_PPC as well. Patch removes open_capstone_handle and uses capstone_init in all the places. Signed-off-by: Athira Rajeev --- tools/perf/util/disasm.c | 42 +++- tools/p

[PATCH V7 14/18] tools/perf: Add support to use libcapstone in powerpc

2024-07-13 Thread Athira Rajeev
Now perf uses the capstone library to disassemble the instructions in x86. capstone is used (if available) for perf annotate to speed up. Currently it only supports x86 architecture. Patch includes changes to enable this in powerpc. For now, only for data type sort keys, this method is used and onl

[PATCH V7 15/18] tools/perf: Add support to find global register variables using find_data_type_global_reg

2024-07-13 Thread Athira Rajeev
There are cases where a global register variable is defined and associated with a specified register. Example, in powerpc, two registers are defined to represent variable: 1. r13: represents local_paca register struct paca_struct *local_paca asm("r13"); 2. r1: represents stack_pointer register void

[PATCH V7 16/18] tools/perf: Add support for global_die to capture name of variable in case of register defined variable

2024-07-13 Thread Athira Rajeev
In case of register defined variable (found using find_data_type_global_reg), if the type of variable happens to be base type (example, long unsigned int), perf report captures it as: 12.85% long unsigned int long unsigned int +0 (no field) The above data type is actually referring to sampl

[PATCH V7 17/18] tools/perf: Update data_type_cmp and sort__typeoff_sort function to include var_name in comparison

2024-07-13 Thread Athira Rajeev
Currently data_type_cmp() only compares size and type name. But in cases where the type name of two data type entries is same, but var_name is different, the comparison can't distinguish two different types. Consider there is a "long unsigned int" with var_name as "X" and there is global variable
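
The problem described above can be illustrated with a small comparison function that falls back to the variable name when type name and size are equal. This is only a sketch under assumed names: struct type_entry and type_entry_cmp() are hypothetical and do not reproduce perf's data_type_cmp(); the order of the first two comparisons is also an assumption.

```c
#include <string.h>

/* Hypothetical entry: a type name, an optional variable name, and a size. */
struct type_entry {
	const char *type_name;
	const char *var_name;	/* may be NULL when no variable name is known */
	int size;
};

/* Compare by type name, then size, then variable name as a tie-breaker. */
int type_entry_cmp(const struct type_entry *a, const struct type_entry *b)
{
	int ret = strcmp(a->type_name, b->type_name);

	if (ret)
		return ret;

	if (a->size != b->size)
		return a->size < b->size ? -1 : 1;

	/* Treat a missing variable name as an empty string. */
	return strcmp(a->var_name ? a->var_name : "",
		      b->var_name ? b->var_name : "");
}
```

Without the last step, two "long unsigned int" entries of the same size but with different variable names would compare as equal and be merged into one.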

[PATCH V7 18/18] tools/perf: Set instruction name to be used with insn-stat when using raw instruction

2024-07-13 Thread Athira Rajeev
Since the "ins.name" is not set while using raw instruction, perf annotate with insn-stat gives wrong data: Result from "./perf annotate --data-type --insn-stat": Annotate Instruction stats total 615, ok 419 (68.1%), bad 196 (31.9%) Name : Good Bad ---

Re: [PATCH V6 17/18] tools/perf: Update data_type_cmp and sort__typeoff_sort function to include var_name in comparison

2024-07-13 Thread Athira Rajeev
> On 13 Jul 2024, at 8:25 PM, Namhyung Kim wrote: > > On Sat, Jul 13, 2024 at 11:52:30AM +0530, Athira Rajeev wrote: >> >> >>> On 13 Jul 2024, at 2:55 AM, Namhyung Kim wrote: >>> >>> On Sun, Jul 07, 2024 at 08:14:18PM +0530, Athira Rajeev wrote: Currently data_type_cmp() only compares

Re: [PATCH net-next 2/5] net: dpaa: eliminate NR_CPUS dependency in egress_fqs[] and conf_fqs[]

2024-07-13 Thread Jakub Kicinski
On Thu, 11 Jul 2024 02:00:22 +0300 Vladimir Oltean wrote: > + priv->egress_fqs = devm_kcalloc(dev, dpaa_max_num_txqs(), > + sizeof(*priv->egress_fqs), > + GFP_KERNEL); > + if (!priv->egress_fqs) > + goto fre

Re: [PATCH net-next 2/5] net: dpaa: eliminate NR_CPUS dependency in egress_fqs[] and conf_fqs[]

2024-07-13 Thread Vladimir Oltean
On Sat, Jul 13, 2024 at 03:35:32PM -0700, Jakub Kicinski wrote: > On Thu, 11 Jul 2024 02:00:22 +0300 Vladimir Oltean wrote: > > + priv->egress_fqs = devm_kcalloc(dev, dpaa_max_num_txqs(), > > + sizeof(*priv->egress_fqs), > > + GF

[PATCH v2 net-next 5/5] soc: fsl: qbman: FSL_DPAA depends on COMPILE_TEST

2024-07-13 Thread Vladimir Oltean
From: Breno Leitao Like most of the drivers that depend on ARCH_LAYERSCAPE, make FSL_DPAA depend on COMPILE_TEST for compilation and testing. # grep -r depends.\*ARCH_LAYERSCAPE.\*COMPILE_TEST | wc -l 29 Signed-off-by: Breno Leitao Signed-off-by: Vladimir Oltean Acked-by: Madali

[PATCH v2 net-next 2/5] net: dpaa: eliminate NR_CPUS dependency in egress_fqs[] and conf_fqs[]

2024-07-13 Thread Vladimir Oltean
The driver uses the DPAA_TC_TXQ_NUM and DPAA_ETH_TXQ_NUM macros for TX queue handling, and they depend on CONFIG_NR_CPUS. In generic .config files, these can go to very large (8096 CPUs) values for the systems that DPAA1 is integrated in (1-24 CPUs). We allocate a lot of resources that will never
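
The general shape of the change, sizing a table from a runtime queue count instead of a compile-time constant tied to CONFIG_NR_CPUS, can be sketched in userspace C as below. This is an analogue only: it uses calloc() where the driver would use a kernel allocator, and struct fq_table, fq_table_init() and fq_table_free() are hypothetical names, not code from the patch.

```c
#include <stdlib.h>

/* Hypothetical table of queue pointers, sized at runtime. */
struct fq_table {
	size_t num_txqs;
	void **egress_fqs;
};

/* Allocate exactly num_txqs zeroed slots; returns 0 on success, -1 on failure. */
int fq_table_init(struct fq_table *t, size_t num_txqs)
{
	t->egress_fqs = calloc(num_txqs, sizeof(*t->egress_fqs));
	if (!t->egress_fqs) {
		t->num_txqs = 0;
		return -1;
	}

	t->num_txqs = num_txqs;
	return 0;
}

/* Release the table and reset it to an empty state. */
void fq_table_free(struct fq_table *t)
{
	free(t->egress_fqs);
	t->egress_fqs = NULL;
	t->num_txqs = 0;
}
```

The memory used then scales with the number of queues the hardware actually has rather than with whatever CONFIG_NR_CPUS happens to be in a generic .config.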

[PATCH v2 net-next 3/5] net: dpaa: stop ignoring TX queues past the number of CPUs

2024-07-13 Thread Vladimir Oltean
dpaa_fq_setup() iterates through the queues allocated by dpaa_alloc_all_fqs() and saved in &priv->dpaa_fq_list. The allocation for FQ_TYPE_TX looks as follows: if (!dpaa_fq_alloc(dev, 0, dpaa_max_num_txqs(), list, FQ_TYPE_TX)) goto fq_alloc_failed; Thus, iterating again t

[PATCH v2 net-next 4/5] net: dpaa: no need to make sure all CPUs receive a corresponding Tx queue

2024-07-13 Thread Vladimir Oltean
dpaa_fq_setup() iterates through the &priv->dpaa_fq_list elements allocated by dpaa_alloc_all_fqs(). This includes a call to: if (!dpaa_fq_alloc(dev, 0, dpaa_max_num_txqs(), list, FQ_TYPE_TX)) goto fq_alloc_failed; which gives us dpaa_max_num_txqs() elements of FQ_TYPE_TX

[PATCH v2 net-next 1/5] net: dpaa: avoid on-stack arrays of NR_CPUS elements

2024-07-13 Thread Vladimir Oltean
The dpaa-eth driver is written for PowerPC and Arm SoCs which have 1-24 CPUs. It depends on CONFIG_NR_CPUS having a reasonably small value in Kconfig. Otherwise, there are 2 functions which allocate on-stack arrays of NR_CPUS elements, and these can quickly explode in size, leading to warnings such

[PATCH v2 net-next 0/5] Eliminate CONFIG_NR_CPUS dependency in dpaa-eth and enable COMPILE_TEST in fsl_qbman

2024-07-13 Thread Vladimir Oltean
Breno's previous attempt at enabling COMPILE_TEST for the fsl_qbman driver (now included here as patch 5/5) triggered compilation warnings for large CONFIG_NR_CPUS values: https://lore.kernel.org/all/202406261920.l5pzm1rj-...@intel.com/ Patch 1/5 switches two NR_CPUS arrays in the dpaa-eth driver