On Fri, Jul 12 2024 at 15:16, Vignesh Balasubramanian wrote:
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index 1fb83d477..cad37090b 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -13,6 +13,15 @@
> #include
> #include
>
> +struct
lmodconfig gcc-13.2.0
arc allnoconfig gcc-13.2.0
arc allyesconfig gcc-13.2.0
arc axs103_defconfig gcc-13.2.0
arc defconfig gcc-13.2.0
arc randconfig-00
arc randconfig-001-20240713 gcc-13.2.0
arc randconfig-002-20240713 gcc-13.2.0
arm allmodconfig gcc-13.2.0
arm allnoconfig gcc-13.2.0
arm allyesconfig gcc-13.2.0
arm
On Sat, Jul 13, 2024 at 11:52:30AM +0530, Athira Rajeev wrote:
>
>
> > On 13 Jul 2024, at 2:55 AM, Namhyung Kim wrote:
> >
> > On Sun, Jul 07, 2024 at 08:14:18PM +0530, Athira Rajeev wrote:
> >> Currently data_type_cmp() only compares size and type name.
> >> But in cases where the type name of
The patchset from Namhyung added support for data type profiling
in the perf tool. This made it possible to associate PMU samples with the
data types they refer to, using DWARF debug information. With upstream
perf, it is currently possible to run perf report or perf annotate to
view the data type information on
TYPE_STATE_MAX_REGS is arch-dependent. Currently this is defined
to be 16. When has_reg_type checks whether a reg is valid, the
upper bound it uses is TYPE_STATE_MAX_REGS. Define
this conditionally for powerpc.
Signed-off-by: Athira Rajeev
---
tools/perf/util/annotate-data.h | 4
1 f
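
A minimal sketch of what a conditional definition could look like. The value 32 for powerpc is an assumption (powerpc has 32 general purpose registers); the commit message above only states the current value of 16. The structures are simplified stand-ins, not perf's actual definitions.

```c
#include <stdbool.h>

/*
 * Assumption: 32 on powerpc, matching its 32 general purpose registers.
 * The commit message only confirms the existing value of 16.
 */
#if defined(__powerpc__) || defined(__powerpc64__)
#define TYPE_STATE_MAX_REGS 32
#else
#define TYPE_STATE_MAX_REGS 16
#endif

/* Simplified stand-in for the per-register type state. */
struct type_state_reg {
	bool ok;	/* true when the register holds a known type */
};

struct type_state {
	struct type_state_reg regs[TYPE_STATE_MAX_REGS];
};

/* Return true only for an in-range register that holds a known type. */
static bool has_reg_type(const struct type_state *state, int reg)
{
	return reg >= 0 && reg < TYPE_STATE_MAX_REGS && state->regs[reg].ok;
}
```

With a fixed bound of 16, a powerpc register such as r31 would always be rejected by the range check, which is why the bound has to follow the architecture.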
Data type profiling uses instruction tracking by checking each
instruction and updating the register type state in some data
structures. This is useful to find the data type in cases where the
register state gets transferred from one reg to another. For example, in
x86, "mov" instruction and in powerpc,
Currently, the perf tool infrastructure uses the disasm_line__parse
function to parse a disassembled line.
Example snippet from objdump:
objdump --start-address= --stop-address= -d
--no-show-raw-insn -C
c10224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction
Add support to capture and parse raw instruction in powerpc.
Currently, the perf tool infrastructure uses two ways to disassemble
and understand the instruction. One is objdump and the other is
libcapstone.
Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
with "objdu
Add "update_insn_state" callback to "struct arch" to handle instruction
tracking. Currently updating instruction state is handled by static
function "update_insn_state_x86" which is defined in "annotate-data.c".
Make this a callback for each arch and move it to the arch-specific
file "arch/x86/ann
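
A rough sketch of the callback arrangement described above, in user-space C. Only the names "update_insn_state", "update_insn_state_x86" and "struct arch" come from the commit message; the signature, the struct fields and the dispatch helper are hypothetical and much simpler than perf's real ones.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced type state: just counts handled instructions. */
struct type_state {
	int nr_updates;
};

struct arch {
	const char *name;
	/* Per-arch hook; NULL when the arch has no instruction tracking. */
	void (*update_insn_state)(struct type_state *state, const char *insn);
};

/* x86 handler: in this sketch it only reacts to "mov"-style mnemonics. */
static void update_insn_state_x86(struct type_state *state, const char *insn)
{
	if (!strncmp(insn, "mov", 3))
		state->nr_updates++;
}

static const struct arch arch_x86 = {
	.name = "x86",
	.update_insn_state = update_insn_state_x86,
};

/* An arch without tracking support leaves the hook unset. */
static const struct arch arch_untracked = {
	.name = "untracked",
};

/* Generic code calls through the hook instead of a hard-coded function. */
static void arch_update_insn_state(const struct arch *arch,
				   struct type_state *state, const char *insn)
{
	if (arch->update_insn_state)
		arch->update_insn_state(state, insn);
}
```

The point of the indirection is that the generic annotate code no longer needs to know which architecture it is handling; a powerpc handler can be plugged in the same way.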
Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset. The implementation addresses
the D-form, X-form, DS-form instructions. Adds "mem_ref" field to check
whether source/target has memory reference. Add function
"get_powerpc_regs" which wi
Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset. The implementation addresses
the D-form, X-form, DS-form instructions. Two main functions are added.
New parse function "load_store__parse" as instruction ops parser for
memory instruct
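
As an illustration of extracting register fields and the offset from a raw instruction word, here is a small sketch for the D-form and DS-form layouts. The helper names are hypothetical and are not taken from the patch; the bit positions follow the standard powerpc encoding (primary opcode in the top 6 bits, then two 5-bit register fields, then a 16-bit displacement).

```c
#include <stdint.h>

/* Primary opcode: the top 6 bits of the 32-bit instruction word. */
static unsigned int ppc_primary_opcode(uint32_t insn)
{
	return (insn >> 26) & 0x3f;
}

/* RT: target register of a load, or source register of a store. */
static unsigned int ppc_rt(uint32_t insn)
{
	return (insn >> 21) & 0x1f;
}

/* RA: base register of the memory reference. */
static unsigned int ppc_ra(uint32_t insn)
{
	return (insn >> 16) & 0x1f;
}

/* Sign-extend a 16-bit field without relying on narrowing casts. */
static int32_t sign_extend16(uint32_t v)
{
	int32_t r = (int32_t)(v & 0xffff);

	return (r & 0x8000) ? r - 0x10000 : r;
}

/* D-form: the low 16 bits are a signed byte displacement. */
static int32_t ppc_d_offset(uint32_t insn)
{
	return sign_extend16(insn);
}

/* DS-form: the low 2 bits are part of the opcode, so mask them off. */
static int32_t ppc_ds_offset(uint32_t insn)
{
	return sign_extend16(insn & 0xfffc);
}
```

For "lwz r10,0(r9)" (primary opcode 32), these helpers would yield RT = 10, RA = 9 and an offset of 0.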
There are memory instructions in powerpc with opcode as 31.
Example: "ldx RT,RA,RB". Its X-form is as below:
______________________________________
| 31 |  RT  |  RA  |  RB  |   21   |/|
--------------------------------------
0    6      11     16     21     30 31
The opcode for "ldx" i
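
The X-form layout above can be decoded with a few shifts and masks. This is only a sketch: the struct and function names are hypothetical, and the patch's real macros may differ. It assumes the standard encoding in which the 10-bit extended opcode sits just above the lowest bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an X-form instruction, named after the diagram. */
struct xform_fields {
	unsigned int opcode;	/* primary opcode, bits 0-5 */
	unsigned int rt;	/* bits 6-10 */
	unsigned int ra;	/* bits 11-15 */
	unsigned int rb;	/* bits 16-20 */
	unsigned int xo;	/* extended opcode, bits 21-30 */
};

static struct xform_fields decode_xform(uint32_t insn)
{
	struct xform_fields f = {
		.opcode = (insn >> 26) & 0x3f,
		.rt = (insn >> 21) & 0x1f,
		.ra = (insn >> 16) & 0x1f,
		.rb = (insn >> 11) & 0x1f,
		.xo = (insn >> 1) & 0x3ff,
	};

	return f;
}

/* "ldx" shares primary opcode 31 with others; 21 is what identifies it. */
static bool is_ldx(uint32_t insn)
{
	struct xform_fields f = decode_xform(insn);

	return f.opcode == 31 && f.xo == 21;
}
```

Because many instructions share primary opcode 31, the extended opcode has to be part of the lookup; the primary opcode alone cannot tell "ldx" apart from its neighbours.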
Data type profiling has the concept of instruction tracking.
Example sequence in powerpc:
ld r10,264(r3)
mr r31,r3
<
ld r9,312(r31)
or differently
lwz r10,264(r3)
add r31, r3, RB
lwz r9, 0(r31)
If a sample is hit at
Add a few more instructions and use the opcode as the search key
to find whether it is supported by the architecture. The added ones
are: addi, addic, addic., addis, subfic and mulli
Signed-off-by: Athira Rajeev
---
tools/perf/arch/powerpc/annotate/instructions.c | 14 ++
1 file changed, 14 insertions(
Add instruction tracking function "update_insn_state_powerpc" for
powerpc. Example sequence in powerpc:
ld r10,264(r3)
mr r31,r3
<
ld r9,312(r31)
Consider that the sample is pointing to: "ld r9,312(r31)".
Here the memory reference is hit at "312(r31)" where 312 is the offset
and r31 is
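
To make the tracking idea concrete, the following sketch shows how a register move could carry the known type from one register to another, so that a later access through r31 can still be resolved. Everything here (struct layout, the integer "type_id", the function names) is hypothetical and far simpler than the real "update_insn_state_powerpc".

```c
#include <stdbool.h>

#define NR_TRACKED_REGS 32	/* assumption: one slot per powerpc GPR */

/* Hypothetical per-register state: a type is identified by an integer. */
struct reg_state {
	bool ok;	/* true when the register's type is known */
	int type_id;
};

struct type_state {
	struct reg_state regs[NR_TRACKED_REGS];
};

static bool valid_reg(int reg)
{
	return reg >= 0 && reg < NR_TRACKED_REGS;
}

/* "mr dst,src" copies the value, so the tracked type moves along with it. */
static void track_mr(struct type_state *st, int dst, int src)
{
	if (!valid_reg(dst) || !valid_reg(src))
		return;
	st->regs[dst] = st->regs[src];
}

/* Look up the type behind a base register; return -1 when unknown. */
static int base_reg_type(const struct type_state *st, int reg)
{
	if (!valid_reg(reg) || !st->regs[reg].ok)
		return -1;
	return st->regs[reg].type_id;
}
```

In the example sequence, if r3 is known to point to some structure, then after "mr r31,r3" a lookup on r31 returns the same type, and the offset 312 can be resolved against that structure.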
symbol__disassemble_capstone in util/disasm.c calls function
open_capstone_handle to open/init the capstone. We already have a
capstone_init function in "util/print_insn.c". But capstone_init
is defined as a static function in util/print_insn.c. Change this and
also add the function in print_insn.h.
capstone_init is made available for all archs to use and updated to
enable support for CS_ARCH_PPC as well. Patch removes
open_capstone_handle and uses capstone_init in all the places.
Signed-off-by: Athira Rajeev
---
tools/perf/util/disasm.c | 42 +++-
tools/p
Now perf uses the capstone library to disassemble the instructions in
x86. capstone is used (if available) to speed up perf annotate.
Currently it only supports the x86 architecture. This patch includes changes to
enable this in powerpc. For now, only for data type sort keys, this
method is used and onl
There are cases where a global register variable is defined and associated
with a specified register. For example, in powerpc, two registers are
defined to represent variables:
1. r13: represents local_paca
register struct paca_struct *local_paca asm("r13");
2. r1: represents stack_pointer
register void
In the case of a register-defined variable (found using
find_data_type_global_reg), if the type of the variable happens to be a base
type (for example, long unsigned int), perf report captures it as:
12.85% long unsigned int long unsigned int +0 (no field)
The above data type is actually referring to sampl
Currently data_type_cmp() only compares size and type name.
But in cases where the type name of two data type entries
is the same but var_name is different, the comparison can't distinguish
the two different types.
Consider there is a "long unsigned int" with var_name as "X" and there
is global variable
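
A comparison that also takes var_name into account could look roughly like the sketch below. The struct and function names are hypothetical and the ordering of the checks is an assumption; it is meant only to show why adding var_name as a final tie-breaker separates otherwise identical entries.

```c
#include <string.h>

/* Hypothetical reduced key; perf's real data type entry has more fields. */
struct data_type_key {
	const char *type_name;
	int size;
	const char *var_name;	/* may be NULL when no variable name is known */
};

static int data_type_key_cmp(const struct data_type_key *a,
			     const struct data_type_key *b)
{
	int ret;

	/* Existing behaviour: compare size first, then type name. */
	if (a->size != b->size)
		return a->size < b->size ? -1 : 1;

	ret = strcmp(a->type_name, b->type_name);
	if (ret)
		return ret;

	/* New tie-breaker: an entry without a var_name sorts first. */
	if (!a->var_name || !b->var_name)
		return (a->var_name != NULL) - (b->var_name != NULL);

	return strcmp(a->var_name, b->var_name);
}
```

With only the first two checks, two "long unsigned int" entries of the same size would compare equal and be merged, even when they belong to different variables.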
Since the "ins.name" is not set while using raw instruction,
perf annotate with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)
Name : Good Bad
---
> On 13 Jul 2024, at 8:25 PM, Namhyung Kim wrote:
>
> On Sat, Jul 13, 2024 at 11:52:30AM +0530, Athira Rajeev wrote:
>>
>>
>>> On 13 Jul 2024, at 2:55 AM, Namhyung Kim wrote:
>>>
>>> On Sun, Jul 07, 2024 at 08:14:18PM +0530, Athira Rajeev wrote:
Currently data_type_cmp() only compares
On Thu, 11 Jul 2024 02:00:22 +0300 Vladimir Oltean wrote:
> + priv->egress_fqs = devm_kcalloc(dev, dpaa_max_num_txqs(),
> + sizeof(*priv->egress_fqs),
> + GFP_KERNEL);
> + if (!priv->egress_fqs)
> + goto fre
On Sat, Jul 13, 2024 at 03:35:32PM -0700, Jakub Kicinski wrote:
> On Thu, 11 Jul 2024 02:00:22 +0300 Vladimir Oltean wrote:
> > + priv->egress_fqs = devm_kcalloc(dev, dpaa_max_num_txqs(),
> > + sizeof(*priv->egress_fqs),
> > + GF
From: Breno Leitao
Like most of the drivers that depend on ARCH_LAYERSCAPE, make FSL_DPAA
depend on COMPILE_TEST for compilation and testing.
# grep -r depends.\*ARCH_LAYERSCAPE.\*COMPILE_TEST | wc -l
29
Signed-off-by: Breno Leitao
Signed-off-by: Vladimir Oltean
Acked-by: Madali
The driver uses the DPAA_TC_TXQ_NUM and DPAA_ETH_TXQ_NUM macros for TX
queue handling, and they depend on CONFIG_NR_CPUS.
In generic .config files, these can reach very large values (8096 CPUs)
compared to the systems that DPAA1 is integrated in (1-24 CPUs). We allocate a
lot of resources that will never
dpaa_fq_setup() iterates through the queues allocated by dpaa_alloc_all_fqs()
and saved in &priv->dpaa_fq_list.
The allocation for FQ_TYPE_TX looks as follows:
if (!dpaa_fq_alloc(dev, 0, dpaa_max_num_txqs(), list, FQ_TYPE_TX))
goto fq_alloc_failed;
Thus, iterating again t
dpaa_fq_setup() iterates through the &priv->dpaa_fq_list elements
allocated by dpaa_alloc_all_fqs(). This includes a call to:
if (!dpaa_fq_alloc(dev, 0, dpaa_max_num_txqs(), list, FQ_TYPE_TX))
goto fq_alloc_failed;
which gives us dpaa_max_num_txqs() elements of FQ_TYPE_TX
The dpaa-eth driver is written for PowerPC and Arm SoCs which have 1-24
CPUs. It depends on CONFIG_NR_CPUS having a reasonably small value in
Kconfig. Otherwise, there are 2 functions which allocate on-stack arrays
of NR_CPUS elements, and these can quickly explode in size, leading to
warnings such
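
The general remedy for an on-stack array sized by a compile-time maximum is to size the storage at run time instead. The user-space sketch below illustrates that idea only; it is not the driver's code, the function name is hypothetical, and calloc stands in for the kernel allocators.

```c
#include <stdlib.h>

/*
 * Instead of "unsigned int ids[NR_CPUS];" on the stack, allocate exactly
 * as many elements as the running system needs. The caller frees the
 * result. Returns NULL on allocation failure or when nr_cpus is 0.
 */
static unsigned int *alloc_cpu_ids(unsigned int nr_cpus)
{
	unsigned int *ids;
	unsigned int i;

	if (nr_cpus == 0)
		return NULL;

	ids = calloc(nr_cpus, sizeof(*ids));
	if (!ids)
		return NULL;

	/* Fill with an identity mapping as placeholder content. */
	for (i = 0; i < nr_cpus; i++)
		ids[i] = i;

	return ids;
}
```

On a 24-CPU system this uses 24 elements regardless of how large the compile-time limit is, so the stack frame no longer grows with CONFIG_NR_CPUS.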
Breno's previous attempt at enabling COMPILE_TEST for the fsl_qbman
driver (now included here as patch 5/5) triggered compilation warnings
for large CONFIG_NR_CPUS values:
https://lore.kernel.org/all/202406261920.l5pzm1rj-...@intel.com/
Patch 1/5 switches two NR_CPUS arrays in the dpaa-eth driver