On Thu, May 9, 2024 at 10:27 AM Athira Rajeev
<atraj...@linux.vnet.ibm.com> wrote:
>
>
>
> > On 7 May 2024, at 3:05 PM, Christophe Leroy <christophe.le...@csgroup.eu> 
> > wrote:
> >
> >
> >
> > Le 06/05/2024 à 14:19, Athira Rajeev a écrit :
> >> Add support to capture and parse raw instruction in objdump.
> >
> > What's the purpose of using 'objdump' for reading raw instructions ?
> > Can't they be read directly without invoking 'objdump' ? It looks odd to
> > me to use objdump to provide readable text and then parse it back.
>
> Hi Christophe,
>
> Thanks for your review comments.
>
> Current implementation for data type profiling on X86 uses "objdump" tool to 
> get the disassembled code.
> And then the objdump result lines are parsed to get the instruction name and 
> register fields. The initial patchset I posted to enable the data type 
> profiling feature in powerpc was using the same way by getting disassembled 
> code from objdump and parsing the disassembled lines. But in V2, we are 
> introducing change for powerpc to use "raw instruction" and fetch opcode, reg 
> fields from the raw instruction.
>
> I tried to explain below that current objdump uses option 
> "--no-show-raw-insn" which doesn't capture raw instruction.  So to capture 
> raw instruction, V2 patchset has changes to use default option 
> "--show-raw-insn" and get the raw instruction [ for powerpc ] along with 
> human readable annotation [ which is used by other archs ]. Since perf tool 
> already has objdump implementation in place, I went in the direction to 
> enhance it to use "--show-raw-insn" for powerpc purpose.
>
> But as you mentioned, we can directly read raw instruction without using 
> "objdump" tool.
> perf has support to read object code. The dso open/read utilities and helper 
> functions are already present in "util/dso.c" And "dso__data_read_offset" 
> function reads data from dso file offset. We can use these functions and I 
> can make changes to directly read binary instruction without using objdump.
>
> Namhyung, Arnaldo, Christophe
> Looking for your valuable feedback on this approach. Please suggest if this 
> approach looks fine

Looks like you want to implement instruction decoding
like in arch/x86/lib/{insn,inat}.c.  I think it's ok to do that
but you need to decide which way is more convenient.

Also it works on the struct disasm_line so you need to
fill in the necessary info when not using objdump.  As
long as it produces the same output I don't care much
if you use objdump or not.  Actually it uses libcapstone
to disassemble x86 instructions if possible.  Maybe you
can use that on powerpc too.

Thanks,
Namhyung

>
>
> Thanks
> Athira
> >
> >> Currently, the perf tool infrastructure uses "--no-show-raw-insn" option
> >> with "objdump" while disassemble. Example from powerpc with this option
> >> for an instruction address is:
> >
> > Yes and that makes sense because the purpose of objdump is to provide
> > human readable annotations, not to perform automated analysis. Am I
> > missing something ?
> >
> >>
> >> Snippet from:
> >> objdump  --start-address=<address> --stop-address=<address>  -d 
> >> --no-show-raw-insn -C <vmlinux>
> >>
> >> c0000000010224b4: lwz     r10,0(r9)
> >>
> >> This line "lwz r10,0(r9)" is parsed to extract instruction name,
> >> registers names and offset. Also to find whether there is a memory
> >> reference in the operands, "memory_ref_char" field of objdump is used.
> >> For x86, "(" is used as memory_ref_char to tackle instructions of the
> >> form "mov  (%rax), %rcx".
> >>
> >> In case of powerpc, not all instructions using "(" are the only memory
> >> instructions. Example, above instruction can also be of extended form (X
> >> form) "lwzx r10,0,r19". Inorder to easy identify the instruction category
> >> and extract the source/target registers, patch adds support to use raw
> >> instruction. With raw instruction, macros are added to extract opcode
> >> and register fields.
> >>
> >> "struct ins_operands" and "struct ins" is updated to carry opcode and
> >> raw instruction binary code (raw_insn). Function "disasm_line__parse"
> >> is updated to fill the raw instruction hex value and opcode in newly
> >> added fields. There is no changes in existing code paths, which parses
> >> the disassembled code. The architecture using the instruction name and
> >> present approach is not altered. Since this approach targets powerpc,
> >> the macro implementation is added for powerpc as of now.
> >>
> >> Example:
> >> representation using --show-raw-insn in objdump gives result:
> >>
> >> 38 01 81 e8     ld      r4,312(r1)
> >>
> >> Here "38 01 81 e8" is the raw instruction representation. In powerpc,
> >> this translates to instruction form: "ld RT,DS(RA)" and binary code
> >> as:
> >> _____________________________________
> >> | 58 |  RT  |  RA |      DS       | |
> >> -------------------------------------
> >> 0    6     11    16              30 31
> >>
> >> Function "disasm_line__parse" is updated to capture:
> >>
> >> line:    38 01 81 e8     ld      r4,312(r1)
> >> opcode and raw instruction "38 01 81 e8"
> >> Raw instruction is used later to extract the reg/offset fields.
> >>
> >> Signed-off-by: Athira Rajeev <atraj...@linux.vnet.ibm.com>
> >> ---
>

Reply via email to