On Thu, May 9, 2024 at 10:27 AM Athira Rajeev <atraj...@linux.vnet.ibm.com> wrote:
>
> > On 7 May 2024, at 3:05 PM, Christophe Leroy <christophe.le...@csgroup.eu>
> > wrote:
> >
> > On 06/05/2024 at 14:19, Athira Rajeev wrote:
> >> Add support to capture and parse raw instruction in objdump.
> >
> > What's the purpose of using 'objdump' for reading raw instructions?
> > Can't they be read directly without invoking 'objdump'? It looks odd to
> > me to use objdump to provide readable text and then parse it back.
>
> Hi Christophe,
>
> Thanks for your review comments.
>
> The current implementation of data type profiling on x86 uses the "objdump"
> tool to get the disassembled code. The objdump output lines are then parsed
> to get the instruction name and register fields. The initial patchset I
> posted to enable the data type profiling feature on powerpc used the same
> approach: getting the disassembled code from objdump and parsing the
> disassembled lines. In V2, we are introducing a change for powerpc to use
> the "raw instruction" and fetch the opcode and register fields from it.
>
> As I tried to explain below, perf currently invokes objdump with the
> "--no-show-raw-insn" option, which doesn't capture the raw instruction. So
> to capture it, the V2 patchset switches to the default option
> "--show-raw-insn" and gets the raw instruction [ for powerpc ] along with
> the human-readable annotation [ which is used by other archs ]. Since the
> perf tool already has the objdump implementation in place, I went in the
> direction of enhancing it to use "--show-raw-insn" for powerpc.
>
> But as you mentioned, we can read the raw instruction directly without
> using the "objdump" tool. perf has support to read object code: the dso
> open/read utilities and helper functions are already present in
> "util/dso.c", and the "dso__data_read_offset" function reads data from a
> dso file offset. We can use these functions, and I can make changes to
> directly read the binary instruction without using objdump.
>
> Namhyung, Arnaldo, Christophe,
> Looking for your valuable feedback on this approach. Please suggest
> whether it looks fine.
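For illustration only, a minimal standalone sketch of reading an instruction word directly, without objdump. It assumes the 4-byte powerpc instruction sits at an already-known file offset (translating a sample address to a file offset is assumed to happen elsewhere) and that the binary's byte order is known. `read_insn_at()` is a hypothetical name, and plain POSIX `pread()` stands in for perf's `dso__data_read_offset()` helper; this is not how the patch or perf actually does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of perf: read one 32-bit powerpc
 * instruction word at byte offset `offset` of the file at `path`.
 * POSIX pread() stands in for perf's dso__data_read_offset().
 * `big_endian` selects the byte order of the binary (0 for ppc64le).
 * Returns 0 on success, -1 on open failure or short read.
 */
int read_insn_at(const char *path, off_t offset, int big_endian,
		 uint32_t *insn)
{
	unsigned char buf[4];
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	n = pread(fd, buf, sizeof(buf), offset);
	close(fd);
	if (n != (ssize_t)sizeof(buf))
		return -1;

	/* Assemble the word in the binary's byte order, independent of the host. */
	if (big_endian)
		*insn = (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
			(uint32_t)buf[2] << 8 | (uint32_t)buf[3];
	else
		*insn = (uint32_t)buf[3] << 24 | (uint32_t)buf[2] << 16 |
			(uint32_t)buf[1] << 8 | (uint32_t)buf[0];
	return 0;
}
```

On a little-endian ppc64le binary, the file bytes 38 01 81 e8 read this way give the word 0xe8810138, the same value objdump's "38 01 81 e8" column represents, so the decoding step that follows does not depend on how the bytes were obtained.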
Looks like you want to implement instruction decoding like in
arch/x86/lib/{insn,inat}.c. I think it's ok to do that, but you need to
decide which way is more convenient. Also, the annotation code works on
struct disasm_line, so you need to fill in the necessary info when not
using objdump.

As long as it produces the same output, I don't care much whether you
use objdump or not. Actually, perf uses libcapstone to disassemble x86
instructions if possible. Maybe you can use that on powerpc too.

Thanks,
Namhyung

>
> Thanks
> Athira
>
> >> Currently, the perf tool infrastructure uses the "--no-show-raw-insn"
> >> option with "objdump" while disassembling. Example from powerpc with
> >> this option for an instruction address:
> >
> > Yes and that makes sense because the purpose of objdump is to provide
> > human readable annotations, not to perform automated analysis. Am I
> > missing something?
> >
> >>
> >> Snippet from:
> >> objdump --start-address=<address> --stop-address=<address> -d
> >> --no-show-raw-insn -C <vmlinux>
> >>
> >> c0000000010224b4: lwz r10,0(r9)
> >>
> >> This line "lwz r10,0(r9)" is parsed to extract the instruction name,
> >> register names and offset. Also, to find whether there is a memory
> >> reference in the operands, the "memory_ref_char" field of objdump is
> >> used. For x86, "(" is used as memory_ref_char to tackle instructions of
> >> the form "mov (%rax), %rcx".
> >>
> >> In the case of powerpc, instructions using "(" are not the only memory
> >> instructions. For example, the above instruction can also be of the
> >> extended form (X form) "lwzx r10,0,r19". In order to easily identify
> >> the instruction category and extract the source/target registers, this
> >> patch adds support to use the raw instruction. With the raw
> >> instruction, macros are added to extract the opcode and register fields.
> >>
> >> "struct ins_operands" and "struct ins" are updated to carry the opcode
> >> and the raw instruction binary code (raw_insn). Function
> >> "disasm_line__parse" is updated to fill the raw instruction hex value
> >> and opcode in the newly added fields. There are no changes in the
> >> existing code paths which parse the disassembled code: architectures
> >> using the instruction name (the present approach) are not altered.
> >> Since this approach targets powerpc, the macro implementation is added
> >> for powerpc only as of now.
> >>
> >> Example:
> >> the representation using --show-raw-insn in objdump gives:
> >>
> >> 38 01 81 e8 ld r4,312(r1)
> >>
> >> Here "38 01 81 e8" is the raw instruction representation. On powerpc,
> >> this translates to the instruction form "ld RT,DS(RA)" with the binary
> >> layout:
> >>  _____________________________________
> >>  | 58 |  RT  |  RA  |      DS      | |
> >>  -------------------------------------
> >>  0    6      11     16             30 31
> >>
> >> Function "disasm_line__parse" is updated to capture:
> >>
> >> line: 38 01 81 e8 ld r4,312(r1)
> >> opcode and raw instruction "38 01 81 e8"
> >> The raw instruction is used later to extract the reg/offset fields.
> >>
> >> Signed-off-by: Athira Rajeev <atraj...@linux.vnet.ibm.com>
> >> ---
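To make the decoding step concrete, a rough sketch of parsing one `--show-raw-insn` line and extracting fields from the 32-bit word. It assumes a little-endian (ppc64le) binary, where objdump prints the instruction bytes in file order, so the first byte is the least significant. The macro and function names (`PPC_OP`, `PPC_RT`, `parse_raw_insn_le()`, `ppc_is_mem_access()` and so on) are hypothetical, not the ones from the patch, and the memory-access classifier covers only a handful of opcodes. It shows how deciding from the opcode fields catches the X-form `lwzx r10,0,r19`, which a text check for "(" would miss.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical field macros. The Power ISA numbers bits big-endian
 * (bit 0 = MSB), so ISA bits 0-5 are the top six bits of the word.
 */
#define PPC_OP(insn)   (((insn) >> 26) & 0x3f)  /* ISA bits 0-5   */
#define PPC_RT(insn)   (((insn) >> 21) & 0x1f)  /* ISA bits 6-10  */
#define PPC_RA(insn)   (((insn) >> 16) & 0x1f)  /* ISA bits 11-15 */
#define PPC_RB(insn)   (((insn) >> 11) & 0x1f)  /* ISA bits 16-20 */
#define PPC_XO_X(insn) (((insn) >> 1) & 0x3ff)  /* ISA bits 21-30 */

/* DS-form (e.g. ld): DS in ISA bits 16-29, byte offset = DS * 4. */
static int32_t ppc_ds_disp(uint32_t insn)
{
	return (int16_t)(insn & 0xfffc);
}

/*
 * Parse "38 01 81 e8 <mnemonic> ..." from a little-endian binary.
 * Stores the instruction word and returns a pointer to the mnemonic,
 * or NULL if the line does not start with four hex bytes.
 */
static const char *parse_raw_insn_le(const char *line, uint32_t *insn)
{
	unsigned int b[4];
	int consumed = 0;

	if (sscanf(line, "%2x %2x %2x %2x%n", &b[0], &b[1], &b[2], &b[3],
		   &consumed) != 4)
		return NULL;
	/* File order is least significant byte first on ppc64le. */
	*insn = (uint32_t)(b[3] << 24 | b[2] << 16 | b[1] << 8 | b[0]);
	line += consumed;
	while (*line == ' ' || *line == '\t')
		line++;
	return line;
}

/* Partial, illustrative classifier based on opcodes, not operand text. */
static int ppc_is_mem_access(uint32_t insn)
{
	uint32_t op = PPC_OP(insn);

	if (op >= 32 && op <= 55)	/* D-form loads/stores: lwz ... stfdu */
		return 1;
	if (op == 58 || op == 62)	/* DS-form: ld/ldu/lwa, std/stdu */
		return 1;
	if (op == 31) {
		switch (PPC_XO_X(insn)) {
		case 21:	/* ldx  */
		case 23:	/* lwzx */
		case 149:	/* stdx */
		case 151:	/* stwx */
			return 1;
		}
	}
	return 0;
}
```

For the example line "38 01 81 e8 ld r4,312(r1)", the word 0xe8810138 decodes to primary opcode 58, RT = 4, RA = 1 and a displacement of 312, matching the DS-form layout in the diagram. An X-form `lwzx r10,0,r19` (0x7d40982e) is classified as a memory access from opcode 31 and extended opcode 23 alone, while `add r3,r4,r5` (0x7c642a14) shares opcode 31 but is not.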