eloparco added inline comments.

================
Comment at: lldb/tools/lldb-vscode/lldb-vscode.cpp:2177
+  const auto bytes_offset = -instruction_offset * bytes_per_instruction;
+  auto start_addr = base_addr - bytes_offset;
+  const auto disassemble_bytes = instruction_count * bytes_per_instruction;
----------------
clayborg wrote:
> This will work if the min and max opcode byte size are the same, like for 
> arm64, the min and max are 4. This won't work for x86 or arm32 in thumb mode. 
> So when backing up, we need to do an address lookup on the address we think 
> we want to go to, and then adjust the starting address accordingly. Something 
> like:
> 
> ```
> SBAddress start_sbaddr = (base_addr - bytes_offset, g_vsc.target);
> ```
> now we have a section offset address that can tell us more about what it is. 
> We can find the SBFunction or SBSymbol for this address and use those to find 
> the right instructions. This will allow us to correctly disassemble code 
> bytes. 
> 
> We can also look at the section that the memory comes from and see what the 
> section contains. If the section is data, then emit something like:
> ```
> 0x00001000 .byte 0x23
> 0x00001001 .byte 0x34
> ...
> ```
> To find the section type we can do:
> ```
> SBSection section = start_sbaddr.GetSection();
> if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
>  // Disassemble from a valid boundary
> } else {
>   // Emit a byte or long at a time with ".byte 0xXX" or other ASM directive 
> for binary data
> }
> ```
> We need to ensure we start disassembling on the correct instruction boundary 
> as well as our math for "start_addr" might be in between actual opcode 
> boundaries. If we are in a lldb::eSectionTypeCode, then we know we have 
> instructions, and if we are not, then we can emit ".byte" or other binary 
> data directives. So if we do have lldb::eSectionTypeCode as our section type, 
> then we should have a function or symbol, and we can get instructions from 
> those objects easily:
> 
> ```
> if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
>  lldb::SBInstructionList instructions;
>  lldb::SBFunction function = start_sbaddr.GetFunction();
>  if (function.IsValid()) {
>     instructions = function.GetInstructions(g_vsc.target);
>  } else {
>     symbol = start_sbaddr.GetSymbol();
>     if (symbol.IsValid())
>       instructions = symbol.GetInstructions(g_vsc.target);
> }
> const size_t num_instrs = instructions.GetSize();
> if (num_instrs > 0) {
>   // we found instructions from a function or symbol and we need to 
>   // find the matching instruction that we want to start from by iterating
>   // over the instructions and finding the right address
>   size_t matching_idx = num_instrs; // Invalid index
>   for (size_t i=0; i<num_instrs; ++i) {
>     lldb::SBInstruction inst = instructions.GetInstructionAtIndex(i);
>     if (inst.GetAddress().GetLoadAddress(g_vsc.target) >= start_addr) {
>       matching_idx = i;
>       break;
>     }
>   }
>   if (matching_idx < num_instrs) {
>     // Now we can print the instructions from [matching_idx, num_instrs)
>     // then we need to repeat the search for the next function or symbol. 
>     // note there may be bytes between functions or symbols which we can 
> disassemble
>     // by calling _get_instructions_from_memory(...) but we must find the next
>     // symbol or function boundary and get back on track
>   }
>   
> ```
Sorry, I should have provided a proper explanation.

I use the maximum instruction size as the "worst case". Basically, I need to 
read a portion of memory but I do not know the start address and the size. For 
the start address, if I want to read N instructions before `base_addr` I need 
to read at least starting from `base_addr - N * max_instruction_size`: if all 
instructions are of size `max_instruction_size` I will read exactly N 
instructions; otherwise I will read more than N instructions and prune the 
additional ones afterwards. Same for applies for the size.

Since `start_addr` is based on a "worst case", it may be an address in the 
middle of an instruction. In that case, that first instruction will be 
misinterpreted, but I think that is negligible.

The logic is similar to what is implemented in other VS Code extensions, like 
https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/debug_session.rs#L1134.

Does it make sense?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140358/new/

https://reviews.llvm.org/D140358

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to