labath added inline comments.

================
Comment at: lldb/source/Plugins/Language/ObjC/CFBasicHash.cpp:47-62
+  DataBufferHeap buffer(size, 0);
+  m_exe_ctx_ref.GetProcessSP()->ReadMemory(addr, buffer.GetBytes(), size,
+                                           error);
+  if (error.Fail())
+    return false;
+
+  DataExtractor data_extractor = DataExtractor(
----------------
mib wrote:
> labath wrote:
> > Why don't you just pass `m_ht.get()` to `ReadMemory` ?
> IIUC, `ReadMemory` will read memory matching the inferior endianness. That's
> why I'm using a `DataExtractor`: to translate the copied bytes to the host
> endianness.
Well, I am glad that you are thinking about non-native endianness, but I am 
afraid things just don't work that way. There's no way an API like 
`CopyData(void *, size_t)` can automatically fix endianness problems. Why?
Because the fix depends on the structure of the data.

Imagine this sequence of bytes: `00 01 02 03 04 05 06 07`. If this sequence
really represents a sequence of (little-endian) bytes (`uint8_t`s), then the
corresponding "big-endian" sequence would be identical. If, however, it
represents 2-byte words (`uint16_t`s), then the big-endian representation would
be `01 00 03 02 05 04 07 06`. For 4-byte words, it would be
`03 02 01 00 07 06 05 04`, etc.

In short, you need to know the structure of the data to translate endianness,
and `CopyData` does not know that. Indeed, if you look at the implementation of
that function, you'll see that it just does a `memcpy`.
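To make that concrete, here is a tiny standalone illustration (purely
illustrative, not tied to this patch) showing that the result of a byte swap
depends on the width of the elements you assume:

```
#include "llvm/Support/SwapByteOrder.h"
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  const uint8_t bytes[4] = {0x00, 0x01, 0x02, 0x03};

  uint16_t half;
  std::memcpy(&half, bytes, sizeof half);
  llvm::sys::swapByteOrder(half); // treats the data as a 2-byte word

  uint32_t word;
  std::memcpy(&word, bytes, sizeof word);
  llvm::sys::swapByteOrder(word); // treats the data as a 4-byte word

  // On a little-endian host, `half` now holds the bytes `01 00`, while `word`
  // holds `03 02 01 00` -- same input bytes, different result, because the
  // swap depends on the element width.
  std::printf("%04x %08x\n", half, word);
}
```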

My suggestion for reading directly into the object was based on the assumption
that you are fine with things working only if everything is of the same
endianness. I wouldn't have allowed that in more cross-platform code, but I
didn't feel like policing all corners of lldb.

However, if you do want to make this endian-correct (which I do encourage), 
then you have a couple of options:
- ditch the structs and use the DataExtractor accessor functions (GetU32,
GetAddress, etc.) to read the individual fields -- these know the size of the
object they are accessing, and can adjust endianness appropriately (see the
sketch at the end of this comment)
- compare host and target endianness and call llvm::sys::swapByteOrder on the 
individual fields if they differ
- use llvm's `packed_endian_specific_integral` instead of native types in the
struct definitions. This would require passing the endianness as a template
parameter and then dispatching to the appropriate struct based on the runtime
value, similar to what you've already done for byte sizes (a rough sketch
follows this list).
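
To illustrate the third option, a rough sketch of the shape it would take. The
field names (`num_buckets`, `keys_pointer`) are invented for illustration and
don't correspond to the actual CFBasicHash layout:

```
#include "llvm/Support/Endian.h"
#include <cstdint>

// The endianness is a template parameter, so the raw bytes read from the
// inferior can be copied straight into the struct, and every field access
// converts to the host byte order automatically.
template <llvm::support::endianness E>
struct HashTableLayout {
  llvm::support::detail::packed_endian_specific_integral<
      uint32_t, E, llvm::support::unaligned>
      num_buckets;
  llvm::support::detail::packed_endian_specific_integral<
      uint64_t, E, llvm::support::unaligned>
      keys_pointer;
};

// The runtime dispatch would then look roughly like:
//   if (process_sp->GetByteOrder() == lldb::eByteOrderLittle)
//     ParseWith<llvm::support::little>(...);
//   else
//     ParseWith<llvm::support::big>(...);
```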

Each of these options has its drawbacks, but I've seen all of them used in
various places throughout llvm. In this particular case, I have a feeling the
simplest option would be to go the DataExtractor route...
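
For illustration, the DataExtractor version could look roughly like this.
Again, the field names (`num_buckets`, `keys_pointer`) are placeholders, not
the actual layout in this patch:

```
// Rough sketch only. The point is that each field is read through the
// extractor, which knows the target byte order and the width of every access,
// so it can byte-swap per field as needed.
Status error;
DataBufferHeap buffer(size, 0);
process_sp->ReadMemory(addr, buffer.GetBytes(), size, error);
if (error.Fail())
  return false;

DataExtractor data(buffer.GetBytes(), buffer.GetByteSize(),
                   process_sp->GetByteOrder(),
                   process_sp->GetAddressByteSize());

lldb::offset_t offset = 0;
m_ht->num_buckets = data.GetU32(&offset);      // 4-byte field, swapped if needed
m_ht->keys_pointer = data.GetAddress(&offset); // pointer-sized field
```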


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78396/new/

https://reviews.llvm.org/D78396


