paolosev added inline comments.
================ Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:49 + lldb::offset_t initial_offset = *offset_ptr; + uint64_t value = section_header_data.GetULEB128(offset_ptr); + if (*offset_ptr == initial_offset || value > 127) ---------------- clayborg wrote: > Is it ok if we consume more than 1 byte here? What is the offset points to a > larger ULEB, are we ok with advancing the offset by multiple bytes or should > we back it up and return llvm::None? > > This might be a good candidate to add to DataExtractor directly as: > > ``` > /// Extract a ULEB128 number with a specified max value. If the extracted > value exceeds > /// "max_value" the offset will be left unchanged and llvm::None will be > returned. > llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, > uint64_t max_value); > ``` > > There are many places where we extract a uint64_t, but only need a uint16_t > (like in the DWARF parser where all DW_TAG_XXXX, DW_AT_XXX and DW_FORM_XXX > values must only be uint16_t values but are encoded as ULEB128 values. So > this could be used elsewhere if we do put it into DataExtractor. Modified in DataExtractor, as suggested. However we need to return a `llvm::Optional<uint64_t>`, which can be not ideal because the result could need to be casted to another integer type. We could also male `DataExtractor::GetULEB128` templatized on the integer type, getting `max_value` as `std::numeric_limits<int>::max()`, but I don't want to overly complicate the code. ================ Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:162-163 + // - the actual contents. + llvm::Optional<uint8_t> section_id = + GetVaruint7(section_header_data, &offset); + if (!section_id) ---------------- clayborg wrote: > Is this a one byte section ID or is it a ULEB? Not sure why it would be > encoded as a ULEB if it is always one byte? IF this really is just a one byte > value, then replace with: > > ``` > uint8_t section_id = data.GetU8(&offset); > ``` You are right, it is a one-bye section. I cannot use `data.GetU8(&offset);` though, because it returns 0 on failure. ================ Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:342-343 + section_type, // Section type. + sect_info.offset & 0xffffffff, // VM address. + sect_info.size, // VM size in bytes of this section. + sect_info.offset & ---------------- labath wrote: > paolosev wrote: > > labath wrote: > > > Are the debug info sections actually loaded into memory for wasm? Should > > > these be zero (that's what they are for elf)? > > Yes, I was thinking that the debug sections should be loaded into memory; > > not sure how this works for ELF, how does the debugger find the debug info > > in that case? > Are you referring to the load-from-memory, or load-from-file scenario? > Normally, the debug info is not loaded into memory, and if lldb is loading > the module from a file, then the debug info is loaded from the file (and we > use the "file offset" field to locate them). Loading files from memory does > not work in general (because we can't find the whole file there -- e.g., > section headers are missing). The only case it does work is in case of jitted > files. I'm not 100% sure how that works, but given that there is no > `if(memory)` block in the section creation code, those sections must too get > `vm size = 0`. > > In practice, I don't think it does not matter that much what you put here -- > I expect things will mostly work regardless. I am just trying to make this > consistent in some way. If these sections are not found in the memory at the > address which you set here (possibly adjusted by the load bias) then I think > it makes sense to set `vm_size=vm_addr=0`. If they are indeed present there, > then setting it to those values is perfectly fine. Modified, but I am not sure I completely understood the difference of file addresses and vm addresses. In the case of Wasm, we can have two cases: a) Wasm module contains executable code and also DWARF data embedded in DWARF sections b) Wasm module contains code and has a custom section that points to a separated Wasm module that only contains DWARF sections The file of Wasm modules that contains code should never be loaded directly by LLDB; LLDB should use GDB-remote to connect to the Wasm engine that loaded the module and retrieve the module content from the engine. But when the DWARF data has been stripped into a separate Wasm file, that file should be directly loaded by LLDB in order to load the debug sections. So, if I am not wrong, only in the first case we should assume that the debug info is loaded into memory, and have vm_size != 0? ================ Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:382 + + DecodeSections(load_address); + ---------------- labath wrote: > paolosev wrote: > > labath wrote: > > > This is strange.. I wouldn't expect that the section decoding logic > > > should depend on the actual address that the object is loaded in memory. > > > Can you explain the reasoning here? > > There is a reason for this: DecodeNextSection() calls: > > ```ReadImageData(offset, size)``` > > and when the debug info sections are embedded in the wasm module (not > > loaded from a separated symbol file), ReadImageData() calls > > Process::ReadMemory() that, for a GDB remote connection, goes to > > ProcessGDBRemote::DoReadMemory(). Here I pass the offset because it also > > represents the specific id of the module loaded in the target debuggee > > process, as explained in the comment above. > What you say about ReadMemory makes sense, but it's not clear to me why you > couldn't use the value in `m_memory_addr` for this. For in-memory object file > this value should contain the actual address the object file was loaded from > (which, I would expect, will include the `module_id` business), and so you > wouldn't need the dynamic loader address in order to locate it. I believe > this is how the other object file plugins do their in-memory loading.. Good point! Changed. ================ Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:34 +/// Checks whether the data buffer starts with a valid Wasm module header. +static bool ValidateModuleHeader(const DataBufferSP &data_sp) { + if (!data_sp || data_sp->GetByteSize() < kWasmHeaderSize) ---------------- clayborg wrote: > I typically would put "data_sp" into a DataExtractor and extract a uint32_t > and then check the decoded value with something like: > > ``` > static bool ValidateModuleHeader(DataExtractor &data, uint64_t *offset_ptr) { > auto magic = data.GetU32(offset_ptr); > if (magic == WASM_MAGIC) > return true; > if (magic == WASM_CIGAM) { > // Set byte order in DataExtractor > data.SetByteOrder(data.GetByteOrder() == eByteOrderBig ? eByteOrderLittle > : eByteOrderBig); > return true; > } > return false; > } > ``` > > This function expects a DataExtractor to be passed in that has "data_sp" > inside of it with the host endian set as the byte order. It will set the byte > order correctly. It also expects to have two uint32_t macros defined: > WASM_MAGIC and WASM_CIGAM. These contain the non byte swapped and the byte > swapped magic values. Easy to replace those with real definitions from else > where (llvm::file_magic::wasm_object? Not sure of the type of this though, > seemed like a StringRef). Not sure about this... WebAssembly is always little-endian and a DataExtractor is not really needed here. I made sure to set the byte order where the DataExtractor is created in `ReadImageData`. ================ Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:59 +/// Reads a LEB128 variable-length unsigned integer, limited to 32 bits. +static llvm::Optional<uint32_t> GetVaruint32(DataExtractor §ion_header_data, + lldb::offset_t *offset_ptr) { ---------------- clayborg wrote: > remove if we add: > ``` > llvm::Optional<uint8_t> DataExtractor::GetULEB128(uint64_t *offset_ptr, > uint64_t max_value); > ``` Actually, section_id is encoded as a byte not as a varuint7, so I modified the code accordingly, with `llvm::Optional<uint8_t> GetByte(DataExtractor &, lldb::offset_t *)`. Even this could be moved to DataExtractor, maybe? ================ Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:187-195 + const uint8_t *name_bytes = section_header_data.PeekData(offset, *name_len); + // If a custom section has a name longer than the allocated buffer or longer + // than the data left in the image, ignore this section. + if (!name_bytes) + return false; + + llvm::StringRef sect_name(reinterpret_cast<const char *>(name_bytes), ---------------- clayborg wrote: > All these lines can use the GetCStr with a length: > ``` > ConstString sect_name(data.GetCStr(&offset, *name_len)); > if (!sect_name) > return false; > ``` I had tried to use `DataExtractor::GetCStr()` but it doesn't work because it wants null-terminated strings, and this is not the case in Wasm, where strings are encoded as len (varuint32) followed by an array of len bytes that represent UTF8 chars (https://webassembly.github.io/spec/core/binary/values.html#names). ================ Comment at: lldb/test/Shell/ObjectFile/wasm/basic.yaml:21-79 +Sections: + - Type: TYPE + Signatures: + - Index: 0 + ParamTypes: + - I32 + ReturnTypes: ---------------- labath wrote: > Could you remove the sections which are not relevant for this test? This > makes it easier to see the correspondence between the yaml and the test > expectations.. Ok. What can be confusing is that Wasm modules (almost) always have at least a few standard sections (MEMORY, GLOBAL, CODE, ...) but these sections can be ignored by LLDB, so also by these tests. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71575/new/ https://reviews.llvm.org/D71575 _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits