[Lldb-commits] [PATCH] D32167: Add support for type units (.debug_types) to LLDB in a way that is compatible with DWARF 5

Jan Kratochvil via Phabricator via lldb-commits Tue, 10 Apr 2018 14:47:11 -0700

jankratochvil added inline comments.


================
Comment at: source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:595
+      uint64_t debug_info_size = get_debug_info_data().GetByteSize();
+      data_segment.m_data.OffsetData(debug_info_size);
+    }
----------------
jankratochvil wrote:
> clayborg wrote:
> > jankratochvil wrote:
> > > I do not like this `DWARFDataExtractor::m_start` modification, it sort of 
> > > corrupts the `DataExtractor` and various operations stop working then - 
> > > such as `DWARFDataExtractor::GetByteSize()`. DWZ patch makes from current 
> > > `dw_offset_t` a virtual (remapped) offset and introduces new physical 
> > > file section offset which is looked up for data extraction. The file 
> > > offset is represented as `DWARFFileOffset` in D40474, instead of `bool 
> > > m_is_dwz;` there could be some `enum { DEBUG_INFO, DEBUG_TYPES, 
> > > DWZ_DEBUG_INFO } m_where;` instead.
> > This means that this diff doesn't affect all of the other DWARF code. 
> > Nothing in .debug_types will refer to anything else (not DW_FORM_ref_addr, 
> > or any external references). So this trick allows us to just treat 
> > .debug_info as if .debug_types was appended to the end since nothing in 
> > .debug_types refers to any DIE outside of its type unit. This also mirrors 
> > what will actually happen with DWARF5 when all of the data is contained 
> > inside of the .debug_info section. This allows each DIE to have a unique 
> > "ID". Any other change requires a lot of changes to the DWARF parser and 
> > logic. So I actually like this feature. We can fix the GetByteSize() if 
> > needed. Basically every object in DWARf right now must be able to make a 
> > unique 64 bit unsigned integer ID in some way that we can get back to that 
> > info and partially parse more. These are handed out as lldb::user_id_t 
> > values for types, functions and more. Each flavor of DWARF will encode what 
> > they want into here. The normal DWARF it is just the absolute offset within 
> > the .debug_info. With .debug_types we just add the size of the .debug_info 
> > to the ID. For DWARF in .o files on Darwin, we encode the compile unit 
> > index into the top 32 bits and the DIE offset into the lower, DWO does 
> > something just as DWZ will need to. DWARFFileOffset doesn't mean much if 
> > there are multiple files. We have many competing type uniquing/debug info 
> > size reduction strategies being employed here. I can't believe we have DWO, 
> > DWZ, and debug types... But we have to make them all work. We can't just 
> > use the absolute file offset because DWO used external files where the file 
> > offsets could be the same in the external .o files... Not sure how this 
> > works with DWZ or what the best option is. I will read up on DWZ so I can 
> > propose some viable options. But each new flavor of the day that gets added 
> > the DWARF parser is adding a bunch of logic and edge cases. If two 
> > technologies (DWZ + DWO, DWZ + debug_types, etc) are used together, we need 
> > to ensure they can.
> > Any other change requires a lot of changes to the DWARF parser and logic. 
> > So I actually like this feature.
> 
> I agree it is a fine quick&dirty hack. Just if my DWZ support gets accepted 
> later anyway then this `.debug_types` feature could be implemented by its 
> framework in a clean way (as a regular DIEs remapping which is required for 
> DWZ anyway).
> 
> > If two technologies (DWZ + DWO, DWZ + debug_types, etc) are used together, 
> > we need to ensure they can.
> 
> `DWZ + DWO` do not make sense to me.  I haven't tried to use DWZ for DWO but 
> DWZ finds common DWARF subtrees typically across CUs so it would not find 
> much.
> 
> `DWZ + debug_types` is explicitly supported by the DWZ tool although that is 
> IMO for compatibility only, DWZ can make the common type references slightly 
> smaller than debug_types. I will sure need to implement debug_types support 
> into my DWZ-for-LLDB patchset later.
> 
> For DWARF in .o files on Darwin, we encode the compile unit index into the 
> top 32 bits and the DIE offset into the lower

BTW that prevents implementing DWARF64 so I wanted to prevent using the 32/32 
split. Currently Fedora/RHEL already contains files with `.debug_info` size 
near the 32-bit limit:
```http://ftp.muni.cz/pub/linux/fedora/linux/development/rawhide/Everything/x86_64/debug/tree/Packages/q/qt5-qtwebkit-debuginfo-5.212.0-0.20.alpha2.fc29.x86_64.rpm
/usr/lib/debug/usr/lib64/libQt5WebKit.so.5.212.0-5.212.0-0.20.alpha2.fc29.x86_64.debug
.debug_info size = 0x9bff1b08 = 2.4GiB
```
Although that is because neither DWZ (as it would run out of memory) nor 
`.debug_types` (as it is expected DWZ will handle the duplicities) are applied 
and so all CUs are separate and even if it was >4GB there would still be no 
need for 64-bit `DW_FORM_ref_addr`. Still we are already close to the 32-bit 
limit, also shifting the split for example to 40/24 would make some limit on 
number of CUs etc.



https://reviews.llvm.org/D32167



_______________________________________________
lldb-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D32167: Add support for type units (.debug_types) to LLDB in a way that is compatible with DWARF 5

Reply via email to