[Lldb-commits] [PATCH] D84402: [lldb/DWARF] Add more line table validation

Pavel Labath via Phabricator via lldb-commits Mon, 27 Jul 2020 04:36:23 -0700

labath marked an inline comment as done.
labath added inline comments.


================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:1044
   for (const llvm::DWARFDebugLine::Sequence &seq : line_table->Sequences) {
+    if (!list.ContainsFileAddressRange(seq.LowPC, seq.HighPC - seq.LowPC))
+      continue;
----------------
MaskRay wrote:
> clayborg wrote:
> > labath wrote:
> > > clayborg wrote:
> > > > dblaikie wrote:
> > > > > Could you specifically look for/propagate tombstones here, to reduce 
> > > > > the risk of large functions overlapping with the valid address range, 
> > > > > even when they're tombstoned to zero? (because zero+large function 
> > > > > size could still end up overlapping with valid code)
> > > > > 
> > > > > To do that well, I guess it'd have to be implemented at a 
> > > > > lower-layer, inside the line table state machine - essentially 
> > > > > dropping all lines derived from a "set address" operation that 
> > > > > specifies a tombstone.
> > > > Just checking if the section lists contains an address doesn't help 
> > > > weed out addresses that were tombstoned to zero since our PT_LOAD[0] 
> > > > will almost always contain zero for shared libraries. It might be nice 
> > > > to make a list of addresses that come from sections with read + execute 
> > > > permissions and store that in SymbolFileDWARF one time at startup. Then 
> > > > these searches will be faster as we are looking in less ranges, and 
> > > > most likely will not contain address zero. This code will catch the -1 
> > > > and -2 tombstones, but most linkers I have run into use zero and the 
> > > > tombstone. 
> > > > 
> > > > If our algorithm only checks sections with no subsections and then 
> > > > makes a list of file addresses for and section ranges for those, we 
> > > > should have a great list. The entire PT_LOAD[0] will usually be mapped 
> > > > read + execute, so we can't just check top level sections for ELF. 
> > > > Mach-o also has this issue __TEXT in mac is also mapped read + execute 
> > > > and usually contains zero for shared libraries, but since the sections 
> > > > must come after the mach-o header, the sections within the __TEXT 
> > > > segment have correct permissions and would work, just like they would 
> > > > for ELF.
> > > You're right -- this would not handle shared libraries with base zero.
> > > 
> > > I am slightly uneasy about requiring executable permissions for all line 
> > > tables. While it does not seem terribly useful to have line tables for 
> > > non-executable code, if someone does have a line table for it for 
> > > whatever reason (maybe he wants to make it executable at runtime?) it 
> > > would be a shame not to display it. Also the choice of using section 
> > > rather than segment permissions feels slightly arbitrary (although I 
> > > could make a case for it), as it's the segment permissions which will 
> > > actually define the runtime memory permissions.
> > > 
> > > Since this is really about filtering out (near) zero addresses, how about 
> > > we make that explicit? Find the lowest executable (section) address and 
> > > reject anything below that? Additionally, I'd also reject all addresses 
> > > which are completely outside of the module range, as those not going to 
> > > get used for anything, and they are generating bogus line-table dumps.
> > > 
> > > What do you think?
> > > 
> > > David: The -1 tombstones are already sort of handled in llvm (and in lldb 
> > > since D83957). They are "handled" in the sense that the all sequences 
> > > with and end PC lower than the start PC are rejected (and line sequences 
> > > starting with (unsigned)-1 will definitely wrap). This is trying to do 
> > > something about the zero tombstones.
> > > Since this is really about filtering out (near) zero addresses, how about 
> > > we make that explicit? Find the lowest executable (section) address and 
> > > reject anything below that? Additionally, I'd also reject all addresses 
> > > which are completely outside of the module range, as those not going to 
> > > get used for anything, and they are generating bogus line-table dumps.
> > > 
> > > What do you think?
> > 
> > That will work for me. My main goal is to get anything that should have 
> > been dead stripped out from appearing in results for line lookups or 
> > function lookups. The quicker we can short circuit these cases the better 
> > for performance. We can also use this when we try to lookup functions and 
> > don't return any matches for functions whose start address falls below this 
> > value.
> > 
> > 
> There is a concern about ContainsFileAddressRange's performance.
> 
> FindSectionContainingFileAddress iterates all sections and can be slow when 
> the number of sections is large.
> 
> >  if someone does have a line table for it for whatever reason (maybe he 
> > wants to make it executable at runtime?) it would be a shame not to display 
> > it.
> 
> +1
> 
> Instead of a [lowpc,highpc) range check, I wonder whether we should just 
> filter out lowpc. We don't seem to benefit much from a range check. A data 
> point is that gdb filters with the CU address range and only checks lowpc 
> (https://sourceware.org/git/?p=binutils-gdb.git;a=blobdiff;f=gdb/dwarf2/read.c;h=405b5fb3348c94aad10e3bb40f393137ddb0759c;hp=4622d14a05c6819b482c9c97c14a14755876aa72;hb=a8caed5d7faa639a1e6769eba551d15d8ddd9510;hpb=33d1369f183f1c276e3f0f52b5573fb2f5843b1c
>  )
> If the CU's lowpc is 0, it will allow a line sequence at address 0, otherwise 
> disallow it. There is some bare metal usage with zero address.
> There is a concern about ContainsFileAddressRange's performance.
> FindSectionContainingFileAddress iterates all sections and can be slow when 
> the number of sections is large.
Yeah, that function isn't very optimized, but OTOH, it is used in pretty much 
every address translation in lldb (Address class internally represents things 
as section+offset), and it hasn't been a problem so far. (can't say we run many 
benchmarks, but we do run _some_). My thinking was that if it ever becomes a 
problem, it could be solved at the source so that all users benefit.

> If the CU's lowpc is 0, it will allow a line sequence at address 0, otherwise 
> disallow it.
Hmm.... are you sure that code actually works? Because using DW_AT_low_pc=0 in 
combination with DW_AT_ranges is a common practice for describing code from 
multiple sections (which is pretty much all code in c++). It relies on the fact 
that when DW_AT_ranges is used, the only purpose of DW_AT_low_pc is to set a 
base address for range and location lists. An absolute base of zero allows 
those entries to have relocations applied to them. I'm not sure this practice 
is still useful/needed for dwarf 5, but clang does it nonetheless.
```
$ clang -gdwarf-5 -x c -o - -c -g - <<<"void f(){} void g(){}" 
-ffunction-sections | llvm-dwarfdump - | grep -C 1 -e DW_AT_ranges
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_ranges      (indexed (0x0) rangelist = 0x00000010
                 [0x0000000000000000, 0x0000000000000006)
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84402/new/

https://reviews.llvm.org/D84402



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D84402: [lldb/DWARF] Add more line table validation

Reply via email to