[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Pavel Labath via Phabricator via lldb-commits Mon, 14 Oct 2019 07:02:05 -0700

labath marked an inline comment as done.
labath added inline comments.


================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+  EntryIterator Absolute =
+      getAbsoluteLocations(
+          SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+          LookupPooledAddress)
+          .begin();
----------------
dblaikie wrote:
> labath wrote:
> > dblaikie wrote:
> > > labath wrote:
> > > > dblaikie wrote:
> > > > > labath wrote:
> > > > > > dblaikie wrote:
> > > > > > > labath wrote:
> > > > > > > > This parallel iteration is not completely nice, but I think 
> > > > > > > > it's worth being able to reuse the absolute range computation 
> > > > > > > > code. I'm open to ideas for improvement though.
> > > > > > > Ah, I see - this is what you meant about "In particular it makes 
> > > > > > > it possible to reuse this stuff in the dumping code, which would 
> > > > > > > have been pretty hard with callbacks.".
> > > > > > > 
> > > > > > > I'm wondering if that might be worth revisiting somewhat. A full 
> > > > > > > iterator abstraction for one user here (well, two once you 
> > > > > > > include lldb - but I assume it's likely going to build its own 
> > > > > > > data structure from the iteration anyway, right? (it's not going 
> > > > > > > to keep the iterator around, do anything interesting like partial 
> > > > > > > iterations, re-iterate/etc - such that a callback would suffice))
> > > > > > > 
> > > > > > > I could imagine two callback APIs for this - one that gets 
> > > > > > > entries and locations and one that only gets locations by 
> > > > > > > filtering on the entry version.
> > > > > > > 
> > > > > > > eg:
> > > > > > > 
> > > > > > >   // for non-verbose output:
> > > > > > >   LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
> > > > > > >     if (Verbose && actually dumping debug_loc)
> > > > > > >       print(E) // print any LLE_*, raw parameters, etc
> > > > > > >     if (L)
> > > > > > >       print(*L) // print the resulting address range, section 
> > > > > > > name (if verbose), 
> > > > > > >     else
> > > > > > >       print(error stuff)
> > > > > > >   });
> > > > > > > 
> > > > > > > One question would be "when/where do we print the DWARF 
> > > > > > > expression" - if there's an error computing the address range, we 
> > > > > > > can still print the expression, so maybe that happens 
> > > > > > > unconditionally at the end of the callback, using the expression 
> > > > > > > in the Entry? (then, arguably, the expression doesn't need to be 
> > > > > > > in the DWARFLocation - and I'd say make the DWARFLocation a 
> > > > > > > sectioned range, exactly the same type as for ranges so that part 
> > > > > > > of the dumping code, etc, can be maximally reused)
> > > > > > Actually, what lldb currently does is that it does not build any 
> > > > > > data structures at all (except storing the pointer to the right 
> > > > > > place in the debug_loc section. Then, whenever it wants to do 
> > > > > > something to the loclist, it parses it afresh. I don't know why it 
> > > > > > does this exactly, but I assume it has something to do with most 
> > > > > > locations never being used, or being only a couple of times, and 
> > > > > > the actual parsing being fairly fast. What this means is that lldb 
> > > > > > is not really a single "user", but there are like four or five 
> > > > > > places where it iterates through the list, depending on what does 
> > > > > > it actually want to do with it. It also does partial iteration 
> > > > > > where it stops as soon as it find the entry it was interested in.
> > > > > > Now, all of that is possible with a callback (though I am generally 
> > > > > > trying to avoid them), but it does resurface the issue of what 
> > > > > > should be the value of the second argument for DW_LLE_base_address 
> > > > > > entries (the thing which I originally used a error type for).
> > > > > > Maybe this should be actually one callback API, taking two callback 
> > > > > > functions, with one of them being invoked for base_address entries, 
> > > > > > and one for others? However, if we stick to the current approaches 
> > > > > > in both LLE and RLE of making the address pool resolution function 
> > > > > > a parameter (which I'd like to keep, as it makes my job in lldb 
> > > > > > easier), then this would actually be three callbacks, which starts 
> > > > > > to get unwieldy. Though one of those callbacks could be removed 
> > > > > > with the "DWARFUnit implementing a AddrOffsetResolver interface" 
> > > > > > idea, which I really like. :)
> > > > > Ah, thanks for the details on LLDB's location parsing logic. That's 
> > > > > interesting indeed!
> > > > > 
> > > > > I can appreciate an iterator-based API if that's the sort of usage 
> > > > > we've got, though I expect it doesn't have any interest in the 
> > > > > low-level encoding & just wants the fully processed address 
> > > > > ranges/locations - it doesn't want base_address or end_of_list 
> > > > > entries? & I think the dual-iteration is a fairly awkward API design, 
> > > > > trying to iterate them in lock-step, etc. I'd rather avoid that if 
> > > > > reasonably possible.
> > > > > 
> > > > > Either having an iterator API that gives only the fully processed 
> > > > > data/semantic view & a completely different API if you want to access 
> > > > > the low level primitives (LLE, etc) (this is how ranges works - 
> > > > > there's an API that gives a collection of ranges & abstracts over 
> > > > > v4/v5/rnglists/etc - though that's partly motivated by a strong 
> > > > > multi-client need for that functionality for symbolizing, etc - but I 
> > > > > think it's a good abstraction/model anyway (& one of the reasons the 
> > > > > inline range list printing doesn't include encoding information, the 
> > > > > API it uses is too high level to even have access to it))
> > > > > 
> > > > > > Now, all of that is possible with a callback (though I am generally 
> > > > > > trying to avoid them), but it does resurface the issue of what 
> > > > > > should be the value of the second argument for DW_LLE_base_address 
> > > > > > entries (the thing which I originally used a error type for).
> > > > > 
> > > > > Sorry, my intent in the above API was for the second argument to be 
> > > > > Optional's "None" state when... oh, I see, I did use Expected there, 
> > > > > rather than Optional, because there are legit error cases.
> > > > > 
> > > > > I know it's sort of awkward, but I might be inclined to use 
> > > > > Optional<Expected<AddressRange>> there. I realize two layers of 
> > > > > wrapping is a bit weird, but I think it'd be nicer than having an 
> > > > > error state for what, I think, isn't erroneous.
> > > > > 
> > > > > > Maybe this should be actually one callback API, taking two callback 
> > > > > > functions, with one of them being invoked for base_address entries, 
> > > > > > and one for others? However, if we stick to the current approaches 
> > > > > > in both LLE and RLE of making the address pool resolution function 
> > > > > > a parameter (which I'd like to keep, as it makes my job in lldb 
> > > > > > easier), then this would actually be three callbacks, which starts 
> > > > > > to get unwieldy.
> > > > > 
> > > > > Don't mind three callbacks too much.
> > > > > 
> > > > > > Though one of those callbacks could be removed with the "DWARFUnit 
> > > > > > implementing a AddrOffsetResolver interface" idea, which I really 
> > > > > > like. :)
> > > > > 
> > > > > Sorry, I haven't really looked at where the address resolver callback 
> > > > > is registered and alternative designs being discussed - but yeah, 
> > > > > going off just the one-sentence, it seems reasonable to have the 
> > > > > DWARFUnit own an address resolver/be the thing you consult when you 
> > > > > want to resolve an address (just through a normal function call in 
> > > > > DWARFUnit, perhaps - which might, internally, use a callback 
> > > > > registered when it was constructed).
> > > > > I know it's sort of awkward, but I might be inclined to use 
> > > > > Optional<Expected<AddressRange>> there. I realize two layers of 
> > > > > wrapping is a bit weird, but I think it'd be nicer than having an 
> > > > > error state for what, I think, isn't erroneous.
> > > > Actually, my very first attempt at this patch used an 
> > > > `Expected<Optional<Whatever>>`, but then I scrapped it because I didn't 
> > > > think you'd like it. It's not the friendliest of APIs, but I think we 
> > > > can go with that.
> > > > 
> > > > > Sorry, I haven't really looked at where the address resolver callback 
> > > > > is registered and alternative designs being discussed - but yeah, 
> > > > > going off just the one-sentence, it seems reasonable to have the 
> > > > > DWARFUnit own an address resolver/be the thing you consult when you 
> > > > > want to resolve an address (just through a normal function call in 
> > > > > DWARFUnit, perhaps - which might, internally, use a callback 
> > > > > registered when it was constructed).
> > > > 
> > > > I think you got that backwards. I don't want the DWARFUnit to be the 
> > > > source of truth for address pool resolutions, as that would make it 
> > > > hard to use from lldb (it's far from ready to start using the llvm 
> > > > version right now). What I wanted was to replace the 
> > > > lambda/function_ref with a single-method interface. Then both 
> > > > DWARFUnits could implement that interface so that passing a DWARFUnit& 
> > > > would "just work" (but you wouldn't be limited to DWARFUnits as anyone 
> > > > could implement that interface, just like anyone can write a lambda).
> > > As for Expected<Optional<Whatever>> (or Optional<Expected<>>) - yeah, I 
> > > think this is a non-obvious API (both the general problem and this 
> > > specific solution). I think it's probably worth discussing this design a 
> > > bit more to save you time writing/rewriting things a bit. I guess there 
> > > are a few layers of failure here.
> > > 
> > > There's the possibility that the iteration itself could fail - even for 
> > > debug_loc style lists (if we reached the end of the section before 
> > > encountering a terminating {0,0}). That would suggest a fallible iterator 
> > > idiom: 
> > > http://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges
> > > 
> > > But then, yes, when looking at the "processed"/semantic view, that could 
> > > fail too in the case of an invalid address index, etc.
> > > 
> > > The generic/processed/abstracted-over-ranges-and-rnglists API for ranges 
> > > produces a fully computer vector (& then returns Expected<vector> of that 
> > > range) - is that reasonable? (this does mean manifesting a whole location 
> > > in memory, which may not be needed so I could understand avoiding that 
> > > even without fully implementing & demonstrating the vector solution is 
> > > inadequate).
> > > 
> > > But I /think/ maybe the we could/should have two APIs - one generic API 
> > > that abstracts over loc/loclists and only provides the fully processed 
> > > view, and another that is type specific for dumping the underlying 
> > > representation (only used in dumping debug_loclists). 
> > If we were computing the final address ranges from scratch (which would be 
> > the best match for the current lldb usage, but which I am not considering 
> > now for fear of changing too many things), then I agree that we would need 
> > the fallible_iterator iterator thingy. But in this case we are 
> > "interpreting" the already parsed ranges, so we can assume some level of 
> > correctness here, and the thing that can fail is only the computation of a 
> > single range, which does not affect our ability to process the next entry.
> > This indicates to me that either each entry in the list should be an 
> > Expected<>, or that the invalid entries should be just dropped (possibly 
> > accompanied by some flag which would tell the caller that the result was 
> > not exhaustive).
> > 
> > This is connected to one of the issues I have with the debug ranges API -- 
> > it tries _really_ hard to return *something* -- if resolving the indirect 
> > base address entry fails, it is perfectly happy to use the address _index_ 
> > as the base address. This makes sense for dumping, where you want to show 
> > something (though it would still be good to indicate that you're not 
> > showing a real address), but it definitely does not help consumers which 
> > then need to make decisions based on the returned data.
> > 
> > Anyway, yes, I agree that we need to APIs, and probably callbacks are the 
> > easiest way to achieve that. We could have a "base" callback that is not 
> > particularly nice to use, but provides the full information via a 
> > combination of `UnparsedLL` and `Optional<Expected<ParsedLL>>` arguments. 
> > The dumper could use that to print out everything it needs. And then we 
> > could have a second API, built on top of the first one, which ignores base 
> > address entries and the raw data and returns just a bunch of 
> > `Expected<ParsedLL>`. This could be used by users like lldb, who just want 
> > to see the final data. The `ParsedLL` type would be independent of the 
> > location list type, so that the debug_loc parser could provide the same 
> > kind of API (but implemented on top of something else, as the `UnparsedLL` 
> > types would differ). Also, under the hood, the location list dumper for 
> > debug_loclists (but not debug_loc) could reuse some implementation details 
> > with the debug_rnglists dumper via a suitable combination of templates and 
> > callbacks.
> > 
> > How does that sound?
> What sort of things are you concerned about with deeper API changes here? I 
> think it's probably worth building the "right" thing now - as good a time as 
> any. LLVM's debug info APIs, as you've pointed out, aren't exactly "sturdy" 
> (treating address indexes as offsets, etc, etc), so no time like the present 
> to clean it up.
> 
> I think if we had an abstraction over v4 and v5 location descriptions, 
> parsing from scratch, fallible iterators, etc - that'd be the ideal thing to 
> use in the inline dumping code (that dumps inside debug_info) - which 
> currently uses "parseOneLocationList" - so it is parsing from scratch and 
> dumping. 
> 
> But equally I understand not wanting to make you/me/anyone fix everything 
> when just trying to get something actually done. 
I think I am mainly concerned with the scope explosion of doing changes like 
that. I don't know how well founded is that concern, because I don't know all 
the use cases for this information right now (but that's a part of the 
problem). But anyway, let's continue the discussion.

The main problem I see with "inline" dumping using some higher-level 
abstraction is what should one do in case of incomplete data (e.g. dwo files). 
If this abstraction returns "cooked" data, then the inline dump could not show 
anything for dwo files dumped standalone. This is perfectly fine for lldb and 
maybe other tools (which never look at dwo files separately, and mostly don't 
care about the reason why the "cooking" failed -- if they just need to get the 
data it can rely on), but for something like llvm-dwarfdump, we'd probably want 
to display _something_. I guess this is why the ranges api returns indexes as 
offsets, but I think we agree we don't want that.

Then the question is what should that "something" be? If it's going to be a raw 
dump of the location list entry, then solution would not be fully generic, as 
they raw entry type will depend on the location list kind. Though, we could 
still arrange it so that the various location lists can be processed uniformly 
via a template if they e.g. have a dump() function with the same signature...

Suppose we go down that path (this is the path I wanted to go down in my 
previous comment, modulo the parse-from-scratch part). The question then is 
what to do with the section-based dumping. Should it use the same mechanism? It 
probably should, because the first callback/iterator will provide all data it 
can possibly want. But then, should it also build the parsed representation 
like it does now?
If yes, then what for? Should we also build some mechanism to "cook" that data 
too.
If not, are we ok with having all users reparse from scratch always? (There are 
only two users I found right now, and they look like they'd be fine with it.)
Or should the DWARFDebugLoclists object cache the cooked data instead? 
llvm-dwarfdump --statistics would actually prefer that, but that might make the 
DWARF verifier sad (though I guess it could always reparse from scratch if 
needed).


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68270/new/

https://reviews.llvm.org/D68270



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Reply via email to