[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Pavel Labath via Phabricator via lldb-commits Thu, 31 Oct 2019 09:33:46 -0700

labath abandoned this revision.
labath marked an inline comment as done.
labath added a comment.


Abandoning. I'll create separate patches with the new implementation.



================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:291-295
+  EntryIterator Absolute =
+      getAbsoluteLocations(
+          SectionedAddress{BaseAddr, SectionedAddress::UndefSection},
+          LookupPooledAddress)
+          .begin();
----------------
JDevlieghere wrote:
> labath wrote:
> > dblaikie wrote:
> > > labath wrote:
> > > > dblaikie wrote:
> > > > > labath wrote:
> > > > > > dblaikie wrote:
> > > > > > > labath wrote:
> > > > > > > > dblaikie wrote:
> > > > > > > > > labath wrote:
> > > > > > > > > > This parallel iteration is not completely nice, but I think 
> > > > > > > > > > it's worth being able to reuse the absolute range 
> > > > > > > > > > computation code. I'm open to ideas for improvement though.
> > > > > > > > > Ah, I see - this is what you meant about "In particular it 
> > > > > > > > > makes it possible to reuse this stuff in the dumping code, 
> > > > > > > > > which would have been pretty hard with callbacks.".
> > > > > > > > > 
> > > > > > > > > I'm wondering if that might be worth revisiting somewhat. A 
> > > > > > > > > full iterator abstraction for one user here (well, two once 
> > > > > > > > > you include lldb - but I assume it's likely going to build 
> > > > > > > > > its own data structure from the iteration anyway, right? 
> > > > > > > > > (it's not going to keep the iterator around, do anything 
> > > > > > > > > interesting like partial iterations, re-iterate/etc - such 
> > > > > > > > > that a callback would suffice))
> > > > > > > > > 
> > > > > > > > > I could imagine two callback APIs for this - one that gets 
> > > > > > > > > entries and locations and one that only gets locations by 
> > > > > > > > > filtering on the entry version.
> > > > > > > > > 
> > > > > > > > > eg:
> > > > > > > > > 
> > > > > > > > >   // for non-verbose output:
> > > > > > > > >   LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> 
> > > > > > > > > L) {
> > > > > > > > >     if (Verbose && actually dumping debug_loc)
> > > > > > > > >       print(E) // print any LLE_*, raw parameters, etc
> > > > > > > > >     if (L)
> > > > > > > > >       print(*L) // print the resulting address range, section 
> > > > > > > > > name (if verbose), 
> > > > > > > > >     else
> > > > > > > > >       print(error stuff)
> > > > > > > > >   });
> > > > > > > > > 
> > > > > > > > > One question would be "when/where do we print the DWARF 
> > > > > > > > > expression" - if there's an error computing the address 
> > > > > > > > > range, we can still print the expression, so maybe that 
> > > > > > > > > happens unconditionally at the end of the callback, using the 
> > > > > > > > > expression in the Entry? (then, arguably, the expression 
> > > > > > > > > doesn't need to be in the DWARFLocation - and I'd say make 
> > > > > > > > > the DWARFLocation a sectioned range, exactly the same type as 
> > > > > > > > > for ranges so that part of the dumping code, etc, can be 
> > > > > > > > > maximally reused)
> > > > > > > > Actually, what lldb currently does is that it does not build 
> > > > > > > > any data structures at all (except storing the pointer to the 
> > > > > > > > right place in the debug_loc section. Then, whenever it wants 
> > > > > > > > to do something to the loclist, it parses it afresh. I don't 
> > > > > > > > know why it does this exactly, but I assume it has something to 
> > > > > > > > do with most locations never being used, or being only a couple 
> > > > > > > > of times, and the actual parsing being fairly fast. What this 
> > > > > > > > means is that lldb is not really a single "user", but there are 
> > > > > > > > like four or five places where it iterates through the list, 
> > > > > > > > depending on what does it actually want to do with it. It also 
> > > > > > > > does partial iteration where it stops as soon as it find the 
> > > > > > > > entry it was interested in.
> > > > > > > > Now, all of that is possible with a callback (though I am 
> > > > > > > > generally trying to avoid them), but it does resurface the 
> > > > > > > > issue of what should be the value of the second argument for 
> > > > > > > > DW_LLE_base_address entries (the thing which I originally used 
> > > > > > > > a error type for).
> > > > > > > > Maybe this should be actually one callback API, taking two 
> > > > > > > > callback functions, with one of them being invoked for 
> > > > > > > > base_address entries, and one for others? However, if we stick 
> > > > > > > > to the current approaches in both LLE and RLE of making the 
> > > > > > > > address pool resolution function a parameter (which I'd like to 
> > > > > > > > keep, as it makes my job in lldb easier), then this would 
> > > > > > > > actually be three callbacks, which starts to get unwieldy. 
> > > > > > > > Though one of those callbacks could be removed with the 
> > > > > > > > "DWARFUnit implementing a AddrOffsetResolver interface" idea, 
> > > > > > > > which I really like. :)
> > > > > > > Ah, thanks for the details on LLDB's location parsing logic. 
> > > > > > > That's interesting indeed!
> > > > > > > 
> > > > > > > I can appreciate an iterator-based API if that's the sort of 
> > > > > > > usage we've got, though I expect it doesn't have any interest in 
> > > > > > > the low-level encoding & just wants the fully processed address 
> > > > > > > ranges/locations - it doesn't want base_address or end_of_list 
> > > > > > > entries? & I think the dual-iteration is a fairly awkward API 
> > > > > > > design, trying to iterate them in lock-step, etc. I'd rather 
> > > > > > > avoid that if reasonably possible.
> > > > > > > 
> > > > > > > Either having an iterator API that gives only the fully processed 
> > > > > > > data/semantic view & a completely different API if you want to 
> > > > > > > access the low level primitives (LLE, etc) (this is how ranges 
> > > > > > > works - there's an API that gives a collection of ranges & 
> > > > > > > abstracts over v4/v5/rnglists/etc - though that's partly 
> > > > > > > motivated by a strong multi-client need for that functionality 
> > > > > > > for symbolizing, etc - but I think it's a good abstraction/model 
> > > > > > > anyway (& one of the reasons the inline range list printing 
> > > > > > > doesn't include encoding information, the API it uses is too high 
> > > > > > > level to even have access to it))
> > > > > > > 
> > > > > > > > Now, all of that is possible with a callback (though I am 
> > > > > > > > generally trying to avoid them), but it does resurface the 
> > > > > > > > issue of what should be the value of the second argument for 
> > > > > > > > DW_LLE_base_address entries (the thing which I originally used 
> > > > > > > > a error type for).
> > > > > > > 
> > > > > > > Sorry, my intent in the above API was for the second argument to 
> > > > > > > be Optional's "None" state when... oh, I see, I did use Expected 
> > > > > > > there, rather than Optional, because there are legit error cases.
> > > > > > > 
> > > > > > > I know it's sort of awkward, but I might be inclined to use 
> > > > > > > Optional<Expected<AddressRange>> there. I realize two layers of 
> > > > > > > wrapping is a bit weird, but I think it'd be nicer than having an 
> > > > > > > error state for what, I think, isn't erroneous.
> > > > > > > 
> > > > > > > > Maybe this should be actually one callback API, taking two 
> > > > > > > > callback functions, with one of them being invoked for 
> > > > > > > > base_address entries, and one for others? However, if we stick 
> > > > > > > > to the current approaches in both LLE and RLE of making the 
> > > > > > > > address pool resolution function a parameter (which I'd like to 
> > > > > > > > keep, as it makes my job in lldb easier), then this would 
> > > > > > > > actually be three callbacks, which starts to get unwieldy.
> > > > > > > 
> > > > > > > Don't mind three callbacks too much.
> > > > > > > 
> > > > > > > > Though one of those callbacks could be removed with the 
> > > > > > > > "DWARFUnit implementing a AddrOffsetResolver interface" idea, 
> > > > > > > > which I really like. :)
> > > > > > > 
> > > > > > > Sorry, I haven't really looked at where the address resolver 
> > > > > > > callback is registered and alternative designs being discussed - 
> > > > > > > but yeah, going off just the one-sentence, it seems reasonable to 
> > > > > > > have the DWARFUnit own an address resolver/be the thing you 
> > > > > > > consult when you want to resolve an address (just through a 
> > > > > > > normal function call in DWARFUnit, perhaps - which might, 
> > > > > > > internally, use a callback registered when it was constructed).
> > > > > > > I know it's sort of awkward, but I might be inclined to use 
> > > > > > > Optional<Expected<AddressRange>> there. I realize two layers of 
> > > > > > > wrapping is a bit weird, but I think it'd be nicer than having an 
> > > > > > > error state for what, I think, isn't erroneous.
> > > > > > Actually, my very first attempt at this patch used an 
> > > > > > `Expected<Optional<Whatever>>`, but then I scrapped it because I 
> > > > > > didn't think you'd like it. It's not the friendliest of APIs, but I 
> > > > > > think we can go with that.
> > > > > > 
> > > > > > > Sorry, I haven't really looked at where the address resolver 
> > > > > > > callback is registered and alternative designs being discussed - 
> > > > > > > but yeah, going off just the one-sentence, it seems reasonable to 
> > > > > > > have the DWARFUnit own an address resolver/be the thing you 
> > > > > > > consult when you want to resolve an address (just through a 
> > > > > > > normal function call in DWARFUnit, perhaps - which might, 
> > > > > > > internally, use a callback registered when it was constructed).
> > > > > > 
> > > > > > I think you got that backwards. I don't want the DWARFUnit to be 
> > > > > > the source of truth for address pool resolutions, as that would 
> > > > > > make it hard to use from lldb (it's far from ready to start using 
> > > > > > the llvm version right now). What I wanted was to replace the 
> > > > > > lambda/function_ref with a single-method interface. Then both 
> > > > > > DWARFUnits could implement that interface so that passing a 
> > > > > > DWARFUnit& would "just work" (but you wouldn't be limited to 
> > > > > > DWARFUnits as anyone could implement that interface, just like 
> > > > > > anyone can write a lambda).
> > > > > As for Expected<Optional<Whatever>> (or Optional<Expected<>>) - yeah, 
> > > > > I think this is a non-obvious API (both the general problem and this 
> > > > > specific solution). I think it's probably worth discussing this 
> > > > > design a bit more to save you time writing/rewriting things a bit. I 
> > > > > guess there are a few layers of failure here.
> > > > > 
> > > > > There's the possibility that the iteration itself could fail - even 
> > > > > for debug_loc style lists (if we reached the end of the section 
> > > > > before encountering a terminating {0,0}). That would suggest a 
> > > > > fallible iterator idiom: 
> > > > > http://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges
> > > > > 
> > > > > But then, yes, when looking at the "processed"/semantic view, that 
> > > > > could fail too in the case of an invalid address index, etc.
> > > > > 
> > > > > The generic/processed/abstracted-over-ranges-and-rnglists API for 
> > > > > ranges produces a fully computer vector (& then returns 
> > > > > Expected<vector> of that range) - is that reasonable? (this does mean 
> > > > > manifesting a whole location in memory, which may not be needed so I 
> > > > > could understand avoiding that even without fully implementing & 
> > > > > demonstrating the vector solution is inadequate).
> > > > > 
> > > > > But I /think/ maybe the we could/should have two APIs - one generic 
> > > > > API that abstracts over loc/loclists and only provides the fully 
> > > > > processed view, and another that is type specific for dumping the 
> > > > > underlying representation (only used in dumping debug_loclists). 
> > > > If we were computing the final address ranges from scratch (which would 
> > > > be the best match for the current lldb usage, but which I am not 
> > > > considering now for fear of changing too many things), then I agree 
> > > > that we would need the fallible_iterator iterator thingy. But in this 
> > > > case we are "interpreting" the already parsed ranges, so we can assume 
> > > > some level of correctness here, and the thing that can fail is only the 
> > > > computation of a single range, which does not affect our ability to 
> > > > process the next entry.
> > > > This indicates to me that either each entry in the list should be an 
> > > > Expected<>, or that the invalid entries should be just dropped 
> > > > (possibly accompanied by some flag which would tell the caller that the 
> > > > result was not exhaustive).
> > > > 
> > > > This is connected to one of the issues I have with the debug ranges API 
> > > > -- it tries _really_ hard to return *something* -- if resolving the 
> > > > indirect base address entry fails, it is perfectly happy to use the 
> > > > address _index_ as the base address. This makes sense for dumping, 
> > > > where you want to show something (though it would still be good to 
> > > > indicate that you're not showing a real address), but it definitely 
> > > > does not help consumers which then need to make decisions based on the 
> > > > returned data.
> > > > 
> > > > Anyway, yes, I agree that we need to APIs, and probably callbacks are 
> > > > the easiest way to achieve that. We could have a "base" callback that 
> > > > is not particularly nice to use, but provides the full information via 
> > > > a combination of `UnparsedLL` and `Optional<Expected<ParsedLL>>` 
> > > > arguments. The dumper could use that to print out everything it needs. 
> > > > And then we could have a second API, built on top of the first one, 
> > > > which ignores base address entries and the raw data and returns just a 
> > > > bunch of `Expected<ParsedLL>`. This could be used by users like lldb, 
> > > > who just want to see the final data. The `ParsedLL` type would be 
> > > > independent of the location list type, so that the debug_loc parser 
> > > > could provide the same kind of API (but implemented on top of something 
> > > > else, as the `UnparsedLL` types would differ). Also, under the hood, 
> > > > the location list dumper for debug_loclists (but not debug_loc) could 
> > > > reuse some implementation details with the debug_rnglists dumper via a 
> > > > suitable combination of templates and callbacks.
> > > > 
> > > > How does that sound?
> > > What sort of things are you concerned about with deeper API changes here? 
> > > I think it's probably worth building the "right" thing now - as good a 
> > > time as any. LLVM's debug info APIs, as you've pointed out, aren't 
> > > exactly "sturdy" (treating address indexes as offsets, etc, etc), so no 
> > > time like the present to clean it up.
> > > 
> > > I think if we had an abstraction over v4 and v5 location descriptions, 
> > > parsing from scratch, fallible iterators, etc - that'd be the ideal thing 
> > > to use in the inline dumping code (that dumps inside debug_info) - which 
> > > currently uses "parseOneLocationList" - so it is parsing from scratch and 
> > > dumping. 
> > > 
> > > But equally I understand not wanting to make you/me/anyone fix everything 
> > > when just trying to get something actually done. 
> > I think I am mainly concerned with the scope explosion of doing changes 
> > like that. I don't know how well founded is that concern, because I don't 
> > know all the use cases for this information right now (but that's a part of 
> > the problem). But anyway, let's continue the discussion.
> > 
> > The main problem I see with "inline" dumping using some higher-level 
> > abstraction is what should one do in case of incomplete data (e.g. dwo 
> > files). If this abstraction returns "cooked" data, then the inline dump 
> > could not show anything for dwo files dumped standalone. This is perfectly 
> > fine for lldb and maybe other tools (which never look at dwo files 
> > separately, and mostly don't care about the reason why the "cooking" failed 
> > -- if they just need to get the data it can rely on), but for something 
> > like llvm-dwarfdump, we'd probably want to display _something_. I guess 
> > this is why the ranges api returns indexes as offsets, but I think we agree 
> > we don't want that.
> > 
> > Then the question is what should that "something" be? If it's going to be a 
> > raw dump of the location list entry, then solution would not be fully 
> > generic, as they raw entry type will depend on the location list kind. 
> > Though, we could still arrange it so that the various location lists can be 
> > processed uniformly via a template if they e.g. have a dump() function with 
> > the same signature...
> > 
> > Suppose we go down that path (this is the path I wanted to go down in my 
> > previous comment, modulo the parse-from-scratch part). The question then is 
> > what to do with the section-based dumping. Should it use the same 
> > mechanism? It probably should, because the first callback/iterator will 
> > provide all data it can possibly want. But then, should it also build the 
> > parsed representation like it does now?
> > If yes, then what for? Should we also build some mechanism to "cook" that 
> > data too.
> > If not, are we ok with having all users reparse from scratch always? (There 
> > are only two users I found right now, and they look like they'd be fine 
> > with it.)
> > Or should the DWARFDebugLoclists object cache the cooked data instead? 
> > llvm-dwarfdump --statistics would actually prefer that, but that might make 
> > the DWARF verifier sad (though I guess it could always reparse from scratch 
> > if needed).
> I'm very late to the discussion and I'm not as familiar as both of you with 
> the details and the API uses, so please ignore my suggestion if it doesn't 
> make any sense... 
> Could we have an API where we parse a *list* of Uncooked (to reuse Pavel's 
> nomenclature) and then have the ability to resolve each Uncooked entry into a 
> Cooked entry? Then both LLDB and dwarfdump could get the list of Uncooked 
> entries and try to get the cooked variant. If that works, great, we dump/use 
> that, if not we move on and have separate failure modes for errors 
> originating from parsing and from cooking. 
We've discussed this with David last week, and we have hopefully agreed on the 
rough direction forward (and I think it's going to roughly correspond to what 
you had in mind). I'm preparing a patch (first of many) to implement that and 
I'm hoping I'll be able to upload something today.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68270/new/

https://reviews.llvm.org/D68270



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [PATCH] D68270: DWARFDebugLoc: Add a function to get the address range of an entry

Reply via email to