+static void escape(llvm::raw_ostream &OS, StringRef String) { + for (auto C: String) { + if (strchr("\\\"", C)) + OS << '\\'; + OS << C; + } +}
Do we not have an existing function somewhere to do that? Also, that strchr call is a very complex way of writing (C == '\\' || C == '"') On Mon, Aug 24, 2015 at 2:01 PM, David Blaikie <dblai...@gmail.com> wrote: > > > On Mon, Aug 24, 2015 at 1:23 PM, Adrian Prantl <apra...@apple.com> wrote: > >> >> On Aug 19, 2015, at 1:20 PM, David Blaikie <dblai...@gmail.com> wrote: >> >> >> >> On Mon, Aug 10, 2015 at 5:00 PM, Adrian Prantl <apra...@apple.com> wrote: >> >>> >>> On Jul 24, 2015, at 12:33 PM, David Blaikie <dblai...@gmail.com> wrote: >>> >>> *reads back through the thread* >>> >>> >>> appreciated, it’s long :-) >>> >>> So what I originally had in mind about a year ago when we discussed >>> this, was that the module data could have an extra table from type hash to >>> whatever useful internal representation to find the type in the PCM. >>> >>> >>> It turned out that the most useful internal type representation to find >>> a type in a PCM is the type’s DeclContext+Name; this is how (surprise!) >>> clang looks up types in a PCM and the format is supposed to be fast for >>> these kind of lookups. >>> >> >> Still, I would imagine there would be some kind of direct access (the >> offset in the file, or somesuch) rather than actually having to go through >> hashtables, etc. No? (how does one module refer to types in another module? >> Really by name?) >> >> >> Entities in PCMs have local integer IDs (just consecutive numbers) that >> are used to encode references inside a record on disk. An external >> reference gets a global ID which is the local ID + the ID of the other >> module. This numbering scheme of course only makes sense within a module >> (and perhaps also within a chained PCH). Every (local and global) ID maps >> to an entry in the PCM’s identifier table. When deserializing, the >> in-memory global IdentifierTable is built and each module gets a map that >> remaps its internal global IDs to the “global” global IDs. >> >> For debug info these IDs are not very useful, because they are not >> resilient against even the smallest additive change to the module. We could >> add an integer attribute with the module-internal entity ID to the forward >> declaration that the debugger can use if the module hash and the CU’s dwoid >> are matching, but I’m not yet convinced that it would be worth it. Adding >> it to the definition of the type in the module dwarf seems not worth it >> because then we already have to do a similarly expensive lookup to find the >> module containing the definition. >> > > It seems that ideally a module-aware debugger would not consult the DWARF > at all, so I wouldn't advocate having the module format type ID in the > DWARF, but in a side table (either part of the module itself, or just a > section of its own in the module-as-object-file). > > Why is it important that the IDs be resilient to changes in the module? > I'm suggesting that non-module debug info should only use type hashes, then > within the module itself there would be a hash-to-ID mapping (a flat table, > probably, wouldn't need to be anything fancy). This would keep the DWARF > less polluted by module concepts - it'd just be up to the consumer "oh, > here's a type hash identifier, resolve that to an AST - either by looking > in DWARF and building an AST from the DWARF type unit with the matching > hash, or by looking in the module and finding the ID/loading the type"? > > >> >> >>> Everything else would just be DWARF with type units and fission (with >>> the slight wrinkle of type units that aren't resolvable within a single >>> object file - they could reference cross-object/dwo file) - emitting a >>> fission CU for each referenced module. >>> >>> Needing modules to disambiguate/avoid collisions/support non-odr >>> languages wasn't something I understood/had considered back then. That >>> explains the need to add module references to the CU, so the debugger can >>> know which modules to search for the types in (& doesn't just go loading >>> all of them, etc). >>> >>> I would still picture this as "normal type units + a table in the module >>> to resolve types", but if you guys particularly like using the mangled >>> string name (or other identifier) in the DWARF that may avoid the need for >>> an intermediate table (but it doesn't sound like you are avoiding an >>> intermediate table - you said something about having an >>> accelerator-table-like thing to aid in the DWARF->AST mapping? So could >>> that be key'd of the type hash/signature we're using, thus keeping the >>> DWARF more plain/vanilla DWARF5 (type units + fission)?) >>> >>> >>> I originally disliked type signatures and favored using mangled names >>> because the mangled names contained the DeclContext necessary to find types >>> in the PCM. But if we can squeeze the DeclContext somewhere else, that’s >>> fine. >>> >>> From the discussion we had on FlagExternalTypeRef I got the impression >>> that long-form forward declarations are starting to look more attractive: >>> If every external type reference is a reference to a forward declaration >>> that has a complete decl context, >>> >> >> While that's conveniently what we output currently, I'm not sure it's a >> great idea to rely on it. We might one day optimize type references (& >> we'll certainly need to optimize non-type references like member functions, >> etc - since emitting a stub for those would start to, more visibly, reduce >> the benefit of doing this work in the first place, I would imagine) so that >> when there's no contents (which will be more common once we can reference >> members directly with Bag O DWARF) we just have a DW_AT_type encoded as a >> DW_FORM_ref_sig8 directly. >> >> >> Emitting a ref_sig8 directly only works in C++, where the ODR guarantees >> that each signature is globally unique. >> > > An identifier will be needed for types without the ODR too - DWARF > suggests hashing the type description (the actual DIE hierarchy) the same > as is done for Fission. > > >> A consumer would have to search for the type definition in all modules >> (rather than finding it via the decl context), which is less efficient, but >> still works because of the ODR. Thus we’re only depending on the long-form >> forward declarations in non-ODR languages, where we need the decl context >> to disambiguate between non-unique signatures. >> > > We shouldn't use the mangled name as the hash for types that don't have an > ODR. The current type units implementation does not create type units for > non-ODR types. (the same way we only do type refs for ODR types, the > current type units implementation piggy backs on exactly the same property > and for the same reasons) > > Searching all .dwos for DWARF type units is an existing problem for pure > DWARF type units with Fission - it might be worth considering (talking to > the committee) what would be the right solution for DWARF type > units+Fission and then seeing if that expands to module debug info. > > >> >> >> with a DW_TAG_module at the root of the decl context chain, >>> >> >> This ^ is something I didn't have in mind and would complicate things >> somewhat. I'd really like to keep things as close to the existing standard >> type unit + split dwarf standards as possible except where necessary to do >> otherwise. >> >> >> In what way does it complicate things? >> > > Debuggers probably aren't setup to handle types inside modules in this way > (at least for C++) - I've never seen this construct & imagine debuggers > haven't either & may have some trouble with it (how to name it/make it > nameable by the user, etc - maybe they'd end up putting a module name > prefix on it or something that would be unhelpful?) > > I'd really like to stick to known/existing DWARF as much as possible & > consider very carefully anywhere we diverge from that. > > >> >> >> >>> and a DW_AT_name+DW_AT_signature at the other end, we would have all the >>> information we need without introducing any further LLVM-specific DWARF >>> extensions. To look up an external type from the PCM, the consumer imports >>> the DW_TAG_module and deserializes the type found by declcontext+name. To >>> load the type from DWARF, the consumer grabs the signature from the forward >>> declaration and magically (1) finds the PCM and looks up the type by >>> signature (2). >>> >>> (1) My suggestion is to extend LLVM so it can put the DW_TAG_module with >>> the forward declaration inside the skeleton compile unit (which has the >>> path to the PCM and its DWOid). >>> (2) On ELF with type units this works out of the box, >>> >> >> Not necessarily - the use of DW_TAG_modules in the scope chain might >> confuse/break things. It's pretty unprecedented/non-standard, I would think? >> >> >> It’s use for C++ is unprecedented, but the tag itself very standard. >> > > Sure - I get that the tag is standard, but what it means (& especially > what it means to have types inside it) in the context of C++ debugging is > something I'm fairly concerned about. > >> on MachO without type units we need some kind of index mapping signature >>> -> DIE (bag of DWARF style?). >>> >> >> I was rather hoping you guys would implement type units (since they'll be >> a step towards Bag O DWARF anyway) on MachO... - at least for this case. >> They wouldn't have to be COMDAt'd or each in their own section, they'd just >> be in a debug_types section one after the other in the module .o file. >> >> >> How does a consumer today find a type unit for a given signature? Does it >> build its own index based on the signatures of the COMDAT sections? DWP >> files define a .debug_tu_index accelerator section to this end, but how is >> this normally handled? >> > > .dwo files have no COMDAT sections - you just put all the type units in > them directly. > > In any case, that's still going to be a perf hit, because you'll at least > have to read some part of every .dwo file to see which types are in it. > > I would believe/assume/imagine this is a performance problem for Type > Units + Fission as they are defined today (as I've alluded to above) & may > be worth considering general DWARF solutions (talk with the committee sort > of stuff) that may generalize to module debug info too. It's also possible > I've missed something and there are existing solutions to the Type Units + > Fission performance problem. > > >> >> >> Assuming that many external types will share a similar DeclContext prefix >>> I am not very worried by the space needed to store the long forward >>> references. Compared to storing the mangled name for every type >>> >> >> What I was picturing wasn't to storet the mangled name anywhere - but to >> have, in the module object file somewhere a table from type hash to <useful >> way of accessing a type in a module, which I hope is just a byte offset or >> something similary cheap, small, and fast - not a table lookup, etc, but >> whatever the table lookup has as its values already>. >> >> >> With the decl context and the TAG_module on top a consumer knows which >> module to import (fast) and can do the import by name (hash lookup). With >> the table in the module object and no TAG_module in the decl context, it >> has to scan all modules for the type signature (multiple hash lookups) and >> can directly import the type (fast). >> >> -- adrian >> >> >> >>> it will often actually take up less space. Also, on Darwin at least, >>> llvm-dsymutil can strip them out of the way after resolving the external >>> type references. >>> >>> -- adrian >>> >>> >>> - Dave >>> >>> On Wed, May 6, 2015 at 3:43 PM, Adrian Prantl <apra...@apple.com> wrote: >>> >>>> >>>> On May 6, 2015, at 3:24 PM, David Blaikie <dblai...@gmail.com> wrote: >>>> >>>> >>>> >>>> On Wed, May 6, 2015 at 3:15 PM, Adrian Prantl <apra...@apple.com> >>>> wrote: >>>> >>>>> >>>>> On May 6, 2015, at 2:52 PM, David Blaikie <dblai...@gmail.com> wrote: >>>>> >>>>> >>>>> >>>>> On Wed, May 6, 2015 at 2:45 PM, Adrian Prantl <apra...@apple.com> >>>>> wrote: >>>>> >>>>>> >>>>>> On May 6, 2015, at 2:35 PM, Eric Christopher <echri...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> >>>>>> That said, add enough to the name for hashing purposes to make it >>>>>> hash uniquely? Or you can go down the path of hashing the type similar to >>>>>> the fission CU hashing (which is what type units were arguably designed >>>>>> to >>>>>> do in the first place if you take a look at the standard, we just only >>>>>> use >>>>>> them for ODR compliant languages etc right now). >>>>>> >>>>>> >>>>>> I suppose one could hash the entire module configuration + the >>>>>> mangled name and get something that is relatively stable. >>>>>> For implementation reasons it would be terrible to do the full >>>>>> fission hashing because that would mean that we would actually have to >>>>>> look >>>>>> up (and deserialize the type) in order to get to its ID when emitting an >>>>>> external type reference, which would void at least some of the >>>>>> performance >>>>>> gains we want from module debugging. >>>>>> >>>>> >>>>> I thought you were proposing using the mangled name of the type for >>>>> the identifier anyway? Perhaps I misunderstood - what are you proposing to >>>>> use? In any case, I'd prefer to see whatever it is hashed and used as the >>>>> type unit signature for compatibility with DWARF5, rather than adding an >>>>> extra/separate/new/non-standard way to do cross-unit/cross-fission type >>>>> references. >>>>> >>>>> >>>>> In the IR I’d /like/ to have a DIExternalTypeRef(DW_TAG_class_type, >>>>> !”_ZTC6TypeName”, !1) with !1 being a reference to either the DIModule or >>>>> the skeleton CU. Then the backend would emit the hash of the name if type >>>>> units are enabled (C++/gdb) or the mangled name (+ the accelerator table >>>>> entry) otherwise (ObjC and/or Darwin). If there is significant pushback to >>>>> the latter, I’d be willing to have the backend emit a hash in both cases >>>>> but we’d have to careful about what to exactly to hash for all the >>>>> aforementioned reasons. >>>>> >>>> >>>> I don't follow - if the mangled name is sufficient, then a hash of the >>>> mangled name should be.... what am I missing? >>>> >>>> Nothing, these are two separate issues: >>>> >>>> If you don't have an ODR to rely on, then a mangled name seems >>>> insufficient just as the hash would be. >>>> >>>> >>>> The decision for mangled name vs hash is motivated by the mangled name >>>> also doubling as a key to look up the type in the AST. >>>> The other problem is (partially) solved by the accelerator table entry >>>> that associates the mangled name with a module. I’m starting to think now >>>> that it might be better to include a fission-style forward declaration + >>>> decl context into the TAG_module instead. The DWARF-style decl context >>>> could in theory be smaller than the mangled name because two types could >>>> share common ancestors and then we could emit the same hash of the mangled >>>> names as we do for type units. But let’s discuss this when it comes up >>>> (together with the patch that makes use of DIExternalTypeRef). >>>> >>>> -- adrian >>>> >>> >> >
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits