Re: Re: Canonical type nodes, or, comptypes considered harmful

Doug Gregor Wed, 08 Nov 2006 06:00:23 -0800

On 11/8/06, Mike Stump <[EMAIL PROTECTED]> wrote:

On Nov 7, 2006, at 7:13 PM, Doug Gregor wrote:
> Now, how much do we worry about the fact that we won't be printing
> typedef names


The only potential language gotcha I can think of is:

5 If the typedef declaration defines an unnamed  class  (or  enum),  the
   first  typedef-name  declared by the declaration to be that class
type
   (or enum type) is used to denote the class type  (or  enum  type)
for
   linkage purposes only (_basic.link_).  [Example:
   typedef struct { } *ps, S;      // S is the class name for linkage
purposes
     --end example]

we still gotta get that right.  I think this just changes the
underlying type and doesn't have any typedef bits to it.


Right. If we want to remember that the struct { } was originally
anonymous, we could just set a flag on the RECORD_TYPE.

As for warning/error messages, I'd hate to regress on that front.
Doug, as a heavy template user, I'm sure you've see plenty of warning/
error messages...  How important is the typedef name in your experience?


As Benjamin said, it's very important, and we're not doing so well on
that front now. std::string is a particularly nasty case, because we
tend to print out the full template name rather than the typedef name.
Users would love us if we got this right :)

I agree, it will complicate things if we want the typedef name
around.  I'm thinking of an inverse mapping that takes us from tuple
(context,type) to typedef name.  I want to boost the typedef name
out, kinda like a flyweight pattern, but inverted, instead of
boosting the invariant out, I want to raise the variant (the typedef
name) out of the type.  Imagine if we had a map (tuple(context, type)
-> typedef name, and when we wanted the possible typedef name, we
take the type and the context in which we got the type and look it up
in the map.  If not found, no typedef name.  If found, it is the
typedef name we're interested in.  The benefit of this scheme would
be that constant time type equality operators remain constant time.
Producing error messages is slow, but we don't care about that.


Interesting idea; the constant time comparison could be win against
the almost-linear time we get with a union-find data structure.
However, this approach could have some odd side effects when there are
multiple mappings within one context. For instance, we could have
something like:

 typedef int foo_t;
 typedef int bar_t;

 foo_t* x = strlen("oops");

The error message that pops out would likely reference "bar_t *"
rather than "foo_t *" or "int *", because bar_t was the last mapping
from "int" that would be in the context. The example above is silly,
but when we're instantiating a template, it's common to have lots of
typedef names that all instantiate to the same type.

This approach wouldn't help with the implementation of concepts,
because we need to be able to take two distinct types (say, template
type parameters T and U) and consider them equivalent in the type
system. We can't literally combine T and U into a single canonical
type node, because they start out as different types. Granted, we
could layer a union-find implementation (that better supports
concepts) on top of this approach.

For reference, here's the union-find implementation: we add a new kind
of _TYPE node that is just a new name for an existing type. This new
type node (let's call it TYPE_ALIAS_TYPE) only contains two things: a
name and a pointer to the underlying type. Now, we use TYPE_ALIAS_TYPE
in the AST, but whenever we need to do some kind of structural check
on the tree node, we need to follow the chain of TYPE_ALIAS_TYPEs to
get to the canonical type node, which I would call the
"representative" of the equivalence class of types. Any code that does
something like:

 type = TREE_TYPE (exp);
 if (TREE_CODE (type) == REFERENCE_TYPE)
   type = TREE_TYPE (type);

would need to be modified to get the representation of "type" before
looking at its TREE_CODE, e.g.,

 type = type_representative (TREE_TYPE (exp));
 if (TREE_CODE (type) == REFERENCE_TYPE)
   type = TREE_TYPE (type);

We could find all of these places by "poisoning" TREE_CODE for
TYPE_ALIAS_TYPE nodes, then patch up the compiler to make the
appropriate type_representative calls. We'd want to save the original
type for diagnostics.

An alternative to poisoning TREE_CODE would be to have TREE_TYPE do
the mapping itself and have another macro to access the original
(named) type:

 #define TREE_TYPE(NODE) type_representative ((NODE)->common.type)
 #define TREE_ORIGINAL_TYPE(NODE) ((NODE)->common.type)

There are other macros like TREE_TYPE that would also need this
change, but TREE_TYPE should cover most of it.

When implementing concepts, *every* _TYPE node can have a
representative that is different from itself.

> Since we know that type canonicalization is incremental, could we work
> toward type canonicalization in the GCC 4.3 time frame?

If by we you mean you, I don't see why that would be a bad
idea.  :-)  The risk is if one invests all this effort, and the win
turns out to be < 1% on real code and 10x on benchmark code, one
feels bad.


ConceptGCC has hit the point where compile times have gotten
prohibitive, and we need to illustrate that concepts can be compiled
efficiently. So, I'm stuck working toward type canonicalization in
ConceptGCC for performance reasons. Since type comparison is literally

50% of my compile time now, I'm sure to win. I just don't know if

what I come up with will be a win when concepts aren't present.

The big question is whether I end up doing this work in ConceptGCC
(only) or also in FSF GCC, and if anyone is willing to help with the
FSF GCC version.

 Cheers,
 Doug

Re: Re: Canonical type nodes, or, comptypes considered harmful

Reply via email to