[PATCH] D65591: [AST] Add a flag indicating if any subexpression had errors

Richard Smith - zygoloid via Phabricator via cfe-commits Wed, 27 Nov 2019 11:52:22 -0800

rsmith added a comment.

In D65591#1625744 <https://reviews.llvm.org/D65591#1625744>, @aaron.ballman 
wrote:

> In D65591#1625228 <https://reviews.llvm.org/D65591#1625228>, @ilya-biryukov 
> wrote:
>
> > We can also assume they're cheap, use the visitor-based implementation and 
> > later switch if this turn out to be a problem.
> >  I think we should prefer the approach that guarantees the compiler is not 
> > getting slow ever rather than chase the slowness later.
>
>
> The problem is: those bits are not infinite and we only have a handful left 
> until bumping the allocation size; is this use case critical enough to use 
> one of those bits?

In the abstract, improving error recovery and toolability are among the more 
important things we could do, and it certainly seems worth spending a bit on 
every `Expr` on this to me, if there's no better alternative. (That is: I think 
this approach is OK if we don't find something better.)

It's not obvious to me whether recomputing this information by AST traversal 
would be fast enough: *if* we want to perform the "does this contain errors?" 
check from contexts we encounter frequently (eg, when deciding whether to 
produce warnings, whether we should skip certain semantic actions, and 
similar), we can't do that if it requires a traversal.

In D65591#1626969 <https://reviews.llvm.org/D65591#1626969>, @aaron.ballman 
wrote:

> In D65591#1626638 <https://reviews.llvm.org/D65591#1626638>, @ilya-biryukov 
> wrote:
>
> > Alternatively, we could keep an equivalent of `set<Expr*> InvalidExprs` in 
> > `ASTContext`. Gives the same computational complexity, but has a higher 
> > constant overhead factor.
> >  Does that look reasonable?
>
>
> Yup, that is also very reasonable.

I think this will be very expensive from a maintenance (and potentially 
performance) perspective: every constructor for every expression node will need 
to check whether its subexpressions are in the set, and if so, add themselves 
to the set.

As a variant on this, we could build a `map<Expr*, bool>` to track whether each 
expression is known to contain errors, or known to not contain errors, populate 
it lazily when queried, and shortcut the whole mechanism if we've never created 
an error node. That still seems a little expensive, but at least we'd only pay 
the cost for invalid programs. If we could produce a dense numbering of 
`Expr`s, this'd suddenly seem pretty good, though, and we know that `Expr`s are 
allocated on the `ASTContext`'s bump allocator, so maybe there's something we 
can do there...

Another option is to represent "contains an error" with a specific `Type`. 
This'd mean we'd not preserve type information in less-common cases such as:  
`f( T(some-erroneous-expression) )`, and wouldn't be able to diagnose if 
there's no function `f` that takes a `T`, but I think that's an acceptable 
loss. The downside is that we would need logic to create representations with 
the appropriate "contains an error" type throughout `Sema` -- everywhere where 
we create nodes with dependent types, plus some additional cases like cast 
expressions that are never type-dependent (but can contain errors).

Here's my current thinking:

- Using dependence to model "contains errors" is a mistake. We should stop 
doing that.
- We therefore need a different type to indicate "couldn't determine the type 
of this because it contains errors", and should add a new builtin type for that.
- In cases like `int( thing-containing-errors )`, it's probably completely fine 
that we can't determine whether the expression contains errors, and we 
shouldn't provide an interface to ask that question. (The current flag bit 
doesn't help with that anyway, since the errors could be nested indirectly, say 
inside a lambda, and we don't propagate the flag through that.)

So my proposal would be: stop trying to track whether expressions contain 
errors. Instead, add a builtin type to indicate whether the type of an 
expression couldn't be determined due to an error, and propagate that in the 
same way we propagate type-dependence. Would that address the use cases here?

> I'm not certain there are real problems there, but I am wondering whether 
> something like a BMI should include serialized error AST nodes or not. Would 
> a consumer of the module expect that? I suppose it could if we wanted it to.

We already do preserve invalid AST through AST files if we're asked to do so; 
this isn't new. (For example, the `Invalid` flag on `Decl`s is preserved.) This 
is important for some tooling scenarios: you want to be able to produce and use 
a precompiled preamble even if it contains errors.

> Ah, drat, I was hoping we had at least one test that already did this, but I 
> don't see one.

Maybe we should introduce a `__builtin_dump(expr)` function that dumps its 
operand at the point where semantic analysis is first done? That'd be useful 
for other things too (`#pragma clang __debug dump X` is a little unwieldy).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65591/new/

https://reviews.llvm.org/D65591

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D65591: [AST] Add a flag indicating if any subexpression had errors

Reply via email to