Hello,
I agree with most of what you wrote, but I still prefer the version
without __self__. I would be fine with __self__, though, as long
as we can avoid delayed parsing. (For those who care, it may be
helpful to add a comment to the clang discussion.)
I will just add some brief comments.
On Thursday, 23.01.2025 at 15:27 +0100, Michael Matz wrote:
> > Hello,
> >
> > On Wed, 22 Jan 2025, Martin Uecker wrote:
> >
> > > > > > > > > > If y is not a member it must be an expression, true. But
> > > > > > > > > > if it's
> > > > > > > > > > a member you don't know, it may be a designation or an
> > > > > > > > > > expression.
> > > > > > > >
> > > > > > > > In an initializer I know all the members.
> > > > > >
> > > > > > My sentence was ambiguous :-) Trying again: When it's a member,
> > > > > > and
> > > > > > you know it's a member, then you still don't know if it's going to
> > > > > > be
> > > > > > a designation or an expression. It can be both.
> > > >
> > > > I guess this depends on what you mean by "it can be". The rule would
> > > > simply be that it is not an expression.
> >
> > So, that then exactly introduces the notion of
> > expression-but-not-quite-expression that Joseph mentioned. Ala
> > '". identifier" is a primary expression, but only within counted_by'.
> > That's a major modification of the C grammar, regarding name lookup rules
> > and top-level non-terminals.
I can see why it could be seen in this way. But the designator
syntax could also be seen (more or less) as a tiny subset of
the expression syntax allowing only assignments. It is just
not currently expressed in this way in the grammar. So I see
this grammar issue more as a symptom of how it is currently
phrased in the standard (but see below).
My line of thinking is that designators solve one of the problems
we have now, distinguishing between member names and ordinary
identifiers in a context where the referenced object is implied:
enum { INDEX = 2 };
struct { int a[10]; } x = { .a[INDEX] = 1 };
where a is the member and INDEX a previously declared identifier.
Using __self__, as in
struct { int a[10]; } x = { __self__.a[INDEX] = 1 };
would also work here, but it is more verbose.
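To make the designator case concrete, here is a minimal sketch in standard C of the first form. The function names are hypothetical and only there so the result can be checked. The __self__ variant is not shown because it is only a proposal and no compiler is assumed to accept it.

```c
#include <assert.h>

enum { INDEX = 2 };   /* ordinary identifier, declared before use */

/* ".a" can only name a member of the object being initialized;
   INDEX is resolved by ordinary name lookup. */
static struct { int a[10]; } x = { .a[INDEX] = 1 };

/* Returns the element selected by the designator above. */
int designated_element(void) { return x.a[INDEX]; }

/* Returns an element no designator mentioned (implicitly zero). */
int untouched_element(void) { return x.a[0]; }

/* Usage: both lookups resolve without any extra syntax. */
void check_designator(void)
{
    assert(designated_element() == 1);
    assert(untouched_element() == 0);
}
```

The leading dot is what tells the parser that a is a member, so no marker is needed for INDEX.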
> >
> > > > The rationale is the following:
> >
> > Sure, I see all that. In a recursive descent parser it can even be
> > trivially hacked upon (not so easily with a parser written in e.g. bison).
> > But it's IMHO bad language design.
> > Joseph's initial idea of __self__, or
> > something along the line, would instead be a composable extension of
> > existing constructs: a simple new conditionally defined identifier that is
> > in no way special for the grammar, it fits just right in and everything
> > falls into place automatically.
I am ok with __self__. I just do not think it is needed, and one then
has to explain to programmers why one has to use it in some contexts
while it is not needed in other similar contexts (initializers).
> >
> > > > If it is inside the initializer of a structure and references a member
> > > > of the same structure, then it can not simultaneously be inside the
> > > > argument to a counted_by attribute used in the declaration of this
> > > > structure (which at this time has already been parsed completely). So
> > > > there is no reason to allow it be interpreted as an expression and the
> > > > rule I proposed would therefore simply state that it then is not an
> > > > expression.
> >
> > Yes, at _that_ place, but what about other places that accept expressions?
> > You basically introduce ".x as expression, except when (list of
> > exceptions)". Conditional syntax (in difference to conditional semantics)
> > is always a bad thing. Look at the c++ stmt/decl ambiguity requiring
> > exactly this infinite look-ahead delayed parsing I'm worried about.
I agree that infinite look-ahead and delayed parsing would be very
problematic, not just because of the burden to C parsers but also because
the rules are then semantically different to what we have elsewhere
in C and also very confusing.
To me
constexpr int max1 = 10;
constexpr int max2 = 10;
struct foo {
    char (*buf)[max1] __counted_by(max2);
    int max1;
    int max2;
};
where one identifier refers to the member and the other to the ordinary
identifier is just a very obvious disaster which should be avoided.
(In C++ it is UB.)
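For comparison, here is a minimal sketch of what today's C rules do with the array bound alone. It uses an enum constant instead of constexpr so that it also compiles before C23, and it leaves out __counted_by because that syntax is exactly what is under discussion. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { max1 = 10 };          /* ordinary identifier at file scope */

struct foo {
    char (*buf)[max1];       /* ordinary lookup: the constant 10 */
    int max1;                /* member: separate name space, no conflict */
};

/* Size of the array buf points to; sizeof does not evaluate its operand. */
size_t buf_bound(void)
{
    struct foo f = { .max1 = 3 };
    return sizeof *f.buf;
}

/* Usage: the bound comes from the file-scope constant. */
void check_lookup(void)
{
    assert(buf_bound() == 10);   /* the member value 3 plays no role */
}
```

So a plain identifier inside the struct declaration already has a fixed meaning, and letting the same spelling refer to the member in one place only would break that.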
> >
> > > > struct {
> > > > int n;
> > > > int *buf [[counted_by(.n)]]; // this n is in a counted_by
> > > > } x = { .n }; // this n can not be in counted_by for the same struct
> > > >
> > > >
> > > > If we allowed this to be interpreted as an expression, then you could
> > > > use
> > > > it to reference a member during initialization, e.g.
> > > >
> > > > struct { int y; int x; } a = { .y = 1, .x = .y };
> > > >
> > > > but this would be another extension unrelated to counted_by, which I
> > > > did
> > > > not intend to suggest.
> >
> > Yes, that's what I was also getting at. I'm aware that you didn't want to
> > suggest that. But it is what you get when you allow ". ident" as primary
> > expression generally. After doing that, one then needs all kinds of
> > exceptions to that acceptance to not actually allow such self-references,
> > or whatever other issues may come up with dot-ident being a
> > primary-expression in random places.
One just needs to use the rule I proposed, but I assume you want
parsing not to be context-sensitive (or not more so than it already is
in C). I think one could reformulate the grammar
to also handle designators as generic expressions and then have
constraints that restrict them to that subset.
> >
> > > > There are other possibilities for disambiguation, we could also simply
> > > > state that in initializers at the syntactic position where a designator
> > > > is allowed, it is always a designator and not an expression, and if it then
> > > > does not reference a member of the structure being initialized, it is
> > > > an
> > > > error. Maybe this is even preferable.
> >
> > Perhaps. It does solve the designator-or-expression ambiguity. But you
> > still would have dot-ident as expression in other contexts, and you still
> > need to worry about what to do when they are not within counted_by. A
> > conditionally defined identifier like __self__ effectively solves this at
> > the name lookup level.
> >
> > One might say that there's no difference between
> > conditionally activating a grammar production like ".ident ->
> > primary-expr" and conditionally defining an identifier (and just letting
> > the existing "ident -> primary-expr" production do its thing). But that
> > would be wrong, it's a very big difference.
Why?
You could also unconditionally always allow .ident and later check
whether it made sense in the context it was used.
Even with __self__ you need to later check whether __self__.y actually
references a member of the struct.
> >
> > > > I would like to mention that what clang currently has in a prototype
> > > > uses a mechanism called "delayed parsing" which is essentially
> > > > infinite
> > > > lookahead (in addition to being confusing and incoherent with C
> > > > language
> > > > rules). So IMHO we need something better.
> >
> > Definitely. The infinite look-ahead trial parsing necessary for C++ in
> > various places is terrible. If something requires more than two tokens in
> > C, then it's a mis-designed proposal.
> >
> > > > My proposal for the moment would be to only allow very restricted
> > > > syntactic forms, and not generic expressions, side stepping all these
> > > > issues.
> >
> > Restricting syntax will have its own problems. If you really want only
> > restricted syntactical forms then you need grammar rules for all these
> > cases that you do want to allow. That will be trivial initially, but
> > inevitably someone will come along and wants to extend the acceptable
> > cases on the grounds of "you know, this is really an expression here,
> > please let me write '.x + 0'". Eventually you again end up with a set of
> > grammar productions that awfully look like assignment-expression, but not
> > quite.
Possibly, but it would simplify implementation for the moment and
it should be restricted anyhow, e.g. regarding side effects.
> >
> > What you ideally want is introducing all these concepts _without_ major
> > changes to syntax (at least not introducing ambiguities), and do the
> > context dependent things one has to do either at the name-lookup or
> > semantic level, which _are_ already context dependent. Something along
> > the lines of counted_by accepting general expressions, but then in the
> > constraints saying that it must be a self-referential expression (for
> > lack of a better term, and to be precisely defined :) ).
That makes sense to me. This is actually how I would try to specify this
in the language spec: allowing general expressions which may include
the .y syntax, but then adding constraints for designated initializers and
counted_by, similar to how it is now done for constant expressions.
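As a rough illustration of "accept it syntactically, constrain it afterwards", here is a toy sketch in C. Everything in it (the context enum, the dot_ident struct, the function names) is hypothetical and not taken from GCC or clang. It only shows that the check can be a separate step after parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical contexts in which an expression can appear. */
enum context { CTX_ORDINARY, CTX_COUNTED_BY };

/* Hypothetical parsed form of ".ident": just the member name. */
struct dot_ident { const char *name; };

/* Constraint check run after parsing: ".ident" is accepted only inside
   counted_by, and only if it names a member of the enclosing struct. */
bool dot_ident_ok(struct dot_ident e, enum context ctx,
                  const char *const *members, size_t n_members)
{
    if (ctx != CTX_COUNTED_BY)
        return false;
    for (size_t i = 0; i < n_members; i++)
        if (strcmp(members[i], e.name) == 0)
            return true;
    return false;
}

/* Usage: struct { int n; int *buf; } with ".n" in various places. */
void check_constraints(void)
{
    const char *const members[] = { "n", "buf" };
    struct dot_ident n = { "n" }, k = { "k" };

    assert(dot_ident_ok(n, CTX_COUNTED_BY, members, 2));
    assert(!dot_ident_ok(n, CTX_ORDINARY, members, 2));   /* wrong context */
    assert(!dot_ident_ok(k, CTX_COUNTED_BY, members, 2)); /* not a member */
}
```

The same kind of membership check would be needed for __self__.y, so the difference between the two spellings lies in the grammar and not in this step.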
> >
> > (Of course I agree that at least initially the acceptable types of
> > expressions should be limited, for the reasons you already stated in
> > other mails).
Martin