On 8/8/24 4:44 AM, Jakub Jelinek wrote:
This is an attempt to implement the https://wg21.link/p3034r1 paper,
but I'm afraid the wording in the paper is bad for multiple reasons.
I think I understand the intent, that the module name and partition
if any shouldn't come from macros so that they can be scanned for
without preprocessing, but on the other side doesn't want to disable
macro expansion in pp-module altogether, because e.g. the optional
attribute in module-declaration would be nice to come from macros
as which exact attribute is needed might need to be decided based on
preprocessor checks.
Yes.
The paper added https://eel.is/c++draft/cpp.module#2
which uses partly the wording from https://eel.is/c++draft/cpp.module#1
The first issue I see is that using that "defined as an object-like macro"
from there means IMHO something very different in those 2 paragraphs.
As per https://eel.is/c++draft/cpp.pre#7.sentence-1 preprocessing tokens
in preprocessing directives aren't subject to macro expansion unless
otherwise stated, and so the export and module tokens aren't expanded
and so the requirement that they aren't defined as an object-like macro
makes perfect sense. The problem with the new paragraph is that
https://eel.is/c++draft/cpp.module#3.sentence-1 says that the rest of
the tokens are macro expanded and after macro expansion none of the
tokens can be defined as an object-like macro, if they would be, they'd
be expanded to that.
I don't see the problem; the restrictions in paragraphs 1 and 2 seem to
refer to phase 3 pp-tokens, and could be checked prior to macro expansion.
So, I think either the wording needs to change
such that not all preprocessing tokens after module are macro expanded,
only those which are after the pp-module-name and if any pp-module-partition
tokens, or all tokens after module are macro expanded but none of the tokens in
pp-module-name and pp-module-partition if any must come from macro
expansion. The patch below implements it as if the former would be
specified (but see later), so essentially scans the preprocessing tokens
after module without expansion, if the first one is an identifier, it
disables expansion for it and then if followed by . or : expects another
such identifier (again with disabled expansion), but stops after second
: is seen.
Sounds good.
Second issue is that while the global-module-fragment start is fine, matches
the syntax of the new paragraph where the pp-tokens[opt] aren't present,
there is also private-module-fragment in the syntax where module is
followed by : private ; and in that case the colon doesn't match the
pp-module-name grammar and appears now to be invalid. I think the
https://eel.is/c++draft/cpp.module#2
paragraph needs to change so that it allows also that pp-tokens of
a pp-module may also be : pp-tokens[opt] (and in that case, I think
the colon shouldn't come from a macro and private and/or ; can).
It seems to me that pp-global-module-fragment and
pp-private-module-fragment are different nonterminals, they aren't
pp-module.
Third issue is that there are too many pp-tokens in
https://eel.is/c++draft/cpp.module , one is all the tokens between
module keyword and the semicolon and one is the optional extra tokens
after pp-module-partition (if any, if missing, after pp-module).
Perhaps introducing some other non-terminal would help talking about it?
So in "where the pp-tokens (if any) shall not begin with a ( preprocessing
token" it isn't obvious which pp-tokens it is talking about (my assumption
is the latter)
Agreed, I think it's clear that the "where" is referring to the
immediately preceding grammar.
But since pp-global-module-fragment has its own nonterminal, there's no
need for the first pp-tokens to be [opt], and we could just
pp-module:
export[opt] module pp-module-name pp-module-partition[opt]
pp-tokens[opt] ; new-line
and also whether ( can't appear there just before macro
expansion or also after expansion. The patch expects only before expansion,
Agreed, it's still talking about phase 3 pp-tokens. And the intent is
to avoid expanding the last identifier as a function-like macro.
so
#define F ();
export module foo F
would be valid during preprocessing but obviously invalid during
compilation, but
#define foo(n) n;
export module foo (3)
would be invalid already during preprocessing.
Yes.
The last issue applies only if the first issue is resolved to allow
expansion of tokens after : if first token,
If we see "module :" I think it's not a pp-module, so it doesn't get the
macro expansion from paragraph 3.
or after pp-module-partition
if present or after pp-module-name if present. When non-preprocessing
scanner sees
export module foo.bar:baz.qux;
it knows nothing can come from preprocessing macros and is ok, but if it
sees
export module foo.bar:baz qux
then it can't know whether it will be
export module foo.bar:baz;
or
export module foo.bar:baz [[]];
or
export module foo.bar:baz.freddy.garply;
because qux could be validly a macro, which expands to ; or [[]];
or .freddy.garply; etc. So, either the non-preprocessing scanner would
need to note it as possible export of foo.bar:baz* module partitions
and preprocess if it needs to know the details or just compile, or if that
is not ok, the wording would need to rule out that the expansion of (the
second) pp-tokens if any can't start with . or : (colon would be only
problematic if it isn't present in the tokens before it already).
So, if e.g. defining qux above to . whatever is invalid, then the scanner
can rely it sees the whole module name and partition.
This does seem like a hole in the wording. I think the clear intent is
that the name/partition must neither be macro-expanded nor come from
macro expansion.
The patch below implements what is above described as the first variant
of the first issue resolution, i.e. disables expansion of as many tokens
as could be in the valid module name and module partition syntax, but
as soon as it e.g. sees two adjacent identifiers, the second one can be
macro expanded. So, effectively:
#define SEMI ;
export module SEMI
used to be valid and isn't anymore,
#define FOO bar
export module FOO;
isn't valid,
#define COLON :
export module COLON private;
isn't valid,
#define BAR baz
export module foo.bar:baz.qux.BAR;
isn't valid,
Agreed.
but
#define BAZ .qux
export module foo BAZ;
or
#define QUX [[]]
export module foo QUX;
or
#define FREDDY :garply
export module foo FREDDY;
or
#define GARPLY private
module : GARPLY;
etc. is.
I think QUX is valid, but the others are intended to be invalid.
Do you agree with the above or does the wording look clear to you?
If you agree, shall I file an issue, or will you handle that?
I think the patch is at least a step in the direction of the paper's
intent, but perhaps not full. If we need to check for initial : or .
in the expansion of the first identifier after the module name or
module partition, not sure how it would be implemented, currently
the patch just lexes direct tokens and pushes them back as backups
with possibly NO_EXPAND added on those, but not sure if there is
an easy way to peek at the first token from expansion.
--- gcc/testsuite/g++.dg/modules/dir-only-4.C.jj 2020-12-22
23:50:17.057972516 +0100
+++ gcc/testsuite/g++.dg/modules/dir-only-4.C 2024-08-08 09:33:09.454522024
+0200
@@ -1,8 +1,8 @@
// { dg-additional-options {-fmodules-ts -fpreprocessed -fdirectives-only} }
-// { dg-module-cmi !foo }
+// { dg-module-cmi !baz }
module;
#define foo baz
-export module foo;
+export module baz;
class import {};
--- gcc/testsuite/g++.dg/modules/atom-preamble-2_a.C.jj 2020-12-22 23:50:17.055972539 +0100
+++ gcc/testsuite/g++.dg/modules/atom-preamble-2_a.C 2024-08-08
09:35:56.093364042 +0200
@@ -1,6 +1,6 @@
// { dg-additional-options "-fmodules-ts" }
#define malcolm kevin
-export module malcolm;
+export module kevin;
// { dg-module-cmi kevin }
export class X;
--- gcc/testsuite/g++.dg/modules/atom-preamble-4.C.jj 2020-12-22
23:50:17.055972539 +0100
+++ gcc/testsuite/g++.dg/modules/atom-preamble-4.C 2024-08-08
09:35:32.463670046 +0200
@@ -1,5 +1,5 @@
// { dg-additional-options "-fmodules-ts" }
-#define NAME(X) X;
+#define NAME(X) ;
-export module NAME(bob)
+export module bob NAME(bob)
It looks like these tests were specifically testing patterns that this
paper makes ill-formed, and so we should check for errors instead of
changing them to be well-formed.
Jason