On Thu, Jan 25, 2024 at 04:18:19AM -0800, Andi Kleen wrote:
> Some programming styles use a lot of inline assembler, and it is common
> to use very complex preprocessor macros to generate the assembler
> strings for the asm statements. In C++ there would be a typesafe alternative
> using templates and constexpr to generate the assembler strings, but
> unfortunately the asm statement requires plain string literals, so this
> doesn't work.
> 
> This patch modifies the C++ parser to accept strings generated by
> constexpr instead of just plain strings. This requires new syntax
> because e.g. asm("..." : "r" (expr)) would be ambiguous with a function
> call. I chose () to make it unique. For example now you can write
> 
> constexpr const char *genasm() { return "insn"; }
> constexpr const char *genconstraint() { return "r"; }
> 
>       asm(genasm() :: (genconstraint()) (input));
> 
> The constexpr strings are allowed for the asm template, the
> constraints and the clobbers (everywhere current asm accepts a string).
> 
> The drawback of this scheme is that the constexpr doesn't have
> full control over the input/output/clobber lists, but that can usually
> be handled with a switch statement.  One could imagine
> more flexible ways to handle that, for example supporting constexpr
> vectors for the clobber list, or similar. But even without
> that it is already useful.
> 
> Bootstrapped and full test on x86_64-linux.
> ---
>  gcc/cp/parser.cc                       | 76 ++++++++++++++++++--------
>  gcc/doc/extend.texi                    | 17 +++++-
>  gcc/testsuite/g++.dg/constexpr-asm-1.C | 30 ++++++++++
>  3 files changed, 99 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/constexpr-asm-1.C
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 3748ccd49ff3..cc323dc8557a 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -22654,6 +22654,43 @@ cp_parser_using_directive (cp_parser* parser)
>    cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
>  }
>  
> +/* Parse a string literal or constant expression yielding a string.
> +   The constant expression uses extra parens to avoid ambiguity with "x" (expr).
> +
> +   asm-string-expr:
> +     string-literal
> +     ( constant-expr ) */
> +
> +static tree
> +cp_parser_asm_string_expression (cp_parser *parser)
> +{
> +  location_t sloc = cp_lexer_peek_token (parser->lexer)->location;
> +
> +  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))

Why should it be wrapped in ()s?

> +    {
> +      matching_parens parens;
> +      parens.consume_open (parser);
> +      tree string = cp_parser_constant_expression (parser);
> +      if (string != error_mark_node)
> +     string = cxx_constant_value (string, tf_error);
> +      if (TREE_CODE (string) == NOP_EXPR)
> +     string = TREE_OPERAND (string, 0);
> +      if (TREE_CODE (string) == ADDR_EXPR && TREE_CODE (TREE_OPERAND (string, 0)) == STRING_CST)

Line too long.

> +     string = TREE_OPERAND (string, 0);
> +      if (TREE_CODE (string) == VIEW_CONVERT_EXPR)
> +     string = TREE_OPERAND (string, 0);
> +      if (TREE_CODE (string) != STRING_CST && string != error_mark_node)
> +     {
> +       error_at (sloc, "Expected string valued constant expression for %<asm%>, not type %qT",
> +                         TREE_TYPE (string));

Again, the line is too long, and diagnostics should never start with a
capital letter.  But more importantly, this handles only a small subset of
what one can construct with constexpr functions.  Not everything they can
return, even if they return const char *, is necessarily a string literal
(STRING_CST); it could be an array of chars or something similar,
especially if the intent is not just to return prepared whole string
literals but to construct the template etc. from substrings.

Given the https://wg21.link/P2741R3 C++26 addition (user-generated
static_assert messages), I wonder if it wouldn't be much better to stay
compatible with the static_assert extension there and use a similar
mechanism for inline asm.
See https://gcc.gnu.org/r14-5771 and r14-5956 follow-up for the actual
implementation.
One issue is the character set question.  The strings in inline asm right
now are untranslated, i.e. remain in SOURCE_CHARSET, usually UTF-8 (in
theory there is also UTF-EBCDIC support, but nobody knows if it actually
works), which is presumably what the assembler expects too.
Most of the string literals and character literals constexpr deals with
are in the execution character set though.  For static_assert we just assume
the user knows what he is doing when trying to emit non-ASCII characters in
the message when using, say, -fexec-charset=EBCDICUS, but should that be the
case for inline asm too?  Or should we try to translate those strings back
from execution character set to source character set?  Or require that it
actually constructs UTF-8 string literals and for the UTF-EBCDIC case
translate from UTF-8 to UTF-EBCDIC?  The user's constexpr functions would
then return u8"insn", or construct the string from u8'i' etc. character
literals.

In any case, as has been said earlier, this isn't stage4 material.

        Jakub
