https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115496

            Bug ID: 115496
           Summary: RFE: new warning to detect suspicious multiline string
                    literals
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: diagnostic
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

I find I often forget to add trailing newlines when composing multiline string
literals in C/C++.

For example:

const char *str
  = ("this is the first line\n"
     "this is the second line"
     "this is the third line\n");

where the supposed "second line" is missing a trailing "\n" and thus the string
is actually:

this is the first line
this is the second linethis is the third line
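
The concatenation can be checked with a small sketch. The helper name
count_newlines is hypothetical and only serves the illustration; the string
is the one from the example above.

```c
#include <string.h>

/* Adjacent string literals are concatenated by the compiler with nothing
   inserted between them, so a missing "\n" silently joins two lines.  */
static const char *const str
  = ("this is the first line\n"
     "this is the second line"   /* no trailing "\n" here */
     "this is the third line\n");

/* Hypothetical helper: count the '\n' characters in a string.  */
static int count_newlines(const char *s)
{
  int n = 0;
  for (; *s; s++)
    if (*s == '\n')
      n++;
  return n;
}
```

Three source lines yield only two newline characters, and the text
"second linethis" appears verbatim in the resulting string.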

For a more insidious example, consider:
  https://fobes.dev/general/2024/02/29/inline-assembly-dangers.html

where the code had this string literal:

   asm volatile(
        "# Clear bss area"
        "la   $2, _fbss"
        "la   $3, _end"
        "1:"
        "sltu   $1, $2, $3"
        "beq   $1, $0, 2f"
        "nop"
        "sq   $0, ($2)"
        "addiu   $2, $2, 16"
        "j   1b"
        "nop"
        "2:"
        "                       \n"
        "# Save first argument  \n"
        "la     $2, %0 \n"
        "sw     $4, ($2)        \n"
        "                       \n"
        "# SetupThread          \n"
        "la     $4, _gp         \n"
        "la     $5, _stack      \n"
        "la     $6, _stack_size \n"
        "la     $7, %1          \n"
        "la     $8, ExitThread  \n"
        "move   $gp, $4         \n"
        "addiu  $3, $0, 60      \n"
        "syscall                \n"
        "move   $sp, $2         \n"
        "                       \n"
        "# Jump to _main        \n"
        "j      %2           \n"
        : /* No outputs. */
        : "R"(args_start), "R"(args), "Csy"(_main));

Note that some lines have trailing "\n", but others don't, so that this is
actually:

        "# Clear bss areala   $2, _fbssla   $3, _end1:sltu   $1, $2, $3beq   $1, $0, 2fnopsq   $0, ($2)[...snip...]"
        "                       \n"
        "# Save first argument  \n"
        "la     $2, %0 \n"
        "sw     $4, ($2)        \n"
        "                       \n"
        [...snip...]

and thus the "comment to end of line" in the assembler was swallowing much more
than one might expect from a casual reading of the C source.
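
To make that concrete, here is a rough sketch using an abbreviated copy of
the template (only a few of the original fragments are kept).  The helper
swallowed_by_first_comment is hypothetical, and the sketch assumes, as
described above, that '#' starts a comment running to the end of the line.

```c
#include <stdbool.h>
#include <string.h>

/* A few fragments of the asm template quoted above (abbreviated).  */
static const char *const asm_text =
  "# Clear bss area"
  "la   $2, _fbss"
  "la   $3, _end"
  "2:"
  "                       \n"
  "# Save first argument  \n";

/* Hypothetical helper: true if the first line of TEXT starts with '#'
   and the first occurrence of INSN begins before that line's "\n",
   i.e. the instruction sits inside the leading comment.  */
static bool swallowed_by_first_comment(const char *text, const char *insn)
{
  size_t line_len = strcspn(text, "\n");
  const char *hit = strstr(text, insn);
  return text[0] == '#' && hit != NULL && (size_t)(hit - text) < line_len;
}
```

With this input, "la   $2, _fbss" is reported as part of the first comment,
whereas "# Save first argument" starts after the first newline and is not.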

Proposal: a new warning that detects suspicious line continuations in multiline
strings.  I'm not sure precisely what the heuristics should be, but something
like:

- the string literal is on multiple source lines
- at least some of the lines have terminating "\n" characters
- some of the lines don't have terminating "\n" characters (warn for these) -
but probably don't worry about the final line
- probably some other heuristics
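
A minimal sketch of the first three heuristics, purely for discussion.  This
is not GCC code: find_suspicious_fragments and ends_with_newline are
hypothetical names, and the sketch assumes the caller already has the decoded
contents of each per-source-line piece of the literal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the fragment's last character is a newline.  */
static bool ends_with_newline(const char *frag)
{
  size_t len = strlen(frag);
  return len > 0 && frag[len - 1] == '\n';
}

/* Hypothetical helper, not GCC code.  FRAGS[i] is the decoded content of
   the piece of the literal on the i-th source line.  Stores the indices
   of suspicious fragments in OUT (at most OUT_CAP of them) and returns
   the total number found.  */
static size_t find_suspicious_fragments(const char *const *frags, size_t n,
                                        size_t *out, size_t out_cap)
{
  size_t found = 0;
  bool any_newline = false;

  /* Heuristic 1: the literal must span multiple source lines.  */
  if (n < 2)
    return 0;

  /* Heuristic 2: at least one fragment ends with "\n".  */
  for (size_t i = 0; i < n; i++)
    if (ends_with_newline(frags[i]))
      any_newline = true;
  if (!any_newline)
    return 0;

  /* Heuristic 3: flag fragments lacking "\n", skipping the final one.  */
  for (size_t i = 0; i + 1 < n; i++)
    if (!ends_with_newline(frags[i]))
      {
        if (found < out_cap)
          out[found] = i;
        found++;
      }
  return found;
}
```

For the three-line example at the top of this report, this flags only the
"second line" fragment; a literal in which no fragment ends with "\n" is not
flagged at all.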

Perhaps: "-Wsuspicious-multiline-string-literal" or somesuch?
