https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116724

--- Comment #5 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to Hans-Peter Nilsson from comment #4)
> (In reply to David Malcolm from comment #1)
> 
> > Perhaps we should try to capture both the untranslated text and the
> > translated text?  SARIF has various abilities for handling translations.

To clarify, consider this hypothetical diagnostic:

  error_at (location,
            "missing %qs after %qs",
            "decl-name", "foo");

with a hypothetical "pig-latin" locale and translation (pig-latin.po); see
https://en.wikipedia.org/wiki/Pig_Latin

where
  "missing %qs after %qs"
has this translation in the .po file:
  "issingmay %qs afteray %qs"

The classic text output format might read:

  foo.c:42:11: erroray: issingmay `decl-name' afteray `foo'

and currently GCC's SARIF output would presumably capture the text of the
message with:

message: {"text": "issingmay `decl-name' afteray `foo'"}

i.e. currently GCC's SARIF output for a formatted string "bakes in" both
localization of the format string *and* param substitution.  

We could instead defer parameter substitution to the SARIF consumer via §3.11.5
"Messages with placeholders"
(https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141790716)
like this:

message: {"text": "issingmay `{0}' afteray `{1}'",
          "arguments": ["decl-name", "foo"]}

and, with that, potentially capture the pre-translated message string *and* its
translation in the currently active .po file e.g.:

message: {"id": "missing %qs after %qs",
          "arguments": ["decl-name", "foo"]}

with something like this: (see 3.11.7 "Message string lookup")

 "translations": [
    {                              # A toolComponent object.
      "language": "pig-latin",
      "contents": ["localizedData"],
      "globalMessageStrings": {
         "missing %qs after %qs": {"text": "issingmay {0} afteray {1}"}}}]

where we'd list the subset of format strings that got used by diagnostics in
the particular log, and their translations, with the caveats that:
- I'm not sure that that's how translations of strings are meant to be stored
(the SARIF spec's tutorial doesn't seem to cover translations yet)
- I'm using (abusing?) the string as its own "id"
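
To make the consumer side of this concrete, here is a minimal sketch (in
Python) of what a SARIF consumer might do with such a message: find the
template, either inline in "text" or by "id" in a translation's
"globalMessageStrings", then substitute the "arguments" into the {n}
placeholders.  The names substitute and render are hypothetical, and the
sketch assumes "globalMessageStrings" is an object keyed by message id.  It
ignores the rest of the lookup procedure in 3.11.7 (rule-level
messageStrings, the untranslated driver, and so on).

```python
import re

# Matches an escaped brace ("{{" or "}}") or a numbered placeholder "{n}".
_PLACEHOLDER = re.compile(r"\{\{|\}\}|\{(\d+)\}")


def substitute(template, arguments):
    """Replace {n} with arguments[n]; "{{" and "}}" become literal braces."""
    def repl(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        return arguments[int(match.group(1))]
    return _PLACEHOLDER.sub(repl, template)


def render(message, translations=(), language=None):
    """Return the final text of a message, or None if no template is found.

    `message` is a dict shaped like a SARIF message object.  `translations`
    is a list of dicts shaped like the toolComponent sketched above
    (assumption: globalMessageStrings maps message id -> {"text": ...}).
    """
    if "text" in message:
        template = message["text"]
    else:
        template = None
        for component in translations:
            if component.get("language") != language:
                continue
            entry = component.get("globalMessageStrings", {}).get(message["id"])
            if entry is not None:
                template = entry["text"]
                break
        if template is None:
            return None
    return substitute(template, message.get("arguments", []))


if __name__ == "__main__":
    translations = [{
        "language": "pig-latin",
        "contents": ["localizedData"],
        "globalMessageStrings": {
            "missing %qs after %qs": {"text": "issingmay {0} afteray {1}"},
        },
    }]
    by_id = {"id": "missing %qs after %qs", "arguments": ["decl-name", "foo"]}
    inline = {"text": "issingmay `{0}' afteray `{1}'",
              "arguments": ["decl-name", "foo"]}
    print(render(by_id, translations, "pig-latin"))
    print(render(inline))
```

With the pig-latin data above, the by-id message renders as
"issingmay decl-name afteray foo", and the inline one renders with the
quotes that were baked into its template.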

If gettext supported it, we could even try to capture translations from *all* .po
files.  But if needed that's probably much easier to handle via a
post-processing script.

> Works for me! The use-case I was thinking of, is for the SARIF output to be
> a nice containment of the non-source-code part of bug-reports: "instead of
> quoting stderr, use --diagnostics-format=sarif-file and send
> sourcename.sarif".  

Sounds like an interesting idea; can you open this as a separate RFE please?

> But, to fulfill that, more is needed, including the gcc
> arguments.  (Maybe that's all.)

I've added support for capturing the command-line arguments in GCC 15:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658206.html
though note that it's capturing the arguments as supplied by the driver to e.g.
cc1, as opposed to those that the user supplied to the driver.  

> I don't see that included, right?
> Sorry for the "creaturization request"!

Thanks for the feedback; hope the above makes sense.
