https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116724
--- Comment #5 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
(In reply to Hans-Peter Nilsson from comment #4)
> (In reply to David Malcolm from comment #1)
> >
> > Perhaps we should try to capture both the untranslated text and the
> > translated text?  SARIF has various abilities for handling translations.

To clarify, consider this hypothetical diagnostic:

  error_at (location, "missing %qs after %qs", "decl-name", "foo");

with a hypothetical "pig-latin" locale and translation (pig-latin.po); see
https://en.wikipedia.org/wiki/Pig_Latin
where "missing %qs after %qs" has this translation in the .po file:

  "issingmay %qs afteray %qs"

The classic text output format might read:

  foo.c:42:11: erroray: issingmay `decl-name' afteray `foo'

and currently GCC's SARIF output would presumably capture the text of the
message with:

  message: {"text": "issingmay `decl-name' afteray `foo'"}

i.e. currently GCC's SARIF output for a formatted string "bakes in" both
localization of the format string *and* param substitution.

We could instead defer parameter substitution to the SARIF consumer via
§3.11.5 "Messages with placeholders"
(https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141790716)
like this:

  message: {"text": "issingmay `{0}' afteray `{1}'",
            "arguments": ["decl-name", "foo"]}

and, with that, potentially capture the pre-translated message string *and*
its translation in the currently active .po file, e.g.:

  message: {"id": "missing %qs after %qs",
            "arguments": ["decl-name", "foo"]}

with something like this (see 3.11.7 "Message string lookup"):

  "translations": [
    { # A toolComponent object.
      "language": "pig-latin",
      "contents": ["localizedData"],
      "globalMessageStrings": [
        {"missing %qs after %qs": {"text": "issingmay {0} afteray {1}"}}]}]

where we'd list the subset of format strings that got used by diagnostics in
the particular log, and their translations, with the caveats that:
- I'm not sure that that's how translations of strings are meant to be stored
  (the SARIF spec's tutorial doesn't seem to cover translations yet)
- I'm using (abusing?) the string as its own "id"

If gettext supported it, we could even try to capture translations from *all*
.po files.  But if needed, that's probably much easier to handle via a
post-processing script.

> Works for me!  The use-case I was thinking of, is for the SARIF output to be
> a nice containment of the non-source-code part of bug-reports: "instead of
> quoting stderr, use --diagnostics-format=sarif-file and send
> sourcename.sarif".

Sounds like an interesting idea; can you open this as a separate RFE please?

> But, to fulfill that, more is needed, including the gcc
> arguments.  (Maybe that's all.)

I've added support for capturing the command-line arguments in GCC 15:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658206.html
though note that it's capturing the arguments as supplied by the driver to
e.g. cc1, as opposed to those that the user supplied to the driver.

> I don't see that included, right?
> Sorry for the "creaturization request"!

Thanks for the feedback; hope the above makes sense.