On Fri, Mar 24, 2023 at 9:04 PM David Malcolm via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> PR analyzer/109098 notes that the SARIF spec mandates that .sarif
> files are UTF-8 encoded, but -fdiagnostics-format=sarif-file naively
> assumes that the source files are UTF-8 encoded when quoting source
> artefacts in the .sarif output, which can lead to us writing out
> .sarif files with non-UTF-8 bytes in them (which break my reporting
> scripts).
>
> The root cause is that sarif_builder::maybe_make_artifact_content_object
> was using maybe_read_file to load the file content as bytes, and
> assuming they were UTF-8 encoded.
>
> This patch reworks both overloads of this function (one used for the
> whole file, the other for snippets of quoted lines) so that they go
> through input.cc's file cache, which attempts to decode the input files
> according to the input charset, and then encode as UTF-8.  They also
> check that the result actually is UTF-8, for cases where the input
> charset is missing, or incorrectly specified, and omit the quoted
> source for such awkward cases.
>
> Doing so fixes all of the cases I've encountered.
>
> The patch adds a new:
>   { dg-final { verify-sarif-file } }
> directive to all SARIF test cases in the test suite, which verifies
> that the output is UTF-8 encoded, and is valid JSON.  In particular
> it verifies that when we complain about encoding problems, the .sarif
> report we emit is itself correctly encoded.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Integration testing shows no regressions, and a fix for the case
> seen in haproxy-2.7.1.
> Pushed to trunk as r13-6861-gd495ea2b232f3e.
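
The decode-then-validate step described in the quoted message can be sketched roughly as follows. This is only a Python illustration of the idea, not GCC's code (the real change is C++ and goes through input.cc's file cache); the function name `quoted_source_as_utf8` and its parameters are hypothetical, and treating a missing input charset as "assume UTF-8 and verify" is an assumption made for the sketch.

```python
from typing import Optional


def quoted_source_as_utf8(raw: bytes,
                          input_charset: Optional[str] = None) -> Optional[bytes]:
    """Return `raw` re-encoded as UTF-8, or None if that is not possible.

    Hypothetical helper: returning None models "omit the quoted source"
    for the awkward cases (missing or incorrectly specified charset).
    """
    # Assumption: with no charset given, treat the bytes as UTF-8 and verify.
    charset = input_charset if input_charset is not None else "utf-8"
    try:
        text = raw.decode(charset)  # strict decoding: raises on invalid bytes
    except (UnicodeDecodeError, LookupError):
        # Invalid bytes for this charset, or an unknown charset name.
        return None
    return text.encode("utf-8")


if __name__ == "__main__":
    # Latin-1 source with a declared charset is converted to UTF-8.
    print(quoted_source_as_utf8(b"caf\xe9", "latin-1"))
    # The same bytes with no declared charset are not valid UTF-8, so omitted.
    print(quoted_source_as_utf8(b"caf\xe9"))
```

The point of returning `None` rather than passing the bytes through is that the report stays valid UTF-8 even when the source file's encoding cannot be determined.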

Hi David-

Regarding my patch series about _Pragma locations (most
recently https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609472.html
and https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html):
it will now need some work to apply on top of these
changes to input.cc. I'm happy to do that, but I thought I'd better check in
first to see whether you have any feedback on the new approach to
input.cc in the v2 patch. Do you think it's a worthwhile
feature, or would you rather I just drop it? Thanks!

-Lewis
