Re: [PATCH] GCOV: introduce --json-format.

David Malcolm Thu, 27 Sep 2018 06:56:00 -0700

On Thu, 2018-09-27 at 09:46 +0200, Martin Liška wrote:
> Hi.
> 
> For some time we've been providing an intermediate format as
> output of GCOV tool. It's documented in our code that primary
> consumer of it is lcov. Apparently that's not true:
> https://github.com/linux-test-project/lcov/issues/38#issuecomment-371
> 203750
> 
> So that I decided to come up with providing the very similar
> intermediate
> format in JSON format. It's much easier for consumers to work with.
> 
> I'm planning to leave the legacy format for GCC 9.1 and I'll document
> that
> it's deprecated. We can then remove that in next release.
> 
> The patch contains a small change to json.{h,cc}, hope David can
> approve that?
> Patch is tested on x86_64-linux-gnu.


I'm not officially a reviewer for the json stuff, but I can comment, at
least.  The changes to json.h/cc are fine by me, FWIW.

Some high-level observations:
* how big are the JSON files?  One of the comments at my Cauldron talk
on optimization records was a concern that the output could be very
large.  The JSON files compress really well, so maybe this patch should
gzip on output?  Though I don't know if it's appropriate for this case.
  iirc, gfortran's module code has an example of gzipping a
pretty_printer buffer.

* json::object::print doesn't preserve the insertion order of its
name/value pairs; they're written out in whatever order the hashing
leads to (maybe we should fix that though).  The top-level JSON value
in your file format is currently a JSON object containing "version"
metadata etc.  There's thus a chance that could be at the end, after
the data.  Perhaps the top-level JSON value should be an array instead
(as a "tuple"), to guarantee that the versioning metadata occurs at the
start?  Or are consumers assumed to read the whole file into memory and
traverse it, tree-like?

* Similar to the optimization records code, this patch is building a
tree of dynamically-allocated JSON objects in memory, and then printing
it to a pretty_printer, and flushing the pp's buffer to a FILE *.  This
is simple and flexible, but I suspect that I may need to rewrite this
for the optimization records case to avoid bloating the memory usage
(e.g. to eliminate the in-memory JSON tree in favor of printing as we
go). Would that concern apply here, or is the pattern OK?

* FWIW I'm working on DejaGnu test coverage for JSON output, but it's
ugly - a nasty shim between .exp and (optional) .py scripts for parsing
the JSON and verifying properties of it, sending the results back in a
form that DejaGnu can digest and integrate into the .sum/.log files.  I
hope to post it before stage 1 closes (I'd much prefer to have good
test coverage in place before optimizing how this stuff gets written)

[...snip...]
> diff --git a/gcc/gcov.c b/gcc/gcov.c
> index e255e4e3922..39d9329d6d0 100644
> --- a/gcc/gcov.c
> +++ b/gcc/gcov.c

[...snip...]

> @@ -1346,6 +1481,15 @@ generate_results (const char *file_name)
>       }
>      }
>  
> +  json::object *root = new json::object ();

[...snip...]
>  
> -  if (flag_gcov_file && flag_intermediate_format && !flag_use_stdout)
> +  if (flag_gcov_file && flag_json_format)
> +    root->dump (out);
> +

It looks like "root" gets leaked here (and thus all of the objects in
the JSON tree also).   I don't know if that's a problem, but an easy
fix would be to make "root" be an on-stack object, rather than
allocated on the heap - this ought to lead everything to be cleaned up
when root's dtor runs.

Dave

Re: [PATCH] GCOV: introduce --json-format.

Reply via email to