Re: [PATCH] GCOV: introduce --json-format.

Martin Liška Mon, 01 Oct 2018 03:42:01 -0700

On 9/27/18 3:55 PM, David Malcolm wrote:
> On Thu, 2018-09-27 at 09:46 +0200, Martin Liška wrote:
>> Hi.
>>
>> For some time we've been providing an intermediate format as
>> output of GCOV tool. It's documented in our code that primary
>> consumer of it is lcov. Apparently that's not true:
>> https://github.com/linux-test-project/lcov/issues/38#issuecomment-371
>> 203750
>>
>> So that I decided to come up with providing the very similar
>> intermediate
>> format in JSON format. It's much easier for consumers to work with.
>>
>> I'm planning to leave the legacy format for GCC 9.1 and I'll document
>> that
>> it's deprecated. We can then remove that in next release.
>>
>> The patch contains a small change to json.{h,cc}, hope David can
>> approve that?
>> Patch is tested on x86_64-linux-gnu.
> 
> I'm not officially a reviewer for the json stuff, but I can comment, at
> least.  The changes to json.h/cc are fine by me, FWIW.


Hello.

Appreciate your feedback!

> 
> Some high-level observations:
> * how big are the JSON files?  One of the comments at my Cauldron talk
> on optimization records was a concern that the output could be very
> large.  The JSON files compress really well, so maybe this patch should
> gzip on output?  Though I don't know if it's appropriate for this case.
>   iirc, gfortran's module code has an example of gzipping a
> pretty_printer buffer.

Probably sounds reasonable, for tramp3d I get: 5.8M, where original
intermediate format is 1.5M big. Gzipped JSON format is 216K.
I'll update the patch to address that.

> 
> * json::object::print doesn't preserve the insertion order of its
> name/value pairs; they're written out in whatever order the hashing
> leads to (maybe we should fix that though).  The top-level JSON value
> in your file format is currently a JSON object containing "version"
> metadata etc.  There's thus a chance that could be at the end, after
> the data.  Perhaps the top-level JSON value should be an array instead
> (as a "tuple"), to guarantee that the versioning metadata occurs at the
> start?  Or are consumers assumed to read the whole file into memory and
> traverse it, tree-like?

Well, it's not nice to do list from something that is object. It's 
micro-optimization
in my opinion.

On the other hand I would preserve the order of keys as they were added into
an object. Moreover, we can have option for that:

https://docs.python.org/3/library/json.html#basic-usage

Search for sort_keys. Similarly I would appreciate 'indent' option, it's handy
for debugging.

> 
> * Similar to the optimization records code, this patch is building a
> tree of dynamically-allocated JSON objects in memory, and then printing
> it to a pretty_printer, and flushing the pp's buffer to a FILE *.  This
> is simple and flexible, but I suspect that I may need to rewrite this
> for the optimization records case to avoid bloating the memory usage
> (e.g. to eliminate the in-memory JSON tree in favor of printing as we
> go). Would that concern apply here, or is the pattern OK?

One should not bloat memory in GCOV I guess.

> 
> * FWIW I'm working on DejaGnu test coverage for JSON output, but it's
> ugly - a nasty shim between .exp and (optional) .py scripts for parsing
> the JSON and verifying properties of it, sending the results back in a
> form that DejaGnu can digest and integrate into the .sum/.log files.  I
> hope to post it before stage 1 closes (I'd much prefer to have good
> test coverage in place before optimizing how this stuff gets written)

Would be good, then I would definitely add a test for GCOV JSON format.

> 
> [...snip...]
>> diff --git a/gcc/gcov.c b/gcc/gcov.c
>> index e255e4e3922..39d9329d6d0 100644
>> --- a/gcc/gcov.c
>> +++ b/gcc/gcov.c
> 
> [...snip...]
> 
>> @@ -1346,6 +1481,15 @@ generate_results (const char *file_name)
>>      }
>>      }
>>  
>> +  json::object *root = new json::object ();
> 
> [...snip...]
>>  
>> -  if (flag_gcov_file && flag_intermediate_format && !flag_use_stdout)
>> +  if (flag_gcov_file && flag_json_format)
>> +    root->dump (out);
>> +
> 
> It looks like "root" gets leaked here (and thus all of the objects in
> the JSON tree also).   I don't know if that's a problem, but an easy
> fix would be to make "root" be an on-stack object, rather than
> allocated on the heap - this ought to lead everything to be cleaned up
> when root's dtor runs.

I've just tried valgrind and I can't see any leak in current implementation?

Martin

> 
> Dave
>

Re: [PATCH] GCOV: introduce --json-format.

Reply via email to