This is a proof-of-concept series that formalizes the structure of trace2 event
output using JSON-Schema [1].

It provides a validator (written in Go) that verifies the events in a given
trace2 event output file match the schema. I am happy to rewrite this validator
in some other language, provided that the language has a JSON-Schema library
supporting at least draft-04.

It runs the validator as part of the CI suite (this increases the runtime
by about 15 minutes). It tests that the trace output of "make test"
conforms to the schema. Users of the trace2 event output can be
relatively confident that the output format has not changed so long as
the schema file remains the same and the regression test is passing.

I would appreciate any feedback on better ways to integrate the
validator into the CI suite.

I have not added support for standalone schema validators (as requested
in the discussion of V1 of this series) because the few that I tested on
my workstation ran for multiple hours (vs. 15 minutes for the validator
included in this series). If someone can suggest a performant standalone
validator, I will be happy to test that.

[1]: https://json-schema.org/

Changes since V2 of this series:
* corrected commit message regarding the different schema variations
* cleaned up the Makefile
* added comment noting that the validator expects JSON-Lines input
* added a --progress flag to the validator
* improved validation error output

Changes since V1 of this series:
* dropped the documentation fix, as it can be submitted separately from
  this series
* added JSON-array versions of the schema (currently unused)
* added the validation test to the CI suite

Josh Steadmon (3):
  trace2: Add a JSON schema for trace2 events
  trace2: add a schema validator for trace2 events
  ci: run trace2 schema validation in the CI suite

 ci/run-build-and-tests.sh                     |   6 +
 t/trace_schema_validator/.gitignore           |   1 +
 t/trace_schema_validator/Makefile             |  18 +
 t/trace_schema_validator/README               |  23 +
 t/trace_schema_validator/event_schema.json    | 398 ++++++++++++++
 t/trace_schema_validator/list_schema.json     | 401 ++++++++++++++
 .../strict_list_schema.json                   | 514 ++++++++++++++++++
 t/trace_schema_validator/strict_schema.json   | 511 +++++++++++++++++
 .../trace_schema_validator.go                 |  82 +++
 9 files changed, 1954 insertions(+)
 create mode 100644 t/trace_schema_validator/.gitignore
 create mode 100644 t/trace_schema_validator/Makefile
 create mode 100644 t/trace_schema_validator/README
 create mode 100644 t/trace_schema_validator/event_schema.json
 create mode 100644 t/trace_schema_validator/list_schema.json
 create mode 100644 t/trace_schema_validator/strict_list_schema.json
 create mode 100644 t/trace_schema_validator/strict_schema.json
 create mode 100644 t/trace_schema_validator/trace_schema_validator.go

Range-diff against v2:
1:  a949db776c ! 1:  d4e82796bc trace2: Add a JSON schema for trace2 events
    @@ Commit message
         objects. This can be used to add regression tests to verify that the
         event output format does not change unexpectedly.
     
    -    Two versions of the schema are provided:
    +    Four versions of the schema are provided:
         * event_schema.json is more permissive. It verifies that all expected
    -      fields are present in each trace event, but it allows traces to have
    +      fields are present in a trace event, but it allows traces to have
           unexpected additional fields. This allows the schema to be specified
           more concisely by factoring out the common fields into a reusable
           sub-schema.
         * strict_schema.json is more restrictive. It verifies that all expected
    -      fields are present and no unexpected fields are present in each trace
    +      fields are present and no unexpected fields are present in the trace
           event. Due to this additional restriction, the common fields cannot be
           factored out into a re-usable subschema (at least as-of draft-07) [2],
           and must be repeated for each event definition.
    +    * list_schema.json is like event_schema.json above, but validates a JSON
    +      array of trace events, rather than a single event.
    +    * strict_list_schema.json is like strict_schema.json above, but
    +      validates a JSON array of trace events, rather than a single event.
     
         [1]: https://json-schema.org/
         [2]: https://json-schema.org/understanding-json-schema/reference/combining.html#allof
2:  3fa4e9eef8 ! 2:  97cb6a3eb4 trace2: add a schema validator for trace2 events
    @@ t/trace_schema_validator/.gitignore (new)
     
      ## t/trace_schema_validator/Makefile (new) ##
     @@
    ++RM = rm -f
    ++PROGRAMS = trace_schema_validator
    ++GOCMD = go
    ++GOBUILD = $(GOCMD) build
    ++GOGET = $(GOCMD) get
    ++
     +.PHONY: fetch_deps clean
     +
    ++all: $(PROGRAMS)
    ++
     +trace_schema_validator: fetch_deps trace_schema_validator.go
    -+  go build
    ++  $(GOBUILD) -o trace_schema_validator
     +
     +fetch_deps:
    -+  go get github.com/xeipuuv/gojsonschema
    ++  $(GOGET) github.com/xeipuuv/gojsonschema
     +
     +clean:
    -+  rm -f trace_schema_validator
    ++  $(RM) $(PROGRAMS)
     
      ## t/trace_schema_validator/trace_schema_validator.go (new) ##
     @@
     +// trace_schema_validator validates individual lines of an input file against a
     +// provided JSON-Schema for git trace2 event output.
     +//
    ++// Note that this expects each object to validate to be on its own line in the
    ++// input file (AKA JSON-Lines format). This is what Git natively writes with
    ++// GIT_TRACE2_EVENT enabled.
    ++//
     +// Traces can be collected by setting the GIT_TRACE2_EVENT environment variable
     +// to an absolute path and running any Git command; traces will be appended to
     +// the file.
     +//
     +// Traces can then be verified like so:
     +//   trace_schema_validator \
    -+//     --trace2_event_file /path/to/trace/output \
    -+//     --schema_file /path/to/schema
    ++//     --trace2-event-file /path/to/trace/output \
    ++//     --schema-file /path/to/schema
     +package main
     +
     +import (
    @@ t/trace_schema_validator/trace_schema_validator.go (new)
     +)
     +
     +// Required flags
    -+var schemaFile = flag.String("schema_file", "", "JSON-Schema filename")
    -+var trace2EventFile = flag.String("trace2_event_file", "", "trace2 event filename")
    ++var schemaFile = flag.String("schema-file", "", "JSON-Schema filename")
    ++var trace2EventFile = flag.String("trace2-event-file", "", "trace2 event filename")
    ++var progress = flag.Int("progress", 0, "Print progress message each time we have validated this many lines. --progress=0 means no messages are printed")
     +
     +func main() {
     +  flag.Parse()
     +  if *schemaFile == "" || *trace2EventFile == "" {
    -+          log.Fatal("Both --schema_file and --trace2_event_file are required.")
    ++          log.Fatal("Both --schema-file and --trace2-event-file are required.")
     +  }
     +  schemaURI, err := filepath.Abs(*schemaFile)
     +  if err != nil {
    @@ t/trace_schema_validator/trace_schema_validator.go (new)
     +
     +  count := 0
     +  for ; scanner.Scan(); count++ {
    -+          if count%10000 == 0 {
    -+                  // Travis-CI expects regular output or it will time out.
    ++          if *progress != 0 && count%*progress == 0 {
     +                  log.Print("Validated items: ", count)
     +          }
     +          event := gojsonschema.NewStringLoader(scanner.Text())
    @@ t/trace_schema_validator/trace_schema_validator.go (new)
     +                  log.Fatal(err)
     +          }
     +          if !result.Valid() {
    -+                  log.Print("Trace event is invalid: ", scanner.Text())
    ++                  log.Printf("Trace event line %d is invalid: %s", count+1, scanner.Text())
     +                  for _, desc := range result.Errors() {
     +                          log.Print("- ", desc)
     +                  }
3:  acf3aebcaa ! 3:  a07458b2e4 ci: run trace2 schema validation in the CI suite
    @@ ci/run-build-and-tests.sh: then
        make test
     +  t/trace_schema_validator/trace_schema_validator \
     +          --trace2_event_file=${GIT_TRACE2_EVENT} \
    -+          --schema_file=t/trace_schema_validator/strict_schema.json
    ++          --schema_file=t/trace_schema_validator/strict_schema.json \
    ++          --progress=10000
      fi
      
      check_unignored_build_artifacts
-- 
2.22.0.709.g102302147b-goog
