Improving data source documentation

Shane Curcuru Sat, 10 Jun 2017 15:38:14 -0700

We have a lot of different tools generating and using various
consolidated JSON files (and lib/whimsy) that are useful representations
of underlying ASF organizational data.  But it's not clear which data
comes from where, or what formats are expected.


Many of the generation scripts include documentation in the code, but
this is suboptimal for other tool developers who want to figure out the
best place to find a list of X or the condensed source for Y.

What would be the best framework to document the data sources and
formats, what specific ASF data they mirror, and how the various
lib/whimsy models expose this data?  Even when a lib/whimsy model
provides the data, some tool writers (and the projects.a.o website) will
still use the raw /public/*.json files.

A high-level overview exists, but it doesn't provide enough detail for
new tool writers to figure out what to use without digging into
several different code files:

  https://whimsy.apache.org/test/dataflow.cgi

It would be great to expose the format (array of hashes of hashes,
whatever) for each JSON file, along with how its data is collected from
the different sources.  Ideally we could store that documentation with
the scripts, but expose it all in one place.  Is RDoc worth configuring,
or should we just build a simple source tree scanner that looks for a
specific tag within .rb files and pulls out the "data format and sources
description"?
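
To make the scanner option concrete, here is a rough sketch of what I
have in mind.  The "# DATADOC:" tag and the method names are made up for
illustration (nothing in whimsy uses them today), and the path in the
sample is a placeholder, not a real file:

```ruby
# Sketch only: collect "data format and sources" notes from tagged
# comment blocks.  The "# DATADOC:" tag is a hypothetical convention.

TAG          = /^\s*#\s*DATADOC:\s*(.*)$/
CONTINUATION = /^\s*#\s?(.*)$/

# Extract tagged comment blocks from Ruby source text.
# Returns an array of hashes: { title: String, body: [String, ...] }
def extract_datadocs(source)
  docs = []
  current = nil
  source.each_line do |line|
    if (m = TAG.match(line))
      # A tag line opens a new block; the rest of the line is its title.
      current = { title: m[1].strip, body: [] }
      docs << current
    elsif current && (m = CONTINUATION.match(line))
      # Comment lines directly after a tag belong to that block.
      current[:body] << m[1].rstrip
    else
      # Any non-comment line (or an untagged comment) ends the block.
      current = nil
    end
  end
  docs
end

# Walk a source tree and gather the tagged blocks per .rb file.
def scan_tree(root)
  Dir.glob(File.join(root, '**', '*.rb')).sort.each_with_object({}) do |path, found|
    docs = extract_datadocs(File.read(path))
    found[path] = docs unless docs.empty?
  end
end

if __FILE__ == $PROGRAM_NAME
  sample = <<~RUBY
    # DATADOC: public/example.json
    # format: hash of hashes keyed by id
    # source: (describe the upstream data here)
    def generate; end
  RUBY
  extract_datadocs(sample).each do |doc|
    puts doc[:title]
    doc[:body].each { |text| puts "  #{text}" }
  end
end
```

Something like scan_tree could feed a single generated page, so the
descriptions live next to each script but are readable in one place.
RDoc might well do this better; the sketch is just to show how small
the custom route could be.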

For example, for my brand work, I need a list of the names of all
software product releases made by TLPs, including the name of the PMC
for each for sorting and linking - but it's not obvious which datafile
is easiest to get this from.

- Shane
-- 

- Shane
  https://www.apache.org/foundation/marks/resources
