Cole-Greer opened a new pull request, #3455: URL: https://github.com/apache/tinkerpop/pull/3455

## Summary

This PR replaces TinkerPop's legacy shell/AWK documentation preprocessor and postprocessor pipeline with a Maven-based AsciidoctorJ extension (`tools/tinkerpop-docs`). The new extension walks each AsciiDoc book's AST, executes `[gremlin-groovy]` code blocks against a long-lived Gremlin Console subprocess, and renders the console output as tabbed, syntax-highlighted HTML. The output is structurally equivalent to the published 3.7.7-SNAPSHOT docs, and the new build is easier to maintain, test, and run.

## Motivation

The old build was a fragile pipeline of bash and awk scripts under `docs/preprocessor/` and `docs/postprocessor/`. It was hard to test, OS-sensitive (it required GNU coreutils on macOS), silently swallowed Gremlin execution errors, and depended on a manually configured pseudo-distributed Hadoop cluster. The replacement is a single Maven module with unit tests, fail-fast error handling, and a local-filesystem Hadoop configuration that needs no daemons.

## What changed

New AsciidoctorJ extension (`tools/tinkerpop-docs`):

- `GremlinTreeprocessor`: AST walk, block execution, per-graph initialization, sugar-plugin handling, and multi-line statement grouping.
- `GremlinConsole`: manages the `bin/gremlin.sh` subprocess, prompt-based output capture, and error-prompt detection.
- `TabbedHtmlBuilder` / `GremlinPostprocessor`: tabbed HTML output, CodeRay syntax highlighting (via JRuby), callout/conum rendering, and version substitution.
- `ConsoleRestartHandler` / `PluginDirectoryRestartHandler`: per-book plugin isolation (see below).
- SPI registration plus a docs-specific local-filesystem Hadoop config (`hadoop-conf/core-site.xml`).

Orchestration: `bin/process-docs.sh` was rewritten to validate the console/server distributions, install plugins, start a Gremlin Server and a Gephi mock, and invoke Maven. It supports `--dryRun` (render without executing).

Per-book plugin isolation: Neo4j 3.4 (Scala 2.11) and Spark (Scala 2.12) cannot share the console's flat classpath.
A `:gremlin-docs-plugins-exclude:` section attribute triggers a console restart with the conflicting plugin directories toggled aside, so both the Neo4j and Spark examples render correctly in the same run. Plugin dependencies are installed into `ext/<plugin>/plugin/` (not the shared `lib/`) so they can be isolated, and the toggle is idempotent and resilient to interrupted builds.

Docs source updates:

- Added `:gremlin-docs-plugins-exclude:` attributes to the neo4j, hadoop, spark, and gremlin-variants chapters.
- Scoped the Hadoop `hdfs.ls()` examples to the copied graph file so the rendered docs do not list the build machine's home directory.
- Fixed an undefined-variable typo (`marko` → `vMarko`) and converted the Spark-on-YARN recipe to a static example, since executing it requires a live YARN cluster.
- Rewrote the developer-doc "Documentation Environment" section to describe the new Maven/AsciidoctorJ build and removed references to the retired preprocessor.

Removed: the entire `docs/preprocessor/` and `docs/postprocessor/` script trees (15 files).

## Testing

- 92 unit tests in `tools/tinkerpop-docs` (console I/O, treeprocessor, tabbed HTML, postprocessor, dry-run, plugin-directory toggling), plus an integration fixture exercising gremlin blocks, manual/standalone tabs, `existing`, errors, callouts, and version replacement.
- The full `bin/process-docs.sh` build completes with BUILD SUCCESS while treating execution errors as fatal.
- Output was diffed against the published 3.7.7-SNAPSHOT docs across all 8 books. Structural metrics (headings, listing blocks, tab sections, callouts) match within ~2%, there is zero stacktrace bloat, and all differences are attributable to intended source updates, the `file:///` vs `hdfs://` environment, or branch-vs-snapshot content drift.

## Tips for reviewers

I've taken the liberty of redeploying the [3.7.7-SNAPSHOT docs](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/) from this branch. I recommend focusing the review on the built docs.
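
One piece the built docs cannot show is the plugin-directory toggle. Its idempotent behaviour can be sketched roughly as follows. This is a simplified illustration rather than the actual `PluginDirectoryRestartHandler`: the `PluginToggle` class and the `.excluded` suffix are hypothetical, and it assumes a plugin is disabled by renaming its `ext/<plugin>` directory aside before the console restarts.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an idempotent plugin-directory toggle; not the real
// PluginDirectoryRestartHandler. A plugin is "excluded" by renaming ext/<plugin>
// to ext/<plugin>.excluded so a restarted console no longer sees it.
class PluginToggle {
    // Hypothetical marker suffix for a plugin directory that has been moved aside.
    static final String EXCLUDED_SUFFIX = ".excluded";

    private final Path extDir;

    PluginToggle(Path extDir) {
        this.extDir = extDir;
    }

    // Moves ext/<plugin> aside; does nothing if it is absent or the aside name is taken.
    void exclude(String plugin) throws IOException {
        Path active = extDir.resolve(plugin);
        Path aside = extDir.resolve(plugin + EXCLUDED_SUFFIX);
        if (Files.isDirectory(active) && !Files.exists(aside)) {
            Files.move(active, aside, StandardCopyOption.ATOMIC_MOVE);
        }
    }

    // Moves ext/<plugin>.excluded back; does nothing if it is absent or the active name is taken.
    void restore(String plugin) throws IOException {
        Path active = extDir.resolve(plugin);
        Path aside = extDir.resolve(plugin + EXCLUDED_SUFFIX);
        if (Files.isDirectory(aside) && !Files.exists(active)) {
            Files.move(aside, active, StandardCopyOption.ATOMIC_MOVE);
        }
    }

    // Restores every directory left aside, including leftovers from an interrupted build.
    void restoreAll() throws IOException {
        List<String> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(extDir, "*" + EXCLUDED_SUFFIX)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                leftovers.add(name.substring(0, name.length() - EXCLUDED_SUFFIX.length()));
            }
        }
        // Rename only after the directory listing is closed.
        for (String plugin : leftovers) {
            restore(plugin);
        }
    }

    // Brings ext/ to a known state for one book: everything restored, then the requested plugins moved aside.
    void apply(Set<String> excludedPlugins) throws IOException {
        restoreAll();
        for (String plugin : excludedPlugins) {
            exclude(plugin);
        }
    }
}
```

Because `apply` always starts from a fully restored `ext/` directory, running it twice with the same set, or after a build that died mid-toggle, converges on the same layout.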
There are a few notable differences worth calling out:

- The CSharp tabs now have functioning syntax highlighting ([as seen in the Basic Gremlin section of the reference docs](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/reference/#basic-gremlin)).
- The [HDFS examples](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/reference/#_oltp_hadoop_gremlin) now call `hdfs.ls('tinkerpop-modern.kryo')` instead of `hdfs.ls()`. This is a minor workaround: the docs build substitutes the host machine's filesystem instead of running a local Hadoop cluster, and scoping the call avoids dumping the existing contents of the host's home directory. The old format could be restored by having the docs system manage a MiniDFSCluster internally. That is a viable fix, but I've left it out of scope for this PR to limit complexity.
- The [OLAP Spark YARN recipe](https://tinkerpop.apache.org/docs/3.7.7-SNAPSHOT/recipes/#olap-spark-yarn) has been converted to a static example; it is no longer executed during the docs build.

## Future

The goal of this work was to replace the old docs system with 1:1 equivalent output. I think this new extension gives us a better platform for future enhancements to the docs.

- For 3.8 and above, it becomes quite trivial to link the gremlin-lang translators into all of the `gremlin-groovy` examples and automatically add tabs for all language variants (excluding Groovy-specific examples).
- There is some complexity in the system to load and unload console plugins depending on the needs of each doc book (needed due to conflicting dependencies between Spark and Neo4j). This could be ripped out and simplified in master, as the neo4j and sparql plugins are no longer necessary.
- I expect we can extend the new Asciidoctor extension with new features for the docs, such as improved navigation and integrated search.
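
Once translations are available, emitting one tab per language variant is mostly a loop. A rough sketch, assuming the translated strings are already in hand (the translation step itself is not shown): the `TabSketch` class, its radio-button markup, and its class names are hypothetical and do not reproduce `TabbedHtmlBuilder`'s actual output.

```java
import java.util.Map;

// Hypothetical sketch: one example rendered as radio-button tabs, one tab per
// language variant. Markup and class names are illustrative only and do not
// reproduce TabbedHtmlBuilder's actual output.
class TabSketch {
    // Minimal HTML escaping for text placed in element content or attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Renders tabs in map iteration order (use a LinkedHashMap to control it); the first tab is selected.
    // exampleId is assumed to be a safe identifier, unique per page, so each group gets its own radio name.
    static String render(String exampleId, Map<String, String> codeByLanguage) {
        StringBuilder html = new StringBuilder();
        html.append("<section class=\"tabs tabs-").append(codeByLanguage.size()).append("\">\n");
        int i = 1;
        for (Map.Entry<String, String> tab : codeByLanguage.entrySet()) {
            String inputId = "tab-" + exampleId + "-" + i;
            html.append("  <input id=\"").append(inputId)
                .append("\" type=\"radio\" name=\"radio-set-").append(exampleId).append("\"")
                .append(i == 1 ? " checked=\"checked\"" : "")
                .append(" />\n");
            html.append("  <label for=\"").append(inputId).append("\">")
                .append(escape(tab.getKey())).append("</label>\n");
            html.append("  <div class=\"tabcontent-").append(i).append("\"><pre><code>")
                .append(escape(tab.getValue())).append("</code></pre></div>\n");
            i++;
        }
        return html.append("</section>\n").toString();
    }
}
```

Groovy-specific examples would simply pass a single-entry map, so they keep their current single-tab rendering.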
