[PR] Replace AWK/shell docs preprocessing with AsciidoctorJ extension [tinkerpop]

via GitHub Sat, 02 May 2026 22:37:25 -0700


Cole-Greer opened a new pull request, #3418:
URL: https://github.com/apache/tinkerpop/pull/3418


   ## Replace AWK/shell docs preprocessing with AsciidoctorJ extension
   
   ### Overview
   
   This PR introduces the `gremlin-docs` module, a custom AsciidoctorJ
TreeProcessor extension that replaces the entire AWK/shell preprocessing
pipeline used to build TinkerPop documentation. The old system consisted of 10
AWK scripts and 5 shell scripts, and required a running Gremlin Server, Hadoop
daemons, and a built Gremlin Console distribution. The new system runs entirely
within the Maven build via an embedded `GremlinGroovyScriptEngine`.
   
   ### What it does
   
   - **Executes `[gremlin-groovy]` code blocks** in an embedded script engine 
during Asciidoctor rendering, producing live console output with `gremlin>` 
prompts and `==>` results
   - **Auto-generates language variant tabs** (Java, Python, JavaScript, C#, 
Go) for every translatable Gremlin example using the ANTLR-based 
`GremlinTranslator`
   - **Supports Hadoop/Spark examples** via local-mode Spark with a sandboxed 
filesystem for `hdfs` operations
   - **Handles standalone tab groups** (`[source,LANG,tab]` blocks) for 
manually-authored multi-language examples
   - **Detects console-only blocks** (`:remote`, `:>`, `:submit`) and renders 
them as static code
   - **Loads GremlinPlugin customizers** via SPI (hadoop-gremlin, 
spark-gremlin, gremlin-console) for imports, bindings, and utility functions 
like `describeGraph()`
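
   The console-only rule in the list above can be illustrated with a small shell sketch. This is not the extension's actual code. The function name `is_console_only` is hypothetical, and it assumes a block counts as console-only when any line starts with `:remote`, `:>`, or `:submit`. The sample input lines are illustrative only.

   ```bash
   # Hypothetical helper (not from the PR): reads one code block on stdin and
   # exits 0 if any line starts with a console-only command.
   is_console_only() {
     grep -Eq '^[[:space:]]*:(remote|>|submit)([[:space:]]|$)'
   }

   # A block with console commands would be rendered as static code.
   if printf '%s\n' ':remote connect tinkerpop.server conf/remote.yaml' ':> g.V().count()' | is_console_only; then
     echo "static"
   fi

   # A plain traversal would be executed in the embedded engine.
   if ! printf '%s\n' 'g.V().count()' | is_console_only; then
     echo "execute"
   fi
   ```

   Anchoring the pattern at the start of the line keeps a traversal that merely mentions `:remote` inside a string from being classified as console-only.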
   
   ### How to use
   
   ```bash
   bin/process-docs-new.sh              # full build with live gremlin execution
   bin/process-docs-new.sh --dry-run    # skip execution (fast, for layout checks)
   ```
   
   ### Changes
   
   - New `gremlin-docs/` module (not in the Maven reactor — built separately by 
the build script)
   - New `bin/process-docs-new.sh` build entry point
   - Root `pom.xml`: added `gremlin-docs` as `asciidoctor-maven-plugin` 
dependency, switched syntax highlighter from CodeRay to highlight.js 11.9.0, 
added `tabs-1` CSS rule
   - `docs/stylesheets/tinkerpop.css`: added `.tabs-1` CSS for single-tab blocks
   
   ### Known Issues
   
   **Callout numbers not rendering in code blocks.** The circled callout 
numbers (①②③) that appear inline in code examples are not visible. The `<b 
class="conum">` elements are present in the HTML but highlight.js destroys them 
when it processes the `<code>` block innerHTML. The callout explanation lists 
below the code blocks render correctly — only the inline markers are affected.
   
   **Path and other datatype formatting differs from console output.** The 
Gremlin Console applies custom string conversions for certain datatypes (e.g., 
`Path` renders as `[v[1],v[2],vadas]` in the console but `path[v[1], v[2], 
vadas]` in the embedded engine). This affects the displayed results but not 
their correctness. Other datatypes with console-specific formatting may also 
differ.
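
   If matching the console's rendering matters, one conceivable approach is to normalize result lines in a post-processing step. The PR does not propose this. The sketch below uses a hypothetical function name, `normalize_path_lines`, and assumes that results appear as `==>path[...]` lines and that only the two differences in the example above need handling: the `path` prefix and the space after each comma.

   ```bash
   # Hypothetical helper (not from the PR): rewrites "==>path[a, b, c]" result
   # lines on stdin to the console-style "==>[a,b,c]". Other lines pass through.
   normalize_path_lines() {
     sed -E '/^==>path\[.*\]$/{
       s/^==>path\[/==>[/
       s/, /,/g
     }'
   }

   printf '%s\n' '==>path[v[1], v[2], vadas]' | normalize_path_lines
   # prints: ==>[v[1],v[2],vadas]
   ```

   A text-level rewrite like this would also compact any `, ` that occurs inside a string value within a path, so it is only an approximation of the console's own conversion.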
   
   **SPARQL examples fail (~18 errors).** The `sparql-gremlin` plugin's 
`SparqlTraversalSource` and `g.sparql()` calls fail because the SPARQL plugin 
is not fully loaded in the embedded engine. These blocks fall back to dry-run 
output showing `gremlin>` prompts without results.
   
   **Neo4j examples fail (~6 errors).** Blocks using `Neo4jGraph`, 
`graph.cypher()`, `vertex.addLabel()`, and other Neo4j-specific APIs fail 
because Neo4j is not on the classpath. These fall back to dry-run output.
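
   The dry-run fallback mentioned for the SPARQL and Neo4j blocks shows `gremlin>` prompts without results. As a rough illustration of that output shape, and not the extension's own code, the sketch below prefixes each non-empty input line with the prompt. The function name `render_dry_run` is hypothetical, and the sketch assumes one statement per line.

   ```bash
   # Hypothetical helper (not from the PR): renders a block read from stdin in
   # dry-run style. Blank lines are dropped and no "==>" result lines are added.
   render_dry_run() {
     sed -e '/^[[:space:]]*$/d' -e 's/^/gremlin> /'
   }

   printf '%s\n' 'g.V().hasLabel("person")' '' 'g.V().count()' | render_dry_run
   # prints:
   # gremlin> g.V().hasLabel("person")
   # gremlin> g.V().count()
   ```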
   
   **Sugar plugin examples fail (~8 errors).** Groovy sugar syntax (`g.V.name`, 
`g.V[0..2]`, `g.V.outE.weight`) fails because the Sugar plugin's AST 
transformations are not active in the embedded engine.
   
   **Remote connection examples fail (~10 errors).** Blocks using
`Cluster.open()`, `DriverRemoteConnection`, and
`traversal().with('conf/remote-graph.properties')` fail because there is no
running Gremlin Server. Blocks that contain `:remote`/`:>` commands are
detected and rendered as static code, but some blocks use the driver API
directly, without console commands, and so are not caught by that detection.
   
   **A few Hadoop `hdfs.head()` calls fail.** The `hdfs.head('output/~g')` and 
`hdfs.head('output', GryoInputFormat)` calls fail because the Spark computation 
output format differs slightly in local mode vs a real cluster.
   
   **`x.y.z` version placeholder may survive in tab content.** The version
placeholder `x.y.z` is replaced via sed in the post-processing step, but
placeholders that appear in translated code inside extension-generated tabs
may not be caught.
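
   A minimal sketch of a sed-based placeholder replacement, plus a check for leftovers, is shown below. It is not taken from the build script. The function names `replace_version` and `find_leftover_placeholders` and the version `1.2.3` are hypothetical, and the sketch assumes the version string contains no `/` or `&` characters.

   ```bash
   # Hypothetical helper (not from the PR): replaces the literal placeholder
   # x.y.z on stdin with the version given as $1. The dots are escaped so that
   # unrelated text such as "xaybz" is not matched.
   replace_version() {
     sed "s/x\.y\.z/$1/g"
   }

   # Hypothetical check: lists files under directory $1 that still contain the
   # literal placeholder after post-processing (-F disables regex matching).
   find_leftover_placeholders() {
     grep -rlF 'x.y.z' "$1"
   }

   printf '%s\n' '<version>x.y.z</version>' | replace_version 1.2.3
   # prints: <version>1.2.3</version>
   ```

   Running a check such as `find_leftover_placeholders` against the generated output directory after the build would show whether any tab content was missed.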
   
   ### Not changed
   
   - The old `bin/process-docs.sh` and AWK pipeline are untouched — both 
systems can coexist
   - All AsciiDoc source files are unchanged
   - Document structure, sections, anchors, and static content are preserved


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

