This is an automated email from the ASF dual-hosted git repository.

Cole-Greer pushed a commit to branch docs-3.7
in repository https://gitbox.apache.org/repos/asf/tinkerpop.git
commit a5d924a5aaa673ba9cc3faf40798db5a1e416840
Author: Cole Greer <[email protected]>
AuthorDate: Thu Jun 4 10:38:33 2026 -0700

    Update developer docs for the AsciidoctorJ documentation build (tinkerpop-6jq.11)

    The old shell/AWK preprocessor and postprocessor directories have been
    removed, but the developer documentation still described that system.

    Rewrite the "Documentation Environment" section to describe the
    Maven-based AsciidoctorJ extension: it now states the build is
    Maven-driven, runs OLAP examples against the local filesystem
    (fs.defaultFS=file:///) so no Hadoop cluster is required, notes the
    Spark-on-YARN recipe is rendered from pre-captured output, and adds the
    prerequisite distribution build and --dryRun option.

    Drop the obsolete pseudo-distributed Hadoop / yarn-site / mapred-site
    instructions and the AWK/GNU-utils requirements. Point the OLAP
    jar-conflict note at the new per-book plugin exclusion mechanism, and
    update stale "preprocessor" wording in the committer docs.

    Assisted-by: Kiro:claude-opus-4.8 [kiro-cli]
---
 CHANGELOG.asciidoc                                 |   1 +
 .../dev/developer/development-environment.asciidoc | 126 ++++++---------------
 docs/src/dev/developer/for-committers.asciidoc     |   6 +-
 3 files changed, 39 insertions(+), 94 deletions(-)

diff --git a/CHANGELOG.asciidoc b/CHANGELOG.asciidoc
index 2ca89883f0..6dc2f95779 100644
--- a/CHANGELOG.asciidoc
+++ b/CHANGELOG.asciidoc
@@ -26,6 +26,7 @@ image::https://raw.githubusercontent.com/apache/tinkerpop/master/docs/static/ima
 === TinkerPop 3.7.7 (Release Date: NOT OFFICIALLY RELEASED YET)

 * Restart the documentation build's Gremlin Console with conflicting plugins excluded per-book (via the `gremlin-docs-plugins-exclude` attribute), so Neo4j (Scala 2.11) and Spark (Scala 2.12) no longer collide on a shared classpath.
+* Updated the developer documentation to describe the Maven/AsciidoctorJ documentation build and removed references to the retired shell/AWK preprocessor.
 * Fixed conjoin has incorrect null handling.
 * Expanded `gremlin-python` CI matrix to test against Python 3.9, 3.10, 3.11, 3.12, and 3.13.

diff --git a/docs/src/dev/developer/development-environment.asciidoc b/docs/src/dev/developer/development-environment.asciidoc
index 5e667fe434..4fd92f9be1 100644
--- a/docs/src/dev/developer/development-environment.asciidoc
+++ b/docs/src/dev/developer/development-environment.asciidoc
@@ -122,95 +122,38 @@ an issue when working with SNAPSHOT dependencies.
 [[documentation-environment]]
 === Documentation Environment

-The documentation generation process is not Maven-based and uses shell scripts to process the project's asciidoc. The
-scripts should work on Mac and Linux. Javadocs should be built using Java 11.
+The documentation generation process is Maven-based: an link:https://asciidoctor.org/[AsciidoctorJ] extension
+(`tools/tinkerpop-docs`) walks each AsciiDoc book, executes the `[gremlin-groovy]` code blocks against a long-lived
+Gremlin Console subprocess, and renders the console output as tabbed HTML. The orchestration script
+`bin/process-docs.sh` wraps this: it validates the Gremlin Console and Gremlin Server distributions, installs the
+required plugins into the console, starts a Gremlin Server (for the `:remote` examples) and a Gephi mock, then invokes
+Maven to run the extension. Javadocs should be built using Java 11.
+
+NOTE: A previous implementation of this process was not Maven-based and instead relied on a pipeline of shell and AWK
+scripts under `docs/preprocessor` and `docs/postprocessor`. Those scripts have been removed; the console session scope
+also changed from per-file to per-book as a result (see <<docs-plugin-exclusions>>).
+
+The build runs Spark/Hadoop OLAP examples against the local filesystem (`fs.defaultFS=file:///`), so a running Hadoop
+cluster is *not* required for an ordinary documentation build. The one exception is the
+link:https://tinkerpop.apache.org/docs/x.y.z/recipes/#olap-spark-yarn[Spark-on-YARN recipe], whose example targets a
+real YARN cluster and is therefore rendered from pre-captured output rather than executed live. `bin/process-docs.sh`
+sets `HADOOP_GREMLIN_LIBS` for the console automatically. The YARN recipe text also references the `zip` program, so
+install it if you do not already have it.
+
+Before generating documentation, build the Gremlin Console and Gremlin Server distributions that the process consumes
+(include the Neo4j artifacts so the Neo4j examples can run):

-TIP: We recommend performing documentation generation on Linux. For the scripts to work on Mac, you will need to
-install GNU versions of the utility programs via `homebrew`, e.g.`grep`, `awk`, `sed`, `findutils`, and `diffutils`.
-
-To generate documentation, it is required that link:https://hadoop.apache.org[Hadoop 3.3.x] is running in
-link:https://hadoop.apache.org/docs/r3.3.1/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation[pseudo-distributed]
-mode. Be sure to set the `HADOOP_GREMLIN_LIBS` environment variable as described in the
-link:https://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-gremlin[reference documentation]. It is also important
-to set the `CLASSPATH` to point at the directory containing the Hadoop configuration files, like `mapred-site.xml`.
-
-The `/etc/hadoop/yarn-site.xml` file prefers this configuration over the one provided in the Hadoop documentation
-referenced above:
-
-[source,xml]
-----
-<configuration>
-  <property>
-    <name>yarn.nodemanager.aux-services</name>
-    <value>mapreduce_shuffle</value>
-  </property>
-  <property>
-    <name>yarn.nodemanager.vmem-check-enabled</name>
-    <value>false</value>
-  </property>
-  <property>
-    <name>yarn.nodemanager.vmem-pmem-ratio</name>
-    <value>4</value>
-  </property>
-</configuration>
-----
-
-The `/etc/hadoop/mapred-site.xml` file prefers the following configuration:
-
-[source,xml]
-----
-<configuration>
-  <property>
-    <name>mapreduce.framework.name</name>
-    <value>yarn</value>
-  </property>
-  <property>
-    <name>mapred.map.tasks</name>
-    <value>4</value>
-  </property>
-  <property>
-    <name>mapred.reduce.tasks</name>
-    <value>4</value>
-  </property>
-  <property>
-    <name>mapreduce.job.counters.limit</name>
-    <value>1000</value>
-  </property>
-  <property>
-    <name>mapreduce.jobtracker.address</name>
-    <value>localhost:9001</value>
-  </property>
-  <property>
-    <name>mapreduce.map.memory.mb</name>
-    <value>2048</value>
-  </property>
-  <property>
-    <name>mapreduce.reduce.memory.mb</name>
-    <value>4096</value>
-  </property>
-  <property>
-    <name>mapreduce.map.java.opts</name>
-    <value>-Xmx2048m</value>
-  </property>
-  <property>
-    <name>mapreduce.reduce.java.opts</name>
-    <value>-Xmx4096m</value>
-  </property>
-</configuration>
-----
-
-Also note that link:http://www.grymoire.com/Unix/Awk.html[awk] version `4.0.1` is required for documentation generation.
-The link:https://tinkerpop.apache.org/docs/x.y.z/recipes/#olap-spark-yarn[YARN recipe] also uses the `zip` program to
-create an archive so that needs to be installed, too, if you don't have it already.
-
-The Hadoop 3.3.x installation instructions call for installing `pdsh` but installing that seems to cause permission
-problems when executing `sbin/start-dfs.sh`. Skipping that prerequisite seems to solve the problem.
+[source,text]
+mvn clean install -pl :gremlin-console,:gremlin-server -am -DskipTests -DincludeNeo4j

-Documentation can be generated locally with:
+Documentation can then be generated locally with:

 [source,text]
 bin/process-docs.sh

+A `--dryRun` option renders the books without starting a console or server and without executing any code blocks,
+which is useful for quickly checking AsciiDoc/formatting changes.
+
 Documentation is generated to the `target/docs` directory. It is also possible to generate documentation locally with
 Docker. `docker/build.sh -d`.

@@ -219,14 +162,15 @@ failed`. It often helps in this case to delete the directories for the dependenc
 in the `.m2` (`~/.m2/`) and in the `grapes` (`~/.groovy/grapes/`) cache. E.g., if the error is about
 `asm#asm;3.2!asm.jar`, then remove the `asm/asm` sub directory in both directories.

-NOTE: Unexpected failures with OLAP often point to a jar conflict that arises in scenarios where Hadoop or Spark
-dependencies (or other dependencies for that matter) are modified and conflict. It is not picked up by the enforcer
-plugin because the inconsistency arises through plugin installation in Gremlin Console at document generation time.
-Making adjustments to the various paths by way of the `<manifestEntries>` on the jar given the functionality provided
-by the `DependencyGrabber` class which allows you to manipulate (typically deleting conflicting files from `/lib` and
-`/plugin`) plugin loading will usually resolve it, though it could also be a more general environmental problem with
-Spark or Hadoop. The easiest way to see the error is to simply run the examples in the Gremlin Console which more
-plainly displays the error than the failure of the documentation generation process.
+NOTE: Unexpected failures with OLAP often point to a jar conflict that arises when Hadoop, Spark, or Neo4j
+dependencies are modified and collide on the console's classpath. It is not picked up by the enforcer plugin because
+the inconsistency arises through plugin installation in the Gremlin Console at document generation time. The most
+common case -- Neo4j (Scala 2.11) and Spark (Scala 2.12) -- is handled by the per-book plugin exclusion mechanism
+described in <<docs-plugin-exclusions>>. For other conflicts, the `<manifestEntries>` (`Gremlin-Plugin-Paths` /
+`Gremlin-Lib-Paths`) on the plugin jar control how the `DependencyGrabber` lays jars out under `ext/<plugin>/plugin`
+and `ext/<plugin>/lib`, which can be adjusted to resolve ordering or duplicate-jar problems. The easiest way to see
+the underlying error is to run the offending example directly in the Gremlin Console, which displays it more plainly
+than the documentation build does.

 [[docs-plugin-exclusions]]
 ==== Per-book Plugin Exclusions
diff --git a/docs/src/dev/developer/for-committers.asciidoc b/docs/src/dev/developer/for-committers.asciidoc
index 5d40a18757..3e03b9a1b1 100644
--- a/docs/src/dev/developer/for-committers.asciidoc
+++ b/docs/src/dev/developer/for-committers.asciidoc
@@ -873,9 +873,9 @@ of the Apache "Licensing How-to" for more information.

 The documentation for TinkerPop is stored in the git repository in `docs/src/` and are then split into several
 subdirectories, each representing a "book" (or its own publishable body of work). If a new AsciiDoc file is added to
-a book, then it should also be included in the `index.asciidoc` file for that book, otherwise the preprocessor will
-ignore it. Likewise, if a whole new book (subdirectory) is added, it must include an `index.asciidoc` file to be
-recognized by the AsciiDoc preprocessor.
+a book, then it should also be included in the `index.asciidoc` file for that book, otherwise the documentation build
+will ignore it. Likewise, if a whole new book (subdirectory) is added, it must include an `index.asciidoc` file to be
+recognized by the documentation build.

 Adding a book also requires a change to the root `pom.xml` file. Find the "asciidoc" Maven profile and add a new
 `<execution>` to the `asciidoctor-maven-plugin` configuration. For each book in `docs/src/`, there should be a
