mattcasters opened a new issue, #7372:
URL: https://github.com/apache/hop/issues/7372

   ### What needs to happen?
   
   # Investigation: Migrating Apache Hop to a Single Unified Classpath
   
   This document outlines the design, changes, and benefits of migrating Apache 
Hop from separate classpaths (and classloaders) per plugin to a single, unified 
classpath.
   
   ---
   
   ## 1. Executive Summary
   
   Currently, Apache Hop uses a modular classloading architecture where each 
plugin directory (under `plugins/`) has its own dependencies and is loaded 
using a separate instance of `HopURLClassLoader`. 
   
   While this provides dependency isolation between plugins, it introduces:
   * High classloader overhead and potential memory leaks (metaspace 
exhaustion).
   * Complexity in containerized environments (Docker) and distributed 
executors (Apache Spark, Apache Flink, Google Cloud Dataflow via the Apache 
Beam engine), where classloading isolation causes issues.
   * Difficulty in native compilation (GraalVM).
   
   ### Proposed Solution: Manifest-only Pathing Jar
   To move to a single classpath, we can pre-calculate the list of all jar
files in the distribution (including core, beam, and all plugins) during the
Maven assembly build of `hop-client` and record it in a manifest-only jar:
`lib/hop-classpath.jar`.
   
   This jar contains only a `META-INF/MANIFEST.MF` file listing all individual 
jar relative paths under `lib/` and `plugins/` in its `Class-Path` header. This 
avoids OS-specific command-line length limits (particularly the 8,191-character 
limit on Windows) while achieving a single classpath.
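
   As a rough illustration of why entries such as `../plugins/...` work:
relative `Class-Path` entries are resolved against the location of the jar
whose manifest lists them. The Java sketch below mimics that resolution with
`java.net.URI`. The class name, the install location `/opt/hop`, and the jar
names are made up for the example and are not taken from the Hop code base.

   ```java
   import java.net.URI;
   import java.util.ArrayList;
   import java.util.List;

   final class ClassPathResolution {

     /** Resolves space-separated Class-Path entries against the pathing jar's own location. */
     static List<String> resolve(String jarLocation, String classPathHeader) {
       URI base = URI.create(jarLocation);
       List<String> resolved = new ArrayList<>();
       for (String entry : classPathHeader.trim().split("\\s+")) {
         if (entry.isEmpty()) {
           continue; // an empty header yields one empty token; skip it
         }
         // URI.resolve drops the jar file name from the base and collapses "../" segments
         resolved.add(base.resolve(entry).normalize().toString());
       }
       return resolved;
     }

     public static void main(String[] args) {
       // Hypothetical install location and jar names, for illustration only
       String jar = "file:/opt/hop/lib/hop-classpath.jar";
       String header = "core/hop-core.jar ../plugins/transforms/abort/hop-transform-abort.jar";
       resolve(jar, header).forEach(System.out::println);
     }
   }
   ```

   Entries under `lib/` therefore drop the `lib/` prefix, while entries under
`plugins/` need a leading `../` to step out of `lib/`.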
   
   ---
   
   ## 2. Java Runtime Changes
   
   To support running on a single classpath, we can introduce a system 
property/environment flag, e.g., `HOP_SINGLE_CLASSPATH=Y`. When set, the 
`PluginRegistry` bypasses separate classloader creation and delegates all 
plugin loading to the system (App) classloader.
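
   The diff below reads only the system property, while the text above also
mentions an environment flag. A minimal sketch of how both could be consulted
is given here. The class `SingleClasspathFlag`, its method names, and the
precedence (system property first, environment variable as fallback) are
assumptions for illustration, not existing Hop code.

   ```java
   final class SingleClasspathFlag {

     static final String NAME = "HOP_SINGLE_CLASSPATH";

     /** Pure decision logic: the system property wins, the environment value is the fallback. */
     static boolean isEnabled(String propertyValue, String envValue) {
       String value = propertyValue != null ? propertyValue : envValue;
       return "Y".equalsIgnoreCase(value);
     }

     /** Reads the live JVM property and process environment. */
     static boolean isEnabled() {
       return isEnabled(System.getProperty(NAME), System.getenv(NAME));
     }

     /** Returns the shared application loader in single-classpath mode, else the plugin's own loader. */
     static ClassLoader choose(ClassLoader pluginLoader) {
       return isEnabled() ? SingleClasspathFlag.class.getClassLoader() : pluginLoader;
     }

     public static void main(String[] args) {
       System.out.println("single classpath mode: " + isEnabled());
     }
   }
   ```

   Keeping the decision in a two-argument method makes it testable without
touching real JVM properties or the process environment.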
   
   ### Proposed Diff: `PluginRegistry.java`
   
   ```diff
   diff --git a/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java b/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java
   --- a/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java
   +++ b/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java
   @@ -867,6 +867,10 @@
      public ClassLoader getClassLoader(IPlugin plugin) throws HopPluginException {
    
        if (plugin == null) {
          throw new HopPluginException(
              BaseMessages.getString(
                  PKG, "PluginRegistry.RuntimeError.NoValidTransformOrPlugin.PLUGINREGISTRY001"));
        }
    
   +    // If single classpath mode is active, use the main system classloader for all plugins
   +    if (plugin.isNativePlugin() || "Y".equalsIgnoreCase(System.getProperty("HOP_SINGLE_CLASSPATH"))) {
   +      return this.getClass().getClassLoader();
   +    }
   +
        try {
   ```
   
   ### Jandex Plugin Scanning
   Because plugin jars are on the system classpath, `JarCache.getNativeJars()`
will automatically discover all plugin `META-INF/jandex.idx` files and register
them. The system property `HOP_SINGLE_CLASSPATH=Y` ensures that plugins found
by scanning the standard plugin folders are still resolved through the system
classloader at runtime.
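
   To illustrate the underlying mechanism: when several jars on one classpath
each carry a `META-INF/jandex.idx`, the standard
`ClassLoader.getResources(String)` call returns one URL per jar. The sketch
below shows only that JDK behavior. The class `JandexIndexLocator` is
hypothetical, and nothing here claims to reproduce how `JarCache` is actually
implemented.

   ```java
   import java.io.IOException;
   import java.net.URL;
   import java.util.ArrayList;
   import java.util.Enumeration;
   import java.util.List;

   final class JandexIndexLocator {

     static final String INDEX_PATH = "META-INF/jandex.idx";

     /** Lists every copy of the index file visible to the given classloader. */
     static List<URL> findIndexes(ClassLoader loader) throws IOException {
       List<URL> found = new ArrayList<>();
       Enumeration<URL> resources = loader.getResources(INDEX_PATH);
       while (resources.hasMoreElements()) {
         found.add(resources.nextElement());
       }
       return found;
     }

     public static void main(String[] args) throws IOException {
       // With a pathing jar on the classpath, jars named in its Class-Path header are searched too
       for (URL url : findIndexes(ClassLoader.getSystemClassLoader())) {
         System.out.println(url);
       }
     }
   }
   ```

   Run against the system classloader, this prints one line per indexed jar,
which gives a quick way to check that the pathing jar exposes the plugin
indexes as expected.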
   
   ---
   
   ## 3. Maven Build & Pre-Calculation Changes
   
   To pre-calculate the list of jars during the client build process, we modify 
the build cycle of `assemblies/client`.
   
   ### Workflow
   1. **Unpack Dependencies**: Use `maven-dependency-plugin` to unpack all 
dependency zips (static, plugins, core, engine, etc.) into a staging directory 
`${project.build.directory}/stage/hop`.
   2. **Pre-calculate Classpath**: Execute a Groovy script using 
`groovy-maven-plugin` (or `gmavenplus-plugin`) during the `prepare-package` 
phase to scan `${project.build.directory}/stage/hop` for all jars, compute 
their relative paths (excluding platform-specific SWT jars), and write a 
manifest-only jar `lib/hop-classpath.jar`.
   3. **Assemble Zip**: Configure `maven-assembly-plugin` to pack files 
directly from the stage folder.
   
   ### Proposed Configuration in `assemblies/client/pom.xml`
   
   ```xml
   <build>
       <plugins>
           <!-- 1. Unpack all zip dependencies to a staging directory -->
           <plugin>
               <groupId>org.apache.maven.plugins</groupId>
               <artifactId>maven-dependency-plugin</artifactId>
               <executions>
                   <execution>
                       <id>unpack-dependencies</id>
                       <phase>prepare-package</phase>
                       <goals>
                           <goal>unpack-dependencies</goal>
                       </goals>
                       <configuration>
                           <includeTypes>zip</includeTypes>
                           
<outputDirectory>${project.build.directory}/stage/hop</outputDirectory>
                           <excludeTransitive>true</excludeTransitive>
                       </configuration>
                   </execution>
               </executions>
           </plugin>
   
           <!-- 2. Pre-calculate classpath and generate manifest-only jar -->
           <plugin>
               <groupId>org.codehaus.gmavenplus</groupId>
               <artifactId>gmavenplus-plugin</artifactId>
               <version>3.0.2</version>
               <executions>
                   <execution>
                       <phase>prepare-package</phase>
                       <goals>
                           <goal>execute</goal>
                       </goals>
                       <configuration>
                           <scripts>
                               <script><![CDATA[
                                    import java.util.jar.Attributes
                                    import java.util.jar.JarOutputStream
                                    import java.util.jar.Manifest

                                    File stageDir = new File(project.build.directory, "stage/hop")
                                    List<String> relativePaths = []

                                    // Scan the staging directory recursively for jar files
                                    stageDir.eachFileRecurse { file ->
                                        if (file.isFile() && file.name.endsWith(".jar")) {
                                            // Path relative to the staging root, normalized to forward slashes
                                            // so the checks below behave the same on Windows build hosts
                                            String relPath = stageDir.toPath().relativize(file.toPath()).toString().replace('\\', '/')
                                            // Skip platform-specific SWT jars (added by the startup scripts)
                                            // and a pathing jar left over from a previous build
                                            if (relPath.contains("/swt/") || relPath == "lib/hop-classpath.jar") {
                                                return
                                            }
                                            // Class-Path entries are resolved relative to the pathing jar in lib/
                                            if (relPath.startsWith("lib/")) {
                                                relativePaths.add(relPath.substring(4))
                                            } else {
                                                relativePaths.add("../" + relPath)
                                            }
                                        }
                                    }

                                    // Build the manifest with the Class-Path attribute
                                    Manifest manifest = new Manifest()
                                    manifest.mainAttributes.put(Attributes.Name.MANIFEST_VERSION, "1.0")
                                    manifest.mainAttributes.put(Attributes.Name.CLASS_PATH, relativePaths.join(" "))

                                    // Write manifest-only pathing jar
                                    File classpathJar = new File(stageDir, "lib/hop-classpath.jar")
                                    classpathJar.parentFile.mkdirs()
                                    new JarOutputStream(new FileOutputStream(classpathJar), manifest).close()
                                    log.info("Generated pre-calculated classpath manifest jar: " + classpathJar)
                               ]]></script>
                           </scripts>
                       </configuration>
                   </execution>
               </executions>
           </plugin>
       </plugins>
   </build>
   ```
   
   ### Proposed Changes to `assemblies/client/src/assembly/assembly.xml`
   
   ```xml
       <fileSets>
           <!-- Package from the staging dir which now includes 
hop-classpath.jar -->
           <fileSet>
               <directory>${project.build.directory}/stage/hop</directory>
               <outputDirectory>.</outputDirectory>
           </fileSet>
       </fileSets>
   ```
   
   ---
   
   ## 4. Startup Script Changes
   
   In the shell and batch scripts, we replace the directory wildcards 
(`lib/core/*:lib/beam/*`) with `lib/hop-classpath.jar`, dynamically adding only 
the platform-dependent SWT library jar. We also pass the 
`-DHOP_SINGLE_CLASSPATH=Y` runtime option.
   
   ### A. Linux & OSX Scripts (`.sh` scripts: `hop-gui.sh`, `hop-run.sh`, etc.)
   
   ```diff
   -  if "${_HOP_JAVA}" -XshowSettings:properties -version 2>&1 | grep -q "os.arch = aarch64"; then
   -    CLASSPATH="lib/core/*:lib/beam/*:lib/swt/linux/arm64/*"
   -  else
   -    CLASSPATH="lib/core/*:lib/beam/*:lib/swt/linux/$(uname -m)/*"
   -  fi
   +  if "${_HOP_JAVA}" -XshowSettings:properties -version 2>&1 | grep -q "os.arch = aarch64"; then
   +    SWT_JAR="lib/swt/linux/arm64/swt.jar"
   +  else
   +    SWT_JAR="lib/swt/linux/$(uname -m)/swt.jar"
   +  fi
   +  CLASSPATH="lib/hop-classpath.jar:${SWT_JAR}"
   +  HOP_OPTIONS="${HOP_OPTIONS} -DHOP_SINGLE_CLASSPATH=Y"
   ```
   
   ### B. Windows Scripts (`.bat` scripts: `hop-gui.bat`, `hop-run.bat`, etc.)
   
   ```diff
   -set CLASSPATH=lib\core\*;lib\beam\*;lib\swt\win64\*
   +set CLASSPATH=lib\hop-classpath.jar;lib\swt\win64\swt.jar
   +set HOP_OPTIONS=%HOP_OPTIONS% -DHOP_SINGLE_CLASSPATH=Y
   ```
   
   ---
   
   ## 5. Benefits & Drawbacks
   
   ### Benefits
   * **Significantly Simplified Classloading**: Standard class resolution is 
handled directly by the JVM system classloader.
   * **Massive Reduction in Overhead**: Avoids spawning hundreds of 
`HopURLClassLoader` instances, reducing JVM Metaspace consumption and startup 
time.
   * **Distributed Compatibility**: Facilitates executing pipelines on external 
clusters (Spark/Flink) since the entire Hop execution engine + plugins can be 
shipped using regular classpaths/jars.
   * **Solves Windows Command Limit**: The manifest jar keeps the launch
command short, well below the 8,191-character limit of the Windows command line.
   
   ### Drawbacks & Mitigation
   * **Dependency Conflicts**: If two plugins include different versions of the 
same third-party library, classpath conflicts (NoSuchMethodError, etc.) may 
arise.
     * *Mitigation*: With a single classpath, these conflicts can be detected
while the client is assembled, because the build sees every jar at once, rather
than failing silently or causing classloader issues at runtime. Surfacing them
enforces dependency alignment across the project.
   * **Dynamic Loading of JDBC Drivers**: The driver installer hot-loading 
logic (`DriverInstaller.hotLoad()`) expects `HopURLClassLoader` to inject 
downloaded JDBC drivers.
     * *Mitigation*: Users should add custom JDBC drivers to the shared folders 
(`HOP_SHARED_JDBC_FOLDERS`), or place them directly in the classpath via 
options.
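
   The manifest-generation script in section 3 does not itself check for
conflicting versions. One way such a check could look is sketched below in
Java: it groups jar file names by artifact and reports artifacts that appear
with more than one version. The class `DuplicateJarCheck`, the sample paths,
and the `<artifact>-<version>.jar` naming heuristic are assumptions for
illustration. Jars that do not follow that naming scheme are ignored, and
artifacts whose names contain a digit-led segment can be split incorrectly.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.TreeSet;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   final class DuplicateJarCheck {

     // Assumed convention: <artifact>-<version>.jar, where the version starts with a digit
     private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d[^/]*)\\.jar$");

     /** Maps each artifact found with two or more distinct versions to those versions. */
     static Map<String, TreeSet<String>> findConflicts(List<String> relativePaths) {
       Map<String, TreeSet<String>> versionsByArtifact = new TreeMap<>();
       for (String path : relativePaths) {
         String fileName = path.substring(path.lastIndexOf('/') + 1);
         Matcher m = JAR_NAME.matcher(fileName);
         if (m.matches()) {
           versionsByArtifact.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
         }
       }
       // Keep only artifacts that occur with more than one version
       versionsByArtifact.values().removeIf(versions -> versions.size() < 2);
       return versionsByArtifact;
     }

     public static void main(String[] args) {
       // Hypothetical Class-Path entries, as the build script would collect them
       List<String> entries = Arrays.asList(
           "core/guava-31.1-jre.jar",
           "../plugins/example/lib/guava-19.0.jar",
           "core/commons-lang3-3.12.0.jar");
       findConflicts(entries).forEach((artifact, versions) ->
           System.out.println("Conflict: " + artifact + " " + versions));
     }
   }
   ```

   A build could fail or warn when the returned map is not empty, which would
turn the alignment requirement into an explicit gate.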
   
   
   ### Issue Priority
   
   Priority: 3
   
   ### Issue Component
   
   Component: API

