mattcasters opened a new issue, #7372:
URL: https://github.com/apache/hop/issues/7372
### What needs to happen?
# Investigation: Migrating Apache Hop to a Single Unified Classpath
This document outlines the design, changes, and benefits of migrating Apache
Hop from separate classpaths (and classloaders) per plugin to a single, unified
classpath.
---
## 1. Executive Summary
Currently, Apache Hop uses a modular classloading architecture where each
plugin directory (under `plugins/`) has its own dependencies and is loaded
using a separate instance of `HopURLClassLoader`.
While this provides dependency isolation between plugins, it introduces:
* High classloader overhead and potential memory leaks (metaspace
exhaustion).
* Complexity in containerized environments (Docker) and distributed
executors (Apache Spark, Apache Flink, Google Cloud Dataflow via the Apache
Beam engine), where classloading isolation causes issues.
* Difficulty in native compilation (GraalVM).
### Proposed Solution: Manifest-only Pathing Jar
To move to a single classpath, we can pre-calculate the list of all jar
files in the distribution (including core, beam, and all plugins) during the
Maven assembly build of `hop-client` and package that list into a manifest-only
jar: `lib/hop-classpath.jar`.
This jar contains only a `META-INF/MANIFEST.MF` file listing all individual
jar relative paths under `lib/` and `plugins/` in its `Class-Path` header. This
avoids OS-specific command-line length limits (particularly the 8,191-character
limit on Windows) while achieving a single classpath.
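
As a rough illustration of what such a pathing jar is, the following Java sketch writes a jar whose only content is a manifest with a `Class-Path` header, then reads the header back. The class name `PathingJarSketch` and the example jar paths are hypothetical and not part of Hop; only the `java.util.jar` API is real.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Hypothetical helper, not part of Hop: writes and reads a manifest-only "pathing" jar. */
final class PathingJarSketch {

  /** Writes a jar that contains nothing but a manifest with the given Class-Path entries. */
  static void write(Path jar, List<String> relativeEntries) throws IOException {
    Manifest manifest = new Manifest();
    // Manifest-Version must be set; the JDK skips the other main attributes when it is missing.
    manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // Entries are space-separated and resolved relative to the location of this jar.
    manifest.getMainAttributes().put(Attributes.Name.CLASS_PATH, String.join(" ", relativeEntries));
    try (OutputStream out = Files.newOutputStream(jar);
        JarOutputStream jos = new JarOutputStream(out, manifest)) {
      // No entries are added: the manifest is the only content.
    }
  }

  /** Reads the Class-Path header back, mainly to verify what the JVM will see. */
  static String readClassPath(Path jar) throws IOException {
    try (JarFile jarFile = new JarFile(jar.toFile())) {
      return jarFile.getManifest().getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
    }
  }

  public static void main(String[] args) throws IOException {
    Path jar = Files.createTempFile("hop-classpath", ".jar");
    // Example entries only; real entries would come from scanning the distribution.
    write(jar, List.of("core/hop-core.jar", "../plugins/transforms/example/example.jar"));
    System.out.println(readClassPath(jar));
  }
}
```

The JDK wraps long manifest lines on write and joins them again on read, so the length of the `Class-Path` value does not need special handling here.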
---
## 2. Java Runtime Changes
To support running on a single classpath, we can introduce a system
property/environment flag, e.g., `HOP_SINGLE_CLASSPATH=Y`. When set, the
`PluginRegistry` bypasses separate classloader creation and delegates all
plugin loading to the system (App) classloader.
### Proposed Diff: `PluginRegistry.java`
```diff
diff --git a/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java b/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java
--- a/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java
+++ b/core/src/main/java/org/apache/hop/core/plugins/PluginRegistry.java
@@ -867,8 +867,13 @@
   public ClassLoader getClassLoader(IPlugin plugin) throws HopPluginException {
     if (plugin == null) {
       throw new HopPluginException(
           BaseMessages.getString(
               PKG,
               "PluginRegistry.RuntimeError.NoValidTransformOrPlugin.PLUGINREGISTRY001"));
     }
+    // If single classpath mode is active, use the main system classloader for all plugins
+    if (plugin.isNativePlugin() || "Y".equalsIgnoreCase(System.getProperty("HOP_SINGLE_CLASSPATH"))) {
+      return this.getClass().getClassLoader();
+    }
+
     try {
```
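
The diff above consults only the JVM system property. If the flag should also be honored as an environment variable, as the introduction to this section suggests, the lookup could be factored into a small helper. This is a minimal sketch: the class `SingleClasspathFlag` is hypothetical, and letting the system property take precedence over the environment is an assumption, not something the proposal specifies.

```java
/** Hypothetical helper, not existing Hop code: resolves the single-classpath flag. */
final class SingleClasspathFlag {
  static final String NAME = "HOP_SINGLE_CLASSPATH";

  /** Assumed precedence: the -D system property wins, the environment variable is the fallback. */
  static boolean isEnabled() {
    return isEnabled(System.getProperty(NAME), System.getenv(NAME));
  }

  /** Pure variant that takes both raw values, which keeps the rule easy to unit test. */
  static boolean isEnabled(String propertyValue, String envValue) {
    String value = propertyValue != null ? propertyValue : envValue;
    return value != null && "Y".equalsIgnoreCase(value.trim());
  }

  public static void main(String[] args) {
    System.out.println("single classpath mode: " + isEnabled());
  }
}
```

With such a helper, the condition in `getClassLoader` would reduce to `plugin.isNativePlugin() || SingleClasspathFlag.isEnabled()`.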
### Jandex Plugin Scanning
Because plugin jars are on the system classpath, `JarCache.getNativeJars()`
will automatically discover all plugin `META-INF/jandex.idx` files and register
them. The system property `HOP_SINGLE_CLASSPATH=Y` ensures that plugins found
while scanning the standard plugin folders still resolve to the system
classloader at runtime.
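
To show why a single classpath makes the index files discoverable in one pass, here is a small sketch that asks a classloader for every `META-INF/jandex.idx` resource it can see. It uses only the standard `ClassLoader.getResources` call. The class `JandexIndexLocator` is hypothetical, and Hop's `JarCache` may well locate the index files in a different way.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; Hop's JarCache may locate the index files differently. */
final class JandexIndexLocator {
  static final String INDEX_RESOURCE = "META-INF/jandex.idx";

  /** Returns one URL per classpath element visible to the loader that ships an index. */
  static List<URL> findIndexes(ClassLoader loader) throws IOException {
    return Collections.list(loader.getResources(INDEX_RESOURCE));
  }

  public static void main(String[] args) throws IOException {
    // With a single classpath, the system classloader sees core and plugin jars alike.
    for (URL url : findIndexes(ClassLoader.getSystemClassLoader())) {
      System.out.println(url);
    }
  }
}
```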
---
## 3. Maven Build & Pre-Calculation Changes
To pre-calculate the list of jars during the client build process, we modify
the build cycle of `assemblies/client`.
### Workflow
1. **Unpack Dependencies**: Use `maven-dependency-plugin` to unpack all
dependency zips (static, plugins, core, engine, etc.) into a staging directory
`${project.build.directory}/stage/hop`.
2. **Pre-calculate Classpath**: Execute a Groovy script using
`groovy-maven-plugin` (or `gmavenplus-plugin`) during the `prepare-package`
phase to scan `${project.build.directory}/stage/hop` for all jars, compute
their relative paths (excluding platform-specific SWT jars), and write a
manifest-only jar `lib/hop-classpath.jar`.
3. **Assemble Zip**: Configure `maven-assembly-plugin` to pack files
directly from the stage folder.
### Proposed Configuration in `assemblies/client/pom.xml`
```xml
<build>
  <plugins>
    <!-- 1. Unpack all zip dependencies to a staging directory -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <id>unpack-dependencies</id>
          <phase>prepare-package</phase>
          <goals>
            <goal>unpack-dependencies</goal>
          </goals>
          <configuration>
            <includeTypes>zip</includeTypes>
            <outputDirectory>${project.build.directory}/stage/hop</outputDirectory>
            <excludeTransitive>true</excludeTransitive>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <!-- 2. Pre-calculate classpath and generate manifest-only jar -->
    <plugin>
      <groupId>org.codehaus.gmavenplus</groupId>
      <artifactId>gmavenplus-plugin</artifactId>
      <version>3.0.2</version>
      <executions>
        <execution>
          <phase>prepare-package</phase>
          <goals>
            <goal>execute</goal>
          </goals>
          <configuration>
            <scripts>
              <script><![CDATA[
                import java.util.jar.Attributes
                import java.util.jar.JarOutputStream
                import java.util.jar.Manifest

                File stageDir = new File(project.build.directory, "stage/hop")
                List<String> relativePaths = []

                // Scan the staging directory recursively for jar files
                stageDir.eachFileRecurse { file ->
                  // Normalize separators first so the SWT check also works on Windows
                  String relPath = stageDir.toPath().relativize(file.toPath()).toString().replace('\\', '/')
                  if (relPath.endsWith(".jar") && !relPath.contains("/swt/")) {
                    // Entries are resolved relative to lib/, where the pathing jar lives
                    if (relPath.startsWith("lib/")) {
                      relativePaths.add(relPath.substring(4))
                    } else {
                      relativePaths.add("../" + relPath)
                    }
                  }
                }

                // Build the manifest with the Class-Path attribute
                Manifest manifest = new Manifest()
                manifest.mainAttributes.put(Attributes.Name.MANIFEST_VERSION, "1.0")
                manifest.mainAttributes.put(Attributes.Name.CLASS_PATH, relativePaths.join(" "))

                // Write the manifest-only pathing jar
                File classpathJar = new File(stageDir, "lib/hop-classpath.jar")
                classpathJar.parentFile.mkdirs()
                new JarOutputStream(new FileOutputStream(classpathJar), manifest).close()
                log.info("Generated pre-calculated classpath manifest jar: " + classpathJar)
              ]]></script>
            </scripts>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```
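
The path rewriting done by the script can be checked in isolation. The sketch below restates the same rules in plain Java, assuming the layout described above: the pathing jar sits in `lib/`, and SWT jars are excluded. The class `ClassPathEntries` and the sample paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java rendering of the path rules used by the Groovy script. */
final class ClassPathEntries {

  /** Maps a stage-relative path to an entry relative to lib/, or null if it is excluded. */
  static String toEntry(String stageRelativePath) {
    String path = stageRelativePath.replace('\\', '/');
    if (!path.endsWith(".jar") || path.contains("/swt/")) {
      return null; // not a jar, or a platform-specific SWT jar added by the launch script
    }
    // The pathing jar lives in lib/, so lib/ entries lose the prefix and all others climb one level.
    return path.startsWith("lib/") ? path.substring("lib/".length()) : "../" + path;
  }

  /** Builds the space-separated Class-Path value, keeping the input order. */
  static String toClassPath(List<String> stageRelativePaths) {
    List<String> entries = new ArrayList<>();
    for (String path : stageRelativePaths) {
      String entry = toEntry(path);
      if (entry != null) {
        entries.add(entry);
      }
    }
    return String.join(" ", entries);
  }

  public static void main(String[] args) {
    System.out.println(
        toClassPath(
            List.of(
                "lib/core/hop-core.jar",
                "lib/swt/linux/x86_64/swt.jar",
                "plugins/transforms/example/example.jar")));
  }
}
```

Entry order matters because the JVM resolves a class from the first jar that contains it, so a real implementation may want to sort the entries to keep builds reproducible.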
### Proposed Changes to `assemblies/client/src/assembly/assembly.xml`
```xml
<fileSets>
  <!-- Package from the staging dir which now includes hop-classpath.jar -->
  <fileSet>
    <directory>${project.build.directory}/stage/hop</directory>
    <outputDirectory>.</outputDirectory>
  </fileSet>
</fileSets>
```
---
## 4. Startup Script Changes
In the shell and batch scripts, we replace the directory wildcards
(`lib/core/*:lib/beam/*`) with `lib/hop-classpath.jar`, dynamically adding only
the platform-dependent SWT library jar. We also pass the
`-DHOP_SINGLE_CLASSPATH=Y` runtime option.
### A. Linux & OSX Scripts (`.sh` scripts: `hop-gui.sh`, `hop-run.sh`, etc.)
```diff
- if "${_HOP_JAVA}" -XshowSettings:properties -version 2>&1 | grep -q "os.arch = aarch64"; then
-   CLASSPATH="lib/core/*:lib/beam/*:lib/swt/linux/arm64/*"
- else
-   CLASSPATH="lib/core/*:lib/beam/*:lib/swt/linux/$(uname -m)/*"
- fi
+ if "${_HOP_JAVA}" -XshowSettings:properties -version 2>&1 | grep -q "os.arch = aarch64"; then
+   SWT_JAR="lib/swt/linux/arm64/swt.jar"
+ else
+   SWT_JAR="lib/swt/linux/$(uname -m)/swt.jar"
+ fi
+ CLASSPATH="lib/hop-classpath.jar:${SWT_JAR}"
+ HOP_OPTIONS="${HOP_OPTIONS} -DHOP_SINGLE_CLASSPATH=Y"
```
### B. Windows Scripts (`.bat` scripts: `hop-gui.bat`, `hop-run.bat`, etc.)
```diff
-set CLASSPATH=lib\core\*;lib\beam\*;lib\swt\win64\*
+set CLASSPATH=lib\hop-classpath.jar;lib\swt\win64\swt.jar
+set HOP_OPTIONS=%HOP_OPTIONS% -DHOP_SINGLE_CLASSPATH=Y
```
---
## 5. Benefits & Drawbacks
### Benefits
* **Significantly Simplified Classloading**: Standard class resolution is
handled directly by the JVM system classloader.
* **Massive Reduction in Overhead**: Avoids spawning hundreds of
`HopURLClassLoader` instances, reducing JVM Metaspace consumption and startup
time.
* **Distributed Compatibility**: Facilitates executing pipelines on external
clusters (Spark/Flink) since the entire Hop execution engine + plugins can be
shipped using regular classpaths/jars.
* **Solves Windows Command Limit**: The manifest jar keeps the launch command
short, well below the 8,191-character command-line limit on Windows.
### Drawbacks & Mitigation
* **Dependency Conflicts**: If two plugins include different versions of the
same third-party library, classpath conflicts (`NoSuchMethodError`, etc.) may
arise.
  * *Mitigation*: These conflicts are now surfaced at compile/build time
  when constructing the client rather than silently failing or causing
  classloader issues at runtime. Surfacing this enforces dependency alignment
  across the project.
* **Dynamic Loading of JDBC Drivers**: The driver installer hot-loading
logic (`DriverInstaller.hotLoad()`) expects `HopURLClassLoader` to inject
downloaded JDBC drivers.
  * *Mitigation*: Users should add custom JDBC drivers to the shared folders
  (`HOP_SHARED_JDBC_FOLDERS`), or place them directly in the classpath via
  options.
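
One possible way to surface the dependency conflicts mentioned above at build time is to scan the generated `Class-Path` entries for libraries that occur in more than one version. The proposal does not include such a check, so the following is only a sketch under assumptions: the class `DuplicateLibraryCheck` is hypothetical, and the `<artifact>-<version>.jar` file-name pattern is a heuristic that will misparse some names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical build-time check, not part of the proposal: finds libraries in several versions. */
final class DuplicateLibraryCheck {
  // Heuristic: "<artifact>-<version>.jar", where the version starts with a digit.
  private static final Pattern VERSIONED_JAR = Pattern.compile("^(.+?)-(\\d[\\w.\\-]*)\\.jar$");

  /** Returns artifact name to versions, only for artifacts present in more than one version. */
  static Map<String, Set<String>> findConflicts(List<String> classPathEntries) {
    Map<String, Set<String>> versionsByArtifact = new TreeMap<>();
    for (String entry : classPathEntries) {
      String fileName = entry.substring(entry.lastIndexOf('/') + 1);
      Matcher matcher = VERSIONED_JAR.matcher(fileName);
      if (matcher.matches()) {
        versionsByArtifact
            .computeIfAbsent(matcher.group(1), key -> new TreeSet<>())
            .add(matcher.group(2));
      }
    }
    // Keep only the artifacts that really appear in two or more versions.
    versionsByArtifact.values().removeIf(versions -> versions.size() < 2);
    return versionsByArtifact;
  }

  public static void main(String[] args) {
    // Made-up entries for illustration only.
    List<String> entries =
        List.of(
            "../plugins/a/lib/guava-31.1-jre.jar",
            "../plugins/b/lib/guava-33.0.0-jre.jar",
            "core/commons-lang3-3.12.0.jar");
    findConflicts(entries)
        .forEach((artifact, versions) -> System.out.println(artifact + " -> " + versions));
  }
}
```

A build could fail when the returned map is not empty, which would turn version drift between plugins into an explicit error instead of a runtime `NoSuchMethodError`.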
### Issue Priority
Priority: 3
### Issue Component
Component: API