Yomanz opened a new pull request, #19364:
URL: https://github.com/apache/druid/pull/19364

   Fixes #18791.
   
   ### Description
   
   `distribution/docker/peon.sh` never sourced `$SERVICE_CONF_DIR/jvm.config`. 
As a result, JVM options declared in a `jvm.config` ConfigMap, which is 
documented at `docs/development/extensions-core/k8s-jobs.md` under the Custom 
Template Pod Adapter are silently ignored by peon containers. 
   Peons started with whatever `$JAVA_OPTS` the container happened to inherit 
and then go back to container-aware JDK defaults (`MaxRAMPercentage=25%`, 
`MaxDirectMemorySize=Xmx`). This causes `OutOfMemoryError` with a mysterious 
cause.
   
   The bug affects all three K8s pod adapters equally at the shell level. 
`peon.sh` is the entry point for every K8s-launched peon but is most impactful 
for `customTemplateAdapter` users.
   `K8sTaskAdapter` (used by `overlordSingleContainer` / 
`overlordMultiContainer`) additionally injects a `JAVA_OPTS` env var from 
`druid.indexer.runner.javaOptsArray`, so its users had a working configuration 
path even without `jvm.config` sourcing. `PodTemplateTaskAdapter#getEnv` does 
not inject `JAVA_OPTS`, leaving `customTemplateAdapter` users with only the 
documented-but-broken `jvm.config` path and the workaround of hardcoding 
`JAVA_OPTS` into the pod template's `env:` block.
   
   #### Fix: source jvm.config from peon.sh
   
   Mirror the behaviour that `distribution/docker/druid.sh` already has at line 
188:
   
   ```sh
   if [ -f "$SERVICE_CONF_DIR/jvm.config" ]; then
       JAVA_OPTS="$(cat $SERVICE_CONF_DIR/jvm.config | xargs) $JAVA_OPTS"
   fi
   ```
   
   The `-f` guard is the only deviation from `druid.sh`. It preserves existing 
behaviour for clusters that don't mount a `jvm.config` at all, avoiding a new 
`cat: No such file or directory` stderr line on every peon startup.
   Options in `JAVA_OPTS` take precedence over those in `jvm.config` under 
OpenJDK (same caveat as `druid.sh`), so adapters that inject `JAVA_OPTS` via 
`javaOptsArray` continue to win duplicates.
   
   #### Docs update
   
   Added a bit of info to `docs/development/extensions-core/k8s-jobs.md` under 
the Custom Template Pod Adapter section, making the mechanism explicit: 
`peon.sh` sources `jvm.config` from the directory mounted as 
`nodetype-config-volume`, prepends it to `JAVA_OPTS`, and `JAVA_OPTS` wins for 
duplicates. The existing example ConfigMap with `-Xmx1024M 
-XX:MaxDirectMemorySize=1000M` now behaves as documented.
   
   #### Test added
   
   Added `KubernetesPeonJvmConfigDockerTest` under `embedded-tests/.../k8s/`. 
It uses a new operator manifest variant that injects 
`-Ddruid.test.peon.jvmconfig.marker=true` into the cluster-level `jvm.options` 
(written by the Druid operator into each node's `jvm.config`), submits a 
long-running `NoopTask`, port-forwards to the resulting peon pod via the 
fabric8 client, and asserts the marker appears in the peon JVM's 
`/status/properties`. The test is disabled though, like the existing 
`KubernetesTaskRunnerDirectModeDockerTest` / 
`KubernetesTaskRunnerCachingModeDockerTest` pending resolution of the 
`charts.datainfra.io` availability issue (#19047).
   
   Small harness refactor to support this:
   - `BaseKubernetesTaskRunnerDockerTest` now exposes `getManifestTemplate()` 
as an overridable hook and promotes the local `k3sCluster` variable to a 
protected field so subclasses can reach into the cluster.
   - `K3sClusterResource` gains a public `getKubernetesClient()` getter 
returning the fabric8 client.
   
   #### Release note
   
   Fixed: JVM options declared in a `jvm.config` ConfigMap are now honoured by 
peons launched via the Kubernetes overlord extension (including when using the 
`customTemplateAdapter`). Previously, `peon.sh` silently ignored `jvm.config`, 
causing the peon JVM to fall back to container-aware JDK defaults. 
   
   <hr>
   
   ##### Key changed/added files in this PR
    * `distribution/docker/peon.sh`
    * `docs/development/extensions-core/k8s-jobs.md`
    * 
`embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/KubernetesPeonJvmConfigDockerTest.java`
 (new)
    * 
`embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/BaseKubernetesTaskRunnerDockerTest.java`
    * 
`embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/K3sClusterResource.java`
    * 
`embedded-tests/src/test/resources/manifests/druid-service-with-operator-peonjvmconfig.yaml`
 (new)
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] a release note entry in the PR description.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] added integration tests.
   - [x] been tested in a test Druid cluster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to