Yomanz opened a new pull request, #19364:
URL: https://github.com/apache/druid/pull/19364
Fixes #18791.
### Description
`distribution/docker/peon.sh` never sourced `$SERVICE_CONF_DIR/jvm.config`.
As a result, JVM options declared in a `jvm.config` ConfigMap, which is
documented at `docs/development/extensions-core/k8s-jobs.md` under the Custom
Template Pod Adapter are silently ignored by peon containers.
Peons started with whatever `$JAVA_OPTS` the container happened to inherit
and then go back to container-aware JDK defaults (`MaxRAMPercentage=25%`,
`MaxDirectMemorySize=Xmx`). This causes `OutOfMemoryError` with a mysterious
cause.
The bug affects all three K8s pod adapters equally at the shell level.
`peon.sh` is the entry point for every K8s-launched peon but is most impactful
for `customTemplateAdapter` users.
`K8sTaskAdapter` (used by `overlordSingleContainer` /
`overlordMultiContainer`) additionally injects a `JAVA_OPTS` env var from
`druid.indexer.runner.javaOptsArray`, so its users had a working configuration
path even without `jvm.config` sourcing. `PodTemplateTaskAdapter#getEnv` does
not inject `JAVA_OPTS`, leaving `customTemplateAdapter` users with only the
documented-but-broken `jvm.config` path and the workaround of hardcoding
`JAVA_OPTS` into the pod template's `env:` block.
#### Fix: source jvm.config from peon.sh
Mirror the behaviour that `distribution/docker/druid.sh` already has at line
188:
```sh
if [ -f "$SERVICE_CONF_DIR/jvm.config" ]; then
JAVA_OPTS="$(cat $SERVICE_CONF_DIR/jvm.config | xargs) $JAVA_OPTS"
fi
```
The `-f` guard is the only deviation from `druid.sh`. It preserves existing
behaviour for clusters that don't mount a `jvm.config` at all, avoiding a new
`cat: No such file or directory` stderr line on every peon startup.
Options in `JAVA_OPTS` take precedence over those in `jvm.config` under
OpenJDK (same caveat as `druid.sh`), so adapters that inject `JAVA_OPTS` via
`javaOptsArray` continue to win duplicates.
#### Docs update
Added a bit of info to `docs/development/extensions-core/k8s-jobs.md` under
the Custom Template Pod Adapter section, making the mechanism explicit:
`peon.sh` sources `jvm.config` from the directory mounted as
`nodetype-config-volume`, prepends it to `JAVA_OPTS`, and `JAVA_OPTS` wins for
duplicates. The existing example ConfigMap with `-Xmx1024M
-XX:MaxDirectMemorySize=1000M` now behaves as documented.
#### Test added
Added `KubernetesPeonJvmConfigDockerTest` under `embedded-tests/.../k8s/`.
It uses a new operator manifest variant that injects
`-Ddruid.test.peon.jvmconfig.marker=true` into the cluster-level `jvm.options`
(written by the Druid operator into each node's `jvm.config`), submits a
long-running `NoopTask`, port-forwards to the resulting peon pod via the
fabric8 client, and asserts the marker appears in the peon JVM's
`/status/properties`. The test is disabled though, like the existing
`KubernetesTaskRunnerDirectModeDockerTest` /
`KubernetesTaskRunnerCachingModeDockerTest` pending resolution of the
`charts.datainfra.io` availability issue (#19047).
Small harness refactor to support this:
- `BaseKubernetesTaskRunnerDockerTest` now exposes `getManifestTemplate()`
as an overridable hook and promotes the local `k3sCluster` variable to a
protected field so subclasses can reach into the cluster.
- `K3sClusterResource` gains a public `getKubernetesClient()` getter
returning the fabric8 client.
#### Release note
Fixed: JVM options declared in a `jvm.config` ConfigMap are now honoured by
peons launched via the Kubernetes overlord extension (including when using the
`customTemplateAdapter`). Previously, `peon.sh` silently ignored `jvm.config`,
causing the peon JVM to fall back to container-aware JDK defaults.
<hr>
##### Key changed/added files in this PR
* `distribution/docker/peon.sh`
* `docs/development/extensions-core/k8s-jobs.md`
*
`embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/KubernetesPeonJvmConfigDockerTest.java`
(new)
*
`embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/BaseKubernetesTaskRunnerDockerTest.java`
*
`embedded-tests/src/test/java/org/apache/druid/testing/embedded/k8s/K3sClusterResource.java`
*
`embedded-tests/src/test/resources/manifests/druid-service-with-operator-peonjvmconfig.yaml`
(new)
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [x] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [x] added integration tests.
- [x] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]