Laszlo Gaal has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/22583 )
Change subject: IMPALA-13825: Extend Docker container build to custom base images
......................................................................

IMPALA-13825: Extend Docker container build to custom base images

Downstream system vendors, users and customers have lately expressed
interest in consuming Impala in containerized forms, taking advantage
of various specialized, hardened container base images, such as those
based on the Wolfi project by Chainguard; see:
https://github.com/wolfi-dev.

This patch enables Impala container images to be built on top of
custom base images, and adds an implementation example that uses the
publicly available Wolfi base image.

Building a customized Docker image follows a hybrid approach. Instead
of replicating the complete Impala build process inside a Wolfi
container for a fully native binary build, it relies on an existing
build platform that is compatible with the binary packages available
inside the custom container image. For Wolfi the Impala binaries are
supplied by the Red Hat 9 build of Impala. This is possible because
the major library dependencies of Impala have the same versions on
Wolfi OS and Red Hat 9, so binaries built on Red Hat 9 can be run on
Wolfi with no changes.

The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom
base image.

The selection of a custom base image is controlled by two environment
variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean): if set to 'true', triggers
  the use of the custom image. When set to 'false' or left
  unspecified, the Docker base image is selected by the existing logic
  of matching the build platform's operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base
  image.

These environment variables can be set in the environment, in
impala-config-branch.sh, or in impala-config-local.sh.
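
The selection rule above can be illustrated with a minimal Python
sketch. It is not the code in this patch: the function name
select_base_image, the platform_default parameter (standing in for the
result of the existing OS-matching logic), and the error raised when
the custom image URI is missing are all assumptions made for
illustration.

```python
def select_base_image(env, platform_default):
    """Pick the Docker base image according to the two variables.

    env is a mapping such as os.environ. platform_default is a
    hypothetical stand-in for the image chosen by the existing logic
    that matches the build platform's operating system.
    """
    flag = env.get("USE_CUSTOM_IMPALA_BASE_IMAGE", "false")
    if flag.strip().lower() != "true":
        # 'false' or unspecified: keep the existing selection logic.
        return platform_default

    custom = env.get("IMPALA_CUSTOM_DOCKER_BASE", "").strip()
    if not custom:
        # Assumption: treat a missing URI as a configuration error.
        raise ValueError(
            "USE_CUSTOM_IMPALA_BASE_IMAGE is 'true' but "
            "IMPALA_CUSTOM_DOCKER_BASE is not set")
    return custom
```

For example, calling select_base_image(os.environ, "ubuntu:20.04")
with USE_CUSTOM_IMPALA_BASE_IMAGE unset would return the platform
default unchanged; the image name here is only an illustrative value.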
Both variables are reported at the end of bin/impala-config.sh, where
important environment variables are listed. They are also added to the
list of variables in bin/jenkins/dockerized-impala-preserve-vars.py to
ensure that they can be used in the context of Jenkins jobs as well.

The unified script that installs Impala's required dependencies into
the container image is extended for Wolfi to handle APK packages.

A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during
Docker image builds) as well as container startup scripts require
Bash, but minimal container base images usually omit it, favoring a
smaller shell.

To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script
bin/start-impala-cluster.py is extended with the following features:
- synchronizes every launched container's timezone to the host. This
  is needed for Iceberg time-travel tests, which create timestamped
  Iceberg metadata items in the impalad context inside a container,
  but check creation/modification times of the same items in the test
  scripts running on the host, outside the containers. The test
  scripts implicitly expect that the same local time is shared across
  all these contexts, but this is not necessarily true if the host
  where the tests are running is set to a timezone other than UTC.
  Time synchronization is achieved by injecting the TZ environment
  variable into the container, holding the name of the timezone used
  on the host. The timezone name is taken either from the host's TZ
  variable (if set), or from the host's /etc/localtime symlink, by
  checking the name of the timezone file it points to. If
  /etc/localtime is not a symlink (and TZ is not set on the host),
  the host's /etc/localtime file is mounted into the container.
- sets up a directory for each container to collect the Java VM error
  files (hs_err_pidNNNN.log) from the containers.
- adds the --mount_sources command line parameter, which mounts the
  complete $IMPALA_HOME subtree into the container at
  /opt/impala/sources to make source code available inside the
  container for easier debugging.

Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu
  20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
  containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
  containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
  containers) using Wolfi's wolfi-base containers

Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.g...@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>
Reviewed-by: Jason Fehr <jf...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M bin/impala-config.sh
M bin/jenkins/dockerized-impala-bootstrap-and-test.sh
M bin/start-impala-cluster.py
M docker/CMakeLists.txt
M docker/daemon_entrypoint.sh
M docker/docker-build.sh
M docker/impala_base/Dockerfile
M docker/impala_profile_tool/Dockerfile
A docker/install_bash_if_needed.sh
M docker/install_os_packages.sh
M docker/setup_build_context.py
M tests/common/impala_connection.py
12 files changed, 298 insertions(+), 39 deletions(-)

Approvals:
  Laszlo Gaal: Looks good to me, but someone else must approve
  Csaba Ringhofer: Looks good to me, but someone else must approve
  Jason Fehr: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/22583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Gerrit-Change-Number: 22583
Gerrit-PatchSet: 7
Gerrit-Owner: Laszlo Gaal <laszlo.g...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
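
Note on the timezone handling described in the commit message: the
three-step lookup (host TZ variable, then the /etc/localtime symlink
target, then mounting the file itself) can be sketched in Python as
shown below. This is only an illustration, not the code added to
bin/start-impala-cluster.py. The function name, the return shape, the
"zoneinfo/" path marker, the read-only mount option, and the fallback
to mounting when a symlink target cannot be parsed are all
assumptions.

```python
import os

# Assumption: zone files live under a directory named "zoneinfo".
ZONEINFO_MARKER = "zoneinfo/"


def resolve_container_timezone(env, localtime_path="/etc/localtime"):
    """Return (env_vars, mounts) that give a container the host's timezone.

    env_vars is a dict of variables to inject; mounts is a list of
    "host:container:mode" strings in Docker's -v syntax.
    """
    # 1. Prefer the host's TZ variable if it is set and non-empty.
    tz = env.get("TZ")
    if tz:
        return {"TZ": tz}, []

    # 2. Otherwise derive the zone name from the symlink target,
    #    e.g. /usr/share/zoneinfo/Europe/Budapest -> Europe/Budapest.
    if os.path.islink(localtime_path):
        target = os.path.realpath(localtime_path)
        idx = target.find(ZONEINFO_MARKER)
        if idx != -1:
            return {"TZ": target[idx + len(ZONEINFO_MARKER):]}, []

    # 3. Not a usable symlink: mount the host file into the container.
    if os.path.exists(localtime_path):
        return {}, [localtime_path + ":/etc/localtime:ro"]

    # Nothing to go on; leave the container's timezone untouched.
    return {}, []
```

A caller could pass os.environ as env and translate the two results
into "-e" and "-v" arguments for "docker run".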