Laszlo Gaal has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/22583 )

Change subject: IMPALA-13825: Extend Docker container build to custom base 
images
......................................................................

IMPALA-13825: Extend Docker container build to custom base images

Downstream vendors, users, and customers have recently expressed
interest in consuming Impala in containerized form on top of
specialized, hardened container base images, such as those based on
the Wolfi project by Chainguard;
see: https://github.com/wolfi-dev.

This patch enables Impala container images to be built on top of custom
base images, and adds an implementation example that uses the publicly
available Wolfi base image.

Building a customized Docker image follows a hybrid approach. Instead of
replicating the complete Impala build process inside a Wolfi container
for a fully native binary build, it relies on an existing build platform
that is compatible with the binary packages available inside the custom
container image. For Wolfi the Impala binaries are supplied by the
Red Hat 9 build of Impala. This is made possible by the fact that major
library dependencies of Impala have the same versions on Wolfi OS and
Red Hat 9, so binaries built on Red Hat 9 can be run on Wolfi
with no changes.

The binaries produced by the regular build process are then installed
into a Docker image built on top of an explicitly specified custom base
image. The selection of a custom base image is controlled by two
environment variables:
- USE_CUSTOM_IMPALA_BASE_IMAGE (boolean):
  If set to 'true', triggers the use of the custom image.
  When set to 'false' or left unspecified, the Docker base image is
  selected by the existing logic of matching the build platform's
  operating system.
- IMPALA_CUSTOM_DOCKER_BASE (string): specifies the URI of the base image

The values of these variables can be overridden from the shell
environment, from impala-config-branch.sh, or from impala-config-local.sh.
They are reported at the end of bin/impala-config.sh where important
environment variables are listed. They are also added to the list of
variables in bin/jenkins/dockerized-impala-preserve-vars.py to ensure
that they can be used in the context of Jenkins jobs as well.
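
The selection logic described above can be sketched roughly as follows.
This is only an illustration, not the code in the patch: the function
name select_base_image, the platform_default parameter, the example image
names, and the choice to raise an error when the custom image URI is
empty are all assumptions.

```python
def select_base_image(env, platform_default):
    """Pick the Docker base image for the Impala container build.

    env: mapping of environment variables (for example os.environ).
    platform_default: image chosen by the existing OS-matching logic
        (hypothetical parameter, stands in for that logic).
    """
    flag = env.get("USE_CUSTOM_IMPALA_BASE_IMAGE", "false").strip().lower()
    if flag != "true":
        # 'false' or unset: keep the existing platform-based selection.
        return platform_default
    custom = env.get("IMPALA_CUSTOM_DOCKER_BASE", "").strip()
    if not custom:
        # Assumption: treat a missing URI as a configuration error.
        raise ValueError("IMPALA_CUSTOM_DOCKER_BASE must be set when "
                         "USE_CUSTOM_IMPALA_BASE_IMAGE is 'true'")
    return custom


# Example with placeholder image names:
example_env = {
    "USE_CUSTOM_IMPALA_BASE_IMAGE": "true",
    "IMPALA_CUSTOM_DOCKER_BASE": "registry.example.com/custom-base:latest",
}
chosen = select_base_image(example_env, "ubuntu:20.04")
```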

The unified script that installs Impala's required dependencies into the
container image is extended for Wolfi to handle APK packages.
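
As a rough illustration of what handling several package formats in one
installer involves, the sketch below picks an install command based on
the package manager found in the image. It is written in Python for
brevity and is not the script from the patch; the table of commands, the
function name install_command, and the injectable which parameter are
assumptions made for this example.

```python
import shutil

# Package managers probed in order, with the command prefix used to
# install packages non-interactively.
INSTALL_COMMANDS = {
    "apt-get": ["apt-get", "install", "-y"],
    "dnf": ["dnf", "install", "-y"],
    "yum": ["yum", "install", "-y"],
    "apk": ["apk", "add", "--no-cache"],  # Wolfi uses APK packages
}


def install_command(packages, which=shutil.which):
    """Return the argv list that would install `packages`.

    `which` is injectable so the lookup can be exercised without the
    tools being present on the machine running the example.
    """
    for tool, prefix in INSTALL_COMMANDS.items():
        if which(tool):
            return prefix + list(packages)
    raise RuntimeError("no supported package manager found")
```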

A new script is added to install Bash in the Docker image if it is
missing. Impala build scripts (including the scripts used during Docker
image builds) as well as container startup scripts require Bash,
but minimal container base images usually omit it, favoring a smaller
alternative.

To improve the debugging experience for a containerized Impala
minicluster, the minicluster starter script bin/start-impala-cluster.py
is extended with the following features:

- synchronizes every launched container's timezone to the host.
  This is needed for the Iceberg time-travel tests, which create
  timestamped Iceberg metadata items in the impalad context inside a
  container, but check creation/modification times of the same items in
  the test scripts running on the host, outside the containers. The test
  scripts implicitly expect the same local time to be shared across
  all these contexts, but this is not necessarily true if the host
  where the tests are running is set to a timezone other than UTC.

  Time synchronization is achieved by injecting the TZ environment
  variable into the container, holding the name of the timezone used
  on the host. The timezone name is taken either from the host's TZ
  variable (if set), or from the host's /etc/localtime symlink,
  checking the name of the timezone file it points to.
  In case /etc/localtime is not a symlink (and TZ is not set on the
  host), the host's /etc/localtime file is mounted into the container.

- sets up a directory for each container to collect the Java VM's error
  files (hs_err_pidNNNN.log) from the containers.

- adds the --mount_sources command line parameter, which mounts the
  complete $IMPALA_HOME subtree into the container at
  /opt/impala/sources to make source code available inside the container
  for easier debugging.
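
The timezone resolution order described in the first item above (TZ
variable, then the /etc/localtime symlink target, then a bind mount of
the file) could look roughly like the sketch below. It is not the code
added to bin/start-impala-cluster.py: the function name
timezone_docker_args, the "zoneinfo/" path heuristic, and the read-only
mount option are assumptions made for illustration.

```python
import os

ZONEINFO_MARKER = "zoneinfo/"


def timezone_docker_args(env, localtime_path="/etc/localtime"):
    """Return extra `docker run` arguments that align the container's
    timezone with the host's."""
    tz = env.get("TZ")
    if tz:
        # 1. The host's TZ variable wins if it is set.
        return ["-e", "TZ=" + tz]
    if os.path.islink(localtime_path):
        # 2. Derive the zone name from the file the symlink points to,
        #    e.g. /usr/share/zoneinfo/Europe/Budapest -> Europe/Budapest.
        target = os.path.realpath(localtime_path)
        if ZONEINFO_MARKER in target:
            name = target.split(ZONEINFO_MARKER, 1)[1]
            return ["-e", "TZ=" + name]
    # 3. Otherwise mount the host's file into the container
    #    (read-only here, which is an assumption).
    return ["-v", "{0}:/etc/localtime:ro".format(localtime_path)]
```

The returned list is meant to be appended to the argument list of the
`docker run` invocation that launches each container.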

Tested by running core-mode tests in the following environments:
- Regular run (impalad running natively on the platform) on Ubuntu 20.04
- Regular run on Rocky Linux 9.2
- Dockerised run (impalad instances running in their individual
  containers) using Ubuntu 20.04 containers
- Dockerised run (impalad instances running in their individual
  containers) using Rocky Linux 9.2 containers
- Dockerised run (impalad instances running in their individual
  containers) using Wolfi's wolfi-base containers

Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Reviewed-on: http://gerrit.cloudera.org:8080/22583
Reviewed-by: Laszlo Gaal <laszlo.g...@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>
Reviewed-by: Jason Fehr <jf...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M bin/impala-config.sh
M bin/jenkins/dockerized-impala-bootstrap-and-test.sh
M bin/start-impala-cluster.py
M docker/CMakeLists.txt
M docker/daemon_entrypoint.sh
M docker/docker-build.sh
M docker/impala_base/Dockerfile
M docker/impala_profile_tool/Dockerfile
A docker/install_bash_if_needed.sh
M docker/install_os_packages.sh
M docker/setup_build_context.py
M tests/common/impala_connection.py
12 files changed, 298 insertions(+), 39 deletions(-)

Approvals:
  Laszlo Gaal: Looks good to me, but someone else must approve
  Csaba Ringhofer: Looks good to me, but someone else must approve
  Jason Fehr: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/22583

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia5e39f399664fe66f3774caa316ed5d4df24befc
Gerrit-Change-Number: 22583
Gerrit-PatchSet: 7
Gerrit-Owner: Laszlo Gaal <laszlo.g...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Jason Fehr <jf...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <laszlo.g...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
