This is an automated email from the ASF dual-hosted git repository.
kunwp1 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/main by this push:
new a0a6008906 chore: Re-enable R support flags in Docker/CI and ensure
LargeBinary works on Kubernetes (#4168)
a0a6008906 is described below
commit a0a60089066e13ea4697d4467fdca31278b8370d
Author: Chris <[email protected]>
AuthorDate: Thu Jan 22 19:24:25 2026 -0800
chore: Re-enable R support flags in Docker/CI and ensure LargeBinary works
on Kubernetes (#4168)
### What changes were proposed in this PR?
Previously, #4124 removed the R support flags from the Dockerfiles and
CI workflow after the R UDF runtime logic was removed. With the runtime
now restored in #4164, this PR adds those flags back. In addition,
because the R UDF Plugin (https://github.com/Texera/texera-rudf)
includes LargeBinary support, this PR makes LargeBinary work on
Kubernetes: it adds a MinIO ExternalName service in the computing unit
pool namespace and sets the S3 endpoint and credentials on the
computing unit manager, which passes the S3 settings on to computing
units.
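
As a rough illustration of how the `WITH_R_SUPPORT` flag gates the extra
system packages, the sketch below mirrors the conditional from the
Dockerfiles' `apt-get` step in plain shell. The function name
`r_build_deps` is hypothetical and not part of the repository; the
package names are taken from the diff.

```shell
# Hypothetical helper: prints the R-only build dependencies when the flag
# is "true" and prints nothing otherwise, like the $(if ...; fi) in the diff.
r_build_deps() {
  if [ "$1" = "true" ]; then
    echo "gfortran libxml2-dev libssl-dev libcurl4-openssl-dev"
  fi
}

# Show the package list that apt-get would receive for each flag value.
for WITH_R_SUPPORT in false true; do
  echo "WITH_R_SUPPORT=$WITH_R_SUPPORT: curl git $(r_build_deps "$WITH_R_SUPPORT")"
done
```

For a local build, the flag would presumably be passed as
`docker build --build-arg WITH_R_SUPPORT=true -f bin/computing-unit-master.dockerfile .`;
the default of `false` leaves R out of the image.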
### Any related issues, documentation, discussions?
Discussion: #4155
PR: #4124, #4164
### How was this PR tested?
1. Created a Kubernetes environment.
2. Ran this workflow:
[Test.json](https://github.com/user-attachments/files/24662090/Test.json)
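
One way to sanity-check such an environment before running the workflow
is to confirm that the S3 variables are present in the container. The
sketch below is only an assumption about how one might do that: the
variable names come from the deployment template in this commit, while
the function `missing_s3_vars` is hypothetical.

```shell
# Hypothetical check: print the names of the S3 variables that are unset
# or empty in the current environment (empty output means all are set).
missing_s3_vars() {
  missing=""
  for name in STORAGE_S3_ENDPOINT STORAGE_S3_AUTH_USERNAME STORAGE_S3_AUTH_PASSWORD; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Drop the leading space left by the accumulation above.
  echo "${missing# }"
}

result="$(missing_s3_vars)"
if [ -n "$result" ]; then
  echo "missing: $result"
else
  echo "all S3 variables are set"
fi
```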
#### Computing Unit without the Plugin Installed
<img width="1116" height="957" alt="Screenshot 2026-01-21 at 2 43 33 PM"
src="https://github.com/user-attachments/assets/877b6dfc-1d74-44a4-923b-822ed7d508b3"
/>
#### Computing Unit with the Plugin Installed
<img width="1267" height="959" alt="Screenshot 2026-01-21 at 3 07 49 PM"
src="https://github.com/user-attachments/assets/91ff17ea-9b69-41e5-b643-640d9f3befdf"
/>
### Was this PR authored or co-authored using generative AI tooling?
No.
---
.github/workflows/build-and-push-images.yml | 13 +++++
bin/computing-unit-master.dockerfile | 53 +++++++++++++++++++-
bin/computing-unit-worker.dockerfile | 58 ++++++++++++++++++++--
bin/k8s/templates/external-names.yaml | 8 +++
...workflow-computing-unit-manager-deployment.yaml | 13 +++++
.../resource/ComputingUnitManagingResource.scala | 5 ++
6 files changed, 146 insertions(+), 4 deletions(-)
diff --git a/.github/workflows/build-and-push-images.yml
b/.github/workflows/build-and-push-images.yml
index 478ce28ad2..c7fa0f9d04 100644
--- a/.github/workflows/build-and-push-images.yml
+++ b/.github/workflows/build-and-push-images.yml
@@ -49,6 +49,11 @@ on:
- both
- amd64
- arm64
+ with_r_support:
+ description: 'Enable R support for workflow-execution-coordinator'
+ required: false
+ default: false
+ type: boolean
schedule:
# Run nightly at 2:00 AM UTC
- cron: '0 2 * * *'
@@ -70,6 +75,7 @@ jobs:
docker_registry: ${{ steps.set-params.outputs.docker_registry }}
services: ${{ steps.set-params.outputs.services }}
platforms: ${{ steps.set-params.outputs.platforms }}
+ with_r_support: ${{ steps.set-params.outputs.with_r_support }}
steps:
- name: Set build parameters
id: set-params
@@ -82,6 +88,7 @@ jobs:
echo "docker_registry=apache" >> $GITHUB_OUTPUT
echo "services=*" >> $GITHUB_OUTPUT
echo "platforms=both" >> $GITHUB_OUTPUT
+ echo "with_r_support=false" >> $GITHUB_OUTPUT
else
echo "Manual workflow_dispatch - using user inputs"
echo "branch=${{ github.event.inputs.branch || 'main' }}" >>
$GITHUB_OUTPUT
@@ -89,6 +96,7 @@ jobs:
echo "docker_registry=${{ github.event.inputs.docker_registry ||
'apache' }}" >> $GITHUB_OUTPUT
echo "services=${{ github.event.inputs.services || '*' }}" >>
$GITHUB_OUTPUT
echo "platforms=${{ github.event.inputs.platforms || 'both' }}" >>
$GITHUB_OUTPUT
+ echo "with_r_support=${{ github.event.inputs.with_r_support ||
'false' }}" >> $GITHUB_OUTPUT
fi
# Step 1: Generate JOOQ code once and share it
@@ -350,6 +358,8 @@ jobs:
tags: ${{ env.DOCKER_REGISTRY }}/${{ matrix.image_name }}:${{
needs.set-parameters.outputs.image_tag }}-amd64
cache-from: type=gha,scope=${{ matrix.image_name }}-amd64
cache-to: type=gha,mode=max,scope=${{ matrix.image_name }}-amd64
+ build-args: |
+ ${{ (matrix.service == 'computing-unit-master' || matrix.service
== 'computing-unit-worker') && needs.set-parameters.outputs.with_r_support ==
'true' && 'WITH_R_SUPPORT=true' || '' }}
labels: |
org.opencontainers.image.title=${{ matrix.image_name }}
org.opencontainers.image.description=Apache Texera ${{
matrix.image_name }} (AMD64)
@@ -427,6 +437,8 @@ jobs:
tags: ${{ env.DOCKER_REGISTRY }}/${{ matrix.image_name }}:${{
needs.set-parameters.outputs.image_tag }}-arm64
cache-from: type=gha,scope=${{ matrix.image_name }}-arm64
cache-to: type=gha,mode=max,scope=${{ matrix.image_name }}-arm64
+ build-args: |
+ ${{ (matrix.service == 'computing-unit-master' || matrix.service
== 'computing-unit-worker') && needs.set-parameters.outputs.with_r_support ==
'true' && 'WITH_R_SUPPORT=true' || '' }}
labels: |
org.opencontainers.image.title=${{ matrix.image_name }}
org.opencontainers.image.description=Apache Texera ${{
matrix.image_name }} (ARM64)
@@ -487,6 +499,7 @@ jobs:
echo "- **Tag:** \`${{ needs.set-parameters.outputs.image_tag }}\`"
>> $GITHUB_STEP_SUMMARY
echo "- **Services:** ${{ needs.set-parameters.outputs.services }}"
>> $GITHUB_STEP_SUMMARY
echo "- **Platforms:** ${{ needs.set-parameters.outputs.platforms
}}" >> $GITHUB_STEP_SUMMARY
+ echo "- **R Support:** ${{
needs.set-parameters.outputs.with_r_support }}" >> $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
echo "## Build Method" >> $GITHUB_STEP_SUMMARY
echo "**Parallel platform builds** (faster)" >> $GITHUB_STEP_SUMMARY
diff --git a/bin/computing-unit-master.dockerfile
b/bin/computing-unit-master.dockerfile
index eeb60c1390..89f61f17fc 100644
--- a/bin/computing-unit-master.dockerfile
+++ b/bin/computing-unit-master.dockerfile
@@ -43,25 +43,76 @@ RUN unzip amber/target/universal/amber-*.zip -d
amber/target/
FROM eclipse-temurin:11-jdk-jammy AS runtime
+# Build argument to enable/disable R support (default: false)
+ARG WITH_R_SUPPORT=false
+
WORKDIR /texera/amber
COPY --from=build /texera/amber/requirements.txt /tmp/requirements.txt
COPY --from=build /texera/amber/operator-requirements.txt
/tmp/operator-requirements.txt
-# Install Python runtime dependencies
+# Install Python runtime dependencies (always) and R runtime dependencies
(conditional)
RUN apt-get update && apt-get install -y \
python3-pip \
python3-dev \
libpq-dev \
curl \
unzip \
+ gnupg \
+ software-properties-common \
+ dirmngr \
+ git \
+ $(if [ "$WITH_R_SUPPORT" = "true" ]; then echo "\
+ gfortran \
+ libxml2-dev \
+ libssl-dev \
+ libcurl4-openssl-dev"; fi) \
&& apt-get clean
+# Install R from CRAN repository (pre-built, much faster than source
compilation)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+ # Add CRAN GPG key and repository
+ curl -fsSL
https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
+ gpg --dearmor -o /usr/share/keyrings/cran-ubuntu-keyring.gpg && \
+ echo "deb [signed-by=/usr/share/keyrings/cran-ubuntu-keyring.gpg]
https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/" | \
+ tee /etc/apt/sources.list.d/cran.list && \
+ apt-get update && \
+ apt-get install -y r-base r-base-dev && \
+ R --version; \
+ fi
+
# Install Python packages
RUN pip3 install --upgrade pip setuptools wheel && \
pip3 install -r /tmp/requirements.txt && \
pip3 install -r /tmp/operator-requirements.txt
+# Install texera-rudf and its dependencies (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+ pip3 install git+https://github.com/Texera/texera-rudf.git; \
+ fi
+
+# Install R packages with pinned versions for texera-rudf (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+ Rscript -e "options(repos = c(CRAN = 'https://cran.r-project.org')); \
+ if (!requireNamespace('remotes', quietly=TRUE)) \
+ install.packages('remotes', Ncpus =
parallel::detectCores()); \
+ remotes::install_version('arrow', version='14.0.2.1', \
+ repos='https://cran.r-project.org', upgrade='never', \
+ Ncpus = parallel::detectCores()); \
+ remotes::install_version('coro', version='1.1.0', \
+ repos='https://cran.r-project.org', upgrade='never', \
+ Ncpus = parallel::detectCores()); \
+ remotes::install_version('aws.s3', version='0.3.22', \
+ repos='https://cran.r-project.org', upgrade='never', \
+ Ncpus = parallel::detectCores()); \
+ cat('R package versions:\n'); \
+ cat(' arrow: ', as.character(packageVersion('arrow')),
'\n'); \
+ cat(' coro: ', as.character(packageVersion('coro')),
'\n'); \
+ cat(' aws.s3: ', as.character(packageVersion('aws.s3')),
'\n')"; \
+ fi
+
+ENV LD_LIBRARY_PATH=/usr/lib/R/lib:$LD_LIBRARY_PATH
+
# Copy the built texera binary from the build phase
COPY --from=build /texera/.git /texera/amber/.git
COPY --from=build /texera/amber/target/amber-* /texera/amber/
diff --git a/bin/computing-unit-worker.dockerfile
b/bin/computing-unit-worker.dockerfile
index 6cf00719ff..ef8a57b617 100644
--- a/bin/computing-unit-worker.dockerfile
+++ b/bin/computing-unit-worker.dockerfile
@@ -43,24 +43,76 @@ RUN unzip amber/target/universal/amber-*.zip -d
amber/target/
FROM eclipse-temurin:11-jre-jammy AS runtime
+# Build argument to enable/disable R support (default: false)
+ARG WITH_R_SUPPORT=false
+
WORKDIR /texera/amber
COPY --from=build /texera/amber/requirements.txt /tmp/requirements.txt
COPY --from=build /texera/amber/operator-requirements.txt
/tmp/operator-requirements.txt
-# Install Python runtime dependencies
+# Install Python runtime dependencies (always) and R runtime dependencies
(conditional)
RUN apt-get update && apt-get install -y \
python3-pip \
python3-dev \
libpq-dev \
+ curl \
+ gnupg \
+ software-properties-common \
+ dirmngr \
+ git \
+ $(if [ "$WITH_R_SUPPORT" = "true" ]; then echo "\
+ gfortran \
+ libxml2-dev \
+ libssl-dev \
+ libcurl4-openssl-dev"; fi) \
&& apt-get clean
+# Install R from CRAN repository (pre-built, much faster than source
compilation)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+ # Add CRAN GPG key and repository
+ curl -fsSL
https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
+ gpg --dearmor -o /usr/share/keyrings/cran-ubuntu-keyring.gpg && \
+ echo "deb [signed-by=/usr/share/keyrings/cran-ubuntu-keyring.gpg]
https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/" | \
+ tee /etc/apt/sources.list.d/cran.list && \
+ apt-get update && \
+ apt-get install -y r-base r-base-dev && \
+ R --version; \
+ fi
+
# Install Python packages
RUN pip3 install --upgrade pip setuptools wheel && \
pip3 install python-lsp-server python-lsp-server[websockets] && \
pip3 install -r /tmp/requirements.txt && \
- pip3 install --no-cache-dir --find-links https://pypi.org/simple/ -r
/tmp/operator-requirements.txt || \
- pip3 install --no-cache-dir wordcloud==1.9.2
+ (pip3 install --no-cache-dir --find-links https://pypi.org/simple/ -r
/tmp/operator-requirements.txt || \
+ pip3 install --no-cache-dir wordcloud==1.9.2)
+
+# Install texera-rudf and its dependencies (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+ pip3 install git+https://github.com/Texera/texera-rudf.git; \
+ fi
+
+# Install R packages with pinned versions for texera-rudf (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+ Rscript -e "options(repos = c(CRAN = 'https://cran.r-project.org')); \
+ if (!requireNamespace('remotes', quietly=TRUE)) \
+ install.packages('remotes', Ncpus =
parallel::detectCores()); \
+ remotes::install_version('arrow', version='22.0.0.1', \
+ repos='https://cran.r-project.org', upgrade='never', \
+ Ncpus = parallel::detectCores()); \
+ remotes::install_version('coro', version='1.1.0', \
+ repos='https://cran.r-project.org', upgrade='never', \
+ Ncpus = parallel::detectCores()); \
+ remotes::install_version('aws.s3', version='0.3.22', \
+ repos='https://cran.r-project.org', upgrade='never', \
+ Ncpus = parallel::detectCores()); \
+ cat('R package versions:\n'); \
+ cat(' arrow: ', as.character(packageVersion('arrow')),
'\n'); \
+ cat(' coro: ', as.character(packageVersion('coro')),
'\n'); \
+ cat(' aws.s3: ', as.character(packageVersion('aws.s3')),
'\n')"; \
+ fi
+
+ENV LD_LIBRARY_PATH=/usr/lib/R/lib:$LD_LIBRARY_PATH
# Copy the built texera binary from the build phase
COPY --from=build /texera/amber/target/amber-* /texera/amber/
diff --git a/bin/k8s/templates/external-names.yaml
b/bin/k8s/templates/external-names.yaml
index 259c5fa695..69540067b8 100644
--- a/bin/k8s/templates/external-names.yaml
+++ b/bin/k8s/templates/external-names.yaml
@@ -73,4 +73,12 @@ to access services in the main namespace using the same
service names.
"externalName" (printf "%s-svc.%s.svc.cluster.local" .Values.webserver.name
$namespace)
) | nindent 0 }}
+---
+{{/* MinIO ExternalName */}}
+{{- include "external-name-service" (dict
+ "name" (printf "%s-minio" .Release.Name)
+ "namespace" $workflowComputingUnitPoolNamespace
+ "externalName" (printf "%s-minio.%s.svc.cluster.local" .Release.Name
$namespace)
+) | nindent 0 }}
+
diff --git a/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml
b/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml
index 2226bc6716..60ddbcc913 100644
--- a/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml
+++ b/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml
@@ -60,6 +60,19 @@ spec:
value: http://{{ .Values.fileService.name
}}-svc:9092/api/dataset/presign-download
- name: FILE_SERVICE_UPLOAD_ONE_FILE_TO_DATASET_ENDPOINT
value: http://{{ .Values.fileService.name
}}-svc:9092/api/dataset/did/upload
+ # S3 Access (for R UDF large binary support)
+ - name: STORAGE_S3_ENDPOINT
+ value: http://{{ .Release.Name }}-minio:9000
+ - name: STORAGE_S3_AUTH_USERNAME
+ valueFrom:
+ secretKeyRef:
+ name: {{ .Release.Name }}-minio
+ key: root-user
+ - name: STORAGE_S3_AUTH_PASSWORD
+ valueFrom:
+ secretKeyRef:
+ name: {{ .Release.Name }}-minio
+ key: root-password
# LakeFS Access (should be removed in production environment)
- name: STORAGE_LAKEFS_ENDPOINT
value: http://{{ .Release.Name }}-lakefs.{{ .Release.Namespace
}}:8000/api/v1
diff --git
a/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala
b/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala
index f1fddcd4d9..8161643bd6 100644
---
a/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala
+++
b/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala
@@ -79,6 +79,11 @@ object ComputingUnitManagingResource {
// LakeFS endpoint is passed to CU to make CU work in dev mode(using
localhost & using default LakeFS credentials)
// LakeFS credentials should NOT be passed to CU
EnvironmentalVariable.ENV_LAKEFS_ENDPOINT -> StorageConfig.lakefsEndpoint,
+ // S3 variables are passed to CU for R UDF large binary support
+ EnvironmentalVariable.ENV_S3_ENDPOINT -> StorageConfig.s3Endpoint,
+ EnvironmentalVariable.ENV_S3_REGION -> StorageConfig.s3Region,
+ EnvironmentalVariable.ENV_S3_AUTH_USERNAME -> StorageConfig.s3Username,
+ EnvironmentalVariable.ENV_S3_AUTH_PASSWORD -> StorageConfig.s3Password,
EnvironmentalVariable.ENV_FILE_SERVICE_GET_PRESIGNED_URL_ENDPOINT ->
EnvironmentalVariable
.get(EnvironmentalVariable.ENV_FILE_SERVICE_GET_PRESIGNED_URL_ENDPOINT)
.get,