This is an automated email from the ASF dual-hosted git repository.

kunwp1 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git


The following commit(s) were added to refs/heads/main by this push:
     new a0a6008906 chore: Re-enable R support flags in Docker/CI and ensure LargeBinary works on Kubernetes (#4168)
a0a6008906 is described below

commit a0a60089066e13ea4697d4467fdca31278b8370d
Author: Chris <[email protected]>
AuthorDate: Thu Jan 22 19:24:25 2026 -0800

    chore: Re-enable R support flags in Docker/CI and ensure LargeBinary works on Kubernetes (#4168)
    
    
    ### What changes were proposed in this PR?
    
    #4124 removed the R support flags from the Dockerfiles and the CI
    workflow after the R UDF runtime logic was removed. Now that #4164 has
    restored the runtime, this PR adds those flags back. Because the R UDF
    plugin (https://github.com/Texera/texera-rudf) includes LargeBinary
    support, this PR also passes S3 (MinIO) settings to computing units so
    that LargeBinary works correctly on Kubernetes.
    
    ### Any related issues, documentation, discussions?
    Discussion: #4155
    PR: #4124, #4164
    
    
    ### How was this PR tested?
    1. Created a Kubernetes environment.
    2. Ran this workflow on a computing unit without the plugin and on one
       with the plugin installed:
    [Test.json](https://github.com/user-attachments/files/24662090/Test.json)
    
    #### Plugin Uninstalled Computing Unit
    <img width="1116" height="957" alt="Screenshot 2026-01-21 at 2 43 33 PM" src="https://github.com/user-attachments/assets/877b6dfc-1d74-44a4-923b-822ed7d508b3" />
    
    #### Plugin Installed Computing Unit
    <img width="1267" height="959" alt="Screenshot 2026-01-21 at 3 07 49 PM" src="https://github.com/user-attachments/assets/91ff17ea-9b69-41e5-b643-640d9f3befdf" />
    
    
    ### Was this PR authored or co-authored using generative AI tooling?
    No.
---
 .github/workflows/build-and-push-images.yml        | 13 +++++
 bin/computing-unit-master.dockerfile               | 53 +++++++++++++++++++-
 bin/computing-unit-worker.dockerfile               | 58 ++++++++++++++++++++--
 bin/k8s/templates/external-names.yaml              |  8 +++
 ...workflow-computing-unit-manager-deployment.yaml | 13 +++++
 .../resource/ComputingUnitManagingResource.scala   |  5 ++
 6 files changed, 146 insertions(+), 4 deletions(-)
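
For readers who want to see how the `WITH_R_SUPPORT` build argument gates the extra apt packages, the command-substitution pattern used in the `RUN apt-get install` step of both Dockerfiles can be sketched in plain POSIX shell. This is only an illustration: the function name `r_build_packages` is hypothetical, the package names are the ones listed in this commit, and the script prints the resulting command instead of running it.

```shell
#!/bin/sh
# Sketch of the conditional package list; r_build_packages is a hypothetical name.
r_build_packages() {
    # Emit the R build dependencies only when the flag is exactly "true".
    if [ "${1:-false}" = "true" ]; then
        echo "gfortran libxml2-dev libssl-dev libcurl4-openssl-dev"
    fi
}

# Default to "false", matching the ARG default in the Dockerfiles.
WITH_R_SUPPORT="${WITH_R_SUPPORT:-false}"

# Left unquoted on purpose so the list splits into separate package arguments.
echo apt-get install -y python3-pip python3-dev libpq-dev $(r_build_packages "$WITH_R_SUPPORT")
```

When building an image locally, the flag would presumably be supplied as `docker build --build-arg WITH_R_SUPPORT=true -f bin/computing-unit-master.dockerfile .`, assuming the repository root is the build context.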

diff --git a/.github/workflows/build-and-push-images.yml b/.github/workflows/build-and-push-images.yml
index 478ce28ad2..c7fa0f9d04 100644
--- a/.github/workflows/build-and-push-images.yml
+++ b/.github/workflows/build-and-push-images.yml
@@ -49,6 +49,11 @@ on:
           - both
           - amd64
           - arm64
+      with_r_support:
+        description: 'Enable R support for workflow-execution-coordinator'
+        required: false
+        default: false
+        type: boolean
   schedule:
     # Run nightly at 2:00 AM UTC
     - cron: '0 2 * * *'
@@ -70,6 +75,7 @@ jobs:
       docker_registry: ${{ steps.set-params.outputs.docker_registry }}
       services: ${{ steps.set-params.outputs.services }}
       platforms: ${{ steps.set-params.outputs.platforms }}
+      with_r_support: ${{ steps.set-params.outputs.with_r_support }}
     steps:
       - name: Set build parameters
         id: set-params
@@ -82,6 +88,7 @@ jobs:
             echo "docker_registry=apache" >> $GITHUB_OUTPUT
             echo "services=*" >> $GITHUB_OUTPUT
             echo "platforms=both" >> $GITHUB_OUTPUT
+            echo "with_r_support=false" >> $GITHUB_OUTPUT
           else
             echo "Manual workflow_dispatch - using user inputs"
             echo "branch=${{ github.event.inputs.branch || 'main' }}" >> $GITHUB_OUTPUT
@@ -89,6 +96,7 @@ jobs:
             echo "docker_registry=${{ github.event.inputs.docker_registry || 'apache' }}" >> $GITHUB_OUTPUT
             echo "services=${{ github.event.inputs.services || '*' }}" >> $GITHUB_OUTPUT
             echo "platforms=${{ github.event.inputs.platforms || 'both' }}" >> $GITHUB_OUTPUT
+            echo "with_r_support=${{ github.event.inputs.with_r_support || 'false' }}" >> $GITHUB_OUTPUT
           fi
 
   # Step 1: Generate JOOQ code once and share it
@@ -350,6 +358,8 @@ jobs:
           tags: ${{ env.DOCKER_REGISTRY }}/${{ matrix.image_name }}:${{ needs.set-parameters.outputs.image_tag }}-amd64
           cache-from: type=gha,scope=${{ matrix.image_name }}-amd64
           cache-to: type=gha,mode=max,scope=${{ matrix.image_name }}-amd64
+          build-args: |
+            ${{ (matrix.service == 'computing-unit-master' || matrix.service == 'computing-unit-worker') && needs.set-parameters.outputs.with_r_support == 'true' && 'WITH_R_SUPPORT=true' || '' }}
           labels: |
             org.opencontainers.image.title=${{ matrix.image_name }}
             org.opencontainers.image.description=Apache Texera ${{ matrix.image_name }} (AMD64)
@@ -427,6 +437,8 @@ jobs:
           tags: ${{ env.DOCKER_REGISTRY }}/${{ matrix.image_name }}:${{ needs.set-parameters.outputs.image_tag }}-arm64
           cache-from: type=gha,scope=${{ matrix.image_name }}-arm64
           cache-to: type=gha,mode=max,scope=${{ matrix.image_name }}-arm64
+          build-args: |
+            ${{ (matrix.service == 'computing-unit-master' || matrix.service == 'computing-unit-worker') && needs.set-parameters.outputs.with_r_support == 'true' && 'WITH_R_SUPPORT=true' || '' }}
           labels: |
             org.opencontainers.image.title=${{ matrix.image_name }}
             org.opencontainers.image.description=Apache Texera ${{ matrix.image_name }} (ARM64)
@@ -487,6 +499,7 @@ jobs:
           echo "- **Tag:** \`${{ needs.set-parameters.outputs.image_tag }}\`" >> $GITHUB_STEP_SUMMARY
           echo "- **Services:** ${{ needs.set-parameters.outputs.services }}" >> $GITHUB_STEP_SUMMARY
           echo "- **Platforms:** ${{ needs.set-parameters.outputs.platforms }}" >> $GITHUB_STEP_SUMMARY
+          echo "- **R Support:** ${{ needs.set-parameters.outputs.with_r_support }}" >> $GITHUB_STEP_SUMMARY
           echo "" >> $GITHUB_STEP_SUMMARY
           echo "## Build Method" >> $GITHUB_STEP_SUMMARY
           echo "**Parallel platform builds** (faster)" >> $GITHUB_STEP_SUMMARY
diff --git a/bin/computing-unit-master.dockerfile b/bin/computing-unit-master.dockerfile
index eeb60c1390..89f61f17fc 100644
--- a/bin/computing-unit-master.dockerfile
+++ b/bin/computing-unit-master.dockerfile
@@ -43,25 +43,76 @@ RUN unzip amber/target/universal/amber-*.zip -d amber/target/
 
 FROM eclipse-temurin:11-jdk-jammy AS runtime
 
+# Build argument to enable/disable R support (default: false)
+ARG WITH_R_SUPPORT=false
+
 WORKDIR /texera/amber
 
 COPY --from=build /texera/amber/requirements.txt /tmp/requirements.txt
 COPY --from=build /texera/amber/operator-requirements.txt /tmp/operator-requirements.txt
 
-# Install Python runtime dependencies
+# Install Python runtime dependencies (always) and R runtime dependencies (conditional)
 RUN apt-get update && apt-get install -y \
     python3-pip \
     python3-dev \
     libpq-dev \
     curl \
     unzip \
+    gnupg \
+    software-properties-common \
+    dirmngr \
+    git \
+    $(if [ "$WITH_R_SUPPORT" = "true" ]; then echo "\
+    gfortran \
+    libxml2-dev \
+    libssl-dev \
+    libcurl4-openssl-dev"; fi) \
     && apt-get clean
 
+# Install R from CRAN repository (pre-built, much faster than source compilation)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+        # Add CRAN GPG key and repository
+        curl -fsSL https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
+            gpg --dearmor -o /usr/share/keyrings/cran-ubuntu-keyring.gpg && \
+        echo "deb [signed-by=/usr/share/keyrings/cran-ubuntu-keyring.gpg] https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/" | \
+            tee /etc/apt/sources.list.d/cran.list && \
+        apt-get update && \
+        apt-get install -y r-base r-base-dev && \
+        R --version; \
+    fi
+
 # Install Python packages
 RUN pip3 install --upgrade pip setuptools wheel && \
     pip3 install -r /tmp/requirements.txt && \
     pip3 install -r /tmp/operator-requirements.txt
 
+# Install texera-rudf and its dependencies (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+        pip3 install git+https://github.com/Texera/texera-rudf.git; \
+    fi
+
+# Install R packages with pinned versions for texera-rudf (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+        Rscript -e "options(repos = c(CRAN = 'https://cran.r-project.org')); \
+                    if (!requireNamespace('remotes', quietly=TRUE)) \
+                      install.packages('remotes', Ncpus = parallel::detectCores()); \
+                    remotes::install_version('arrow', version='14.0.2.1', \
+                      repos='https://cran.r-project.org', upgrade='never', \
+                      Ncpus = parallel::detectCores()); \
+                    remotes::install_version('coro', version='1.1.0', \
+                      repos='https://cran.r-project.org', upgrade='never', \
+                      Ncpus = parallel::detectCores()); \
+                    remotes::install_version('aws.s3', version='0.3.22', \
+                      repos='https://cran.r-project.org', upgrade='never', \
+                      Ncpus = parallel::detectCores()); \
+                    cat('R package versions:\n'); \
+                    cat('  arrow: ', as.character(packageVersion('arrow')), '\n'); \
+                    cat('  coro: ', as.character(packageVersion('coro')), '\n'); \
+                    cat('  aws.s3: ', as.character(packageVersion('aws.s3')), '\n')"; \
+    fi
+
+ENV LD_LIBRARY_PATH=/usr/lib/R/lib:$LD_LIBRARY_PATH
+
 # Copy the built texera binary from the build phase
 COPY --from=build /texera/.git /texera/amber/.git
 COPY --from=build /texera/amber/target/amber-* /texera/amber/
diff --git a/bin/computing-unit-worker.dockerfile b/bin/computing-unit-worker.dockerfile
index 6cf00719ff..ef8a57b617 100644
--- a/bin/computing-unit-worker.dockerfile
+++ b/bin/computing-unit-worker.dockerfile
@@ -43,24 +43,76 @@ RUN unzip amber/target/universal/amber-*.zip -d amber/target/
 
 FROM eclipse-temurin:11-jre-jammy AS runtime
 
+# Build argument to enable/disable R support (default: false)
+ARG WITH_R_SUPPORT=false
+
 WORKDIR /texera/amber
 
 COPY --from=build /texera/amber/requirements.txt /tmp/requirements.txt
 COPY --from=build /texera/amber/operator-requirements.txt /tmp/operator-requirements.txt
 
-# Install Python runtime dependencies
+# Install Python runtime dependencies (always) and R runtime dependencies (conditional)
 RUN apt-get update && apt-get install -y \
     python3-pip \
     python3-dev \
     libpq-dev \
+    curl \
+    gnupg \
+    software-properties-common \
+    dirmngr \
+    git \
+    $(if [ "$WITH_R_SUPPORT" = "true" ]; then echo "\
+    gfortran \
+    libxml2-dev \
+    libssl-dev \
+    libcurl4-openssl-dev"; fi) \
     && apt-get clean
 
+# Install R from CRAN repository (pre-built, much faster than source compilation)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+        # Add CRAN GPG key and repository
+        curl -fsSL https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
+            gpg --dearmor -o /usr/share/keyrings/cran-ubuntu-keyring.gpg && \
+        echo "deb [signed-by=/usr/share/keyrings/cran-ubuntu-keyring.gpg] https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/" | \
+            tee /etc/apt/sources.list.d/cran.list && \
+        apt-get update && \
+        apt-get install -y r-base r-base-dev && \
+        R --version; \
+    fi
+
 # Install Python packages
 RUN pip3 install --upgrade pip setuptools wheel && \
     pip3 install python-lsp-server python-lsp-server[websockets] && \
     pip3 install -r /tmp/requirements.txt && \
-    pip3 install --no-cache-dir --find-links https://pypi.org/simple/ -r /tmp/operator-requirements.txt || \
-    pip3 install --no-cache-dir wordcloud==1.9.2
+    (pip3 install --no-cache-dir --find-links https://pypi.org/simple/ -r /tmp/operator-requirements.txt || \
+     pip3 install --no-cache-dir wordcloud==1.9.2)
+
+# Install texera-rudf and its dependencies (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+        pip3 install git+https://github.com/Texera/texera-rudf.git; \
+    fi
+
+# Install R packages with pinned versions for texera-rudf (conditional)
+RUN if [ "$WITH_R_SUPPORT" = "true" ]; then \
+        Rscript -e "options(repos = c(CRAN = 'https://cran.r-project.org')); \
+                    if (!requireNamespace('remotes', quietly=TRUE)) \
+                      install.packages('remotes', Ncpus = parallel::detectCores()); \
+                    remotes::install_version('arrow', version='22.0.0.1', \
+                      repos='https://cran.r-project.org', upgrade='never', \
+                      Ncpus = parallel::detectCores()); \
+                    remotes::install_version('coro', version='1.1.0', \
+                      repos='https://cran.r-project.org', upgrade='never', \
+                      Ncpus = parallel::detectCores()); \
+                    remotes::install_version('aws.s3', version='0.3.22', \
+                      repos='https://cran.r-project.org', upgrade='never', \
+                      Ncpus = parallel::detectCores()); \
+                    cat('R package versions:\n'); \
+                    cat('  arrow: ', as.character(packageVersion('arrow')), '\n'); \
+                    cat('  coro: ', as.character(packageVersion('coro')), '\n'); \
+                    cat('  aws.s3: ', as.character(packageVersion('aws.s3')), '\n')"; \
+    fi
+
+ENV LD_LIBRARY_PATH=/usr/lib/R/lib:$LD_LIBRARY_PATH
 
 # Copy the built texera binary from the build phase
 COPY --from=build /texera/amber/target/amber-* /texera/amber/
diff --git a/bin/k8s/templates/external-names.yaml b/bin/k8s/templates/external-names.yaml
index 259c5fa695..69540067b8 100644
--- a/bin/k8s/templates/external-names.yaml
+++ b/bin/k8s/templates/external-names.yaml
@@ -73,4 +73,12 @@ to access services in the main namespace using the same service names.
   "externalName" (printf "%s-svc.%s.svc.cluster.local" .Values.webserver.name $namespace)
 ) | nindent 0 }}
 
+---
+{{/* MinIO ExternalName */}}
+{{- include "external-name-service" (dict 
+  "name" (printf "%s-minio" .Release.Name)
+  "namespace" $workflowComputingUnitPoolNamespace
+  "externalName" (printf "%s-minio.%s.svc.cluster.local" .Release.Name $namespace)
+) | nindent 0 }}
+
 
diff --git a/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml b/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml
index 2226bc6716..60ddbcc913 100644
--- a/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml
+++ b/bin/k8s/templates/workflow-computing-unit-manager-deployment.yaml
@@ -60,6 +60,19 @@ spec:
               value: http://{{ .Values.fileService.name }}-svc:9092/api/dataset/presign-download
             - name: FILE_SERVICE_UPLOAD_ONE_FILE_TO_DATASET_ENDPOINT
               value: http://{{ .Values.fileService.name }}-svc:9092/api/dataset/did/upload
+            # S3 Access (for R UDF large binary support)
+            - name: STORAGE_S3_ENDPOINT
+              value: http://{{ .Release.Name }}-minio:9000
+            - name: STORAGE_S3_AUTH_USERNAME
+              valueFrom:
+                secretKeyRef:
+                  name: {{ .Release.Name }}-minio
+                  key: root-user
+            - name: STORAGE_S3_AUTH_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: {{ .Release.Name }}-minio
+                  key: root-password
             # LakeFS Access (should be removed in production environment)
             - name: STORAGE_LAKEFS_ENDPOINT
               value: http://{{ .Release.Name }}-lakefs.{{ .Release.Namespace }}:8000/api/v1
diff --git a/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala b/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala
index f1fddcd4d9..8161643bd6 100644
--- a/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala
+++ b/computing-unit-managing-service/src/main/scala/org/apache/texera/service/resource/ComputingUnitManagingResource.scala
@@ -79,6 +79,11 @@ object ComputingUnitManagingResource {
     // LakeFS endpoint is passed to CU to make CU work in dev mode(using localhost & using default LakeFS credentials)
     // LakeFS credentials should NOT be passed to CU
     EnvironmentalVariable.ENV_LAKEFS_ENDPOINT -> StorageConfig.lakefsEndpoint,
+    // S3 variables are passed to CU for R UDF large binary support
+    EnvironmentalVariable.ENV_S3_ENDPOINT -> StorageConfig.s3Endpoint,
+    EnvironmentalVariable.ENV_S3_REGION -> StorageConfig.s3Region,
+    EnvironmentalVariable.ENV_S3_AUTH_USERNAME -> StorageConfig.s3Username,
+    EnvironmentalVariable.ENV_S3_AUTH_PASSWORD -> StorageConfig.s3Password,
     EnvironmentalVariable.ENV_FILE_SERVICE_GET_PRESIGNED_URL_ENDPOINT -> EnvironmentalVariable
       .get(EnvironmentalVariable.ENV_FILE_SERVICE_GET_PRESIGNED_URL_ENDPOINT)
       .get,
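
One way to confirm that the S3 settings reach a container is to check its environment for the expected variables. The sketch below is not part of this commit: `missing_vars` is a hypothetical helper, and it assumes the container sees the same variable names that the manager deployment template sets (a region variable is left out because its environment name does not appear in this diff).

```shell
#!/bin/sh
# missing_vars is a hypothetical helper: it prints each named variable
# that is unset or empty in the current shell environment.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Variable names as they appear in the manager deployment template.
missing_vars STORAGE_S3_ENDPOINT STORAGE_S3_AUTH_USERNAME STORAGE_S3_AUTH_PASSWORD
```

An empty result suggests all three variables are populated; any printed name points at a setting that did not propagate.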
