[jira] [Commented] (FLINK-33992) Add option to fetch the jar from private repository in FlinkSessionJob

Ahmed Soliman (Jira) Mon, 04 Mar 2024 04:36:16 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823159#comment-17823159
 ]


Ahmed Soliman commented on FLINK-33992:
---------------------------------------

Hello [~skala] 

I have some thoughts here; please correct me if I am wrong.

In Kubernetes, an 
{{[initContainer|https://kubernetes.io/docs/concepts/workloads/pods/init-containers/]}}
 is a special kind of container that runs to completion before the main 
container in a Pod starts. This is often used for 
setup tasks that need to be done before the main container can start. If you're 
using an {{initContainer}} to download the JAR file, you would need to make 
sure that the main container can access the downloaded file. This is where 
Kubernetes [volumes|https://kubernetes.io/docs/concepts/storage/volumes/] come 
in.

A Kubernetes volume is essentially a directory that is accessible to every 
container in a Pod that mounts it. Data in a volume is preserved across 
container restarts, and it can be shared between multiple containers in a Pod.

That being said, you could use a volume to share the JAR file between the 
{{initContainer}} and the main container:
 # Define a volume in your Pod spec. This could be an {{emptyDir}} volume, 
which is first created when a Pod is assigned to a Node, and exists as long as 
that Pod is running on that node.

 # In the {{initContainer}} spec, specify a volume mount that points to the 
volume you defined. Download the JAR file to a path in this volume.

 # In the main container spec, specify a volume mount that points to the same 
volume. The main container will now be able to access the JAR file downloaded 
by the {{initContainer}}.

This way, the {{initContainer}} can download the JAR file and store it in a 
location that the main container can access, allowing the main container to use 
the JAR file when it starts.
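
The three steps above could be sketched roughly as follows. This is only an 
illustration under assumptions, not a tested manifest: the repository URL, the 
{{repo-credentials}} Secret, the {{fetch-job-jar}} container name, the 
{{curlimages/curl}} image and the mount paths are all hypothetical, and the 
other required FlinkDeployment fields are left out. It assumes the operator's 
{{podTemplate}} field and its {{flink-main-container}} naming convention. The 
volume is mounted in a subdirectory rather than at {{/opt/flink}} itself so 
that it does not hide the Flink installation.

```yaml
# Sketch only: required FlinkDeployment fields (image, flinkVersion,
# jobManager, taskManager, ...) are omitted for brevity.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: session-cluster
spec:
  podTemplate:
    spec:
      volumes:
        # Step 1: an emptyDir volume that exists as long as the Pod does.
        - name: job-artifacts            # hypothetical volume name
          emptyDir: {}
      initContainers:
        # Step 2: download the JAR into the shared volume before Flink starts.
        - name: fetch-job-jar            # hypothetical container name
          image: curlimages/curl:latest  # assumption: any image that ships curl
          command: ["sh", "-c"]
          args:
            - >-
              curl -fsSL -u "$REPO_USER:$REPO_PASSWORD"
              -o /artifacts/job.jar
              https://repo.example.com/path/to/job.jar
          env:
            # Credentials come from a hypothetical Secret named repo-credentials.
            - name: REPO_USER
              valueFrom:
                secretKeyRef:
                  name: repo-credentials
                  key: username
            - name: REPO_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: repo-credentials
                  key: password
          volumeMounts:
            - name: job-artifacts
              mountPath: /artifacts
      containers:
        # Step 3: mount the same volume in the main Flink container.
        - name: flink-main-container
          volumeMounts:
            - name: job-artifacts
              mountPath: /opt/flink/artifacts
```

With this layout the JAR would appear at {{/opt/flink/artifacts/job.jar}} 
inside the session cluster pods. Note that the stack trace in the issue shows 
the operator's own {{FileSystemBasedArtifactFetcher}} resolving the 
{{jarURI}}, so a file that exists only in the session cluster pods may still 
not be visible when a FlinkSessionJob is submitted.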


cc: [~gyfora] Do you think this explanation makes sense? If so, consider a 
case where a session cluster has tens of session jobs with different job jars 
(if this is a valid use case). Is it worth implementing a way to download from 
a private repo in the job spec, rather than relying on the {{initContainer}} 
approach? 

I have some thoughts on how to implement it, if we agree that the feature makes 
sense. 

> Add option to fetch the jar from private repository in FlinkSessionJob
> ----------------------------------------------------------------------
>
>                 Key: FLINK-33992
>                 URL: https://issues.apache.org/jira/browse/FLINK-33992
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>            Reporter: Sweta Kalakuntla
>            Priority: Major
>
> FlinkSessionJob spec does not have a capability to download job jar from 
> remote private repository. It can currently only download from public 
> repositories. 
> Adding capability to supply credentials  to the *spec.job.jarURI* in 
> FlinkSessionJob, will solve that problem.
> If I use initContainer to download the jar in FlinkDeployment and try to 
> access that in FlinkSessionJob, the operator is unable to find the jar in the 
> defined path.
> ---
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkSessionJob
> metadata:
>   name: job1
> spec:
>   deploymentName: session-cluster
>   job:
>     jarURI: file:///opt/flink/job.jar
>     parallelism: 4
>     upgradeMode: savepoint
> (edited)
> caused by: java.io.FileNotFoundException: /opt/flink/job.jar (No such file or 
> directory)
> at java.base/java.io.FileInputStream.open0(Native Method)
> at java.base/java.io.FileInputStream.open(Unknown Source)
> at java.base/java.io.FileInputStream.<init>(Unknown Source)
> at 
> org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
> at 
> org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:134)
> at 
> org.apache.flink.kubernetes.operator.artifact.FileSystemBasedArtifactFetcher.fetch(FileSystemBasedArtifactFetcher.java:44)
> at 
> org.apache.flink.kubernetes.operator.artifact.ArtifactManager.fetch(ArtifactManager.java:63)
> at 
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService.uploadJar(AbstractFlinkService.java:707)
> at 
> org.apache.flink.kubernetes.operator.service.AbstractFlinkService.submitJobToSessionCluster(AbstractFlinkService.java:212)
> at 
> org.apache.flink.kubernetes.operator.reconciler.sessionjob.SessionJobReconciler.deploy(SessionJobReconciler.java:73)
> at 
> org.apache.flink.kubernetes.operator.reconciler.sessionjob.SessionJobReconciler.deploy(SessionJobReconciler.java:44)
> at 
> org.apache.flink.kubernetes.operator.reconciler.deployment.AbstractFlinkResourceReconciler.reconcile(AbstractFlinkResourceReconciler.java:120)
> at 
> org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:109)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
