Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Prasad Paravatha
I agree with Khalid and Rob. We absolutely need different properties for the Driver and Executor images for ML use-cases. Here is a real-world example of the setup at our company. Default setup via configmaps: when our data scientists request Spark on k8s clusters (they are not familiar with Docker…

Hadoop profile change to hadoop-2 and hadoop-3 since Spark 3.3

2021-12-08 Thread angers zhu
Hi all, Since Spark 3.2 we have supported Hadoop 3.3.1, but its profile name is *hadoop-3.2* (alongside *hadoop-2.7*), which is no longer accurate. So we made a change in https://github.com/apache/spark/pull/34715 Starting from Spark 3.3, we use the hadoop profiles *hadoop-2* and *hadoop-3*, and the default hadoop…
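A minimal sketch of how the renamed profiles would be selected when building Spark from source, using the standard `./build/mvn` wrapper (profile names per the PR above; the exact goals and flags shown are illustrative):

```shell
# Spark 3.3+: select the Hadoop 3 line with the renamed profile
./build/mvn -Phadoop-3 -DskipTests clean package

# Spark 3.2 and earlier used the version-specific profile name instead
./build/mvn -Phadoop-3.2 -DskipTests clean package
```

The rename decouples the profile name from a specific minor Hadoop version, so future Hadoop 3.x bumps no longer require a profile rename.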

Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Mich Talebzadeh
Fine. If I go back to the list itself:

Property Name: spark.kubernetes.container.image
Default: (none)
Meaning: Container image to use for the Spark application. This is usually of the form example.com/repo/spark:v1.0.0. This configuration is required and must be provided by the user, unless explicit i…

Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Rob Vesse
So the point Khalid was trying to make is that there are legitimate reasons you might use different container images for the driver pod vs the executor pod. It has nothing to do with Docker versions. Since the bulk of the actual work happens on the executors, you may want additional libraries…
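One way this plays out in practice is building a heavier executor-only image on top of a shared base. A hypothetical sketch (image names, tag, and the added Python libraries are all illustrative, not from the thread):

```shell
# Extend a base Spark image with extra ML libraries for executors only.
# The driver keeps using the slim base image.
cat > Dockerfile.executor <<'EOF'
FROM example.com/repo/spark:v1.0.0
RUN pip install --no-cache-dir numpy pandas scikit-learn
EOF

docker build -f Dockerfile.executor \
  -t example.com/repo/spark-executor-ml:v1.0.0 .
docker push example.com/repo/spark-executor-ml:v1.0.0
```

The executor image grows, but only the pods that actually run the workload pay that cost.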

Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Mich Talebzadeh
Thanks Khalid for your notes. I have not come across a use case where the docker version on the driver and executors needs to be different. My thinking is that spark.kubernetes.executor.container.image is the correct reference, since in Kubernetes "container" is the correct terminology, and al…

Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Khalid Mammadov
Hi Mitch, IMO it's done to provide the most flexibility. So some users can have a limited/restricted version of the image, or one with additional software that is used on the executors during processing. So, in your case you only need to provide the first one, since the other two configs…
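A sketch of that fallback behaviour: setting only the base property covers both pods, while the driver/executor-specific properties override it where set (registry, master URL, and jar path below are placeholders):

```shell
# Case 1: one image for both pods -- the driver and executor
# image properties fall back to spark.kubernetes.container.image.
spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=example.com/repo/spark:v1.0.0 \
  local:///path/to/app.jar

# Case 2: override only the executor image (e.g. one with extra
# ML libraries); the driver still uses the base image.
spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=example.com/repo/spark:v1.0.0 \
  --conf spark.kubernetes.executor.container.image=example.com/repo/spark-ml:v1.0.0 \
  local:///path/to/app.jar
```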

Re: docker image distribution in Kubernetes cluster

2021-12-08 Thread Mich Talebzadeh
Just a correction: the Spark 3.2 documentation states that:

Property Name: spark.kubernetes.container.image
Default: (none)
Meaning: Container image to use for the Spark application. This is usually of the form example…