Thank you for initiating the discussion in the community. Yes, we need to give
more context on the dev mailing list.

The root cause is neither SPARK-40941 nor SPARK-40513. Technically, this
situation started 16 days ago with SPARK-43148, which introduced some breaking
changes.

https://github.com/apache/spark-docker/pull/33
SPARK-43148 Add Apache Spark 3.4.0 Dockerfiles

1. The size regression: the `apache/spark:3.4.0` tag is claimed to be a
replacement for the existing `apache/spark:v3.4.0`. However, 3.4.0 is 500MB
while the original v3.4.0 is 405MB. An increase of roughly 23% is huge in
terms of image size.

2. Accidental overwrite: `apache/spark:latest` was accidentally overwritten by
the `apache/spark:python3` image, which is larger because of the additional
Python binary. This is a breaking change that forces downstream users to
switch to something like `apache/spark:scala`.
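
To make the size comparison in (1) concrete, here is a minimal Python sketch
of the arithmetic. It uses only the two sizes quoted in (1), 405MB and 500MB;
the helper name `size_increase_percent` is hypothetical and is not part of any
Spark or Docker tooling.

```python
def size_increase_percent(old_mb: float, new_mb: float) -> float:
    """Return the relative size increase from old_mb to new_mb, in percent."""
    return (new_mb - old_mb) / old_mb * 100


# Sizes quoted in item (1): apache/spark:v3.4.0 vs. apache/spark:3.4.0.
old_mb = 405
new_mb = 500

increase = size_increase_percent(old_mb, new_mb)
print(f"{new_mb - old_mb}MB larger, {increase:.1f}% increase")
```

With these two figures the difference is 95MB, or about 23.5% relative to the
original v3.4.0 image.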

I believe (1) and (2) were our mistakes. We should revert them ASAP.
As for the Java question, I prefer to stay consistent with the Apache Spark
repo's default.

Dongjoon.

On 2023/05/08 08:56:26 Yikun Jiang wrote:
> This is a call for discussion on how we can unify the Apache Spark Docker
> image tags smoothly.
> 
> As you might know, there is an apache/spark-docker
> <https://github.com/apache/spark-docker> repo that stores the Dockerfiles and
> helps publish the Docker images; it is also intended to replace the original
> manual publishing workflow.
> 
> The new images are meant to cover the previous use cases (K8s / docker run)
> as well as base image, standalone, and Docker Official Image.
> 
> - (Previous) apache/spark:v3.4.0, apache/spark-py:v3.4.0,
> apache/spark-r:v3.4.0
> 
>     * The image is built from the apache/spark Spark-on-K8s Dockerfiles
> <https://github.com/apache/spark/tree/branch-3.4/resource-managers/kubernetes/docker/src/main/dockerfiles/spark>
> 
>     * Java version: Java 17 (it was Java 11 before v3.4.0, e.g. in
> v3.3.0/v3.3.1/v3.3.2); Java 17 was made the default in SPARK-40941
> <https://github.com/apache/spark/pull/38417>.
> 
>     * Support: K8s / docker run
> 
>     * See also: Time to start publishing Spark Docker Images
> <https://lists.apache.org/thread/h729bxrf1o803l4wz7g8bngkjd56y6x8>
> 
>     * Link: https://hub.docker.com/r/apache/spark-py,
> https://hub.docker.com/r/apache/spark-r,
> https://hub.docker.com/r/apache/spark
> 
> - (New) apache/spark:3.4.0-python3 (3.4.0/latest), apache/spark:3.4.0-r,
> apache/spark:3.4.0-scala, and also an all-in-one image:
> apache/spark:3.4.0-scala2.12-java11-python3-r-ubuntu
> 
>     * The image is built from the apache/spark-docker Dockerfiles
> <https://github.com/apache/spark-docker/tree/master/3.4.0>
> 
>     * Java version: Java 11; Java 17 support is tracked in SPARK-40513
> <https://github.com/apache/spark-docker/pull/35> (under review)
> 
>     * Support: K8s / docker run / base image / standalone / Docker Official
> Image
> 
>     * See details in: Support Docker Official Image for Spark
> <https://issues.apache.org/jira/browse/SPARK-40513>
> 
>     * About dropping prefix `v`:
> https://github.com/docker-library/official-images/issues/14506
> 
>     * Link: https://hub.docker.com/r/apache/spark
> 
> We had some initial discussion on spark-website#458
> <https://github.com/apache/spark-website/pull/458#issuecomment-1522426236>.
> The discussion is mainly about the version tag and the change in default
> Java version behavior, so we’d like to hear your thoughts here on the
> questions below:
> 
> *#1. Which Java version should be used by default (latest tag)? Java 8,
> Java 11, Java 17, or any?*
> 
> *#2. Which tag should be used in apache/spark? v3.4.0 (with prefix v),
> 3.4.0 (dropping prefix v), both, or any?*
> 
> I'll start with my preferences:
> 
> 1. Java 8 or Java 17 are both OK with me (mainly considering the Java
> maintenance cycle). BTW, other Apache projects: Flink (8/11, 11 as default
> <https://github.com/docker-library/official-images/blob/93270eb07fb448fe7756b28af5495428242dcd6b/library/flink#L10>),
> solr (11 as default
> <https://github.com/apache/solr-docker/blob/989825ee6dce2f6bf7b31051f1ba053b6c4426f2/8.11/Dockerfile#L4>
> for 8.x, 17 as default
> <https://github.com/apache/solr-docker/blob/989825ee6dce2f6bf7b31051f1ba053b6c4426f2/9.2/Dockerfile#L17>
> since solr9), zookeeper (11 as default
> <https://github.com/31z4/zookeeper-docker/blob/181e5862c85b517e4599d79eb5c2c7339e60a4aa/3.8.1/Dockerfile#L1>
> )
> 
> 2. Only 3.4.0 (dropping prefix v). It will help us transition to the new
> tags with less confusion and also takes into account the DOI suggestions
> <https://github.com/docker-library/official-images/issues/14506>.
> 
> Please feel free to share your ideas.
> 
