The whole point of the SPIP (Support Docker Official Image for Spark) is to add (1) as a new repository, not to corrupt or destroy the existing (2).
(1) https://hub.docker.com/_/spark
(2) https://hub.docker.com/r/apache/spark/tags

The reference model repos were also documented, as follows.

https://hub.docker.com/_/flink
https://hub.docker.com/_/storm
https://hub.docker.com/_/solr
https://hub.docker.com/_/zookeeper

In short, according to the SPIP's `Docker Official Image` definition, new images should go to (1) only in order to achieve `Support Docker Official Image for Spark`, shouldn't they?

Dongjoon.

On Mon, May 8, 2023 at 6:22 PM Yikun Jiang <yikunk...@gmail.com> wrote:

>> 1. The size regression: `apache/spark:3.4.0` tag which is claimed to be a replacement of the existing `apache/spark:v3.4.0`. However, 3.4.0 is 500MB while the original v3.4.0 is 405MB. 25% is huge in terms of the size.
>>
>> 2. Accidental overwrite: `apache/spark:latest` was accidentally overwritten by `apache/spark:python3` image which has a bigger size due to the additional python binary. This is a breaking change to enforce the downstream users to change to something like `apache/spark:scala`.
>
> Just FYI, we also had a discussion about tag policy (latest/3.4.0) and also rough size estimation [1] in "SPIP: Support Docker Official Image for Spark".
>
> [1] https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o/edit?disco=AAAAf2TyFr0
>
> Regards,
> Yikun
>
> On Tue, May 9, 2023 at 5:03 AM Dongjoon Hyun <dongj...@apache.org> wrote:
>
>> Thank you for initiating the discussion in the community. Yes, we need to give more context in the dev mailing list.
>>
>> This root cause is not about SPARK-40941 or SPARK-40513. Technically, this situation started 16 days ago due to SPARK-43148 because it made some breaking changes.
>>
>> https://github.com/apache/spark-docker/pull/33
>> SPARK-43148 Add Apache Spark 3.4.0 Dockerfiles
>>
>> 1. The size regression: `apache/spark:3.4.0` tag which is claimed to be a replacement of the existing `apache/spark:v3.4.0`.
>> However, 3.4.0 is 500MB while the original v3.4.0 is 405MB. 25% is huge in terms of the size.
>>
>> 2. Accidental overwrite: `apache/spark:latest` was accidentally overwritten by `apache/spark:python3` image which has a bigger size due to the additional python binary. This is a breaking change to enforce the downstream users to change to something like `apache/spark:scala`.
>>
>> I believe (1) and (2) were our mistakes. We had better recover them ASAP.
>> For Java questions, I prefer to be consistent with Apache Spark repo's default.
>>
>> Dongjoon.
>>
>> On 2023/05/08 08:56:26 Yikun Jiang wrote:
>> > This is a call for discussion on how we can unify the Apache Spark Docker image tags smoothly.
>> >
>> > As you might know, there is an apache/spark-docker <https://github.com/apache/spark-docker> repo to store the dockerfiles and help to publish the docker images; it is also intended to replace the original manual publish workflow.
>> >
>> > The scope of the new images is to cover the previous image cases (K8s / docker run) and also cover base image, standalone, and Docker Official Image.
>> >
>> > - (Previous) apache/spark:v3.4.0, apache/spark-py:v3.4.0, apache/spark-r:v3.4.0
>> >
>> >   * The image is built from the apache/spark Spark on K8s dockerfiles <https://github.com/apache/spark/tree/branch-3.4/resource-managers/kubernetes/docker/src/main/dockerfiles/spark>
>> >
>> >   * Java version: Java 17 (it was Java 11 before v3.4.0, such as v3.3.0/v3.3.1/v3.3.2); Java 17 was set as the default in SPARK-40941 <https://github.com/apache/spark/pull/38417>.
>> >
>> >   * Support: K8s / docker run
>> >
>> >   * See also: Time to start publishing Spark Docker Images <https://lists.apache.org/thread/h729bxrf1o803l4wz7g8bngkjd56y6x8>
>> >
>> >   * Link: https://hub.docker.com/r/apache/spark-py, https://hub.docker.com/r/apache/spark-r, https://hub.docker.com/r/apache/spark
>> >
>> > - (New) apache/spark:3.4.0-python3(3.4.0/latest), apache/spark:3.4.0-r, apache/spark:3.4.0-scala, and also an all-in-one image: apache/spark:3.4.0-scala2.12-java11-python3-r-ubuntu
>> >
>> >   * The image is built from the apache/spark-docker dockerfiles <https://github.com/apache/spark-docker/tree/master/3.4.0>
>> >
>> >   * Java version: Java 11; Java 17 is supported by SPARK-40513 <https://github.com/apache/spark-docker/pull/35> (under review)
>> >
>> >   * Support: K8s / docker run / base image / standalone / Docker Official Image
>> >
>> >   * See detail in: Support Docker Official Image for Spark <https://issues.apache.org/jira/browse/SPARK-40513>
>> >
>> >   * About dropping prefix `v`: https://github.com/docker-library/official-images/issues/14506
>> >
>> >   * Link: https://hub.docker.com/r/apache/spark
>> >
>> > We had some initial discussion on spark-website#458 <https://github.com/apache/spark-website/pull/458#issuecomment-1522426236>. The discussion is mainly about the version tag and the default Java version behavior changes, so we'd like to hear your ideas here on the questions below:
>> >
>> > *#1. Which Java version should be used by default (latest tag)? Java8 or Java 11 or Java 17 or Any*
>> >
>> > *#2. Which tag should be used in apache/spark? v3.4.0 (with prefix v) or 3.4.0 (dropping prefix v) or Both or Any*
>> >
>> > Starting with my preference:
>> >
>> > 1. Java8 or Java17 are also ok to me (mainly considering the Java maintenance cycle).
>> > BTW, other apache projects: flink (8/11, 11 as default <https://github.com/docker-library/official-images/blob/93270eb07fb448fe7756b28af5495428242dcd6b/library/flink#L10>), solr (11 as default <https://github.com/apache/solr-docker/blob/989825ee6dce2f6bf7b31051f1ba053b6c4426f2/8.11/Dockerfile#L4> for 8.x, 17 as default <https://github.com/apache/solr-docker/blob/989825ee6dce2f6bf7b31051f1ba053b6c4426f2/9.2/Dockerfile#L17> since solr9), zookeeper (11 as default <https://github.com/31z4/zookeeper-docker/blob/181e5862c85b517e4599d79eb5c2c7339e60a4aa/3.8.1/Dockerfile#L1>)
>> >
>> > 2. Only 3.4.0 (dropping prefix v). It will help us transition to the new tags with less confusion and also consider DOI suggestions <https://github.com/docker-library/official-images/issues/14506>.
>> >
>> > Please feel free to share your ideas.
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org