We've been discussing the topic of container images a bit more. The Kubernetes back-end operates by executing some specific CMD and ENTRYPOINT logic, which is different from what Mesos does, and which is probably not practical to unify at this level.
However: these CMD and ENTRYPOINT configurations are essentially just a thin skin on top of an image that is just an install of a Spark distro. We feel that a single "spark-base" image should be publishable, one that is consumable by kube-spark images, mesos-spark images, and likely any other community image whose primary purpose is running Spark components. The kube-specific dockerfiles would be written "FROM spark-base" and just add the small command and entrypoint layers (see the sketch after the quoted thread below). Likewise, the mesos images could add any specialization layers that are necessary on top of the "spark-base" image. Does this factorization sound reasonable to others?

Cheers,
Erik

On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> We do support running on Apache Mesos via docker images - so this
> would not be restricted to k8s.
> But unlike mesos support, which has other modes of running, I believe
> k8s support more heavily depends on availability of docker images.
>
> Regards,
> Mridul
>
> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:
> > Would it be logical to provide Docker-based distributions of other
> > pieces of Spark? Or is this specific to K8S?
> > The problem is we wouldn't generally also provide a distribution of
> > Spark for the reasons you give, because if that, then why not RPMs
> > and so on.
> >
> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanath...@google.com> wrote:
> >>
> >> In this context, I think the docker images are similar to the binaries
> >> rather than an extension. It's packaging the compiled distribution to
> >> save people the effort of building one themselves, akin to binaries or
> >> the Python package.
> >>
> >> For reference, this is the base dockerfile for the main image that we
> >> intend to publish. It's not particularly complicated. The driver and
> >> executor images are based on said base image and only customize the
> >> CMD (any file/directory inclusions are extraneous and will be removed).
> >>
> >> Is there only one way to build it? That's a bit harder to reason about.
> >> The base image, I'd argue, is likely going to always be built that way.
> >> For the driver and executor images, there may be cases where people
> >> want to customize them (like putting all dependencies into them, for
> >> example). In those cases, as long as our images are bare bones, they
> >> can use the spark-driver/spark-executor images we publish as the base,
> >> and build their customization as a layer on top of it.
> >>
> >> I think the composability of docker images makes this a bit different
> >> from, say, Debian packages. We can publish canonical images that serve
> >> as both a complete image for most Spark applications and a stable
> >> substrate to build customization upon.
> >>
> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
> >>>
> >>> It's probably also worth considering whether there is only one,
> >>> well-defined, correct way to create such an image or whether this is
> >>> a reasonable avenue for customization. Part of why we don't do
> >>> something like maintain and publish canonical Debian packages for
> >>> Spark is because different organizations doing packaging and
> >>> distribution of infrastructures or operating systems can reasonably
> >>> want to do this in a custom (or non-customary) way.
> >>> If there is really only one reasonable way to do a docker image, then
> >>> my bias starts to tend more toward the Spark PMC taking on the
> >>> responsibility to maintain and publish that image. If there is more
> >>> than one way to do it and publishing a particular image is more just
> >>> a convenience, then my bias tends more away from maintaining and
> >>> publishing it.
> >>>
> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:
> >>>>
> >>>> Source code is the primary release; compiled binary releases are
> >>>> conveniences that are also released. A docker image sounds fairly
> >>>> different, though. To the extent it's the standard delivery mechanism
> >>>> for some artifact (think: pyspark on PyPI as well) that makes sense,
> >>>> but is that the situation? If it's more of an extension or alternate
> >>>> presentation of Spark components, that typically wouldn't be part of
> >>>> a Spark release. The ones the PMC takes responsibility for
> >>>> maintaining ought to be the core, critical means of distribution
> >>>> alone.
> >>>>
> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
> >>>> <ramanath...@google.com.invalid> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> We're all working towards the Kubernetes scheduler backend (full
> >>>>> steam ahead!) that's targeted at Spark 2.3. One of the questions
> >>>>> that comes up often is docker images.
> >>>>>
> >>>>> While we're making dockerfiles available to allow people to create
> >>>>> their own docker images from source, ideally we'd want to publish
> >>>>> official docker images as part of the release process.
> >>>>>
> >>>>> I understand that the ASF has procedure around this, and we would
> >>>>> want to get that started to help us get these artifacts published
> >>>>> by 2.3. I'd love to get a discussion around this started, and to
> >>>>> hear the thoughts of the community regarding this.
> >>>>>
> >>>>> --
> >>>>> Thanks,
> >>>>> Anirudh Ramanathan
> >>
> >> --
> >> Anirudh Ramanathan
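For concreteness, here is a minimal sketch of the layering Erik proposes, written as three separate Dockerfiles shown back to back. The image names (spark-base, spark-driver), the openjdk base image, the /opt/spark layout, and the placeholder entrypoint are illustrative assumptions, not the actual Dockerfiles under discussion:

    # Dockerfile for the hypothetical shared "spark-base" image: nothing but a
    # Spark distribution unpacked onto a minimal JRE. The build context is
    # assumed to be the root of an unpacked Spark distro.
    FROM openjdk:8-jre-alpine
    ENV SPARK_HOME /opt/spark
    COPY jars ${SPARK_HOME}/jars
    COPY bin  ${SPARK_HOME}/bin
    COPY sbin ${SPARK_HOME}/sbin
    COPY conf ${SPARK_HOME}/conf
    WORKDIR ${SPARK_HOME}

    # Dockerfile for a kube-specific image: only a thin command layer on top of
    # the shared base. The real image would set whatever command the k8s backend
    # actually expects; spark-class here is just a placeholder.
    FROM spark-base
    ENTRYPOINT ["/opt/spark/bin/spark-class"]

    # Dockerfile for a downstream customization, per the point about bare-bones
    # published images: users bake their own application jar and dependencies in
    # as a layer on top of a published image.
    FROM spark-driver
    COPY my-app.jar   /opt/spark/jars/
    COPY my-app-deps/ /opt/spark/jars/

The same pattern would let mesos-specific images add their own specialization layers on top of spark-base, which is the factorization being asked about above.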