We've been discussing container images a bit more.  The Kubernetes
back-end operates by executing some specific CMD and ENTRYPOINT logic,
which is different from Mesos's and probably not practical to unify at
this level.

However, these CMD and ENTRYPOINT configurations are essentially a thin
skin on top of an image that is just an install of a Spark distro.  We
feel that a single "spark-base" image could be published, one that is
consumable by kube-spark images, mesos-spark images, and likely any other
community image whose primary purpose is running Spark components.  The
kube-specific Dockerfiles would be written "FROM spark-base" and just add
the small command and entrypoint layers.  Likewise, the Mesos images could
add whatever specialization layers they need on top of the "spark-base"
image.

Does this factorization sound reasonable to others?
Cheers,
Erik


On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mri...@gmail.com>
wrote:

> We do support running on Apache Mesos via docker images, so this
> would not be restricted to k8s.
> But unlike mesos support, which has other modes of running, I believe
> k8s support depends more heavily on the availability of docker images.
>
>
> Regards,
> Mridul
>
>
> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:
> > Would it be logical to provide Docker-based distributions of other
> > pieces of Spark? or is this specific to K8S?
> > The problem is we wouldn't generally also provide a distribution of Spark
> > for the reasons you give, because if that, then why not RPMs and so on.
> >
> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan
> > <ramanath...@google.com> wrote:
> >>
> >> In this context, I think the docker images are similar to the binaries
> >> rather than an extension.
> >> They package the compiled distribution to save people the effort of
> >> building one themselves, akin to the binaries or the python package.
> >>
> >> For reference, this is the base dockerfile for the main image that we
> >> intend to publish. It's not particularly complicated.
> >> The driver and executor images are based on said base image and only
> >> customize the CMD (any file/directory inclusions are extraneous and
> >> will be removed).
> >>
> >> Is there only one way to build it? That's a bit harder to reason about.
> >> The base image, I'd argue, is likely always going to be built that way.
> >> For the driver and executor images, there may be cases where people want
> >> to customize them (like putting all dependencies into them, for example).
> >> In those cases, as long as our images are bare-bones, they can use the
> >> spark-driver/spark-executor images we publish as the base, and build
> >> their customization as a layer on top of it.
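> >>
> >> (A purely hypothetical sketch - assuming the published image is named
> >> "spark-driver" and that application jars belong under /opt/spark/jars -
> >> such a user Dockerfile could be as small as:
> >>
> >>     # hypothetical user-owned layer on top of the published driver image
> >>     FROM spark-driver
> >>     # bundle the application and its dependencies into the image
> >>     COPY my-app.jar /opt/spark/jars/
> >>     COPY extra-deps/ /opt/spark/jars/
> >>
> >> with everything else inherited from the published image as-is.)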
> >>
> >> I think the composability of docker images makes this a bit different
> >> from, say, debian packages.
> >> We can publish canonical images that serve as both a complete image for
> >> most Spark applications and a stable substrate to build customization
> >> upon.
> >>
> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com>
> >> wrote:
> >>>
> >>> It's probably also worth considering whether there is only one,
> >>> well-defined, correct way to create such an image or whether this is a
> >>> reasonable avenue for customization. Part of why we don't do something
> >>> like maintain and publish canonical Debian packages for Spark is that
> >>> different organizations doing packaging and distribution of
> >>> infrastructures or operating systems can reasonably want to do this in
> >>> a custom (or non-customary) way. If there is really only one reasonable
> >>> way to do a docker image, then my bias starts to tend more toward the
> >>> Spark PMC taking on the responsibility to maintain and publish that
> >>> image. If there is more than one way to do it and publishing a
> >>> particular image is more just a convenience, then my bias tends more
> >>> away from maintaining and publishing it.
> >>>
> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:
> >>>>
> >>>> Source code is the primary release; compiled binary releases are
> >>>> conveniences that are also released. A docker image sounds fairly
> >>>> different, though. To the extent it's the standard delivery mechanism
> >>>> for some artifact (think: pyspark on PyPI as well), that makes sense,
> >>>> but is that the situation? If it's more of an extension or alternate
> >>>> presentation of Spark components, that typically wouldn't be part of
> >>>> a Spark release. The ones the PMC takes responsibility for maintaining
> >>>> ought to be the core, critical means of distribution alone.
> >>>>
> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
> >>>> <ramanath...@google.com.invalid> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> We're all working towards the Kubernetes scheduler backend (full
> >>>>> steam ahead!) that's targeted at Spark 2.3. One of the questions
> >>>>> that comes up often is docker images.
> >>>>>
> >>>>> While we're making dockerfiles available so that people can build
> >>>>> their own docker images from source, ideally we'd want to publish
> >>>>> official docker images as part of the release process.
> >>>>>
> >>>>> I understand that the ASF has a procedure around this, and we would
> >>>>> want to get that started to help us get these artifacts published by
> >>>>> 2.3. I'd love to start a discussion around this and hear the
> >>>>> community's thoughts.
> >>>>>
> >>>>> --
> >>>>> Thanks,
> >>>>> Anirudh Ramanathan
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Anirudh Ramanathan
>
