Driver HA is not yet available in k8s mode. It could be a good area to work
on, and I want to take a look at it. I personally refer to the official
Spark documentation for reference.
Thanks,



On Fri, Jul 10, 2020, 9:30 PM Varshney, Vaibhav <
vaibhav.varsh...@siemens.com> wrote:

> Hi Prashant,
>
>
>
> It sounds encouraging. During scale-down of the cluster, a few of the
> Spark jobs are probably impacted due to re-computation of shuffle data. This
> is not of supreme importance for us for now.
>
> Is there any reference deployment architecture available that is HA,
> scalable, and dynamic-allocation-enabled for deploying Spark on K8s? Any
> suggested GitHub repo or link?
>
>
>
> Thanks,
>
> Vaibhav V
>
>
>
>
>
> *From:* Prashant Sharma <scrapco...@gmail.com>
> *Sent:* Friday, July 10, 2020 12:57 AM
> *To:* user@spark.apache.org
> *Cc:* Sean Owen <sro...@gmail.com>; Ramani, Sai (DI SW CAS MP AFC ARC) <
> sai.ram...@siemens.com>; Varshney, Vaibhav (DI SW CAS MP AFC ARC) <
> vaibhav.varsh...@siemens.com>
> *Subject:* Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production
> deployment
>
>
>
> Hi,
>
>
>
> Whether it is a blocker or not is up to you to decide. But Spark on k8s
> supports dynamic allocation through a different mechanism, that is, without
> using an external shuffle service:
> https://issues.apache.org/jira/browse/SPARK-27963. There are pros and
> cons to both approaches. The only disadvantage of scaling without an
> external shuffle service is that when the cluster scales down, or loses
> executors due to some external cause (for example, losing spot instances),
> we lose the shuffle data (data that was computed as an intermediate to some
> overall computation) on those executors. This situation does not lead to
> data loss, since Spark can recompute the lost shuffle data.
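>
> As a minimal sketch (the image name, class, and jar below are my own
> placeholder assumptions, not from the JIRA), enabling this mechanism on
> k8s looks roughly like:
>
>     spark-submit \
>       --master k8s://https://<k8s-apiserver-host>:<port> \
>       --deploy-mode cluster \
>       --conf spark.kubernetes.container.image=<your-spark-image> \
>       --conf spark.dynamicAllocation.enabled=true \
>       --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
>       --class <your-main-class> <your-application-jar>
>
> The shuffleTracking flag is what stands in for the external shuffle
> service: executors that still hold shuffle data are kept alive until that
> data is no longer needed.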
>
>
>
> Dynamic scaling up and down is helpful when the Spark cluster is running
> on spot instances on AWS, for example, or when the size of the data is not
> known in advance; in other words, when we cannot estimate how many
> resources would be needed to process the data. Dynamic scaling lets the
> cluster increase its size based only on the number of pending tasks;
> currently this is the only metric implemented.
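>
> For illustration, the knobs that drive this behavior are the standard
> dynamic allocation settings (the concrete values below are made-up
> assumptions, not recommendations):
>
>     spark.dynamicAllocation.minExecutors=1
>     spark.dynamicAllocation.maxExecutors=50
>     spark.dynamicAllocation.schedulerBacklogTimeout=1s
>     spark.dynamicAllocation.executorIdleTimeout=60s
>
> The backlog timeout controls how long tasks may sit pending before more
> executors are requested; the idle timeout controls when executors are
> released on scale-down.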
>
>
>
> I don't think it is a blocker for my production use cases.
>
>
>
> Thanks,
>
> Prashant
>
>
>
> On Fri, Jul 10, 2020 at 2:06 AM Varshney, Vaibhav <
> vaibhav.varsh...@siemens.com> wrote:
>
> Thanks for the response. We have tried it in a dev environment. For
> production, if Spark 3.0 is not leveraging the k8s scheduler, then would a
> Spark cluster on K8s be "static"?
> As per https://issues.apache.org/jira/browse/SPARK-24432 it seems it is
> still a blocker for production workloads?
>
> Thanks,
> Vaibhav V
>
> -----Original Message-----
> From: Sean Owen <sro...@gmail.com>
> Sent: Thursday, July 9, 2020 3:20 PM
> To: Varshney, Vaibhav (DI SW CAS MP AFC ARC) <vaibhav.varsh...@siemens.com
> >
> Cc: user@spark.apache.org; Ramani, Sai (DI SW CAS MP AFC ARC) <
> sai.ram...@siemens.com>
> Subject: Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production
> deployment
>
> I haven't used the K8S scheduler personally, but just based on that
> comment I wouldn't worry too much. It's been around for several versions
> and AFAIK works fine in general. We sometimes aren't so great about
> removing "experimental" labels. That said, I know there are still some
> things that could be added to it and more work going on, and maybe people
> closer to that work can comment. But yeah, you shouldn't be afraid to try it.
>
> On Thu, Jul 9, 2020 at 3:18 PM Varshney, Vaibhav <
> vaibhav.varsh...@siemens.com> wrote:
> >
> > Hi Spark Experts,
> >
> >
> >
> > We are trying to deploy Spark on Kubernetes.
> >
> > As per the doc
> > http://spark.apache.org/docs/latest/running-on-kubernetes.html, it looks
> > like K8s deployment is experimental:
> >
> > "The Kubernetes scheduler is currently experimental".
> >
> >
> >
> > Does Spark 3.0 not support production deployment using the k8s scheduler?
> >
> > What’s the plan for full support of the K8s scheduler?
> >
> >
> >
> > Thanks,
> >
> > Vaibhav V
>
>
