Hello everyone,

I wanted to ask what the current state of support for Spark dynamic
allocation is, and whether there's an issue where I could track its
progress and missing features.

We've just started evaluating alternatives for a production architecture
for our use case, and dynamic allocation could be useful since our
processing batches have a moderate variance in the number of processed
objects over the lifetime of the application. Hence, we'd like to see
whether K8s may fit as a cluster manager.

Our test environment is a Hadoop cluster (HDP 3.0, used because we already
had it around), but since Hadoop/HDFS is not a hard requirement, I'd like
to ask what's considered the best cluster manager: why should we use a
standalone cluster rather than a YARN or Mesos cluster? Obviously, if we
already had one of those clusters production-ready, the choice would be
easier, but starting from scratch, what are the pros and cons of the
various Spark-compatible alternatives?

I'd also like to ask whether anyone has experience running Spark on a
public cloud (AWS, Azure etc.), and whether that experience involved a
Hadoop PaaS (such as EMR or HDInsight), full IaaS, or K8s as a service
(AKS, EKS etc.).

Thank you very much for your time,
Federico
