Standalone mode was introduced for exactly this reason: for customers who
don’t want to run YARN.

Have you considered running Samza in standalone mode? In this mode, Samza
runs as an embedded library, very similar to Kafka Streams.

https://samza.apache.org/learn/documentation/latest/deployment/standalone.html

A good rule of thumb when deciding the number of partitions (N) is: “How
much data do you want to retain per partition at any time?” You can pick N
so that each partition retains around 20 GB. Another factor to consider is
whether N gives you adequate compute parallelism, since the partition count
limits how many tasks can consume a topic in parallel.
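To make the rule of thumb concrete, here is a minimal Python sketch. It assumes "20G" means 20 GB (decimal) retained per partition and that you know your average ingest per day and the topic's retention period. The function names and the `desired_parallelism` input are hypothetical, for illustration only. It takes the larger of the retention-based count and the number of tasks you want working on the topic at once.

```python
import math

# Assumption: the ~20 GB-per-partition target from the rule of thumb, in decimal bytes.
TARGET_RETAINED_BYTES_PER_PARTITION = 20 * 10**9


def partitions_for_retention(bytes_per_day: float, retention_days: float,
                             target_per_partition: float = TARGET_RETAINED_BYTES_PER_PARTITION) -> int:
    """Smallest N (at least 1) such that each partition retains about target_per_partition bytes or less."""
    total_retained = bytes_per_day * retention_days
    return max(1, math.ceil(total_retained / target_per_partition))


def choose_partition_count(bytes_per_day: float, retention_days: float,
                           desired_parallelism: int) -> int:
    """Hypothetical helper: satisfy both the retention target and the parallelism you want.

    Assumption: each input partition is processed by at most one task at a time,
    so N must be at least the number of tasks you want consuming in parallel.
    """
    return max(partitions_for_retention(bytes_per_day, retention_days), desired_parallelism)


if __name__ == "__main__":
    # 50 GB/day kept for 7 days = 350 GB retained, so 350 / 20 = 17.5, rounded up.
    print(choose_partition_count(50 * 10**9, 7, desired_parallelism=4))
    # 1 GB/day kept for 7 days fits in one partition; the parallelism target decides.
    print(choose_partition_count(1 * 10**9, 7, desired_parallelism=4))
```

For a small shop, the second case is the common one: retention alone would allow a single partition, so the number of partitions is really set by how much parallelism each job needs.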


On Tuesday, February 19, 2019, Jeremiah Adams <jad...@helixeducation.com>
wrote:

> We are finding YARN and AWS EC2 to be too costly for us. We are having to
> scale the cluster to support more jobs and have plans to write more jobs.
> We are scaling because the cluster doesn’t have enough VCores to support all
> the containers, not enough RAM for jobs, etc.
>
> Has anyone had luck running Samza jobs in an alternative scheduler? Say,
> Nomad, Kubernetes or something else?
>
> Similarly, has anyone had any luck with Samza on something like Kafka
> Streams, where I don’t have to have the overhead of YARN and a scheduler at
> all?
>
> Also, at a small-scale shop, what is the minimum number of partitions I
> can get away with? Any advice on determining the appropriate number of
> partitions? Kafka, ZooKeeper and Secor are also costs we could
> potentially reduce via partition count.
>
>
> Thanks for any input.
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
