Standalone mode was introduced for exactly this reason: for customers who don't want to run YARN.
Have you considered running Samza in standalone mode? In this mode, Samza is an embedded library, very similar to Kafka Streams.
https://samza.apache.org/learn/documentation/latest/deployment/standalone.html

A good rule of thumb when deciding the number of partitions (N) is: "How much data do you want to retain per partition at any time?" You can pick N so that each partition retains around 20 GB. Another factor to consider is whether N gives you adequate compute parallelism. By default, Samza assigns each input partition to a single task, so the partition count also caps how many tasks can process the stream in parallel.

On Tuesday, February 19, 2019, Jeremiah Adams <jad...@helixeducation.com> wrote:
> We are finding YARN and AWS EC2 to be too costly for us. We are having to
> scale the cluster to support more jobs, and we plan to write more jobs.
> We are scaling because the cluster doesn't have enough vCores to support all
> the containers, not enough RAM for jobs, etc.
>
> Has anyone had luck running Samza jobs in an alternative scheduler, say
> Nomad, Kubernetes, or something else?
>
> Similarly, has anyone had any luck with Samza on something like Kafka
> Streams, where I don't have the overhead of YARN and a scheduler at
> all?
>
> Also, at a small-scale shop, what is the minimum number of partitions I
> can get away with? Any advice on determining the appropriate number of
> partitions? Kafka, ZooKeeper and Secor are also costs we could
> potentially reduce via partition count.
>
> Thanks for any input.
>
> Jeremiah Adams
> Software Engineer
> www.helixeducation.com
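To make the partition-count rule of thumb from the reply concrete, here is a minimal sketch of the arithmetic, not an official Samza or Kafka tool. It assumes retained data per topic is simply ingest rate times retention period (ignoring replication and compression), uses the suggested ~20 GB per partition as the target, and takes the larger of the size-based count and the parallelism you want. The class `PartitionSizing`, the method `suggestPartitions`, and the example numbers (5 GB/day, 7-day retention) are hypothetical.

```java
public class PartitionSizing {
    // Hypothetical helper: returns the smallest N such that each partition retains
    // at most targetGbPerPartition, while also giving at least desiredParallelism
    // partitions (with the default grouping, one task per input partition).
    static int suggestPartitions(double retainedGb, double targetGbPerPartition, int desiredParallelism) {
        if (retainedGb < 0 || targetGbPerPartition <= 0 || desiredParallelism < 1) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        // Size-driven count: enough partitions that none holds more than the target.
        int bySize = (int) Math.ceil(retainedGb / targetGbPerPartition);
        // Parallelism-driven count wins if it is larger; never return fewer than 1.
        return Math.max(1, Math.max(bySize, desiredParallelism));
    }

    public static void main(String[] args) {
        // Assumed example: 5 GB/day ingested with 7-day retention => 35 GB retained.
        double retainedGb = 5.0 * 7;
        int n = suggestPartitions(retainedGb, 20.0, 1);
        System.out.println("partitions=" + n); // 35 / 20 rounds up to 2
    }
}
```

For a small shop, this shows why a low-volume topic can get away with very few partitions by size alone; in that case the parallelism you need usually ends up driving N.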