Hi Geoffry,

We've been using Samza in production on AWS for a little over a month. We're just using the YARN runner on a mostly stock Hadoop 2.4.0 cluster (not EMR). In our experience, c3 instances work well for the YARN nodes and i2 instances work well for the Kafka brokers. Things have been pretty solid with that setup.
To scale YARN up or down, we just add or terminate instances, and this works pretty well. It can take a few minutes for the cluster to notice that a node has gone away and to respawn its containers elsewhere.

We run a separate Kafka cluster just for Samza, distinct from our main Kafka cluster. The main reason is that we wanted to isolate the disk and network load of state compactions and restores. We don't use compacted topics in our main Kafka cluster, but we do use them with Samza, and the extra load on Kafka can be substantial.

Gian

On Mon, Feb 9, 2015 at 1:24 PM, Geoffry Sumter <vit...@gmail.com> wrote:

> Hello,
>
> I'm looking to experiment with Samza more but wanted to get feedback on
> using it in AWS, particularly in production. Are you using YARN? Mesos?
> Something custom? Have you documented tradeoffs you've made, reliability
> concerns, or pitfalls you've discovered? Is there anything you wish you had
> known first? I'd love to benefit from past experience if you have time! :)
>
> I see "This means that YARN can be replaced with other virtualization
> frameworks — in particular, we are interested in adding direct AWS
> integration. Many companies run in AWS which is itself a virtualization
> framework" from
>
> http://samza.apache.org/learn/documentation/0.8/comparisons/introduction.html
> Is there work currently being done on this effort?
>
> Thanks for the help,
> Geoffry
>