Hey Gian,

Thanks for this info. I've updated Samza's FAQ with these recommendations.

Cheers,
Chris

On Sat, Feb 14, 2015 at 8:45 AM, Gian Merlino <g...@metamarkets.com> wrote:

> Hi Geoffry,
>
> We've been using Samza in production on AWS for a little over a month.
> We're just using the YARN runner on a mostly stock hadoop 2.4.0 cluster
> (not EMR). Our experience is that c3s work well for the YARN instances and
> i2s work well for the Kafka instances. Things have been pretty solid with
> that setup.
>
> For scaling up and scaling down YARN, we just terminate instances or add
> instances, and this works pretty well. It can take a few minutes for the
> cluster to realize a node has gone and respawn containers elsewhere.
>
> We have a separate Kafka cluster just for Samza's use, different from our
> main Kafka cluster. The main reason is that we wanted to isolate off the
> disk and network load of state compactions and restores (we don't use
> compacted topics in our main Kafka cluster, but we do use them with Samza,
> and the extra load on Kafka can be substantial).
>
> Gian
>
> On Mon, Feb 9, 2015 at 1:24 PM, Geoffry Sumter <vit...@gmail.com> wrote:
>
> > Hello,
> >
> > I'm looking to experiment with Samza more but wanted to get feedback on
> > using it in AWS, particularly in production. Are you using YARN? Mesos?
> > Something custom? Have you documented tradeoffs you've made, reliability
> > concerns, or pitfalls you've discovered? Is there anything you wish you
> > had known first? I'd love to benefit from past experience if you have
> > time! :)
> >
> > I see "This means that YARN can be replaced with other virtualization
> > frameworks — in particular, we are interested in adding direct AWS
> > integration. Many companies run in AWS which is itself a virtualization
> > framework" from
> > http://samza.apache.org/learn/documentation/0.8/comparisons/introduction.html
> >
> > Is there work currently being done on this effort?
> >
> > Thanks for the help,
> > Geoffry