Hey Gian,

Thanks for this info. I've updated Samza's FAQ with these recommendations.

Cheers,
Chris

On Sat, Feb 14, 2015 at 8:45 AM, Gian Merlino <g...@metamarkets.com> wrote:

> Hi Geoffry,
>
> We've been using Samza in production on AWS for a little over a month.
> We're just using the YARN runner on a mostly stock Hadoop 2.4.0 cluster
> (not EMR). Our experience is that c3s work well for the YARN instances and
> i2s work well for the Kafka instances. Things have been pretty solid with
> that setup.
>
> For scaling up and scaling down YARN, we just terminate instances or add
> instances, and this works pretty well. It can take a few minutes for the
> cluster to realize a node has gone and respawn containers elsewhere.
>
> We have a separate Kafka cluster just for Samza's use, different from our
> main Kafka cluster. The main reason is that we wanted to isolate the
> disk and network load of state compactions and restores (we don't use
> compacted topics in our main Kafka cluster, but we do use them with Samza,
> and the extra load on Kafka can be substantial).
>
> Gian
>
> On Mon, Feb 9, 2015 at 1:24 PM, Geoffry Sumter <vit...@gmail.com> wrote:
>
> > Hello,
> >
> > I'm looking to experiment with Samza more but wanted to get feedback on
> > using it in AWS, particularly in production. Are you using YARN? Mesos?
> > Something custom? Have you documented tradeoffs you've made, reliability
> > concerns, or pitfalls you've discovered? Is there anything you wish you had
> > known first? I'd love to benefit from past experience if you have time! :)
> >
> > I see "This means that YARN can be replaced with other virtualization
> > frameworks — in particular, we are interested in adding direct AWS
> > integration. Many companies run in AWS which is itself a virtualization
> > framework" from
> > http://samza.apache.org/learn/documentation/0.8/comparisons/introduction.html
> > Is there work currently being done on this effort?
> >
> > Thanks for the help,
> > Geoffry
> >
>
