Hi Geoffry,

We've been using Samza in production on AWS for a little over a month.
We're just using the YARN runner on a mostly stock Hadoop 2.4.0 cluster
(not EMR). Our experience is that c3s work well for the YARN instances and
i2s work well for the Kafka instances. Things have been pretty solid with
that setup.

For scaling up and scaling down YARN, we just terminate instances or add
instances, and this works pretty well. It can take a few minutes for the
cluster to notice that a node is gone and respawn its containers elsewhere.

We have a separate Kafka cluster just for Samza's use, different from our
main Kafka cluster. The main reason is that we wanted to isolate the
disk and network load of state compactions and restores (we don't use
compacted topics in our main Kafka cluster, but we do use them with Samza,
and the extra load on Kafka can be substantial).

Gian

On Mon, Feb 9, 2015 at 1:24 PM, Geoffry Sumter <vit...@gmail.com> wrote:

> Hello,
>
> I'm looking to experiment with Samza more but wanted to get feedback on
> using it in AWS, particularly in production. Are you using YARN? Mesos?
> Something custom? Have you documented tradeoffs you've made, reliability
> concerns, or pitfalls you've discovered? Is there anything you wish you had
> known first? I'd love to benefit from past experience if you have time! :)
>
> I see "This means that YARN can be replaced with other virtualization
> frameworks — in particular, we are interested in adding direct AWS
> integration. Many companies run in AWS which is itself a virtualization
> framework" from
>
> http://samza.apache.org/learn/documentation/0.8/comparisons/introduction.html
> Is there work currently being done on this effort?
>
> Thanks for the help,
> Geoffry
>
