Thank you, Chris.  I just posted a separate question, "How to deal with
bootstrapping," where I describe the problem in detail.

On Wed, Apr 15, 2015 at 1:35 PM, Chris Riccomini <criccom...@apache.org>
wrote:

> Hey Jeremy,
>
> Samza will be fine, but at this scale you need to start worrying about
> Kafka and YARN. 1 million jobs will likely start to put pressure on YARN's
> RM due to memory usage and CPU usage for the scheduler. With 1 million
> jobs, assuming 1 container each, you'll have over 1 million connections to
> Kafka, which means you'll need enough brokers to handle those connections.
>
> Can you describe your use case in more detail? Running 1 million jobs seems
> like it might be a misuse of this technology.
>
> Cheers,
> Chris
>
> On Wed, Apr 15, 2015 at 10:24 AM, jeremy p <athomewithagroove...@gmail.com>
> wrote:
>
> > What's the maximum number of Samza jobs I can run simultaneously on a
> > single cluster?  Let's say these jobs are very lightweight -- they
> > require little memory or processing power.  However, I need a lot of
> > them -- let's say I need to have 1,000,000 running at any given time.
> > Is this reasonable or even possible?
> >
>
