Thank you, Chris. I just posted a separate question, "How to deal with bootstrapping", where I describe the problem in detail.
On Wed, Apr 15, 2015 at 1:35 PM, Chris Riccomini <criccom...@apache.org> wrote:

> Hey Jeremy,
>
> Samza will be fine, but at this scale you need to start worrying about
> Kafka and YARN. 1 million jobs will likely start to put pressure on YARN's
> RM due to memory usage and CPU usage for the scheduler. With 1 million
> jobs, assuming 1 container each, you'll have over 1 million connections to
> Kafka, which means you'll need enough brokers to handle those connections.
>
> Can you describe your use case in more detail? Running 1 million jobs seems
> like it might be a mis-use of this technology.
>
> Cheers,
> Chris
>
> On Wed, Apr 15, 2015 at 10:24 AM, jeremy p <athomewithagroove...@gmail.com>
> wrote:
>
> > What's the maximum number of Samza jobs I can run simultaneously on a
> > single cluster? Let's say these jobs are very lightweight -- they require
> > little memory or processing power. However, I need a lot of them -- let's
> > say I need to have 1,000,000 running at any given time. Is this reasonable
> > or even possible?