Each job will get all the partitions, and each task (500 of them) within the
job will get 1 partition. So there will be 500 tasks working through the
log, grouped into however many containers the job runs.
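To make the one-task-per-partition mapping concrete, here is a minimal sketch in Python. It is only an illustration: the function name, the container count of 4, and the round-robin assignment are assumptions made for the example, not Samza's actual grouping code.

```python
from collections import defaultdict


def assign_tasks(num_partitions, num_containers):
    """Create one task per partition and spread the tasks over containers.

    Hypothetical helper for illustration only; the round-robin policy
    is an assumption, not taken from Samza.
    """
    if num_partitions < 1 or num_containers < 1:
        raise ValueError("partition and container counts must be positive")
    containers = defaultdict(list)
    for partition in range(num_partitions):
        # One task per input partition, named after the partition it reads.
        task_name = f"Partition {partition}"
        containers[partition % num_containers].append(task_name)
    return dict(containers)


# 500 partitions over an assumed 4 containers.
assignment = assign_tasks(500, 4)
for container_id, tasks in sorted(assignment.items()):
    print(container_id, len(tasks))
```

The number of tasks is fixed by the partition count, while the number of containers they are packed into is the knob you can change later.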
I'd try to figure out what your scaling needs are for the next 2-3 years and
then calculate your resource requirements accordingly (how many parallel
executing tasks you would need). If you need to split, it is not trivial,
but doable.
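The sizing exercise above could be sketched roughly as follows. Everything here is an assumption for illustration: the function name, the per-task throughput of 200 messages/s, the 10x growth factor, and the 50% headroom are hypothetical and should be replaced with your own measurements. Only the 100 messages/s figure comes from this thread.

```python
import math


def required_tasks(peak_msgs_per_sec, per_task_msgs_per_sec,
                   growth_factor=1.0, headroom=0.5):
    """Estimate how many parallel tasks (and so partitions) are needed.

    growth_factor: expected traffic multiple over the planning horizon.
    headroom: fraction of each task's capacity kept in reserve.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    if peak_msgs_per_sec < 0 or per_task_msgs_per_sec <= 0 or growth_factor <= 0:
        raise ValueError("rates and growth factor must be positive")
    target_rate = peak_msgs_per_sec * growth_factor
    usable_rate_per_task = per_task_msgs_per_sec * (1 - headroom)
    return max(1, math.ceil(target_rate / usable_rate_per_task))


# 100 messages/s today, an assumed 10x growth, and an assumed
# measured per-task throughput of 200 messages/s.
print(required_tasks(100, 200, growth_factor=10, headroom=0.5))
```

Measuring the real per-task throughput of your job is the important input; the rest is arithmetic.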
Lukas
-----Original Message-----
From: Michael Ravits
Sent: Thursday, May 21, 2015 11:17 AM
To: dev@samza.apache.org
Subject: Re: Number of partitions
Well, since the number of partitions can't be changed after the system
starts running, I wanted to have the flexibility to grow a lot without
stopping for an upgrade.
I just wonder what would be a tolerable number for Samza.
For example, if I start with 5 jobs, each will get 100 partitions. Is this
reasonable, or is it too much for a single job instance?
On Thu, May 21, 2015 at 7:46 PM, Lukas Steiblys <lu...@doubledutch.me> wrote:
500 is a bit extreme unless you're planning on running the job on some 200
machines and trying to exploit their full power. I personally run 4
partitions in production for our system processing 100 messages/s, and
there's plenty of room to grow.
Lukas
On Thursday, May 21, 2015, Michael Ravits <michaelr...@gmail.com> wrote:
> Hi,
>
> I wonder what considerations I need to account for regarding the number
> of partitions in input topics for Samza.
> When testing a 500-partition topic with one Samza job, I noticed the
> startup time to be very long.
> Are there any problems that might occur when dealing with this number of
> partitions?
>
> Thanks,
> Michael
>