Using a custom partitioner lets you do a "gather" step and exploit data
locality.

Example use case: a consumer of a topic splits messages by customer id.
Each customer id has its own database table.  With a custom partitioner,
you can send all data for a given customer id to the same partition and
reduce write contention on the database.
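
A minimal sketch of the idea, in Python rather than against Kafka's actual
partitioner interface. The function name `partition_for` and the choice of
CRC32 are illustrative assumptions; the only point is that the same
customer id always maps to the same partition.

```python
import zlib


def partition_for(customer_id: str, num_partitions: int) -> int:
    """Map a customer id to a partition index in [0, num_partitions)."""
    # Use a hash that is stable across processes and restarts.
    # Python's built-in hash() is randomized per process for strings,
    # so it would not give a consistent customer-to-partition mapping.
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions
```

Because the mapping is deterministic, whichever consumer owns a partition
is the only writer for that partition's customer tables.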

The big danger is that you are now entirely responsible for handling skew
in your partitioning function.  If you partition by customer id and one
customer accounts for 90% of your traffic, the partition that customer
maps to becomes a scalability bottleneck unless you mitigate it somehow.
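
One possible mitigation, sketched under assumptions: if you know which
customers are hot, spread each of them over a small fixed set of
partitions ("salting") and keep everyone else on a single partition. The
names `partition_with_salt`, `hot_keys`, and `fanout` are hypothetical, and
how you detect hot customers is left out. Note the trade-off: a salted
customer no longer lands on exactly one partition, so you give up some of
the locality for that customer.

```python
import random
import zlib


def stable_hash(key: str) -> int:
    """Hash that is stable across processes (unlike built-in hash())."""
    return zlib.crc32(key.encode("utf-8"))


def partition_with_salt(customer_id, num_partitions, hot_keys,
                        fanout=4, rng=random):
    """Pick a partition, spreading known-hot customers over `fanout` partitions.

    Assumes fanout <= num_partitions.
    """
    base = stable_hash(customer_id)
    if customer_id in hot_keys:
        # A hot customer goes to one of `fanout` consecutive partitions,
        # chosen at random per message.
        return (base + rng.randrange(fanout)) % num_partitions
    # Everyone else keeps a single, deterministic partition.
    return base % num_partitions
```

Consumers of the `fanout` partitions for a hot customer will all write to
that customer's table, so this only helps if the database can tolerate a
few concurrent writers there.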

- Niek





On Tue, Nov 5, 2013 at 11:14 AM, Philip O'Toole <phi...@loggly.com> wrote:

> We use 0.72 -- I am not sure if this matters with 0.8.
>
> Why would one choose a partition, as opposed to a random partition choice?
> What design pattern(s) would mean choosing a partition? When is it a good
> idea?
>
> Any feedback out there?
>
> Thanks,
>
> Philip
>
