Hi,

This might also be of interest:
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
Cheers,
Jens

On Monday, August 8, 2016, Daniel Fagnan <dan...@segment.com> wrote:

> Thanks Tom! This was very helpful. I'll explore having a more static set
> of partitions, as that seems to fit Kafka a lot better.
>
> Cheers,
> Daniel
>
>> On Aug 8, 2016, at 12:27 PM, Tom Crayford <tcrayf...@heroku.com> wrote:
>>
>> Hi Daniel,
>>
>> Kafka doesn't provide this kind of isolation or scalability for very
>> large numbers of streams. The usual design is to use a consistent hash
>> of some "key" to assign your data to a particular partition. That, of
>> course, doesn't isolate things fully, since everything in a partition
>> is dependent on everything else in it.
>>
>> We've found that beyond a few thousand to a few tens of thousands of
>> partitions, clusters hit a lot of issues (it depends on the write
>> pattern, how much memory you give the brokers and ZooKeeper, and
>> whether you plan on ever deleting topics).
>>
>> Another option is to manage multiple clusters and keep each one under a
>> certain partition limit. That is, of course, additional operational
>> overhead and complexity.
>>
>> I'm not sure I 100% understand your mechanism for tracking pending
>> offsets, but it seems like that might be your best option.
>>
>> Thanks
>>
>> Tom Crayford
>> Heroku Kafka
>>
>>> On Mon, Aug 8, 2016 at 8:12 PM, Daniel Fagnan <dan...@segment.com> wrote:
>>>
>>> Hey all,
>>>
>>> I'm currently in the process of designing a system around Kafka, and
>>> I'm wondering about the recommended way to manage topics. Each event
>>> stream we have needs to be isolated from the others: a failure in one
>>> should not affect another event stream's processing (by failure, we
>>> mean a downstream failure that would require us to replay the
>>> messages).
>>>
>>> So my first thought was to create a topic per event stream. This
>>> allows a larger event stream to be partitioned for added parallelism
>>> while keeping the default number of partitions as low as possible.
>>> This would solve the isolation requirement in that a topic can keep
>>> failing and we'll continue replaying its messages without affecting
>>> all the other topics.
>>>
>>> We've read that it's not recommended to have your data model dictate
>>> the number of partitions or topics in Kafka, and we're unsure about
>>> this approach if we need to triple our number of event streams.
>>>
>>> We're currently looking at 10,000 event streams (or topics), but we
>>> don't want to be spinning up additional brokers just so we can add
>>> more event streams, especially if the load for each is reasonable.
>>>
>>> Another option we were looking into was to not isolate at the
>>> topic/partition level, but to keep a set of pending offsets persisted
>>> somewhere (seemingly what Twitter Heron or Storm does, though they
>>> don't seem to persist the pending offsets).
>>>
>>> Thoughts?

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
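[Editor's note: the key-based partitioning Tom describes above can be sketched as below. This is an illustrative sketch only; the function name is hypothetical, and MD5 stands in for the murmur2 hash that Kafka's default partitioner actually uses over the serialized key bytes.]

```python
import hashlib

def partition_for_key(key: str, num_partitions: int) -> int:
    """Deterministically map a stream key to a partition.

    Illustrative only: MD5 is used here for simplicity; Kafka's
    default partitioner uses murmur2 over the key bytes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Take the first 4 bytes as an unsigned int, then mod by the
    # (static) partition count.
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event carrying the same key lands on the same partition, so a
# fixed, modest partition count can serve many logical event streams
# without one topic per stream.
p1 = partition_for_key("event-stream-42", 64)
p2 = partition_for_key("event-stream-42", 64)
assert p1 == p2
```

Note the trade-off Tom mentions: keys sharing a partition are ordered and fail together, so this gives parallelism but not full per-stream isolation.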