Hi,

This might also be of interest:
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
Cheers,
Jens

On Monday, August 8, 2016, Daniel Fagnan <dan...@segment.com> wrote:

> Thanks Tom! This was very helpful. I'll explore having a more static set
> of partitions, as that seems to fit Kafka a lot better.
>
> Cheers,
> Daniel
>
>> On Aug 8, 2016, at 12:27 PM, Tom Crayford <tcrayf...@heroku.com> wrote:
>>
>> Hi Daniel,
>>
>> Kafka doesn't provide this kind of isolation or scalability for very
>> large numbers of streams. The usual design is to use a consistent hash
>> of some "key" to assign your data to a particular partition. That, of
>> course, doesn't isolate things fully, since everything in a partition
>> is dependent on everything else in it.
>>
>> We've found that beyond a few thousand to a few tens of thousands of
>> partitions, clusters hit a lot of issues (it depends on the write
>> pattern, how much memory you give the brokers and ZooKeeper, and
>> whether you plan on ever deleting topics).
>>
>> Another option is to manage multiple clusters and keep each one under a
>> certain partition limit. That is, of course, additional operational
>> overhead and complexity.
>>
>> I'm not sure I 100% understand your mechanism for tracking pending
>> offsets, but it seems like that might be your best option.
>>
>> Thanks
>>
>> Tom Crayford
>> Heroku Kafka
>>
>>> On Mon, Aug 8, 2016 at 8:12 PM, Daniel Fagnan <dan...@segment.com> wrote:
>>>
>>> Hey all,
>>>
>>> I'm currently in the process of designing a system around Kafka, and
>>> I'm wondering about the recommended way to manage topics. Each event
>>> stream we have needs to be isolated from the others: a failure in one
>>> should not affect another event stream's processing (by failure, we
>>> mean a downstream failure that would require us to replay the
>>> messages).
>>>
>>> So my first thought was to create a topic per event stream. This
>>> allows a larger event stream to be partitioned for added parallelism
>>> while keeping the default number of partitions as low as possible.
>>> This would solve the isolation requirement in that a topic can keep
>>> failing and we'll continue replaying its messages without affecting
>>> all the other topics.
>>>
>>> We've read that it's not recommended to have your data model dictate
>>> the number of partitions or topics in Kafka, and we're unsure about
>>> this approach if we need to triple our number of event streams.
>>>
>>> We're currently looking at 10,000 event streams (or topics), but we
>>> don't want to be spinning up additional brokers just so we can add
>>> more event streams, especially if the load for each is reasonable.
>>>
>>> Another option we were looking into was to not isolate at the
>>> topic/partition level, but to keep a set of pending offsets persisted
>>> somewhere (seemingly what Twitter Heron or Storm does, though they
>>> don't seem to persist the pending offsets).
>>>
>>> Thoughts?

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
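[Editor's note: the key-based partitioning Tom describes above can be sketched as below. This is an illustrative sketch only; the function name is hypothetical, and MD5 stands in for the murmur2 hash that Kafka's default partitioner actually uses over the serialized key bytes.]

```python
import hashlib

def partition_for_key(key: str, num_partitions: int) -> int:
    """Deterministically map a stream key to a partition.

    Illustrative only: MD5 is used here for simplicity; Kafka's
    default partitioner uses murmur2 over the key bytes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Take the first 4 bytes as an unsigned int, then mod by the
    # (static) partition count.
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event carrying the same key lands on the same partition, so a
# fixed, modest partition count can serve many logical event streams
# without one topic per stream.
p1 = partition_for_key("event-stream-42", 64)
p2 = partition_for_key("event-stream-42", 64)
assert p1 == p2
```

Note the trade-off Tom mentions: keys sharing a partition are ordered and fail together, so this gives parallelism but not full per-stream isolation.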