We require:
- many topics
- ordering of messages within every topic
- Consumers hit different HTTP endpoints, which may be slow (in a push model). In a pull model, consumers may instead pull at the rate at which they can process (a sketch follows this list).
- Parallelism, so that we can consume with as many consumers as possible. Hence we currently have around 50 consumers/topic => 50 partitions per topic.
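Assuming the new consumer API lands as previewed for 0.9+, a minimal pull-loop sketch might look like the following. The broker address, group id, topic name, and postToEndpoint() helper are all hypothetical, not from this thread:

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PullWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
            props.put("group.id", "http-delivery-workers");  // hypothetical group
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("orders"));     // hypothetical topic

            while (true) {
                // Pull at our own pace: a slow HTTP endpoint only delays the
                // partitions assigned to this worker, not the whole topic.
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    postToEndpoint(record.value());
                }
            }
        }

        static void postToEndpoint(String payload) { /* HTTP POST elided */ }
    }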
Currently we have: 2000 topics x 50 => 100,000 partitions. The incoming rate of ingestion is at most 100 MB/sec. We are planning for a big cluster with many brokers.

We have exactly the same use cases as mentioned in this video (usage at LinkedIn):
https://www.youtube.com/watch?v=19DvtEC0EbQ

To handle the ZooKeeper scenario mentioned in the above video, we are planning to use SSDs, and we would upgrade to the new consumer (0.9+) once it's available, as per the below video:
https://www.youtube.com/watch?v=7TZiN521FQA

On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar <j_thak...@yahoo.com.invalid> wrote:

> Technically/conceptually it is possible to have 200,000 topics, but do you
> really need it like that? What do you intend to do with those messages,
> i.e. how do you foresee them being processed downstream? And are those
> topics really there to segregate different kinds of processing or
> different IDs? E.g. if you were LinkedIn, Facebook or Google, would you
> have one topic per user or one topic per kind of event (e.g. login,
> pageview, adview, etc.)? Remember there is significant bookkeeping done
> within ZooKeeper, and this many topics will add substantially to that
> bookkeeping. As for storage, I don't think it should be an issue with
> sufficient spindles, servers and higher-than-default memory configuration.
> Jayesh
>
> From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
> To: "users@kafka.apache.org" <users@kafka.apache.org>
> Sent: Friday, December 19, 2014 9:00 AM
> Subject: Re: Max. storage for Kafka and impact
>
> Yes. We need that many partitions at the maximum, as we have a central
> messaging service and thousands of topics.
>
> On Friday, December 19, 2014, nitin sharma <kumarsharma.ni...@gmail.com>
> wrote:
>
> > Hi,
> >
> > A few things you have to plan for:
> > a. Ensure that, from a resilience point of view, you have sufficient
> > follower brokers for your partitions.
> > b. In my testing of Kafka (50 TB/week) so far, I haven't seen much issue
> > with CPU utilization or memory. I had 24 CPUs and 32 GB RAM.
> > c. 200,000 partitions means around 1 GB/week/partition. Are you sure you
> > need so many partitions?
> >
> > Regards,
> > Nitin Kumar Sharma.
> >
> >
> > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > achanta.va...@flipkart.com> wrote:
> >
> > > We definitely need a retention policy of a week; hence the estimate.
> > >
> > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > > achanta.va...@flipkart.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > We are using Kafka for our messaging system and we have an estimate
> > > > of 200 TB/week in the coming months. Will it impact Kafka's
> > > > performance in any way?
> > > >
> > > > PS: We will be having more than 2 lakh (200,000) partitions.
> > > >
> > > > --
> > > > Regards
> > > > Vamsi Subhash
> > >
> > > --
> > > Regards
> > > Vamsi Subhash
> >
>
> --
> Regards
> Vamsi Subhash

--
Regards
Vamsi Subhash
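For reference, the per-topic layout discussed in this thread (50 partitions, replication for resilience, one-week retention) could be declared at topic-creation time roughly as below. This is a sketch using the kafka-topics.sh tool shipped with 0.8.x; the ZooKeeper address, topic name, and replication factor are placeholders:

    # create one topic with 50 partitions and a 7-day retention override
    bin/kafka-topics.sh --create --zookeeper zk1:2181 \
        --topic orders --partitions 50 --replication-factor 3 \
        --config retention.ms=604800000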