see some comments inline

On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:
>
> We require:
> - many topics
> - ordering of messages for every topic
Ordering is only on a per-partition basis, so you might have to pick a
partition key that makes sense for what you are doing.

> - Consumers hit different HTTP endpoints which may be slow (in a push
> model). In case of a pull model, consumers may pull at the rate at which
> they can process.
> - We need parallelism to hit with as many consumers. Hence, we currently
> have around 50 consumers/topic => 50 partitions.

I think you might be mixing up the fetch with the processing. You can have
1 partition and still have 50 messages being processed in parallel (so a
batch of messages). What language are you working in? How are you doing
this processing exactly?

> Currently we have:
> 2000 topics x 50 => 100,000 partitions.

If this is really the case then you are going to need at least 250 brokers
(~400 partitions per broker). If you do that then you are in the
200TB-per-day world, which doesn't sound to be the case. I really think you
need to strategize more on your processing model.

> The incoming rate of ingestion at max is 100 MB/sec. We are planning for a
> big cluster with many brokers.

It is possible to handle this on just 3 brokers depending on message size
and ability to batch; durability is also a factor you really need to be
thinking about.

> We have exactly the same use cases as mentioned in this video (usage at
> LinkedIn):
> https://www.youtube.com/watch?v=19DvtEC0EbQ
>
> To handle the zookeeper scenario, as mentioned in the above video, we are
> planning to use SSDs and would upgrade to the new consumer (0.9+) once
> it's available, as per the below video:
> https://www.youtube.com/watch?v=7TZiN521FQA
>
> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
> <j_thak...@yahoo.com.invalid> wrote:
>
> > Technically/conceptually it is possible to have 200,000 topics, but do
> > you really need it like that? What do you intend to do with those
> > messages - i.e. how do you foresee them being processed downstream?
> > And are those topics really there to segregate different kinds of
> > processing or different ids? E.g. if you were LinkedIn, Facebook or
> > Google, would you have one topic per user or one topic per kind of
> > event (e.g. login, pageview, adview, etc.)? Remember there is
> > significant book-keeping done within Zookeeper - and these many topics
> > will make that book-keeping significant. As for storage, I don't think
> > it should be an issue with sufficient spindles, servers and
> > higher-than-default memory configuration.
> >
> > Jayesh
> >
> > From: Achanta Vamsi Subhash <achanta.va...@flipkart.com>
> > To: "users@kafka.apache.org" <users@kafka.apache.org>
> > Sent: Friday, December 19, 2014 9:00 AM
> > Subject: Re: Max. storage for Kafka and impact
> >
> > Yes. We need those many max partitions as we have a central messaging
> > service and thousands of topics.
> >
> > On Friday, December 19, 2014, nitin sharma <kumarsharma.ni...@gmail.com>
> > wrote:
> >
> > > hi,
> > >
> > > Few things you have to plan for:
> > > a. Ensure that from a resilience point of view, you have sufficient
> > > follower brokers for your partitions.
> > > b. In my testing of Kafka (50TB/week) so far, I haven't seen much
> > > issue with CPU utilization or memory. I had 24 CPUs and 32GB RAM.
> > > c. 200,000 partitions means around 1GB/week/partition. Are you sure
> > > you need so many partitions?
> > >
> > > Regards,
> > > Nitin Kumar Sharma.
> > >
> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > > achanta.va...@flipkart.com> wrote:
> > >
> > > > We definitely need a retention policy of a week. Hence.
> > > >
> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > > > achanta.va...@flipkart.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We are using Kafka for our messaging system and we have an
> > > > > estimate for 200 TB/week in the coming months.
> > > > > Will it impact any performance for Kafka?
> > > > >
> > > > > PS: We will be having greater than 2 lakh (200,000) partitions.
> > > > >
> > > > > --
> > > > > Regards
> > > > > Vamsi Subhash
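
[Editor's note] Joe's two points above - that ordering is per-partition and
follows the partition key, and that one partition can still feed many
parallel processors because the fetch and the processing are separate steps -
can be sketched as a toy model. This is illustrative only, not Kafka client
code: Kafka's real default partitioner uses murmur2 (md5 here is just a
stand-in stable hash), and `hit_endpoint` stands in for the slow HTTP call
the consumers make.

```python
# Toy model of keyed partitioning and of parallel processing of one
# partition's batch. Not real Kafka client code.
import hashlib
from concurrent.futures import ThreadPoolExecutor


def partition_for(key: bytes, num_partitions: int) -> int:
    # A stable hash maps the same key to the same partition every time,
    # which is what gives per-key ordering. (Kafka actually uses murmur2.)
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return h % num_partitions


# (1) Every message keyed "user-42" lands on the same one of 50 partitions,
# so all messages for that user stay in order relative to each other.
assert partition_for(b"user-42", 50) == partition_for(b"user-42", 50)


def hit_endpoint(msg: str) -> str:
    # Stand-in for the slow HTTP call to a consumer endpoint.
    return msg.upper()


# (2) One fetched batch from a SINGLE partition, processed by 50 workers:
# fetch parallelism (partitions) and processing parallelism are different
# knobs. pool.map returns results in input order even though the calls
# run concurrently.
batch = [f"msg-{i}" for i in range(100)]
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(hit_endpoint, batch))
```

The point of the sketch: 50-way processing concurrency does not by itself
require 50 partitions, which is why the 2000 x 50 = 100,000 partition count
is worth revisiting before sizing the cluster.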