You can find those numbers in http://www.slideshare.net/Hadoop_Summit/building-a-realtime-data-pipeline-apache-kafka-at-linkedin?from_search=5.

Thanks,
Jun

On Thu, Aug 15, 2013 at 4:38 PM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> Just curious, Jay: how many topics and consumers do you guys have?
>
> Thanks

On Thu, Aug 15, 2013 at 4:07 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> The tradeoff is this:
> Pro: more partitions means more consumer parallelism. The total number of
> consuming threads/processes across all consumer machines can't exceed the
> partition count.
> Con: more partitions means more file descriptors and hence smaller writes
> to each file (so more random I/O).
>
> Our setting is fairly arbitrary. The ideal number would be the smallest
> number that satisfies your foreseeable need for consumer parallelism.
>
> -Jay

On Thu, Aug 15, 2013 at 3:41 PM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> Jay, thanks so much for explaining. What is the optimal number of
> partitions per topic? What was the reasoning behind your choice of 8
> partitions per topic?
>
> Thanks,
> Vadim

On Thu, Aug 15, 2013 at 1:58 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Technically it is
>
>     topics * partitions * replicas * 2 (index file and log file) + #open sockets
>
> -Jay

On Thu, Aug 15, 2013 at 11:49 AM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> Good morning, Joel. Just to understand clearly how to predict the number
> of open files kept by Kafka: is it calculated by multiplying number of
> topics * number of partitions * number of replicas? In our case that would
> be 150 * 36 * 3. Am I correct? How do the numbers of producers and
> consumers influence that calculation? Is it advisable to have fewer
> partitions? Do 36 partitions sound reasonable?
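Jay's formula can be plugged through with the numbers from this thread. A back-of-the-envelope sketch (the socket count is a placeholder assumption, not a measured value):

```shell
# File-descriptor estimate per Jay's formula:
#   topics * partitions * replicas * 2 (log + index file) + open sockets
# Numbers from the thread: 150 topics, 36 partitions, 3x replication.
topics=150
partitions=36
replicas=3
sockets=1000   # hypothetical placeholder; actual value depends on producer/consumer count

segments=$(( topics * partitions * replicas * 2 ))
total=$(( segments + sockets ))
echo "segment files (cluster-wide): $segments"
echo "estimated fds incl. sockets:  $total"
```

That works out to 32,400 segment files cluster-wide; even divided across 3 brokers (roughly 10,800 each) it already exceeds the 10240 hard limit mentioned later in the thread before any sockets are counted.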
> Thanks so much in advance

On Wed, Aug 14, 2013 at 9:27 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> We use 30k as the limit. It is largely driven by the number of partitions
> (including replicas), the retention period, and the number of simultaneous
> producers/consumers.
>
> In your case it seems you have 150 topics, 36 partitions, 3x replication -
> with that configuration you will definitely need to up your file handle
> limit.
>
> Thanks,
>
> Joel

On Wednesday, August 14, 2013, Vadim Keylis wrote:

> Good morning, Jun. A correction on the open file handle limit: I was
> wrong. I re-ran the command ulimit -Hn and it shows 10240. Which brings me
> to the next question: how do I appropriately calculate the number of open
> file handles required by Kafka? What is your setting for this?
>
> Thanks,
> Vadim

On Wed, Aug 14, 2013 at 8:19 AM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> Good morning, Jun. We are using Kafka 0.8 that I built from trunk in June
> or early July. I forgot to mention that running ulimit on the hosts shows
> the open file handle limit set to unlimited. What are the ways to recover
> from the last error and restart Kafka? How can I delete a topic with the
> Kafka service down on all hosts? How many topics can Kafka support without
> hitting the too-many-open-files exception?
> What did you set the open file handle limit to in your cluster?
>
> Thanks so much,
> Vadim
>
> Sent from my iPhone

On Aug 14, 2013, at 7:38 AM, Jun Rao <jun...@gmail.com> wrote:

> The first error is caused by too many open file handles. Kafka keeps each
> of the segment files open on the broker. So, the more topics/partitions
> you have, the more file handles you need. You probably need to increase
> the open file handle limit and also monitor the number of open file
> handles so that you can get an alert when it gets close to the limit.
>
> Not sure why you get the second error on restart. Are you using the 0.8
> beta1 release?
>
> Thanks,
>
> Jun

On Tue, Aug 13, 2013 at 11:04 PM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> We have a 3-node Kafka cluster. I initially created 4 topics, then wrote a
> small shell script to create 150 topics.
>     # $1 = file listing topic names, one per line; $2 = zookeeper host
>     TOPICS=$(< "$1")
>     for topic in $TOPICS
>     do
>       echo "/usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic $topic --zookeeper $2:2181/kafka --partition 36"
>       /usr/local/kafka/bin/kafka-create-topic.sh --replica 3 --topic "$topic" --zookeeper "$2:2181/kafka" --partition 36
>     done
>
> Ten minutes later I see messages like this:
>
>     [2013-08-13 11:43:58,944] INFO [ReplicaFetcherManager on broker 7] Removing fetcher for partition [m3_registration,0] (kafka.server.ReplicaFetcherManager)
>
> followed by
>
>     [2013-08-13 11:44:00,067] WARN [ReplicaFetcherThread-0-8], error for partition [m3_registration,22] to broker 8 (kafka.server.ReplicaFetcherThread)
>     kafka.common.NotLeaderForPartitionException
>
> Then a few minutes later came the following messages, which overwhelmed
> the logging system:
>
>     [2013-08-13 11:46:35,916] ERROR error in loggedRunnable (kafka.utils.Utils$)
>     java.io.FileNotFoundException: /home/kafka/data7/replication-offset-checkpoint.tmp (Too many open files)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>
> I restarted the service after discovering the problem. After a few minutes
> of attempting to recover, the Kafka service crashed with the following
> error:
>     [2013-08-13 17:20:08,953] INFO [Log Manager on Broker 7] Loading log 'm3_registration-29' (kafka.log.LogManager)
>     [2013-08-13 17:20:08,992] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
>     java.lang.IllegalStateException: Found log file with no corresponding index file.
>
> There was no activity on the cluster after the topics were added.
> What could have caused the crash and triggered the too-many-open-files
> exception? What is the best way to recover in order to restart the Kafka
> service (not sure if the delete-topic command will work in this particular
> case, as all 3 services would not start)? How can we prevent this in the
> future?
>
> Thanks so much in advance,
> Vadim
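The remedy the thread converges on - raising and then monitoring the file-handle limit - can be sketched roughly as follows. This is a sketch for a typical Linux box, not the thread authors' exact setup: the 30000 figure follows Joel's 30k example, and the `kafka` user name and process-matching pattern are assumptions.

```shell
# Inspect the current limits for the shell/user that runs the broker.
ulimit -Sn   # soft limit on open files
ulimit -Hn   # hard limit (10240 in Vadim's case)

# Raise the soft limit for the current shell before starting the broker.
# Guarded: this fails if 30000 exceeds the hard limit and you are not root.
ulimit -n 30000 2>/dev/null || true

# For a persistent limit on a typical Linux system, lines like these would
# go in /etc/security/limits.conf (the "kafka" user name is hypothetical):
#   kafka  soft  nofile  30000
#   kafka  hard  nofile  30000

# Monitor actual usage of a running broker, as Jun suggests (assumes a
# single Kafka java process and a procfs-based system):
KAFKA_PID=$(pgrep -f kafka.Kafka | head -n 1)
if [ -n "$KAFKA_PID" ]; then
  ls /proc/"$KAFKA_PID"/fd | wc -l   # current open file descriptors
else
  echo "no running Kafka process found"
fi
```

Feeding that descriptor count into an alerting system, with a threshold comfortably below the configured limit, is the monitoring Jun describes.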