Error while producing messages

2014-08-08 Thread Tanneru, Raj
Hi, I am trying to do a capacity sizing estimate for our Kafka cluster. I started with a 5-broker cluster and a 3-node ZooKeeper ensemble, and used a simple Java-based producer to send messages to 5 topics created in the cluster. I used 2 client machines with 100 worker threads each, sending messages continuously…
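
A minimal sketch of the kind of load generator described above, assuming the 0.8 Java producer API; the broker list, topic names, and message contents below are placeholders, not values from the original post:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class LoadGenerator {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");  // placeholder brokers
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", "1");
            final ProducerConfig config = new ProducerConfig(props);
            final String[] topics = {"topic1", "topic2", "topic3", "topic4", "topic5"};  // placeholders

            // 100 worker threads per client machine, each sending messages continuously.
            for (int i = 0; i < 100; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        Producer<String, String> producer = new Producer<String, String>(config);
                        long n = 0;
                        while (true) {
                            String topic = topics[(int) (n % topics.length)];
                            producer.send(new KeyedMessage<String, String>(topic, "message-" + n++));
                        }
                    }
                }).start();
            }
        }
    }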

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Chen Wang
Guozhang, just curious: do you guys already have a Java version of the ConsumerOffsetChecker (https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/tools/ConsumerOffsetChecker.scala) so that I could use it in my Storm topology? Chen On Fri, Aug 8, 2014 at 2:03 PM, Chen Wang wrote: >…

Re: A wired producer connection timeout issue

2014-08-08 Thread S. Zhou
Thanks Guozhang. Any ideas on what could be wrong on that machine? We set up multiple producers in the same way, but only one has this issue. On Friday, August 8, 2014 2:41 PM, Guozhang Wang wrote: This might be due to some issue on that producer machine; the "producer queue full and message…

Re: A wired producer connection timeout issue

2014-08-08 Thread Guozhang Wang
This might be due to some issue on that producer machine; the "producer queue full and message sent rate low" is likely the result of the frequent connection timeouts, not their cause. Guozhang On Fri, Aug 8, 2014 at 2:30 PM, S. Zhou wrote: > A Kafka producer frequently times out wh…

A weird producer connection timeout issue

2014-08-08 Thread S. Zhou
A Kafka producer frequently times out when connecting to a remote Kafka cluster, while producers on other machines (same data center) can connect to the Kafka cluster with no problem. From the monitoring, the ProducerQueueSize is always full and the message sent rate is low. We use Kafka 0.8. We set "…
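
A hedged sketch of the 0.8 async producer settings that govern the send queue behind that metric; the broker list and the values shown are illustrative placeholders, not tuning advice:

    import java.util.Properties;
    import kafka.producer.ProducerConfig;

    public class AsyncProducerSettings {
        static ProducerConfig build() {
            Properties props = new Properties();
            props.put("metadata.broker.list", "remote-broker1:9092");  // placeholder remote cluster
            props.put("producer.type", "async");                 // sends go through an in-memory queue
            props.put("queue.buffering.max.messages", "10000");  // queue capacity; "queue full" means this is hit
            props.put("queue.buffering.max.ms", "5000");         // max buffering time before a send attempt
            props.put("queue.enqueue.timeout.ms", "-1");         // -1 blocks the caller while the queue is full
            props.put("request.timeout.ms", "10000");            // per-request timeout against the broker
            return new ProducerConfig(props);
        }
    }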

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Chen Wang
Ah, my bad, I didn't notice I had put two auto.commit.interval.ms entries in the config. After fixing it, it now behaves as expected. :-) Thanks again!! Chen On Fri, Aug 8, 2014 at 1:58 PM, Guozhang Wang wrote: > Chen, > > Your auto.commit.interval.ms is set to 1 sec, which may be too small. > Could > you…
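
For reference, a minimal sketch of a high-level consumer config with auto commit enabled and a single auto.commit.interval.ms entry; the ZooKeeper address, group id, and interval value are illustrative placeholders:

    import java.util.Properties;
    import kafka.consumer.ConsumerConfig;

    public class AutoCommitConsumerSettings {
        static ConsumerConfig build() {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");     // placeholder ZooKeeper ensemble
            props.put("group.id", "my-consumer-group");     // placeholder group id
            props.put("auto.commit.enable", "true");
            props.put("auto.commit.interval.ms", "10000");  // a single entry; commit roughly every 10 s
            return new ConsumerConfig(props);
        }
    }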

Re: Architecture: amount of partitions

2014-08-08 Thread Guozhang Wang
Kane, the built-in offset management is already in the master branch and will be included in 0.8.2. For now you can give the current trunk a spin. Guozhang On Fri, Aug 8, 2014 at 1:42 PM, Kane Kane wrote: > Hello Guozhang, > > Is storing offsets in a Kafka topic already in the master branch? > We would…
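
A hedged sketch of how the Kafka-based offset storage is expected to be switched on for the 0.8.2 high-level consumer; offsets.storage and dual.commit.enabled are the relevant 0.8.2 consumer properties, and the connection details are placeholders:

    import java.util.Properties;
    import kafka.consumer.ConsumerConfig;

    public class KafkaOffsetStorageSettings {
        static ConsumerConfig build() {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");  // placeholder
            props.put("group.id", "my-consumer-group");  // placeholder
            props.put("offsets.storage", "kafka");       // commit offsets to the internal offsets topic
            props.put("dual.commit.enabled", "true");    // also commit to ZooKeeper during migration
            return new ConsumerConfig(props);
        }
    }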

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Guozhang Wang
Chen, Your auto.commit.interval.ms is set to 1 sec, which may be too small. Could you try with larger numbers, like 1? Guozhang On Fri, Aug 8, 2014 at 1:41 PM, Chen Wang wrote: > Guozhang, > I just did a simple test, and Kafka does not seem to do what it is supposed > to do: > I put 20 messages…

Re: Architecture: amount of partitions

2014-08-08 Thread Kane Kane
Hello Guozhang, is storing offsets in a Kafka topic already in the master branch? We would like to use that feature; when do you plan to release 0.8.2? Can we use the master branch meanwhile (i.e., is it stable enough)? Thanks. On Fri, Aug 8, 2014 at 1:38 PM, Guozhang Wang wrote: > Hi Roman, > > Current Kafka…

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Chen Wang
Guozhang, I just did a simple test, and Kafka does not seem to do what it is supposed to do: I put 20 messages numbered from 1 to 20 into a topic with 3 partitions, and throw a RuntimeException on all the even-numbered messages (2, 4, 6, ...): while (it.hasNext()) { String message = new String(…
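
A hedged reconstruction of the test described above, against the 0.8 high-level consumer; the topic name, group id, ZooKeeper address, and message format are assumptions:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class EvenMessageFailureTest {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");  // placeholder
            props.put("group.id", "test-group");         // placeholder
            props.put("auto.commit.interval.ms", "1000");
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put("test-topic", 3);          // one stream per partition
            List<KafkaStream<byte[], byte[]>> streams =
                consumer.createMessageStreams(topicCountMap).get("test-topic");

            for (KafkaStream<byte[], byte[]> stream : streams) {
                final ConsumerIterator<byte[], byte[]> it = stream.iterator();
                new Thread(new Runnable() {
                    public void run() {
                        while (it.hasNext()) {
                            String message = new String(it.next().message());
                            int n = Integer.parseInt(message);
                            if (n % 2 == 0) {
                                // Fail on even-numbered messages to see whether they get redelivered.
                                throw new RuntimeException("failing on message " + n);
                            }
                            System.out.println("processed " + n);
                        }
                    }
                }).start();
            }
        }
    }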

Re: Architecture: amount of partitions

2014-08-08 Thread Guozhang Wang
Hi Roman, Kafka's current messaging guarantee is at-least-once, and we are working on transactional messaging features to make it exactly-once. We expect it to be used as a synchronization/replication layer for storage systems like your use case after that. As for your design, since you wil…

Re: Architecture: amount of partitions

2014-08-08 Thread Jonathan Weeks
The approach may well depend on your deployment horizon. Currently the offset tracking for each partition is done in ZooKeeper, which places an upper limit on the number of topics/partitions you can have and still operate with any kind of efficiency. In 0.8.2, hopefully coming in the next month or two,…

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Guozhang Wang
Chen, You can use the ConsumerOffsetChecker tool: http://kafka.apache.org/documentation.html#basic_ops_consumer_lag Guozhang On Fri, Aug 8, 2014 at 12:18 PM, Chen Wang wrote: > Sounds like a good idea! I think I will go with the high-level consumer > then. > Another question along with this…
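
For the archive: the checker linked above ships with the 0.8 distribution and can also be run from the command line; the flag names below are recalled from the 0.8 tool and worth verifying with --help, and the ZooKeeper address, group, and topic are placeholders:

    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
      --zkconnect zk1:2181 \
      --group my-consumer-group \
      --topic my-topic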

Architecture: amount of partitions

2014-08-08 Thread Roman Iakovlev
Dear all, I'm new to Kafka, and I'm considering using it for a somewhat unusual purpose. I want it to be a backend for data synchronization between a multitude of devices which are not always online (mobile and embedded devices). All the synchronized information belongs to some user, and ca…

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Chen Wang
Sounds like a good idea! I think I will go with the high-level consumer then. Another question along with this design: is there a way to check the lag for a consumer group on a topic? Upon machine crashes and restarts, I want to continue reading from a certain topic only if the lag is NOT 0…

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Guozhang Wang
Using the simple consumer you then need to take care of consumer failure detection and partition reassignment yourself, but you would have more flexibility over the offsets. If each time processing incurs errors the corresponding consumer thread also fails (i.e., it will not be involved in the rebalance…

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Chen Wang
Maybe I could batch the messages before committing, e.g., committing every 10 seconds. This is what the auto commit does anyway, and I could live with duplicate data. What do you think? I would then also seem to need a monitoring daemon that checks the lag in order to restart the consumer after machine crashes…
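
A hedged sketch of that time-based batching idea: disable auto commit and call commitOffsets() from the consuming thread roughly every 10 seconds; the topic, group, and ZooKeeper address are placeholders, and anything processed since the last commit may be redelivered after a crash:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class PeriodicCommitConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");  // placeholder
            props.put("group.id", "my-consumer-group");  // placeholder
            props.put("auto.commit.enable", "false");    // commit manually instead of auto commit
            ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put("my-topic", 1);            // placeholder topic, single stream
            KafkaStream<byte[], byte[]> stream =
                consumer.createMessageStreams(topicCountMap).get("my-topic").get(0);

            long lastCommit = System.currentTimeMillis();
            ConsumerIterator<byte[], byte[]> it = stream.iterator();
            while (it.hasNext()) {
                process(it.next().message());
                // Commit roughly every 10 seconds; messages processed since the last
                // commit can show up again (duplicates) if this consumer crashes.
                if (System.currentTimeMillis() - lastCommit > 10000) {
                    consumer.commitOffsets();
                    lastCommit = System.currentTimeMillis();
                }
            }
        }

        static void process(byte[] message) {
            // application-specific processing goes here
        }
    }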

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Chen Wang
Thanks, Guozhang. So if I switch to SimpleConsumer, will these problems already be taken care of? I would assume that I will need to manage all the offsets myself, including the error recovery logic, right? Chen On Fri, Aug 8, 2014 at 8:05 AM, Guozhang Wang wrote: > Hello Chen, > > 1. Manually…

Re: error recovery in multiple thread reading from Kafka with HighLevel api

2014-08-08 Thread Guozhang Wang
Hello Chen, 1. Manually committing offsets does have the risk of duplicates; consider the following pattern: message = consumer.next(); process(message); consumer.commit(); The rebalance can happen between lines 2 and 3, where the message has been processed but the offset has not been committed; if another…
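
The same three-step pattern, annotated to show where the duplicate comes from; this is only a commented copy of the pseudocode above, not additional API:

    message = consumer.next();   // 1. fetch the next message
    process(message);            // 2. apply side effects (write to DB, etc.)
                                 //    <-- a rebalance in this window hands the partition to
                                 //        another consumer, which resumes from the last
                                 //        committed offset and re-processes this message
    consumer.commit();           // 3. commit the consumed offset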