How costly is Re balancing of partitions for a topic

2014-11-05 Thread dinesh kumar
Hello, I am trying to come up with a design for consuming from Kafka. I am using version 0.8.1.1 of Kafka. I am thinking of designing a system where a consumer is created every few seconds, consumes data from Kafka, processes it, and then quits after committing the offsets to Kafka. At a

Re: Spark Kafka Performance

2014-11-05 Thread Eduardo Costa Alfaia
Hi Bhavesh, I will collect the dump and send it to you. I am using a program I took from https://github.com/edenhill/librdkafka/tree/master/examples and modified for my tests. I have attached the files.

Producer and Consumer properties

2014-11-05 Thread Eduardo Costa Alfaia
Hi Dudes, I would like to know whether the producer and consumer properties files in the config folder need to be configured. I have configured only server.properties; is that enough? I am running some performance tests, for example network throughput. My scenario is: as the producer I am

Re: consumer ack for high-level consumer?

2014-11-05 Thread Guozhang Wang
Hello, You can turn off auto.commit.offset and call connector.commitOffset() manually after you have processed the data. One thing to remember is that the commit frequency is related to ZK (in the future, Kafka) writes, and hence you may not want to commit after processing every single messa
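A minimal sketch of that trade-off, assuming auto-commit is off and messages are processed synchronously in one thread. `Committer` and `BatchingCommitter` are hypothetical names, not Kafka classes; `Committer.commit()` stands in for the connector's commit call. Committing once per batch cuts the number of ZK writes by roughly the batch size.

```java
/** Hypothetical stand-in for the connector's commit call; not a Kafka API. */
interface Committer {
    void commit();
}

/** Commits once after every `batchSize` fully processed messages. */
final class BatchingCommitter {
    private final Committer committer;
    private final int batchSize;
    private int uncommitted = 0;   // processed since the last commit

    BatchingCommitter(Committer committer, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.committer = committer;
        this.batchSize = batchSize;
    }

    /** Call only after a message has been completely processed. */
    void onProcessed() {
        uncommitted++;
        if (uncommitted >= batchSize) {
            committer.commit();    // one ZK write covers the whole batch
            uncommitted = 0;
        }
    }

    /** Call on shutdown so the tail of the last batch is not consumed again. */
    void flush() {
        if (uncommitted > 0) {
            committer.commit();
            uncommitted = 0;
        }
    }
}
```

With a batch size of 100, processing 1,050 messages triggers 10 commits, and `flush()` commits the remaining 50 on shutdown. The cost of batching is that a crash can cause up to `batchSize - 1` already-processed messages to be consumed again.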

Re: How costly is Re balancing of partitions for a topic

2014-11-05 Thread Guozhang Wang
Hello Dinesh, 1. A rebalance is triggered when the consumers are notified through ZK of a group membership change or a topic-partition change. 2. The cost of a rebalance grows with the number of consumers in the group and the number of topics the group is consuming. The latency of the rebalance can

Re: How costly is Re balancing of partitions for a topic

2014-11-05 Thread dinesh kumar
Thanks for the answers. I have some follow-up questions. Let me get a bit more specific: in a scenario of 1 topic with 400-500 partitions, 1. Is it OK to have short-lived consumers, or is it recommended to have only long-running consumers? 2. You mentioned that rebalance latency depends on # of c

Storing data in kafka keys

2014-11-05 Thread Ivan Balashov
Hi, it looks like it is general practice to avoid storing data in Kafka keys. Some examples: Camus and Secor both do not use keys. Even a Swiss Army knife of a tool like kafkacat doesn't seem to be able to display keys (although I might be wrong). Also, the console producer does not display keys
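One reason keys matter even when they carry no payload: with a keyed producer the partition is normally derived from the key, so all messages with the same key land in the same partition and keep their relative order. A minimal sketch of hash-modulo partitioning, assuming a partitioner of the form `abs(hash(key)) % numPartitions`; the 0.8 producer's default partitioner works along these lines, but `KeyPartitioner` is a hypothetical class, not Kafka's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative hash-modulo partitioner; not Kafka's actual implementation. */
final class KeyPartitioner {
    private final int numPartitions;

    KeyPartitioner(int numPartitions) {
        if (numPartitions <= 0) throw new IllegalArgumentException("numPartitions must be positive");
        this.numPartitions = numPartitions;
    }

    /** Identical key bytes always map to the same partition. */
    int partitionFor(byte[] key) {
        // Mask the sign bit rather than using Math.abs, which stays negative for Integer.MIN_VALUE.
        int hash = Arrays.hashCode(key) & 0x7fffffff;
        return hash % numPartitions;
    }

    int partitionFor(String key) {
        return partitionFor(key.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the mapping depends on the partition count, changing the number of partitions changes where a given key lands.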

Consumer lag keep increasing

2014-11-05 Thread Chen Wang
Hey guys, I have a really simple Storm topology with a Kafka spout, reading from Kafka through the high-level consumer. Since the topic has 30 partitions, we have 30 threads in the spout reading from it. However, the lag seems to keep increasing even though the threads only read the messages and do nothin

Re: How costly is Re balancing of partitions for a topic

2014-11-05 Thread Guozhang Wang
1. Since every change to a consumer group triggers a rebalance among all the consumer members, it is usually recommended to have long-lived consumers rather than short-lived ones. However, in the new consumer we are working on optimizing the rebalance logic and removing its ZK dependency, so in the new
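To make the rebalance cost concrete for the 400-500 partition scenario: on every rebalance each consumer recomputes which partitions it owns. A minimal sketch of range-style assignment, assuming partitions and consumer ids are both sorted and each consumer takes a contiguous block, with the first `numPartitions % consumers` consumers taking one extra partition. This follows the range strategy of the 0.8 high-level consumer as we understand it, but `RangeAssignment` is an illustration, not Kafka's code, and the consumer ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RangeAssignment {
    /**
     * Assigns partitions 0..numPartitions-1 to consumers in contiguous ranges.
     * Ids are sorted first so every member computes the same result independently.
     */
    static Map<String, List<Integer>> assign(List<String> consumerIds, int numPartitions) {
        List<String> sorted = new ArrayList<>(consumerIds);
        Collections.sort(sorted);
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int n = sorted.size();
        if (n == 0) return result;
        int perConsumer = numPartitions / n;
        int withExtra = numPartitions % n;          // these first consumers get one more
        int start = 0;
        for (int i = 0; i < n; i++) {
            int count = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> owned = new ArrayList<>(count);
            for (int p = start; p < start + count; p++) owned.add(p);
            result.put(sorted.get(i), owned);
            start += count;
        }
        return result;
    }
}
```

With 4 consumers and 450 partitions the sketch gives blocks of 113, 113, 112 and 112. Adding a fifth consumer yields 90 each and moves 227 of the 450 partitions to a different owner, which is why a group whose membership changes every few seconds keeps paying this cost.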

Re: Cannot connect to Kafka from outside of EC2

2014-11-05 Thread Guozhang Wang
Sameer, yes, this is the server log. But there seem to be no abnormal entries in it, and it does not cover the same time range as when the producer client threw LeaderNotAvailableException (10/24, 14:30). The reason I want to check the server log for that same time is that LeaderNotAvailabl

Re: Consumer lag keep increasing

2014-11-05 Thread Guozhang Wang
Chen, your configs seem fine. Could you use the ConsumerOffsetChecker tool to see whether the offset is advancing at all (i.e. messages are being consumed), and if so, take some thread dumps and check whether your consumer is blocked on some locks? Guozhang On Wed, Nov 5, 2014 at 2:01 PM, Chen Wang wrote: > Hey

Re: Consumer lag keep increasing

2014-11-05 Thread Chen Wang
Guozhang, I can see messages keep coming, meaning messages are being consumed, right? But the lag is pretty huge (on average 30m messages behind), as you can see from the graph: https://www.dropbox.com/s/xli25zicxv5f2qa/Screenshot%202014-11-05%2015.23.05.png?dl=0 My understanding is that for such light
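The lag that ConsumerOffsetChecker reports is, as we understand it, the per-partition log size (log-end offset) minus the group's committed offset, summed over partitions. Seeing offsets advance therefore does not mean the lag will shrink: it only falls when consumption outpaces production. A minimal sketch of the arithmetic; `LagCalculator` is a hypothetical helper, and counting a partition with no committed offset from 0 is an assumption of this sketch.

```java
import java.util.Map;

final class LagCalculator {
    /**
     * Total lag = sum over partitions of (log-end offset - committed offset).
     * Both maps are keyed by partition id. A partition with no committed
     * offset is counted from offset 0 (an assumption of this sketch).
     */
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long total = 0L;
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            Long committed = committedOffsets.get(e.getKey());
            long consumedUpTo = (committed == null) ? 0L : committed;
            total += Math.max(0L, e.getValue() - consumedUpTo);   // never negative
        }
        return total;
    }
}
```

Sampling `totalLag` at two points in time and comparing the results shows whether the 30 threads are keeping up or falling further behind.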

Re: Cannot connect to Kafka from outside of EC2

2014-11-05 Thread Sameer Yami
The server.log was taken separately. We ran the test again, and the server and producer logs are below (so the timings match). Thanks! Producer Logs - 2014-11-05 23:38:58,693 Thread-3-SendThread(ip-172-31-25-

No longer supporting Java 6, if? when?

2014-11-05 Thread Joe Stein
This has been coming up in a lot of projects, and for other reasons too, so I wanted to kick off a discussion about if/when we end support for Java 6. Besides any APIs we may want to use in >= 7, we currently also compile our release binaries for 6. /*** Joe

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-11-05 Thread Tomas Nunez
OK, still fighting with the migrationTool here... That tuple wasn't in the scala-library.jar. It turns out I was using Scala 2.10 for Kafka 0.8 and Scala 2.8 for Kafka 0.7, and the jar files were not compatible. So, for the record, it seems that you need both the 0.7 jar files and your 0.8 kafka com

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-11-05 Thread Gwen Shapira
org.apache.zookeeper.ClientCnxn is throwing the exception, so I'm 100% sure it eventually found the class. On Wed, Nov 5, 2014 at 5:59 PM, Tomas Nunez wrote: > Ok, still fighting with the migrationTool here... > > That tuple wasn't in the scala-library.jar. It turns out I was using scala > 2.10

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-11-05 Thread Gwen Shapira
Regarding more information: Maybe ltrace? If I were you, I'd go to the MigrationTool code and start adding LOG lines, because there aren't enough of those to troubleshoot. On Wed, Nov 5, 2014 at 6:13 PM, Gwen Shapira wrote: > org.apache.zookeeper.ClientCnxn is throwing the exception, so I'm 100% >

Re: Error using migrationtool for upgrading 0.7 to 0.8

2014-11-05 Thread Gwen Shapira
Also, can you post your configs? Especially the "zookeeper.connect" one? On Wed, Nov 5, 2014 at 6:15 PM, Gwen Shapira wrote: > Regarding more information: > Maybe ltrace? > > If I were you, I'd go to MigrationTool code and start adding LOG lines. > because there aren't enough of those to trouble

High CPU usage of Crc32 on Kafka broker

2014-11-05 Thread Allen Wang
Hi, using Java Flight Recorder, we have observed high CPU usage of CRC32 (kafka.utils.Crc32.update()) on the Kafka broker. It uses as much as 25% of the CPU on an instance. Tracing the stack, this method is invoked by ReplicaFetcherThread. Is there any tuning we can do to reduce this? Also on the top
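For context on where that CPU goes: each Kafka message carries a CRC32 checksum, and per the stack trace above the replica fetch path computes it via kafka.utils.Crc32.update(), so the cost scales with the bytes being replicated. A minimal sketch of the same kind of checksum, using the JDK's java.util.zip.CRC32 as a stand-in for Kafka's own class (we assume both compute standard CRC-32); `MessageChecksum` is a hypothetical helper, checked against the standard CRC-32 test vector.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class MessageChecksum {
    /** CRC-32 of a payload; JDK implementation used as a stand-in for kafka.utils.Crc32. */
    static long crc32(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);   // work is proportional to payload size
        return crc.getValue();                    // unsigned 32-bit value held in a long
    }

    /** Recomputes the checksum and compares it with the stored one, as a receiver would. */
    static boolean isValid(byte[] payload, long storedCrc) {
        return crc32(payload) == storedCrc;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%08x%n", crc32(check));   // prints cbf43926, the standard check value
    }
}
```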

Re: No longer supporting Java 6, if? when?

2014-11-05 Thread Worthy LaFollette
We have mostly converted to 1.7 now; this would be welcome, to get any new features. On Wed Nov 05 2014 at 7:32:55 PM Joe Stein wrote: > This has been coming up in a lot of projects and for other reasons too I > wanted to kick off the discussion about if/when we end support for Java 6. > Besides any API

Re: consumer ack for high-level consumer?

2014-11-05 Thread Chia-Chun Shih
Hi, thanks for your response. I just read the source code and found that: 1) ConsumerIterator$next() uses PartitionTopicInfo$resetConsumeOffset to update offsets in PartitionTopicInfo objects. 2) ZookeeperConsumerConnector$commitOffset() gets the latest offsets from PartitionTopicInfo objects, and upd
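Chia-Chun's reading implies that a commit stores whatever position next() has already advanced to, whether or not the application has finished processing those messages. A simplified model of that behavior, assuming a single partition and a single consumer thread; `TrackedIterator` is a hypothetical class, not Kafka's ConsumerIterator.

```java
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical model: next() advances the consumed offset, commit() snapshots it. */
final class TrackedIterator {
    private final List<String> log;     // messages of one partition, in offset order
    private long consumedOffset = 0;    // offset of the next message to hand out
    private long committedOffset = 0;   // position recorded by the last commit()

    TrackedIterator(List<String> log) {
        this.log = log;
    }

    boolean hasNext() {
        return consumedOffset < log.size();
    }

    String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String msg = log.get((int) consumedOffset);
        consumedOffset++;               // advanced on hand-out, not when processing finishes
        return msg;
    }

    /** Records the consumed position, including messages handed out but not yet processed. */
    void commit() {
        committedOffset = consumedOffset;
    }

    long committedOffset() {
        return committedOffset;
    }
}
```

If two messages are handed out, only the first is processed, and a commit happens before a crash, the second message will not be delivered again. Calling next(), finishing the processing, and only then committing, all in the same thread, avoids that gap in this model.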

"metric.reporters" is not working

2014-11-05 Thread Bae, Jae Hyeon
Hi, when I set up props.put("metric.reporters", Lists.newArrayList(ServoReporter.class.getName())); I got the following error: org.apache.kafka.common.config.ConfigException: Unknown configuration 'com.netflix.suro.sink.kafka.ServoReporter' at org.apache.kafka.common.config.AbstractConfig.get(Ab