Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Ewen Cheslack-Postava
Also worth mentioning is that the new producer doesn't have this behavior -- it will round robin over available partitions for records without keys. "Available" means it currently has a leader -- under normal cases this means it distributes evenly across all partitions, but if a partition is down t

Re: resources for simple consumer?

2015-07-15 Thread Ewen Cheslack-Postava
Hi Jeff, The simple consumer hasn't really changed, the info you found should still be relevant. The wiki page on it might be the most useful reference for getting started: https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example And if you want a version all setup to compile

Re: latency performance test

2015-07-15 Thread Ewen Cheslack-Postava
The tests are meant to evaluate different things and the way they send messages is the source of the difference. EndToEndLatency works with a single message at a time. It produces the message then waits for the consumer to receive it. This approach guarantees there is no delay due to queuing. The

Re: Load Balancing Kafka

2015-07-15 Thread Jiangjie Qin
AhŠ It seems you are more focusing on producer side workload balanceŠ If that is the case, please ignore my previous comments. Jiangjie (Becket) Qin On 7/15/15, 6:01 PM, "Jiangjie Qin" wrote: >If you have pretty balanced traffic on each partition and have set >auto.leader.rebalance.enabled to t

Re: Load Balancing Kafka

2015-07-15 Thread Jiangjie Qin
If you have pretty balanced traffic on each partition and have set auto.leader.rebalance.enabled to true or false, you might not need to do further workload balance. However, in most cases you probably still need to do some sort of load balancing based on the traffic and disk utilization of each b

Re: Load Balancing Kafka

2015-07-15 Thread Terry Bates
Greetings Sandy, Folks smarter than me can correct me if I am wrong. Using Python client you don't have to connect to Zookeeper, so just specifying one of the brokers should be sufficient. In terms of what happens to your messages as your client produces them, they should be randomly assigned to a

Load Balancing Kafka

2015-07-15 Thread Sandy Waters
Hi all, Do I need to load balance against the brokers? I am using the python driver and it seems to only want a single kafka broker host. However, in a situation where I have 10 brokers, is it still fine to just give it one host. Does zookeeper and kafka handle the load balancing and redirect m

Re: Latency test

2015-07-15 Thread Yuheng Du
Thank you, Tao! On Jul 15, 2015 6:27 PM, "Tao Feng" wrote: > Sorry Yufeng, You should change it in $KAFKA_HEAP_OPTS. > > On Wed, Jul 15, 2015 at 3:09 PM, Tao Feng wrote: > > > Hi Yuheng, > > > > You could add the -Xmx1024m in > > https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Lance Laursen
>From the FAQ: "To reduce # of open sockets, in 0.8.0 ( https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning key is not specified or null, a producer will pick a random partition and stick to it for some time (default is 10 mins) before switching to another one. So, if there ar

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
Maybe there is some reason why produce sticks with a partition for some period of time - mostly performance related. I can imagine that constant switching between partitions can be kind of slow in such sense that producer has to "refocus" on another partition to send a message to and this switching

resources for simple consumer?

2015-07-15 Thread Jeff Gong
hey all, typically i've only ever had to use the high level consumer for my personal needs when handling data. recently, however, I have found the need to be more selective and careful with managing offsets and want the extended capability to do so. i know that there is a bit of documentation on a

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
Nice one! That might be it as well. Do you have an idea what is that configuration parameter called? On Thu, Jul 16, 2015 at 12:53 AM, JIEFU GONG wrote: > This is a total shot in the dark here so please ignore this if it fails to > make sense, but I remember that on some previous implementation o

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
I think I figured it out. I had to use custom parititioner which does basically nothing. Even I used it before, it was not taken into consideration because I was sending KeyedMessage without any key. Just partition and payload. Now I am doing it like this: producer.send(new KeyedMessage("topic"

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread JIEFU GONG
This is a total shot in the dark here so please ignore this if it fails to make sense, but I remember that on some previous implementation of the producer prior to when round-robin was enabled, producers would send messages to only one of the partitions for a set period of time (configurable, I bel

Re: Latency test

2015-07-15 Thread Tao Feng
Sorry Yufeng, You should change it in $KAFKA_HEAP_OPTS. On Wed, Jul 15, 2015 at 3:09 PM, Tao Feng wrote: > Hi Yuheng, > > You could add the -Xmx1024m in > https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh > KAFKA_JVM_PERFORMANCE_OPTS. > > > > On Wed, Jul 15, 2015 at 12:51 AM, Yu

Re: Latency test

2015-07-15 Thread Tao Feng
Hi Yuheng, You could add the -Xmx1024m in https://github.com/apache/kafka/blob/trunk/bin/kafka-run-class.sh KAFKA_JVM_PERFORMANCE_OPTS. On Wed, Jul 15, 2015 at 12:51 AM, Yuheng Du wrote: > Tao, > > If I am running on the command line the following command > >bin/kafka-run-class.sh kafka.tools

Re: Kafka partitioning is pretty much broken

2015-07-15 Thread Jagbir Hooda
Hi Stefan, Have you looked at the following output for message distribution across the topic-partitions and which topic-partition is consumed by which consumer thread? kafaka-server/bin>./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group Jagbir On Wed, Jul

Re: Offset not committed

2015-07-15 Thread Jiangjie Qin
If that is the case, I guess that might still be some value to try to run broker and clients locally and see if the issue still exist. Thanks, Jiangjie (Becket) Qin On 7/15/15, 1:23 PM, "Vadim Bobrov" wrote: >it is pretty random > >On Wed, Jul 15, 2015 at 4:22 PM, Jiangjie Qin >wrote: > >> I’

Re: kafka benchmark tests

2015-07-15 Thread Yuheng Du
Hi Geoffrey, Thank you for your detailed explaining. They are really helpful. I am thinking of going after the second way, since I have bare metal access to all the nodes in the cluster, it's probably better to run real slave machines instead of virtual machines. (correct me if I am wrong) Each

Re: Offset not committed

2015-07-15 Thread Vadim Bobrov
it is pretty random On Wed, Jul 15, 2015 at 4:22 PM, Jiangjie Qin wrote: > I’m not sure if it is related to running in cloud. Do you see this > disconnection issue always happening on committing offsets or it happens > randomly? > > Jiangjie (becket) qin > > On 7/15/15, 12:53 PM, "Vadim Bobrov"

Re: Offset not committed

2015-07-15 Thread Jiangjie Qin
I’m not sure if it is related to running in cloud. Do you see this disconnection issue always happening on committing offsets or it happens randomly? Jiangjie (becket) qin On 7/15/15, 12:53 PM, "Vadim Bobrov" wrote: >there are lots of files under logs directory of the broker, just in case I >ch

Re: kafka benchmark tests

2015-07-15 Thread Geoffrey Anderson
Hi Yuheng, Yes, you should be able to run on either mac or linux. The test cluster consists of a test-driver machine and some number of slave machines. Right now, there are roughly two ways to set up the slave machines: 1) Slave machines are virtual machines *on* the test-driver machine. 2) Slav

Re: kafka TestEndtoEndLatency

2015-07-15 Thread Yuheng Du
Thanks. Here is the source code snippet of EndtoEndLatency test: for (i<- 0 until numMessages) { val begin = System.nanoTime producer.send( new ProducerRecord[Array[Byte],Array[Byte]](topic, message)) val received = iter.next val elapsed = System.nanoTime - begin // poor man's progress bar if (i

Re: Offset not committed

2015-07-15 Thread Vadim Bobrov
there are lots of files under logs directory of the broker, just in case I checked all modified around the time of error and found nothing unusual both client and broker are 0.8.2.1 could it have something to do with running it in the cloud? we are on Linode and I remember having random disconnect

Kafka partitioning is pretty much broken

2015-07-15 Thread Stefan Miklosovic
I have following problem, I tried almost everything I could but without any luck All I want to do is to have 1 producer, 1 topic, 10 partitions and 10 consumers. All I want is to send 1M of messages via producer to these 10 consumers. I am using built Kafka 0.8.3 from current upstream so I have

Re: kafka TestEndtoEndLatency

2015-07-15 Thread Guozhang Wang
The end-to-end latency record the transferring of a message from producer to broker, then to consumer. I cannot remember the details not but I think the EndtoEndLatency test record the latency as average, hence it is small. Guozhang On Wed, Jul 15, 2015 at 12:28 PM, Yuheng Du wrote: > Guozhang

Re: Offset not committed

2015-07-15 Thread Jiangjie Qin
Is there anything on the broker log? Is it possible that your client and broker are not running on the same version? Jiangjie (Becket) Qin On 7/15/15, 11:40 AM, "Vadim Bobrov" wrote: >caught it, thanks for help! >any ideas what to do? > >TRACE 2015-07-15 18:37:58,070 [chaos-akka.actor.jms-dispa

Re: kafka TestEndtoEndLatency

2015-07-15 Thread Yuheng Du
Guozhang, Thank you for explaining. I see that in ProducerPerformance call back functions were used to get the latency metrics. For the TestEndtoEndLatency, does message size matter? What this end-to-end latency comprise of, besides transferring a package from source to destination (typically arou

Re: kafka benchmark tests

2015-07-15 Thread Yuheng Du
Hi Geoffrey, Thank you for your helpful information. Do I have to install the virtual machines? I am using Mac as the testdriver machine or I can use a linux machine to run testdriver too. Thanks. best, Yuheng On Wed, Jul 15, 2015 at 2:55 PM, Geoffrey Anderson wrote: > Hi Yuheng, > > Running

Re: kafka TestEndtoEndLatency

2015-07-15 Thread Guozhang Wang
Yuheng, Only TestEndtoEndLatency's number are end to end, for ProducerPerformance the latency is for the send-to-ack latency, which increases as batch size increases. Guozhang On Wed, Jul 15, 2015 at 11:36 AM, Yuheng Du wrote: > In kafka performance tests https://gist.github.com/jkreps > /c7dd

Re: ExecutionException instead of UnknownTopicOrPartitionException

2015-07-15 Thread Guozhang Wang
Hi Jean-Charles, The Future.get() will always throw an ExecutionException(Throwable: CauseException), so in your code should look like sth.: --- try { future = producer.send(..); } catch (KafkaException e) { // handle any KafkaException that is not ApiException. } try {

Re: kafka benchmark tests

2015-07-15 Thread Geoffrey Anderson
Hi Yuheng, Running these tests requires a tool we've created at Confluent called 'ducktape', which you need to install with the command: pip install ducktape==0.2.0 Running the tests locally requires some setup (creation of virtual machines etc.) which is outlined here: https://github.com/apache/

Re: Offset not committed

2015-07-15 Thread Vadim Bobrov
caught it, thanks for help! any ideas what to do? TRACE 2015-07-15 18:37:58,070 [chaos-akka.actor.jms-dispatcher-1019 ] kafka.network.BoundedByteBufferSend - 113 bytes written. ERROR 2015-07-15 18:37:58,078 [chaos-akka.actor.jms-dispatcher-1019 ] kafka.consumer.ZookeeperConsumerConnector - [chaos-

kafka TestEndtoEndLatency

2015-07-15 Thread Yuheng Du
In kafka performance tests https://gist.github.com/jkreps /c7ddb4041ef62a900e6c The TestEndtoEndLatency results are typically around 2ms, while the ProducerPerformance normally has "average latency"around several hundres ms when using batch size 8196. Are both results talking about end to end lat

Kafka BrokerTopicMetrics MessageInPerSec rate

2015-07-15 Thread pushkar priyadarshi
Hi, While benchmarking new producer and consumer syncing offset in zookeeper i see that MessageInRate reported in BrokerTopicMetrics is not same as rate at which i am able to publish and consume messages. Using my own custom reporter i can see the rate at which messages are published and consumed

Re: Offset not committed

2015-07-15 Thread Vadim Bobrov
thanks Joel and Jiangjie, I have figured it out. In addition to my log4j 2 config file I also needed a log4j 1 config file, then it works. Let me trace what happens when the offsets are not committed and report back On Wed, Jul 15, 2015 at 1:33 PM, Joel Koshy wrote: > - You can also change the l

Re: Offset not committed

2015-07-15 Thread Joel Koshy
- You can also change the log4j level dynamically via the kafka.Log4jController mbean. - You can also look at offset commit request metrics (mbeans) on the broker (just to check if _any_ offset commits are coming through during the period you see no moving offsets). - The alternative is to ju

Re: Latency test

2015-07-15 Thread Yuheng Du
I have run the end to end latency test and the producerPerformance test on my kafka cluster according to https://gist.github.com/jkreps/c7ddb4041ef62a900e6c In end to end latency test, the latency was around 2ms. In producerperformance test, if use batch size 8196 to send 50,000,000 records: >bin

Re: Offset not committed

2015-07-15 Thread Jiangjie Qin
I am not sure how your project was setup. But I think it depends on what log4j property file you specified when you started your application. Can you check if you have log4j appender defined and the loggers are directed to the correct appender? Thanks, Jiangjie (Becket) Qin On 7/15/15, 8:10 AM,

Re: Consumer that consumes only local partition?

2015-07-15 Thread Gwen Shapira
This is not something you can use the consumer API to simply do easily (consumers don't have locality notion). I can imagine using Kafka's low-level API calls to get a list of partitions and the lead replica, figuring out which are local and using those - but that sounds painful. Are you 100% sure

Re: Java API for fetching Consumer group from Kafka Server(Not Zookeeper)

2015-07-15 Thread Jiangjie Qin
It looks kafka.admin.ConsumerGroupCommand class is what you need. Jiangjie (Becket) Qin On 7/14/15, 8:23 PM, "Swati Suman" wrote: >Hi Team, > >Currently, I am able to fetch the Topic,Partition,Leader,Log Size through >TopicMetadataRequest API available in Kafka. > >Is there any java api that gi

Re: Consumer that consumes only local partition?

2015-07-15 Thread Robert Metzger
Hi Shef, did you resolve this issue? I'm facing some performance issues and I was wondering whether reading locally would resolve them. On Mon, Jun 22, 2015 at 11:43 PM, Shef wrote: > Noob question here. I want to have a single consumer for each partition > that consumes only the messages that

Kafka HL Consumer stops periodically

2015-07-15 Thread Marina
Hi, I have a weird problem when processing high volumes of data through my 3-node, 3 topic, 4 partition Kafka cluster. I am not fully sure at which point the issue starts happening, but basically, after some time of processing lots of events (100 M or so in about 5 hr time span) with no issues,

Re: Offset not committed

2015-07-15 Thread Vadim Bobrov
Thanks Jiangjie, unfortunately turning trace level on does not seem to work (any log level actually) I am using log4j2 (through slf4j) and despite including log4j1 bridge and these lines: in my conf file I could not squeeze out any logging from kafka. Logging for all other libs (like zookeeper

latency performance test

2015-07-15 Thread Yuheng Du
Hi, I have run the end to end latency test and the producerPerformance test on my kafka cluster according to https://gist.github.com/jkreps/c7ddb4041ef62a900e6c In end to end latency test, the latency was around 2ms. In producerperformance test, if use batch size 8196 to send 50,000,000 records:

ExecutionException instead of UnknownTopicOrPartitionException

2015-07-15 Thread Jean-Charles Jabouille
Hi, I'm currently developing an application to use Kafka in Java. My application just push an offer synchronously in a topic. I have 3 brokers and 3 zookeeper instance. I want to catch Exception in order my process does not crash but try to retry and do some code for specific exception. So I

Re: kafka benchmark tests

2015-07-15 Thread Yuheng Du
Jiefu, Have you tried to run benchmark_test.py? I ran it and it asks me for the ducktape.services.service yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ python benchmark_test.py Traceback (most recent call last): File "benchmark_test.py", line 16, in from ducktape.services.service imp

Re: Latency test

2015-07-15 Thread Yuheng Du
Tao, If I am running on the command line the following command >bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092 192.168.1.1:2181 speedx3 5000 100 1 -Xmx1024m It promped that it is not correct. So where should I put the -Xmx1024m option? Thanks. On Wed, Jul 15, 2015

Re: Latency test

2015-07-15 Thread Tao Feng
(Please correct me if I am wrong.) Based on TestEndToEndLatency( https://github.com/apache/kafka/blob/trunk/core/src/test/scala/kafka/tools/TestEndToEndLatency.scala), consumer_fetch_max_wait corresponds to fetch.wait.max.ms in consumer config( http://kafka.apache.org/documentation.html#consumerco

Re: Latency test

2015-07-15 Thread Yuheng Du
I got java out of heap error when running end to end latency test: yuhengdu@consumer0:/packages/kafka_2.10-0.8.2.1$ bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092 192.168.1.1:2181 speedx3 5000 100 1 Exception in thread "main" java.lang.OutOfMemoryError: Java heap spac

Re: Latency test

2015-07-15 Thread Yuheng Du
Tao, Thanks. The example on https://gist.github.com/jkreps/c7ddb4041ef62a900e6c is outdated already. The error message shows: USAGE: java kafka.tools.TestEndToEndLatency$ broker_list zookeeper_connect topic num_messages consumer_fetch_max_wait producer_acks Can anyone helps me what should be pu