Re: Spark Kafka Performance

2014-11-04 Thread Eduardo Alfaia
Hi Gwen, I have changed the Java code KafkaWordCount to use reduceByKeyAndWindow in Spark. - Original message - From: "Gwen Shapira" Sent: 03/11/2014 21:08 To: "users@kafka.apache.org" Cc: "u...@spark.incubator.apache.org" Subject: Re: Spark Kafka Performance Not sure about the

Advertised host name

2014-11-04 Thread Ciprian Hacman
Hi, While setting up a cluster, I realised that Kafka uses host names to communicate between brokers. In my case, I don't have a DNS server to keep track of host names, and set up the host names manually. Because of this, brokers cannot connect to each other. This behaviour can be easily fixed by

Tuning replication

2014-11-04 Thread Todd S
Good day all, We're running a good-sized Kafka cluster on 0.8.1, and during our peak traffic times replication falls behind. I've been doing some reading about parameters for tuning replication, but I'd love some real-world experience and insight. Some general questions: * Does Kafka 'lik

queued.max.message.chunks impact and consumer tuning

2014-11-04 Thread Bhavesh Mistry
Hi Kafka Dev Team, It seems that the maximum buffer size is set to 2 by default. What is the impact of changing this to 2000 or so? Will this improve consumer thread performance? More events will be buffered in memory. Or is there any other recommendation for tuning high-level consumers? Here is co

Re: High Level Consumer Iterator IllegalStateException Issue

2014-11-04 Thread Bhavesh Mistry
Hi Neha and Jun, I have fixed the issue on my side based on what Jun had mentioned: "next() gives IllegalStateException if hasNext is not called..." Based on this I debugged further. It was my mistake, sharing the same consumer iterator across multiple threads, so (I forgot to call iterator.remove() in

Re: queued.max.message.chunks impact and consumer tuning

2014-11-04 Thread Joel Koshy
We used to default to 10, but two should be sufficient. There is little reason to buffer more than that. If you increase it to 2000 you will most likely run into memory issues. E.g., if your fetch size is 1MB you would enqueue 1MB*2000 chunks in each queue. On Tue, Nov 04, 2014 at 09:05:44AM -0800
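
The arithmetic in this reply can be sketched as follows. This is only a rough upper-bound estimate, assuming every queued chunk is as large as the full fetch size; the function and variable names are hypothetical and not part of any Kafka API.

```python
# Rough worst-case estimate of memory buffered per chunk queue.
# Assumption: each queued chunk holds up to one full fetch of data.
MB = 1024 * 1024


def buffered_bytes_per_queue(fetch_size_bytes: int, queued_chunks: int) -> int:
    """Upper bound on bytes held in one queue (hypothetical helper)."""
    return fetch_size_bytes * queued_chunks


# The default of 2 chunks versus the proposed 2000, with a 1 MB fetch size.
default_estimate = buffered_bytes_per_queue(1 * MB, 2)
bumped_estimate = buffered_bytes_per_queue(1 * MB, 2000)

print(default_estimate // MB, "MB per queue")
print(bumped_estimate // MB, "MB per queue")
```

Multiplying the result by the number of queues in the consumer gives a sense of why a value like 2000 is likely to cause memory pressure.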

Re: Tuning replication

2014-11-04 Thread Joel Koshy
Ops-experts can share more details but here are some comments: > > * Does Kafka 'like' lots of small partitions for replication, or larger > ones? ie: if I'm passing 1Gbps into a topic, will replication be happier > if that's one partition, or many partitions? Since you also have to account for

Re: Tuning replication

2014-11-04 Thread Todd Palino
I think your answers are pretty spot-on, Joel. Under Replicated Count is the metric that we monitor to make sure the cluster is healthy. It lets us know when a broker is down (because all the numbers except one broker are elevated), or when a broker is struggling (low counts fluctuating across a fe

Kafka 0.8.1.1 replication issues

2014-11-04 Thread Christofer Hedbrandh
Hi Kafka users! I was just migrating a cluster of 3 brokers from one set of EC2 instances to another, but ran into replication problems. The method of migration used is that of stopping one broker and letting a new broker join with the same broker.id. Replication started, but after ~4 of ~15 GB th

Producer timeout setting not respected

2014-11-04 Thread Solon Gordon
Hi all, I've been investigating how Kafka 0.8.1.1 responds to the scenario where one broker loses connectivity (due to something like a hardware issue or network partition.) It looks like the brokers themselves adjust within a few seconds to reassign leaders and shrink ISRs. However, I see produce

Re: queued.max.message.chunks impact and consumer tuning

2014-11-04 Thread Bhavesh Mistry
Thanks for the info. I will have to tune the memory. What else do you recommend for the high-level consumer, for optimal performance and draining as quickly as possible with auto commit on? Thanks, Bhavesh On Tue, Nov 4, 2014 at 9:59 AM, Joel Koshy wrote: > We used to default to 10, but two should be su

Re: Producer timeout setting not respected

2014-11-04 Thread Guozhang Wang
Hello Solon, request.timeout.ms only controls the produce request timeout value. When the producer's first produce request times out, it will try to refresh its metadata by sending a metadata request. But when this non-produce request hits the broker whose connectivity has been disabled (i.e.

Re: Kafka 0.8.1.1 replication issues

2014-11-04 Thread Guozhang Wang
This seems to be related to https://issues.apache.org/jira/browse/KAFKA-1749 . Guozhang On Tue, Nov 4, 2014 at 10:30 AM, Christofer Hedbrandh < christo...@knewton.com> wrote: > Hi Kafka users! > > I was just migrating a cluster of 3 brokers from one set of EC2 instances > to another, but ran int

Re: Producer timeout setting not respected

2014-11-04 Thread Guozhang Wang
Actually I think this issue has just been resolved: https://issues.apache.org/jira/browse/KAFKA-1733 Guozhang On Tue, Nov 4, 2014 at 11:22 AM, Guozhang Wang wrote: > Hello Solon, > > request.timeout.ms only controls the produce request timeout value, when > the producer's first produce request

Re: queued.max.message.chunks impact and consumer tuning

2014-11-04 Thread Joel Koshy
I actually meant to say that you typically don't need to bump up the queued chunk setting - you can profile your consumer to see if significant time is being spent waiting on dequeuing from the chunk queues. If you happen to have a consumer consuming from a remote data center, then you should cons

Re: Producer timeout setting not respected

2014-11-04 Thread Solon Gordon
Yes, this lines up with the behavior I'm seeing. I'll wait for the patch to be released and then retest. Thanks! On Tue, Nov 4, 2014 at 2:36 PM, Guozhang Wang wrote: > Actually I think this issue has just been resolved: > > https://issues.apache.org/jira/browse/KAFKA-1733 > > Guozhang > > On Tue

Re: Cannot connect to Kafka from outside of EC2

2014-11-04 Thread Sameer Yami
Hi Guozhang, This is the server.log - [2014-11-04 20:21:57,510] INFO Verifying properties (kafka.utils.VerifiableProperties) [2014-11-04 20:21:57,545] INFO Property advertised.host.name is overridden to x.x.x.x (kafka.utils.VerifiableProperties) [2014-11-04 20:21:57,545] INFO Property broker.id i

Re: Tuning replication

2014-11-04 Thread Todd S
Joel, Thanks for your input - it fits what I was thinking, so it's good confirmation. > The easiest mbean to look at is the underreplicated partition count. > This is at the broker-level so it is coarse-grained. If it is > 0 you > can use various tools to do mbean queries to figure out which > pa

Re: Dynamically adding Kafka brokers

2014-11-04 Thread Neha Narkhede
I agree that KAFKA-1070 would be great to get in. I especially felt the need for something like this while using a few other systems that automated the port, id etc to give a good OOTB experience. Sorry, I lost track of the review. Will do so in the next few days. Thanks, Neha On Mon, Nov 3, 2014

OffsetOutOfRange Error

2014-11-04 Thread Jimmy John
Hello, We are using Kafka version 0.8.1 and the Python Kafka client. Everything had been working fine, and suddenly this morning I saw an OffsetOutOfRange on one of the partitions. (We have 20 partitions in our Kafka cluster.) We fixed it by seeking to the head offset and restarting the app.

Re: OffsetOutOfRange Error

2014-11-04 Thread Guozhang Wang
Hi Jim, OffsetOutOfRange means that the partition's log offset range is [a, b] and the requested offset is either < a or > b. It could be caused by log truncation based on the retention policy while the consumer is fetching at the same time. Guozhang On Tue, Nov 4, 2014 at 4:21 PM, Jimmy John wrote:
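
The condition described in this reply can be illustrated with a small sketch. It only models the range check itself and is not the broker's actual implementation; the function name and the offsets used are made up for illustration.

```python
# Illustration of the range check: a partition's log covers offsets [a, b],
# and a request is out of range when the offset is < a or > b.
def is_offset_out_of_range(requested: int, log_start: int, log_end: int) -> bool:
    """Return True if `requested` falls outside [log_start, log_end] (hypothetical helper)."""
    return requested < log_start or requested > log_end


# A consumer positioned at offset 150 while the log covers [100, 500].
consumer_offset = 150
print(is_offset_out_of_range(consumer_offset, log_start=100, log_end=500))

# Retention then truncates old segments, so the log now starts at 200.
# The same consumer offset is suddenly below the valid range.
print(is_offset_out_of_range(consumer_offset, log_start=200, log_end=500))
```

The second call shows the scenario mentioned above: nothing changed on the consumer side, but retention moved the start of the log past the consumer's position.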

Re: Issue with async producer

2014-11-04 Thread Jun Rao
Which version of Kafka are you using? Is the broker I/O or network saturated? If so, that will limit the throughput that each producer can achieve. If not, using a larger number of messages per batch and/or enabling producer-side compression typically improves the producer throughput. Thanks, Jun O

Re: Spark Kafka Performance

2014-11-04 Thread Bhavesh Mistry
Hi Eduardo, Can you please take a thread dump and see if there are blocking issues on the producer side? Do you have a single producer instance and multiple threads? Are you using the Scala producer or the new Java producer? Also, what are your producer properties? Thanks, Bhavesh On Tue, Nov 4, 2014 a

Re: OffsetOutOfRange Error

2014-11-04 Thread Shangan Chen
Hi Jim, Maybe your consumer lagged behind the current smallest offset. As for why it happened, you might take a look at this ticket https://issues.apache.org/jira/browse/KAFKA-1640 On Wed, Nov 5, 2014 at 8:46 AM, Guozhang Wang wrote: > Hi Jim, > > OffsetOutOfRange means that the partition's log of

Re: Tuning replication

2014-11-04 Thread Todd Palino
We have our threshold for under replicated set at anything over 2. The reason we picked that number is because we have a cluster that tends to take very high traffic for short periods of time, and 2 gets us around the false positives (with a careful balance of the partitions in the cluster). We're
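
The alerting rule described here (alert on anything over 2) could be sketched roughly as below. The broker names and counts are invented sample data, and the helper is hypothetical; how the under-replicated counts are actually collected is outside this sketch.

```python
# Sketch of a threshold check on per-broker under-replicated partition counts.
# The threshold of 2 comes from the message above; everything else is illustrative.
UNDER_REPLICATED_THRESHOLD = 2


def brokers_to_alert(under_replicated_by_broker, threshold=UNDER_REPLICATED_THRESHOLD):
    """Return the brokers whose count is strictly over the threshold (hypothetical helper)."""
    return sorted(
        broker
        for broker, count in under_replicated_by_broker.items()
        if count > threshold
    )


# Made-up sample: a count of exactly 2 does not alert, anything over 2 does.
sample_counts = {"broker-1": 0, "broker-2": 2, "broker-3": 7}
print(brokers_to_alert(sample_counts))
```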

consumer ack for high-level consumer?

2014-11-04 Thread Chia-Chun Shih
Hi, I am new to Kafka. In my understanding, the high-level consumer (ZookeeperConsumerConnector) changes the offset when a message is drawn by ConsumerIterator. But I would like to change the offset when a message is processed, not when it is drawn from the broker. So if a consumer dies before a message is co

Location of Logging Files/How To Turn On Logging For Kafka Components

2014-11-04 Thread Alex Melville
Background: I have searched for a while online, and through the files located in the kafka/logs directory, trying to find where Kafka writes its log output so that I can debug the SimpleProducer I wrote. My producer is almost identical to the simple producer located here https://cwiki.apache.org/con