Monitor number of messages in and out for per topic at broker side

2014-07-11 Thread ravi singh
Hi Kafka Users, Is it possible to monitor the number of messages in and out per topic on the broker side? There is an MBean for "AllTopicsMessagesInPerSec", but I couldn't find anything on a per-topic basis. Also, what does the "Byte out rate" MBean indicate, because as per my understanding only t

Re: Consumer rebalancing retry settings and reconnecting after failure

2014-07-11 Thread Michal Michalski
Hey Guozhang, Thanks for the reply. I get your point on "hiding" some issues, but I'd prefer to separate recovery from reporting a failure. Also, I think if a simple restart is a possible solution, it shouldn't require implementing it separately or, what's even worse, manual intervention. Maybe I'l

Syslog Collector to Kafka 0.8 -- in Go

2014-07-11 Thread Philip O'Toole
I went looking for a Syslog Collector, written in Go, which would stream to Kafka. I couldn't find any, so I put one together myself -- others might be interested. It optionally performs basic parsing of an RFC5424 header too, before streaming the messages to Kafka. As always, YMMV. https://gith

Re: Syslog Collector to Kafka 0.8 -- in Go

2014-07-11 Thread Joe Stein
Awesome, thanks Philip! I updated the ecosystem page https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem /*** Joe Stein Founder, Principal Consultant Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop

Re: Handling of failure to send message in kafka async producer

2014-07-11 Thread Guozhang Wang
Prashant, One of the trade-offs between sync and async producers is performance vs. ordering. To guarantee the scenario you described, one has to use a sync producer, since the async producer is not designed for such guarantees. Guozhang On Thu, Jul 10, 2014 at 11:03 PM, Prashant Prakash wrote

Re: Monitor number of messages in and out for per topic at broker side

2014-07-11 Thread Guozhang Wang
There is indeed a per-topic metric for bytes in and bytes out. They should be named [TopicName]BytesInPerSec, etc. The server records the bytes-out rate whenever it sends a response to consumer/producer clients. Guozhang On Fri, Jul 11, 2014 at 3:38 AM, ravi singh wrote: > Hi Kafka Users, > >

Re: Consumer rebalancing retry settings and reconnecting after failure

2014-07-11 Thread Guozhang Wang
Hi Michal, In your case you could try increasing the ZooKeeper session timeout value on the consumer side (default is 6 sec) and see if this is sufficient to cover the latency jitter. Guozhang On Fri, Jul 11, 2014 at 5:25 AM, Michal Michalski < michal.michal...@boxever.com> wrote: > Hey Guoz

Re: Monitor number of messages in and out for per topic at broker side

2014-07-11 Thread ravi singh
Couldn't find it in the documentation though. http://kafka.apache.org/documentation.html#monitoring I specifically need 'MessagesInPerSec'; is there any way to deduce it from 'BytesIn/OutPerSecond'? On Fri, Jul 11, 2014 at 8:43 PM, Guozhang Wang wrote: > There is indeed a per-topic metric for

Re: Monitor number of messages in and out for per topic at broker side

2014-07-11 Thread Guozhang Wang
"MessagesInPerSec" also exists per-topic, but the sensor is only created once messages have been produced to the brokers for that topic. On Fri, Jul 11, 2014 at 9:20 AM, ravi singh wrote: > Couldn't find it in the documentation though. > http://kafka.apache.org/documentation.html#m

Re: Handling of failure to send message in kafka async producer

2014-07-11 Thread Jun Rao
The new producer (which will be in beta in the next release) will support a callback on async send. Thanks, Jun On Thu, Jul 10, 2014 at 11:03 PM, Prashant Prakash wrote: > Hi Gouzhang, > > Monitoring through JMX mbean will be an indirect way to detect producer > failure. > > In our requirement

Re: Monitor number of messages in and out for per topic at broker side

2014-07-11 Thread ravi singh
Thanks Guozhang!! Just checked it after publishing a few messages; "MessagesInPerSec" exists per topic. But why isn't there a "MessagesOutPerSec" similar to BytesOutPerSecond? Regards, Ravi On Fri, Jul 11, 2014 at 10:01 PM, Guozhang Wang wrote: > "MessagesInPerSec" also exists per-top

Re: Monitor number of messages in and out for per topic at broker side

2014-07-11 Thread Guozhang Wang
We do not have the "MessagesOutPerSec" because when returning messages to consumers in the fetch response, we do not decompress the message set if it is compressed, and hence do not really know the number of raw messages out. It can be roughly estimated as 'BytesOut/MessageSize'. On Fri, Jul 11,
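The rough estimate mentioned above ('BytesOut/MessageSize') can be sketched as a one-line calculation. This is only an illustration: the function name and the sample numbers are made up, and the average message size has to be supplied by the user. If the topic is compressed, the bytes-out rate counts compressed bytes, so the size used should be the average compressed size per message.

```python
def estimate_messages_out_per_sec(bytes_out_per_sec, avg_message_size_bytes):
    """Rough per-topic messages-out rate derived from the bytes-out rate.

    Both arguments are assumptions supplied by the caller: the first would
    come from the per-topic BytesOutPerSec metric, the second from knowledge
    of the data being published.
    """
    if avg_message_size_bytes <= 0:
        raise ValueError("avg_message_size_bytes must be positive")
    return bytes_out_per_sec / avg_message_size_bytes


# Illustrative numbers only: 2 MB/s out with 2000-byte messages.
rate = estimate_messages_out_per_sec(2_000_000, 2000)
print(rate)
```

The result is only as good as the assumed message size, which is why it is described as a rough estimate in the thread.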

Re: Handling of failure to send message in kafka async producer

2014-07-11 Thread vipul jhawar
Hi Jun, We tested the sync producer, but it's impossible to use it with that performance. We even tried batching messages into one large message to decrease the acks, but saw little improvement. As you mentioned the next release, I saw an update - http://grokbase.com/t/kafka/users/145m2sabrt/async-producer-c

Re: Handling of failure to send message in kafka async producer

2014-07-11 Thread Guozhang Wang
The new producer is already in trunk; you can find it in org.apache.kafka.clients.producer.KafkaProducer Guozhang On Fri, Jul 11, 2014 at 11:18 AM, vipul jhawar wrote: > Hi Jun > > We tested the sync, but its impossible to use it with that performance. > Tried even by batching messages in one

request.required.acks=-1 under high data volume

2014-07-11 Thread Jiang Wu (Pricehistory) (BLOOMBERG/ 731 LEX -)
Hi, I'm doing stress and failover tests on a 3-node 0.8.1.1 Kafka cluster and have the following observations. A topic is created with 1 partition and 3 replicas. request.required.acks is set to -1 for a sync producer. When the publishing speed is high (3M messages, each 2000 bytes, publis

Re: Monitor number of messages in and out for per topic at broker side

2014-07-11 Thread ravi singh
Thanks a lot for clearing up the doubts, Wang. On Jul 11, 2014 11:13 PM, "Guozhang Wang" wrote: > We do not have the "MessagesOutPerSec" because when returning messages to > consumers in the fetch response, we do not decompress the message set if it > is compressed, and hence do not really know the n

Re: producer performance

2014-07-11 Thread Chen Song
Thanks Guozhang. I have a follow-up question. Say I can push 1M messages/s (size 100) into a 2-node cluster. Is it safe to say I should be able to produce roughly 2M messages/s into a 4-node cluster? On Tue, Jul 8, 2014 at 6:00 PM, Guozhang Wang wrote: > The second script is using the ne

Re: producer performance

2014-07-11 Thread Guozhang Wang
It depends on your network and the producer CPU. If no compression is used, a Kafka producer will usually saturate the network bandwidth before it uses up the CPU. On Fri, Jul 11, 2014 at 1:19 PM, Chen Song wrote: > Thanks Guaozhang. > > I have a follow-up question. > > Say if I ca
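To make the network-bound case concrete, here is a back-of-the-envelope sketch of the ceiling a single producer's link puts on throughput. The function name, the 1 Gbit/s link speed, and the 100-byte message size are assumptions for illustration; the calculation ignores protocol overhead, replication traffic, and batching effects, so real numbers will be lower.

```python
def network_bound_messages_per_sec(link_bits_per_sec, message_size_bytes):
    """Upper bound on messages/s when the producer's network link is the bottleneck.

    Hypothetical helper: ignores request headers, TCP overhead, and replication.
    """
    if message_size_bytes <= 0:
        raise ValueError("message_size_bytes must be positive")
    bytes_per_sec = link_bits_per_sec / 8  # convert bits to bytes
    return bytes_per_sec / message_size_bytes


# Assumed example: a 1 Gbit/s link and 100-byte messages.
ceiling = network_bound_messages_per_sec(1_000_000_000, 100)
print(ceiling)
```

If a producer is already close to such a ceiling, adding brokers alone would not be expected to double its throughput, which is consistent with the "it depends" answer above.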

Re: request.required.acks=-1 under high data volume

2014-07-11 Thread Guozhang Wang
Hello Jiang, That is a valid point. The reason we designed ack=-1 to be "receive acks from replicas in ISR" is basically trading consistency for availability. I think instead of changing its meaning, we could add another ack, -2 for instance, to specify "receive acks from all replicas" as a favor of co
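The difference between the two ack modes discussed here can be illustrated with a toy model. This is not Kafka code: the function and broker ids are hypothetical, and -2 is only the value proposed in this thread, not an existing setting. The sketch assumes a write is acknowledged once every required replica has acked it.

```python
def write_is_acknowledged(acks_mode, replicas, isr, acked):
    """Toy model of when a produce request would be acknowledged.

    acks_mode -1: wait for every replica currently in the ISR.
    acks_mode -2: wait for every assigned replica (proposal from this thread only).
    """
    if acks_mode == -1:
        required = set(isr)
    elif acks_mode == -2:
        required = set(replicas)
    else:
        raise ValueError("this sketch only models acks -1 and -2")
    return required <= set(acked)


# Followers 2 and 3 have fallen behind, so the ISR has shrunk to the leader.
replicas, isr, acked = {1, 2, 3}, {1}, {1}
print(write_is_acknowledged(-1, replicas, isr, acked))  # leader alone suffices
print(write_is_acknowledged(-2, replicas, isr, acked))  # blocked until 2 and 3 ack
```

In this model, -1 stays available when followers lag but may acknowledge data held by the leader only, while -2 cannot acknowledge anything if any single replica is down.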

Re: request.required.acks=-1 under high data volume

2014-07-11 Thread Jiang Wu (Pricehistory) (BLOOMBERG/ 731 LEX -)
Hi Guozhang, KAFKA-1537 has been created. https://issues.apache.org/jira/i#browse/KAFKA-1537 I'll try to see if I'm able to submit a patch for this, but cannot commit to a date, so please feel free to assign it to others. Regards, Jiang - Original Message - From: wangg...@gmail.com To: JIANG WU

Re: Facing issues with Kafka 0.8.1.1 and kafka-reassign-partitions.sh

2014-07-11 Thread Clark Haskins
I have written such a script. It balances the cluster by the data size on disk. It is written using lots of internal tools, which is why it's not open-sourced. I plan to re-write it without the internal tooling. In terms of leader balancing, when using the partition-reassignment script, whichever br

Re: request.required.acks=-1 under high data volume

2014-07-11 Thread Jay Kreps
I think the root problem is that replicas are falling behind and hence are effectively "failed" under normal load and also that you have unclean leader election enabled which "solves" this catastrophic failure by electing new leaders without complete data. Starting in 0.8.2 you will be able to sel

Re: Facing issues with Kafka 0.8.1.1 and kafka-reassign-partitions.sh

2014-07-11 Thread Florian Dambrine
Thanks for all your advice! I am working on my script to improve it. I am trying to find a way to select the best partition to relocate to the newly added broker (taking into account the number of partitions led by the replica brokers). I really want to avoid swaps to rebalance the

Re: kafka simpleconsumer question

2014-07-11 Thread Weide Zhang
Hi Guozhang, I have a couple more questions about using the simpleConsumer API call. It seems that when constructing the fetchRequest, it needs a parameter called FetchSize. Do you know what the purpose of this parameter is? Does that mean the call to Kafka won't be returned until the response mess

Re: request.required.acks=-1 under high data volume

2014-07-11 Thread Jun Rao
I am not sure if receiving acks from all replicas makes sense though. It means that none of the replicas can fail. However, the purpose of having multiple replicas is to be able to tolerate failures. Thanks, Jun On Fri, Jul 11, 2014 at 11:49 AM, Jiang Wu (Pricehistory) (BLOOMBERG/ 731 LEX -) w