Re: reprocessing messages in kafka

2013-08-02 Thread Oleg Ruchovets
Hi, I found this capability in the Storm spout: https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka Another very useful config in the spout is the ability to force the spout to rewind to a previous offset. You do forceStartOffsetTime on the spout config, like so: spoutConfig.forceSt

RE: Consumer pauses when running many threads

2013-08-02 Thread Sybrandy, Casey
Yes, we have. Our SA where this is occurring has been monitoring this. When the consumers went down, we could see that things were lagging. Yesterday, they lowered the number of threads for the consumers to six each and they haven't shut down yet. There appears to still be some lag, but sinc

[0.8] Can Kafka producer discover nodes from Zookeeper?

2013-08-02 Thread Haithem Jarraya
Hi, I am new to Kafka, and I am a bit confused about this particular use case. We are running a global Zookeeper cluster and I was wondering how I can have my Kafka producer discover Kafka hostnames? I saw that in the server.properties I can specify host.name and at that point I will be able to s

Re: Consumer pauses when running many threads

2013-08-02 Thread Jun Rao
That's right. In 0.7, # partitions is per broker. However, in 0.8, # partitions is for the whole cluster. Thanks, Jun On Fri, Aug 2, 2013 at 8:13 AM, Sybrandy, Casey < casey.sybra...@six3systems.com> wrote: > Yes, we have. Our SA where this is occurring has been monitoring this. > When the c

Re: [0.8] Can Kafka producer discover nodes from Zookeeper?

2013-08-02 Thread Jun Rao
Kafka producers identify brokers registered in ZK through a getMetadata request to one of the brokers, not through ZK directly. Two Kafka clusters can share the same ZK cluster, as long as they use different ZK namespaces. Thanks, Jun On Fri, Aug 2, 2013 at 8:23 AM, Haithem Jarraya wrote: > Hi

Re: EventHandler in 0.8

2013-08-02 Thread Jun Rao
Chris, Thanks for bringing up this part. Yes, in 0.8, we don't really support a custom event handler. This is because (1) the producer send logic is a bit more complicated since it has to issue metadata requests whenever leaders change; (2) we are not aware of too many use cases of custom event ha

Re: [0.8] Can Kafka producer discover nodes from Zookeeper?

2013-08-02 Thread Chris Hogue
A couple of notes on name-spacing since there are a couple of gotchas when trying to configure it: It's specified via the zookeeper.connect property described here: http://kafka.apache.org/08/configuration.html If you have multiple servers in the zk cluster rather than a standalone node the path
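A minimal sketch of the name-spacing point above, assuming the format given in the Kafka configuration docs for zookeeper.connect: a comma-separated list of host:port pairs, with the namespace (chroot) path written once at the very end rather than after each host. The helper name `zk_connect`, the host names, and the `/kafka-a` path are hypothetical and only illustrate the string layout.

```python
def zk_connect(hosts, chroot=""):
    """Build a zookeeper.connect value.

    hosts  -- list of "host:port" strings (hypothetical names in the example)
    chroot -- optional namespace path; it is appended once, after the
              last host, not repeated per host
    """
    if not hosts:
        raise ValueError("at least one host:port entry is required")
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot path must start with '/'")
    # Join the hosts first, then add the namespace path a single time.
    return ",".join(hosts) + chroot.rstrip("/")


if __name__ == "__main__":
    # Two clusters sharing one ZK ensemble would each use their own path.
    print("zookeeper.connect=" + zk_connect(["zk1:2181", "zk2:2181"], "/kafka-a"))
```

The resulting value would go into server.properties on the brokers, and the same string, including the path, into any consumer configuration that connects to ZooKeeper.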

EventHandler in 0.8

2013-08-02 Thread Chris Hogue
We're a heavy 0.7 user and are now digging into 0.8 for some new projects. One of the features we used in 0.7 appears to be different and not clearly supported in 0.8. We use the EventHandler plug-point in 0.7, specifically to do custom batching before the messages are actually sent to the broker.

Re: EventHandler in 0.8

2013-08-02 Thread Chris Hogue
Hi Jun. Yeah, I assumed this was a little-used hook so wasn't altogether surprised it's not supported in 0.8. For our current problem we'll have to dig a little further on how to approach it. Batching and compressing before sending to the producer is straight-forward enough, but the semantic parti

Re: [0.8] Can Kafka producer discover nodes from Zookeeper?

2013-08-02 Thread Haithem Jarraya
Thanks a lot Chris, you saved us a lot of time with this. -Haithem On 2 Aug 2013, at 16:58, Chris Hogue wrote: > A couple of notes on name-spacing since there are a couple of gotchas when > trying to configure it: > > It's specified via the zookeeper.connect property described here: > > http:/

RE: Fatal issue (was RE: 0.8 throwing exception "Failed to find leader" and high-level consumer fails to make progress)

2013-08-02 Thread Hargett, Phil
I've attached a patch to KAFKA-989: https://issues.apache.org/jira/browse/KAFKA-989 Does this patch seem valid? I'm not sure about any locking constraints / order in Kafka code, but this seemed like a benign change that might help the cause. This change should eliminate dangerous races between

Re: Client improvement discussion

2013-08-02 Thread Chris Hogue
These sound like great steps. A couple of votes and questions: 1. Moving serialization out and basing it all off of byte[] for key and payload makes sense. Echoing a response below, we've ended up doing that in some cases anyway, and the others do a trivial transform to bytes with an Encoder. 2

Re: Client improvement discussion

2013-08-02 Thread Jay Kreps
I believe there are some open source C++ producer implementations. At linkedin we have a C++ implementation. We would like to open source this if there is interest. We would like to eventually include a C++ consumer as well. -Jay On Mon, Jul 29, 2013 at 6:03 AM, Sybrandy, Casey < casey.sybra...@

Re: Client improvement discussion

2013-08-02 Thread Jay Kreps
Great comments, answers inline! On Fri, Aug 2, 2013 at 12:28 PM, Chris Hogue wrote: > These sounds like great steps. A couple of votes and questions: > > 1. Moving serialization out and basing it all off of byte[] for key and > payload makes sense. Echoing a response below, we've ended up doing

Relative cluster sizes and cluster size limits

2013-08-02 Thread Scott Arthur
Hi, I have a question about scaling the broker count of a Kafka cluster. We have a scenario where we'll have two clusters replicating data into a third. We're wondering how we should size that third cluster so that it can handle the volume of messages from the two source clusters. Should we jus

Re: java.net.SocketException: Too many open files

2013-08-02 Thread Felix GV
We've had this problem with Zookeeper... Setting ulimit properly can occasionally be tricky because you need to logout and re-ssh into the box for the changes to take effect on the next processes you start up. Another problem we've hit was that our puppet service was running in the background and
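One way to confirm which limit a process actually received, sketched here under the assumption of a Unix-like host: read the open-file limit from inside the process itself rather than trusting the shell's ulimit output. The function name `open_file_limits` is hypothetical; `resource.getrlimit` and `RLIMIT_NOFILE` are from the Python standard library.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits seen by this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # A process started before the ulimit change (or by a background
    # service) may still report the old values here.
    print("open files: soft=%d hard=%d" % (soft, hard))
```

Running this from the same launcher that starts the broker or ZooKeeper shows whether the new limit has taken effect for processes started that way.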

Re: Client improvement discussion

2013-08-02 Thread Chris Hogue
Thanks for the responses. Additional follow-up inline. On Fri, Aug 2, 2013 at 2:21 PM, Jay Kreps wrote: > Great comments, answers inline! > > On Fri, Aug 2, 2013 at 12:28 PM, Chris Hogue wrote: > > > These sounds like great steps. A couple of votes and questions: > > > > 1. Moving serializati

Re: Client improvement discussion

2013-08-02 Thread Jay Kreps
Cool. With respect to compression performance, we definitely see the same thing, no debate. Of course if you want to just compress the message payloads you can do that now without needing much help from kafka--just pass in the compressed data. Whether or not it will do much depends on the size of

Re: Relative cluster sizes and cluster size limits

2013-08-02 Thread Jay Kreps
Hi Scott, What version of Kafka is this? In general our throughput will scale linearly with the number of machines or more specifically the number of disks. Our bottleneck will really be with the number of partitions. With thousands of partitions leader election can get slower (seconds), and if y

[0.8 + scala 2.10 patches] [error] Failed tests: kafka.log.LogTest

2013-08-02 Thread Rob Withers
I built 0.8, fresh, with the scala 2.10 patches from KAFKA-717's tgz, on my Macbook. I ran sbt tests (after running sbt eclipse) and got the following error: [error] Failed: : Total 180, Failed 1, Errors 0, Passed 179, Skipped 0 [error] Failed tests: [error] kafka.log.LogTest [error] (c

compression performance

2013-08-02 Thread Jay Kreps
Chris commented in another thread about the poor compression performance in 0.8, even with snappy. Indeed if I run the linear log write throughput test on my laptop I see 75MB/sec with no compression and 17MB/sec with snappy. This is a little surprising as snappy claims 200MB round-trip performan