Re: heterogeneous kafka cluster?

2013-05-17 Thread Jason Rosenberg
Letting each broker have a weight sounds like a great idea. Since in my use case, topics are generally auto-created, it won't be practical to map brokers manually per topic. Thanks, Jason On Fri, May 17, 2013 at 8:38 PM, Jun Rao wrote: > In 0.8, you can create topics manually and explicitly

Re: scala questions

2013-05-17 Thread Jun Rao
Scala compiles to java bytecode. You can use java objects in scala and vice versa. Thanks, Jun On Fri, May 17, 2013 at 4:48 PM, Rob Withers wrote: > I've gotten to know y'all a bit, so I would like to ask my question here. > :) > > I am fairly unfamiliar with Scala, having worked a chapter o

Re: API to query messages amount under one topic

2013-05-17 Thread Jun Rao
Yes, that's probably what you are looking for. It tells you the # of unconsumed messages per partition, for a particular consumer. Thanks, Jun On Fri, May 17, 2013 at 4:52 PM, Rob Withers wrote: > Found out about the following: > > JMX -> “kafka.server” -> (various FetcherThread)-ConsumerLag
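
For anyone who wants to read that lag programmatically rather than through jconsole, a plain JMX client is enough. The sketch below is illustrative only: broker-host:9999 is a placeholder, and the bean names and the "Value" attribute are assumptions that differ between Kafka versions, so verify the exact names with jconsole first.

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ConsumerLagReader {
        public static void main(String[] args) throws Exception {
            // Connect to the JVM that exposes the fetcher metrics (placeholder host/port).
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
            MBeanServerConnection conn =
                JMXConnectorFactory.connect(url).getMBeanServerConnection();
            // List all registered beans and keep the ones whose name mentions ConsumerLag.
            Set<ObjectName> names = conn.queryNames(new ObjectName("*:*"), null);
            for (ObjectName name : names) {
                if (name.getCanonicalName().contains("ConsumerLag")) {
                    // "Value" is assumed to be the gauge attribute; adjust if your version differs.
                    System.out.println(name + " = " + conn.getAttribute(name, "Value"));
                }
            }
        }
    }

Note the lag is measured in messages (an offset difference), not in time, so it answers the queue-length question directly; latency between produce and consume would still need a separate measurement.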

Re: heterogeneous kafka cluster?

2013-05-17 Thread Jun Rao
In 0.8, you can create topics manually and explicitly specify the replica to broker mapping. Post 0.8, we can think of some more automated ways to deal with this (e.g., let each broker carry some kind of weight). Thanks, Jun On Fri, May 17, 2013 at 2:29 PM, Jason Rosenberg wrote: > Hi, > > I'

Re: API to query messages amount under one topic

2013-05-17 Thread Rob Withers
Found out about the following: JMX -> “kafka.server” -> (various FetcherThread)-ConsumerLag. This sounds like the latency, in time. Is it so? thanks, rob On May 17, 2013, at 11:49 AM, "Withers, Robert" wrote: > Could you add some JMX stats for us, then? > > - Queue length, by group offset

scala questions

2013-05-17 Thread Rob Withers
I've gotten to know y'all a bit, so I would like to ask my question here. :) I am fairly unfamiliar with Scala, having worked a chapter or 2 out of a scala book I bought. My understanding is that it is both an object language and a functional language. The only language I am extremely familia

Re: heterogeneous kafka cluster?

2013-05-17 Thread Francis Dallaire
Well if you're willing to manage the partitions by yourself, I think it would be possible. You would have to create more partitions for your topics and then assign more partitions to the new machines. If Kafka were able to rebalance automatically, doing this by not only considering the number

Re: heterogeneous kafka cluster?

2013-05-17 Thread Jason Rosenberg
Yeah, I thought of that (running 2 kafkas on one box), but it doesn't really add the benefit of redundancy through replication (e.g. if we have 2 replicas mapping to the same physical machine). Jason On Fri, May 17, 2013 at 2:50 PM, Chris Riccomini wrote: > Hey guys, > > I have no idea if this

Re: heterogeneous kafka cluster?

2013-05-17 Thread Chris Riccomini
Hey guys, I have no idea if this would be reasonable, but what about just running two Kafka processes on the bigger box? Cheers, Chris On 5/17/13 2:48 PM, "Jason Rosenberg" wrote: >Just resource allocation issues. E.g. imagine having an existing kafka >cluster with one machine spec, and getti
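
If anyone does go this route, each process just needs its own copy of server.properties with a distinct broker id, port, and log directory, roughly like the excerpt below (0.8-style property names; 0.7 used brokerid and log.dir, and all values are placeholders):

    # server-1.properties
    broker.id=1
    port=9092
    log.dirs=/data/disk1/kafka-logs

    # server-2.properties
    broker.id=2
    port=9093
    log.dirs=/data/disk2/kafka-logs

Both processes can point at the same ZooKeeper ensemble and will register as independent brokers.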

Re: heterogeneous kafka cluster?

2013-05-17 Thread Jason Rosenberg
Just resource allocation issues. E.g. imagine having an existing kafka cluster with one machine spec, and getting access to a few more hosts to augment the cluster, which are newer and therefore have twice the disk storage. I'd like to seamlessly add them into the cluster, without having to repla

Re: Update: RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Alex Zuzin
Neha, apologies, I just re-read what I sent and realized my "you" wasn't specific enough - it meant the Kafka team ;). -- "If you can't conceal it well, expose it with all your might" Alex Zuzin On Friday, May 17, 2013 at 2:25 PM, Alex Zuzin wrote: > Have you considered abstracting offset s

Re: heterogeneous kafka cluster?

2013-05-17 Thread Neha Narkhede
That does seem a little hacky. But I'm trying to understand the requirement behind having to deploy heterogeneous hardware. What are you trying to achieve or optimize? Thanks, Neha On Fri, May 17, 2013 at 2:29 PM, Jason Rosenberg wrote: > Hi, > > I'm wondering if there's a good way to have a h

heterogeneous kafka cluster?

2013-05-17 Thread Jason Rosenberg
Hi, I'm wondering if there's a good way to have a heterogeneous kafka cluster (specifically, if we have nodes with different-sized disks). So, we might want a larger node to receive more messages than a smaller node, etc. I expect there's something we can do with using a partitioner that has spec

Re: Update: RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Alex Zuzin
Have you considered abstracting offset storage away so people could implement their own? Would you take a patch if I'd stabbed at it, and if yes, what's the process (pardon the n00b)? KCBO, -- "If you can't conceal it well, expose it with all your might" Alex Zuzin On Friday, May 17, 2013 at
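
For what it's worth, the abstraction being asked about might look roughly like the following. This is purely a hypothetical sketch, not an existing Kafka interface; the names are made up to illustrate the idea of a pluggable offset store.

    import java.util.Map;

    // Hypothetical pluggable offset store -- not part of Kafka's API, only an
    // illustration of the abstraction proposed in this thread.
    public interface OffsetStore {
        // Persist the latest consumed offset for each (topic, partition) pair of a group.
        void commit(String group, Map<TopicPartition, Long> offsets);

        // Read back the last committed offset, or -1 if none was stored.
        long fetch(String group, TopicPartition partition);
    }

    // Minimal value object identifying one partition of a topic
    // (a real implementation would also define equals/hashCode for use as a map key).
    class TopicPartition {
        final String topic;
        final int partition;
        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

A ZooKeeper-backed implementation would reproduce today's behaviour, while an implementation over a faster store would address the write-rate concern raised elsewhere in the thread.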

Re: Update: RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Scott Clasen
to clarify, I meant that Robert could/should store the offsets in a faster store not that kafka should default to that :) Thanks Neha On Fri, May 17, 2013 at 2:22 PM, Neha Narkhede wrote: > There is no particular need for storing the offsets in zookeeper. In fact > with Kafka 0.8, since partiti

Re: can producers(from same system) send messages to separate broker systems?

2013-05-17 Thread Neha Narkhede
Do you have any tests that measure that your high priority data is being delayed ? Assuming you are using 0.8, the end to end latency can be reduced by tuning some configs on the consumer (fetch.min.bytes, fetch.wait.max.ms ). The defaults for these configs are already tuned for low latency though.
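
Concretely, those two settings go into the consumer's Properties. A rough sketch, assuming the 0.8 high-level consumer; the values, zkhost, and group name are placeholders, not recommendations:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class LowLatencyConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:2181");   // placeholder
            props.put("group.id", "priority-consumer");      // placeholder
            props.put("fetch.min.bytes", "1");       // return a fetch as soon as any data is available
            props.put("fetch.wait.max.ms", "100");   // cap how long the broker may hold an empty fetch
            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... createMessageStreams(...) and consume as usual ...
            connector.shutdown();
        }
    }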

Re: Update: RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Neha Narkhede
There is no particular need for storing the offsets in zookeeper. In fact with Kafka 0.8, since partitions will be highly available, offsets could be stored in Kafka topics. However, we haven't ironed out the design for this yet. Thanks, Neha On Fri, May 17, 2013 at 2:19 PM, Scott Clasen wrote:

Re: Setting a broker to be read-only?

2013-05-17 Thread Neha Narkhede
There is no easy way to do this in 0.7. If you are using a VIP to produce data to Kafka, you can remove that broker host from the VIP. It will still be available for consumer reads until it is shut down but won't accept new data from the producer. In 0.8, since there are many replicas, you will not

Re: Update: RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Scott Clasen
afaik you don't 'have' to store the consumed offsets in zk, right? This is only automatic with some of the clients? Why not store them in a data store that can write at the rate that you require? On Fri, May 17, 2013 at 2:15 PM, Withers, Robert wrote: > Update from our OPS team, regarding zookeep

Re: Getting the latest message always from Kafka

2013-05-17 Thread Neha Narkhede
For every unit test, you want your server and consumer to start fresh. In your unit test, you maintain state in the KafkaManager across unit tests so your tests don't work as expected. Try moving KafkaManager to the setup method. Currently, the 2nd test is consuming the message produced by the 1st

Update: RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Withers, Robert
Update from our OPS team, regarding zookeeper 3.4.x. Given stability, adoption of offset batching would be the only remaining bit of work to do. Still, I totally understand the restraint for 0.8... "As an exercise in upgradability of zookeeper, I did an "out-of-the-box" upgrade on Zookeeper. I d

Re: Our use case and am I right in my definition of Throughput?

2013-05-17 Thread Neha Narkhede
Another parameter that you want to tune if you want higher throughput in 0.7 is the broker side flush interval. If you set it to something high, like 50K or 100K, you can maximize the throughput achieved by your producer. Thanks, Neha On Fri, May 17, 2013 at 8:11 AM, Jun Rao wrote: > You earli
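
In 0.7 the flush interval is a broker setting in config/server.properties, along these lines (values purely illustrative; check the broker configuration docs for the exact semantics):

    # Number of messages written to a log partition before an fsync is forced:
    log.flush.interval=50000
    # Time-based flush backstop, in milliseconds:
    log.default.flush.interval.ms=3000

The trade-off is durability on a hard crash: with a large interval, more recently acknowledged messages can be lost if the broker dies before the flush.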

Re: InvalidMessageException problems

2013-05-17 Thread Jason Weiss
Turns out it was OpenJDK on the AWS AMI instance. As soon as I replaced OpenJDK: java version "1.6.0_24" OpenJDK Runtime Environment (IcedTea6 1.11.11) (amazon-61.1.11.11.53.amzn1-x86_64) OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) With Oracle JDK 1.6.0_38, all of my problems went away

RE: could an Encoder/Decoder be stateful?

2013-05-17 Thread Withers, Robert
Thanks, I created KAFKA-909 for the feature request. As an aside, I see that in Jira, the available kafka versions are 0.8, 0.8.1 and 0.9. I see the branch in github for 0.8, does this mean that the trunk is 0.8.1? And 0.9 is future planning on jira, but not established in github, yet? Thanks,

Setting a broker to be read-only?

2013-05-17 Thread Adam Phelps
We have a realtime analysis system set up using a Kafka 0.7 cluster to queue incoming data, but occasionally have a need to take one of the nodes offline. Is there a method to set a broker to be read-only so that we can drain unprocessed data from it prior to shutting it down? - Adam

RE: could an Encoder/Decoder be stateful?

2013-05-17 Thread Withers, Robert
I shall. Thanks! Thanks, Rob Withers Staff Analyst/Developer o: (720) 514-8963  c: (571) 262-1873 -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Friday, May 17, 2013 8:53 AM To: users@kafka.apache.org Subject: Re: could an Encoder/Decoder be stateful? Possible,

RE: API to query messages amount under one topic

2013-05-17 Thread Withers, Robert
Could you add some JMX stats for us, then? - Queue length, by group offset vs lastOffset - latency between produce and consume, again by group Thanks, Rob Withers Staff Analyst/Developer o: (720) 514-8963  c: (571) 262-1873 -Original Message- From: Jun Rao [mailto:jun...@gmail.co

RE: WebMethods consumer for Kafka.

2013-05-17 Thread Neha Narkhede
3.3.3 has known serious bugs. You should at least use 3.3.4. I am not aware of a JMS bridge. Contributions are welcome :) Thanks, Neha On May 17, 2013 8:36 AM, "Seshadri, Balaji" wrote: > Hi Neha, > > I figured out it was issue with jar dependencies,I tried to use latest > stable version of zook

RE: WebMethods consumer for Kafka.

2013-05-17 Thread Seshadri, Balaji
Hi Neha, I figured out it was an issue with jar dependencies; I tried to use the latest stable version of zookeeper (3.4.5) and it broke. It is working fine now with 3.3.3. I have one more question :). Do you guys have a JMS bridge to Kafka, because webMethods supports only proprietary and JMS spec, so hooki

Re: InvalidMessageException problems

2013-05-17 Thread Jason Weiss
I am using the Java producer, and it is reproducible. When you say that the producer sends the corrupted data, are you referring to the producer as a black box, or something in my code? Where I'm getting stuck is that my producer data seems so ridiculously simple - create a ProducerData, following

Re: InvalidMessageException problems

2013-05-17 Thread Jun Rao
This indicates the messages sent to the broker are corrupted. Typically, this is because either the producer sends the corrupted data somehow or the network is flaky. Are you using a java producer? Is this reproducible? Thanks, Jun On Fri, May 17, 2013 at 7:08 AM, Jason Weiss wrote: > I have

Re: Our use case and am I right in my definition of Throughput?

2013-05-17 Thread Jun Rao
Your earlier email seems to focus on latency, not throughput. Typically, you can either optimize for latency or throughput, but not both. If you want higher throughput, you should consider using the async mode in the producer with a larger batch size (e.g., 1000 messages). Using more instances of pr
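
As a concrete starting point, an async batching producer in 0.7 looks roughly like this. The property names are the 0.7 ones as commonly documented (0.8 renamed batch.size to batch.num.messages), and zkhost:2181 and the topic are placeholders:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    public class AsyncBatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zk.connect", "zkhost:2181");                           // placeholder
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");   // buffer messages and send in the background
            props.put("batch.size", "1000");       // messages per batch, as suggested above
            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            producer.send(new ProducerData<String, String>("test-topic", "hello"));
            producer.close();
        }
    }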

Re: Time difference between message fetch and message send is very high (~900-1200ms)

2013-05-17 Thread Jun Rao
There are mainly 2 things to consider for latency. (1) How quickly does the producer send the message to the broker? If you use the sync mode in the producer, the message will be sent immediately (throughput will be lower in the sync mode though). If you use the async mode, the messages are sent af

Re: could an Encoder/Decoder be stateful?

2013-05-17 Thread Jun Rao
Possible, but definitely a post 0.8 item. If you are interested, could you file a jira to track this? Thanks, Jun On Thu, May 16, 2013 at 10:06 PM, Rob Withers wrote: > Could the producer be adapted to support the interface of the consumer? > > Thanks, > rob > > > -Original Message- >

Re: API to query messages amount under one topic

2013-05-17 Thread Jun Rao
For monitoring, we have JMX metrics on the broker for both the message and the byte rate. Thanks, Jun On Thu, May 16, 2013 at 10:04 PM, Rob Withers wrote: > Immediately monitoring. Later, possible thresholding to evoke a > reconfiguration of the number of partitions into a new topic and migrate > m

Getting the latest message always from Kafka

2013-05-17 Thread Kishore V. Kopalle
Hi Neha, I am sorry about the design of this program, but what I am trying to do is to expose API to read and write from Kafka. I have some unit test cases as given below. The problem is that the first test case passes but the second test case fails saying that : org.junit.ComparisonFailure: expec

InvalidMessageException problems

2013-05-17 Thread Jason Weiss
I have a simple multi-threaded app trying to send numerous fixed-length, 2048 byte or 3072 byte messages into an Apache Kafka 0.7.2 cluster (3 machines) running in AWS on some AWS AMIs. When the messaging volume increases rapidly, a spike, I start running into lots of problems, specifically Inv

Re: Time difference between message fetch and message send is very high (~900-1200ms)

2013-05-17 Thread Kishore V. Kopalle
Hi Neha, I am a newbie to Kafka. My understanding so far is that batch size is supported only for async producer. Kindly correct me if I am wrong. The best latency with async producer I am getting is of the order of 300ms. Is that expected? I have to stick to 0.7 because KafkaSpout for Storm see

RE: possible to shutdown a consumerConnector without flushing the offset

2013-05-17 Thread Withers, Robert
Perfect. By the way, I see you went to GT...you wouldn't happen to know a certain Lex Spoon, would you? thanks, rob From: Neha Narkhede [neha.narkh...@gmail.com] Sent: Friday, May 17, 2013 7:40 AM To: users@kafka.apache.org Subject: RE: possible to shutd

RE: only-once consumer groups

2013-05-17 Thread Withers, Robert
Absolutely, I will. However, I am pretty full today and I'll need more than 15 minutes to absorb what you have written, which looks very comprehensive. thanks, rob From: Neha Narkhede [neha.narkh...@gmail.com] Sent: Friday, May 17, 2013 7:32 AM To: users@

RE: possible to shutdown a consumerConnector without flushing the offset

2013-05-17 Thread Neha Narkhede
If you turn off auto.commit.enable, that will ensure that messages are replayed whenever a consumer starts up and rebalances. Thanks, Neha On May 17, 2013 6:35 AM, "Withers, Robert" wrote: > Certainly I will try. Our understanding is that there are 2 scenarios > where messages could be replaye
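
A small sketch of what that looks like from the consumer side, using 0.8-style property names (0.7 called the flag autocommit.enable); zkhost and the group id are placeholders:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class ManualCommitConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zkhost:2181");  // placeholder
            props.put("group.id", "replay-test");           // placeholder
            props.put("auto.commit.enable", "false");       // nothing is flushed to ZooKeeper automatically
            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // ... consume messages ...
            connector.commitOffsets();  // commit explicitly only when the position should be saved
            connector.shutdown();
        }
    }

Killing the process without ever calling commitOffsets() then approximates the kill -9 replay test discussed in the neighbouring thread.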

RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Withers, Robert
Fair enough, this is something to look forward to. I appreciate the restraint you show to stay out of troubled waters. :) thanks, rob From: Neha Narkhede [neha.narkh...@gmail.com] Sent: Friday, May 17, 2013 7:35 AM To: users@kafka.apache.org Subject: RE

Re: Time difference between message fetch and message send is very high (~900-1200ms)

2013-05-17 Thread Neha Narkhede
Kishore, The end to end latency will be much lower in 0.8. For 0.7, you can try to tune the producer batch size so it sends data quicker. You can also reduce the log.flush.interval on the broker but that will drive up iops. Thanks, Neha On May 16, 2013 11:39 PM, "Kishore V. Kopalle" wrote: > Hi F

RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Neha Narkhede
Upgrading to a new zookeeper version is not an easy change. Also zookeeper 3.3.4 is much more stable compared to 3.4.x. We think it is better not to club 2 big changes together. So most likely this will be a post-0.8 item for stability purposes. Thanks, Neha On May 17, 2013 6:31 AM, "Withers, Rober

RE: possible to shutdown a consumerConnector without flushing the offset

2013-05-17 Thread Withers, Robert
Certainly I will try. Our understanding is that there are 2 scenarios where messages could be replayed: 1. if a consumer falls over hard, there are some message consumptions whose offsets had not yet been flushed to zookeeper and so when a rebalance occurs the consumer that starts getting mess

RE: only-once consumer groups

2013-05-17 Thread Neha Narkhede
We spent some time thinking about consolidating the high level and low level consumer APIs. It will be great if you can read the wiki and provide feedback - https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design Thanks, Neha On May 16, 2013 10:29 PM, "Rob Withers" wrote: > W

RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Withers, Robert
Awesome! Thanks for the clarification. I would like to offer my strong vote that this get tackled before a beta, to get it firmly into 0.8. Stabilize everything else to the existing use, but make offset updates batched. thanks, rob From: Neha Narkhede

Re: possible to shutdown a consumerConnector without flushing the offset

2013-05-17 Thread Neha Narkhede
Can you provide more details about what you mean by measuring replay when you kill a consumer? Thanks, Neha On May 17, 2013 6:26 AM, "Withers, Robert" wrote: > Would it be possible for someone to provide me with a 0.8 jar that > implements a ConsumerConnector.hardShutdown, which would interrupt

Re: What happens if one broker goes down

2013-05-17 Thread Neha Narkhede
You can read the high level design of kafka replication here http://www.slideshare.net/junrao/kafka-replication-apachecon2013 Generally if your replication factor is more than 1 you shouldn't see data loss in your test. When a broker fails, the producer will get an exception and it will retry. Th

possible to shutdown a consumerConnector without flushing the offset

2013-05-17 Thread Withers, Robert
Would it be possible for someone to provide me with a 0.8 jar that implements a ConsumerConnector.hardShutdown, which would interrupt all threads yet not do a final offset flush. We want to measure replay so we want to simulate a kill -9, but we want to keep running the process to flush stats a

Re: Is it possible to channelize high priority data and ordinary data in the same kafka cluster ???

2013-05-17 Thread Neha Narkhede
First of all, do you see any delay at all for your high priority data? There are config options that you can tune so that your consumer can see the data as soon as it is available. Thanks, Neha On May 16, 2013 10:24 PM, "Chitra Raveendran" < chitra.raveend...@fluturasolutions.com> wrote: > HI > >

RE: are commitOffsets botched to zookeeper?

2013-05-17 Thread Neha Narkhede
Sorry I wasn't clear. Zookeeper 3.4.x has this feature. As soon as 0.8 is stable and released, it will be worth looking into when we can use zookeeper 3.4.x. Thanks, Neha On May 16, 2013 10:32 PM, "Rob Withers" wrote: > Can a request be made to zookeeper for this feature? > > Thanks, > rob > > > -

Re: C/C++ Client

2013-05-17 Thread anand nalya
Hi Joel, Any updates on the c++ producer? Thanks, Anand On 3 April 2013 05:59, Joel Koshy wrote: > Yes - we would be interested in doing that. I have been spending most of my > time over the past couple weeks on the C++ library (currently, only for the > producer). It is reasonably stable, alt

Re: Kafka performance

2013-05-17 Thread Kishore V. Kopalle
The timezone we are operating is Indian Standard Time (IST). So you will see times like 2013-05-17 11:54:02.973 in the messages below. On Fri, May 17, 2013 at 12:59 PM, Kishore V. Kopalle < kish...@greenmedsoft.com> wrote: > Hi Mathias, > > Yes they are. But we are talking about a single box run

Re: Kafka performance

2013-05-17 Thread Kishore V. Kopalle
Hi Mathias, Yes they are. But we are talking about a single box running ZooKeeper, the Kafka server, and the Consumer/Producer. Regards, Kishore On Fri, May 17, 2013 at 12:53 PM, Mathias Herberts < mathias.herbe...@gmail.com> wrote: > Just for the sake of it, are your clocks synchronized using NTP?

Our use case and am I right in my definition of Throughput?

2013-05-17 Thread Kishore V. Kopalle
Hello All, Our use case is to display certain aggregates on a GUI from a live stream of data coming in at more than 100k messages/sec. The question I have is whether Kafka can handle at least 100k messages/sec and feed Twitter Storm for aggregate calculations. I already know th

Re: Kafka performance

2013-05-17 Thread Mathias Herberts
Just for the sake of it, are your clocks synchronized using NTP? On Fri, May 17, 2013 at 8:28 AM, Kishore V. Kopalle wrote: > Hi Francis/Stone, > > I have modified log.default.flush.interval.ms to have a value of 1 in > config/server.properties file. The time did not come down as can be seen > fr