Letting each broker have a weight sounds like a great idea.
Since in my use case, topics are generally auto-created, it won't be
practical to map brokers manually per topic.
Thanks,
Jason
On Fri, May 17, 2013 at 8:38 PM, Jun Rao wrote:
> In 0.8, you can create topics manually and explicitly
Scala compiles to java bytecode. You can use java objects in scala and vice
versa.
Thanks,
Jun
On Fri, May 17, 2013 at 4:48 PM, Rob Withers wrote:
> I've gotten to know y'all a bit, so I would like to ask my question here.
> :)
>
> I am fairly unfamiliar with Scala, having worked a chapter o
Yes, that's probably what you are looking for. It tells you the # of
unconsumed messages per partition, for a particular consumer.
Thanks,
Jun
On Fri, May 17, 2013 at 4:52 PM, Rob Withers wrote:
> Found out about the following:
>
> JMX -> “kafka.server” -> (various FetcherThread)-ConsumerLag
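For clarity, that ConsumerLag figure is measured in messages (offsets), not in wall-clock time: it is the broker's log end offset minus the consumer's last fetched offset for the partition. A minimal sketch of the arithmetic (the names here are hypothetical, not part of Kafka's API):

```java
// Sketch: how a per-partition consumer lag figure is derived.
// Lag is a count of unconsumed messages, not a latency in time.
public class ConsumerLagSketch {
    /** logEndOffset: highest offset the broker has appended;
     *  consumedOffset: highest offset this consumer has fetched. */
    static long lag(long logEndOffset, long consumedOffset) {
        // Clamp at zero in case the consumed offset is momentarily ahead
        // of a stale log-end-offset reading.
        return Math.max(0, logEndOffset - consumedOffset);
    }
}
```

For example, with a log end offset of 1500 and a consumed offset of 1200, the lag is 300 unconsumed messages.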
In 0.8, you can create topics manually and explicitly specify the replica
to broker mapping. Post 0.8, we can think of some more automated ways to
deal with this (e.g., let each broker carry some kind of weight).
Thanks,
Jun
On Fri, May 17, 2013 at 2:29 PM, Jason Rosenberg wrote:
> Hi,
>
> I'
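The "broker weight" idea mentioned above could, for illustration, hand out partitions in proportion to each broker's capacity. The following is purely a sketch of the allocation arithmetic (a smooth weighted round-robin); no such API exists in Kafka 0.8:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: distribute partitions across brokers in proportion
// to a per-broker weight (e.g. relative disk capacity). This is just the
// allocation arithmetic, not a Kafka API.
public class WeightedAssignment {
    /** Returns, for each partition index, the broker id chosen to host it. */
    static List<Integer> assign(int numPartitions, int[] brokerWeights) {
        List<Integer> owners = new ArrayList<>();
        double[] credit = new double[brokerWeights.length];
        for (int p = 0; p < numPartitions; p++) {
            // Each round, every broker accrues credit equal to its weight;
            // the broker with the most credit takes the partition and
            // "pays" the total weight back, keeping assignments spread out.
            int best = 0;
            for (int b = 0; b < brokerWeights.length; b++) {
                credit[b] += brokerWeights[b];
                if (credit[b] > credit[best]) best = b;
            }
            credit[best] -= sum(brokerWeights);
            owners.add(best);
        }
        return owners;
    }

    static int sum(int[] xs) { int s = 0; for (int x : xs) s += x; return s; }
}
```

With weights {2, 1} and six partitions, broker 0 receives four partitions and broker 1 two, matching the 2:1 weight ratio.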
Found out about the following:
JMX -> “kafka.server” -> (various FetcherThread)-ConsumerLag
This sounds like the latency, in time. Is that so?
thanks,
rob
On May 17, 2013, at 11:49 AM, "Withers, Robert" wrote:
> Could you add some JMX stats for us, then?
>
> - Queue length, by group offset
I've gotten to know y'all a bit, so I would like to ask my question here. :)
I am fairly unfamiliar with Scala, having worked a chapter or 2 out of a scala
book I bought. My understanding is that it is both an object language and a
functional language. The only language I am extremely familia
Well if you're willing to manage the partitions by yourself, I think it would
be possible. You would have to create more partitions for your topics and then
assign more partitions to the new machines.
If Kafka were able to rebalance automatically, doing this by not only
considering the number
Yeah,
I thought of that (running 2 kafkas on one box), but it doesn't really add
the benefit of redundancy through replication (e.g. if we have 2 replicas
mapping to the same physical machine).
Jason
On Fri, May 17, 2013 at 2:50 PM, Chris Riccomini wrote:
> Hey guys,
>
> I have no idea if this
Hey guys,
I have no idea if this would be reasonable, but what about just running
two Kafka processes on the bigger box?
Cheers,
Chris
On 5/17/13 2:48 PM, "Jason Rosenberg" wrote:
>Just resource allocation issues. E.g. imagine having an existing kafka
>cluster with one machine spec, and getti
Just resource allocation issues. E.g. imagine having an existing kafka
cluster with one machine spec, and getting access to a few more hosts to
augment the cluster, which are newer and therefore have twice the disk
storage. I'd like to seamlessly add them into the cluster, without having
to repla
Neha,
apologies, I just re-read what I sent and realized my "you" wasn't specific
enough - it meant the Kafka team ;).
--
"If you can't conceal it well, expose it with all your might"
Alex Zuzin
On Friday, May 17, 2013 at 2:25 PM, Alex Zuzin wrote:
> Have you considered abstracting offset s
That does seem a little hacky. But I'm trying to understand the requirement
behind having to deploy heterogeneous hardware. What are you trying to
achieve or optimize?
Thanks,
Neha
On Fri, May 17, 2013 at 2:29 PM, Jason Rosenberg wrote:
> Hi,
>
> I'm wondering if there's a good way to have a h
Hi,
I'm wondering if there's a good way to have a heterogenous kafka cluster
(specifically, if we have nodes with different sized disks). So, we might
want a larger node to receive more messages than a smaller node, etc.
I expect there's something we can do with using a partitioner that has
spec
Have you considered abstracting offset storage away so people could implement
their own?
Would you take a patch if I'd stabbed at it, and if yes, what's the process
(pardon the n00b)?
KCBO,
--
"If you can't conceal it well, expose it with all your might"
Alex Zuzin
On Friday, May 17, 2013 at
to clarify, I meant that Robert could/should store the offsets in a faster
store not that kafka should default to that :)
Thanks Neha
On Fri, May 17, 2013 at 2:22 PM, Neha Narkhede wrote:
> There is no particular need for storing the offsets in zookeeper. In fact
> with Kafka 0.8, since partiti
Do you have any tests that measure whether your high priority data is being
delayed? Assuming you are using 0.8, the end to end latency can be reduced
by tuning some configs on the consumer (fetch.min.bytes, fetch.wait.max.ms
).
The defaults for these configs are already tuned for low latency though.
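For reference, those two consumer settings could be wired up like this in 0.8 (`fetch.min.bytes` and `fetch.wait.max.ms` are the real property names; the values here are only illustrative):

```java
import java.util.Properties;

// Consumer properties biased toward low end-to-end latency (Kafka 0.8).
// The property names are real 0.8 consumer configs; the values are examples.
public class LowLatencyConsumerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("fetch.min.bytes", "1");      // return a fetch as soon as any data exists
        props.put("fetch.wait.max.ms", "100");  // cap how long the broker may hold a fetch
        return props;
    }
}
```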
There is no particular need for storing the offsets in zookeeper. In fact
with Kafka 0.8, since partitions will be highly available, offsets could be
stored in Kafka topics. However, we haven't ironed out the design for this
yet.
Thanks,
Neha
On Fri, May 17, 2013 at 2:19 PM, Scott Clasen wrote:
There is no easy way to do this in 0.7. If you are using a VIP to produce
data to Kafka, you can remove that broker host from the VIP. It will still
be available for consumer reads until it is shut down but won't accept new
data from the producer. In 0.8, since there are many replicas, you will not
afaik you don't 'have' to store the consumed offsets in zk, right? this is
only automatic with some of the clients?
why not store them in a data store that can write at the rate that you
require?
On Fri, May 17, 2013 at 2:15 PM, Withers, Robert wrote:
> Update from our OPS team, regarding zookeep
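Scott's suggestion amounts to a pluggable offset store. Here is a hypothetical sketch of what such an interface might look like, with an in-memory implementation standing in for a fast external store (Kafka offers no such plug point; the interface and names are invented for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical pluggable offset store, as suggested in the thread.
// A real backing store could be anything that writes at the required
// rate (Redis, an RDBMS, etc.); this one is in-memory for illustration.
public class OffsetStoreSketch {
    interface OffsetStore {
        void commit(String group, String topic, int partition, long offset);
        long fetch(String group, String topic, int partition); // -1 if unknown
    }

    static class InMemoryOffsetStore implements OffsetStore {
        private final Map<String, Long> offsets = new ConcurrentHashMap<>();

        private String key(String g, String t, int p) { return g + "/" + t + "/" + p; }

        public void commit(String g, String t, int p, long offset) {
            offsets.put(key(g, t, p), offset);
        }

        public long fetch(String g, String t, int p) {
            return offsets.getOrDefault(key(g, t, p), -1L);
        }
    }
}
```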
For every unit test, you want your server and consumer to start fresh. In
your unit test, you maintain state in the KafkaManager across unit tests so
your tests don't work as expected. Try moving KafkaManager to the setup
method. Currently, the 2nd test is consuming the message produced by the
1st
Update from our OPS team, regarding zookeeper 3.4.x. Given stability, adoption
of offset batching would be the only remaining bit of work to do. Still, I
totally understand the restraint for 0.8...
"As an exercise in upgradability of ZooKeeper, I did an 'out-of-the-box' upgrade
on ZooKeeper. I d
Another parameter that you want to tune if you want higher throughput in
0.7 is the broker side flush interval. If you set it to something high,
like 50K or 100K, you can maximize the throughput achieved by your producer.
Thanks,
Neha
On Fri, May 17, 2013 at 8:11 AM, Jun Rao wrote:
> You earli
Turns out it was OpenJDK on the AWS AMI instance. As soon as I replaced
OpenJDK:
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.11)
(amazon-61.1.11.11.53.amzn1-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
With Oracle JDK 1.6.0_38, all of my problems went away
Thanks, I created KAFKA-909 for the feature request.
As an aside, I see that in Jira, the available Kafka versions are 0.8, 0.8.1
and 0.9. I see the branch in GitHub for 0.8; does this mean that trunk is
0.8.1? And 0.9 is future planning in Jira, but not established in GitHub yet?
Thanks,
We have a realtime analysis system set up using a Kafka 0.7 cluster to
queue incoming data, but occasionally have a need to take one of the
nodes offline. Is there a method to set a broker to be read-only so
that we can drain unprocessed data from it prior to shutting it down?
- Adam
I shall. Thanks!
Thanks,
Rob Withers
Staff Analyst/Developer
o: (720) 514-8963
c: (571) 262-1873
-Original Message-
From: Jun Rao [mailto:jun...@gmail.com]
Sent: Friday, May 17, 2013 8:53 AM
To: users@kafka.apache.org
Subject: Re: could an Encoder/Decoder be stateful?
Possible,
Could you add some JMX stats for us, then?
- Queue length, by group offset vs lastOffset
- latency between produce and consume, again by group
Thanks,
Rob Withers
Staff Analyst/Developer
o: (720) 514-8963
c: (571) 262-1873
-Original Message-
From: Jun Rao [mailto:jun...@gmail.co
3.3.3 has known serious bugs. You should at least use 3.3.4. I am not aware
of a JMS bridge. Contributions are welcome :)
Thanks,
Neha
On May 17, 2013 8:36 AM, "Seshadri, Balaji"
wrote:
> Hi Neha,
>
> I figured out it was issue with jar dependencies,I tried to use latest
> stable version of zook
Hi Neha,
I figured out it was an issue with jar dependencies; I tried to use the latest
stable version of ZooKeeper (3.4.5) and it broke.
It is working fine now with 3.3.3.
I have one more question :).
Do you guys have a JMS bridge to Kafka? Because webMethods supports only
proprietary and JMS spec, so hooki
I am using the Java producer, and it is reproducible. When you say that
the producer sends the corrupted data, are you referring to the producer
as a black box, or something in my code?
Where I'm getting stuck is that my producer data seems so ridiculously
simple - create a ProducerData, following
This indicates the messages sent to the broker are corrupted. Typically,
this is because either the producer sends the corrupted data somehow or the
network is flaky. Are you using a java producer? Is this reproducible?
Thanks,
Jun
On Fri, May 17, 2013 at 7:08 AM, Jason Weiss wrote:
> I have
Your earlier email seems to focus on latency, not throughput. Typically, you
can either optimize for latency or throughput, but not both. If you want
higher throughput, you should consider using the async mode in the producer
with a larger batch size (e.g., 1000 messages). Using more instances of
pr
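The async-mode setup described above maps to 0.7 producer properties roughly like these (`producer.type` and `batch.size` are real 0.7 config names; the values are examples, not recommendations):

```java
import java.util.Properties;

// Producer properties biased toward throughput rather than latency (Kafka 0.7).
// producer.type and batch.size are real 0.7 configs; values are illustrative.
public class HighThroughputProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("producer.type", "async"); // buffer and batch sends in the background
        props.put("batch.size", "1000");     // messages per batched send, as suggested
        return props;
    }
}
```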
There are mainly 2 things to consider for latency. (1) How quickly does the
producer send the message to the broker? If you use the sync mode in the
producer, the message will be sent immediately (throughput will be lower in
the sync mode though). If you use the async mode, the messages are sent
af
Possible, but definitely a post 0.8 item. If you are interested, could you
file a jira to track this?
Thanks,
Jun
On Thu, May 16, 2013 at 10:06 PM, Rob Withers wrote:
> Could the producer be adapted to support the interface of the consumer?
>
> Thanks,
> rob
>
> > -Original Message-
>
For monitoring, we have jmxs on the broker for both the message and the
byte rate.
Thanks,
Jun
On Thu, May 16, 2013 at 10:04 PM, Rob Withers wrote:
> Immediately monitoring. Later, possible thresholding to evoke a
> reconfiguration of the number of partitions into a new topic and migrate
> m
Hi Neha,
I am sorry about the design of this program, but what I am trying to do is
to expose API to read and write from Kafka. I have some unit test cases as
given below. The problem is that the first test case passes but the second
test case fails saying that :
org.junit.ComparisonFailure: expec
I have a simple multi-threaded app trying to send numerous fixed-length, 2048
byte or 3072 byte messages into an Apache Kafka 0.7.2 cluster (3 machines)
running in AWS on some AWS AMIs. When the messaging volume increases rapidly
(a spike), I start running into lots of problems, specifically
Inv
Hi Neha,
I am a newbie to Kafka. My understanding so far is that batch size is
supported only for async producer. Kindly correct me if I am wrong. The
best latency with async producer I am getting is of the order of 300ms. Is
that expected?
I have to stick to 0.7 because KafkaSpout for Storm see
Perfect.
By the way, I see you went to GT...you wouldn't happen to know a certain Lex
Spoon, would you?
thanks,
rob
From: Neha Narkhede [neha.narkh...@gmail.com]
Sent: Friday, May 17, 2013 7:40 AM
To: users@kafka.apache.org
Subject: RE: possible to shutd
Absolutely, I will. However, I am pretty full today and I'll need more than 15
minutes to absorb what you have written, which looks very comprehensive.
thanks,
rob
From: Neha Narkhede [neha.narkh...@gmail.com]
Sent: Friday, May 17, 2013 7:32 AM
To: users@
If you turn off auto.commit.enable, that will ensure that messages are
replayed whenever a consumer starts up and rebalances.
Thanks,
Neha
On May 17, 2013 6:35 AM, "Withers, Robert" wrote:
> Certainly I will try. Our understanding is that there are 2 scenarios
> where messages could be replaye
Fair enough, this is something to look forward to. I appreciate the restraint
you show to stay out of troubled waters. :)
thanks,
rob
From: Neha Narkhede [neha.narkh...@gmail.com]
Sent: Friday, May 17, 2013 7:35 AM
To: users@kafka.apache.org
Subject: RE
Kishore,
The end to end latency will be much lower in 08. For 07, you can try to
tune the producer batch size so it sends data quicker. You can also reduce
the log.flush.interval on the broker but that will drive up iops.
Thanks,
Neha
On May 16, 2013 11:39 PM, "Kishore V. Kopalle"
wrote:
> Hi F
Upgrading to a new zookeeper version is not an easy change. Also zookeeper
3.3.4 is much more stable compared to 3.4.x. We think it is better not to
club 2 big changes together. So most likely this will be a post 08 item for
stability purposes.
Thanks,
Neha
On May 17, 2013 6:31 AM, "Withers, Rober
Certainly I will try. Our understanding is that there are 2 scenarios where
messages could be replayed:
1. if a consumer falls over hard, there are some message consumptions whose
offsets had not yet been flushed to zookeeper and so when a rebalance occurs
the consumer that starts getting mess
We spent some time thinking about consolidating the high level and low
level consumer APIs. It will be great if you can read the wiki and provide
feedback -
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
Thanks,
Neha
On May 16, 2013 10:29 PM, "Rob Withers" wrote:
> W
Awesome! Thanks for the clarification. I would like to offer my strong vote
that this get tackled before a beta, to get it firmly into 0.8. Stabilize
everything else to the existing use, but make offset updates batched.
thanks,
rob
From: Neha Narkhede
Can you provide more details about what you mean by measuring replay when
you kill a consumer?
Thanks,
Neha
On May 17, 2013 6:26 AM, "Withers, Robert" wrote:
> Would it be possible for someone to provide me with a 0.8 jar that
> implements a ConsumerConnector.hardShutdown, which would interrupt
You can read the high level design of kafka replication here
http://www.slideshare.net/junrao/kafka-replication-apachecon2013
Generally if your replication factor is more than 1 you shouldn't see data
loss in your test. When a broker fails, the producer will get an exception
and it will retry.
Th
Would it be possible for someone to provide me with a 0.8 jar that implements a
ConsumerConnector.hardShutdown, which would interrupt all threads yet not do a
final offset flush. We want to measure replay so we want to simulate a kill
-9, but we want to keep running the process to flush stats a
First of all, do you see any delay at all for your high priority data?
There are config options that you can tune so that your consumer can see
the data as soon as it is available.
Thanks,
Neha
On May 16, 2013 10:24 PM, "Chitra Raveendran" <
chitra.raveend...@fluturasolutions.com> wrote:
> HI
>
>
Sorry I wasn't clear. Zookeeper 3.4.x has this feature. As soon as 08 is
stable and released it will be worth looking into when we can use zookeeper
3.4.x.
Thanks,
Neha
On May 16, 2013 10:32 PM, "Rob Withers" wrote:
> Can a request be made to zookeeper for this feature?
>
> Thanks,
> rob
>
> > -
Hi Joel,
Any updates on the c++ producer?
Thanks,
Anand
On 3 April 2013 05:59, Joel Koshy wrote:
> Yes - we would be interested in doing that. I have been spending most of my
> time over the past couple weeks on the C++ library (currently, only for the
> producer). It is reasonably stable, alt
The timezone we are operating is Indian Standard Time (IST). So you will
see times like 2013-05-17 11:54:02.973 in the messages below.
On Fri, May 17, 2013 at 12:59 PM, Kishore V. Kopalle <
kish...@greenmedsoft.com> wrote:
> Hi Mathias,
>
> Yes they are. But we are talking about a single box run
Hi Mathias,
Yes they are. But we are talking about a single box running Zoo Keeper,
Kafka server and the Consumer/Producer.
Regards,
Kishore
On Fri, May 17, 2013 at 12:53 PM, Mathias Herberts <
mathias.herbe...@gmail.com> wrote:
> Just for the sake of it, are your clocks synchronized using NTP?
Hello All,
Our use case is to display certain aggregates on GUI from a live stream of
data coming in at more than 100k messages/sec. Will I be able to use Kafka
for handling at least 100k messages/sec and send it to Twitter Storm for
aggregate calculations is the question I have. I already know th
Just for the sake of it, are your clocks synchronized using NTP?
On Fri, May 17, 2013 at 8:28 AM, Kishore V. Kopalle
wrote:
> Hi Francis/Stone,
>
> I have modified log.default.flush.interval.ms to have a value of 1 in
> config/server.properties file. The time did not come down as can be seen
> fr