Hi Gwen,
I have changed the Java code KafkaWordCount to use reduceByKeyAndWindow in
Spark.
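A minimal sketch of that change (assuming the Spark Streaming 1.x Java API;
the topic, ZooKeeper quorum, and window sizes below are placeholders, not
the actual values from the job):

import java.util.Arrays;
import java.util.Collections;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

public class WindowedKafkaWordCount {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("WindowedKafkaWordCount");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

    // one receiver thread on a placeholder topic/ZK quorum
    JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(
        jssc, "zkhost:2181", "wordcount-group", Collections.singletonMap("mytopic", 1));

    JavaPairDStream<String, Integer> counts = messages
        .flatMap(t -> Arrays.asList(t._2().split(" ")))
        .mapToPair(w -> new Tuple2<>(w, 1))
        // count words over a 30s window, recomputed every 10s
        .reduceByKeyAndWindow((a, b) -> a + b, new Duration(30000), new Duration(10000));

    counts.print();
    jssc.start();
    jssc.awaitTermination();
  }
}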
- Original Message -
From: "Gwen Shapira"
Sent: 03/11/2014 21:08
To: "users@kafka.apache.org"
Cc: "u...@spark.incubator.apache.org"
Subject: Re: Spark Kafka Performance
Not sure about the
Hi,
While setting up a cluster, I realised that Kafka uses host names to
communicate between brokers. In my case, I don't have a DNS server to keep
track of host names, so I set them up manually. Because of this,
brokers cannot connect to each other.
This behaviour can be easily fixed by
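(The message is cut off above, so this is an assumption on my part rather
than necessarily the fix the poster meant: one common workaround is to have
each broker advertise an address that is reachable without DNS, e.g. its IP,
in config/server.properties -- the same properties that show up in the
server.log excerpt later in this thread. The IP below is a placeholder.)

# config/server.properties
advertised.host.name=10.0.0.12
advertised.port=9092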
Good day all,
We're running a good-sized Kafka cluster on 0.8.1, and during our
peak traffic times replication falls behind. I've been doing some reading
about parameters for tuning replication, but I'd love some real world
experience and insight.
Some general questions:
* Does Kafka 'lik
Hi Kafka Dev Team,
It seems that the maximum buffer size (queued message chunks) defaults to 2.
What is the impact of changing this to 2000 or so? Will this improve
consumer thread performance? More events would be buffered in memory. Or is
there any other recommendation for tuning high-level consumers?
Here is co
Hi Neha and Jun,
I have fixed the issue on my side based on what Jun had mentioned: "next()
gives IllegalStateException if hasNext is not called..." Based on this I
did further debugging; it was my mistake to share the same consumer iterator
across multiple threads (I forgot to call iterator.remove() in
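For anyone hitting the same thing, a minimal sketch of the
one-stream-per-thread pattern with the 0.8 high-level consumer (the topic
name, group id, ZooKeeper address, thread count, and handle() method are
placeholders):

import java.util.*;
import java.util.concurrent.*;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

Properties props = new Properties();
props.put("zookeeper.connect", "zkhost:2181"); // placeholder
props.put("group.id", "my-group");             // placeholder

ConsumerConnector connector =
    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
int numThreads = 4;
List<KafkaStream<byte[], byte[]>> streams =
    connector.createMessageStreams(Collections.singletonMap("mytopic", numThreads))
             .get("mytopic");

ExecutorService pool = Executors.newFixedThreadPool(numThreads);
for (final KafkaStream<byte[], byte[]> stream : streams) {
  pool.submit(new Runnable() {
    public void run() {
      // each thread owns exactly one iterator; sharing one iterator
      // across threads causes the IllegalStateException described above
      for (MessageAndMetadata<byte[], byte[]> msg : stream) {
        handle(msg.message()); // hypothetical handler
      }
    }
  });
}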
We used to default to 10, but two should be sufficient. There is
little reason to buffer more than that. If you increase it to 2000 you
will most likely run into memory issues. E.g., if your fetch size is
1MB you would enqueue 1MB * 2000 = ~2GB of chunks in each queue.
On Tue, Nov 04, 2014 at 09:05:44AM -0800
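To make the arithmetic concrete, these are the two 0.8 high-level consumer
settings involved (values shown are the defaults as described above):

Properties props = new Properties();
props.put("fetch.message.max.bytes", "1048576");  // 1MB fetch size (default)
props.put("queued.max.message.chunks", "2");      // buffered chunks per queue
// worst-case buffered data per queue ~= fetch size * chunks:
//   1MB * 2    =  2MB  (default)
//   1MB * 2000 = ~2GB  (the setting asked about above)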
Ops-experts can share more details but here are some comments:
>
> * Does Kafka 'like' lots of small partitions for replication, or larger
> ones? i.e., if I'm passing 1Gbps into a topic, will replication be happier
> if that's one partition, or many partitions?
Since you also have to account for
I think your answers are pretty spot-on, Joel. Under Replicated Count is
the metric that we monitor to make sure the cluster is healthy. It lets us
know when a broker is down (the counts on every other broker are elevated),
or when a broker is struggling (low counts fluctuating across a
fe
Hi Kafka users!
I was just migrating a cluster of 3 brokers from one set of EC2 instances
to another, but ran into replication problems. The migration method was to
stop one broker and let a new broker join with the same broker.id.
Replication started, but after ~4 of ~15 GB th
Hi all,
I've been investigating how Kafka 0.8.1.1 responds to the scenario where
one broker loses connectivity (due to something like a hardware issue or
network partition). It looks like the brokers themselves adjust within a
few seconds to reassign leaders and shrink ISRs. However, I see produce
Thanks for the info. I will have to tune the memory. What else do you
recommend for the high-level consumer to get optimal performance and drain
as quickly as possible with auto-commit on?
Thanks,
Bhavesh
On Tue, Nov 4, 2014 at 9:59 AM, Joel Koshy wrote:
> We used to default to 10, but two should be su
Hello Solon,
request.timeout.ms only controls the produce request timeout value. When
the producer's first produce request times out, it will try to
refresh its metadata by sending a metadata request. But when this
non-produce request hits the broker whose connectivity has been disabled
(i.e.
This seems to be related to https://issues.apache.org/jira/browse/KAFKA-1749.
Guozhang
On Tue, Nov 4, 2014 at 10:30 AM, Christofer Hedbrandh
<christo...@knewton.com> wrote:
> Hi Kafka users!
>
> I was just migrating a cluster of 3 brokers from one set of EC2 instances
> to another, but ran int
Actually I think this issue has just been resolved:
https://issues.apache.org/jira/browse/KAFKA-1733
Guozhang
On Tue, Nov 4, 2014 at 11:22 AM, Guozhang Wang wrote:
> Hello Solon,
>
> request.timeout.ms only controls the produce request timeout value, when
> the producer's first produce request
I actually meant to say that you typically don't need to bump up the
queued chunk setting - you can profile your consumer to see if
significant time is being spent waiting on dequeuing from the chunk
queues.
If you happen to have a consumer consuming from a remote data center,
then you should cons
Yes, this lines up with the behavior I'm seeing. I'll wait for the patch to
be released and then retest. Thanks!
On Tue, Nov 4, 2014 at 2:36 PM, Guozhang Wang wrote:
> Actually I think this issue has just been resolved:
>
> https://issues.apache.org/jira/browse/KAFKA-1733
>
> Guozhang
>
> On Tue
Hi Guozhang,
This is the server.log -
[2014-11-04 20:21:57,510] INFO Verifying properties
(kafka.utils.VerifiableProperties)
[2014-11-04 20:21:57,545] INFO Property advertised.host.name is overridden
to x.x.x.x (kafka.utils.VerifiableProperties)
[2014-11-04 20:21:57,545] INFO Property broker.id i
Joel,
Thanks for your input - it fits what I was thinking, so it's good confirmation.
> The easiest mbean to look at is the underreplicated partition count.
> This is at the broker-level so it is coarse-grained. If it is > 0 you
> can use various tools to do mbean queries to figure out which
> pa
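If it helps anyone, a rough sketch of polling that mbean over JMX. The JMX
port and host are assumptions (the broker has to be started with JMX
enabled), and the exact mbean name is quoted differently in 0.8.1 than in
later releases, so check with jconsole first:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UnderReplicatedCheck {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi"); // placeholder
    JMXConnector jmxc = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection conn = jmxc.getMBeanServerConnection();
      ObjectName name = new ObjectName(
          "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
      Object value = conn.getAttribute(name, "Value");
      // alert when > 0 (or > 2, per the threshold discussed below)
      System.out.println("UnderReplicatedPartitions = " + value);
    } finally {
      jmxc.close();
    }
  }
}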
I agree that KAFKA-1070 would be great to get in. I especially felt the
need for something like this while using a few other systems that automated
the port, id, etc. to give a good out-of-the-box experience. Sorry, I lost
track of the review; I'll get to it in the next few days.
Thanks,
Neha
On Mon, Nov 3, 2014
Hello,
We are using Kafka version 0.8.1 and the python kafka client.
Everything had been working fine, but suddenly this morning I saw
an OffsetOutOfRange error on one of the partitions. (We have 20 partitions
in our Kafka cluster.)
We fixed it by seeking to the head offset and restarting the app.
Hi Jim,
OffsetOutOfRange means that the partition's log offset range is [a, b] and
the requested offset is either < a or > b. It can be caused by log
truncation under the retention policy while the consumer is fetching at the
same time.
Guozhang
On Tue, Nov 4, 2014 at 4:21 PM, Jimmy John wrote:
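For completeness, a hedged sketch of "seeking to the head offset" with the
0.8 Java SimpleConsumer API (the python client has an equivalent; the broker
host, topic, partition, and client id below are placeholders):

import java.util.Collections;
import kafka.api.PartitionOffsetRequestInfo;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetRequest;
import kafka.javaapi.OffsetResponse;
import kafka.javaapi.consumer.SimpleConsumer;

SimpleConsumer consumer =
    new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "offset-reset");
TopicAndPartition tp = new TopicAndPartition("mytopic", 0);
OffsetRequest request = new OffsetRequest(
    Collections.singletonMap(tp,
        new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1)),
    kafka.api.OffsetRequest.CurrentVersion(), "offset-reset");
OffsetResponse response = consumer.getOffsetsBefore(request);
long earliest = response.offsets("mytopic", 0)[0]; // resume fetching from here
consumer.close();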
Which version of Kafka are you using? Is the broker I/O or network
saturated? If so, that will limit the throughput that each producer can
achieve. If not, using a larger number of messages per batch and/or enabling
producer-side compression typically improves the producer throughput.
Thanks,
Jun
O
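As a concrete illustration, a sketch of the batching and compression knobs
on the 0.8 (Scala) producer; the broker address and values are placeholders,
not recommendations:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
props.put("metadata.broker.list", "broker1:9092");  // placeholder
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("producer.type", "async");       // batch sends in the background
props.put("batch.num.messages", "500");    // more messages per batch
props.put("compression.codec", "snappy");  // producer-side compression
Producer<String, String> producer = new Producer<>(new ProducerConfig(props));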
Hi Eduardo,
Can you please take a thread dump and see if there are blocking issues on
the producer side? Do you have a single producer instance with multiple
threads?
Are you using the Scala producer or the new Java producer? Also, what are
your producer properties?
Thanks,
Bhavesh
On Tue, Nov 4, 2014 a
Hi Jim,
Maybe your consumer lagged behind the current smallest offset. As for why it
happened, you might take a look at this ticket:
https://issues.apache.org/jira/browse/KAFKA-1640
On Wed, Nov 5, 2014 at 8:46 AM, Guozhang Wang wrote:
> Hi Jim,
>
> OffsetOutOfRange means that the partition's log of
We have our threshold for under-replicated partitions set at anything over 2. The
reason we picked that number is because we have a cluster that tends to
take very high traffic for short periods of time, and 2 gets us around the
false positives (with a careful balance of the partitions in the cluster).
We're
Hi,
I am new to Kafka. In my understanding, the high-level consumer
(ZookeeperConsumerConnector) updates the offset when a message is drawn
by the ConsumerIterator. But I would like to update the offset when the
message is processed, not when it is drawn from the broker. So if a consumer
dies before a message is co
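The question is cut off above, but for the common version of it (commit only
after processing), a minimal sketch with the 0.8 high-level consumer is to
disable auto-commit and call commitOffsets() yourself. Note that
commitOffsets() commits everything this connector has consumed so far, so
this gives at-least-once delivery, not exactly-once. Topic, group id,
ZooKeeper address, and process() are placeholders:

import java.util.Collections;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

Properties props = new Properties();
props.put("zookeeper.connect", "zkhost:2181"); // placeholder
props.put("group.id", "my-group");             // placeholder
props.put("auto.commit.enable", "false");      // commit manually instead

ConsumerConnector connector =
    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
KafkaStream<byte[], byte[]> stream = connector
    .createMessageStreams(Collections.singletonMap("mytopic", 1))
    .get("mytopic").get(0);

ConsumerIterator<byte[], byte[]> it = stream.iterator();
while (it.hasNext()) {
  process(it.next().message()); // hypothetical handler
  connector.commitOffsets();    // commit only after processing succeeds
}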
Background:
I have searched for a while online, and through the files located in the
kafka/logs directory, trying to find where Kafka writes its log output, in
order to debug the SimpleProducer I wrote. My producer is almost identical
to the simple producer located here:
https://cwiki.apache.org/con
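Not sure where your setup sends logs, but as a quick way to see the
producer's logging while debugging (assuming the 0.8 clients' log4j 1.x
dependency is on the classpath), you can attach a console appender
programmatically:

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// attach a simple console appender and turn the kafka loggers up to DEBUG
BasicConfigurator.configure();
Logger.getLogger("kafka").setLevel(Level.DEBUG);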