urrent design.
> >
> > In any event, there is a configuration that you can tweak to set the max
> > time the producer will spend blocking in send(): max.block.ms
> >
> > -Dana
> >
> >
> >> On Tue, Mar 29, 2016 at 7:26 PM, Steven Wu
> wrote:
>
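For illustration, a minimal sketch (not from the thread) of the max.block.ms
setting mentioned above; the bootstrap host and timeout value are placeholders:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

public class BoundedBlockingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder host
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Bound how long send() may block (waiting for metadata or for
        // buffer space) before throwing a TimeoutException; 5s is illustrative.
        props.put("max.block.ms", "5000");
        Producer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}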
I also agree that returning a Future should never block. I brought this
up when 0.8.2 was first released for the new Java producer.
As Oleg said, KafkaProducer can also block if metadata has not been fetched.
This is probably more common than an offline broker, because metadata is loaded
lazily when there
nd any unusual things in the logs on brokers
> , it is likely that the process was up but was isolated from producer
> requests, and since the producer did not have a timeout, the producer buffer
> filled up.
>
> Thanks,
>
> Mayuresh
>
>
> On Thu, Sep 10, 2015 at 11:20 PM, Ste
e
> producer who are the new leaders for the partitions and the producer
> started sending messages to those brokers.
>
> KAFKA-2120 will handle all of this for you automatically.
>
> Thanks,
>
> Mayuresh
>
> On Tue, Sep 8, 2015 at 8:26 PM, Steven Wu wrote:
>
We have observed that some producer instances stopped sending traffic to
brokers because the memory buffer was full. Those producers got stuck in
this state permanently. Because we couldn't find out which broker was bad,
I did a rolling restart of all brokers. After the bad broker got
bounc
Hi,
I am talking about the 0.8.2 java producer.
In our deployment, we disable auto topic creation, because we would like
to control the precise number of partitions created for each topic and the
placement of partitions (e.g. zone-aware).
I did some experimentation and checked the code. metadat
you hit that
> kernel panic issue?
>
> Our company will still be running on AMI image 12.04 for a while, I will
> see whether the fix was also ported onto Ubuntu 12.04
>
> On Tue, Jun 2, 2015 at 2:53 PM, Steven Wu wrote:
>
> > now I remember we had same kernel panic iss
on
> June 2, 2015 at 4:39 PM
>
> On Jun 2, 2015, at 1:22 PM, Steven Wu
> wrote:
>
> can you elaborate what kind of instability you have encountered?
>
> We have seen the nodes become completely non-responsive. Usually they get
> rebooted automatically after 10-20 mi
Wes/Daniel,
can you elaborate what kind of instability you have encountered?
we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in the
announcement, they did mention using Ubuntu 14.04 for better disk
throughput. not sure whether 14.04 also addresses any instability issue you
enc
EBS (network-attached storage) has gotten a lot better over the last few
years, but we don't quite trust it for the Kafka workload.
At Netflix, we went with the new d2 instance type (HDD); our
perf/load testing shows it satisfies our workload. SSD is better on the latency
curve but pretty comparable in ter
In your callback implementation object, you can save a reference to the actual
message.
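A minimal sketch of that approach (the class and field names are illustrative,
not from the thread):

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

class RecordAwareCallback implements Callback {
    private final ProducerRecord<String, String> record;

    RecordAwareCallback(ProducerRecord<String, String> record) {
        this.record = record; // saved reference to the actual message
    }

    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            // the failed message is reachable through the saved reference
            System.err.println("send failed for key=" + record.key()
                    + ", value=" + record.value() + ": " + exception);
        }
    }
}

Usage: producer.send(record, new RecordAwareCallback(record));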
On Wed, Mar 18, 2015 at 10:45 PM, sunil kalva wrote:
> Hi
> How do I access the actual message that failed to send to the cluster using
> the Callback interface and onCompletion method?
>
> Basically if the sender is faile
Metadata fetch only happens/blocks the first time you call send. After
the metadata is retrieved, it is cached in memory and will not block again. So
yes, there is a possibility it can block. Of course, if the cluster is down and
metadata was never fetched, then every send can block.
Metadata is also
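One way to take that first-time cost off the hot path (a sketch, not from the
thread; the topic name is illustrative) is to fetch metadata eagerly at
startup with partitionsFor():

import java.util.List;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.PartitionInfo;

class MetadataWarmup {
    // Trigger the blocking metadata fetch once during startup, so the first
    // real send() finds the metadata already cached.
    static void warmUp(Producer<String, String> producer, String topic) {
        List<PartitionInfo> partitions = producer.partitionsFor(topic);
        System.out.println(topic + " has " + partitions.size() + " partitions");
    }
}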
like as
> much an opportunity for improvement in the code as anything. Would you be
> willing to share the details?
>
> -jay
>
> On Sunday, February 22, 2015, Steven Wu wrote:
>
> > > The low connection partitioner might work for this
> > by attempting to reuse recent
> The low connection partitioner might work for this
by attempting to reuse recently used nodes whenever possible. That is
useful in environments with lots and lots of producers where you don't care
about semantic partitioning.
In one of the perf tests, we found that the above "sticky" partitioner impr
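A minimal sketch of such a "sticky" partitioner (illustrative, not the
implementation measured above; assumes the pluggable Partitioner interface of
the newer Java clients, and thread-safety is simplified):

import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class PeriodicStickyPartitioner implements Partitioner {
    private static final long STICKY_MS = 30_000; // illustrative switch interval
    private volatile int partition = -1;
    private volatile long chosenAt = 0;

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        long now = System.currentTimeMillis();
        // stick to one randomly chosen partition for STICKY_MS, then switch;
        // fewer distinct partitions per interval means fewer broker connections
        if (partition < 0 || partition >= numPartitions
                || now - chosenAt > STICKY_MS) {
            partition = ThreadLocalRandom.current().nextInt(numPartitions);
            chosenAt = now;
        }
        return partition;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}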
Jun,
You are right. I tried the 0.8.2.0 producer with my test and confirmed that it
fixed the CPU issue.
Thanks,
Steven
On Thu, Feb 19, 2015 at 12:02 PM, Steven Wu wrote:
> will try 0.8.2.1 on producer and report back result.
>
> On Thu, Feb 19, 2015 at 11:52 AM, Jun Rao wrote:
>
Feb 19, 2015 at 10:42 AM, Steven Wu wrote:
>
> > forgot to mention in case it matters
> > producer: 0.8.2-beta
> > broker: 0.8.1.1
> >
> > On Thu, Feb 19, 2015 at 10:34 AM, Steven Wu
> wrote:
> >
> > > I think this is an issue caused by KAFKA-1788.
>
forgot to mention in case it matters
producer: 0.8.2-beta
broker: 0.8.1.1
On Thu, Feb 19, 2015 at 10:34 AM, Steven Wu wrote:
> I think this is an issue caused by KAFKA-1788.
>
> I was trying to test producer resiliency to broker outage. In this
> experiment, I shut down all brokers
I think this is an issue caused by KAFKA-1788.
I was trying to test producer resiliency to broker outage. In this
experiment, I shut down all brokers to see how the producer behaves.
Here are the observations:
1) The Kafka producer can recover from a Kafka outage, i.e. sends resumed after
brokers came back.
I don't know whether it is the cause of your issue or not, but "batch.size"
is measured in bytes (not number of messages); the default is 16384.
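For reference, a minimal sketch (values illustrative) of where this setting
goes:

import java.util.Properties;

class BatchSizeConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        // batch.size is a per-partition byte budget for one batch,
        // not a count of messages; 16384 bytes is the default
        props.put("batch.size", "16384");
        return props;
    }
}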
On Mon, Feb 16, 2015 at 12:11 PM, ankit tyagi
wrote:
> Hey,
>
> I am doing a POC on the Kafka 0.8.2.0 version. Currently I am using the
> kafka-client of 0.8.2.0 version
I couldn't reproduce/confirm the issue with my test: I sent 6 million msgs from
6 instances and got 6 million callbacks.
This could be a metrics issue.
On Mon, Feb 9, 2015 at 8:23 PM, Steven Wu wrote:
> I don't have strong evidence that this is a bug yet. let me write some
> test
> -Jay
>
> On Mon, Feb 9, 2015 at 5:19 PM, Steven Wu wrote:
>
> > We observed some small discrepancy in messages sent per second reported
> at
> > different points. 1) and 4) match very closely. 2) and 3) match very
> > closely but are about *5-6% lower* compared t
We observed some small discrepancy in messages sent per second reported at
different points. 1) and 4) match very closely. 2) and 3) match very
closely but are about *5-6% lower* compared to 1) and 4).
1) send attempt from producer
2) send success from producer
3) record-send-rate reported by kafka
to see if the data is corrupted on disk?
>
> Thanks,
>
> Jun
>
>
> On Wed, Feb 4, 2015 at 9:55 AM, Steven Wu wrote:
>
> > Hi,
> >
> > We have observed these two exceptions with consumer *iterator.next()*
> > recently and want to ask how we should handle them
$java_pid" -L -o tid,pcpu
>> jstack -F "$java_pid"
>>
>> Then compare the thread # (may have to convert the hex # to decimal) between
>> the jstack
>> and ps command. This will tell you which thread is consuming more CPU
>> for
>> that process.
>>
This is the JIRA regarding blocking on metadata:
https://issues.apache.org/jira/browse/KAFKA-1835
I am less concerned about the first-time blocking. I am more concerned
about the situation when the Kafka cluster/brokers are completely down, since
that can screw up the producer apps. I hope that we can tak
Hi,
We have observed these two exceptions with consumer *iterator.next()*
recently and want to ask how we should handle them properly.
*1) CRC corruption*
Message is corrupt (stored crc = 433657556, computed crc = 3265543163)
I assume in this case we should just catch it and move on to the next msg
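A minimal sketch of that catch-and-move-on approach (assumes the old
high-level consumer API of this era; note that depending on the client version
the iterator may enter a failed state after throwing, in which case the stream
has to be re-created rather than resumed):

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.message.InvalidMessageException;
import kafka.message.MessageAndMetadata;

class SkippingConsumer {
    static void consume(KafkaStream<byte[], byte[]> stream) {
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            try {
                MessageAndMetadata<byte[], byte[]> msg = it.next();
                System.out.println("offset=" + msg.offset()); // process msg here
            } catch (InvalidMessageException e) {
                // CRC mismatch on one message: log it and move on
                System.err.println("skipping corrupt message: " + e.getMessage());
            }
        }
    }
}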
o Hex # to decimal) between the Jstack
> and ps command. This will tell you which thread is consuming more CPU for
> that process.
>
> Thanks,
>
> Bhavesh
>
> On Wed, Feb 4, 2015 at 9:01 AM, Steven Wu wrote:
>
> > I have re-run my unit test with 0.8.2.0. same tigh
ssues are fixed in the new release. Please
> let us know if you still see the issue with it.
>
> Guozhang
>
> On Tue, Feb 3, 2015 at 8:52 PM, Steven Wu wrote:
>
> > sure. will try my unit test again with 0.8.2.0 release tomorrow and
> report
> > back my findings
he same test with the 0.8.2.
> final release?
>
> -Jay
>
> On Tue, Feb 3, 2015 at 8:37 PM, Steven Wu wrote:
>
> > actually, my local test can reproduce the issue although not immediately.
> > seems to happen after a few mins. I enabled TRACE level logging. here
> see
360186.
kafka-producer-network-thread | foo 20:34:32,626 DEBUG NetworkClient:369 -
Trying to send metadata request to node -2
On Tue, Feb 3, 2015 at 8:10 PM, Steven Wu wrote:
> Hi,
>
> We have observed a high-CPU and high-network-traffic problem when
> 1) the cluster (0.8.1.1) has
Hi,
We have observed a high-CPU and high-network-traffic problem when
1) the cluster (0.8.1.1) has no topic, and
2) a KafkaProducer (0.8.2-beta) object is created without sending any traffic.
We have observed this problem twice. In both cases, the problem went away
immediately after one/any topic was created.
Is t
At Netflix, we have been using a Route 53 DNS name as the bootstrap servers in
the AWS env. Basically, when a Kafka broker starts, we add it to the Route 53
DNS name for the cluster. This is like the VIP that Jay suggested.
But we are also moving toward using the Eureka service registry for
bootstrapping. We are worr
Do you need total ordering among all events, or do you just need ordering by
some partition key (e.g. events regarding one particular database key or
user id)? If it's the latter, you can create multiple partitions and just
route your events to different Kafka partitions using the key.
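A minimal sketch of keyed partitioning (topic and key names are illustrative):

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

class KeyedSend {
    // All events with the same key hash to the same partition, so ordering
    // is preserved per key without requiring a global order.
    static void sendEvent(Producer<String, String> producer,
                          String userId, String payload) {
        producer.send(new ProducerRecord<>("events", userId, payload));
    }
}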
On Fri, Jan
Jay, I don't think this line will bootstrap full metadata (for all topics);
it will just construct the Cluster object with the bootstrap hosts. You need to
call "metadata.add(topic)" to register interest in a topic's partition metadata.
Guozhang, I personally think this is OK. It just does a few DNS lookups or
adata for all topics when it is created. Note that this
> then brings back the opposite problem--doing remote communication during
> initialization which tends to bite a lot of people. But since this would be
> an option that would default to false perhaps it would be less likely to
> come as
+1. It should be truly async in all cases.
I understand some of the challenges that Jay listed in the other thread, but we
need a solution nonetheless; e.g., can we maintain a separate
list/queue/buffer for pending messages without metadata?
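A minimal sketch of that idea as an application-side wrapper (not the eventual
client fix; names and sizes are illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

class NonBlockingSender {
    // Bounded local buffer for records whose metadata may not be ready yet.
    private final BlockingQueue<ProducerRecord<String, String>> pending =
            new ArrayBlockingQueue<>(10_000);

    NonBlockingSender(Producer<String, String> producer) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    // only this worker thread ever blocks inside send()
                    producer.send(pending.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Callers never block: returns false when the local buffer is full.
    boolean trySend(ProducerRecord<String, String> record) {
        return pending.offer(record);
    }
}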
On Tue, Dec 23, 2014 at 12:57 PM, John Boardman
wrote:
> I wa
Guozhang,
can you point me to the code that implements "periodic/sticky" random
partitioner? I would actually like to try it out in our env, even though I
assume it is NOT ported to the 0.8.2 Java producer.
Thanks,
Steven
On Mon, Dec 8, 2014 at 1:43 PM, Guozhang Wang wrote:
> Hi Yury,
>
> Originally th
> In practice the cases that actually mix serialization types in a single
stream are pretty rare I think just because the consumer then has the
problem of guessing how to deserialize, so most of these will end up with
at least some marker or schema id or whatever that tells you how to read
the data
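A minimal sketch of such a marker (a single illustrative schema-id byte, not
any specific registry's wire format):

import java.nio.ByteBuffer;

class SchemaFraming {
    // Producer side: prefix the serialized payload with a one-byte schema id.
    static byte[] frame(byte schemaId, byte[] payload) {
        return ByteBuffer.allocate(1 + payload.length)
                .put(schemaId)
                .put(payload)
                .array();
    }

    // Consumer side: read the marker to pick the right deserializer.
    static byte schemaIdOf(byte[] framed) {
        return framed[0];
    }
}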
would work (mirrormaker doesn't support that),
> and so would mirroring to another cluster. Neha's proposal would work also,
> but I assume it's a lot more work for the Kafka internals and therefore IMHO
> wouldn't meet the KISS principle.
>
> Kind regards,
> Erik.
I think it doesn't have to be two more clusters; it can be just two more
topics. MirrorMaker can copy from the source topics in both regions into one
aggregate topic.
On Tue, Oct 21, 2014 at 1:54 AM, Erik van oosten <
e.vanoos...@grons.nl.invalid> wrote:
> Thanks Neha,
>
> Unfortunately, the maintenance
Jun
>
> On Wed, Oct 8, 2014 at 8:57 PM, Steven Wu wrote:
>
> > I have seen very high "Fetch-Consumer-RequestsPerSec" (like 15K) per
> broker
> > in a relatively idle cluster. My hypothesis is that some misbehaving
> > consumer has a
> > tight polling loop without
I have seen very high "Fetch-Consumer-RequestsPerSec" (like 15K) per broker
in a relatively idle cluster. My hypothesis is that some misbehaving consumer
has a tight polling loop without any back-off logic on empty fetches.
Unfortunately, this metric doesn't have a per-topic breakdown like
"BytesInPerSec" o
I couldn't see your graph, but your replication factor is 2, so replication
traffic can be the explanation: the follower replica fetches every byte
produced, so together with your consumer's reads, BytesOut will be about 2x
BytesIn.
On Thu, Sep 25, 2014 at 6:19 PM, ravi singh wrote:
> I have set up my kafka broker with a single producer and consumer. When I
> am plotting the
hangeListener.handleChildChange(PartitionStateMachine.scala:403)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
On Tue, Sep 9, 2014 at 7:18 PM, Steven Wu wrote:
> previous email is from state-change.l
)
On Tue, Sep 9, 2014 at 4:14 PM, Steven Wu wrote:
> ah, maybe you mean the controller log on leader/controller broker 5. Yes,
> I did notice some errors regarding these two partitions.
>
>
> [2014-09-09 01:10:53,651] ERROR Controller 5 epoch 5 encountered error
> while changing p
achine.scala:328)
at kafka.utils.Utils$.inLock(Utils.scala:538)
at
kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:327)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
at org.I0Itec.zkclient.ZkEventThread.
ing detected as the new broker
> when broker 0 comes up?
>
> Thanks,
>
> Jun
>
> On Tue, Sep 9, 2014 at 3:51 PM, Steven Wu wrote:
>
> > nope. State-change log files only had some warnings regarding other
> > partitions. Nothing related to these two partition
> you see anything related to these 2 partitions when broker 0 comes up?
>
> Thanks,
>
> Jun
>
> On Tue, Sep 9, 2014 at 9:41 AM, Steven Wu wrote:
>
> > noticed one important thing. topic foo's partition 1 and 2 have empty
> .log
> > file on replicas. I su
noticed one important thing: topic foo's partitions 1 and 2 have empty .log
files on replicas. I suspect replication doesn't create the partition dir on
broker 0 in this case, which then causes the WARN logs.
On Mon, Sep 8, 2014 at 11:11 PM, Steven Wu wrote:
> sorry. forgot to ment
sorry. forgot to mention that I am running 0.8.1.1
On Mon, Sep 8, 2014 at 9:26 PM, Steven Wu wrote:
> did a push in the cloud. After the new instance for broker 0 came up, I see a
> lot of WARNs in the log file.
>
> 2014-09-09 04:21:09,271 WARN kafka.utils.Logging$class:83
> [request-
>
> Thanks,
>
> Jun
>
> On Mon, Sep 8, 2014 at 9:07 PM, Steven Wu wrote:
>
> > I did a push. new instance comes up and tries to fetch log/data from
> other
> > peers/replicas. Out of 60 partitions assigned for broker 0, it sync'ed up
> > 59. but for wh
did a push in the cloud. After the new instance for broker 0 came up, I see a
lot of WARNs in the log file.
2014-09-09 04:21:09,271 WARN kafka.utils.Logging$class:83
[request-expiration-task] [warn] [KafkaApi-0] Fetch request with
correlation id 51893 from client 1409779957450-6014fc32-0-0 on partition
[foo
I did a push. The new instance comes up and tries to fetch log/data from other
peers/replicas. Out of 60 partitions assigned to broker 0, it synced up
59, but for whatever reason it didn't try to fetch this partition/topic.
[out-of-sync replica] BrokerId: 0, Topic: foo, PartitionId: 6, Leader: 5,
Re
rokerconfigs
>
> /***
> Joe Stein
> Founder, Principal Consultant
> Big Data Open Source Security LLC
> http://www.stealth.ly
> Twitter: @allthingshadoop
> ****/
> On Jul 31, 2014 6:52 PM, "Steven Wu
It seems that log retention is purely based on the last-touched/modified
timestamp. This is undesirable for code pushes in AWS/cloud.
E.g., let's say the retention window is 24 hours, disk size is 1 TB, and disk
utilization is 60% (600 GB). When a new instance comes up, it will fetch log
files (600 GB) from peers. Those log
on based on hours to avoid too frequent
> log rolling and, in turn, too-small segment files. For your case it may be
> reasonable to set the rolling criterion in minutes. Could you file a JIRA?
>
> Guozhang
>
>
> On Mon, Jun 2, 2014 at 4:00 PM, Steven Wu
> wrote:
>
> >
This might be a bit unusual. We have a topic for which we only need to keep the
last 5 minutes of msgs so that replay from the beginning is fast.
Although retention.ms has a time unit of minutes, segment.ms ONLY has a time
unit of hours. If I understand cleanup correctly, it can only delete files
that are rolled ove
to point out the actual
danger.
On Thu, May 22, 2014 at 8:49 PM, Jun Rao wrote:
> Delete topic is not supported in 0.8.1.1. How did you do it?
>
> Thanks,
>
> Jun
>
>
> On Thu, May 22, 2014 at 9:59 AM, Steven Wu wrote:
>
> > yes. I deleted a topic. but not at
ldChange$1.apply(ReplicaStateMachine.scala:328)
> at kafka.utils.Utils$.inLock(Utils.scala:538)
> at
>
> kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:327)
> at org.I0Itec.zkclient.ZkClient$7.run(ZkClien
If placing MirrorMaker in the same datacenter as the target cluster, its
consumer will talk to ZooKeeper in the remote/source datacenter. Would it be
more susceptible to network problems?
As for the problem of committing offsets without actually producing/writing
msgs to the target cluster, it can be solved by disabling auto-commit, as
sketched below.
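A minimal sketch (assumes the old high-level consumer's config key; other
settings omitted):

import java.util.Properties;

class MirrorConsumerConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        // commit offsets manually, only after the target cluster has
        // acknowledged the produced message
        props.put("auto.commit.enable", "false");
        return props;
    }
}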