Hello, we are getting the following error:
server.log:[2019-12-17 15:05:28,757] ERROR [ReplicaManager broker=5] Error
processing fetch with max size 1048576 from consumer on partition my-topic-2:
(fetchOffset=312239, logStartOffset=-1, maxBytes=1048576,
currentLeaderEpoch=Optional.empty) (kafka
ip-address in the
host-field. Is there a way to make it replace the host-info AND use static
membership?
David Garcia
Staff Big Data Engineer
david.gar...@bazaarvoice.com
Hello, my consumers are reporting an invalid IP address. When running
kafka-consumer-groups --describe… I see the following:
TOPIC    PARTITION    CURRENT-OFFSET    LOG-END-OFFSET    LAG    CONSUMER-ID    HOST    CLIENT-ID
Topic1   12           1319627
<https://github.com/prometheus/jmx_exporter> to get Kafka metrics into
prometheus. Here’s our JMX Exporter config:
https://github.com/wikimedia/puppet/blob/production/modules/profile/files/kafka/broker_prometheus_jmx_exporter.yaml
On Thu, Nov 9, 2017 at 11:42
JMX was pretty easy to set up for us. Look up the various jmx beans locally for
your particular version of kafka (i.e. with jconsole..etc)
https://github.com/jmxtrans/jmxtrans
On 11/8/17, 7:10 PM, "chidigam ." wrote:
Hi All,
What is the simplest way of monitoring the metrics in kafka br
I’m not sure how your requirements of Kafka are related to your requirements
for marathon. Kafka is a streaming-log system and marathon is a scheduler.
Mesos, as your resource manager, simply “manages” resources. Are you asking
about multitenancy? If so, I highly recommend that you separate
If you’re on the AWS bandwagon, you can use Kinesis-Analytics
(https://aws.amazon.com/kinesis/analytics/). It’s very easy to use. Kafka is
a bit more flexible, but you have to instrument and maintain it yourself. You may also want
to look at Druid: http://druid.io/ There are some dashboards that you can
Consumers can be split up based on partitions. So, you can tell a consumer
group to listen to several topics and it will divvy up the work. Your use case
sounds very canonical. I would take a look at Kafka connect (if you’re using
the confluent stack).
-David
http://docs.confluent.io/curren
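The consumer-group behavior described above can be sketched without a broker. A minimal, hypothetical model (plain Java, not the actual group coordinator) of how partitions from several subscribed topics get divvied round-robin among a group's members:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {
    // Deal topic-partitions to group members round-robin, the way a consumer
    // group divides the partitions of its subscribed topics across consumers.
    static Map<String, List<String>> assign(List<String> partitions, List<String> consumers) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int i = 0; i < partitions.size(); i++) {
            out.get(consumers.get(i % consumers.size())).add(partitions.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // two topics, two partitions each, shared by two consumers in one group
        System.out.println(assign(
                List.of("T1-0", "T1-1", "T2-0", "T2-1"),
                List.of("consumer-a", "consumer-b")));
    }
}
```

Adding a third consumer to the list shrinks each member's share; that is the "divvy up the work" effect.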
Make a topic and then set the retention to 1 hour. Every 15 minutes, start a
consumer that always reads from the beginning.
-David
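The read-every-15-minutes idea above can be driven by a plain scheduler. A sketch under the assumption that the task would create a consumer with auto.offset.reset=earliest and a fresh group id, then read the whole (retention-bounded) topic; the Kafka calls themselves are omitted and the interval is shortened so it runs quickly:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PeriodicReader {
    // Run `readFromBeginning` on a fixed interval. In the real setup the task
    // would build a consumer (auto.offset.reset=earliest, fresh group id) and
    // read the whole topic; the broker's 1-hour retention bounds how much
    // data that is.
    static ScheduledFuture<?> every(ScheduledExecutorService ses, long periodMs,
                                    Runnable readFromBeginning) {
        return ses.scheduleAtFixedRate(readFromBeginning, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // 15 minutes in production; 50 ms here so the sketch finishes quickly
        every(ses, 50, () -> System.out.println("reading topic from offset 0"));
        Thread.sleep(200);
        ses.shutdownNow();
    }
}
```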
On 9/5/17, 9:28 AM, "Tauzell, Dave" wrote:
What are you going to do with the messages every 15 minutes?
One way I can think of is to have two consume
You are not going to get that kind of latency (i.e. less than 100
microseconds). In my experience, consumer->producer latency averages around:
20 milliseconds (cluster is in AWS with enhanced networking).
On 8/3/17, 2:32 PM, "Chao Wang" wrote:
Hi,
I observed that it took 2-6 mill
/home.html
-David
On 7/17/17, 12:35 PM, "David Garcia" wrote:
I think he means something like Akka Streams:
http://doc.akka.io/docs/akka/2.5.2/java/stream/stream-graphs.html
Directed Acyclic Graphs are trivial to construct in Akka Streams and use
back-pressure to preclude memory issues.
-David
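The back-pressure idea above is not Akka-specific; the JDK's own Flow API (Java 9+) shows the same demand-driven pull. A minimal stand-in, not Akka Streams: the publisher's bounded buffer blocks the producer when the slow subscriber has not signalled demand, so memory stays bounded.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BackpressureSketch {
    // Push n items through a bounded SubmissionPublisher to a subscriber that
    // signals demand one element at a time; submit() blocks when the 4-slot
    // buffer is full, which is the back-pressure.
    static int runThrough(int n) throws InterruptedException {
        SubmissionPublisher<Integer> pub =
                new SubmissionPublisher<>(ForkJoinPool.commonPool(), 4);
        AtomicInteger seen = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        pub.subscribe(new Flow.Subscriber<Integer>() {
            Flow.Subscription sub;
            public void onSubscribe(Flow.Subscription s) { sub = s; s.request(1); }
            public void onNext(Integer item) { seen.incrementAndGet(); sub.request(1); }
            public void onError(Throwable t) { done.countDown(); }
            public void onComplete() { done.countDown(); }
        });
        for (int i = 0; i < n; i++) pub.submit(i); // blocks instead of buffering unboundedly
        pub.close();
        done.await(5, TimeUnit.SECONDS);
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runThrough(100)); // every item delivered despite the tiny buffer
    }
}
```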
On 7/17/17, 12:20 PM, "Guozhang Wang" wrote:
Sameer,
I would just look at an example:
https://github.com/confluentinc/kafka-connect-jdbc
https://github.com/confluentinc/kafka-connect-hdfs
On 7/12/17, 8:27 AM, "Debasish Ghosh" wrote:
Hi -
I would like to use the embedded API of Kafka Connect as per
https://cwiki.apache.org/con
“…events so timely that the bearing upon of which is not immediately apparent
and are hidden from cognitive regard; the same so tardy, they herald apropos”
On 7/7/17, 12:06 PM, "Marcelo Vinicius" wrote:
Hello, my name is Marcelo, and I am from Brazil. I'm doing a search on
Kafka. I woul
If you’re using confluent, you can use the control center. It’s not free
however.
From: Muhammad Arshad
Reply-To: "users@kafka.apache.org"
Date: Monday, June 19, 2017 at 5:52 PM
To: "users@kafka.apache.org"
Subject: Kafka Monitoring
Hi,
wanted to see if there is Kafka monitoring which is av
What is your in-sync timeout set to?
-David
On 5/24/17, 5:57 AM, "Vinayak Sharma" wrote:
Hi,
I am running Kafka as a 2 node cluster.
When I am scaling up and down 1 kafka broker I am experiencing loss of
messages at consumer end during reassignment of partitions.
Do you
Unlike Spark, you don’t need an entire framework to deploy your job. With
KStreams, you just start up an application and go. You don’t need docker
either…although containerizing your stuff is probably a good strategy for the
purposes of deployment management (something you get with Yarn or a s
For Problem 1, you will probably have to either use the low-level API, and/or
do manual partition assignment.
For problem 2, you can simply re-publish the messages to a new topic with more
partitions…or, as in the first problem, just use the low level API. You can
also create more consumer gro
You can use kafka connect and something like bottled water if you want to be
fancy.
On 4/20/17, 8:46 AM, "arkaprova.s...@cognizant.com"
wrote:
Hi,
I would like to ingest data from RDBMS to CLOUD platform like Azure
HDInsight BLOB using Kafka . What will be the best practice in te
What do broker logs say around the time you send your messages?
On 4/18/17, 3:21 AM, "Ranjith Anbazhakan"
wrote:
Hi,
I have been testing behavior of multiple broker instances of kafka in same
machine and facing inconsistent behavior of producer sent records to buffer not
being av
The “NewShinyProducer” is also deprecated.
On 4/18/17, 5:41 PM, "David Garcia" wrote:
The console producer in the 0.10.0.0 release uses the old producer which
doesn’t have “backoff”…it’s really just for testing simple producing:
object ConsoleProducer {
  def main(args: Array[String]) {
    try {
      val config = new ProducerConfig(args)
      val reader =
        Class.forName(c
Kafka is very reliable when the broker actually gets the message and replies
back to the producer that it got the message (i.e. it won’t “lie”). Basically,
your producer tried to put too many bananas into the Bananer’s basket. And
yes, Windows is not supported. You will get much better perfor
On Tue, Apr 11, 2017 at 9:41 PM, David Garcia wrote:
One issue is that Kafka leverages some very specific features of the Linux
kernel that are probably far different from Windows, so I imagine the
performance profile is likewise much different.
On 4/11/17, 8:52 AM, "Tomasz Rojek" wrote:
Hi All,
We want to choose provider of messagin
I don’t think “benchmarking” frameworks WRT Kafka is particularly
informative. The various frameworks available are better compared WRT their
features and processing limitations. For example, Akka-streams for Kafka
offers a more intuitive way to express asynchronous operations. If you were
Thanks,
Jun
> On Mar 22, 2017, at 3:07 PM, David Garcia wrote:
>
> producer purgatory size
Look at producer purgatory size. Anything greater than 10 is bad (from my
experience). Keeping it under 4 seemed to help us. (i.e. if a broker is
getting slammed with writes, use rebalance tools or add a new broker). Also
check network latency and/or adjust the timeout for ISR checking. If on AW
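Purgatory size is exposed over JMX. A sketch of reading a numeric MBean attribute programmatically; the broker bean name in the comment is the usual one for 0.10-era brokers but may vary by version, and a platform bean stands in so the example runs without a broker:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxReadSketch {
    // Read one numeric attribute from an MBean. Against a broker you would open
    // a remote JMX connection and query something like
    //   kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize
    // (attribute "Value"; the exact bean name varies by Kafka version). A
    // platform bean stands in here so the sketch runs anywhere.
    static Number readAttribute(MBeanServerConnection conn, String beanName, String attr)
            throws Exception {
        return (Number) conn.getAttribute(new ObjectName(beanName), attr);
    }

    public static void main(String[] args) throws Exception {
        Number cpus = readAttribute(ManagementFactory.getPlatformMBeanServer(),
                "java.lang:type=OperatingSystem", "AvailableProcessors");
        System.out.println("AvailableProcessors = " + cpus);
    }
}
```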
Gah…nm…looked at source code…use this:
schema.generator.class=io.confluent.connect.storage.hive.schema.TimeBasedSchemaGenerator
On 3/3/17, 5:36 PM, "David Garcia" wrote:
Trying to use the s3-loader and am getting this error:
org.apache.kafka.common.config.ConfigException: Invalid generator class: class
io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
at
io.confluent.connect.storage.partitioner.TimeBasedPartitioner.newSchemaGenerator(T
From my experience, the bottleneck of a kafka cluster is the writing (i.e.
the producers). The most direct way to measure how “stressed” the writing
threads are is to directly observe the producer purgatory buffer of your
brokers. The larger it gets the more likely a leader will report an ou
Sorry, wrong link: http://docs.confluent.io/2.0.1/kafka/deployment.html
On 1/24/17, 2:13 PM, "David Garcia" wrote:
This should give you an idea:
https://www.confluent.io/blog/design-and-deployment-considerations-for-deploying-apache-kafka-on-aws/
On 1/23/17, 10:25 PM, "Ewen Cheslack-Postava" wrote:
Smaller servers/instances work fine for tests, as long as the workload is
scaled down as well. Most me
https://kafka.apache.org/documentation#operations
It’s in there somewhere… ;-)
On 11/28/16, 7:34 PM, "西风瘦" wrote:
Hi! Waiting for your answer.
I would start here: http://docs.confluent.io/3.1.0/streams/index.html
On 11/26/16, 8:27 PM, "Alan Kash" wrote:
Hi,
New to Kafka land.
I am looking into Interactive queries feature, which transforms Topics into
Tables with history, neat !
1. What kind of querie
ocess1")
>.addStateStore(profiles, "deltaProcess2", "deltaProcess1")
>.addStateStore(company_bucket, "deltaProcess2", "deltaProcess1");
>
> KafkaStreams streams = new KafkaStreams(builder, config);
>
> streams.setUncaug
If you are consuming from more than one topic/partition, punctuate is triggered
when the “smallest” time-value changes. So, if there is a partition that
doesn’t have any more messages on it, it will always have the smallest
time-value and that time value won’t change…hence punctuate never gets
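The stalled-punctuate behavior can be modeled in a few lines; a hypothetical stand-alone model, not the Streams internals:

```java
import java.util.HashMap;
import java.util.Map;

public class StreamTimeSketch {
    // Stream time is the minimum of the latest timestamp seen on each consumed
    // partition; a partition that stops receiving messages pins it forever,
    // which is why punctuate() stops firing.
    static long streamTime(Map<String, Long> latestTimestampPerPartition) {
        return latestTimestampPerPartition.values().stream()
                .mapToLong(Long::longValue).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        Map<String, Long> t = new HashMap<>();
        t.put("busy-0", 1_000L);
        t.put("idle-1", 100L);      // no new messages: this never moves
        System.out.println(streamTime(t)); // 100
        t.put("busy-0", 2_000L);    // the busy partition advances...
        System.out.println(streamTime(t)); // ...but stream time is still 100
    }
}
```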
Sorry, had a typo in my gist. Here is the correct location:
https://gist.github.com/anduill/710bb0619a80019016ac85bb5c060440
On 10/19/16, 4:33 PM, "David Garcia" wrote:
Hello everyone. I’m having a hell of a time figuring out a Kafka performance
issue in AWS. Any help is greatly appreciated!
Here is our AWS configuration:
- Zookeeper Cluster (3.4.6): 3-nodes on m4.xlarges (default
configuration) EBS volumes (sd1)
- Kafka Cluster (0.10.0):
Using the kafka-topics.sh script, simply set the retention in a way to remove
the message:
kafka-topics.sh --zookeeper <zookeeper> --alter --config retention.ms=<ms>
--topic <topic>
This is actually deprecated, but still works in newer Kafka (0.10.0). Note:
cleanup.policy=delete is required for this. This policy will only e
WRT performance, yes, changing message size will affect the performance of
producers and consumers. Please study the following to understand the
relationship between message size and performance (graphs at the bottom
visualize the relationship nicely):
https://engineering.linkedin.com/kafka/ben
Hello Daniccan. I apologize for the dumb question, but did you also check
“message.max.bytes” on the broker? The default is about 1 MB (1000012 bytes)
for Kafka 0.10.0. If you need to publish larger messages, you will need to
adjust that on the brokers and then restart them.
-David
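For reference, these are the settings usually raised together when moving past the ~1 MB default. Values are illustrative only, and the three properties below actually live in three different config files (broker, producer, consumer):

```properties
# broker (server.properties): per-message limit; restart brokers after changing
message.max.bytes=5242880
# producer: allow requests this large
max.request.size=5242880
# consumer: fetch buffer must be able to hold the largest message
max.partition.fetch.bytes=5242880
```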
On 10/14/16,
Hello, we are going to be upgrading the instance types of our brokers. We will
shut them down, upgrade, and then restart them. All told, they will be down for
about 15 minutes. Upon restart, is there anything we need to do other than run
preferred leader election? The brokers will start to ca
of this topic.
-David
On 10/7/16, 5:03 PM, "David Garcia" wrote:
Ok I found the bug. Basically, if there is an empty topic (in the list of
topics being consumed), any partition-group with partitions from the topic will
always return -1 as the smallest timestamp (see Partition
Actually, I think the bug is more subtle. What happens when a consumed topic
stops receiving messages? The smallest timestamp will always be the static
timestamp of this topic.
-David
topics.
Punctuate will never be called.
-David
On 10/7/16, 1:11 PM, "David Garcia" wrote:
Yeah, this is possible. We have run the application (and have confirmed
data is being received) for over 30 mins…with a 60-second timer. So, do we
need to just rebuild our cluster w
Could this be the problem you're seeing? See also the related discussion
at
http://stackoverflow.com/questions/39535201/kafka-problems-with-timestampextractor
.
On Fri, Oct 7, 2016 at 6:07 PM, David Garcia wrote:
> Hello, I’m sure t
Maybe someone already answered this…but you can use the repartitioner to fix
that (it’s included with Kafka)
As far as root cause, you probably had a few leader elections due to excessive
latency. There is a cascading scenario that I noticed Kafka is vulnerable to.
The events transpire as fol
Hello, I’m sure this question has been asked many times.
We have a test-cluster (confluent 3.0.0 release) of 3 aws m4.xlarges. We have
an application that needs to use the punctuate() function to do some work on a
regular interval. We are using the WallClock extractor. Unfortunately, the
meth
Any reason you can’t use mirror maker?
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
-David
On 10/6/16, 1:32 PM, "Craig Swift" wrote:
Hello,
We're in the process of upgrading several of our clusters to Kafka 10. I
was wondering if it's possible to us
Try running mirror maker from the other direction (i.e. from 0.8.2.1 ). I had
a similar issue, and that seemed to work.
-David
On 9/26/16, 5:19 PM, "Xavier Lange" wrote:
I'm using bin/kafka-mirror-maker.sh for the first time and I need to take
my "aws-cloudtrail" topic from a 0.8.2.1
To remediate, you could start another broker, rebalance, and then shut down the
busted broker. But, you really should put some monitoring on your system (to
help diagnose the actual problem). Datadog has a pretty good set of articles
for using jmx to do this:
https://www.datadoghq.com/blog/mo
Data is always provided by the leader of a topic-partition (i.e. a broker).
Here is a summary of how zookeeper is used:
https://www.quora.com/What-is-the-actual-role-of-ZooKeeper-in-Kafka
-David
On 9/10/16, 3:47 PM, "Eric Ho" wrote:
I notice that some Spark programs would contact someth
system?
-David
On 9/7/16, 3:41 PM, "David Garcia" wrote:
The “simplest” way to solve this is to “repartition” your data (i.e. the
streams you wish to join) with the partition key you wish to join on. This
obviously introduces redundancy, but it will solve your problem. For example..
suppose you want to join topic T1 and topic T2…but they aren’t part
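A tiny model of why repartitioning makes the join work: if both topics are re-keyed on the join key and partitioned the same way with the same partition count, matching records land in the same partition. This is a simplified stand-in (Kafka's default partitioner actually murmur2-hashes the serialized key):

```java
public class CoPartitionSketch {
    // To join on a key, records with that key must sit in the same partition
    // of both (repartitioned) topics. Same key + same partitioner + same
    // partition count => same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        int n = 6; // both repartitioned topics created with 6 partitions
        System.out.println("T1' partition: " + partitionFor("user-42", n));
        System.out.println("T2' partition: " + partitionFor("user-42", n)); // same number
    }
}
```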
Cachestat
https://www.datadoghq.com/blog/collecting-kafka-performance-metrics/
On 9/7/16, 8:31 AM, "Peter Sinoros Szabo"
wrote:
Hi,
As I read more and more about kafka monitoring it seems that monitoring
the Linux page cache hit ratio is important, but I do not really find a
Looked through the code; I
think it is indeed a bug, and commit
6fb33afff976e467bfa8e0b29eb82770a2a3aaec
will not fix it IMHO.
Would you want to create a JIRA for keeping track of this issue?
Guozhang
On Sat, Sep 3, 2016 at 10:16 AM, David Garcia wrote:
This bug may be fixed after commit: 6fb33afff976e467bfa8e0b29eb82770a2a3aaec
When you start two consumer processes with a regex topic (with 2 or more
partitions for the matching topics), the second (i.e. nonleader) consumer will
fail with a null pointer exception.
Exception in thread "StreamThr
Regarding 3 and 4: https://calcite.apache.org/docs/stream.html (i.e. streaming
SQL queries)
On 8/24/16, 6:29 AM, "Herwerth Karin (CI/OSW3)"
wrote:
Dear Sir or Madam,
I'm a beginner in Apache Kafka and have questions which I couldn't answer from
the documentation.
1.
My team is considering using either Kafka-connect JDBC or Bottled water to
stream DB-changes from several production postgres DB’s. WRT bottled water,
this is a little scary:
https://github.com/confluentinc/bottledwater-pg/issues/96
But, the Kafka-connect option also seems like it could affect
You could create another partition in topic T, and publish the same message to
both partitions. You would have to configure P2 to read from the other
partition. Or you could have P1 write the message to another topic and
configure P2 to listen to that topic.
-David
On 8/16/16, 11:54 PM, "Dee
Have you looked at kafka manager: https://github.com/yahoo/kafka-manager
It provides consumer level metrics.
-David
On 8/2/16, 12:36 PM, "Phillip Mann" wrote:
Hello all,
This is a bit of a convoluted title but we are trying to set up monitoring
on our Kafka Cluster and Kafka Strea
Hello, I’ve googled around for this, but haven’t had any luck. Based upon
this: http://docs.confluent.io/3.0.0/streams/architecture.html#state KTables
are local to instances. An instance will process one or more partitions from
one or more topics. How do KStreams/KTables handle the followi
treams in order to see if KStreams can improve on that end.
Would you like to elaborate a bit more on that end?
Guozhang
On Thu, Jul 28, 2016 at 12:16 PM, David Garcia
wrote:
> Our team is evaluating KStreams and Reactive Kafka (version 0.11-M4) on a
&
Well, just a dumb question, but did you include all the brokers in your client
connection properties?
On 7/29/16, 10:48 AM, "Sean Morris (semorris)" wrote:
Anyone have any ideas?
From: semorris mailto:semor...@cisco.com>>
Date: Tuesday, July 26, 2016 at 9:40 AM
To: "users@k
What is your replication for these topics?
On 7/28/16, 3:03 PM, "Kessiler Rodrigues" wrote:
Hey guys,
I have > 5k topics with 5 partitions each in my cluster today.
My actual cluster configuration is:
6 brokers - 16 vCPUs, 14.4 GB
Nowadays, I’m having so
Our team is evaluating KStreams and Reactive Kafka (version 0.11-M4) on a
confluent 3.0 cluster. Our testing is very simple (pulling from one topic,
doing a simple transform) and then writing out to another topic.
The performance for the two systems is night and day. Both applications were
ru
Sounds like you might want to go the partition route:
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
If you lose a broker (and you went the topic route), the probability that an
arbitrary topic was on the broker is higher than if you had gone the pa
http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model
you shouldn’t have to do anything. Simply starting a new thread will
“rebalance” your streaming job. The job coordinates its tasks through Kafka
itself.
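Concretely, scaling a Streams app is configuration plus more instances; an illustrative properties fragment (values hypothetical):

```properties
# more stream threads inside one instance...
num.stream.threads=4
# ...and/or start another instance with the same application.id; tasks
# rebalance across every thread of every instance automatically
application.id=my-streams-app
```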
On 7/26/16, 12:42 PM, "Davood Rafiei" wrote:
Hi,
Hello, we are working on designs for several streaming applications and a
common consideration is the need for occasional external database
updates/lookups. For example…we would be processing a stream of events with
some kind of local-id, and we occasionally need to resolve the local-id to a
g
s a "minor"
release. Kafka is a bit odd in that its "major" releases are labeled as a
normal "minor" release number because Kafka hasn't decided to make an
official 1.0 release yet.
What features/fixes are you looking for?
-Ewen
You should probably just put reporting in your app. Dropwizard, logs…etc. You
can also look at Kafka JMX consumer metrics (assuming you don’t have too many
consumers).
-David
On 7/22/16, 9:13 AM, "Adrienne Kole" wrote:
Hi,
How can I measure the latency and throughput in Kafka
Does anyone know when this release will be cut?
-David
n might be Kafka Server has a backup option and it self heals
from this behavior ... Just a theory
On Tue, Jul 19, 2016, 7:57 AM Abhinav Solan wrote:
> No, was monitoring the app at that time .. it was just sitting idle
>
> On Tue, Jul 19, 2016, 7:32 AM
Is it possible that your app is thrashing (i.e. FullGC’ing too much and not
processing messages)?
-David
On 7/19/16, 9:16 AM, "Abhinav Solan" wrote:
Hi Everyone, can anyone help me on this
Thanks,
Abhinav
On Mon, Jul 18, 2016, 6:19 PM Abhinav Solan wrote:
>
Is there any way to specify a dynamic topic list (e.g. like a regex whitelist
filter…like in the consumer API) with kafka streams? We would like to get the
benefit of automatic checkpointing and load balancing if possible.
-David
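For the plain consumer API, subscribe with a Pattern already does this (and, if memory serves, later Streams releases added Pattern overloads too). A broker-free stand-in showing which topic names such a regex whitelist would match:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WhitelistSketch {
    // The plain consumer supports consumer.subscribe(Pattern, listener); this
    // stand-in only shows which topic names such a whitelist would match.
    static List<String> matching(Pattern whitelist, List<String> topics) {
        return topics.stream()
                .filter(t -> whitelist.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("events\\..*");
        System.out.println(matching(p, List.of("events.click", "events.view", "audit")));
        // [events.click, events.view]
    }
}
```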
I would like to process messages from an input topic, but I don’t want to send
messages downstream…with KStreams. (i.e. I would like to read from a topic, do
some processing including occasional database updates, and that’s it…no output
to a topic). I could fake this by filtering out all my me
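A terminal side-effecting operation is the shape wanted here (KStream#foreach in the DSL consumes records without forwarding anything downstream, avoiding the filter-everything-out workaround). A broker-free stand-in of that shape:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TerminalSinkSketch {
    // Consume each record for its side effect (say, a database update) and
    // forward nothing downstream.
    static <V> int drain(List<V> records, Consumer<V> sideEffect) {
        records.forEach(sideEffect);
        return records.size(); // consumed count; nothing is emitted
    }

    public static void main(String[] args) {
        List<String> dbWrites = new ArrayList<>();
        int n = drain(List.of("evt-1", "evt-2"), dbWrites::add);
        System.out.println(n + " records consumed, 0 forwarded");
    }
}
```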