MM will hang until the next message arrives. For example, I have an MM running
and listening to a topic with no messages coming in. I send Ctrl+C to MM and
MM doesn't shut down until I send a message to the topic. My question is: if
no message ever arrives on the topic, how can I safely shut it down?
Fetchers fetch data from a leader to the consumer. The replication fetcher is
configured by another property.
On Saturday, March 14, 2015, Zakee wrote:
> Sorry, but still confused. Maximum number of threads (fetchers) to fetch
> from a Leader or maximum number of threads within a follower broker?
>
> Thanks for
Hi Stevo,
I won't speak for Joe, but what we do is documented in the link that Joe
provided:
"Adding servers to a Kafka cluster is easy, just assign them a unique
broker id and start up Kafka on your new servers. However these new servers
will not automatically be assigned any data partitions, so u
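As a concrete sketch of the reassignment flow from that doc (the ZK address,
broker ids, and file names below are illustrative, not from this thread):

    # Generate a candidate plan for moving the listed topics onto the
    # new brokers (ids 5 and 6 are placeholders).
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --topics-to-move-json-file topics-to-move.json \
      --broker-list "5,6" --generate

    # Save the proposed assignment to a file, then execute and verify it.
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --execute
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --verify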
Can you verify that the leaders are evenly spread, and if necessary
run a preferred leader election?
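For example (assuming a ZK at localhost:2181; adjust for your cluster):

    # See which broker leads each partition.
    bin/kafka-topics.sh --describe --zookeeper localhost:2181

    # Move leadership back to the preferred replicas.
    bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181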
On Fri, Mar 13, 2015 at 05:10:22PM -0700, Zakee wrote:
> I have 35 topics with a total of 398 partitions (2 of them are supposed to
> be very high volume, so I allocated 28 partitions to them; others vary
> between 5 and 14).
Thanks, Mayuresh. I did the same and it fixed the issue.
Thanks
Zakee
> On Mar 13, 2015, at 3:56 PM, Mayuresh Gharat
> wrote:
>
> The index files work in the following way:
> It's a mapping from logical offsets to a particular file location within the
> log file segment.
>
> If you see the comments under OffsetIndex.scala code:
I have 35 topics with a total of 398 partitions (2 of them are supposed to be
very high volume, so I allocated 28 partitions to them; others vary between
5 and 14).
Thanks
Zakee
> On Mar 13, 2015, at 3:25 PM, Joel Koshy wrote:
>
> I think what people have observed in the past is that increasing
> num-replica-fetcher-threads has diminishing returns fairly quickly.
Sorry for the late reply. Not sure what more details you need.
Here's an example of exposing Kafka through remoting (HTTP/REST):
http://confluent.io/docs/current/kafka-rest/docs/intro.html :-)
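For instance, a minimal probe of the proxy (host and port are assumptions;
8082 is the proxy's usual default):

    # List topics through the REST proxy.
    curl http://localhost:8082/topics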
Even without looking into the Kafka REST proxy code, one can see from its
limitations that it's using the HL consumer, with
Can you reproduce this problem? Although the fix is straightforward, we
would like to understand why this happened.
On 3/13/15, 3:56 PM, "Zakee" wrote:
>Just found there is a known issue to be resolved in a future Kafka version:
> https://issues.apache.org/jira/browse/KAFKA-1554
>
>The workaround mentioned here helped.
These features are all nice if one adds new brokers to support additional
topics, or to move existing partitions or whole topics to new brokers.
The referenced sentence is in the paragraph named Scalability. When I read
"expanded" I was thinking of scaling out, extending parallelization
capabilities, and
Just found there is a known issue to be resolved in a future Kafka version:
https://issues.apache.org/jira/browse/KAFKA-1554
The workaround mentioned here helped.
> The workaround is to delete all index files of size 10MB (the size of the
> pre-allocated files), and restart. Index files will be rebuilt.
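A sketch of that workaround, assuming /var/kafka-logs as the log dir
(10MB = 10485760 bytes; list first, delete only after reviewing):

    find /var/kafka-logs -name '*.index' -size 10485760c -print
    # After reviewing the list, delete and restart the broker:
    find /var/kafka-logs -name '*.index' -size 10485760c -delete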
The index files work in the following way:
It's a mapping from logical offsets to a particular file location within the
log file segment.
If you see the comments under OffsetIndex.scala code:
The file format is a series of entries. The physical format is a 4 byte
"relative" offset and a 4 byte file location for the message with that offset.
I did a shutdown of the cluster and then tried to restart, and I see the below
error on one of the 5 brokers. I can't restart this instance and am not sure
how to fix this.
[2015-03-13 15:27:31,793] ERROR There was an error in one of the threads during
logs loading: java.lang.IllegalArgumentException:
I think what people have observed in the past is that increasing
num-replica-fetcher-threads has diminishing returns fairly quickly.
You may want to instead increase the number of partitions in the topic
you are producing to. (How many do you have right now?)
On Fri, Mar 13, 2015 at 02:48:17PM -07
+1 - if you have a way to reproduce that would be ideal. We don't know
the root cause of this yet. Our guess is a corner case around
shutdowns, but not sure.
On Fri, Mar 13, 2015 at 03:13:45PM -0700, Jun Rao wrote:
> Is there a way that you can reproduce this easily?
>
> Thanks,
>
> Jun
>
> On
Is there a way that you can reproduce this easily?
Thanks,
Jun
On Fri, Mar 13, 2015 at 8:13 AM, Marc Labbe wrote:
> Not exactly, the topics are compacted but messages are not compressed.
>
> I get the exact same error though. Any other options I should consider?
> We're on 0.8.2.0 and we also had this on 0.8.1.1 before.
Hi Mayuresh,
I have currently set this property to 4 and I see from the logs that it starts
12 threads on each broker. I will try increasing it further.
Thanks
Zakee
> On Mar 13, 2015, at 11:53 AM, Mayuresh Gharat
> wrote:
>
> You might want to increase the number of Replica Fetcher threads by setting
> this property : *num.replica.fetchers*.
The newest version of Spark came out today.
https://spark.apache.org/releases/spark-release-1-3-0.html
Apparently they made improvements to the Kafka connector for Spark
Streaming (see Approach 2):
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
Best,
Niek
Camus uses MapReduce though.
If Alberto uses Spark exclusively, I can see why installing a MapReduce
cluster (with or without YARN) is not a desirable solution.
On Fri, Mar 13, 2015 at 1:06 PM, Thunder Stumpges wrote:
> Sorry to go back in time on this thread, but Camus does NOT use YARN. We have
> been using Camus for a while on our CDH4 (no YARN) Hadoop cluster.
Sorry to go back in time on this thread, but Camus does NOT use YARN. We have
been using Camus for a while on our CDH4 (no YARN) Hadoop cluster. It really is
fairly easy to set up, and seems to be quite good so far.
-Thunder
It seemed really counter-intuitive; I can only imagine that it happened
because nobody wanted to refactor the existing KafkaInputDStream to use the
SimpleConsumer instead of the High Level Consumer (unless I'm misreading
the source - it looks like that's what the new DirectKafkaInputDStream is
doing).
Also very interested in hearing about them.
I prefer war stories in the form of Jiras for the relevant project ;)
There's a good chance we can make things less horrible if issues are reported.
Gwen
On Fri, Mar 13, 2015 at 12:48 PM, Andrew Otto wrote:
>> We are currently using spark streaming 1.2.1
1) You save everything twice (Kafka and HDFS).
2) You need to enable the checkpoint feature, which means you cannot change
the configuration of the job, because the Spark Streaming context is
deserialized from HDFS every time you restart the job.
3) What happens if HDFS is unavailable? Not clear.
Update:
Turns out this error happens in 2 scenarios:
1. When there is a mismatch between the broker and zookeeper libs inside of
your process (found that on Stack Overflow).
2. Apparently when anything that uses the Scala parser combinators libs (in our
case scala.util.parsing.json.JSON) runs wit
> We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
> I will only say one thing : "a nightmare". ;-)
I’d be really interested in hearing about your experience here. I’m exploring
streaming frameworks a bit, and Spark Streaming is just so easy to use and set
up. I’d be
I really like the new approach. The WAL in HDFS never made much sense
to me (I mean, Kafka is a log. I know they don't want the Kafka
dependency, but a log for a log makes no sense).
Still experimental, but I think that's the right direction.
On Fri, Mar 13, 2015 at 12:38 PM, Alberto Miorin
wrote
Thanks for the heads-up, Alberto, that's good to know. We were about to
start a few projects working with Spark Streaming + Kafka; sounds like
there's still quite a bit of work to be done there.
-Will
On Fri, Mar 13, 2015 at 3:38 PM, Alberto Miorin
wrote:
> We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
I will only say one thing : "a nightmare". ;-)
Let's see if things are better with 1.3.0 :
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
On Fri, Mar 13, 2015 at 8:33 PM, William Briggs wrote:
> Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
> it supports using an HDFS write-ahead log to ensure zero data loss while
> streaming.
Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
it supports using an HDFS write-ahead log to ensure zero data loss while
streaming:
https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
-Will
On Fri, Mar 13, 201
I'll try this too. It looks very promising.
Thx
On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira wrote:
> There's a KafkaRDD that can be used in Spark:
> https://github.com/tresata/spark-kafka. It doesn't exactly replace
> Camus, but should be useful in building Camus-like system in Spark.
>
> On
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
~ Joe Stein
- - - - - - - - - - - - - - - - -
http://www.stealth.ly
- - - - - - - - - - - - - - - - -
On Fri, Mar 13, 2015 at 3:05 PM, sunil kalva wrote:
> Joe
>
> "Well, I know it is semantic but right now it "can" be e
There's a KafkaRDD that can be used in Spark:
https://github.com/tresata/spark-kafka. It doesn't exactly replace
Camus, but should be useful in building Camus-like system in Spark.
On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin
wrote:
> We use spark on mesos. I don't want to partition our cluster because of one
> YARN job (camus).
The Flume solution looks very good.
Thx.
On Fri, Mar 13, 2015 at 8:15 PM, William Briggs wrote:
> I would think that this is not a particularly great solution, as you will
> end up running into quite a few edge cases, and I can't see this scaling
> particularly well - how do you know which server to copy logs from in a
> clustered and replicated environment?
I would think that this is not a particularly great solution, as you will
end up running into quite a few edge cases, and I can't see this scaling
particularly well - how do you know which server to copy logs from in a
clustered and replicated environment? What happens when Kafka detects a
failure
We use spark on mesos. I don't want to partition our cluster because of one
YARN job (camus).
Best
Alberto
On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:
> Just curious - why - is Camus not suitable/working?
>
> Thanks,
> Otis
> --
> Monitoring * Alerting
Joe
"Well, I know it is semantic but right now it "can" be elastically scaled
without down time but you have to integrate into your environment for what
that means it has been that way since 0.8.0 imho"
here what do you mean "you have to integrate into your environment", how do
i achieve elas
Ctrl+C should work. Did you see any issue with that?
On 3/12/15, 11:49 PM, "tao xiao" wrote:
>Hi,
>
>I wanted to know how I can shut down mirror maker safely (Ctrl+C) when
>there are no messages coming in to consume. I am using mirror maker off trunk
>code.
>
>--
>Regards,
>Tao
You might want to increase the number of Replica Fetcher threads by setting
this property : *num.replica.fetchers*.
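e.g., in each broker's server.properties (the value is purely illustrative;
brokers must be restarted to pick it up):

    num.replica.fetchers=4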
Thanks,
Mayuresh
On Thu, Mar 12, 2015 at 10:39 PM, Zakee wrote:
> With the producer throughput as large as > 150MB/s to 5 brokers on a
> continuous basis, I see a consistently hi
Just curious - why? Is Camus not suitable/working?
Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin
wrote:
> I was wondering if anybody has already tried to mirror a Kafka topic to
> HDFS by just copying the log files from the topic directory of the broker.
Actually the new MM will commit offsets even if those messages are filtered
out. That's why I'm asking whether you will resume consuming from a topic after
you stopped consuming from it earlier. If you are going to do this, you need
to do extra work in your message handler. For example,
1. When received a message th
I was wondering if anybody has already tried to mirror a Kafka topic to
HDFS by just copying the log files from the topic directory of the broker
(like 23244237.log).
The file format is very simple:
https://twitter.com/amiorin/status/576448691139121152/photo/1
Implementing an InputForma
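If it helps, you can sanity-check what such a segment contains with the
bundled dump tool (the path is illustrative):

    bin/kafka-run-class.sh kafka.tools.DumpLogSegments --print-data-log \
      --files /var/kafka-logs/mytopic-0/00000000000023244237.log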
Sorry, but still confused. Maximum number of threads (fetchers) to fetch from
a Leader or maximum number of threads within a follower broker?
Thanks for clarifying,
-Zakee
> On Mar 12, 2015, at 11:11 PM, tao xiao wrote:
>
> The number of fetchers is configurable via num.replica.fetchers. Th
The way offset management works in Kafka is:
it stores offsets for a particular (groupId, topic, partitionId) in a
particular partition of the __consumer_offsets topic (see the sketch below).
1) By default the value is 50. You can change it by setting this property:
"*offsets.topic.num.partitions*" in your config.
2) No
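A hedged way to peek at those stored offsets from the console (the ZK address
is an assumption; the formatter class name is, as far as I know, the 0.8.2
one):

    bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
      --topic __consumer_offsets \
      --formatter "kafka.server.OffsetManager\$OffsetsMessageFormatter"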
I suppose that the patch for KAFKA-1641 had a fix for this issue.
Also it might be worth looking at KAFKA-1755.
Thanks,
Mayuresh
On Fri, Mar 13, 2015 at 8:13 AM, Marc Labbe wrote:
> Not exactly, the topics are compacted but messages are not compressed.
>
> I get the exact same error though. Any other options I should consider?
Well, I know it is semantics, but right now it "can" be elastically scaled
without downtime; you have to integrate into your environment for what that
means. It has been that way since 0.8.0 imho.
My point was just another way to do that out of the box... folks do this
elastic scaling today
OK, thanks for the heads-up.
When reading the Apache Kafka docs, and reading what Apache Kafka "can" do, I
expect it to already be available in the latest general availability release,
not what's planned as part of some other project.
Kind regards,
Stevo Slavic.
On Fri, Mar 13, 2015 at 2:32 PM, Joe Stein wrote
I wrote a short blog on what's being fixed in 0.8.2.1 release.
http://blog.confluent.io/2015/03/13/apache-kafka-0-8-2-1-release/
We recommend everyone upgrade to 0.8.2.1.
Thanks,
Jun
Thanks, I'll start with that before changing my deployment to the Oracle JDK.
On Fri, Mar 13, 2015 at 11:40 AM, Mark Reddy wrote:
> Hi Marc,
>
> If you are seeing high CPU usage with a large number of partitions on
> 0.8.2 you should definitely upgrade to 0.8.2.1 as the following issue was
> fixed: https://issues.apache.org/jira/browse/KAFKA-1952
Hi Marc,
If you are seeing high CPU usage with a large number of partitions on
0.8.2 you should definitely upgrade to 0.8.2.1 as the following issue was
fixed: https://issues.apache.org/jira/browse/KAFKA-1952
Also see the 0.8.2.1 release notes for other fixes:
https://archive.apache.org/dist/kaf
Not exactly, the topics are compacted but messages are not compressed.
I get the exact same error though. Any other options I should consider?
We're on 0.8.2.0 and we also had this on 0.8.1.1 before.
marc
On Fri, Mar 13, 2015 at 10:47 AM, Jun Rao wrote:
> Did you get into that issue for the same reason as in the jira, i.e.,
> somehow compressed messages were sent to the compact topics?
Hi,
our cluster is deployed on AWS, we have brokers on r3.large instances and a
decent number of topics+partitions (600+ partitions). We're not making that
many requests/sec, roughly 80 produce/sec and 240 fetch/sec (not counting
internal replication requests), and yet CPU hovers around 40%, which I
c
Did you get into that issue for the same reason as in the jira, i.e.,
somehow compressed messages were sent to the compact topics?
Thanks,
Jun
On Fri, Mar 13, 2015 at 6:45 AM, Marc Labbe wrote:
> Hello,
>
> we're often seeing log cleaner exceptions reported in KAFKA-1641 and I'd
> like to know if it's safe to apply the patch from that issue resolution to
> 0.8.2.1?
Hello,
we're often seeing log cleaner exceptions reported in KAFKA-1641 and I'd
like to know if it's safe to apply the patch from that issue resolution to
0.8.2.1?
Reference: https://issues.apache.org/jira/browse/KAFKA-1641
Also there are 2 patches in there, I suppose I should be using only the
Hey Stevo, "can be elastically and transparently expanded without downtime." is
the goal of Kafka on Mesos https://github.com/mesos/kafka, as Kafka has the
ability (knobs/levers) to do this but it has to be made to do this out of the
box.
e.g. in Kafka on Mesos when a broker fails, after the configurab
Hello Apache Kafka community,
On Apache Kafka website home page http://kafka.apache.org/ it is stated
that Kafka "can be elastically and transparently expanded without downtime."
Is that really true? More specifically, can one just add one more broker,
have another partition added for the topic, h
Hi,
I am using Kafka 0.8.2.1. I have two topics with 10 partitions each.
I noticed that one more topic exists, named "__consumer_offsets", with 50
partitions. My questions are:
1. Why is this topic created with 50 partitions?
2. How do I get the consumer group names for a topic? Is there any document or
A