How to fetch consumer group names of a Topic from Kafka offset manager in Kafka 0.8.2.1

2015-03-13 Thread Madhukar Bharti
Hi,

I am using Kafka 0.8.2.1. I have two topics with 10 partitions each. I
noticed that one more topic exists, named "__consumer_offsets", with 50
partitions. My questions are:

1. Why is this topic created with 50 partitions?
2. How can I get the consumer group names for a topic? Is there any document or
API to get all consumer groups from the Kafka offset storage manager? With
ZooKeeper we have /consumers, which lists all consumers.


Thanks in advance.

Regards,
Madhukar


Kafka elastic no downtime scalability

2015-03-13 Thread Stevo Slavić
Hello Apache Kafka community,

On the Apache Kafka website home page http://kafka.apache.org/ it is stated
that Kafka "can be elastically and transparently expanded without downtime."
Is that really true? More specifically, can one just add one more broker,
have another partition added for the topic, have the new broker assigned as
the leader for the new partition, and have producers correctly write to the
new partition and consumers read from it, with no broker, consumer, or
producer downtime, no data loss, and no manual action to move data from
existing partitions to the new partition?

Kind regards,
Stevo Slavic.


Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
Hey Stevo, "can be elastically and transparently expanded without downtime." is
the goal of Kafka on Mesos https://github.com/mesos/kafka, as Kafka has the
ability (knobs/levers) to do this but has to be made to do it out of the
box.

e.g. in Kafka on Mesos, when a broker fails, after the configurable max
failover timeout (meaning it is truly deemed a hard failure), a broker (with
the same id) will automatically be started on another machine, the data is
replicated, and it is back in action once that is done, automatically. Lots
more features are already in there... we are also working on auto-balancing
partitions when increasing/decreasing the size of the cluster, and some more
goodies too.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -



Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Marc Labbe
Hello,

we're often seeing log cleaner exceptions reported in KAFKA-1641 and I'd
like to know if it's safe to apply the patch from that issue resolution to
0.8.2.1?

Reference: https://issues.apache.org/jira/browse/KAFKA-1641

Also there are 2 patches in there, I suppose I should be using only the
latest of the two.
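
For reference, this is roughly how I'd build a patched broker (untested
sketch; the patch file name below is hypothetical, use whichever is the
latest attachment on the JIRA):

    git clone https://github.com/apache/kafka.git && cd kafka
    git checkout 0.8.2.1              # the released tag
    git apply KAFKA-1641_v2.patch     # hypothetical attachment name
    gradle                            # bootstrap the wrapper, per the README
    ./gradlew jar                     # rebuild the broker jars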

thanks!
marc


Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Jun Rao
Did you get into that issue for the same reason as in the jira, i.e.,
somehow compressed messages were sent to the compact topics?

Thanks,

Jun



Kafka High CPU, 0.8.2.1 or openjdk?

2015-03-13 Thread Marc Labbe
Hi,

our cluster is deployed on AWS; we have brokers on r3.large instances and a
decent number of topics+partitions (600+ partitions). We're not making that
many requests/sec, roughly 80 produce/sec and 240 fetch/sec (not counting
internal replication requests), and yet CPU hovers around 40%, which I
consider quite high given the nature of Kafka. I have worked on other
deployments, not on AWS, where we were getting much larger figures in
requests/sec, with much less CPU usage than that.

There are two things I'm considering to reduce that. The first one is
obviously upgrading to 0.8.2.1, though I am not sure how big the impact is. I
also found this thread from a while ago
http://mail-archives.apache.org/mod_mbox/kafka-users/201305.mbox/%3ceb51b84c-ad91-4a2f-b97d-5283ef079...@transpac.com%3E
about the use of OpenJDK (which we're using, Ubuntu trusty's default)
vs the Oracle JDK.

I am planning to do both anyway, but I thought it'd be interesting to know
if anyone else has experienced this before. Is there any other tuning I
should think about?

thanks
marc


Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Marc Labbe
Not exactly: the topics are compacted, but the messages are not compressed.

I get the exact same error though. Any other options I should consider?
We're on 0.8.2.0 and we also had this on 0.8.1.1 before.

marc



Re: Kafka High CPU, 0.8.2.1 or openjdk?

2015-03-13 Thread Mark Reddy
Hi Marc,

If you are seeing high CPU usage with a large number of partitions on
0.8.2, you should definitely upgrade to 0.8.2.1, as the following issue was
fixed: https://issues.apache.org/jira/browse/KAFKA-1952

Also see the 0.8.2.1 release notes for other fixes:
https://archive.apache.org/dist/kafka/0.8.2.1/RELEASE_NOTES.html
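
Beyond the upgrade, it may also be worth checking your JVM settings. As a
hedged starting point (the heap size and GC flags here are assumptions, tune
them for your instances), kafka-server-start.sh picks these up via
kafka-run-class.sh:

    export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
    export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20"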


Mark



Re: Kafka High CPU, 0.8.2.1 or openjdk?

2015-03-13 Thread Marc Labbe
Thanks, I'll start with that before changing my deployment to the Oracle JDK.




What's new in Apache Kafka 0.8.2.1 release

2015-03-13 Thread Jun Rao
I wrote a short blog post on what's fixed in the 0.8.2.1 release.

http://blog.confluent.io/2015/03/13/apache-kafka-0-8-2-1-release/

We recommend everyone upgrade to 0.8.2.1.

Thanks,

Jun


Re: Kafka elastic no downtime scalability

2015-03-13 Thread Stevo Slavić
OK, thanks for the heads up.

When I read in the Apache Kafka docs what Apache Kafka "can" do, I expect it
to already be available in the latest general-availability release, not
planned as part of some other project.
Kind regards,
Stevo Slavic.



Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
Well, I know it is semantics, but right now it "can" be elastically scaled
without downtime; you just have to integrate it into your environment for
what that means. It has been that way since 0.8.0, imho.

My point was just another way to do that out of the box... folks do this
elastic scaling today with AWS CloudFormation and internal systems they
built too.

So, it can be done... you just have to do it.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -



Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Mayuresh Gharat
I suppose that the patch for KAFKA-1641 had a fix for this issue.
It might also be worth looking at KAFKA-1755.

Thanks,

Mayuresh




-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: How to fetch consumer group names of a Topic from Kafka offset manager in Kafka 0.8.2.1

2015-03-13 Thread Mayuresh Gharat
The way offset management works with Kafka is:
it stores the offset for a particular (groupId, topic, partitionId) in a
particular partition of the __consumer_offsets topic.

1) By default the value is 50. You can change it by setting the
"offsets.topic.num.partitions" property in your broker config.
2) No, we don't have an API yet. But there is a ticket for those improvements:
KAFKA-1013. There is already a patch for it; we can add this to that. Would
you mind leaving a comment there?
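
For 1), the setting lives in server.properties on every broker and only takes
effect before the offsets topic is first created (hedged sketch; the values
shown are the defaults as I recall them):

    offsets.topic.num.partitions=50
    offsets.topic.replication.factor=3

For 2), until there is an API you can at least inspect a group whose name you
already know with the offset checker tool ("my-group" is a placeholder):

    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
      --zookeeper localhost:2181 --group my-group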

Thanks,

Mayuresh




-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: createMessageStreams vs createMessageStreamsByFilter

2015-03-13 Thread Zakee
Sorry, but still confused. Is it the maximum number of threads (fetchers) to
fetch from a leader, or the maximum number of threads within a follower
broker?

Thanks for clarifying,
-Zakee



> On Mar 12, 2015, at 11:11 PM, tao xiao  wrote:
> 
> The number of fetchers is configurable via num.replica.fetchers. The
> description of num.replica.fetchers in the Kafka documentation is not quite
> accurate. num.replica.fetchers actually controls the max number of fetchers
> per broker. In your case, num.replica.fetchers=8 and 5 brokers means no
> more than 8 fetchers are created for each broker.
> 
> On Fri, Mar 13, 2015 at 1:21 PM, Zakee  wrote:
> 
>> Is this always the case that there is only one fetcher per broker, won’t
>> setting num.replica.fetchers greater than number-of-brokers cause more
>> fetchers per broker?
>> Let’s I have 5 brokers, and num of replica fetchers is 8, will there be 2
>> fetcher threads pulling from  each broker?
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Mar 12, 2015, at 11:15 AM, James Cheng  wrote:
>>> 
>>> Ah, I understand now. I didn't realize that there was one fetcher thread
>> per broker.
>>> 
>>> Thanks Tao & Guozhang!
>>> -James
>>> 
>>> 
>>> On Mar 11, 2015, at 5:00 PM, tao xiao  wrote:
>>> 
>>>> A fetcher thread is per-broker; it ensures that there is at least one
>>>> fetcher thread per broker. The fetcher thread sends the broker a fetch
>>>> request asking for all its partitions. So if A, B, C are on the same
>>>> broker, the fetcher thread is still able to fetch data from A, B, C even
>>>> though A returns no data. The same logic applies to a different broker.
 
 On Thu, Mar 12, 2015 at 6:25 AM, James Cheng  wrote:
 
> 
> On Mar 11, 2015, at 9:12 AM, Guozhang Wang  wrote:
> 
>> Hi James,
>> 
>> What I meant before is that a single fetcher may be responsible for
>> putting fetched data into multiple queues according to the construction of
>> the streams setup, where each queue may be consumed by a different thread.
>> And the queues are actually bounded. Now say there are two queues that are
>> getting data from the same fetcher F, and they are consumed by two
>> different user threads A and B. If thread A for some reason gets slow or
>> hangs consuming data from queue 1, then queue 1 will eventually get full,
>> and F trying to put more data into it will be blocked. Since F is parked
>> trying to put data into queue 1, queue 2 will not get more data from it,
>> and thread B may hence get starved. Does that make sense now?
>> 
> 
> Yes, that makes sense. That is the scenario where one thread of a consumer
> can cause a backup in the queue, which would cause other threads to not
> receive data.
>
> What about the situation I described, where a thread consumes a queue that
> is supposed to be filled with messages from multiple partitions? If
> partition A has no messages and partitions B and C do, how will the fetcher
> behave? Will the processing thread receive messages from partitions B and C?
> 
> Thanks,
> -James
> 
> 
>> Guozhang
>> 
>> On Tue, Mar 10, 2015 at 5:15 PM, James Cheng  wrote:
>> 
>>> Hi,
>>> 
>>> Sorry to bring up this old thread, but my question is about this exact
>>> thing:
>>> 
>>> Guozhang, you said:
>>>> A more concrete example: say you have topic AC: 3 partitions, topic BC:
>>>> 6 partitions.
>>>>
>>>> With createMessageStreams("AC" => 3, "BC" => 2) a total of 5 threads will
>>>> be created, consuming AC-1, AC-2, AC-3, BC-1/2/3, BC-4/5/6 respectively;
>>>>
>>>> With createMessageStreamsByFilter("*C" => 3) a total of 3 threads will be
>>>> created, consuming AC-1/BC-1/BC-2, AC-2/BC-3/BC-4, AC-3/BC-5/BC-6
>>>> respectively.
>>> 
>>> 
>>> You said that in the createMessageStreamsByFilter case, if topic AC had
>>> no messages in it and consumer.timeout.ms = -1, then the 3 threads might
>>> all be blocked waiting for data to arrive from topic AC, and so messages
>>> from BC would not be processed.
>>> 
>>> createMessageStreamsByFilter("*C" => 1) (single stream) would have
>> the
>>> same problem but just worse. Behind the scenes, is there a single
>> thread
>>> that is consuming (round-robin?) messages from the different
>> partitions
> and
>>> inserting them all into a single queue for the application code to
> process?
>>> And that is why a single partition with no messages with block the
>> other
>>> messages from getting through?
>>> 
>>> What about createMessageStreams("AC" => 1)? That creates a single stream
>>> that contains messages from multiple partitions, which might be on
>>> different brokers. Does that also suffer the same problem, where if one
>>> partition has no messages?
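
For context, a minimal sketch of the high-level consumer API under discussion
(assumptions: ZooKeeper on localhost, and the group id and whitelist regex
are made up):

    import java.util.Properties
    import kafka.consumer.{Consumer, ConsumerConfig, Whitelist}

    val props = new Properties()
    props.put("zookeeper.connect", "localhost:2181")
    props.put("group.id", "example-group")
    val connector = Consumer.create(new ConsumerConfig(props))

    // Three streams over every topic matching .*C; each stream is an iterator
    // over a bounded queue that the per-broker fetcher threads fill, which is
    // why one slow stream can eventually park a fetcher and starve the others.
    val streams = connector.createMessageStreamsByFilter(new Whitelist(".*C"), 3)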

Alternative to camus

2015-03-13 Thread Alberto Miorin
I was wondering if anybody has already tried to mirror a Kafka topic to
HDFS by just copying the log files from the topic directory of the broker
(like 23244237.log).

The file format is very simple:
https://twitter.com/amiorin/status/576448691139121152/photo/1

Implementing an InputFormat should not be so difficult.

Any drawbacks?
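
For reference, here is roughly what reading a segment would look like (hedged
sketch based on my reading of the 0.8 on-disk format; it ignores compressed
message sets and a possibly truncated trailing entry):

    import java.io.{DataInputStream, FileInputStream}

    // Each entry in an 0.8.x segment appears to be:
    //   offset (int64), message size (int32), then the message itself:
    //   crc (int32), magic (int8), attributes (int8),
    //   key length (int32) + key bytes, value length (int32) + value bytes.
    // A length of -1 means null.
    def dumpSegment(path: String): Unit = {
      val in = new DataInputStream(new FileInputStream(path))
      try {
        while (in.available() > 12) {
          val offset = in.readLong()
          val size = in.readInt()
          val crc = in.readInt()
          val magic = in.readByte()
          val attributes = in.readByte()
          def readBytes(): Array[Byte] = {
            val len = in.readInt()
            if (len < 0) null
            else { val b = new Array[Byte](len); in.readFully(b); b }
          }
          val key = readBytes()
          val value = readBytes()
          val v = if (value == null) "null" else new String(value, "UTF-8")
          println(s"offset=$offset size=$size value=$v")
        }
      } finally in.close()
    }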


Re: Does consumer support combination of whitelist and blacklist topic filtering

2015-03-13 Thread Jiangjie Qin
Actually the new MM will commit offsets even if those messages are filtered
out. That's why I'm asking whether you will resume consuming from a topic
after you stop consuming from it earlier. If you are going to do this, you
need to do extra work in your message handler. For example:
1. When you receive a message that triggers stopping consumption from a
topic, store the offset in the message handler and start filtering out the
topic.
2. When you receive a message that triggers resuming consumption from a
topic, reset the offset for that topic to the stored offset and stop
filtering.

This might need some tricks for synchronization if you have multiple MM
instances.

Thanks.

Jiangjie (Becket) Qin

On 3/12/15, 10:23 PM, "tao xiao"  wrote:

>I am not sure how MM is going to be rewritten. Based on the current
>implementation in trunk, the offset is not committed unless the message is
>produced to the destination. With the assumption that this logic remains, MM
>will not acknowledge the offset back to the source for a filtered message.
>So I think it is safe to filter messages out while keeping the committed
>offset unchanged for that particular topic. Please correct me if I am wrong.
>
>On Fri, Mar 13, 2015 at 1:12 PM, Guozhang Wang  wrote:
>
>> Note that with filtering in the message handler, records from the source
>> cluster are still considered "consumed", since the offsets will be
>> committed. If you change the filtering dynamically back to whitelist these
>> topics, you will lose the data that was consumed during the period of the
>> blacklist.
>>
>> Guozhang
>>
>> On Thu, Mar 12, 2015 at 10:01 PM, tao xiao  wrote:
>>
>> > Yes, that will work. The message handler can filter out messages sent
>> > from certain topics.
>> >
>> > On Fri, Mar 13, 2015 at 6:30 AM, Jiangjie Qin
>>> >
>> > wrote:
>> >
>> > > Not sure if it is an option, but does filtering out topics with a
>> > > message handler work for you? Are you going to resume consuming from a
>> > > topic after you stop consuming from it?
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On 3/12/15, 8:05 AM, "tao xiao"  wrote:
>> > >
>> > > >Yes, you are right. A dynamic topicfilter is more appropriate, where
>> > > >I can filter topics at runtime via some kind of interface, e.g. JMX.
>> > > >
>> > > >On Thu, Mar 12, 2015 at 11:03 PM, Guozhang Wang
>>
>> > > >wrote:
>> > > >
>> > > >> Tao,
>> > > >>
>> > > >> Based on your description I think the combination of whitelist /
>> > > >>blacklist
>> > > >> will not achieve your goal, since it is still static.
>> > > >>
>> > > >> Guozhang
>> > > >>
>> > > >> On Thu, Mar 12, 2015 at 6:30 AM, tao xiao 
>> > wrote:
>> > > >>
>> > > >> > Thank you Guozhang for your advice. A dynamic topic filter is
>> what I
>> > > >>need
>> > > >> > so that I can stop a topic consumption when I need to at
>>runtime.
>> > > >> >
>> > > >> > On Thu, Mar 12, 2015 at 9:21 PM, Guozhang Wang <
>> wangg...@gmail.com>
>> > > >> wrote:
>> > > >> >
>> > > >> > > 1. Dynamic: yeah that is sth. we could think of, this could be
>> > > >> > > useful operationally.
>> > > >> > > 2. Regex: I think in terms of expressiveness it should be
>> > > >> > > sufficient for almost all subsets of topics. In practice the
>> > > >> > > rule of thumb is that you create topics that belong to the same
>> > > >> > > "group" with some prefix / suffix, so that the regex expression
>> > > >> > > would not be crazily long.
>> > > >> > >
>> > > >> > > Guozhang
>> > > >> > >
>> > > >> > > On Thu, Mar 12, 2015 at 6:10 AM, tao xiao
>>> >
>> > > >> wrote:
>> > > >> > >
>> > > >> > > > something like dynamic filtering that can be updated at
>> > > >> > > > runtime, or deny all but allow a certain set of topics that
>> > > >> > > > cannot be specified easily by regex
>> > > >> > > >
>> > > >> > > > On Thu, Mar 12, 2015 at 9:06 PM, Guozhang Wang
>> > > >>
>> > > >> > > wrote:
>> > > >> > > >
>> > > >> > > > > Hmm, what kind of customized filtering do you have in mind?
>> > > >> > > > > I thought with "--whitelist" you could already specify a
>> > > >> > > > > regex to do filtering.
>> > > >> > > > >
>> > > >> > > > > On Thu, Mar 12, 2015 at 5:56 AM, tao xiao <
>> > xiaotao...@gmail.com
>> > > >
>> > > >> > > wrote:
>> > > >> > > > >
>> > > >> > > > > > Hi Guozhang,
>> > > >> > > > > >
>> > > >> > > > > > It was meant to be topicfilter, not topic-count; sorry
>> > > >> > > > > > for the confusion. What I want to achieve is to pass my
>> > > >> > > > > > own customized topicfilter to MM so that I can filter out
>> > > >> > > > > > whatever topics I like. I know MM doesn't support this
>> > > >> > > > > > now. I am just thinking if this is a good feature to add.
>> > > >> > > > > >
>> > > >> > > > > > On Thu, Mar 12, 2015 at 8:24 PM, Guozhang Wang
>> > > >> > > > > >  wrote:

Re: Alternative to camus

2015-03-13 Thread Otis Gospodnetic
Just curious - why - is Camus not suitable/working?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/




Re: High Replica Max Lag

2015-03-13 Thread Mayuresh Gharat
You might want to increase the number of replica fetcher threads by setting
the num.replica.fetchers property.
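
For example, in server.properties on the brokers (hedged; 4 is an arbitrary
starting value, the right number depends on your broker count and I/O):

    num.replica.fetchers=4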

Thanks,

Mayuresh

On Thu, Mar 12, 2015 at 10:39 PM, Zakee  wrote:

> With the producer throughput as large as > 150MB/s to 5 brokers on a
> continuous basis, I see a consistently high value for Replica Max Lag (in
> millions). Is this normal, or is there a way to tune so as to reduce
> replica MaxLag?
> As per documentation, replica max lag (in messages) between follower and
> leader replicas, should be less than replica.lag.max.messages (currently
> set to 5000)
>
>
> Thanks
> Zakee
>
>
>




-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: How to shutdown mirror maker safely

2015-03-13 Thread Jiangjie Qin
ctrl+c should work. Did you see any issue with that?

On 3/12/15, 11:49 PM, "tao xiao"  wrote:

>Hi,
>
>I wanted to know how I can shut down mirror maker safely (ctrl+c) when
>there are no messages coming in to consume. I am using the mirror maker off
>the trunk code.
>
>-- 
>Regards,
>Tao



Re: Kafka elastic no downtime scalability

2015-03-13 Thread sunil kalva
Joe

"Well, I know it is semantic but right now it "can" be elastically scaled
without down time but you have to integrate into your environment for what
that means it has been that way since 0.8.0 imho"

here what do you mean "you have to integrate into your environment", how do
i achieve elastically scaled cluster seamlessly ?

SunilKalva




-- 
SunilKalva


Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
We use Spark on Mesos. I don't want to partition our cluster because of one
YARN job (Camus).

Best

Alberto



Re: Alternative to camus

2015-03-13 Thread William Briggs
I would think that this is not a particularly great solution, as you will
end up running into quite a few edge cases, and I can't see this scaling
particularly well - how do you know which server to copy logs from in a
clustered and replicated environment? What happens when Kafka detects a
failure and moves partition replicas to a different node? The reason that
the Kafka Consumer APIs exist is to shield you from having to think about
these things. In addition, you would be tightly coupling yourself to
Kafka's internal log format; in my experience, this sort of thing rarely
ends well.

Depending on your use case, Flume is a reasonable solution, if you don't
want to use Camus; it has a Kafka source that allows you to stream data out
of Kafka and into HDFS:
http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/
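
The setup from that post boils down to something like this (hedged sketch;
the agent, channel, topic names and the HDFS path are made up, and the exact
property names should be checked against the post):

    tier1.sources  = src1
    tier1.channels = ch1
    tier1.sinks    = sink1

    tier1.sources.src1.type = org.apache.flume.source.kafka.KafkaSource
    tier1.sources.src1.zookeeperConnect = zk1:2181
    tier1.sources.src1.topic = mytopic
    tier1.sources.src1.channels = ch1

    tier1.channels.ch1.type = memory

    tier1.sinks.sink1.type = hdfs
    tier1.sinks.sink1.hdfs.path = /data/kafka/mytopic
    tier1.sinks.sink1.channel = ch1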

-Will



Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
The Flume solution looks very good.

Thx.

>


Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
There's a KafkaRDD that can be used in Spark:
https://github.com/tresata/spark-kafka. It doesn't exactly replace
Camus, but it should be useful in building a Camus-like system in Spark.



Re: Kafka elastic no downtime scalability

2015-03-13 Thread Joe Stein
https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
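
The gist of that procedure (hedged sketch; the topic name and broker ids are
made up, with broker 5 being the newly added one):

    cat > topics.json <<EOF
    {"topics": [{"topic": "mytopic"}], "version": 1}
    EOF

    # generate a proposed assignment that spreads mytopic over brokers 1-5
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --topics-to-move-json-file topics.json \
      --broker-list "1,2,3,4,5" --generate

    # save the proposed assignment to reassign.json, review it, then:
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassign.json --execute

    # and check progress with:
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassign.json --verify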

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -



Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
I'll try this too. It looks very promising.

Thx



Re: Alternative to camus

2015-03-13 Thread William Briggs
Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
it supports using an HDFS write-ahead log to ensure zero data loss while
streaming:
https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
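
Turning that on is roughly this (hedged sketch; the app name, checkpoint
path, ZK quorum, group and topic are all made up, and it assumes the
receiver-based KafkaUtils.createStream API from Spark 1.2):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf()
      .setAppName("kafka-to-hdfs")
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/kafka-to-hdfs") // WAL and metadata live here
    val stream = KafkaUtils.createStream(ssc, "zk1:2181", "example-group",
      Map("mytopic" -> 1))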

-Will

>


Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
We are currently using Spark Streaming 1.2.1 with Kafka and the write-ahead
log. I will only say one thing: "a nightmare". ;-)

Let's see if things are better with 1.3.0 :
http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html



Re: Alternative to camus

2015-03-13 Thread William Briggs
Thanks for the heads-up, Alberto, that's good to know. We were about to
start a few projects working with Spark Streaming + Kafka; sounds like
there's still quite a bit of work to be done there.

-Will

>


Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
I really like the new approach. The WAL in HDFS never made much sense
to me (I mean, Kafka is a log. I know they don't want the Kafka
dependency, but a log for a log makes no sense).

Still experimental, but I think that's the right direction.



Re: Alternative to camus

2015-03-13 Thread Andrew Otto
> We are currently using Spark Streaming 1.2.1 with Kafka and the write-ahead
> log. I will only say one thing: "a nightmare". ;-)

I'd be really interested in hearing about your experience here. I'm exploring
streaming frameworks a bit, and Spark Streaming is just so easy to use and
set up. It'd be nice if it worked well.


RE: JSON parsing causing rebalance to fail

2015-03-13 Thread Arunkumar Srambikkal (asrambik)
Update:

Turns out this error happens in 2 scenarios:

1. When there is a mismatch between the broker and zookeeper libs inside of
your process (found that on Stack Overflow).

2. Apparently when anything that uses the scala parser combinators libs (in
our case scala.util.parsing.json.JSON) runs within the same process as your
consumer.

1 is easily fixed. 2 happens consistently, and it is often caused by other
libs that you use, which makes it impossible to use Kafka at all.

It's quite silly but totally disabling.
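
Our working theory (hedged sketch, not verified against the Kafka source):
scala.util.parsing.json.JSON keeps its number parser in a global mutable
field, Kafka installs an Int-producing parser there, and any other code in
the same JVM can silently reset it:

    import scala.util.parsing.json.JSON

    // kafka.utils.Json appears to install an Int-producing parser:
    JSON.globalNumberParser = (s: String) => s.toInt
    // ...but another library in the same process can put the default back:
    JSON.globalNumberParser = (s: String) => s.toDouble
    // after which "port":64356 parses as a Double and Kafka's cast to
    // Integer blows up exactly as in the stack trace below.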

We would appreciate it a lot if someone could provide a workaround or point
us to the exact issue, so that we can custom-patch it or provide a patch
ourselves.

Thanks
Arun

From: Arunkumar Srambikkal (asrambik) 
Sent: Wednesday, March 04, 2015 5:27 PM
To: users@kafka.apache.org
Subject: JSON parsing causing rebalance to fail

Hi, 

When I start a new consumer, it throws a rebalance exception.

However, I hit it only on some machines, where the runtime libraries are
different.

The stack trace given below is what I encounter; is this a known issue?

I saw this Jira, but it's not resolved, so I thought to confirm:
https://issues.apache.org/jira/browse/KAFKA-1405

Thanks
Arun

[2015-03-04 14:30:37,609] INFO [name], exception during rebalance  
(kafka.consumer.ZookeeperConsumerConnector)

kafka.common.KafkaException: Failed to parse the broker info from zookeeper: 
{"jmx_port":-1,"timestamp":"1425459616502","host":"1.1.1.1","version":1,"port":64356}
    at kafka.cluster.Broker$.createBroker(Broker.scala:35)
    ..
Caused by: java.lang.ClassCastException: java.lang.Double cannot be cast to 
java.lang.Integer
    at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:105)
    at kafka.cluster.Broker$.createBroker(Broker.scala:40)



Re: Alternative to camus

2015-03-13 Thread Alberto Miorin
1) You save everything twice (Kafka and HDFS).
2) You need to enable the checkpoint feature, which means you cannot change
the configuration of the job, because the Spark streaming context is
deserialized from HDFS every time you restart the job.
3) What happens if HDFS is unavailable? Not clear.
4) It's not transactional. For example, we tried to use it to move some data
from Kafka to Elasticsearch. If Elasticsearch is down, the Spark streaming
job doesn't stop; it cares only about failures of Spark or Kafka.



Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
Also very interested in hearing about them.

I prefer war stories in the form of Jiras for the relevant project ;)
There's a good chance we can make things less horrible if issues are reported.

Gwen

On Fri, Mar 13, 2015 at 12:48 PM, Andrew Otto  wrote:
>> We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
>> I will only say one thing : "a nightmare". ;-)
> I’d be really interested in hearing about your experience here.  I’m 
> exploring streaming frameworks a bit, and Spark Streaming is just so easy to 
> use and set up.  It'd be nice if it worked well.
>
>
>
>
>> On Mar 13, 2015, at 15:38, Alberto Miorin  wrote:
>>
>> We are currently using spark streaming 1.2.1 with kafka and write-ahead log.
>> I will only say one thing : "a nightmare". ;-)
>>
>> Let's see if things are better with 1.3.0 :
>> http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
>>
>> On Fri, Mar 13, 2015 at 8:33 PM, William Briggs  wrote:
>>
>>> Spark Streaming also has built-in support for Kafka, and as of Spark 1.2,
>>> it supports using an HDFS write-ahead log to ensure zero data loss while
>>> streaming:
>>> https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
>>>
>>> -Will
>>>
>>> On Fri, Mar 13, 2015 at 3:28 PM, Alberto Miorin >>> wrote:
>>>
 I'll try this too. It looks very promising.

 Thx

 On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira 
 wrote:

> There's a KafkaRDD that can be used in Spark:
> https://github.com/tresata/spark-kafka. It doesn't exactly replace
> Camus, but should be useful in building Camus-like system in Spark.
>
> On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin
>  wrote:
>> We use spark on mesos. I don't want to partition our cluster because
 of
> one
>> YARN job (camus).
>>
>> Best
>>
>> Alberto
>>
>> On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic <
>> otis.gospodne...@gmail.com> wrote:
>>
>>> Just curious - why - is Camus not suitable/working?
>>>
>>> Thanks,
>>> Otis
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log
 Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>>
>>> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin <
> amiorin78+ka...@gmail.com

>>> wrote:
>>>
 I was wondering if anybody has already tried to mirror a kafka
 topic
> to
 hdfs just copying the log files from the topic directory of the
 broker
 (like 23244237.log).

 The file format is very simple :
 https://twitter.com/amiorin/status/576448691139121152/photo/1

 Implementing an InputFormat should not be so difficult.

 Any drawbacks?

>>>
>

>>>
>>>
>


Re: Alternative to camus

2015-03-13 Thread William Briggs
It seemed really counter-intuitive; I can only imagine that it happened
because nobody wanted to refactor the existing KafkaInputDStream to use the
SimpleConsumer instead of the High Level Consumer (unless I'm misreading
the source - it looks like that's what the new DirectKafkaInputDStream is
doing, whereas KafkaInputDStream is using kafka.consumer.Consumer).

-Will

On Fri, Mar 13, 2015 at 3:42 PM, Gwen Shapira  wrote:

> I really like the new approach. The WAL in HDFS never made much sense
> to me (I mean, Kafka is a log. I know they don't want the Kafka
> dependency, but a log for a log makes no sense).
>
> Still experimental, but I think that's the right direction.
>
> On Fri, Mar 13, 2015 at 12:38 PM, Alberto Miorin
>  wrote:
> > We are currently using spark streaming 1.2.1 with kafka and write-ahead
> log.
> > I will only say one thing : "a nightmare". ;-)
> >
> > Let's see if things are better with 1.3.0 :
> > http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
> >
> > On Fri, Mar 13, 2015 at 8:33 PM, William Briggs 
> wrote:
> >
> >> Spark Streaming also has built-in support for Kafka, and as of Spark
> 1.2,
> >> it supports using an HDFS write-ahead log to ensure zero data loss while
> >> streaming:
> >>
> https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
> >>
> >> -Will
> >>
> >> On Fri, Mar 13, 2015 at 3:28 PM, Alberto Miorin <
> amiorin78+ka...@gmail.com
> >> > wrote:
> >>
> >>> I'll try this too. It looks very promising.
> >>>
> >>> Thx
> >>>
> >>> On Fri, Mar 13, 2015 at 8:25 PM, Gwen Shapira 
> >>> wrote:
> >>>
> >>> > There's a KafkaRDD that can be used in Spark:
> >>> > https://github.com/tresata/spark-kafka. It doesn't exactly replace
> >>> > Camus, but should be useful in building Camus-like system in Spark.
> >>> >
> >>> > On Fri, Mar 13, 2015 at 12:15 PM, Alberto Miorin
> >>> >  wrote:
> >>> > > We use spark on mesos. I don't want to partition our cluster
> because
> >>> of
> >>> > one
> >>> > > YARN job (camus).
> >>> > >
> >>> > > Best
> >>> > >
> >>> > > Alberto
> >>> > >
> >>> > > On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic <
> >>> > > otis.gospodne...@gmail.com> wrote:
> >>> > >
> >>> > >> Just curious - why - is Camus not suitable/working?
> >>> > >>
> >>> > >> Thanks,
> >>> > >> Otis
> >>> > >> --
> >>> > >> Monitoring * Alerting * Anomaly Detection * Centralized Log
> >>> Management
> >>> > >> Solr & Elasticsearch Support * http://sematext.com/
> >>> > >>
> >>> > >>
> >>> > >> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin <
> >>> > amiorin78+ka...@gmail.com
> >>> > >> >
> >>> > >> wrote:
> >>> > >>
> >>> > >> > I was wondering if anybody has already tried to mirror a kafka
> >>> topic
> >>> > to
> >>> > >> > hdfs just copying the log files from the topic directory of the
> >>> broker
> >>> > >> > (like 23244237.log).
> >>> > >> >
> >>> > >> > The file format is very simple :
> >>> > >> > https://twitter.com/amiorin/status/576448691139121152/photo/1
> >>> > >> >
> >>> > >> > Implementing an InputFormat should not be so difficult.
> >>> > >> >
> >>> > >> > Any drawbacks?
> >>> > >> >
> >>> > >>
> >>> >
> >>>
> >>
> >>
>


RE: Alternative to camus

2015-03-13 Thread Thunder Stumpges
Sorry to go back in time on this thread, but Camus does NOT use YARN. We have 
been using camus for a while on our CDH4 (no YARN) Hadoop cluster. It really is 
fairly easy to set up, and seems to be quite good so far.

-Thunder


-Original Message-
From: amiori...@gmail.com [mailto:amiori...@gmail.com] On Behalf Of Alberto 
Miorin
Sent: Friday, March 13, 2015 12:15 PM
To: users@kafka.apache.org
Cc: otis.gospodne...@gmail.com
Subject: Re: Alternative to camus

We use spark on mesos. I don't want to partition our cluster because of one 
YARN job (camus).

Best

Alberto

On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> 
wrote:

> Just curious - why - is Camus not suitable/working?
>
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management 
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin 
>  >
> wrote:
>
> > I was wondering if anybody has already tried to mirror a kafka topic 
> > to hdfs just copying the log files from the topic directory of the 
> > broker (like 23244237.log).
> >
> > The file format is very simple :
> > https://twitter.com/amiorin/status/576448691139121152/photo/1
> >
> > Implementing an InputFormat should not be so difficult.
> >
> > Any drawbacks?
> >
>


Re: Alternative to camus

2015-03-13 Thread Gwen Shapira
Camus uses MapReduce though.
If Alberto uses Spark exclusively, I can see why installing a MapReduce
cluster (with or without YARN) is not a desirable solution.




On Fri, Mar 13, 2015 at 1:06 PM, Thunder Stumpges  wrote:
> Sorry to go back in time on this thread, but Camus does NOT use YARN. We have 
> been using camus for a while on our CDH4 (no YARN) Hadoop cluster. It really 
> is fairly easy to set up, and seems to be quite good so far.
>
> -Thunder
>
>
> -Original Message-
> From: amiori...@gmail.com [mailto:amiori...@gmail.com] On Behalf Of Alberto 
> Miorin
> Sent: Friday, March 13, 2015 12:15 PM
> To: users@kafka.apache.org
> Cc: otis.gospodne...@gmail.com
> Subject: Re: Alternative to camus
>
> We use spark on mesos. I don't want to partition our cluster because of one 
> YARN job (camus).
>
> Best
>
> Alberto
>
> On Fri, Mar 13, 2015 at 7:43 PM, Otis Gospodnetic < 
> otis.gospodne...@gmail.com> wrote:
>
>> Just curious - why - is Camus not suitable/working?
>>
>> Thanks,
>> Otis
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Fri, Mar 13, 2015 at 2:33 PM, Alberto Miorin
>> > >
>> wrote:
>>
>> > I was wondering if anybody has already tried to mirror a kafka topic
>> > to hdfs just copying the log files from the topic directory of the
>> > broker (like 23244237.log).
>> >
>> > The file format is very simple :
>> > https://twitter.com/amiorin/status/576448691139121152/photo/1
>> >
>> > Implementing an InputFormat should not be so difficult.
>> >
>> > Any drawbacks?
>> >
>>


Kafka and Spark 1.3.0

2015-03-13 Thread Niek Sanders
The newest version of Spark came out today.

https://spark.apache.org/releases/spark-release-1-3-0.html

Apparently they made improvements to the Kafka connector for Spark
Streaming (see Approach 2):

http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
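
A minimal sketch of Approach 2, the new receiver-less direct stream (the
broker list and topic name below are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("direct-kafka"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    // Elements are (key, message) pairs; offsets are tracked by Spark itself
    // rather than by a receiver writing a write-ahead log.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))
    stream.map { case (_, message) => message }.print()
    ssc.start()
    ssc.awaitTermination()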

Best,
Niek


Re: High Replica Max Lag

2015-03-13 Thread Zakee
Hi Mayuresh,

I have currently set this property to 4 and I see from the logs that it starts 
12 threads on each broker. I will try increasing it further.

Thanks
Zakee



> On Mar 13, 2015, at 11:53 AM, Mayuresh Gharat  
> wrote:
> 
> You might want to increase the number of Replica Fetcher threads by setting
> this property : *num.replica.fetchers*.
> 
> Thanks,
> 
> Mayuresh
> 
> On Thu, Mar 12, 2015 at 10:39 PM, Zakee  wrote:
> 
>> With the producer throughput as large as > 150MB/s to 5 brokers on a
>> continuous basis, I see a consistently high value for Replica Max Lag (in
>> millions). Is this normal, or is there a way to tune so as to reduce replica
>> MaxLag?
>> As per documentation, replica max lag (in messages) between follower and
>> leader replicas, should be less than replica.lag.max.messages (currently
>> set to 5000)
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> -- 
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
> 



Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Jun Rao
Is there a way that you can reproduce this easily?

Thanks,

Jun

On Fri, Mar 13, 2015 at 8:13 AM, Marc Labbe  wrote:

> Not exactly, the topics are compacted but the messages are not compressed.
>
> I get the exact same error though. Any other options I should consider?
> We're on 0.8.2.0 and we also had this on 0.8.1.1 before.
>
> marc
>
> On Fri, Mar 13, 2015 at 10:47 AM, Jun Rao  wrote:
>
> > Did you get into that issue for the same reason as in the jira, i.e.,
> > somehow compressed messages were sent to the compact topics?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Mar 13, 2015 at 6:45 AM, Marc Labbe  wrote:
> >
> > > Hello,
> > >
> > > we're often seeing log cleaner exceptions reported in KAFKA-1641 and
> I'd
> > > like to know if it's safe to apply the patch from that issue resolution
> > to
> > > 0.8.2.1?
> > >
> > > Reference: https://issues.apache.org/jira/browse/KAFKA-1641
> > >
> > > Also there are 2 patches in there, I suppose I should be using only the
> > > latest of the two.
> > >
> > > thanks!
> > > marc
> > >
> >
>


Re: Log cleaner patch (KAFKA-1641) on 0.8.2.1

2015-03-13 Thread Joel Koshy
+1 - if you have a way to reproduce that would be ideal. We don't know
the root cause of this yet. Our guess is a corner case around
shutdowns, but not sure.

On Fri, Mar 13, 2015 at 03:13:45PM -0700, Jun Rao wrote:
> Is there a way that you can reproduce this easily?
> 
> Thanks,
> 
> Jun
> 
> On Fri, Mar 13, 2015 at 8:13 AM, Marc Labbe  wrote:
> 
> > Not exactly, the topics are compacted but the messages are not compressed.
> >
> > I get the exact same error though. Any other options I should consider?
> > We're on 0.8.2.0 and we also had this on 0.8.1.1 before.
> >
> > marc
> >
> > On Fri, Mar 13, 2015 at 10:47 AM, Jun Rao  wrote:
> >
> > > Did you get into that issue for the same reason as in the jira, i.e.,
> > > somehow compressed messages were sent to the compact topics?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Mar 13, 2015 at 6:45 AM, Marc Labbe  wrote:
> > >
> > > > Hello,
> > > >
> > > > we're often seeing log cleaner exceptions reported in KAFKA-1641 and
> > I'd
> > > > like to know if it's safe to apply the patch from that issue resolution
> > > to
> > > > 0.8.2.1?
> > > >
> > > > Reference: https://issues.apache.org/jira/browse/KAFKA-1641
> > > >
> > > > Also there are 2 patches in there, I suppose I should be using only the
> > > > latest of the two.
> > > >
> > > > thanks!
> > > > marc
> > > >
> > >
> >



Re: High Replica Max Lag

2015-03-13 Thread Joel Koshy
I think what people have observed in the past is that increasing
num-replica-fetcher-threads has diminishing returns fairly quickly.
You may want to instead increase the number of partitions in the topic
you are producing to. (How many do you have right now?)

On Fri, Mar 13, 2015 at 02:48:17PM -0700, Zakee wrote:
> Hi Mayuresh,
> 
> I have currently set this property to 4 and I see from the logs that it 
> starts 12 threads on each broker. I will try increasing it further.
> 
> Thanks
> Zakee
> 
> 
> 
> > On Mar 13, 2015, at 11:53 AM, Mayuresh Gharat  
> > wrote:
> > 
> > You might want to increase the number of Replica Fetcher threads by setting
> > this property : *num.replica.fetchers*.
> > 
> > Thanks,
> > 
> > Mayuresh
> > 
> > On Thu, Mar 12, 2015 at 10:39 PM, Zakee  wrote:
> > 
> >> With the producer throughput as large as > 150MB/s to 5 brokers on a
> >> continuous basis, I see a consistently high value for Replica Max Lag (in
> >> millions). Is this normal, or is there a way to tune so as to reduce replica
> >> MaxLag?
> >> As per documentation, replica max lag (in messages) between follower and
> >> leader replicas, should be less than replica.lag.max.messages (currently
> >> set to 5000)
> >> 
> >> 
> >> Thanks
> >> Zakee
> >> 
> >> 
> >> 
> >> 
> > 
> > 
> > 
> > 
> > -- 
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125
> > 
> 



Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Zakee
I did a shutdown of the cluster and then tried to restart, and I see the below 
error on one of the 5 brokers. I can’t restart this instance and am not sure how 
to fix this.

[2015-03-13 15:27:31,793] ERROR There was an error in one of the threads during 
logs loading: java.lang.IllegalArgumentException: requirement failed: Corrupt 
index found, index file 
(/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has non-zero 
size but the last offset is 6256447 and the base offset is 6256447 
(kafka.log.LogManager)
[2015-03-13 15:27:31,795] FATAL [Kafka Server 5], Fatal error during 
KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.IllegalArgumentException: requirement failed: Corrupt index found, 
index file (/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has 
non-zero size but the last offset is 6256447 and the base offset is 6256447
at scala.Predef$.require(Predef.scala:233)
at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.log.Log.loadSegments(Log.scala:183)
at kafka.log.Log.(Log.scala:67)
at 
kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2015-03-13 15:27:31,831] INFO [Kafka Server 5], shutting down 
(kafka.server.KafkaServer)


Thanks
Zakee







Re: Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Mayuresh Gharat
The index files work in the following way:
Each is a mapping from logical offsets to a particular file location within the
log file segment.

If you look at the comments in the OffsetIndex.scala code:

The file format is a series of entries. The physical format is a 4 byte
"relative" offset and a 4 byte file location for the
 message with that offset. The offset stored is relative to the base offset
of the index file. So, for example,
 if the base offset was 50, then the offset 55 would be stored as 5. Using
relative offsets in this way lets us use
 only 4 bytes for the offset.

In your case the index file is non-empty, but the sanity check expects the last
offset to be strictly greater than the base offset, which is why it throws the
error. I suppose you can get rid of those index files; upon restart the
indexes will be rebuilt.
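
If you want to double-check one of the failing files, here is a small sketch
based on the format above (my own illustration, not code from Kafka; it stops
at the first zeroed slot, since pre-allocated space is unused):

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel

    // Each entry is 8 bytes: a 4-byte offset relative to the base offset
    // (taken from the file name) and a 4-byte position into the .log segment.
    def dumpIndex(path: String, baseOffset: Long): Unit = {
      val raf = new RandomAccessFile(path, "r")
      try {
        val buf = raf.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, raf.length)
        var done = false
        while (!done && buf.remaining >= 8) {
          val relOffset = buf.getInt
          val position = buf.getInt
          if (relOffset == 0 && position == 0) done = true  // unused slot
          else println(s"offset=${baseOffset + relOffset} position=$position")
        }
      } finally raf.close()
    }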

Thanks,

Mayuresh

On Fri, Mar 13, 2015 at 3:38 PM, Zakee  wrote:

> I did a shutdown of the cluster and then try to restart and see the below
> error on one of the 5 brokers, I can’t restart this instance and not sure
> how to fix this.
>
> [2015-03-13 15:27:31,793] ERROR There was an error in one of the threads
> during logs loading: java.lang.IllegalArgumentException: requirement
> failed: Corrupt index found, index file
> (/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has
> non-zero size but the last offset is 6256447 and the base offset is 6256447
> (kafka.log.LogManager)
> [2015-03-13 15:27:31,795] FATAL [Kafka Server 5], Fatal error during
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: requirement failed: Corrupt index
> found, index file
> (/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has
> non-zero size but the last offset is 6256447 and the base offset is 6256447
> at scala.Predef$.require(Predef.scala:233)
> at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
> at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
> scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.log.Log.loadSegments(Log.scala:183)
> at kafka.log.Log.(Log.scala:67)
> at
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
> at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> [2015-03-13 15:27:31,831] INFO [Kafka Server 5], shutting down
> (kafka.server.KafkaServer)
>
>
> Thanks
> Zakee
>
>
>
>
> 
>



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Zakee
Just found there is a known issue to be resolved in a future Kafka version:  
https://issues.apache.org/jira/browse/KAFKA-1554

The workaround mentioned here helped.

> The workaround is to delete all index files of size 10MB (the size of the 
> pre-allocated files), and restart. Index files will be re-created.

> find $your_data_directory -size 10485760c -name *.index #-delete


Thanks
Zakee



> On Mar 13, 2015, at 3:38 PM, Zakee  wrote:
> 
> I did a shutdown of the cluster and then try to restart and see the below 
> error on one of the 5 brokers, I can’t restart this instance and not sure how 
> to fix this.
> 
> [2015-03-13 15:27:31,793] ERROR There was an error in one of the threads 
> during logs loading: java.lang.IllegalArgumentException: requirement failed: 
> Corrupt index found, index file 
> (/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has non-zero 
> size but the last offset is 6256447 and the base offset is 6256447 
> (kafka.log.LogManager)
> [2015-03-13 15:27:31,795] FATAL [Kafka Server 5], Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> java.lang.IllegalArgumentException: requirement failed: Corrupt index found, 
> index file (/data/vol6/kafka/kafka82/Topic22-6/06256447.index) 
> has non-zero size but the last offset is 6256447 and the base offset is 
> 6256447
>at scala.Predef$.require(Predef.scala:233)
>at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
>at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
>at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
>at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>at kafka.log.Log.loadSegments(Log.scala:183)
>at kafka.log.Log.(Log.scala:67)
>at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
>at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>at java.lang.Thread.run(Thread.java:662)
> [2015-03-13 15:27:31,831] INFO [Kafka Server 5], shutting down 
> (kafka.server.KafkaServer)
> 
> 
> Thanks
> Zakee
> 
> 
> 



Re: Kafka elastic no downtime scalability

2015-03-13 Thread Stevo Slavić
These features are all nice if one adds new brokers to support additional
topics, or to move existing partitions or whole topics to new brokers. The
referenced sentence is in the paragraph named scalability. When I read
"expanded" I was thinking of scaling out, extending parallelization
capabilities, and parallelism in Kafka is achieved with partitions. So I
understood that sentence to mean that it is possible to add more partitions to
existing topics at runtime, with no downtime.

I just found in the source that there is an API for adding new partitions to
existing topics (see
https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/admin/AdminUtils.scala#L101
). I have to try it. I guess it should work at runtime, causing no
downtime or data loss, and without moving data from existing to new partitions.
Producers and consumers will eventually start writing to and reading from the
new partition, and consumers should be able to read previously published
messages from old partitions, even messages which, if they were sent again,
would end up assigned/written to the new partition.
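
A minimal sketch of calling that API (0.8.2 internals, so the signature may
change; the ZooKeeper address and topic name are placeholders):

    import kafka.admin.AdminUtils
    import kafka.utils.ZKStringSerializer
    import org.I0Itec.zkclient.ZkClient

    val zkClient = new ZkClient("zk1:2181", 30000, 30000, ZKStringSerializer)
    try {
      // numPartitions is the new total for the topic, not a delta; the call
      // fails if it is not greater than the current partition count.
      AdminUtils.addPartitions(zkClient, "my-topic", 12)
    } finally {
      zkClient.close()
    }

I believe the kafka-topics.sh --alter --partitions option drives the same code
path, with the same caveat that keyed messages will start mapping to different
partitions.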


Kind regards,
Stevo Slavic.

On Fri, Mar 13, 2015 at 8:27 PM, Joe Stein  wrote:

> https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>
> On Fri, Mar 13, 2015 at 3:05 PM, sunil kalva  wrote:
>
> > Joe
> >
> > "Well, I know it is semantic but right now it "can" be elastically scaled
> > without down time but you have to integrate into your environment for
> what
> > that means it has been that way since 0.8.0 imho"
> >
> > here what do you mean "you have to integrate into your environment", how
> do
> > i achieve elastically scaled cluster seamlessly ?
> >
> > SunilKalva
> >
> > On Fri, Mar 13, 2015 at 10:27 PM, Joe Stein 
> wrote:
> >
> > > Well, I know it is semantic but right now it "can" be elastically
> scaled
> > > without down time but you have to integrate into your environment for
> > what
> > > that means it has been that way since 0.8.0 imho.
> > >
> > > My point was just another way to-do that out of the box... folks do
> this
> > > elastic scailing today with AWS CloudFormation and internal systems
> they
> > > built too.
> > >
> > > So, it can be done... you just have todo it.
> > >
> > > ~ Joe Stein
> > > - - - - - - - - - - - - - - - - -
> > >
> > >   http://www.stealth.ly
> > > - - - - - - - - - - - - - - - - -
> > >
> > > On Fri, Mar 13, 2015 at 12:39 PM, Stevo Slavić 
> > wrote:
> > >
> > > > OK, thanks for heads up.
> > > >
> > > > When reading Apache Kafka docs, and reading what Apache Kafka "can" I
> > > > expect it to already be available in latest general availability
> > release,
> > > > not what's planned as part of some other project.
> > > >
> > > > Kind regards,
> > > > Stevo Slavic.
> > > >
> > > > On Fri, Mar 13, 2015 at 2:32 PM, Joe Stein 
> > wrote:
> > > >
> > > > > Hey Stevo, "can be elastically and transparently expanded without
> > > > > downtime." is
> > > > > the goal of Kafka on Mesos https://github.com/mesos/kafka as Kafka
> > as
> > > > the
> > > > > ability (knobs/levers) to-do this but has to be made to-do this out
> > of
> > > > the
> > > > > box.
> > > > >
> > > > > e.g. in Kafka on Mesos when a broker fails, after the configurable
> > max
> > > > fail
> > > > > over timeout (meaning it is truly deemed hard failure) then a
> broker
> > > > (with
> > > > > the same id) will automatically be started on a another machine,
> data
> > > > > replicated and back in action once that is done, automatically.
> Lots
> > > more
> > > > > features already in there... we are also in progress to auto
> balance
> > > > > partitions when increasing/decreasing the size of the cluster and
> > some
> > > > more
> > > > > goodies too.
> > > > >
> > > > > ~ Joe Stein
> > > > > - - - - - - - - - - - - - - - - -
> > > > >
> > > > >   http://www.stealth.ly
> > > > > - - - - - - - - - - - - - - - - -
> > > > >
> > > > > On Fri, Mar 13, 2015 at 8:43 AM, Stevo Slavić 
> > > wrote:
> > > > >
> > > > > > Hello Apache Kafka community,
> > > > > >
> > > > > > On Apache Kafka website home page http://kafka.apache.org/ it is
> > > > stated
> > > > > > that Kafka "can be elastically and transparently expanded without
> > > > > > downtime."
> > > > > > Is that really true? More specifically, can one just add one more
> > > > broker,
> > > > > > have another partition added for the topic, have new broker
> > assigned
> > > to
> > > > > be
> > > > > > the leader for new partition, have producers correctly write to
> the
> > > new
> > > > > > partition, and consumers read from it, with no broker, consumer
> or
> > > > > producer
> > > > > > downtime, no data loss, no manual action to move data from
> existing
> > > > > > partitions to new partition?
> > > > > >
> > > > > > Kind regards,
> > > > > > Stevo Slavic.
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > SunilKalva
> >
>


Re: Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Jiangjie Qin
Can you reproduce this problem? Although the fix is straightforward, we
would like to understand why this happened.

On 3/13/15, 3:56 PM, "Zakee"  wrote:

>Just found there is a known issue to be resolved in future kafka version:
> https://issues.apache.org/jira/browse/KAFKA-1554
>
>The workaround mentioned here helped.
>
>> The workaround is to delete all index files of size 10MB (the size of
>>the pre-allocated files), and restart. Index files will be re-created.
>
>> find $your_data_directory -size 10485760c -name *.index #-delete
>
>
>Thanks
>Zakee
>
>
>
>> On Mar 13, 2015, at 3:38 PM, Zakee  wrote:
>> 
>> I did a shutdown of the cluster and then try to restart and see the
>>below error on one of the 5 brokers, I can’t restart this instance and
>>not sure how to fix this.
>> 
>> [2015-03-13 15:27:31,793] ERROR There was an error in one of the
>>threads during logs loading: java.lang.IllegalArgumentException:
>>requirement failed: Corrupt index found, index file
>>(/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has
>>non-zero size but the last offset is 6256447 and the base offset is
>>6256447 (kafka.log.LogManager)
>> [2015-03-13 15:27:31,795] FATAL [Kafka Server 5], Fatal error during
>>KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
>> java.lang.IllegalArgumentException: requirement failed: Corrupt index
>>found, index file
>>(/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has
>>non-zero size but the last offset is 6256447 and the base offset is
>>6256447
>>at scala.Predef$.require(Predef.scala:233)
>>at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
>>at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
>>at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
>>at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>at 
>>scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>at kafka.log.Log.loadSegments(Log.scala:183)
>>at kafka.log.Log.(Log.scala:67)
>>at 
>>kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$ano
>>nfun$apply$1.apply$mcV$sp(LogManager.scala:142)
>>at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
>>at 
>>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>at 
>>java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>at 
>>java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor
>>.java:886)
>>at 
>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.jav
>>a:908)
>>at java.lang.Thread.run(Thread.java:662)
>> [2015-03-13 15:27:31,831] INFO [Kafka Server 5], shutting down
>>(kafka.server.KafkaServer)
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>
>



Re: Reusable consumer across consumer groups

2015-03-13 Thread Stevo Slavić
Sorry for the late reply. Not sure what more details you need.
Here's an example http://confluent.io/docs/current/kafka-rest/docs/intro.html
of exposing Kafka through remoting (http/rest) :-)
Even without looking into the Kafka REST proxy code, one can see from its
limitations that it's using the HL consumer, with all its deficiencies.
E.g. commit "request *must* be made to the specific REST proxy instance
holding the consumer instance" (see
http://confluent.io/docs/current/kafka-rest/docs/api.html#post--consumers-%28string-group_name%29-instances-%28string-instance%29-offsets
). Also "because consumers are stateful, any consumer instances created
with the REST API are tied to a specific REST proxy instance", and
"consumers may not change the set of topics they are subscribed to once
they have started consuming messages" (see
http://confluent.io/docs/current/kafka-rest/docs/api.html#consumers )

One of the things making high-level consumer objects heavy is that each one
starts many threads, so a limited number of HL consumer instances can be
created per node (before OOM is thrown, not because there's not enough
memory, but because there are too many threads started).

With 0.8.2.x not much has changed in the ability to reuse HL consumer
instances to poll on behalf of different consumer groups; consumer instances
are stateful - most importantly the offsets and lock(s) that an active
consumer is holding. Luckily, there's the simple consumer API.
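
For completeness, a minimal sketch of that stateless style (0.8.x
SimpleConsumer; the host, topic and offset are placeholders, and real code
must also handle leader discovery, leader changes and error codes):

    import kafka.api.FetchRequestBuilder
    import kafka.consumer.SimpleConsumer

    // One shared instance can serve reads for any number of "consumer groups",
    // because the caller supplies the offset on every request and manages
    // group identity and offset storage itself.
    val consumer = new SimpleConsumer("broker1", 9092, 30000, 64 * 1024, "rest-proxy")
    val request = new FetchRequestBuilder()
      .clientId("rest-proxy")
      .addFetch("my-topic", 0, 42L, 100000)  // topic, partition, offset, maxBytes
      .build()
    val response = consumer.fetch(request)
    for (msgAndOffset <- response.messageSet("my-topic", 0))
      println(s"offset=${msgAndOffset.offset}")
    consumer.close()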

Kind regards,
Stevo Slavic.

On Thu, Oct 23, 2014 at 6:36 PM, Neha Narkhede 
wrote:

> I'm wondering how much of this can be done using careful system design vs
> building it within the consumer itself. You could distribute the several
> consumer instances across machines since it is built for distributed load
> balancing. That will sufficiently isolate the resources required to run the
> various consumers. But probably you have a specific use case in mind for
> running several consumer groups on the same machine. Would you mind giving
> more details?
>
> On Thu, Oct 23, 2014 at 12:55 AM, Stevo Slavić  wrote:
>
> > Imagine exposing Kafka over various remoting protocols, where incoming
> > poll/read requests may come in concurrently for different consumer
> groups,
> > especially in a case with lots of different consumer groups.
> > If you create and destroy KafkaConsumer for each such request, response
> > times and throughput will be very low, and doing that is one of the ways
> to
> > reproduce https://issues.apache.org/jira/browse/KAFKA-1716
> >
> > It would be better if one could reuse a (pool of) Consumer instances, and
> > through a read operation parameter specify for which consumer group
> should
> > read be performed.
> >
> > Kind regards,
> > Stevo Slavic.
> >
> > On Tue, Oct 14, 2014 at 6:17 PM, Neha Narkhede 
> > wrote:
> >
> > > Stevo,
> > >
> > > The new consumer API is planned for 0.9, not 0.8.2. You can take a look
> > at
> > > a detailed javadoc here
> > > <
> > >
> >
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > >
> > > .
> > >
> > > Can you explain why you would like to poll messages across consumer
> > groups
> > > using just one instance?
> > >
> > > Thanks,
> > > Neha
> > >
> > > On Tue, Oct 14, 2014 at 1:03 AM, Stevo Slavić 
> wrote:
> > >
> > > > Hello Apache Kafka community,
> > > >
> > > > Current (Kafka 0.8.1.1) high-level API's KafkaConsumer is not
> > lightweight
> > > > object, it's creation takes some time and resources, and it does not
> > seem
> > > > to be thread-safe. It's API also does not support reuse, for
> consuming
> > > > messages from different consumer groups.
> > > >
> > > > I see even in the coming (0.8.2) redesigned API it will not be
> possible
> > > to
> > > > reuse consumer instance to poll messages from different consumer
> > groups.
> > > >
> > > > Can something be done to support this?
> > > >
> > > > Would it help if there was consumer group as a separate entity from
> > > > consumer, for all the subscription management tasks?
> > > >
> > > > Kind regards,
> > > > Stevo Slavic
> > > >
> > >
> >
>


Re: High Replica Max Lag

2015-03-13 Thread Zakee
I have 35 topics with a total of 398 partitions (2 of them are supposed to be 
very high volume and so were allocated 28 partitions; the others vary between 5 
and 14).

Thanks
Zakee



> On Mar 13, 2015, at 3:25 PM, Joel Koshy  wrote:
> 
> I think what people have observed in the past is that increasing
> num-replica-fetcher-threads has diminishing returns fairly quickly.
> You may want to instead increase the number of partitions in the topic
> you are producing to. (How many do you have right now?)
> 
> On Fri, Mar 13, 2015 at 02:48:17PM -0700, Zakee wrote:
>> Hi Mayuresh,
>> 
>> I have currently set this property to 4 and I see from the logs that it 
>> starts 12 threads on each broker. I will try increasing it further.
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>>> On Mar 13, 2015, at 11:53 AM, Mayuresh Gharat  
>>> wrote:
>>> 
>>> You might want to increase the number of Replica Fetcher threads by setting
>>> this property : *num.replica.fetchers*.
>>> 
>>> Thanks,
>>> 
>>> Mayuresh
>>> 
>>> On Thu, Mar 12, 2015 at 10:39 PM, Zakee  wrote:
>>> 
 With the producer throughput as large as > 150MB/s to 5 brokers on a
 continuous basis, I see a consistently high value for Replica Max Lag (in
 millions). Is this normal, or is there a way to tune so as to reduce replica
 MaxLag?
 As per documentation, replica max lag (in messages) between follower and
 leader replicas, should be less than replica.lag.max.messages (currently
 set to 5000)
 
 
 Thanks
 Zakee
 
 
 
 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> -Regards,
>>> Mayuresh R. Gharat
>>> (862) 250-7125
>>> 
>> 
> 
> 
> 



Re: Broker Restart failed w/ Corrupt index found

2015-03-13 Thread Zakee
Thanks, Mayuresh. I did the same and it fixed the issue.

Thanks
Zakee



> On Mar 13, 2015, at 3:56 PM, Mayuresh Gharat  
> wrote:
> 
> The index files work in the following way:
> Each is a mapping from logical offsets to a particular file location within the
> log file segment.
> 
> If you look at the comments in the OffsetIndex.scala code:
> 
> The file format is a series of entries. The physical format is a 4 byte
> "relative" offset and a 4 byte file location for the
> message with that offset. The offset stored is relative to the base offset
> of the index file. So, for example,
> if the base offset was 50, then the offset 55 would be stored as 5. Using
> relative offsets in this way lets us use
> only 4 bytes for the offset.
> 
> In your case the index file is non-empty, but the sanity check expects the last
> offset to be strictly greater than the base offset, which is why it throws the
> error. I suppose you can get rid of those index files; upon restart the
> indexes will be rebuilt.
> 
> Thanks,
> 
> Mayuresh
> 
> On Fri, Mar 13, 2015 at 3:38 PM, Zakee  wrote:
> 
>> I did a shutdown of the cluster and then try to restart and see the below
>> error on one of the 5 brokers, I can’t restart this instance and not sure
>> how to fix this.
>> 
>> [2015-03-13 15:27:31,793] ERROR There was an error in one of the threads
>> during logs loading: java.lang.IllegalArgumentException: requirement
>> failed: Corrupt index found, index file
>> (/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has
>> non-zero size but the last offset is 6256447 and the base offset is 6256447
>> (kafka.log.LogManager)
>> [2015-03-13 15:27:31,795] FATAL [Kafka Server 5], Fatal error during
>> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
>> java.lang.IllegalArgumentException: requirement failed: Corrupt index
>> found, index file
>> (/data/vol6/kafka/kafka82/Topic22-6/06256447.index) has
>> non-zero size but the last offset is 6256447 and the base offset is 6256447
>>at scala.Predef$.require(Predef.scala:233)
>>at kafka.log.OffsetIndex.sanityCheck(OffsetIndex.scala:352)
>>at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:184)
>>at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:183)
>>at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>at
>> scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>>at kafka.log.Log.loadSegments(Log.scala:183)
>>at kafka.log.Log.(Log.scala:67)
>>at
>> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
>>at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
>>at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>at
>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>at java.lang.Thread.run(Thread.java:662)
>> [2015-03-13 15:27:31,831] INFO [Kafka Server 5], shutting down
>> (kafka.server.KafkaServer)
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
> 



Re: High Replica Max Lag

2015-03-13 Thread Joel Koshy
Can you verify that the leaders are evenly spread, and if necessary
run a preferred leader election?

On Fri, Mar 13, 2015 at 05:10:22PM -0700, Zakee wrote:
> I have 35 topics with a total of 398 partitions (2 of them are supposed to 
> be very high volume and so were allocated 28 partitions; the others vary 
> between 5 and 14).
> 
> Thanks
> Zakee
> 
> 
> 
> > On Mar 13, 2015, at 3:25 PM, Joel Koshy  wrote:
> > 
> > I think what people have observed in the past is that increasing
> > num-replica-fetcher-threads has diminishing returns fairly quickly.
> > You may want to instead increase the number of partitions in the topic
> > you are producing to. (How many do you have right now?)
> > 
> > On Fri, Mar 13, 2015 at 02:48:17PM -0700, Zakee wrote:
> >> Hi Mayuresh,
> >> 
> >> I have currently set this property to 4 and I see from the logs that it 
> >> starts 12 threads on each broker. I will try increasing it further.
> >> 
> >> Thanks
> >> Zakee
> >> 
> >> 
> >> 
> >>> On Mar 13, 2015, at 11:53 AM, Mayuresh Gharat 
> >>>  wrote:
> >>> 
> >>> You might want to increase the number of Replica Fetcher threads by 
> >>> setting
> >>> this property : *num.replica.fetchers*.
> >>> 
> >>> Thanks,
> >>> 
> >>> Mayuresh
> >>> 
> >>> On Thu, Mar 12, 2015 at 10:39 PM, Zakee  wrote:
> >>> 
>  With the producer throughput as large as > 150MB/s to 5 brokers on a
>  continuous basis, I see a consistently high value for Replica Max Lag (in
>  millions). Is this normal, or is there a way to tune so as to reduce 
>  replica
>  MaxLag?
>  As per documentation, replica max lag (in messages) between follower and
>  leader replicas, should be less than replica.lag.max.messages (currently
>  set to 5000)
>  
>  
>  Thanks
>  Zakee
>  
>  
>  
>  
> >>> 
> >>> 
> >>> 
> >>> 
> >>> -- 
> >>> -Regards,
> >>> Mayuresh R. Gharat
> >>> (862) 250-7125
> >>> 
> >> 
> > 
> > 
> > 
> 



Re: Kafka elastic no downtime scalability

2015-03-13 Thread Chi Hoang
Hi Stevo,
I won't speak for Joe, but what we do is documented in the link that Joe
provided:
"Adding servers to a Kafka cluster is easy, just assign them a unique
broker id and start up Kafka on your new servers. However these new servers
will not automatically be assigned any data partitions, so unless
partitions are moved to them they won't be doing any work until new topics
are created. So usually when you add machines to your cluster you will want
to migrate some existing data to these machines.

The process of migrating data is manually initiated but fully automated.
Under the covers what happens is that Kafka will add the new server as a
follower of the partition it is migrating and allow it to fully replicate
the existing data in that partition. When the new server has fully
replicated the contents of this partition and joined the in-sync replicas,
one of the existing replicas will delete their partition's data.

The partition reassignment tool can be used to move partitions across
brokers. An ideal partition distribution would ensure even data load and
partition sizes across all brokers. In 0.8.1, the partition reassignment
tool does not have the capability to automatically study the data
distribution in a Kafka cluster and move partitions around to attain an
even load distribution. As such, the admin has to figure out which topics
or partitions should be moved around."
We use a tool called kafkat (https://github.com/airbnb/kafkat) for
reassignment and other administrative tasks, and have added brokers and
partitions without any problems.  The manual part is that you manually
initiate the commands, but Kafka takes care of the rest without any
interruption to producers and consumers.  I also want to make clear that
kafkat is not necessary but makes it much easier.

Hope this helps clarify your doubts.

Chi

On Fri, Mar 13, 2015 at 4:19 PM, Stevo Slavić  wrote:

> These features are all nice, if one adds new brokers to support additional
> topics, or to move existing partitions or whole topics to new brokers.
> Referenced sentence is in paragraph named scalability. When I read
> "expanded" I was thinking of scaling out, extending parallelization
> capabilities, and parallelism in Kafka is achieved with partitions. So I
> understood that sentence that it is possible to add more partitions to
> existing topics at runtime, with no downtime.
>
> I just found in source that there is API for adding new partitions to
> existing topics (see
>
> https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/admin/AdminUtils.scala#L101
> ). Have to try it. I guess it should work during runtime, causing no
> downtime or data loss, or moving data from existing to new partition.
> Producers and consumers will eventually start writing to and reading from
> new partition, and consumers should be able to read previously published
> messages from old partitions, even messages which if they were sent again
> would end up assigned/written to new partition.
>
>
> Kind regards,
> Stevo Slavic.
>
> On Fri, Mar 13, 2015 at 8:27 PM, Joe Stein  wrote:
>
> > https://kafka.apache.org/documentation.html#basic_ops_cluster_expansion
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - -
> >
> >   http://www.stealth.ly
> > - - - - - - - - - - - - - - - - -
> >
> > On Fri, Mar 13, 2015 at 3:05 PM, sunil kalva 
> wrote:
> >
> > > Joe
> > >
> > > "Well, I know it is semantic but right now it "can" be elastically
> scaled
> > > without down time but you have to integrate into your environment for
> > what
> > > that means it has been that way since 0.8.0 imho"
> > >
> > > here what do you mean "you have to integrate into your environment",
> how
> > do
> > > i achieve elastically scaled cluster seamlessly ?
> > >
> > > SunilKalva
> > >
> > > On Fri, Mar 13, 2015 at 10:27 PM, Joe Stein 
> > wrote:
> > >
> > > > Well, I know it is semantic but right now it "can" be elastically
> > scaled
> > > > without down time but you have to integrate into your environment for
> > > what
> > > > that means it has been that way since 0.8.0 imho.
> > > >
> > > > My point was just another way to-do that out of the box... folks do
> > this
> > > > elastic scailing today with AWS CloudFormation and internal systems
> > they
> > > > built too.
> > > >
> > > > So, it can be done... you just have todo it.
> > > >
> > > > ~ Joe Stein
> > > > - - - - - - - - - - - - - - - - -
> > > >
> > > >   http://www.stealth.ly
> > > > - - - - - - - - - - - - - - - - -
> > > >
> > > > On Fri, Mar 13, 2015 at 12:39 PM, Stevo Slavić 
> > > wrote:
> > > >
> > > > > OK, thanks for heads up.
> > > > >
> > > > > When reading Apache Kafka docs, and reading what Apache Kafka
> "can" I
> > > > > expect it to already be available in latest general availability
> > > release,
> > > > > not what's planned as part of some other project.
> > > > >
> > > > > Kind regards,
> > > > > Stevo Slavic.
> > > > >
> > > > > On Fri, Mar 13, 2015 at 2:32 PM, Joe Stein 
> > > wrote:
>

Re: createMessageStreams vs createMessageStreamsByFilter

2015-03-13 Thread tao xiao
These fetchers fetch data from a leader to the consumer. The replication
fetcher is configured by a different property.
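
For readers following the quoted discussion below, a minimal sketch of the two
stream-creation styles being compared (0.8.x high-level consumer; the
ZooKeeper address, group and topic names are placeholders):

    import java.util.Properties
    import kafka.consumer.{Consumer, ConsumerConfig, Whitelist}

    val props = new Properties()
    props.put("zookeeper.connect", "zk1:2181")
    props.put("group.id", "demo-group")

    // Per-topic stream counts: 3 streams for AC, 2 for BC.
    val perTopicConnector = Consumer.create(new ConsumerConfig(props))
    val perTopic = perTopicConnector.createMessageStreams(Map("AC" -> 3, "BC" -> 2))

    // Filter-based (a connector supports only one stream-creation call, hence
    // the second instance): 3 streams shared across every topic matching the
    // pattern, so one slow stream can back up a shared queue and stall
    // delivery for partitions of the other topics.
    val filterConnector = Consumer.create(new ConsumerConfig(props))
    val filtered = filterConnector.createMessageStreamsByFilter(new Whitelist(".*C"), 3)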

On Saturday, March 14, 2015, Zakee  wrote:

> Sorry, but still confused.  Maximum number of threads (fetchers) to fetch
> from a Leader or maximum number of threads within a follower broker?
>
> Thanks for clarifying,
> -Zakee
>
>
>
> > On Mar 12, 2015, at 11:11 PM, tao xiao  > wrote:
> >
> > The number of fetchers is configurable via num.replica.fetchers. The
> > description of num.replica.fetchers in Kafka documentation is not quite
> > accurate. num.replica.fetchers actually controls the max number of
> fetchers
> > per broker. In your case, num.replica.fetchers=8 with 5 brokers means no
> > more than 8 fetchers are created for each broker
> >
> > On Fri, Mar 13, 2015 at 1:21 PM, Zakee  > wrote:
> >
> >> Is this always the case that there is only one fetcher per broker, won’t
> >> setting num.replica.fetchers greater than number-of-brokers cause more
> >> fetchers per broker?
> >> Let’s I have 5 brokers, and num of replica fetchers is 8, will there be
> 2
> >> fetcher threads pulling from  each broker?
> >>
> >> Thanks
> >> Zakee
> >>
> >>
> >>
> >>> On Mar 12, 2015, at 11:15 AM, James Cheng  > wrote:
> >>>
> >>> Ah, I understand now. I didn't realize that there was one fetcher
> thread
> >> per broker.
> >>>
> >>> Thanks Tao & Guozhang!
> >>> -James
> >>>
> >>>
> >>> On Mar 11, 2015, at 5:00 PM, tao xiao>> xiaotao...@gmail.com >> wrote:
> >>>
>  Fetcher thread is per broker basis, it ensures that at least one
> fetcher
>  thread per broker. Fetcher thread is sent to broker with a fetch
> >> request to
>  ask for all partitions. So if A, B, C are in the same broker fetcher
> >> thread
>  is still able to fetch data from A, B, C even though A returns no
> data.
>  same logic is applied to different broker.
> 
>  On Thu, Mar 12, 2015 at 6:25 AM, James Cheng  > wrote:
> 
> >
> > On Mar 11, 2015, at 9:12 AM, Guozhang Wang  > wrote:
> >
> >> Hi James,
> >>
> >> What I meant before is that a single fetcher may be responsible for
> > putting
> >> fetched data to multiple queues according to the construction of the
> >> streams setup, where each queue may be consumed by a different
> thread.
> > And
> >> the queues are actually bounded. Now say if there are two queues
> that
> >> are
> >> getting data from the same fetcher F, and are consumed by two
> >> different
> >> user threads A and B. If thread A for some reason got slowed / hung
> >> consuming data from queue 1, then queue 1 will eventually get full,
> >> and F
> >> trying to put more data to it will be blocked. Since F is parked on
> > trying
> >> to put data to queue 1, queue 2 will not get more data from it, and
> > thread
> >> B may hence gets starved. Does that make sense now?
> >>
> >
> > Yes, that makes sense. That is the scenario where one thread of a
> >> consumer
> > can cause a backup in the queue, which would cause other threads to
> not
> > receive data.
> >
> > What about the situation I described, where a thread consumes a queue
> >> that
> > is supposed to be filled with messages from multiple partitions? If
> > partition A has no messages and partitions B and C do, how will the
> >> fetcher
> > behave? Will the processing thread receive messages from partitions B
> >> and C?
> >
> > Thanks,
> > -James
> >
> >
> >> Guozhang
> >>
> >> On Tue, Mar 10, 2015 at 5:15 PM, James Cheng  > wrote:
> >>
> >>> Hi,
> >>>
> >>> Sorry to bring up this old thread, but my question is about this
> >> exact
> >>> thing:
> >>>
> >>> Guozhang, you said:
>  A more concrete example: say you have topic AC: 3 partitions,
> topic
> > BC: 6
>  partitions.
> 
>  With createMessageStreams("AC" => 3, "BC" => 2) a total of 5
> threads
> > will
>  be created, and consuming AC-1,AC-2,AC-3,BC-1/2/3,BC-4/5/6
> > respectively;
> 
>  With createMessageStreamsByFilter("*C" => 3) a total of 3 threads
> >> will
> > be
>  created, and consuming AC-1/BC-1/BC-2, AC-2/BC-3/BC-4,
> >> AC-3/BC-5/BC-6
>  respectively.
> >>>
> >>>
> >>> You said that in the createMessageStreamsByFilter case, if topic AC
> >> had
> > no
> >>> messages in it and consumer.timeout.ms = -1, then the 3 threads
> >> might
> > all
> >>> be blocked waiting for data to arrive from topic AC, and so
> messages
> > from
> >>> BC would not be processed.
> >>>
> >>> createMessageStreamsByFilter("*C" => 1) (single stream) would have
> >> the
> >>> same problem but just worse. Behind the scenes, is there a single
> >> thread
> >>> that is consuming (round-robin?) messages from the different
> >> partitions
> > and
> >>> inserting them all into a single queue for the application code

Re: How to shutdown mirror maker safely

2015-03-13 Thread tao xiao
MM will hang until the next message arrives. For example, I have an MM running
and listening to a topic that has no messages coming. I send ctrl + c to MM and
MM doesn't shut down until I send a message to the topic. My question is: if
there is never a message coming to the topic, how can I safely shut down MM?
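
One mitigation I am looking at (an assumption on my side - please check how
your MM build handles ConsumerTimeoutException before relying on it): give the
embedded consumer a finite consumer.timeout.ms, so the blocking iterator wakes
up periodically instead of parking forever on an idle topic. In the consumer
config passed to the mirror maker:

    # default is -1 (block indefinitely); with a finite value,
    # ConsumerIterator.hasNext() throws ConsumerTimeoutException after 5s of
    # silence, giving the consumer thread a chance to observe shutdown.
    consumer.timeout.ms=5000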

On Saturday, March 14, 2015, Jiangjie Qin  wrote:

> ctrl+c should work. Did you see any issue with that?
>
> On 3/12/15, 11:49 PM, "tao xiao" >
> wrote:
>
> >Hi,
> >
>I wanted to know how I can shut down mirror maker safely (ctrl+c) when
> >there is no message coming to consume. I am using mirror maker off trunk
> >code.
> >
> >--
> >Regards,
> >Tao
>
>

-- 
Regards,
Tao