kafka audit

2015-03-23 Thread sunil kalva
Hi
What is the best practice for adding an audit feature in Kafka? Is there any
framework available for enabling auditing at the producer and consumer
level, and any UI frameworks for monitoring?

tx
SunilKalva


Re: kafka audit

2015-03-23 Thread tao xiao
LinkedIn has an excellent tool that monitors lag, data loss, data
duplication, etc. Here is the reference:

http://www.slideshare.net/JonBringhurst/kafka-audit-kafka-meetup-january-27th-2015

it is not open sourced though.

On Mon, Mar 23, 2015 at 3:26 PM, sunil kalva  wrote:

> Hi
> What is the best practice for adding an audit feature in Kafka? Is there any
> framework available for enabling auditing at the producer and consumer
> level, and any UI frameworks for monitoring?
>
> tx
> SunilKalva
>



-- 
Regards,
Tao


Re: kafka audit

2015-03-23 Thread Navneet Gupta (Tech - BLR)
Are there any plans to open source the same? What alternatives do we have
here?

We are building an internal auditing framework for our entire big data
pipeline. Kafka is one of the data sources we have (ingested data).

On Mon, Mar 23, 2015 at 1:03 PM, tao xiao  wrote:

> LinkedIn has an excellent tool that monitors lag, data loss, data
> duplication, etc. Here is the reference:
>
>
> http://www.slideshare.net/JonBringhurst/kafka-audit-kafka-meetup-january-27th-2015
>
> it is not open sourced though.
>
> On Mon, Mar 23, 2015 at 3:26 PM, sunil kalva 
> wrote:
>
> > Hi
> > What is the best practice for adding an audit feature in Kafka? Is there any
> > framework available for enabling auditing at the producer and consumer
> > level, and any UI frameworks for monitoring?
> >
> > tx
> > SunilKalva
> >
>
>
>
> --
> Regards,
> Tao
>



-- 
Thanks & Regards,
Navneet Gupta


Check topic exists after deleting it.

2015-03-23 Thread anthony musyoki
On deleting a topic via TopicCommand.deleteTopic()

I get "Topic test-delete is marked for deletion."

I follow up by checking if the topic exists by using
AdminUtils.topicExists(),
which surprisingly returns true.

I expected AdminUtils.topicExists() to check both BrokerTopicsPath
and DeleteTopicsPath before returning a verdict, but it only checks
BrokerTopicsPath.

Shouldn't a topic marked for deletion return false for topicExists()?
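
In case it helps others, here is the workaround I have in mind: a wrapper
that also consults the delete path. A minimal sketch, assuming the 0.8.x
ZkUtils layout (/admin/delete_topics/<topic>); this is illustrative, not an
official API:

import kafka.admin.AdminUtils
import kafka.utils.{ZKStringSerializer, ZkUtils}
import org.I0Itec.zkclient.ZkClient

object TopicExistsCheck {
  // Treat a topic as "existing" only if it is registered under
  // /brokers/topics and NOT queued under /admin/delete_topics.
  def existsAndNotMarkedForDeletion(zk: ZkClient, topic: String): Boolean =
    AdminUtils.topicExists(zk, topic) &&
      !zk.exists(ZkUtils.getDeleteTopicPath(topic))

  def main(args: Array[String]): Unit = {
    val zk = new ZkClient("localhost:2181", 30000, 30000, ZKStringSerializer)
    try println(existsAndNotMarkedForDeletion(zk, "test-delete"))
    finally zk.close()
  }
}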


Re: Mirror maker fetcher thread unexpectedly stopped

2015-03-23 Thread tao xiao
I think I worked out the answer to question 1:
java.lang.IllegalMonitorStateException
was thrown because the ReentrantLock was not owned when await() was called
on the lock's condition.

Here is the code snippet from AbstractFetcherThread.scala in trunk:

partitionMapLock synchronized {
  partitionsWithError ++= partitionMap.keys
  // an error occurred while fetching partitions; sleep for a while
  partitionMapCond.await(fetchBackOffMs, TimeUnit.MILLISECONDS)
}

As shown above, partitionMapLock is not acquired before
partitionMapCond.await() is called: `synchronized` only takes the object's
built-in monitor, not the ReentrantLock itself.

We can fix this by explicitly acquiring the lock, e.g. via
inLock(partitionMapLock). The block below should work:

inLock(partitionMapLock) {
  partitionsWithError ++= partitionMap.keys
  // an error occurred while fetching partitions; sleep for a while
  partitionMapCond.await(fetchBackOffMs, TimeUnit.MILLISECONDS)
}

On Mon, Mar 23, 2015 at 1:50 PM, tao xiao  wrote:

> Hi,
>
> I was running a mirror maker and got
>  java.lang.IllegalMonitorStateException that caused the underlying fetcher
> thread completely stopped. Here is the log from mirror maker.
>
> [2015-03-21 02:11:53,069] INFO Reconnect due to socket error:
> java.io.EOFException: Received -1 when reading from channel, socket has
> likely been closed. (kafka.consumer.SimpleConsumer)
> [2015-03-21 02:11:53,081] WARN
> [ConsumerFetcherThread-localhost-1426853968478-4ebaa354-0-6], Error in
> fetch Name: FetchRequest; Version: 0; CorrelationId: 2398588; ClientId:
> phx-slc-mm-user-3; ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes;
> RequestInfo: [test.topic,0] -> PartitionFetchInfo(3766065,1048576).
> Possible cause: java.nio.channels.ClosedChannelException
> (kafka.consumer.ConsumerFetcherThread)
> [2015-03-21 02:11:53,083] ERROR
> [ConsumerFetcherThread-localhost-1426853968478-4ebaa354-0-6], Error due to
>  (kafka.consumer.ConsumerFetcherThread)
> java.lang.IllegalMonitorStateException
> at
> java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:155)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.fullyRelease(AbstractQueuedSynchronizer.java:1723)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2166)
> at
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:106)
> at
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:90)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> [2015-03-21 02:11:53,083] INFO
> [ConsumerFetcherThread-localhost-1426853968478-4ebaa354-0-6], Stopped
>  (kafka.consumer.ConsumerFetcherThread)
>
> I am still investigating what caused the connection error on server side
> but I have a couple of questions related to mirror maker itself
>
> 1. What is the root cause of java.lang.IllegalMonitorStateException? As shown
> in the AbstractFetcherThread source, the fetcher thread should catch the
> java.io.EOFException thrown from the underlying SimpleConsumer and sleep
> for a while before the next run.
> 2. Mirror maker is unaware of the termination of the fetcher thread. That
> makes it unable to detect the failure and trigger rebalancing. I have 3
> mirror maker instances running on 3 different machines listening to the
> same topic. I would expect the mirror maker to release partition
> ownership when the underlying fetcher thread terminates so that rebalancing
> can be triggered, but in fact this is not the case. Is this expected
> behavior, or have I misconfigured anything?
>
> I am running the trunk version as of commit
> 82789e75199fdc1cae115c5c2eadfd0f1ece4d0d
>
> --
> Regards,
> Tao
>



-- 
Regards,
Tao


A kafka web monitor

2015-03-23 Thread Wan Wei
We have made a simple web console to monitor Kafka information such as
consumer offsets and log size.

https://github.com/shunfei/DCMonitor


Hope you like it and offer your help to make it better :)
Regards
Flow


Re: Post on running Kafka at LinkedIn

2015-03-23 Thread Todd Palino
Emmanuel, if it helps, here's a little more detail on the hardware spec we
are using at the moment:

12 CPU (HT enabled)
64 GB RAM
16 x 1TB SAS drives (2 are used as a RAID-1 set for the OS, 14 are a
RAID-10 set just for the Kafka log segments)

We don't colocate any other applications with Kafka except for a couple
monitoring agents. Zookeeper runs on completely separate nodes.

I suggest starting with the basics - watch the CPU, memory, and
disk IO usage on the brokers as you are testing. You're likely going to
find that one of these three is the constraint. Disk IO in particular can
lead to a significant increase in produce latency as utilization climbs,
even above 10-15%.

-Todd


On Fri, Mar 20, 2015 at 3:41 PM, Emmanuel  wrote:

> This is why I'm confused: I'm trying to benchmark and I see numbers
> that seem pretty low to me... 8000 events/sec on 2 brokers with 3 CPUs each
> and 5 partitions should be way faster than this, and I don't know where to
> start to debug...
> the kafka-consumer-perf-test script gives me ridiculously low numbers
> (1000 events/sec/thread)
>
> So what could be causing this?
> From: jbringhu...@linkedin.com.INVALID
> To: users@kafka.apache.org
> Subject: Re: Post on running Kafka at LinkedIn
> Date: Fri, 20 Mar 2015 22:16:29 +
>
> Keep in mind that these brokers aren't really stressed too much at any
> given time -- we need to stay ahead of the capacity curve.
> Your message throughput will really just depend on what hardware you're
> using. However, in the past, we've benchmarked at 400,000 to more than
> 800,000 messages / broker / sec, depending on configuration (
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> ).
>
> -Jon
> On Mar 20, 2015, at 3:03 PM, Emmanuel  wrote:
> 800B messages / day = 9.26M messages / sec over 1100 brokers
> = ~8400 messages / broker / sec
> Do I get this right?
> Trying to benchmark my own test cluster and that's what I see with 2
> brokers...Just wondering if my numbers are good or bad...
>
>
> Subject: Re: Post on running Kafka at LinkedIn
> From: cl...@kafka.guru
> Date: Fri, 20 Mar 2015 14:27:58 -0700
> To: users@kafka.apache.org
>
> Yep! We are growing :)
>
> -Clark
>
> Sent from my iPhone
>
> On Mar 20, 2015, at 2:14 PM, James Cheng  wrote:
>
> Amazing growth numbers.
>
> At the meetup on 1/27, Clark Haskins presented their Kafka usage at the
> time. It was:
>
> Bytes in: 120 TB
> Messages In: 585 billion
> Bytes out: 540 TB
> Total brokers: 704
>
> In Todd's post, the current numbers:
>
> Bytes in: 175 TB (45% growth)
> Messages In: 800 billion (36% growth)
> Bytes out: 650 TB (20% growth)
> Total brokers: 1100 (56% growth)
>
> That much growth in just 2 months? Wowzers.
>
> -James
>
> On Mar 20, 2015, at 11:30 AM, James Cheng  wrote:
>
> For those who missed it:
>
> The Kafka Audit tool was also presented at the 1/27 Kafka meetup:
> http://www.meetup.com/http-kafka-apache-org/events/219626780/
>
> Recorded video is here, starting around the 40 minute mark:
> http://www.ustream.tv/recorded/58109076
>
> Slides are here:
> http://www.slideshare.net/JonBringhurst/kafka-audit-kafka-meetup-january-27th-2015
>
> -James
>
> On Mar 20, 2015, at 9:47 AM, Todd Palino  wrote:
>
> For those who are interested in detail on how we've got Kafka set up at
> LinkedIn, I have just published a new post to our Engineering blog titled
> "Running Kafka at Scale"
>
>   https://engineering.linkedin.com/kafka/running-kafka-scale
>
> It's a general overview of our current Kafka install, tiered architecture,
> audit, and the libraries we use for producers and consumers. You'll also be
> seeing more posts from the SRE team here in the coming weeks on deeper
> looks into both Kafka and Samza.
>
> Additionally, I'll be giving a talk at ApacheCon next month on running
> tiered Kafka architectures. If you're in Austin for that, please come by
> and check it out.
>
> -Todd
>
>
>
>


Re: Check topic exists after deleting it.

2015-03-23 Thread Harsha
DeleteTopic creates a node in ZooKeeper to let the controller know that there is
a topic up for deletion. This doesn’t immediately delete the topic; it can take
time depending on whether all the partitions of that topic are online and the
brokers are available as well. Once all the log files are deleted, the ZooKeeper
node gets deleted as well.
Also make sure you don’t have any producers or consumers running while the
topic deletion is going on.

-- 
Harsha


On March 23, 2015 at 1:29:50 AM, anthony musyoki (anthony.musy...@gmail.com) 
wrote:

On deleting a topic via TopicCommand.deleteTopic()  

I get "Topic test-delete is marked for deletion."  

I follow up by checking if the topic exists by using
AdminUtils.topicExists(),
which surprisingly returns true.

I expected AdminUtils.topicExists() to check both BrokerTopicsPath
and DeleteTopicsPath before returning a verdict but it only checks  
BrokerTopicsPath  

Shouldn't a topic marked for deletion return false for topicExists() ?  


Re: kafka mirrormaker cross datacenter replication

2015-03-23 Thread Guozhang Wang
With MM, the source and destination clusters can choose different numbers of
partitions for the mirrored topic, and hence messages may be re-grouped in
the destination cluster. In addition, if you have two MMs piping
data to the same destination from two sources, the order in which
messages from the two source clusters arrive at the destination is
non-deterministic as well.
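
If you still need to fail over between clusters, the TTL-based duplicate
filter mentioned further down this thread (memcached/couchbase with a TTL
slightly above the Kafka retention time) can be sketched roughly as below.
This toy in-memory version, and its message-id scheme, are assumptions for
illustration only:

import scala.collection.mutable

// Remember each message id for ttlMs and drop anything already seen.
class DedupFilter(ttlMs: Long) {
  private val seen = mutable.Map[String, Long]() // id -> first-seen time

  // Returns true if the id is new (process it), false if it is a duplicate.
  def firstSighting(id: String, nowMs: Long = System.currentTimeMillis()): Boolean = {
    val expired = seen.collect { case (k, ts) if nowMs - ts >= ttlMs => k }
    expired.foreach(seen.remove)
    if (seen.contains(id)) false
    else { seen(id) = nowMs; true }
  }
}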

Guozhang

On Sun, Mar 22, 2015 at 9:40 PM, Kane Kim  wrote:

> I thought that ordering is guaranteed within the partition, or does mirror
> maker not preserve partitions?
>
>
>
> On Fri, Mar 20, 2015 at 4:44 PM, Guozhang Wang  wrote:
>
> > I think 1) will work, but I am not sure about 2), since messages replicated
> > at two clusters may be out of order as well, hence you may get message
> > 1,2,3,4 in one cluster and 1,3,4,2 in another. If you remember that your
> > latest message processed in the first cluster is 2, when you fail over to
> > the other cluster you may skip and miss message 3 and 4.
> >
> > Guozhang
> >
> > On Fri, Mar 20, 2015 at 1:07 PM, Kane Kim  wrote:
> >
> > > Also, as I understand it, we either have to mark all messages with unique
> > > IDs and then deduplicate them, or, if we want to just store the last message
> > > processed per partition, we will need exactly the same number of partitions
> > > in both clusters?
> > >
> > > On Fri, Mar 20, 2015 at 10:19 AM, Guozhang Wang 
> > > wrote:
> > >
> > > > Not sure if transactional messaging will help in this case, as at
> least
> > > for
> > > > now it is still targeted within a single DC, i.e. a "transaction" is
> > only
> > > > defined within a Kafka cluster, not across clusters.
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Mar 20, 2015 at 10:08 AM, Jon Bringhurst <
> > > > jbringhu...@linkedin.com.invalid> wrote:
> > > >
> > > > > Hey Kane,
> > > > >
> > > > > When mirrormakers lose offsets on catastrophic failure, you generally
> > > > > have two options. You can keep auto.offset.reset set to "latest"
> and
> > > > handle
> > > > > the loss of messages, or you can have it set to "earliest" and
> handle
> > > the
> > > > > duplication of messages.
> > > > >
> > > > > Although we try to avoid duplicate messages overall, when failure
> > > > happens,
> > > > > we (mostly) take the "earliest" path and deal with the duplication
> of
> > > > > messages.
> > > > >
> > > > > If your application doesn't treat messages as idempotent, you might
> > be
> > > > > able to get away with something like couchbase or memcached with a
> > TTL
> > > > > slightly higher than your Kafka retention time and use that to
> filter
> > > > > duplicates. Another pattern may be to deduplicate messages in
> Hadoop
> > > > before
> > > > > taking action on them.
> > > > >
> > > > > -Jon
> > > > >
> > > > > P.S. An option in the future might be
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> > > > >
> > > > > On Mar 19, 2015, at 5:32 PM, Kane Kim 
> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > What's the best strategy for failover when using mirror-maker to
> > > > > replicate
> > > > > > across datacenters? As I understand offsets in both datacenters
> > will
> > > be
> > > > > > different, how consumers should be reconfigured to continue
> reading
> > > > from
> > > > > > the same point where they stopped without data loss and/or
> > > duplication?
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: Check topic exists after deleting it.

2015-03-23 Thread Grant Henke
What happens if producers or consumers are running while the topic
deletion is going on?

On Mon, Mar 23, 2015 at 10:02 AM, Harsha  wrote:

> DeleteTopic creates a node in ZooKeeper to let the controller know that there
> is a topic up for deletion. This doesn’t immediately delete the topic; it can
> take time depending on whether all the partitions of that topic are online and
> the brokers are available as well. Once all the log files are deleted, the
> ZooKeeper node gets deleted as well.
> Also make sure you don’t have any producers or consumers running while
> the topic deletion is going on.
>
> --
> Harsha
>
>
> On March 23, 2015 at 1:29:50 AM, anthony musyoki (
> anthony.musy...@gmail.com) wrote:
>
> On deleting a topic via TopicCommand.deleteTopic()
>
> I get "Topic test-delete is marked for deletion."
>
> I follow up by checking if the topic exists by using
> AdminUtils.topicExists(),
> which surprisingly returns true.
>
> I expected AdminUtils.topicExists() to check both BrokerTopicsPath
> and DeleteTopicsPath before returning a verdict but it only checks
> BrokerTopicsPath
>
> Shouldn't a topic marked for deletion return false for topicExists() ?
>



-- 
Grant Henke
Solutions Consultant | Cloudera
ghe...@cloudera.com | 920-980-8979
twitter.com/ghenke  | linkedin.com/in/granthenke


Re: Check topic exists after deleting it.

2015-03-23 Thread Harsha
Currently we have auto.create.topics.enable set to true by default. If this is
set to true, anyone who makes a TopicMetadataRequest can create a topic, and
both producers and consumers can send a TopicMetadataRequest, which will create
a topic if the above config is true. So if a producer or consumer is running
while a deletion is going on, it can re-create a topic that is in the deletion
process. This issue is going to be addressed in upcoming versions. Meanwhile,
if you are not creating topics via the producer, turn this config off, or stop
producers and consumers while you are trying to delete a topic.
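
For reference, the relevant broker setting lives in server.properties; a
minimal sketch of the change (double-check the default for your version):

# server.properties
# Prevent TopicMetadataRequests from implicitly (re-)creating topics,
# e.g. while a topic deletion is in progress.
auto.create.topics.enable=false
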
-- 
Harsha


On March 23, 2015 at 9:57:53 AM, Grant Henke (ghe...@cloudera.com) wrote:

What happens if producers or consumers are running while the topic
deletion is going on?

On Mon, Mar 23, 2015 at 10:02 AM, Harsha  wrote:  

> DeleteTopic creates a node in ZooKeeper to let the controller know that there
> is a topic up for deletion. This doesn’t immediately delete the topic; it can
> take time depending on whether all the partitions of that topic are online and
> the brokers are available as well. Once all the log files are deleted, the
> ZooKeeper node gets deleted as well.
> Also make sure you don’t have any producers or consumers running while
> the topic deletion is going on.
>  
> --  
> Harsha  
>  
>  
> On March 23, 2015 at 1:29:50 AM, anthony musyoki (  
> anthony.musy...@gmail.com) wrote:  
>  
> On deleting a topic via TopicCommand.deleteTopic()  
>  
> I get "Topic test-delete is marked for deletion."  
>  
> I follow up by checking if the topic exists by using
> AdminUtils.topicExists(),
> which surprisingly returns true.
>
> I expected AdminUtils.topicExists() to check both BrokerTopicsPath
> and DeleteTopicsPath before returning a verdict but it only checks  
> BrokerTopicsPath  
>  
> Shouldn't a topic marked for deletion return false for topicExists() ?  
>  



--  
Grant Henke  
Solutions Consultant | Cloudera  
ghe...@cloudera.com | 920-980-8979  
twitter.com/ghenke  | linkedin.com/in/granthenke  


Re: Check topic exists after deleting it.

2015-03-23 Thread Harsha
Just to be clear, one only needs to stop the producers and consumers that are
writing to or reading from a topic “test” if they are trying to delete that
specific topic “test”, not all producers and consumers.

-- 
Harsha
On March 23, 2015 at 10:13:47 AM, Harsha (harsh...@fastmail.fm) wrote:

Currently we have auto.create.topics.enable set to true by default. If this is
set to true, anyone who makes a TopicMetadataRequest can create a topic, and
both producers and consumers can send a TopicMetadataRequest, which will create
a topic if the above config is true. So if a producer or consumer is running
while a deletion is going on, it can re-create a topic that is in the deletion
process. This issue is going to be addressed in upcoming versions. Meanwhile,
if you are not creating topics via the producer, turn this config off, or stop
producers and consumers while you are trying to delete a topic.
-- 
Harsha


On March 23, 2015 at 9:57:53 AM, Grant Henke (ghe...@cloudera.com) wrote:

What happens if producers or consumers are running while the topic
deletion is going on?

On Mon, Mar 23, 2015 at 10:02 AM, Harsha  wrote:

> DeleteTopic creates a node in ZooKeeper to let the controller know that there
> is a topic up for deletion. This doesn’t immediately delete the topic; it can
> take time depending on whether all the partitions of that topic are online and
> the brokers are available as well. Once all the log files are deleted, the
> ZooKeeper node gets deleted as well.
> Also make sure you don’t have any producers or consumers running while
> the topic deletion is going on.
>
> --
> Harsha
>
>
> On March 23, 2015 at 1:29:50 AM, anthony musyoki (
> anthony.musy...@gmail.com) wrote:
>
> On deleting a topic via TopicCommand.deleteTopic()
>
> I get "Topic test-delete is marked for deletion."
>
> I follow up by checking if the topic exists by using
> AdminUtils.topicExists(),
> which surprisingly returns true.
>
> I expected AdminUtils.topicExists() to check both BrokerTopicsPath
> and DeleteTopicsPath before returning a verdict but it only checks
> BrokerTopicsPath
>
> Shouldn't a topic marked for deletion return false for topicExists() ?
>



--
Grant Henke
Solutions Consultant | Cloudera
ghe...@cloudera.com | 920-980-8979
twitter.com/ghenke  | linkedin.com/in/granthenke


KafkaSpout forceFromStart Issue

2015-03-23 Thread François Méthot
Hi,

  We have a Storm topology that uses Kafka to read a topic with 6
partitions. (Kafka 0.8.2, Storm 0.9.3)

Recently, we had to set the KafkaSpout to read from the beginning, so we
temporarily configured our KafkaConfig this way:

kafkaConfig.forceFromStart=true
kafkaConfig.startOffsetTime = OffsetRequest.EarliestTime()

It worked well, but afterward, setting those parameters back to false and
to LatestTime respectively had no effect. In fact the topology won't read
from our topic anymore.

When the topology starts, the spout successfully logs the offset and
consumer group's cursor position for each partition in the worker log. But
nothing is read.

The only way we can read back from our topic is to give our SpoutConfig a
new Kafka ConsumerGroup Id.

So it looks like, if we don't want to modify any KafkaSpout/Kafka code, the
only way to read from the beginning would be to write the position we want
to read from into ZooKeeper for our consumer group, where offsets are stored,
and to restart our topology (see the sketch below).
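
For the record, a rough sketch of that workaround, assuming offsets live in
the standard 0.8 consumer layout /consumers/<group>/offsets/<topic>/<partition>
(note that storm-kafka's spout may keep its own state under its zkRoot
instead, so verify the path first; group, topic, and partition here are
placeholders):

import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient

object RewindOffset extends App {
  // Run this only while the topology is stopped.
  val zk = new ZkClient("localhost:2181", 30000, 30000, ZKStringSerializer)
  val path = "/consumers/my-group/offsets/my-topic/0"
  if (!zk.exists(path)) zk.createPersistent(path, true)
  zk.writeData(path, "0") // rewind partition 0 to offset 0
  zk.close()
}
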
Would anyone know if this is a bug in the KafkaSpout or an issue inherited
from a bug in Kafka?
Thanks
Francois


Re: 'roundrobin' partition assignment strategy restrictions

2015-03-23 Thread Jiangjie Qin
Hi Jason,

Yes, I agree the restriction makes the usage of round-robin less flexible.
I think the focus of the round-robin strategy is workload balance. If
different consumers are consuming from different topics, it is unbalanced
by nature. In that case, is it possible for you to use a different consumer
group for each set of topics?
The rolling update is a good point. If you do a rolling bounce in a small
window, the rebalance retry should handle it. But if you want to canary a
new topic setting on one consumer for some time, it won’t work.
Could you maybe share the use case in more detail, so we can see if
there is any workaround?

Jiangjie (Becket) Qin
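
To make the proposal quoted below concrete, here is a toy sketch of the
relaxed round-robin Jason describes: walk the ring of all consumer threads
and skip any thread not subscribed to the current partition's topic. It is
illustrative only, not Kafka's actual assignor, and all names are made up:

object RelaxedRoundRobin extends App {
  case class Partition(topic: String, id: Int)
  case class ConsumerThread(name: String, topics: Set[String])

  def assign(partitions: Seq[Partition],
             threads: IndexedSeq[ConsumerThread]): Map[Partition, ConsumerThread] = {
    var next = 0
    val out = Map.newBuilder[Partition, ConsumerThread]
    for (p <- partitions) {
      // advance around the ring until a subscribed thread is found
      val hit = threads.indices
        .map(i => (next + i) % threads.size)
        .find(i => threads(i).topics.contains(p.topic))
      hit.foreach { i =>
        out += p -> threads(i)
        next = (i + 1) % threads.size
      }
    }
    out.result()
  }

  val parts = Seq(Partition("T1", 1), Partition("T1", 2),
                  Partition("T2", 1), Partition("T2", 2))
  val thrs = IndexedSeq(ConsumerThread("C1-Thr1", Set("T1", "T2")),
                        ConsumerThread("C2-Thr1", Set("T2")))
  assign(parts, thrs).foreach { case (p, t) => println(s"${p.topic}-P${p.id} -> ${t.name}") }
}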

On 3/22/15, 10:04 AM, "Jason Rosenberg"  wrote:

>Jiangjie,
>
>Yeah, I welcome the round-robin strategy, as the 'range' strategy ('til
>now
>the only one available), is not always good at balancing partitions, as
>you
>observed above.
>
>The main thing I'm bringing up in this thread though is the question of
>why
>there needs to be a restriction to having a homogeneous set of consumers in
>the group being balanced.  This is not a requirement for the range
>algorithm, but is for the roundrobin algorithm.  So, I'm just wanting to
>understand why there's that limitation.  (And sadly, in our case, we do
>have heterogeneous consumers using the same groupid, so we can't easily
>turn
>on roundrobin at the moment, without some effort :) ).
>
>I can see that it does simplify the implementation to have that
>limitation,
>but I'm just wondering if there's anything fundamental that would prevent
>an implementation that works over heterogeneous consumers.  E.g. "Lay out
>all partitions, and lay out all consumer threads, and proceed round robin
>assigning each partition to the next consumer thread. *If the next
>consumer
>thread doesn't have a selection for the current partition, then move on to
>the next consumer-thread"*
>
>The current implementation is also problematic if you are doing a rolling
>restart of a consumer cluster.  Let's say you are updating the topic
>selection as part of an update to the cluster.  Once the first node is
>updated, the entire cluster will no longer be homogeneous until the last
>node is updated, which means you will have a temporary outage consuming
>data until all nodes have been updated.  So, it makes it difficult to do
>rolling restarts, or canary updates on a subset of nodes, etc.
>
>Jason
>
>Jason
>
>On Fri, Mar 20, 2015 at 10:15 PM, Jiangjie Qin 
>wrote:
>
>> Hi Jason,
>>
>> The motivation behind round robin is to better balance the consumers’
>> load. Imagine you have two topics each with two partitions. These topics
>> are consumed by two consumers each with two consumer threads.
>>
>> The range assignment gives:
>> T1-P1 -> C1-Thr1
>> T1-P2 -> C1-Thr2
>> T2-P1 -> C1-Thr1
>> T2-P2 -> C1-Thr2
>> Consumer 2 will not be consuming from any partitions.
>>
>> The round robin algorithm gives:
>> T1-P1 -> C1-Thr1
>> T1-P2 -> C1-Thr2
>> T2-P1 -> C2-Thr1
>> T2-p2 -> C2-Thr2
>> It is much better than range assignment.
>>
>> That’s the reason why we introduced the round-robin strategy even though it
>> has restrictions.
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On 3/20/15, 12:20 PM, "Jason Rosenberg"  wrote:
>>
>> >Jiangjie,
>> >
>> >The error messages I got (and the config doc) do clearly state that the
>> >number of threads per consumer must also match.
>> >
>> >I'm not convinced that an easy to understand algorithm would work fine
>> >with
>> >a heterogeneous set of selected topics between consumers.
>> >
>> >Jason
>> >
>> >On Thu, Mar 19, 2015 at 8:07 PM, Mayuresh Gharat
>> >> >> wrote:
>> >
>> >> Hi Becket,
>> >>
>> >> Can you list down an example for this. It would be easier to
>>understand
>> >>:)
>> >>
>> >> Thanks,
>> >>
>> >> Mayuresh
>> >>
>> >> On Thu, Mar 19, 2015 at 4:46 PM, Jiangjie Qin
>> >>
>> >> wrote:
>> >>
>> >> > Hi Jason,
>> >> >
>> >> > The round-robin strategy first takes the partitions of all the topics a
>> >> > consumer is consuming from, then distributes them across all the
>> >> > consumers.
>> >> > If different consumers are consuming from different topics, the assigning
>> >> > algorithm will generate different answers on different consumers.
>> >> > It is OK for consumers to have different thread counts, but the consumers
>> >> > have to consume from the same set of topics.
>> >> >
>> >> >
>> >> > For the range strategy, the balance is for each individual topic instead
>> >> > of across topics. So the balance is only done among the consumers
>> >> > consuming from the same topic.
>> >> >
>> >> > Thanks.
>> >> >
>> >> > Jiangjie (Becket) Qin
>> >> >
>> >> > On 3/19/15, 4:14 PM, "Jason Rosenberg"  wrote:
>> >> >
>> >> > >So,
>> >> > >
>> >> > >I've run into an issue migrating a consumer to use the new
>> >>'roundrobin'
>> >> > >partition.assignment.strategy.  It turns out that several of our
>> >> consumers
>> >> > >use the same group id, but instantiate several different consumer
>> >> > >instance

Re: KafkaSpout forceFromStart Issue

2015-03-23 Thread Harsha
Hi Francois,
           Looks like this belongs on the Storm mailing list. Can you please send this
question to the Storm mailing list.

Thanks,
Harsha


On March 23, 2015 at 11:17:47 AM, François Méthot (fmetho...@gmail.com) wrote:

Hi,  

We have a storm topology that uses Kafka to read a topic with 6  
partitions. ( Kafka 0.8.2, Storm 0.9.3 )  

Recently, we had to set the KafkaSpout to read from the beginning, so we  
temporarily configured our KafkaConfig this way:

kafkaConfig.forceFromStart=true  
kafkaConfig.startOffsetTime = OffsetRequest.EarliestTime()  

It worked well, but afterward, setting those parameters back to false and  
to LatestTime respectively had no effect. In fact the topology won't read  
from our topic anymore.  

When the topology starts, the spout successfully logs the offset and
consumer group's cursor position for each partition in the worker log. But  
nothing is read.  

The only way we can read back from our topic is to give our SpoutConfig a
new Kafka ConsumerGroup Id.  

So it looks like, if we don't want to modify any KafkaSpout/Kafka code, the  
only way to read from the beginning would be to write the position we want  
to read from into ZooKeeper for our consumer group, where offsets are stored,
and to restart our topology.  
Would anyone know if this is a bug in the KafkaSpout or an issue inherited  
from a bug in Kafka?
Thanks  
Francois  


Re: [ANN] sqlstream: Simple MySQL binlog to Kafka stream

2015-03-23 Thread James Cheng
I created a wiki page that lists all the MySQL replication options that people
posted, plus a couple of others. People may or may not find it useful.

https://github.com/wushujames/mysql-cdc-projects/wiki

I wasn't sure where to host it, so I put it up on a Github Wiki.

-James

On Mar 17, 2015, at 11:09 PM, Xiao  wrote:

> LinkedIn's Gobblin compaction tool is using Hive to perform the compaction.
> Does that mean Lumos has been replaced?
> 
> Confused…
> 
> On Mar 17, 2015, at 10:00 PM, Xiao  wrote:
> 
>> Hi, all,
>> 
>> Do you know whether Linkedin plans to open source Lumos in the near future?
>> 
>> I found the answer in Qiao Lin’s post about replication from Oracle/MySQL
>> to Hadoop.
>> 
>>  - https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease
>> 
>> At the source side, it can be DataBus-based or file based.
>> 
>> At the target side, Lumos rebuilds the snapshots, due to the inability to
>> do an update/delete in Hadoop.
>> 
>> The slides about Lumos:
>>  http://www.slideshare.net/Hadoop_Summit/th-220p230-cramachandranv1
>> The talk about Lumos:
>>  https://www.youtube.com/watch?v=AGlRjlrNDYk
>> 
>> Event publishing is different from database replication. Kafka is used for 
>> change publishing or maybe also used for sending changes (recorded in files).
>> 
>> Thanks,
>> 
>> Xiao Li
>> 
>> On Mar 17, 2015, at 7:26 PM, Arya Ketan  wrote:
>> 
>>> AFAIK, LinkedIn uses Databus to do the same. Aesop is built on top of
>>> Databus, extending its beautiful capabilities to MySQL and HBase.
>>> On Mar 18, 2015 7:37 AM, "Xiao"  wrote:
>>> 
 Hi, all,
 
 Do you know how the LinkedIn team publishes changed rows in Oracle to Kafka? I
 believe they already know the whole problem very well.
 
 Using triggers? or directly parsing the log? or using any Oracle
 GoldenGate interfaces?
 
 Any lesson or any standard message format? Could the Linkedin people share
 it with us? I believe it can help us a lot.
 
 Thanks,
 
 Xiao Li
 
 
 On Mar 17, 2015, at 12:26 PM, James Cheng  wrote:
 
> This is a great set of projects!
> 
> We should put this list of projects on a site somewhere so people can
 more easily see and refer to it. These aren't Kafka-specific, but most seem
 to be "MySQL CDC." Does anyone have a place where they can host a page?
 Preferably a wiki, so we can keep it up to date easily.
> 
> -James
> 
> On Mar 17, 2015, at 8:21 AM, Hisham Mardam-Bey <
 hisham.mardam...@gmail.com> wrote:
> 
>> Pretty much a hijack / plug as well (=
>> 
>> https://github.com/mardambey/mypipe
>> 
>> "MySQL binary log consumer with the ability to act on changed rows and
>> publish changes to different systems with emphasis on Apache Kafka."
>> 
>> Mypipe currently encodes events using Avro before pushing them into
 Kafka
>> and is Avro schema repository aware. The project is young; and patches
 for
>> improvements are appreciated (=
>> 
>> On Mon, Mar 16, 2015 at 10:35 PM, Arya Ketan 
 wrote:
>> 
>>> Great work.
>>> Sorry for kinda hijacking this thread, but I thought that we had built
>>> something on a MySQL binlog event propagator and wanted to share it.
>>> You guys can also look into Aesop (https://github.com/Flipkart/aesop).
>>> It's a change propagation framework. It has relays which listen to the bin
>>> logs of MySQL, keep track of SCNs, and has consumers which can then
>>> transform/map (or interpret as is) the binlog event to a destination.
>>> Consumers also keep track of SCNs, and a slow consumer can go back to a
>>> previous SCN if it wants to re-listen to events (similar to Kafka's
>>> consumer view).
>>> 
>>> All the producers/consumers are extensible and you can write your own
>>> custom consumer and feed off the data to it.
>>> 
>>> Common use-cases:
>>> a) Archive mysql based data into say hbase
>>> b) Move mysql based data to say a search store for serving reads.
>>> 
>>> It has a decent (not an awesome :) ) console too, which gives a nice
>>> human-readable view of where the producers and consumers are.
>>> 
>>> Currently supported producers are MySQL binlogs and HBase WAL edits.
>>> 
>>> 
>>> Further insights/reviews/feature requests/pull requests/advice are all
>>> welcome.
>>> 
>>> --
>>> Arya
>>> 
>>> Arya
>>> 
>>> On Tue, Mar 17, 2015 at 1:48 AM, Gwen Shapira 
>>> wrote:
>>> 
 Really really nice!
 
 Thank you.
 
 On Mon, Mar 16, 2015 at 7:18 AM, Pierre-Yves Ritschard <
 p...@spootnik.org
 
 wrote:
> Hi kafka,
> 
> I just wanted to mention I published a very simple project which can
> connect as a MySQL replication client and stream replication events to
>

Re: kafka audit

2015-03-23 Thread Todd Palino
We've talked about it a little bit, but part of the problem is that it is
pretty well integrated into our infrastructure, and as such it's hard to
pull it out. I illustrated this a little differently than Jon did in my
latest blog post (http://engineering.linkedin.com/kafka/running-kafka-scale):
the producer (and consumer) bits that handle audit are integrated into
our internal libraries, which wrap the open source libraries. Between the
schema registry, the publishing of the audit data back into Kafka, the
audit consumers, and the database that is needed for storing the audit
data, it gets woven in pretty tightly.

Confluent has made a start on this by releasing a stack with schemas
integrated in. That is probably a good place to start as far as building an
open source audit service goes.
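
As a rough illustration of the counting idea at the heart of such an audit
pipeline (the bucket size and record shape here are assumptions, not
LinkedIn's actual schema):

// Each tier (producer, local cluster, aggregate cluster, consumer) counts
// the messages it sees per (topic, time bucket) and publishes the counts
// to an audit topic; an audit service then compares tiers per bucket.
case class AuditCount(tier: String, topic: String, bucketStartMs: Long, count: Long)

object Audit {
  val BucketMs = 10 * 60 * 1000L // assumed 10-minute buckets

  def bucketOf(timestampMs: Long): Long = (timestampMs / BucketMs) * BucketMs

  // Completeness of one tier relative to another for a given bucket;
  // 1.0 means no loss, > 1.0 suggests duplication.
  def completeness(counts: Seq[AuditCount], topic: String, bucket: Long,
                   fromTier: String, toTier: String): Double = {
    def total(tier: String) = counts
      .filter(c => c.tier == tier && c.topic == topic && c.bucketStartMs == bucket)
      .map(_.count).sum
    val produced = total(fromTier)
    if (produced == 0) 1.0 else total(toTier).toDouble / produced
  }
}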

-Todd


On Mon, Mar 23, 2015 at 12:47 AM, Navneet Gupta (Tech - BLR) <
navneet.gu...@flipkart.com> wrote:

> Are there any plans to open source the same? What alternatives do we have
> here?
>
> We are building an internal auditing framework for our entire big data
> pipeline. Kafka is one of the data sources we have (ingested data).
>
> On Mon, Mar 23, 2015 at 1:03 PM, tao xiao  wrote:
>
> > LinkedIn has an excellent tool that monitors lag, data loss, data
> > duplication, etc. Here is the reference:
> >
> >
> >
> http://www.slideshare.net/JonBringhurst/kafka-audit-kafka-meetup-january-27th-2015
> >
> > it is not open sourced though.
> >
> > On Mon, Mar 23, 2015 at 3:26 PM, sunil kalva 
> > wrote:
> >
> > > Hi
> > > What is the best practice for adding an audit feature in Kafka? Is there any
> > > framework available for enabling auditing at the producer and consumer
> > > level, and any UI frameworks for monitoring?
> > >
> > > tx
> > > SunilKalva
> > >
> >
> >
> >
> > --
> > Regards,
> > Tao
> >
>
>
>
> --
> Thanks & Regards,
> Navneet Gupta
>


Kafka Meetup @ LinkedIn 3/24

2015-03-23 Thread Clark Haskins
Hey Everyone –

Just a reminder about the Meetup tomorrow night @ LinkedIn.


There will be 3 talks:

Offset management - 6:35PM - Joel Koshy (LinkedIn)

The Netflix Data Pipeline - 7:05PM - Allen Wang & Steven Wu (Netflix)

Best Practices - 7:50PM - Jay Kreps (Confluent)


If you are interested in attending please sign up at the link below:

http://www.meetup.com/http-kafka-apache-org/events/220355031/


We will also be recording & streaming the event live at:

http://www.ustream.tv/linkedin-events

-Clark


*Clark Elliott Haskins III*

LinkedIn DDS Site Reliability Engineering

Kafka, Zookeeper, Samza SRE Manager

https://www.linkedin.com/in/clarkhaskins

*There is no place like 127.0.0.1*


Re: Updates To cwiki For Producer

2015-03-23 Thread Pete Wright
Hi Gwen - thanks for sending this along.  I'll patch my local checkout
and take a look at this.

Cheers,
-pete
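
In the meantime, here is a minimal sketch of the new (0.8.2) producer API
along the lines the thread asks about, written against the javadoc linked
below; broker, topic, key, and value are placeholders:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object NewProducerExample extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  try {
    // send() is asynchronous and returns a Future[RecordMetadata];
    // calling get() turns it into a blocking, sync-style send.
    val meta = producer.send(new ProducerRecord[String, String]("my-topic", "key", "value")).get()
    println(s"wrote to ${meta.topic}-${meta.partition} @ offset ${meta.offset}")
  } finally {
    producer.close()
  }
}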

On 03/20/15 21:16, Gwen Shapira wrote:
> We have a patch with examples:
> https://issues.apache.org/jira/browse/KAFKA-1982
> 
> Unfortunately, its not committed yet.
> 
> Gwen
> 
> On Fri, Mar 20, 2015 at 11:24 AM, Pete Wright 
> wrote:
> 
>> Thanks, that's helpful.  I am working on an example producer using the
>> new API, if I have any helpful notes or examples I'll share that.
>>
>> I was basically trying to be lazy and poach some example code as a
>> starting point for our internal tests :)
>>
>> Cheers,
>> -pete
>>
>> On 03/20/15 10:59, Guozhang Wang wrote:
>>> For the new java producer, its java doc can be found here:
>>>
>>>
>> http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
>>>
>>> We can update the wiki if there are some examples that are still missing
>>> from this java doc.
>>>
>>> Guozhang
>>>
>>> On Thu, Mar 19, 2015 at 4:37 PM, Pete Wright >>
>>> wrote:
>>>
 Hi,
 Is there a plan to update the producer documentation on the wiki located
 here:


>> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

 This would be helpful for people working on implementing the new
 producer class deployed in 0.8.2.x.  If there are any patches available
 for this or other docs available for reference that would be helpful as
 well.

 Thanks!
 -pete

 --
 Pete Wright
 Lead Systems Architect
 Rubicon Project
 pwri...@rubiconproject.com
 310.309.9298

>>>
>>>
>>>
>>
>> --
>> Pete Wright
>> Lead Systems Architect
>> Rubicon Project
>> pwri...@rubiconproject.com
>> 310.309.9298
>>
> 

-- 
Pete Wright
Lead Systems Architect
Rubicon Project
pwri...@rubiconproject.com
310.309.9298


Re: Updates To cwiki For Producer

2015-03-23 Thread Gwen Shapira
If you have feedback, don't hesitate to comment on the JIRA.

On Mon, Mar 23, 2015 at 4:19 PM, Pete Wright 
wrote:

> Hi Gwen - thanks for sending this along.  I'll patch my local checkout
> and take a look at this.
>
> Cheers,
> -pete
>
> On 03/20/15 21:16, Gwen Shapira wrote:
> > We have a patch with examples:
> > https://issues.apache.org/jira/browse/KAFKA-1982
> >
> > Unfortunately, its not committed yet.
> >
> > Gwen
> >
> > On Fri, Mar 20, 2015 at 11:24 AM, Pete Wright <
> pwri...@rubiconproject.com>
> > wrote:
> >
> >> Thanks, that's helpful.  I am working on an example producer using the
> >> new API, if I have any helpful notes or examples I'll share that.
> >>
> >> I was basically trying to be lazy and poach some example code as a
> >> starting point for our internal tests :)
> >>
> >> Cheers,
> >> -pete
> >>
> >> On 03/20/15 10:59, Guozhang Wang wrote:
> >>> For the new java producer, its java doc can be found here:
> >>>
> >>>
> >>
> http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
> >>>
> >>> We can update the wiki if there are some examples that are still
> missing
> >>> from this java doc.
> >>>
> >>> Guozhang
> >>>
> >>> On Thu, Mar 19, 2015 at 4:37 PM, Pete Wright <
> pwri...@rubiconproject.com
> >>>
> >>> wrote:
> >>>
>  Hi,
>  Is there a plan to update the producer documentation on the wiki
> located
>  here:
> 
> 
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
> 
>  This would be helpful for people working on implementing the new
>  producer class deployed in 0.8.2.x.  If there are any patches
> available
>  for this or other docs available for reference that would be helpful
> as
>  well.
> 
>  Thanks!
>  -pete
> 
>  --
>  Pete Wright
>  Lead Systems Architect
>  Rubicon Project
>  pwri...@rubiconproject.com
>  310.309.9298
> 
> >>>
> >>>
> >>>
> >>
> >> --
> >> Pete Wright
> >> Lead Systems Architect
> >> Rubicon Project
> >> pwri...@rubiconproject.com
> >> 310.309.9298
> >>
> >
>
> --
> Pete Wright
> Lead Systems Architect
> Rubicon Project
> pwri...@rubiconproject.com
> 310.309.9298
>


Re: Anyone interested in speaking at Bay Area Kafka meetup @ LinkedIn on March 24?

2015-03-23 Thread Jason Rosenberg
Hi Jon,

The link for the 1/27 meetup you posted works for me, but I haven't
figured out how to find that same link on the meetup site (there are links
that point to the live stream, which of course is no longer happening!).

Thoughts?

Thanks,

Jason

On Mon, Mar 2, 2015 at 11:31 AM, Jon Bringhurst <
jbringhu...@linkedin.com.invalid> wrote:

> The meetups are recorded. For example, here's a link to the January meetup:
>
> http://www.ustream.tv/recorded/58109076
>
> The links to the recordings are usually posted to the comments for each
> meetup on http://www.meetup.com/http-kafka-apache-org/
>
> -Jon
>
> On Feb 23, 2015, at 3:24 PM, Ruslan Khafizov 
> wrote:
>
> +1 For recording sessions.
> On 24 Feb 2015 07:22, "Jiangjie Qin"  wrote:
>
> +1, I¹m very interested.
>
> On 2/23/15, 3:05 PM, "Jay Kreps"  wrote:
>
> +1
>
> I think something like "Kafka on AWS at Netflix" would be hugely
> interesting to a lot of people.
>
> -Jay
>
> On Mon, Feb 23, 2015 at 3:02 PM, Allen Wang 
> wrote:
>
> We (Steven Wu and Allen Wang) can talk about Kafka use cases and operations
> in Netflix. Specifically, we can talk about how we scale and operate Kafka
> clusters in AWS and how we migrate our data pipeline to Kafka.
>
> Thanks,
> Allen
>
>
> On Mon, Feb 23, 2015 at 12:15 PM, Ed Yakabosky <
> eyakabo...@linkedin.com.invalid> wrote:
>
> Hi Kafka Open Source -
>
> LinkedIn will host another Bay Area Kafka meetup in Mountain View on March
> 24.  We are planning to present on Offset Management but are looking for
> additional speakers.  If you’re interested in presenting a use case,
> operational plan, or your experience with a particular feature (REST
> interface, WebConsole), please reply-all to let us know.
>
> [BCC: Open Source lists]
>
> Thanks,
> Ed
>
>
>


Kafka Sync Producer threads (ack=0) are blocked

2015-03-23 Thread ankit tyagi
Hi All,

Currently we are using kafka_2.8.0-0.8.0-beta1 in our production system. I
am using the sync producer with ack=0 to send events to the broker,
but I am seeing that most of my producer threads are blocked:

"jmsListnerTaskExecutor-818" prio=10 tid=0x7f3f5c05a800 nid=0x1719
waiting for monitor entry [0x7f405935e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at
kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:53)
- waiting to lock <0x000602358ee8> (a java.lang.Object)
at kafka.producer.Producer.send(Producer.scala:74)
at kafka.javaapi.producer.Producer.send(Producer.scala:32)
at
com.snapdeal.coms.kafka.KafkaProducer.send(KafkaProducer.java:135)
at
com.snapdeal.coms.kafka.KafkaEventPublisher.publishCOMSEventOnESB(KafkaEventPublisher.java:61)
at
com.snapdeal.coms.service.EventsPublishingService.publishStateChangeEvent(EventsPublishingService.java:88)
at
com.snapdeal.coms.publisher.core.PublisherEventPublishingService.publishUploadIdState(PublisherEventPublishingService.java:46)
at
com.snapdeal.coms.publisher.splitter.VendorProductUpdateSplitter.split(VendorProductUpdateSplitter.java:112)
at sun.reflect.GeneratedMethodAccessor227.invoke(Unknown Source)




JVisualVM also shows that most of the time the producer threads are in a
blocked state, though I don't see any exceptions in the Kafka server logs.
Any insight?
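
The dump itself points at the likely cause: every thread is parked on the
same java.lang.Object monitor inside DefaultEventHandler.handle, i.e. the
old producer serializes sends through a single lock per producer instance.
One common mitigation (an assumption on my part, not an official
recommendation) is to give each sender thread its own producer, e.g.:

import java.util.Properties
import kafka.javaapi.producer.Producer
import kafka.producer.{KeyedMessage, ProducerConfig}

object PerThreadProducer {
  private val props = new Properties()
  props.put("metadata.broker.list", "broker1:9092") // placeholder
  props.put("request.required.acks", "0")
  props.put("serializer.class", "kafka.serializer.StringEncoder")

  // One producer per sender thread, so threads no longer contend on the
  // single lock inside DefaultEventHandler.
  private val local = new ThreadLocal[Producer[String, String]] {
    override def initialValue(): Producer[String, String] =
      new Producer[String, String](new ProducerConfig(props))
  }

  def send(topic: String, msg: String): Unit =
    local.get().send(new KeyedMessage[String, String](topic, msg))
}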


Re: Check topic exists after deleting it.

2015-03-23 Thread anthony musyoki
Thanks for your prompt response.

In my check for topicExists, I will add a check for the topic in
DeleteTopicsPath.



On Mon, Mar 23, 2015 at 8:21 PM, Harsha  wrote:

> Just to be clear, one only needs to stop the producers and consumers that are
> writing to or reading from a topic “test” if they are trying to delete that
> specific topic “test”, not all producers and consumers.
>
> --
> Harsha
>
> On March 23, 2015 at 10:13:47 AM, Harsha (harsh...@fastmail.fm) wrote:
>
>  Currently we have auto.create.topics.enable set to true by default. If
> this is set to true, anyone who makes a TopicMetadataRequest can create a
> topic, and both producers and consumers can send a TopicMetadataRequest, which
> will create a topic if the above config is true. So if a producer or consumer
> is running while a deletion is going on, it can re-create a topic that is in
> the deletion process. This issue is going to be addressed in upcoming versions.
> Meanwhile, if you are not creating topics via the producer, turn this config
> off, or stop producers and consumers while you are trying to delete a topic.
>  --
> Harsha
>
>
> On March 23, 2015 at 9:57:53 AM, Grant Henke (ghe...@cloudera.com) wrote:
>
>  What happens if producers or consumers are running while the topic
> deletion is going on?
>
> On Mon, Mar 23, 2015 at 10:02 AM, Harsha  wrote:
>
> > DeleteTopic creates a node in ZooKeeper to let the controller know that there
> > is a topic up for deletion. This doesn’t immediately delete the topic; it can
> > take time depending on whether all the partitions of that topic are online and
> > the brokers are available as well. Once all the log files are deleted, the
> > ZooKeeper node gets deleted as well.
> > Also make sure you don’t have any producers or consumers running while
> > the topic deletion is going on.
> >
> > --
> > Harsha
> >
> >
> > On March 23, 2015 at 1:29:50 AM, anthony musyoki (
> > anthony.musy...@gmail.com) wrote:
> >
> > On deleting a topic via TopicCommand.deleteTopic()
> >
> > I get "Topic test-delete is marked for deletion."
> >
> > I follow up by checking if the topic exists by using
> > AdminUtils.topicExists(),
> > which surprisingly returns true.
> >
> > I expected AdminUtils.topicExists() to check both BrokerTopicsPath
> > and DeleteTopicsPath before returning a verdict but it only checks
> > BrokerTopicsPath
> >
> > Shouldn't a topic marked for deletion return false for topicExists() ?
> >
>
>
>
> --
> Grant Henke
> Solutions Consultant | Cloudera
> ghe...@cloudera.com | 920-980-8979
> twitter.com/ghenke  |
> linkedin.com/in/granthenke
>
>