custom serializer and deserializer

2015-02-25 Thread ankit tyagi
Hi,

I want to use protobuf for serializing and deserializing Kafka events in
0.8.2.0. I can provide my custom serializer by
setting KEY_SERIALIZER_CLASS_CONFIG and VALUE_SERIALIZER_CLASS_CONFIG.

But how can I provide a custom deserializer?


non-blocking sends when cluster is down

2015-02-25 Thread Gary Ogden
Say the entire kafka cluster is down and there's no brokers to connect to.
Is it possible to use the java producer send method and not block until
there's a timeout?  Is it as simple as registering a callback method?

We need the ability for our application to not have any kind of delay when
sending messages and the cluster is down.  It's ok if the messages are lost
when the cluster is down.

Thanks!


Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-25 Thread Anand Somani
Sweet! So I would not need to depend on ZK for consumption anymore. Thanks for
the response, Gwen; I will take a look at the link you provided.

From what I have read so far, for my scenario to work correctly I would
have multiple partitions and a consumer per partition, is that correct? So
for me to improve throughput on the consumer side, I will need to play
with the number of partitions. Is there any recommendation on the ratio of
partitions per topic, or can that be scaled up/out with more powerful or more hardware?

Thanks
Anand

On Tue, Feb 24, 2015 at 8:11 PM, Gwen Shapira  wrote:

> * ZK was not built for 5K/s writes type of load
> * Kafka 0.8.2.0 allows you to commit messages to Kafka rather than ZK. I
> believe this is recommended.
> * You can also commit batches of messages (i.e. commit every 100 messages).
> This will reduce the writes and give you at least once while controlling
> number of duplicates in case of failure.
> * Yes, can be done in high level consumer. I give few tips here:
>
> http://ingest.tips/2014/10/12/kafka-high-level-consumer-frequently-missing-pieces/
>
> Gwen
>
> On Tue, Feb 24, 2015 at 1:57 PM, Anand Somani 
> wrote:
>
> > Hi,
> >
> > It is a little long, since I wanted to explain the use case and then ask
> > questions, so thanks for your attention
> >
> > Use case:
> >
> > We have a use case where everything in the queue has to be consumed at
> > least once. So the consumer has to have "consumed" (saved in some
> > destination database) the message before confirming consumption to kafka
> > (or ZK). Now it is possible and from what I have read so far we will have
> > consumer groups and partitions. Here are some facts/numbers for our case
> >
> > * We will potentially have messages with peaks of 5k /second.
> > * We can play with the message size if that makes any difference (keep
> it <
> > 100 bytes for a link or put the entire message avg size of 2-5K bytes).
> > * We do not need replication, but might have a kafka cluster to handle
> the
> > load.
> > * Also work consumption will take anywhere from 300-500ms, generally we
> > would like the consumer to be not behind by more than 1-2 minutes. So if
> > the message shows up in a queue, it should show up in the database
> within 2
> > minutes.
> >
> > The questions I have are
> >   * If this has been covered before, please point me to it. Thanks
> >   * Is that possible/recommended "controlled commit per consumed message"
> > for this load (have read about some concerns on ZK issues)?
> >   * Are there any recommendations on configurations in terms of
> partitions
> > to number of messages OR consumers? Maybe more queues/topics
> >   * Anything else that we might need to watch out for?
> >   * As for the client, I should be able to do this (control when the
> offset
> > commit happens) with high level consumer I suppose?
> >
> >
> > Thanks
> > Anand
> >
>


Re: "at least once" consumer recommendations for a load of 5 K messages/second

2015-02-25 Thread Gwen Shapira
I don't have good numbers, but I've noticed that I usually scale the number of
partitions by the consumer rate rather than the producer rate.

Writing to HDFS can be a bit slow (30MB/s is pretty typical, IIRC), so if I
need to write 5G a second, I need at least 15 consumers, which means at
least 15 partitions. Hopefully your consumers will be doing better. Maybe
your bottleneck will be 1GbE network speed. Who knows?

A small-scale benchmark on your specific setup can go a long way in capacity
planning :)

Gwen



Announcing the Confluent Platform built on Apache Kafka

2015-02-25 Thread Neha Narkhede
Folks,

We, at Confluent, are excited to announce the release
of Confluent Platform 1.0 built around Apache Kafka -
http://blog.confluent.io/2015/02/25/announcing-the-confluent-platform-1-0/

We also published a detailed two-part guide on how you can put Kafka to use
in your organization -
http://blog.confluent.io/2015/02/25/stream-data-platform-1/

And, there is a public mailing list where we would love to hear your
feedback: confluent-platf...@googlegroups.com

Thanks,
Neha


generate specific throughput load

2015-02-25 Thread Josh J
Hi,

Is there a way to generate a specified amount of throughput? I'm using the
Stats class here
https://github.com/apache/kafka/blob/7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4/clients/src/main/java/org/apache/kafka/clients/tools/ProducerPerformance.java
to measure the throughput. Though I need to be able to precisely control the
amount of load. For example, 1000 records per second.

Thanks,
Josh


Re: generate specific throughput load

2015-02-25 Thread Magnus Edenhill
Hi,

the rdkafka_performance tool from librdkafka's examples [1] lets you do
this with something like:
rdkafka_performance -P -b <brokers> -t <topic> [-p <partition>] -r <msgs/sec> -s <msg_size>

That's the producer side; if you want performance measurements on the
consumer side as well, you do:
rdkafka_performance -C -b <brokers> -t <topic> -p <partition> -o <offset>

There is an end-to-end latency measurement mode too, where you run both
producer and consumer simultaneously with the '-l' option.
(This mode typically requires some config tweaking to get optimal results,
e.g. minimizing batch wait timing.)

Regards,
Magnus

[1]: https://github.com/edenhill/librdkafka



Re: Kafka High Level Consumer

2015-02-25 Thread Joseph Lawson
Doh, that was probably my bad, Pranay!  A misinterpretation of some old consumer
code.  BTW, jruby-kafka is now at 1.1.1 with proper support for deleting the
offset, setting auto_offset_reset, and whitelist/blacklist topics.  It's
packaged up in a nice gem file that includes all Kafka and log4j prerequisites
too.

It's pretty feature complete for Kafka 0.8.1.1.

Hurray! Thanks to everyone that submitted PRs to make it better.

-Joe Lawson


From: Pranay Agarwal 
Sent: Wednesday, February 25, 2015 1:45 AM
To: users@kafka.apache.org
Subject: Re: Kafka High Level Consumer

Thanks, Jun. It seems it was an issue with the jruby client I was using. Now
they have fixed it.

-Pranay

On Mon, Feb 23, 2015 at 4:57 PM, Jun Rao  wrote:

> Did you enable auto offset commit?
>
> Thanks,
>
> Jun
>
> On Tue, Feb 17, 2015 at 4:22 PM, Pranay Agarwal 
> wrote:
>
> > Hi,
> >
> > I am trying to read kafka consumer using high level kafka Consumer API. I
> > had to restart the consumers for some reason but I kept the same group
> id.
> > It seems the consumers have started consuming from the beginning (0
> offset)
> > instead from the point they had already consumed.
> >
> > What am I doing wrong here?  How do I make sure the consumer start only
> > from the point they had left before?
> >
> > Thanks
> > -Pranay
> >
>


Tips for working with Kafka and data streams

2015-02-25 Thread Jay Kreps
Hey guys,

One thing we tried to do along with the product release was start to put
together a practical guide for using Kafka. I wrote this up here:
http://blog.confluent.io/2015/02/25/stream-data-platform-1/

I'd like to keep expanding on this as good practices emerge and we learn
more stuff. So two questions:
1. Anything you think other people should know about working with data
streams? What did you wish you knew when you got started?
2. Anything you don't know about but would like to hear more about?

-Jay


Re: Announcing the Confluent Platform built on Apache Kafka

2015-02-25 Thread Joseph Lawson
This is really awesome stuff.  It's great to see y'all growing!  Thank you and 
congratulations!




Re: Announcing the Confluent Platform built on Apache Kafka

2015-02-25 Thread Andrew Otto
Wow, .deb packages.  I love you.





Re: generate specific throughput load

2015-02-25 Thread Jiangjie Qin
There is a ProducerPerformance class that comes with the new Java producer. You
can go to KAFKA_HOME/bin and use the following command:

./kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
USAGE: java org.apache.kafka.clients.tools.ProducerPerformance topic_name
num_records record_size target_records_sec [prop_name=prop_value]*


Note that you need to specify bootstrap.servers in the properties.
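If you'd rather control the rate from your own code, the pacing arithmetic that such tools use can be sketched in a few lines. This is only an illustration, not the actual ProducerPerformance implementation; the class and method names below are made up:

```java
// Sketch of target-rate pacing: before each send, compute how long to
// sleep so that, on average, no more than targetPerSec records go out
// per second. A real producer loop would call Thread.sleep(pause) and
// then producer.send(record).
class RatePacer {
    private final int targetPerSec;
    private final long startNanos;
    private long sent = 0;

    RatePacer(int targetPerSec, long startNanos) {
        this.targetPerSec = targetPerSec;
        this.startNanos = startNanos;
    }

    // Returns the pause in milliseconds before the next send.
    long nextSleepMillis(long nowNanos) {
        sent++;
        // Time by which 'sent' records should have been sent.
        long expectedElapsedMs = sent * 1000L / targetPerSec;
        long actualElapsedMs = (nowNanos - startNanos) / 1_000_000L;
        return Math.max(0, expectedElapsedMs - actualElapsedMs);
    }
}
```

For 1000 records per second the pacer asks for roughly a 1 ms gap between sends; if a send falls behind schedule, the computed pause shrinks to zero until the average catches up.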

Jiangjie (Becket) Qin





RE: Announcing the Confluent Platform built on Apache Kafka

2015-02-25 Thread Aditya Auradkar
Congrats!




Re: Tips for working with Kafka and data streams

2015-02-25 Thread Christian Csar
I wouldn't say no to some discussion of encryption. We're running on Azure
EventHubs (with preparations for Kinesis for EC2, and Kafka for deployments
in customer datacenters when needed) so can't just use disk level
encryption (which would have its own overhead). We're putting all of our
messages inside of encrypted envelopes before sending them to the stream
which limits our opportunities for schema verification of the underlying
messages to the declared type of the message.

Encryption at rest mostly works out to a sales point for customers who want
assurances, and in a Kafka focused discussion might be dealt with by
covering disk encryption and how the conversations between Kafka instances
are protected.
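For illustration, the "encrypted envelope" idea can be sketched with the JDK's own AES-GCM support. This is a toy sketch, not our production code: key distribution, rotation, and authenticated metadata are the hard parts it leaves out. The payload would be sealed before producer.send() and opened after the consumer fetch:

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.security.SecureRandom;

// Wraps an opaque message payload in an encrypted envelope before it is
// handed to the producer; the consumer unwraps it after the fetch.
class Envelope {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Envelope layout: 12-byte random nonce, then GCM ciphertext + tag.
    static byte[] seal(SecretKey key, byte[] payload) throws Exception {
        byte[] iv = new byte[12];                 // 96-bit nonce for GCM
        RANDOM.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(payload);
        return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
    }

    static byte[] open(SecretKey key, byte[] envelope) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(envelope);
        byte[] iv = new byte[12];
        buf.get(iv);
        byte[] ct = new byte[buf.remaining()];
        buf.get(ct);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return c.doFinal(ct);
    }
}
```

One consequence worth noting: because the broker only ever sees the sealed bytes, any schema validation has to apply to the envelope's declared type, not the underlying message, which is exactly the limitation described above.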

Christian




Re: non-blocking sends when cluster is down

2015-02-25 Thread Guozhang Wang
Hi Gary,

The Java producer will block on send() when the buffer is full and
block.on.buffer.full = true (
http://kafka.apache.org/documentation.html#newproducerconfigs). If you set
the config to false, the send() call will throw a BufferExhaustedException
which, in your case, can be caught and ignored, allowing the message to drop
on the floor.
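Concretely, that setup might look roughly like this. It is a sketch against the 0.8.2 new-producer API; the config keys and exception class should be double-checked against the docs for your exact version:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.BufferExhaustedException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FireAndForgetSender {
    private final KafkaProducer<byte[], byte[]> producer;

    public FireAndForgetSender(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        // Throw instead of blocking when the local buffer fills up
        // (e.g. because the whole cluster is unreachable).
        props.put("block.on.buffer.full", "false");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        producer = new KafkaProducer<byte[], byte[]>(props);
    }

    public void trySend(String topic, byte[] payload) {
        try {
            producer.send(new ProducerRecord<byte[], byte[]>(topic, payload));
        } catch (BufferExhaustedException e) {
            // Buffer full and cluster down: it's OK to lose the message.
        }
    }
}
```

Note that, if I recall correctly, the very first send can still block briefly while the producer fetches metadata (bounded by metadata.fetch.timeout.ms), so that timeout may also be worth lowering.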

Guozhang






-- 
-- Guozhang


Re: custom serializer and deserializer

2015-02-25 Thread Guozhang Wang
Only the consumer needs deserializer classes. The current Java consumer is
still under development but when it is finished you will find the
corresponding KEY_DESERIALIZER_CLASS_CONFIG /
VALUE_DESERIALIZER_CLASS_CONFIG in ConsumerConfig.
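In the meantime, the existing high-level consumer takes deserializers through the kafka.serializer.Decoder interface passed to createMessageStreams(). A rough sketch for a protobuf message follows; MyEvent and its parseFrom() are stand-ins for whatever your protobuf-generated class provides:

```java
import kafka.serializer.Decoder;
import kafka.utils.VerifiableProperties;

// Decodes raw Kafka message bytes into a protobuf object.
// MyEvent is a placeholder for your protobuf-generated class.
public class MyEventDecoder implements Decoder<MyEvent> {
    public MyEventDecoder(VerifiableProperties props) {
        // A constructor taking VerifiableProperties is what the consumer expects.
    }

    @Override
    public MyEvent fromBytes(byte[] bytes) {
        try {
            return MyEvent.parseFrom(bytes);
        } catch (Exception e) {
            throw new RuntimeException("Failed to decode event", e);
        }
    }
}
```

It would then be passed in when creating the streams, e.g. connector.createMessageStreams(topicCountMap, keyDecoder, new MyEventDecoder(new VerifiableProperties())).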

Guozhang




-- 
-- Guozhang


Re: Tips for working with Kafka and data streams

2015-02-25 Thread Jay Kreps
Hey Christian,

That makes sense. I agree that would be a good area to dive into. Are you
primarily interested in network level security or encryption on disk?

-Jay



Broker Exceptions

2015-02-25 Thread Zakee
Need to know if I should be worried about these or ignore them.

I see tons of these exceptions/warnings in the broker logs, not sure what
causes them and what could be done to fix them.

ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker
5:class kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
partition [TestTopic] to broker 5:class
kafka.common.NotLeaderForPartitionException
(kafka.server.ReplicaFetcherThread)
[2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request
with correlation id 950084 from client ReplicaFetcherThread-1-2 on
partition [TestTopic,2] failed due to Leader not local for partition
[TestTopic,2] on broker 2 (kafka.server.ReplicaManager)


Any ideas?

-Zakee


Re: broker restart problems

2015-02-25 Thread Zakee
Do you have the property auto.leader.rebalance.enable=true set in brokers?

Thanks
-Zakee

On Tue, Feb 24, 2015 at 11:47 PM, ZhuGe  wrote:

> Hi all: We have a cluster of 3 brokers (ids 0, 1, 2). We restarted (simply
> using stop.sh and start.sh in the bin directory) broker 1. The broker started
> successfully. However, all the partitions' leaders moved to other brokers
> and no data were written into broker 1. This is the status of one topic:
>
> Topic: wx_rtdc_flumesinks  PartitionCount: 12  ReplicationFactor: 3  Configs:
>   Partition: 0   Leader: 2  Replicas: 1,2,0  Isr: 2,0
>   Partition: 1   Leader: 2  Replicas: 2,0,1  Isr: 2,0
>   Partition: 2   Leader: 0  Replicas: 0,1,2  Isr: 0,2
>   Partition: 3   Leader: 0  Replicas: 1,0,2  Isr: 0,2
>   Partition: 4   Leader: 2  Replicas: 2,1,0  Isr: 2,0
>   Partition: 5   Leader: 0  Replicas: 0,2,1  Isr: 0,2
>   Partition: 6   Leader: 2  Replicas: 1,2,0  Isr: 2,0
>   Partition: 7   Leader: 2  Replicas: 2,0,1  Isr: 2,0
>   Partition: 8   Leader: 0  Replicas: 0,1,2  Isr: 0,2
>   Partition: 9   Leader: 0  Replicas: 1,0,2  Isr: 0,2
>   Partition: 10  Leader: 2  Replicas: 2,1,0  Isr: 2,0
>   Partition: 11  Leader: 0  Replicas: 0,2,1  Isr: 0,2
>
> It seems broker 1 is out of sync with the other brokers, and nothing
> changed after I ran the preferred replica leader election tool. I think it
> is because the preferred replica is not in the Isr, which is described in
> the wiki of the replication tool.
> I want to know how to re-synchronize the replicas across the 3 brokers so
> that broker 1 can work properly. Any help would be appreciated.


Re: Broker Exceptions

2015-02-25 Thread Jiangjie Qin
These messages are usually caused by leader migration. I think as long as
you don't see this lasting forever or a bunch of under-replicated
partitions, it should be fine.

Jiangjie (Becket) Qin




Re: Tips for working with Kafka and data streams

2015-02-25 Thread Tong Li

+2. These kinds of articles coming from the people who created Kafka always
provide great value to Kafka users and developers. For my 2 cents, I would
love to see one or two articles for developers involved in Kafka
development, on topics such as how to develop test cases and how to run them,
what to expect when errors occur, and typical system settings. I suspect that
most of us run it on Linux-based systems, so a little pointer can probably
help a lot. And, most importantly, how to set up your dev environment so that
you are not struggling with things the pioneers have already figured out:
for example, a recommended dev IDE and debug methods. Of course, these will
be the preferences of the writer, and no one is obligated to use them, but
they can certainly get people started quicker. As Kafka draws more interest,
I suspect more developers will join, and having something like that can be
extremely helpful.

Jay, articles similar to the one linked in your original email can actually
be submitted to developerWorks, and you can get some money out of it if you
like. If you do not know how to do that, I can certainly provide some
pointers if you are interested.

Thanks.

Tong Li
OpenStack & Kafka Community Development
Building 501/B205
liton...@us.ibm.com





Re: Tips for working with Kafka and data streams

2015-02-25 Thread Christian Csar
The questions we get from customers typically end up being general, so we
break our answer into network-level and on-disk scenarios.

The on-disk/at-rest scenario may just be to use full disk encryption at the OS
level, so Kafka doesn't need to worry about it. But documenting any issues
around it would be good, for example what sort of Kafka-specific
performance impact it has, i.e. budgeting for better processors.

The security story right now is to run on a private network, but I believe
some of our customers like to be told that within datacenter transmissions
are encrypted on the wire. Based on
https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean
waiting for TLS support, or using a VPN/ssh tunnel for the network
connections.

Since we're in hosted stream land we can't do either of the above and
encrypt the messages themselves. For those enterprises that are like our
customers but would run Kafka or use Confluent, having a story like the
above so they don't give up the benefits of your schema management layers
would be good.

Since I didn't mention it before I did find your blog posts handy (though
I'm already moving us towards stream centric land).

Christian



Re: Broker Exceptions

2015-02-25 Thread Zakee
Thanks, Jiangjie.

Yes, I do see under-replicated partitions usually shooting up every hour.
Anything I could try to reduce that?

How does "num.replica.fetchers" affect the replica sync? I currently have
configured 7 on each of 5 brokers.

-Zakee



Re: Broker Exceptions

2015-02-25 Thread Jiangjie Qin
I don't think num.replica.fetchers will help in this case. Increasing the
number of fetcher threads only helps when a broker has a large amount of
incoming data and the replica fetchers cannot keep up. We usually use only
1-2 for each broker. But in your case, it looks like leader migration is
causing the issue.
Do you see anything else in the log? Like preferred leader election?
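
For reference, the broker settings touched on in this thread live in
server.properties; a sketch (the values shown are illustrative, and the
defaults noted are from the 0.8.2 documentation):

```properties
# Number of threads each broker uses to fetch from the partition leaders
# it is following. Default is 1; 1-2 is usually enough.
num.replica.fetchers=2

# When true, a background thread periodically moves leadership back to the
# preferred replica (the first broker in the replica list). Default is false.
auto.leader.rebalance.enable=true

# How often the imbalance check runs, and how much leader imbalance per
# broker is tolerated before a rebalance is triggered.
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10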

Jiangjie (Becket) Qin

On 2/25/15, 5:02 PM, "Zakee"  wrote:

>Thanks, Jiangjie.
>
>Yes, I do see under-replicated partitions, usually spiking every hour.
>Anything I could try to reduce it?
>
>How does "num.replica.fetchers" affect the replica sync? Currently I have
>configured 7 on each of the 5 brokers.
>
>-Zakee
>
>On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin 
>wrote:
>
>> These messages are usually caused by leader migration. I think as long as
>> this doesn't last forever and you don't end up with a bunch of
>> under-replicated partitions, it should be fine.
>>
>> Jiangjie (Becket) Qin
>>
>> On 2/25/15, 4:07 PM, "Zakee"  wrote:
>>
>> >Need to know if I should I be worried about this or ignore them.
>> >
>> >I see tons of these exceptions/warnings in the broker logs, not sure
>>what
>> >causes them and what could be done to fix them.
>> >
>> >ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
>> >broker
>> >5:class kafka.common.NotLeaderForPartitionException
>> >(kafka.server.ReplicaFetcherThread)
>> >[2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
>> >partition [TestTopic] to broker 5:class
>> >kafka.common.NotLeaderForPartitionException
>> >(kafka.server.ReplicaFetcherThread)
>> >[2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>> >request
>> >with correlation id 950084 from client ReplicaFetcherThread-1-2 on
>> >partition [TestTopic,2] failed due to Leader not local for partition
>> >[TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>> >
>> >
>> >Any ideas?
>> >
>> >-Zakee
>> >
>>



Re: Tips for working with Kafka and data streams

2015-02-25 Thread Julio Castillo
Although full disk encryption appears to be an easy solution, in our case
it may not be sufficient. For cases where the actual payload needs to be
encrypted, the cost of encryption is paid by the consumers and producers.
Further complicating the matter is the handling of encryption keys, etc.
I think this is an area where enhancements to Kafka could facilitate key
exchange between consumers and producers, still leaving encryption up to
the clients but easing the key handling.

Julio

On 2/25/15, 4:24 PM, "Christian Csar"  wrote:

>The questions we get from customers typically end up being general so we
>break out our answer into network level and on disk scenarios.
>
>On disk/at rest scenario may just be use full disk encryption at the OS
>level and Kafka doesn't need to worry about it. But documenting any issues
>around it would be good. For example what sort of Kafka specific
>performance impacts does it have, ie budgeting for better processors.
>
>The security story right now is to run on a private network, but I believe
>some of our customers like to be told that within datacenter transmissions
>are encrypted on the wire. Based on
>https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean
>waiting for TLS support, or using a VPN/ssh tunnel for the network
>connections.
>
>Since we're in hosted stream land we can't do either of the above and
>encrypt the messages themselves. For those enterprises that are like our
>customers but would run Kafka or use Confluent, having a story like the
>above so they don't give up the benefits of your schema management layers
>would be good.
>
>Since I didn't mention it before I did find your blog posts handy (though
>I'm already moving us towards stream centric land).
>
>Christian
>
>On Wed, Feb 25, 2015 at 3:57 PM, Jay Kreps  wrote:
>
>> Hey Christian,
>>
>> That makes sense. I agree that would be a good area to dive into. Are
>>you
>> primarily interested in network level security or encryption on disk?
>>
>> -Jay
>>
>> On Wed, Feb 25, 2015 at 1:38 PM, Christian Csar 
>>wrote:
>>
>> > I wouldn't say no to some discussion of encryption. We're running on
>> Azure
>> > EventHubs (with preparations for Kinesis for EC2, and Kafka for
>> deployments
>> > in customer datacenters when needed) so can't just use disk level
>> > encryption (which would have its own overhead). We're putting all of
>>our
>> > messages inside of encrypted envelopes before sending them to the
>>stream
>> > which limits our opportunities for schema verification of the
>>underlying
>> > messages to the declared type of the message.
>> >
>> > Encryption at rest mostly works out to a sales point for customers who
>> want
>> > assurances, and in a Kafka focused discussion might be dealt with by
>> > covering disk encryption and how the conversations between Kafka
>> instances
>> > are protected.
>> >
>> > Christian
>> >
>> >
>> > On Wed, Feb 25, 2015 at 11:51 AM, Jay Kreps  wrote:
>> >
>> > > Hey guys,
>> > >
>> > > One thing we tried to do along with the product release was start to
>> put
>> > > together a practical guide for using Kafka. I wrote this up here:
>> > > http://blog.confluent.io/2015/02/25/stream-data-platform-1/
>> > >
>> > > I'd like to keep expanding on this as good practices emerge and we
>> learn
>> > > more stuff. So two questions:
>> > > 1. Anything you think other people should know about working with
>>data
>> > > streams? What did you wish you knew when you got started?
>> > > 2. Anything you don't know about but would like to hear more about?
>> > >
>> > > -Jay
>> > >
>> >
>>

NOTICE: This e-mail and any attachments to it may be privileged, confidential 
or contain trade secret information and is intended only for the use of the 
individual or entity to which it is addressed. If this e-mail was sent to you 
in error, please notify me immediately by either reply e-mail or by phone at 
408.498.6000, and do not use, disseminate, retain, print or copy the e-mail or 
any attachment. All messages sent to and from this e-mail address may be 
monitored as permitted by or necessary under applicable law and regulations.


RE: broker restart problems

2015-02-25 Thread ZhuGe
We did not have this setting in the property file, so it should be false. BTW,
does this command mean periodically invoking the preferred replica leader
election tool? And how should I solve the "out of sync" problem of the broker?
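
One way to spot this situation programmatically is to parse the output of
kafka-topics.sh --describe and flag partitions that are under-replicated or
not led by their preferred replica (the first broker in the replica list).
A minimal sketch; the class name is hypothetical, and the regex assumes
lines shaped like `Partition: 0  Leader: 2  Replicas: 1,2,0  Isr: 2,0` as
in the describe output quoted below:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartitionStatus {
    static final Pattern LINE = Pattern.compile(
        "Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)\\s+" +
        "Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]+)");

    final int partition;
    final int leader;
    final List<Integer> replicas = new ArrayList<>();
    final List<Integer> isr = new ArrayList<>();

    PartitionStatus(int partition, int leader) {
        this.partition = partition;
        this.leader = leader;
    }

    /** Parses one line of `kafka-topics.sh --describe` output; null if no match. */
    static PartitionStatus parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) return null;
        PartitionStatus s = new PartitionStatus(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
        for (String id : m.group(3).split(",")) s.replicas.add(Integer.parseInt(id));
        for (String id : m.group(4).split(",")) s.isr.add(Integer.parseInt(id));
        return s;
    }

    /** The preferred leader is the first replica in the assignment list. */
    boolean hasPreferredLeader() {
        return !replicas.isEmpty() && leader == replicas.get(0);
    }

    boolean isUnderReplicated() {
        return isr.size() < replicas.size();
    }
}
```

Running every partition line through this check makes the pattern in the
quoted describe output obvious: broker 1 is missing from every ISR, so the
preferred replica leader election tool cannot move leadership back to it.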
> Date: Wed, 25 Feb 2015 16:09:42 -0800
> Subject: Re: broker restart problems
> From: kzak...@netzero.net
> To: users@kafka.apache.org
> 
> Do you have the property auto.leader.rebalance.enable=true set in brokers?
> 
> Thanks
> -Zakee
> 
> On Tue, Feb 24, 2015 at 11:47 PM, ZhuGe  wrote:
> 
> > Hi all: We have a cluster of 3 brokers (ids 0, 1, 2). We restarted broker 1
> > (simply using stop.sh and start.sh in the bin directory). The broker started
> > successfully. However, all the partition leaders moved to the other brokers
> > and no data were written into broker 1. This is the status of one topic:
> >
> > Topic: wx_rtdc_flumesinks  PartitionCount: 12  ReplicationFactor: 3  Configs:
> >   Partition: 0   Leader: 2  Replicas: 1,2,0  Isr: 2,0
> >   Partition: 1   Leader: 2  Replicas: 2,0,1  Isr: 2,0
> >   Partition: 2   Leader: 0  Replicas: 0,1,2  Isr: 0,2
> >   Partition: 3   Leader: 0  Replicas: 1,0,2  Isr: 0,2
> >   Partition: 4   Leader: 2  Replicas: 2,1,0  Isr: 2,0
> >   Partition: 5   Leader: 0  Replicas: 0,2,1  Isr: 0,2
> >   Partition: 6   Leader: 2  Replicas: 1,2,0  Isr: 2,0
> >   Partition: 7   Leader: 2  Replicas: 2,0,1  Isr: 2,0
> >   Partition: 8   Leader: 0  Replicas: 0,1,2  Isr: 0,2
> >   Partition: 9   Leader: 0  Replicas: 1,0,2  Isr: 0,2
> >   Partition: 10  Leader: 2  Replicas: 2,1,0  Isr: 2,0
> >   Partition: 11  Leader: 0  Replicas: 0,2,1  Isr: 0,2
> >
> > It seems broker 1 is out of sync with the other brokers, and nothing changed
> > after I ran the preferred replica leader election tool. I think that is
> > because the preferred replica is not in the Isr, which is described in the
> > wiki for the replication tools.
> > I want to know how to synchronize the replicas across the 3 brokers so that
> > broker 1 can work properly. Any help would be appreciated.
> > 
  

Re: Tips for working with Kafka and data streams

2015-02-25 Thread Christian Csar
Yeah, we do have scenarios where we use customer specific keys so our
envelopes end up containing key identification information for accessing
our key repository. I'll certainly follow any changes you propose in this
area with interest, but I'd expect that sort of centralized key thing to be
fairly separate from Kafka even if there's a handy optional layer that
integrates with it.

Christian
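
For readers curious what such an envelope looks like in practice, here is a
minimal sketch. It is illustrative only, not Kafka or Confluent code; the
key-id + IV + AES-GCM layout is an assumption, and real key management is
out of scope:

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class Envelope {
    // Illustrative wire format: [4-byte key id][12-byte IV][AES-GCM ciphertext+tag].
    // The key id lets a consumer look up the right key in a key repository
    // without being able to read the payload itself.
    static byte[] seal(int keyId, byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new GCMParameterSpec(128, iv));
            byte[] ct = c.doFinal(plaintext);
            return ByteBuffer.allocate(4 + 12 + ct.length)
                             .putInt(keyId).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Readable without the key, so consumers can route to the key repository. */
    static int keyId(byte[] envelope) {
        return ByteBuffer.wrap(envelope).getInt();
    }

    static byte[] open(byte[] key, byte[] envelope) {
        try {
            ByteBuffer b = ByteBuffer.wrap(envelope);
            b.getInt();                        // skip the key id
            byte[] iv = new byte[12];
            b.get(iv);
            byte[] ct = new byte[b.remaining()];
            b.get(ct);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new GCMParameterSpec(128, iv));
            return c.doFinal(ct);              // throws if the auth tag check fails
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

As noted in the thread, this also means the broker can only validate the
declared type of the envelope, not the schema of the encrypted payload.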

On Wed, Feb 25, 2015 at 5:34 PM, Julio Castillo <
jcasti...@financialengines.com> wrote:

> Although full disk encryption appears to be an easy solution, in our case
> it may not be sufficient. For cases where the actual payload needs to be
> encrypted, the cost of encryption is paid by the consumers and producers.
> Further complicating the matter is the handling of encryption keys, etc.
> I think this is an area where enhancements to Kafka could facilitate key
> exchange between consumers and producers, still leaving encryption up to
> the clients but easing the key handling.
>
> Julio
>
> On 2/25/15, 4:24 PM, "Christian Csar"  wrote:
>
> >The questions we get from customers typically end up being general so we
> >break out our answer into network level and on disk scenarios.
> >
> >On disk/at rest scenario may just be use full disk encryption at the OS
> >level and Kafka doesn't need to worry about it. But documenting any issues
> >around it would be good. For example what sort of Kafka specific
> >performance impacts does it have, ie budgeting for better processors.
> >
> >The security story right now is to run on a private network, but I believe
> >some of our customers like to be told that within datacenter transmissions
> >are encrypted on the wire. Based on
> >https://cwiki.apache.org/confluence/display/KAFKA/Security that might mean
> >waiting for TLS support, or using a VPN/ssh tunnel for the network
> >connections.
> >
> >Since we're in hosted stream land we can't do either of the above and
> >encrypt the messages themselves. For those enterprises that are like our
> >customers but would run Kafka or use Confluent, having a story like the
> >above so they don't give up the benefits of your schema management layers
> >would be good.
> >
> >Since I didn't mention it before I did find your blog posts handy (though
> >I'm already moving us towards stream centric land).
> >
> >Christian
> >
> >On Wed, Feb 25, 2015 at 3:57 PM, Jay Kreps  wrote:
> >
> >> Hey Christian,
> >>
> >> That makes sense. I agree that would be a good area to dive into. Are
> >>you
> >> primarily interested in network level security or encryption on disk?
> >>
> >> -Jay
> >>
> >> On Wed, Feb 25, 2015 at 1:38 PM, Christian Csar 
> >>wrote:
> >>
> >> > I wouldn't say no to some discussion of encryption. We're running on
> >> Azure
> >> > EventHubs (with preparations for Kinesis for EC2, and Kafka for
> >> deployments
> >> > in customer datacenters when needed) so can't just use disk level
> >> > encryption (which would have its own overhead). We're putting all of
> >>our
> >> > messages inside of encrypted envelopes before sending them to the
> >>stream
> >> > which limits our opportunities for schema verification of the
> >>underlying
> >> > messages to the declared type of the message.
> >> >
> >> > Encryption at rest mostly works out to a sales point for customers who
> >> want
> >> > assurances, and in a Kafka focused discussion might be dealt with by
> >> > covering disk encryption and how the conversations between Kafka
> >> instances
> >> > are protected.
> >> >
> >> > Christian
> >> >
> >> >
> >> > On Wed, Feb 25, 2015 at 11:51 AM, Jay Kreps  wrote:
> >> >
> >> > > Hey guys,
> >> > >
> >> > > One thing we tried to do along with the product release was start to
> >> put
> >> > > together a practical guide for using Kafka. I wrote this up here:
> >> > > http://blog.confluent.io/2015/02/25/stream-data-platform-1/
> >> > >
> >> > > I'd like to keep expanding on this as good practices emerge and we
> >> learn
> >> > > more stuff. So two questions:
> >> > > 1. Anything you think other people should know about working with
> >>data
> >> > > streams? What did you wish you knew when you got started?
> >> > > 2. Anything you don't know about but would like to hear more about?
> >> > >
> >> > > -Jay
> >> > >
> >> >
> >>
>

Re: How to measure performance metrics

2015-02-25 Thread Otis Gospodnetic
Have a look at http://blog.sematext.com/2015/02/10/kafka-0-8-2-monitoring/
There are also various open-source projects.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
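
The performance tools bundled with Kafka itself are another option for raw
throughput numbers; a sketch, assuming a 0.8.x install and an existing topic
named `test` (check bin/ for the exact script names in your version):

```shell
# Producer throughput: send 100,000 messages of 1 KB each and report MB/s.
bin/kafka-producer-perf-test.sh --broker-list localhost:9092 \
    --topics test --messages 100000 --message-size 1024

# Consumer throughput: drain the same messages back and report MB/s.
bin/kafka-consumer-perf-test.sh --zookeeper localhost:2181 \
    --topic test --messages 100000
```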


On Wed, Feb 25, 2015 at 12:40 AM, Bhuvana Baskar  wrote:

> Hi,
>
> Please let me know how to measure the performance metrics while
> pushing/consuming the message to/from the topic.
>
> Thanks.
>


Re: NetworkProcessorAvgIdlePercent

2015-02-25 Thread Jun Rao
Then you may want to consider increasing num.io.threads
and num.network.threads.

Thanks,

Jun

On Tue, Feb 24, 2015 at 7:48 PM, Zakee  wrote:

> Similar pattern for that too. Mostly hovering below.
>
> -Zakee
>
> On Tue, Feb 24, 2015 at 2:43 PM, Jun Rao  wrote:
>
> > What about RequestHandlerAvgIdlePercent?
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Feb 23, 2015 at 8:47 PM, Zakee  wrote:
> >
> > > Hi Jun,
> > >
> > > With ~100G of data being pushed per hour across 35 topics
> > > (replication-factor 3), the NetworkProcessorAvgIdlePercent is mostly
> > > showing below 0.5 sometimes when the producers send on a high rate.
> > >
> > > Thanks
> > > -Zakee
> > >
> > > On Sun, Feb 22, 2015 at 10:29 PM, Jun Rao  wrote:
> > >
> > > > What kind of load do you have on the brokers? On an idle cluster
> (just
> > > > fetch requests from the follower replicas), I
> > > > saw NetworkProcessorAvgIdlePercent at about 97%.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, Feb 19, 2015 at 5:19 PM, Zakee  wrote:
> > > >
> > > > > Jun,
> > > > >
> > > > > I am already using the latest release 0.8.2.1.
> > > > >
> > > > > -Zakee
> > > > >
> > > > > On Thu, Feb 19, 2015 at 2:46 PM, Jun Rao  wrote:
> > > > >
> > > > > > Could you try the 0.8.2.1 release being voted on now? It fixes a
> > CPU
> > > > > issue
> > > > > > and should reduce the CPU load in network thread.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Thu, Feb 19, 2015 at 11:54 AM, Zakee 
> > wrote:
> > > > > >
> > > > > > > Kafka documentation recommends <0.3 for above metric. I assume
> > > > > processor
> > > > > > is
> > > > > > > busier if this goes below 0.3 and obviously it being < 0.3 for
> > long
> > > > > does
> > > > > > > not seem to be a good sign.
> > > > > > >
> > > > > > > What should be our criteria to raise an alert, I though it
> should
> > > be
> > > > > > when
> > > > > > > its value goes below 0.3. However, the value seems to be below
> > 0.3
> > > a
> > > > > lot
> > > > > > of
> > > > > > > the times, almost always if we take samples every five mins.
> What
> > > > > should
> > > > > > be
> > > > > > > the threshold to raise an alarm ?
> > > > > > >
> > > > > > > What would be the impact of having this below 0.3 or even zero
> > like
> > > > > most
> > > > > > of
> > > > > > > the times?
> > > > > > >
> > > > > > >
> > > > > > > -Zakee
>


Re: NetworkProcessorAvgIdlePercent

2015-02-25 Thread Zakee
Well, currently I have configured 14 threads each for I/O and network. Do you
think we should consider more?

Thanks
-Zakee
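
On the threshold question in the quoted discussion below: since the metric
dips under 0.3 transiently during producer bursts, one pragmatic rule is to
alarm only when it stays below the threshold for several consecutive samples.
A minimal sketch (the helper is hypothetical, not part of Kafka):

```java
class IdleAlert {
    /**
     * Returns true if `samples` (e.g. NetworkProcessorAvgIdlePercent read
     * every five minutes) stays below `threshold` for `window` consecutive
     * readings; a steadier alarm signal than reacting to any single dip.
     */
    static boolean sustainedBreach(double[] samples, double threshold, int window) {
        int run = 0;
        for (double v : samples) {
            run = (v < threshold) ? run + 1 : 0;  // reset on any healthy sample
            if (run >= window) return true;
        }
        return false;
    }
}
```

With window = 3 and five-minute samples, this alarms only after fifteen
minutes of sustained saturation, which matches the "often below 0.3 but
recovering" pattern described in the thread.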

On Wed, Feb 25, 2015 at 6:22 PM, Jun Rao  wrote:

> Then you may want to consider increasing num.io.threads
> and num.network.threads.
>
> Thanks,
>
> Jun
>
> On Tue, Feb 24, 2015 at 7:48 PM, Zakee  wrote:
>
> > Similar pattern for that too. Mostly hovering below.
> >
> > -Zakee
> >
> > On Tue, Feb 24, 2015 at 2:43 PM, Jun Rao  wrote:
> >
> > > What about RequestHandlerAvgIdlePercent?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Feb 23, 2015 at 8:47 PM, Zakee  wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > With ~100G of data being pushed per hour across 35 topics
> > > > (replication-factor 3), the NetworkProcessorAvgIdlePercent is mostly
> > > > showing below 0.5 sometimes when the producers send on a high rate.
> > > >
> > > > Thanks
> > > > -Zakee
> > > >
> > > > On Sun, Feb 22, 2015 at 10:29 PM, Jun Rao  wrote:
> > > >
> > > > > What kind of load do you have on the brokers? On an idle cluster
> > (just
> > > > > fetch requests from the follower replicas), I
> > > > > saw NetworkProcessorAvgIdlePercent at about 97%.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Feb 19, 2015 at 5:19 PM, Zakee 
> wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > I am already using the latest release 0.8.2.1.
> > > > > >
> > > > > > -Zakee
> > > > > >
> > > > > > On Thu, Feb 19, 2015 at 2:46 PM, Jun Rao 
> wrote:
> > > > > >
> > > > > > > Could you try the 0.8.2.1 release being voted on now? It fixes
> a
> > > CPU
> > > > > > issue
> > > > > > > and should reduce the CPU load in network thread.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Thu, Feb 19, 2015 at 11:54 AM, Zakee 
> > > wrote:
> > > > > > >
> > > > > > > > Kafka documentation recommends <0.3 for above metric. I
> assume
> > > > > > processor
> > > > > > > is
> > > > > > > > busier if this goes below 0.3 and obviously it being < 0.3
> for
> > > long
> > > > > > does
> > > > > > > > not seem to be a good sign.
> > > > > > > >
> > > > > > > > What should be our criteria to raise an alert, I though it
> > should
> > > > be
> > > > > > > when
> > > > > > > > its value goes below 0.3. However, the value seems to be
> below
> > > 0.3
> > > > a
> > > > > > lot
> > > > > > > of
> > > > > > > > the times, almost always if we take samples every five mins.
> > What
> > > > > > should
> > > > > > > be
> > > > > > > > the threshold to raise an alarm ?
> > > > > > > >
> > > > > > > > What would be the impact of having this below 0.3 or even
> zero
> > > like
> > > > > > most
> > > > > > > of
> > > > > > > > the times?
> > > > > > > >
> > > > > > > >
> > > > > > > > -Zakee


Re: Latest offset is frozen

2015-02-25 Thread Stuart Reynolds
Doh! Was assuming there was only 1 partition... Need to read all the partitions.
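
Since every partition keeps its own offset sequence, the head/tail query has
to be issued once per partition and then aggregated. A minimal sketch of the
aggregation step (a hypothetical helper; the per-partition offsets would come
from earliestOrLatestOffset calls like those in the quoted Scala code):

```java
import java.util.Map;

class OffsetAggregate {
    /**
     * Given {partition -> [earliest, latest]} offsets, returns the total
     * number of messages currently available across the whole topic.
     */
    static long totalAvailable(Map<Integer, long[]> headTailByPartition) {
        long total = 0;
        for (long[] ht : headTailByPartition.values()) {
            total += ht[1] - ht[0];   // latest - earliest for this partition
        }
        return total;
    }
}
```

Watching only partition 0, as in the original report, makes the tail look
frozen whenever the producer happens to write to other partitions.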

On Mon, Feb 23, 2015 at 3:21 PM, Jun Rao  wrote:
> Hmm, when the tail offset is frozen, does it freeze forever? Also, do you
> get the same frozen offset if you run the GetOffsetShell command?
>
> Thanks,
>
> Jun
>
> On Sun, Feb 22, 2015 at 6:45 PM, Stuart Reynolds 
> wrote:
>
>> I'm finding that if I continuously produce values to a topic (say,
>> once every 2 seconds), and in another thread, query the head and tail
>> offsets of a topic, then sometimes I see the head offset increasing,
>> sometimes it's frozen. What's up with that?
>>
>> I'm using scala client: 0.8.2 and server: 2.9.2-0.8.1.1
>>
>> I query the head and tail offsets like this:
>>
>>   private def getOffset(consumer: SimpleConsumer, topic: String,
>> partition: Int, whichTime: Long): Long = {
>> val topicAndPartition = new TopicAndPartition(topic, partition);
>> val response = consumer.earliestOrLatestOffset(
>>topicAndPartition,
>>earliestOrLatest = whichTime,
>>consumerId= 0);
>> return response;
>>   }
>>
>>   case class HeadAndTailOffsets(head: Long, tail: Long)
>>
>>   def getHeadAndTailOffsets(consumer: SimpleConsumer, topic: String,
>> partition: Int = 0): HeadAndTailOffsets =
>> HeadAndTailOffsets(
>>   head = getOffset(consumer, topic, partition,
>> kafka.api.OffsetRequest.EarliestTime),
>>   tail = getOffset(consumer, topic, partition,
>> kafka.api.OffsetRequest.LatestTime))
>>
>> -
>> If I run a producer, consumer, and offset reporter threads. On the
>> first run I might get something like this:
>>
>>
>> -
>> producer   consumer offsets
>>offset,message   head,tail
>>   "MSG-0"   0, "MSG-0"  0,1
>>   "MSG-1"   1, "MSG-1"  0,2
>>   "MSG-2"   2, "MSG-2"  0,3
>> ...  ......
>> -
>> On subsequent runs, I might see something like this:
>> -
>>  producer  consumer   offsets
>> offset,message  head,tail
>>   "MSG-0"  10, "MSG-0” 0,21 ** tail is frozen
>>   "MSG-1"  11, "MSG-1" 0,21
>>   "MSG-2"  12, "MSG-2” 0,21 ** lies, damn lies
>>..
>>   "MSG-31" 31,"MSG-21" 0,21
>>   "MSG-32" 31,"MSG-22" 0,21
>> ...  ......
>> -
>> i.e. the consumer sees increasing offsets with the received messages,
>> but the thread reporting the topic's head and tail offsets is frozen.
>>
>> Is this a client bug or an issue with my usage?
>>
>> I have a fuller code sample here:
>>
>> http://stackoverflow.com/questions/28663714/why-is-kafkas-latest-offset-report-sometimes-frozen
>>
>> Thanks
>> - Stuart
>>