offsets.storage=kafka, dual.commit.enabled=false still requires ZK

2015-06-09 Thread noah
We are setting up a new Kafka project (0.8.2.1) and are trying to go straight to consumer offsets stored in Kafka. Unfortunately it looks like the Java consumer will try to connect to ZooKeeper regardless of the settings. Will/When will this dependency go away completely? It would simplify our dep
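
For reference, the settings in question look like this (a sketch of an 0.8.2 consumer.properties; the old high-level consumer still needs zookeeper.connect for group membership and rebalancing, which is why the ZK dependency only goes away with the new broker-coordinated consumer in 0.9):

```
# consumer.properties (0.8.2) – commit offsets to Kafka only
offsets.storage=kafka
dual.commit.enabled=false
# still required: the high-level consumer coordinates the group via ZK
zookeeper.connect=zk-01:2181
group.id=my-group
```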

Manual Offset Commits with High Level Consumer skipping messages

2015-06-18 Thread noah
We are in a situation where we need at least once delivery. We have a thread that pulls messages off the consumer, puts them in a queue where they go through a few async steps, and then after the final step, we want to commit the offsets for the messages we have completed. There may be items we have

Re: Manual Offset Commits with High Level Consumer skipping messages

2015-06-19 Thread noah
> you get the metadata for the consumed messages? > > On Thu, Jun 18, 2015 at 11:21 PM, noah wrote: > > > We are in a situation where we need at least once delivery. We have a > > thread that pulls messages off the consumer, puts them in a queue where > > they go through

Re: Manual Offset Commits with High Level Consumer skipping messages

2015-06-21 Thread noah
On Sun, Jun 21, 2015 at 1:10 AM Jiangjie Qin wrote: > Hey Noah, > > Carl is right about the offset. The offset to be commit should be the > largest-consumed-offset + 1. But this should not break the at least once > guarantee. > From what I can see, your consumer should not skip
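
To make the "+1" rule concrete: the committed offset is the position the consumer should resume from, i.e. one past the last message fully processed. A minimal sketch using the new (0.9+) KafkaConsumer API for illustration; the thread concerns the 0.8.2 high-level consumer, but the meaning of the committed value is the same:

```java
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

class OffsetCommitter {
    // Commit record.offset() + 1: committing the raw offset would replay the
    // last message on restart, and committing anything higher would skip
    // messages and break the at-least-once guarantee.
    static void commitProcessed(KafkaConsumer<?, ?> consumer, ConsumerRecord<?, ?> record) {
        consumer.commitSync(Collections.singletonMap(
            new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1)));
    }
}
```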

Re: Manual Offset Commits with High Level Consumer skipping messages

2015-06-22 Thread noah
> 2. Consumer thread: loop on consume -> process -> commit offset every N > messages. So we can make sure there is no weird race condition. > > Thanks, > > Jiangjie (Becket) Qin > > On 6/21/15, 6:23 AM, "noah" wrote: > > >On Sun, Jun 21, 2015 at 1:10 AM

Re: Message loss due to zookeeper ensemble doesn't work

2015-06-26 Thread noah
I think you have it backwards. If you don't write your consumer offsets, the worst case is that consumers will read some messages a second time. If your message processing is idempotent, then you won't lose or corrupt any data. When the ZK cluster comes back up you can start writing offsets again. However,

Re: How to fetch offset in SimpleConsumer using Java

2015-06-29 Thread noah
I believe clientGroup is your consumer group id. You must've picked a value to commit with, so it needs to be the same one. On Mon, Jun 29, 2015 at 12:50 AM Xiang Zhou (Samuel) wrote: > Hi, > > I use the following snippets to try to fetch the offset in a > SimpleConsumer I have committed (th
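
A sketch of the offset fetch as I remember the 0.8.2 javaapi (version 1 of the request reads Kafka-stored offsets, version 0 reads ZooKeeper; for version 1 the SimpleConsumer must be connected to the group's offset coordinator broker, and the group id must be exactly the one used when committing):

```java
import java.util.Collections;
import kafka.common.TopicAndPartition;
import kafka.javaapi.OffsetFetchRequest;
import kafka.javaapi.OffsetFetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class OffsetCheck {
    public static void main(String[] args) {
        // Must be connected to the group's offset coordinator for version 1.
        SimpleConsumer consumer =
            new SimpleConsumer("broker-01", 9092, 100000, 64 * 1024, "offset-checker");
        TopicAndPartition tp = new TopicAndPartition("my-topic", 0);
        OffsetFetchRequest request = new OffsetFetchRequest(
            "my-group",                    // must match the group id used to commit
            Collections.singletonList(tp),
            (short) 1,                     // v1 = Kafka-stored offsets, v0 = ZooKeeper
            0,                             // correlation id
            "offset-checker");             // client id
        OffsetFetchResponse response = consumer.fetchOffsets(request);
        System.out.println("committed offset: " + response.offsets().get(tp).offset());
        consumer.close();
    }
}
```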

Re: How to monitor consuming rate and lag?

2015-06-30 Thread noah
If you are committing offsets to Kafka, try Burrow: https://github.com/linkedin/Burrow On Tue, Jun 30, 2015 at 3:41 AM Shady Xu wrote: > Hi all, > > I'm now using https://github.com/airbnb/kafka-statsd-metrics2 to monitor > our Kafka cluster. But there are no metrics about consuming rate and lag

Re: kafka consumer group API

2015-07-09 Thread noah
Hi! I did something similar. You can use the high level consumer but turn off auto commit and commit only what you are done with. Here's the code I used: https://github.com/iamnoah/kakfa-offsets-test On Thu, Jul 9, 2015 at 4:53 PM Shashank Singh wrote: > Hi Team > > I was going over the documen
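
For anyone finding this later, the shape of that approach with the 0.8 high-level consumer is roughly this (a sketch, not the linked repo's code; the topic, group, and process() handler are placeholders):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class ManualCommitLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk-01:2181");
        props.put("group.id", "my-group");
        props.put("auto.commit.enable", "false"); // we decide when offsets move

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("my-topic", 1));

        for (MessageAndMetadata<byte[], byte[]> msg : streams.get("my-topic").get(0)) {
            process(msg);              // placeholder for your handler
            connector.commitOffsets(); // checkpoints everything consumed so far,
                                       // so only call once processing is done
        }
    }

    static void process(MessageAndMetadata<byte[], byte[]> msg) { /* ... */ }
}
```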

Re: Using Kafka as a persistent store

2015-07-10 Thread noah
I don't want to endorse this use of Kafka, but assuming you can give your messages unique identifiers, I believe using log compaction will keep all unique messages forever. You can read about how consumer offsets stored in Kafka are managed using a compacted topic here: http://kafka.apache.org/docum
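
For example, a compacted topic can be created like this (0.8.2-era syntax; names are placeholders). Note that compaction retains the latest message per *key*, so the unique identifier has to be the message key, and a newer message with the same key will replace the older one:

```
bin/kafka-topics.sh --zookeeper zk-01:2181 --create --topic my-store \
  --partitions 8 --replication-factor 3 \
  --config cleanup.policy=compact
```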

Does Simple Consumer Shell work with Kafka committed offsets?

2015-08-17 Thread noah
I'm trying to use the simple consumer shell to read a particular message, but I get no results for any partition+offset in my topic... I run something like this: ``` [kafka_2.10-0.8.2.1] # bin/kafka-simple-consumer-shell.sh --broker-list broker-01:9092,broker-02:9092,broker-03:9092 --offset 1 --par

Re: How to monitor lag when "kafka" is used as offset.storage?

2015-09-02 Thread noah
We use Burrow. There are REST endpoints you can use to get offsets and manually calculate lag, but if you are focused on alerting, I'd use its consumer statuses as they are a bit smarter than a simple lag calculation. On Wed, Sep 2, 2015 at 4:08 AM shahab wrote:
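
Concretely, the endpoints are along these lines (Burrow's v2 HTTP API as of this writing; see the Burrow wiki for the authoritative paths):

```
GET /v2/kafka/<cluster>/consumer/<group>/lag     # offsets and lag per partition
GET /v2/kafka/<cluster>/consumer/<group>/status  # evaluated consumer status
```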

Tools/recommendations to debug performance issues?

2015-09-14 Thread noah
We're using 0.8.2.1 processing maybe 1 million messages per hour. Each message includes tracking information with a timestamp for when it was produced, and a timestamp for when it was consumed, to give us roughly the amount of time it spent in Kafka. On average this number is in the seconds and ou

Re: log.retention.hours not working?

2015-09-21 Thread noah
"minimum age of a log file to be eligible for deletion" Key word is minimum. If you only have 1k logs, Kafka doesn't need to delete anything. Try to push more data through and when it needs to, it will start deleting old logs. On Mon, Sep 21, 2015 at 8:58 PM allen chan wrote: > Hi, > > Just brou

Re: committing offsets

2015-09-22 Thread noah
If you are using the console consumer to check the offsets topic, remember that you need this line in consumer.properties: exclude.internal.topics=false On Tue, Sep 22, 2015 at 6:05 AM Joris Peeters wrote: > Ah, nice! Does not look like it is working, though. For some reason the > __consumer_off
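
For 0.8.2 the full incantation is roughly this (the formatter class moved in later releases, so adjust the class name for your version):

```
# config/consumer.properties must contain: exclude.internal.topics=false
bin/kafka-console-consumer.sh --zookeeper zk-01:2181 \
  --topic __consumer_offsets --from-beginning \
  --consumer.config config/consumer.properties \
  --formatter "kafka.server.OffsetManager\$OffsetsMessageFormatter"
```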

Re: high level consumer timeout?

2015-09-23 Thread noah
Assuming this is a test case with a new topic/consumer groups for each run, do you have auto.offset.reset=smallest? This happens to me constantly in tests because my consumers end up missing the first message since the default is largest (in which case auto commit is a red herring.) On Wed, Sep 23
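
That is, in the old consumer's properties (the new 0.9 consumer renames the values to earliest/latest):

```
# 0.8.x high-level consumer
auto.offset.reset=smallest  # the default, "largest", skips anything produced
                            # before the group's first offset commit
```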

Re: high level consumer timeout?

2015-09-23 Thread noah
ng on here, I'm in > any case quite thrilled that at least it seems to work now. :) Thanks! > -J > > -Original Message- > From: noah [mailto:iamn...@gmail.com] > Sent: 23 September 2015 09:44 > To: users@kafka.apache.org > Subject: Re: high level consumer timeout?

Frequent Consumer and Producer Disconnects

2015-09-24 Thread noah
We are having issues with producers and consumers frequently fully disconnecting (from both the brokers and ZK) and reconnecting without any apparent cause. On our production systems it can happen anywhere from every 10-15 seconds to 15-20 minutes. On our less beefy test systems and developer laptops

Re: Frequent Consumer and Producer Disconnects

2015-09-26 Thread noah
given you have a topic with 16 > partitions, and you're running 23 consumers, 7 of those consumer threads > are going to be idle because they do not own partitions. > > -Todd > > > On Fri, Sep 25, 2015 at 3:27 PM, noah wrote: > >> We're seeing this the most on develope

Strange ZK Error precedes frequent rebalances

2015-10-14 Thread noah
A number of our developers are seeing errors like the one below in their console when running a consumer on their laptop. The error is always followed by logging indicating that the local consumer is rebalancing, and in the meantime we are not making much progress. I'm reading this as the consumer

Re: Strange ZK Error precedes frequent rebalances

2015-10-14 Thread noah
't lose connectivity to > Zookeeper or that sessions don't time out. You do this by: > 1. Tuning garbage collection on the consumer apps (G1 is recommended) to > avoid long GC pauses - leading cause for timeouts > 2. Increasing Zookeeper session timeout on the consumer
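
Concretely, that advice translates to something like the following (illustrative values; tune the pause target and timeouts for your own heap and network):

```
# JVM flags for the consumer process – G1 with a modest pause target
-XX:+UseG1GC -XX:MaxGCPauseMillis=20

# consumer.properties – give the ZK session more headroom than your worst GC pause
zookeeper.session.timeout.ms=30000
zookeeper.connection.timeout.ms=30000
```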

0.9 KafkaConsumer Memory Usage

2016-06-21 Thread noah
I'm using 0.9.0.1 consumers on 0.9.0.1 brokers. In a single Java service, we have 4 producers and 5 consumers. They are all KafkaProducer and KafkaConsumer instances (the new consumer.) Since the 0.9 upgrade, this service is now OOMing after being up for a few minutes. Heap dumps show >80MB of o
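
For what it's worth, the biggest steady-state buffer in the 0.9 KafkaConsumer is bounded by max.partition.fetch.bytes (default 1MB) multiplied by the number of assigned partitions, per consumer instance, so 5 consumers over many partitions adds up quickly. Lowering it is one way to test whether fetch buffers are what is filling the heap:

```
# 0.9 new-consumer config – per-partition cap on each fetch response
max.partition.fetch.bytes=262144
```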

[kafka-connect] multiple or single clusters?

2016-06-24 Thread noah
I'm having some trouble figuring out the right way to run Kafka Connect in production. We will have multiple sink connectors that we need to remain running indefinitely and have at least once semantics (with as little duplication as possible) so it seems clear that we need to run in distributed mode
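
For reference, distributed mode is driven by a worker config along these lines (0.10-era keys; illustrative values, and the three internal topics should be pre-created, compacted, and replicated):

```
# connect-distributed.properties
bootstrap.servers=broker-01:9092
group.id=connect-cluster   # every worker with this group.id joins the same cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
```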

Re: Storing Kafka Message JSON to deep storage like S3

2016-12-06 Thread noah
If you are willing to setup Kafka Connect, my company has built this connector: https://github.com/spredfast/kafka-connect-s3