Re: Kafka topics with infinite retention?

2016-03-14 Thread Giidox
I would like to read an answer to this question as well. This is a similar architecture as I am planning. Dealing with secondary data store for old messages would indeed make things complicated. Clark Haskins wrote that the partition size is limited by machines capacity (I assume disk space):

Re: Kafka topics with infinite retention?

2016-03-14 Thread Daniel Schierbeck
Partitions being limited by disk size is no different from e.g. a SQL store. This would not be used for extremely high throughput. If, eventually, there was a good case for not requiring that an entire partition must be stored on a single machine, it would be possible to use the log segments for di

Re: Kafka Applicability - Large Messages

2016-03-14 Thread Jens Rantil
Just making it more explicit: AFAIK, all Kafka consumers I've seen loads the incoming messages into memory. Unless you make it possible to stream it do disk or something you need to make sure your consumers has the available memory. Cheers, Jens On Fri, Mar 4, 2016 at 6:07 PM Cees de Groot wrote

Re: Kafka topics with infinite retention?

2016-03-14 Thread Gerard Klijs
You might find what you want when looking how Kafka is used for samza, http://samza.apache.org/ On Mon, Mar 14, 2016 at 10:34 AM Daniel Schierbeck wrote: > Partitions being limited by disk size is no different from e.g. a SQL > store. This would not be used for extremely high throughput. If, > e

Re: Kafka topics with infinite retention?

2016-03-14 Thread Jens Rantil
This is definitely an interesting use case. However, you need to be aware that changing the broker topology won't rebalance the preexisting data from the previous brokers. That is, you risk loosing data. Cheers, Jens On Wed, Mar 9, 2016 at 2:10 PM Daniel Schierbeck wrote: > I'm considering an a

Re: Kafka topics with infinite retention?

2016-03-14 Thread Ben Stopford
A couple of things: - Compacted topics provide a useful way to retain meaningful datasets inside the broker, which don’t grow indefinitely. If you have an update-in-place use case, where the event sourced approach doesn’t buy you much, these will keep the reload time down when you regenerate ma

Kafka LTS release

2016-03-14 Thread Achanta Vamsi Subhash
Hi all, We find that there are many releases of Kafka and not all the bugs are back ported to the older releases. Can we have a LTS (Long Term Support) release which can be supported for 2 years with all the bugs back-ported? This will be very helpful as during the last 2-3 releases, we often hav

Re: Kafka Applicability - Large Messages

2016-03-14 Thread Ben Stopford
Becket did a good talk at the last Kafka meetup on how Linked In handle the large message problem. http://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297 > On 14 Mar 2016, at 0

Re: Consuming previous messages and from different group.id

2016-03-14 Thread Gerard Klijs
Hi, if you use a new group for a consumer, the auto.offset.reset value will determine whether it will start at the beginning (with value earliest) or at the end (with value latest). For each group a separate offset is used, to two consumer, belonging to two different groups, when started before the

Re: Need a help in understanding __consumer_offsets topic creation in Kafka Cluster

2016-03-14 Thread Achanta Vamsi Subhash
We changed the policy to "delete" dynamically for the __consumer_offsets topic and it was a better option than doing a cluster restart after enabling log compaction. Also, we found problems when you are replicating to a log compacted topic from a non-compacted topic (which is leader). On Mon, Mar

UNKNOWN_MEMBER_ID assigned to consumer group

2016-03-14 Thread tao xiao
Hi team, I have about 10 consumers using the same consumer group connecting to Kafka. Occasional I can see UNKNOWN_MEMBER_ID assigned to some of the consumers. I want to under what situation this would happen? I use Kafka version 0.9.0.1

New client commitAsync SendFailedException

2016-03-14 Thread Alexey Romanchuk
Hi all! I am using new client 0.9.0.1. I found that when I call commitAsync multiple times before calling poll most of commits failed with SendFailedException. Here it is an example of code - https://gist.github.com/13h3r/42633bcd64b80ddffe6b Could you please explain commitAsync in more details

AUTO: Yan Wang is out of the office (returning 03/17/2016)

2016-03-14 Thread Yan Wang
I am out of the office until 03/17/2016. Note: This is an automated response to your message "New client commitAsync SendFailedException" sent on 3/14/2016 10:18:14 AM. This is the only notification you will receive while this person is away. ** This email and any attachments may contain i

Re: seekToBeginning doesn't work without auto.offset.reset

2016-03-14 Thread Cody Koeninger
Honestly the fact that everything is hidden inside poll() has been confusing people since last year, e.g. https://issues.apache.org/jira/browse/KAFKA-2359 I can try to formulate a KIP for this, but it seems clear that I'm not the only one giving this feedback, and I may not understand all the oth

Re: UNKNOWN_MEMBER_ID assigned to consumer group

2016-03-14 Thread Jason Gustafson
Hey Tao, This error indicates that a rebalance completed successfully before the consumer could rejoin. Basically it works like this: 1. Consumer 1 joins the group and is assigned member id A 2. Consumer 1's session timeout expires before successfully heartbeating. 3. The group is rebalanced with

Kafka Streams question

2016-03-14 Thread Mike Thomsen
I was reading a bit about Kafka Streams and was wondering if it is appropriate for my team's use. We ingest data using Kafka and Storm. Data gets pulled by Storm and sent off to bolts that publish the data into HBase and Solr. One of the things we need is something analogous to Storm's ability to f

Re: Retry Message Consumption On Database Failure

2016-03-14 Thread Jason Gustafson
Hey Michael, I don't think a policy of retrying indefinitely is generally possible with the new consumer even if you had a heartbeat API. The problem is that the consumer itself doesn't control when the group needs to rebalance. If another consumer joins or leaves the group, then all consumers wil

Re: seekToBeginning doesn't work without auto.offset.reset

2016-03-14 Thread Jason Gustafson
Late arrival to this discussion. I'm not really sure I see the problem with accessing the consumer in the rebalance listener. Before we passed the consumer instance as a separate argument, but that was only because the rebalance listener had to be passed by classname before a reference to the consu

Re: Kafka Applicability - Large Messages

2016-03-14 Thread Cees de Groot
On Mon, Mar 14, 2016 at 5:42 AM, Jens Rantil wrote: > Just making it more explicit: AFAIK, all Kafka consumers I've seen loads > the incoming messages into memory. Unless you make it possible to stream it > do disk or something you need to make sure your consumers has the available > memory. > >

Re: Kafka 0.9.0.1 broker 0.9 consumer location of consumer group data

2016-03-14 Thread Rajiv Kurian
Has any one run into similar problems. I have experienced the same problem again. This time when I use kafka-consumer-groups.sh tool it says that my consumer group is either missing or rebalancing. But when I use the --list method it shows up on the list. So my guess is it is rebalancing some how.

Re: Retry Message Consumption On Database Failure

2016-03-14 Thread Christian Posta
Jason, Can you link to the proposal so I can take a look? Would the "sticky" proposal prefer to keep partitions assigned to consumers who currently have them and have not failed? On Mon, Mar 14, 2016 at 10:16 AM, Jason Gustafson wrote: > Hey Michael, > > I don't think a policy of retrying indef

Re: Kafka 0.9.0.1 broker 0.9 consumer location of consumer group data

2016-03-14 Thread Jason Gustafson
Hey Rajiv, That sounds suspiciously like one of the bugs from 0.9.0.0. Have you updated kafka-clients to 0.9.0.1? -Jason On Mon, Mar 14, 2016 at 11:18 AM, Rajiv Kurian wrote: > Has any one run into similar problems. I have experienced the same problem > again. This time when I use kafka-consum

Re: Retry Message Consumption On Database Failure

2016-03-14 Thread Jason Gustafson
Yeah, that's the idea. Here's the JIRA I was thinking of: https://issues.apache.org/jira/browse/KAFKA-2273. I'm guessing this will need a KIP after 0.10 is out. -Jason On Mon, Mar 14, 2016 at 11:21 AM, Christian Posta wrote: > Jason, > > Can you link to the proposal so I can take a look? Would

Re: Kafka 0.9.0.1 broker 0.9 consumer location of consumer group data

2016-03-14 Thread Rajiv Kurian
No I haven't. It's still running the 0.9.0 client. I'll try upgrading if it sounds like an old bug. On Mon, Mar 14, 2016 at 11:24 AM, Jason Gustafson wrote: > Hey Rajiv, > > That sounds suspiciously like one of the bugs from 0.9.0.0. Have you > updated kafka-clients to 0.9.0.1? > > -Jason > > On

Re: Retry Message Consumption On Database Failure

2016-03-14 Thread Christian Posta
Awesome, thanks! I'll take a look! On Mon, Mar 14, 2016 at 11:27 AM, Jason Gustafson wrote: > Yeah, that's the idea. Here's the JIRA I was thinking of: > https://issues.apache.org/jira/browse/KAFKA-2273. I'm guessing this will > need a KIP after 0.10 is out. > > -Jason > > On Mon, Mar 14, 2016

Re: Kafka 0.9.0.1 broker 0.9 consumer location of consumer group data

2016-03-14 Thread Ismael Juma
Please upgrade indeed, 0.9.0.1 includes a number of important fixes. Ismael On 14 Mar 2016 18:36, "Rajiv Kurian" wrote: > No I haven't. It's still running the 0.9.0 client. I'll try upgrading if it > sounds like an old bug. > > On Mon, Mar 14, 2016 at 11:24 AM, Jason Gustafson > wrote: > > > He

Re: Kafka 0.9.0.1 broker 0.9 consumer location of consumer group data

2016-03-14 Thread Rajiv Kurian
@Jason, can you please point me to the bug that you were talking about in 0.9.0.0? On Mon, Mar 14, 2016 at 11:36 AM, Rajiv Kurian wrote: > No I haven't. It's still running the 0.9.0 client. I'll try upgrading if > it sounds like an old bug. > > On Mon, Mar 14, 2016 at 11:24 AM, Jason Gustafson

Re: Kafka Applicability - Large Messages

2016-03-14 Thread David Remy
9am the the-life-cycle-o9f-programming-languages 9 Sent from mobile, please excuse typos. Original Message From: Ben Stopford Sent: Monday, March 14, 2016 08:19 AM To: users@kafka.apache.org Subject: Re: Kafka Applicability - Large Messages Becket did a good talk at the last

Re: New client commitAsync SendFailedException

2016-03-14 Thread Jason Gustafson
Hey Alexey, Asynchronous commit handling could probably be improved quite a bit. Basically what's happening is that the client's send buffer is getting filled up, which causes the subsequent commits to fail with SendFailedException. We don't currently implement any retrying for asynchronous commit

Re: Kafka 0.9.0.1 broker 0.9 consumer location of consumer group data

2016-03-14 Thread Jason Gustafson
I think this is the one: https://issues.apache.org/jira/browse/KAFKA-2978. -Jason On Mon, Mar 14, 2016 at 11:54 AM, Rajiv Kurian wrote: > @Jason, can you please point me to the bug that you were talking about in > 0.9.0.0? > > On Mon, Mar 14, 2016 at 11:36 AM, Rajiv Kurian wrote: > > > No I ha

Kafka mirror maker issue. (data loss?)

2016-03-14 Thread feifei hsu
Hi, We are thinking using mirror maker to replic our kafka data stream. However, I heard mirror maker may lose data which we do not want. I am wondering if anyone has experience of mirror maker. How good and what the best practice to prevent dataloss is when we do data replica? Thanks

Re: seekToBeginning doesn't work without auto.offset.reset

2016-03-14 Thread Cody Koeninger
Regarding the rebalance listener, in the case of the spark integration, it is possible a job can fail and be restarted from checkpoint in a new jvm. That means that you need to be able to reconstruct objects. Any reasonable rebalance listener can't have a 0-arg constructor, because it needs a ref

Re: seekToBeginning doesn't work without auto.offset.reset

2016-03-14 Thread Jason Gustafson
Ah, that makes more sense. I have no idea about the limitations of your use case, but maybe you could expose a different interface to users. interface RebalanceListener { void onPartitionsAssigned(Consumer consumer, Collection partitions); void onPartitionsRevoked(Consumer consumer, Collection

Re: Kafka 0.9.0.1 broker 0.9 consumer location of consumer group data

2016-03-14 Thread Rajiv Kurian
Thanks Jason. I'll try to upgrade and see if it helps. On Mon, Mar 14, 2016 at 12:04 PM, Jason Gustafson wrote: > I think this is the one: https://issues.apache.org/jira/browse/KAFKA-2978. > > -Jason > > On Mon, Mar 14, 2016 at 11:54 AM, Rajiv Kurian wrote: > > > @Jason, can you please point me

Deletion of topic on 0.9.0.0 spams this exception

2016-03-14 Thread Scott Reynolds
>Conditional update of path >/brokers/topics/messages.events/partitions/0/state with data {"controller_epoch":2,"leader":492687262,"version":1,"leader_epoch":4,"isr":[492687262]} and expected version 10 failed due to org.apache.zookeeper.KeeperException$NoN I believe this is caused by deleting the

Re: Deletion of topic on 0.9.0.0 spams this exception

2016-03-14 Thread Stevo Slavić
I've recently created related ticket https://issues.apache.org/jira/browse/KAFKA-3390 On Mon, Mar 14, 2016, 20:54 Scott Reynolds wrote: > >Conditional update of path > >/brokers/topics/messages.events/partitions/0/state with data > > {"controller_epoch":2,"leader":492687262,"version":1,"leader_e

How to get message count per topic?

2016-03-14 Thread Grant Overby (groverby)
What is the most direct way to get a message count per topic or per partition? For context, this is to enable testing. We'd like to confirm with Kafka that a certain number of messages have been written or that the number of messages we processed is equal to the number received by Kafka. [http:/

Re: Kafka Streams question

2016-03-14 Thread Guozhang Wang
Hello Mike, What scenarios could cause your app to not be able to complete processing, are your referring to a runtime exception, or some other app errors (like writing to another external data service that is timed out and cannot be retried etc)? Guozhang On Mon, Mar 14, 2016 at 9:55 AM, Mike T

Re: Deletion of topic on 0.9.0.0 spams this exception

2016-03-14 Thread Scott Reynolds
Yep that is it. Thanks. I will watch the issue. On Mon, Mar 14, 2016 at 1:13 PM Stevo Slavić wrote: > I've recently created related ticket > https://issues.apache.org/jira/browse/KAFKA-3390 > > On Mon, Mar 14, 2016, 20:54 Scott Reynolds wrote: > > > >Conditional update of path > > >/brokers/top

Re: Larger Size Error Message

2016-03-14 Thread Fang Wong
After changing log level from INFO to TRACE, here is kafka server.log: [2016-03-14 06:43:03,568] TRACE 156 bytes written. (kafka.network.BoundedByteBufferSend) [2016-03-14 06:43:03,575] TRACE 68 bytes read. (kafka.network.BoundedByteBufferReceive) [2016-03-14 06:43:03,575] TRACE [ReplicaFetcherT

Re: seekToBeginning doesn't work without auto.offset.reset

2016-03-14 Thread Cody Koeninger
Sorry, by metadata I also meant the equivalent of the old OffsetRequest api, which partitionsFor doesn't give you. I understand why you didn't want to expose the broken "offsets before a certain time" api, but I don't understand why equivalent functionality for first or last offset isn't available

Re: seekToBeginning doesn't work without auto.offset.reset

2016-03-14 Thread Jason Gustafson
The offset API is definitely a gap at the moment. I think there were some problems with the old consumer's API and we wanted to make sure we didn't make the same mistakes. Unfortunately, I'm not sure anyone has had the time to give this the attention it needs. Here's a couple JIRAS if you want to h

Re: How to get message count per topic?

2016-03-14 Thread Stevo Slavić
See https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol Using metadata api one can get topic partitions and for each partition which broker is lead. Using offset api one can get partition size. Both apis are low level and to use them directly you would use SimpleConsume

Re: UNKNOWN_MEMBER_ID assigned to consumer group

2016-03-14 Thread tao xiao
Thanks Jason. What does consumer 1 would do upon receiving UNKNOWN_MEMBER_ID and does it rejoin the group eventually if it keeps polling? On Tue, 15 Mar 2016 at 00:58 Jason Gustafson wrote: > Hey Tao, > > This error indicates that a rebalance completed successfully before the > consumer could re

Re: New client commitAsync SendFailedException

2016-03-14 Thread Alexey Romanchuk
Thanks for reply Jason! Is it any way to control size of this buffer? Will it fails if I try to commit offsets for 100 topics/partitions? Sure I can work around it by batching all commits into one call, but the problem I can see is that the API does not enforce me to do this. From client point of

Re: New client commitAsync SendFailedException

2016-03-14 Thread Jay Kreps
This seems like a bug, no? It should just initiate the request not wait for it to be written, there is no way for the user to reason about the state of the send buffer. -jay On Monday, March 14, 2016, Jason Gustafson wrote: > Hey Alexey, > > Asynchronous commit handling could probably be improv