Re: Kafka Consumer thoughts

2015-08-02 Thread Jay Kreps
No the purpose of pause/resume was to be able to implement prioritization of processing by partition. i.e. if you want to prioritize a given subset of partitions without rebalancing them to another consumer you pause the others and continue reading. -Jay On Fri, Jul 31, 2015 at 4:55 PM, Jun Rao

Re: Kafka Consumer thoughts

2015-07-31 Thread Jason Gustafson
Hi Jun, This is still debatable, but I think it makes the most sense to keep pause/resume independent of assignment. Otherwise we still get into the weird ordering problems that we were trying to resolve before. To me, pause/resume expresses clearly the intent to suppress consumption from a set of

Re: Kafka Consumer thoughts

2015-07-31 Thread Jun Rao
Jason, I guess that with the new setAssignment() api, we will also be getting rid of pause() and resume()? Thanks, Jun On Fri, Jul 31, 2015 at 11:29 AM, Jason Gustafson wrote: > I was thinking a little bit this morning about the subscription API and I > have a few ideas on how to address some

Re: Kafka Consumer thoughts

2015-07-31 Thread Onur Karaman
For those who are interested in the ticket: https://issues.apache.org/jira/browse/KAFKA-2388 On Fri, Jul 31, 2015 at 1:14 PM, Jiangjie Qin wrote: > I like the idea as well. That is much clearer. > Also agree with Jay on the naming. > > Thanks, Jason. I'll update the Jira ticket. > > Jiangjie (Be

Re: Kafka Consumer thoughts

2015-07-31 Thread Onur Karaman
Great ideas Jason! On Fri, Jul 31, 2015 at 12:19 PM, Jay Kreps wrote: > I like all these ideas. > > Our convention is to keep method names declarative so it should probably be > subscribe(List topics, Callback c) > assign(List > The javadoc would obviously have to clarify the relationship be

Re: Kafka Consumer thoughts

2015-07-31 Thread Jiangjie Qin
I like the idea as well. That is much clearer. Also agree with Jay on the naming. Thanks, Jason. I'll update the Jira ticket. Jiangjie (Becket) Qin On Fri, Jul 31, 2015 at 12:19 PM, Jay Kreps wrote: > I like all these ideas. > > Our convention is to keep method names declarative so it should p

Re: Kafka Consumer thoughts

2015-07-31 Thread Jay Kreps
I like all these ideas. Our convention is to keep method names declarative so it should probably be subscribe(List topics, Callback c) assign(List wrote: > I was thinking a little bit this morning about the subscription API and I > have a few ideas on how to address some of the concerns about

Re: Kafka Consumer thoughts

2015-07-31 Thread Jason Gustafson
I was thinking a little bit this morning about the subscription API and I have a few ideas on how to address some of the concerns about intuitiveness and exception handling. 1. Split the current notion of topic/partition subscription into subscription of topics and assignment of partitions. These

Re: Kafka Consumer thoughts

2015-07-30 Thread Jay Kreps
Hey Becket, Yeah the high-level belief here is that it is possible to give something as high level as the existing "high level" consumer, but this is not likely to be the end-all be-all of high-level interfaces for processing streams of messages. For example neither of these interfaces handles the

Re: Kafka Consumer thoughts

2015-07-29 Thread Jiangjie Qin
Thanks for the comments Jason and Jay. Jason, I had the same concern for producer's callback as well before, but it seems to be fine from some callbacks I wrote - user can always pass in object in the constructor if necessary for synchronization. Jay, I agree that the current API might be fine fo

Re: Kafka Consumer thoughts

2015-07-29 Thread Ismael Juma
Hi Jay, Good points. A few remarks below. On Wed, Jul 29, 2015 at 11:16 PM, Jay Kreps wrote: > > I suggest we focus on threading and the current event-loop style of api > design since I think that is really the crux. > Agreed. I think ultimately though what you need to think about is, does an

Re: Kafka Consumer thoughts

2015-07-29 Thread Jay Kreps
Some comments on the proposal: I think we are conflating a number of things that should probably be addressed individually because they are unrelated. My past experience is that this always makes progress hard. The more we can pick apart these items the better: 1. threading model 2. blockin

Re: Kafka Consumer thoughts

2015-07-29 Thread Jason Gustafson
I think this proposal matches pretty well with what user's intuitively expect the implementation to be. At a glance, I don't see any problems with doing the liveness detection in the background thread. It also has the advantage that the frequency of heartbeats (which controls how long rebalancing t

Re: Kafka Consumer thoughts

2015-07-29 Thread Neha Narkhede
Works now. Thanks Becket! On Wed, Jul 29, 2015 at 1:19 PM, Jiangjie Qin wrote: > Ah... My bad, forgot to change the URL link for pictures. > Thanks for the quick response, Neha. It should be fixed now, can you try > again? > > Jiangjie (Becket) Qin > > On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkh

Re: Kafka Consumer thoughts

2015-07-29 Thread Guozhang Wang
The images load for me well. On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkhede wrote: > Thanks Becket. Quick comment - there seem to be a bunch of images that the > wiki refers to, but none loaded for me. Just making sure if its just me or > can everyone not see the pictures? > > On Wed, Jul 29, 20

Re: Kafka Consumer thoughts

2015-07-29 Thread Jiangjie Qin
Ah... My bad, forgot to change the URL link for pictures. Thanks for the quick response, Neha. It should be fixed now, can you try again? Jiangjie (Becket) Qin On Wed, Jul 29, 2015 at 1:10 PM, Neha Narkhede wrote: > Thanks Becket. Quick comment - there seem to be a bunch of images that the > wi

Re: Kafka Consumer thoughts

2015-07-29 Thread Neha Narkhede
Thanks Becket. Quick comment - there seem to be a bunch of images that the wiki refers to, but none loaded for me. Just making sure if its just me or can everyone not see the pictures? On Wed, Jul 29, 2015 at 12:00 PM, Jiangjie Qin wrote: > I agree with Ewen that a single threaded model will be

Re: Kafka Consumer thoughts

2015-07-29 Thread Jiangjie Qin
I agree with Ewen that a single threaded model will be tricky to implement the same conventional semantic of async or Future. We just drafted the following wiki which explains our thoughts in LinkedIn on the new consumer API and threading model. https://cwiki.apache.org/confluence/display/KAFKA/Ne

Re: Kafka Consumer thoughts

2015-07-28 Thread Ewen Cheslack-Postava
On Tue, Jul 28, 2015 at 5:18 PM, Guozhang Wang wrote: > I think Ewen has proposed these APIs for using callbacks along with > returning future in the commit calls, i.e. something similar to: > > public Future commit(ConsumerCommitCallback callback); > > public Future commit(Map offsets, > Consume

Re: Kafka Consumer thoughts

2015-07-28 Thread Guozhang Wang
I think Ewen has proposed these APIs for using callbacks along with returning future in the commit calls, i.e. something similar to: public Future commit(ConsumerCommitCallback callback); public Future commit(Map offsets, ConsumerCommitCallback callback); At that time I was slightly intending no

Re: Kafka Consumer thoughts

2015-07-28 Thread Neha Narkhede
Hey Adi, When we designed the initial version, the producer API was still changing. I thought about adding the Future and then just didn't get to it. I agree that we should look into adding it for consistency. Thanks, Neha On Tue, Jul 28, 2015 at 1:51 PM, Aditya Auradkar wrote: > Great discuss

Re: Kafka Consumer thoughts

2015-07-28 Thread Aditya Auradkar
Great discussion everyone! One general comment on the sync/async API's on the new consumer. I think the producer tackles sync vs async API's well. For API's that can either be sync or async, can we simply return a future? That seems more elegant for the API's that make sense either in both flavors

Re: Kafka Consumer thoughts

2015-07-27 Thread Jason Gustafson
I think if we recommend a longer session timeout, then we should expose the heartbeat frequency in configuration since this generally controls how long normal rebalances will take. I think it's currently hard-coded to 3 heartbeats per session timeout. It could also be nice to have an explicit Leave

Re: Kafka Consumer thoughts

2015-07-27 Thread Ewen Cheslack-Postava
Kartik, on your second point about timeouts with poll() and heartbeats, the consumer now handles this properly. KAFKA-2123 introduced a DelayedTaskQueue and that is used internally to handle processing events at the right time even if poll() is called with a large timeout. The same mechanism is use

Re: Kafka Consumer thoughts

2015-07-27 Thread Jay Kreps
Hey Kartik, Totally agree we don't want people tuning timeouts in the common case. However there are two ways to avoid this: 1. Default the timeout high 2. Put the heartbeat in a separate thread When we were doing the consumer design we discussed this tradeoff and I think the conclusion we came

Re: Kafka Consumer thoughts

2015-07-27 Thread Kartik Paramasivam
adding the open source alias. This email started off as a broader discussion around the new consumer. I was zooming into only the aspect of poll() being the only mechanism for driving the heartbeats. Yes the lag is the effect of the problem (not the problem). Monitoring the lag is important as