Re: New Consumer API discussion

2014-03-27 Thread Neha Narkhede
If people don't have any more thoughts on this, I will go ahead and submit a reviewboard to https://issues.apache.org/jira/browse/KAFKA-1328. Thanks, Neha On Mon, Mar 24, 2014 at 5:39 PM, Neha Narkhede wrote: > I took some time to write some example code using the new consumer APIs to > cover a

Re: New Consumer API discussion

2014-03-24 Thread Neha Narkhede
I took some time to write some example code using the new consumer APIs to cover a range of use cases. This exercise was very useful (thanks for the suggestion, Jay!) since I found several improvements to the APIs to make them more usable. Here are some of the changes

Re: New Consumer API discussion

2014-03-24 Thread Neha Narkhede
Hey Chris, Really sorry for the late reply, wonder how this fell through the cracks. Anyhow, thanks for the great feedback! Here are my comments - 1. Why is the config String->Object instead of String->String? This is probably more of a feedback about the new config management that we adopted in

Re: New Consumer API discussion

2014-03-17 Thread Neha Narkhede
I'm not quite sure if I fully understood your question. The consumer API exposes a close() method that will shutdown the consumer's connections to all brokers and frees up resources that the consumer uses. I've updated the javadoc for the new consumer API to include a few examples of different way

Re: New Consumer API discussion

2014-03-16 Thread Shanmugam, Srividhya
Can the consumer API provide a way to shut down the connector by doing a look up by the consumer group Id? For example, application may be consuming the messages in one thread whereas the shutdown call can be initiated in a different thread. This email and any files transmitted with it are con

Re: New Consumer API discussion

2014-03-03 Thread Chris Riccomini
Hey Guys, Also, for reference, we'll be looking to implement new Samza consumers which have these APIs: http://samza.incubator.apache.org/learn/documentation/0.7.0/api/javadocs/or g/apache/samza/system/SystemConsumer.html http://samza.incubator.apache.org/learn/documentation/0.7.0/api/javadocs/o

Re: New Consumer API discussion

2014-03-03 Thread Chris Riccomini
Hey Guys, Sorry for the late follow up. Here are my questions/thoughts on the API: 1. Why is the config String->Object instead of String->String? 2. Are these Java docs correct? KafkaConsumer(java.util.Map configs) A consumer is instantiated by providing a set of key-value pairs as configur

Re: New Consumer API discussion

2014-02-28 Thread Neha Narkhede
2. It would be good to document that the following apis are > > >> mutually > > >>>>>>>>>> exclusive. Also, if the partition level subscription is > > specified, > > >>>>>>> there > > >>>>>>>>&g

Re: New Consumer API discussion

2014-02-28 Thread S Ahmed
partition level if the subscription is done at the topic > >> level. > >>>>>>>>>> > >>>>>>>>>> *subscribe*(java.lang.String... topics) > >>>>>>>>>> *subscribe*(java.lang.String topic, int...

Re: New Consumer API discussion

2014-02-27 Thread Robert Withers
;>> Makes sense. Made the suggested improvements to the docs< >>>>>>> >>>>>> >>>> >> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/Consumer.html#subscribe%28java.lang.String...%29 >>>>>>>> >>

Re: New Consumer API discussion

2014-02-27 Thread Neha Narkhede
ess(consumer.pollToMark()); > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> or > >>>>>>>>> > >>>>>>>>> long end = consumer.lastOffset(tp); > >>>>>

Re: New Consumer API discussion

2014-02-27 Thread Robert Withers
that if no partitions are specified, offsets >>>> will >>>>>>>> be committed for the subscribed list of partitions. One improvement >>>>> could >>>>>>>> be to >>>>>>>> explicitly state that the offsets r

Re: New Consumer API discussion

2014-02-27 Thread Neha Narkhede
gt; topics and partitions to *Kafka*. If no offsets are specified, > >> commits > >>>>>> offsets returned on the last {@link #poll(long, TimeUnit) poll()} > for > >>>>>> the subscribed list of topics and partitions. > >>>>>&g

Re: New Consumer API discussion

2014-02-26 Thread Robert Withers
signed(Consumer consumer, >>>>>> TopicPartition...partitions) >>>>>> >>>>>> public void *subscribe*(java.lang.String topic, int... partitions) >>>>>> >>>>>> Yes, this was discussed previously. I think g

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
> > >>> What's the use case of position()? Isn't that just the nextOffset() > on > > >>> the > > >>> last message returned from poll()? > > >>> > > >>> Yes, except in the case where a rebalance is triggered and poll

Re: New Consumer API discussion

2014-02-25 Thread Jay Kreps
> >>> is nice. > >>> > >>> > >>> > >>> > >>> On Mon, Feb 24, 2014 at 7:06 PM, Withers, Robert < > >>> robert.with...@dish.com> wrote: > >>> > >>>> That's wonderful. Thanks for kaf

Re: New Consumer API discussion

2014-02-25 Thread Jay Kreps
hang > >> > >> > >> On Mon, Feb 24, 2014 at 8:18 AM, Withers, Robert < > robert.with...@dish.com > >> <mailto:robert.with...@dish.com>>wrote: > >> > >> Jun, > >> > >> Are you saying it is possible to get events fro

Re: New Consumer API discussion

2014-02-25 Thread Jun Rao
ader changes and so on? I call this OOB traffic, since they are not > the > > core messages streaming, but side-band events, yet they are still > > potentially useful to consumers. > > > > Thank you, > > Robert > > > > > > Robert Withers > &

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
com>> wrote: >>>> >>>> Hi Robert, >>>> >>>> Yes, you can check out the callback functions in the new API >>>> >>>> onPartitionDesigned >>>> onPartitionAssigned >>>> >>>> and see if they

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
AM, Withers, Robert < >>> robert.with...@dish.com<mailto:robert.with...@dish.com>>wrote: >>> >>> Jun, >>> >>> Are you saying it is possible to get events from the high-level consumer >>> regarding various state machine changes? For instance, c

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
a partition is >> assigned/unassigned, when an offset is committed on a partition, when a >> leader changes and so on? I call this OOB traffic, since they are not the >> core messages streaming, but side-band events, yet they are still >> potentially useful to consumers. &

Re: New Consumer API discussion

2014-02-25 Thread Neha Narkhede
> o: (720) 514-8963 > c: (571) 262-1873 > > > > -Original Message----- > From: Jun Rao [mailto:jun...@gmail.com] > Sent: Sunday, February 23, 2014 4:19 PM > To: users@kafka.apache.org<mailto:users@kafka.apache.org> > Subject: Re: New Consumer API discussi

Re: New Consumer API discussion

2014-02-24 Thread Withers, Robert
/Developer o: (720) 514-8963 c: (571) 262-1873 -Original Message- From: Jun Rao [mailto:jun...@gmail.com] Sent: Sunday, February 23, 2014 4:19 PM To: users@kafka.apache.org<mailto:users@kafka.apache.org> Subject: Re: New Consumer API discussion Robert, For the push orient api, y

Re: New Consumer API discussion

2014-02-24 Thread Guozhang Wang
sage- > From: Jun Rao [mailto:jun...@gmail.com] > Sent: Sunday, February 23, 2014 4:19 PM > To: users@kafka.apache.org > Subject: Re: New Consumer API discussion > > Robert, > > For the push orient api, you can potentially implement your own > MessageHandler with tho

RE: New Consumer API discussion

2014-02-24 Thread Withers, Robert
: Jun Rao [mailto:jun...@gmail.com] Sent: Sunday, February 23, 2014 4:19 PM To: users@kafka.apache.org Subject: Re: New Consumer API discussion Robert, For the push orient api, you can potentially implement your own MessageHandler with those methods. In the main loop of our new consumer api, you

Re: New Consumer API discussion

2014-02-23 Thread Jun Rao
between fetching or no messages left. > > * A nextMsgs() method which returns all locally available messages > and kicks off a fetch for the next chunk. > > > > If you are trying to add transactional features, then formally define a > DTP capability and pull in other se

Re: New Consumer API discussion

2014-02-23 Thread Withers, Robert
We use kafka as a durable buffer for 3rd party event traffic. It acts as the event source in a lambda architecture. We want it to be exactly once and we are close, though we can lose messages aggregating for Hadoop. To really tie this all together, I think there should be an Apache project to

Re: New Consumer API discussion

2014-02-22 Thread Withers, Robert
514-8963 c: (571) 262-1873 -Original Message- From: Jay Kreps [mailto:jay.kr...@gmail.com] Sent: Sunday, February 16, 2014 10:13 AM To: users@kafka.apache.org<mailto:users@kafka.apache.org><mailto:users@kafka.apache.org> Subject: Re: New Consumer API discussion +1 I think those are

Re: New Consumer API discussion

2014-02-22 Thread Jay Kreps
you are trying to add transactional features, then formally define a > DTP capability and pull in other server frameworks to share the > implementation. Should it be XA/Open? How about a new peer2peer DTP > protocol? > > > > Thank you, > > Robert > > > > Ro

Re: New Consumer API discussion

2014-02-22 Thread Withers, Robert
: (571) 262-1873 -Original Message- From: Jay Kreps [mailto:jay.kr...@gmail.com] Sent: Sunday, February 16, 2014 10:13 AM To: users@kafka.apache.org<mailto:users@kafka.apache.org> Subject: Re: New Consumer API discussion +1 I think those are good. It is a little weird that changing

Re: New Consumer API discussion

2014-02-21 Thread Jay Kreps
Yes but the problem is that poll() actually has side effects if you are using auto commit. So you have to do an awkward thing were you track the last offset you've seen and somehow keep this up to date as the partitions you own changes. Likewise if you want this value prior to reading any messages

Re: New Consumer API discussion

2014-02-21 Thread Jun Rao
t a new peer2peer DTP > protocol? > > > > Thank you, > > Robert > > > > Robert Withers > > Staff Analyst/Developer > > o: (720) 514-8963 > > c: (571) 262-1873 > > > > -Original Message- > From: Jay Kreps [mailto:jay.kr...@gmail.com

Re: New Consumer API discussion

2014-02-21 Thread Jun Rao
What's the use case of position()? Isn't that just the nextOffset() on the last message returned from poll()? Thanks, Jun On Sun, Feb 16, 2014 at 9:12 AM, Jay Kreps wrote: > +1 I think those are good. It is a little weird that changing the fetch > point is not batched but changing the commit

RE: New Consumer API discussion

2014-02-19 Thread Withers, Robert
, Robert Robert Withers Staff Analyst/Developer o: (720) 514-8963 c: (571) 262-1873 -Original Message- From: Jay Kreps [mailto:jay.kr...@gmail.com] Sent: Sunday, February 16, 2014 10:13 AM To: users@kafka.apache.org Subject: Re: New Consumer API discussion +1 I think those are good. It

Re: New Consumer API discussion

2014-02-16 Thread Jay Kreps
+1 I think those are good. It is a little weird that changing the fetch point is not batched but changing the commit point is, but I suppose there is no helping that. -Jay On Sat, Feb 15, 2014 at 7:52 AM, Neha Narkhede wrote: > Jay, > > That makes sense. position/seek deal with changing the con

Re: New Consumer API discussion

2014-02-15 Thread Neha Narkhede
Jay, That makes sense. position/seek deal with changing the consumers in-memory data, so there is no remote rpc there. For some reason, I got committed and seek mixed up in my head at that time :) So we still end up with long position(TopicPartition tp) void seek(TopicPartitionOffset p)

Re: New Consumer API discussion

2014-02-14 Thread Jay Kreps
Oh, interesting. So I am assuming the following implementation: 1. We have an in-memory fetch position which controls the next fetch offset. 2. Changing this has no effect until you poll again at which point your fetch request will be from the newly specified offset 3. We then have an in-memory but

Re: New Consumer API discussion

2014-02-13 Thread Neha Narkhede
I think you are saying both, i.e. if you have committed on a partition it returns you that value but if you haven't it does a remote lookup? Correct. The other argument for making committed batched is that commit() is batched, so there is symmetry. position() and seek() are always in memory chan

Re: New Consumer API discussion

2014-02-13 Thread Jay Kreps
Hey Neha, I actually wasn't proposing the name TopicOffsetPosition, that was just a typo. I meant TopicPartitionOffset, and I was just referencing what was in the javadoc. So to restate my proposal without the typo, using just the existing classes (that naming is a separate question): long posi

Re: New Consumer API discussion

2014-02-13 Thread Tom Brown
Conceptually, do the position methods only apply to topics you've subscribed to, or do they apply to all topics in the cluster? E.g., could I retrieve or set the committed position of any partition? The positive use case for having access to all partition information would be to setup an active m

Re: New Consumer API discussion

2014-02-13 Thread Pradeep Gollakota
Hi Neha, 6. It seems like #4 can be avoided by using Map> Long> or Map as the argument type. > > How? lastCommittedOffsets() is independent of positions(). I'm not sure I > understood your suggestion. I think of subscription as you're subscribing to a Set of TopicPartitions. Because the argume

Re: New Consumer API discussion

2014-02-13 Thread Neha Narkhede
2. It returns a list of results. But how can you use the list? The only way to use the list is to make a map of tp=>offset and then look up results in this map (or do a for loop over the list for the partition you want). I recommend that if this is an in-memory check we just do one at a time. E.g.

Re: New Consumer API discussion

2014-02-13 Thread Jay Kreps
Hey guys, One thing that bugs me is the lack of symmetric for the different position calls. The way I see it there are two positions we maintain: the fetch position and the last commit position. There are two things you can do to these positions: get the current value or change the current value.

Re: New Consumer API discussion

2014-02-13 Thread Neha Narkhede
Pradeep - Thanks for your detailed comments. 1. subscribe(String topic, int... paritions) and unsubscribe(String topic, int... partitions) should be subscribe(TopicPartition... topicPartitions)and unsubscribe(TopicPartition... topicPartitons) I think that is reasonable. Overall, I'm in

Re: New Consumer API discussion

2014-02-12 Thread Jay Kreps
Ah, gotcha. -Jay On Wed, Feb 12, 2014 at 8:40 AM, Neha Narkhede wrote: > Jay > > Well none kind of address the common case which is to commit all > partitions. For these I was thinking just >commit(); > The advantage of this simpler method is that you don't need to bother about > partitions

Re: New Consumer API discussion

2014-02-12 Thread Neha Narkhede
Jay Well none kind of address the common case which is to commit all partitions. For these I was thinking just commit(); The advantage of this simpler method is that you don't need to bother about partitions you just consume the messages given to you and then commit them This is already what t

Re: New Consumer API discussion

2014-02-12 Thread Neha Narkhede
Imran, Sorry I am probably missing something basic, but I'm not sure how a multi-threaded consumer would work. I can imagine its either: a) I just have one thread poll kafka. If I want to process msgs in multiple threads, than I deal w/ that after polling, eg. stick them into a blocking queue o

Re: New Consumer API discussion

2014-02-11 Thread Jay Kreps
Comments inline: On Mon, Feb 10, 2014 at 2:31 PM, Guozhang Wang wrote: > Hello Jay, > > Thanks for the detailed comments. > > 1. Yeah we could discuss a bit more on that. > > 2. Since subscribe() is incremental, adding one topic-partition is OK, and > personally I think it is cleaner than subsc

Re: New Consumer API discussion

2014-02-11 Thread Guozhang Wang
Hi Imran, 1. I think choosing between a) and b) is really dependent on the consuming traffic. We decided to make the consumer client single-threaded and let users to decide using one or multiple clients based on traffic mainly because with a multi-thread client, the fetcher thread could die silent

Re: New Consumer API discussion

2014-02-11 Thread Imran Rashid
Hi, thanks for sharing this and getting feedback. Sorry I am probably missing something basic, but I'm not sure how a multi-threaded consumer would work. I can imagine its either: a) I just have one thread poll kafka. If I want to process msgs in multiple threads, than I deal w/ that after pol

Re: New Consumer API discussion

2014-02-11 Thread Pradeep Gollakota
Updated thoughts. 1. subscribe(String topic, int... paritions) and unsubscribe(String topic, int... partitions) should be subscribe(TopicPartition... topicPartitions)and unsubscribe(TopicPartition... topicPartitons) 2. Does it make sense to provide a convenience method to subs

Re: New Consumer API discussion

2014-02-11 Thread Neha Narkhede
Pradeep, To be clear, we want to get feedback on the APIs from the javadocsince the wiki will be slightly behind on the APIs. 1. Regarding consistency, do you have specific feedback on

Re: New Consumer API discussion

2014-02-11 Thread Pradeep Gollakota
Hi Jay, I apologize for derailing the conversation about the consumer API. We should start a new discussion about hierarchical topics, if we want to keep talking about it. My final thought on the matter is that, hierarchical topics is still an important feature to have in Kafka, because it gives u

Re: New Consumer API discussion

2014-02-11 Thread Jay Kreps
Hey Pradeep, That wiki is fairly old and it predated more flexible subscription mechanisms. In the high-level consumer you currently have wildcard subscription and in the new proposed interface you can actually subscribe based on any logic you want to create a "union" of streams. Personally I thin

Re: New Consumer API discussion

2014-02-10 Thread Pradeep Gollakota
WRT to hierarchical topics, I'm referring to KAFKA-1175. I would just like to think through the implications for the Consumer API if and when we do implement hierarchical topics. For example, in the proposal

Re: New Consumer API discussion

2014-02-10 Thread Neha Narkhede
Thanks for the feedback. Mattijs - - Constructors link to http://kafka.apache.org/documentation.html#consumerconfigs for valid configurations, which lists zookeeper.connect rather than metadata.broker.list, the value for BROKER_LIST_CONFIG in ConsumerConfig. Fixed it to just point to ConsumerConf

Re: New Consumer API discussion

2014-02-10 Thread Guozhang Wang
Hi Mattijs: 2. As Neha said, one design of the new consumer is to have non-blocking consuming API instead of blocking API. Do you have a strong reason in mind to still keep the blocking API instead of just using "while(no-data) poll(timeout)"? 3. No we have not thought about hierarchical topics.

Re: New Consumer API discussion

2014-02-10 Thread Guozhang Wang
Hello Jay, Thanks for the detailed comments. 1. Yeah we could discuss a bit more on that. 2. Since subscribe() is incremental, adding one topic-partition is OK, and personally I think it is cleaner than subscribe(String topic, int...partition)? 3. Originally I was thinking about two interfaces:

Re: New Consumer API discussion

2014-02-10 Thread Guozhang Wang
Hi Mattijs: We have not updated the wiki pages for config yet, and it will not be updated until we release 0.9 with these changes. Currently consumers do have a commitOffsets function that can be called by the users, but for most use cases auto.commit is turned on and this function gets called by

Re: New Consumer API discussion

2014-02-10 Thread Pradeep Gollakota
Couple of very quick thoughts. 1. +1 about renaming commit(...) and commitAsync(...) 2. I'd also like to extend the above for the poll() method as well. poll() and pollWithTimeout(long, TimeUnit)? 3. Have you guys given any thought around how this API would be used with hierarchical topics? 4. Wo

Re: New Consumer API discussion

2014-02-10 Thread Jay Kreps
A few items: 1. ConsumerRebalanceCallback a. onPartitionsRevoked would be a better name. b. We should discuss the possibility of splitting this into two interfaces. The motivation would be that in Java 8 single method interfaces can directly take methods which might be more intuitive. c. I

Re: New Consumer API discussion

2014-02-10 Thread Mattijs Ugen
Hey Neha, This looks really promising, I particularly like the ability to commit offsets for topic/partition tuples over just commit(). Some remarks: - Constructors link to http://kafka.apache.org/documentation.html#consumerconfigs for valid configurations, which lists zookeeper.connect rath

New Consumer API discussion

2014-02-10 Thread Neha Narkhede
As mentioned in previous emails, we are also working on a re-implementation of the consumer. I would like to use this email thread to discuss the details of the public API. I would also like us to be picky about this public api now so it is as good as possible and we don't need to break it in the f