[ https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637179#comment-14637179 ]
Guozhang Wang commented on KAFKA-2350: -------------------------------------- Currently there is already a function for retrieving the subscribed topic partitions today: {code} public Set<TopicPartition> subscriptions() { acquire(); try { return Collections.unmodifiableSet(this.subscriptions.assignedPartitions()); } finally { release(); } } {code} which will for example remove the partition and hence change the returned values if consumer.unsubscribe(partition) is called. I actually think [~becket_qin]'s approach will not cause much confusion regarding the APIs. More explicitly assuming we add another function "assignment()" that returns you the assigned partitions, the semantics of the other APIs will be: {code} consumer.subscribe(topic); // will not throw any exception, but will update the assignment as well as subscription in the next poll. consumer.unsubscribe(topic); // will throw an exception if the topic is not subscribed; otherwise will update the assignment and the subscription in the next poll. consumer.assignment(); // return the assigned partitions consumer.subscriptions(); // return the subscribed partitions, which is the same to the assigned partitions most of the time consumer.subscribe(partition1); // will throw an exception if partition is not in assignment(), saying "it is not assigned to you" consumer.unsubscribe(partition2); // will throw an exception if partition is not in subscriptions(), saying "it is not subscribed by yourself" {code} What I am more concerned about this approach is about the client implementation. Since it allows a client to be both using Kafka partition assignment and not during its life cycle, this could possibly make the client state more complicated to manage. For example: {code} consumer.subscribe(topic1); // using kafka for assignment, say we are assigned topic1-partition1 and topic1-partition2 consumer.poll(); consumer.subscribe(topic2-partition1); // subscribe to another partition explicitly without letting kafka coordinator to be aware of. consumer.unsubscribe(topic1-partition1); // now the subscription is topic1-partition2 and topic2-partition1, where the first is from Kafka assignment and the second is from explicit subscription. {code} > Add KafkaConsumer pause capability > ---------------------------------- > > Key: KAFKA-2350 > URL: https://issues.apache.org/jira/browse/KAFKA-2350 > Project: Kafka > Issue Type: Improvement > Reporter: Jason Gustafson > Assignee: Jason Gustafson > > There are some use cases in stream processing where it is helpful to be able > to pause consumption of a topic. For example, when joining two topics, you > may need to delay processing of one topic while you wait for the consumer of > the other topic to catch up. The new consumer currently doesn't provide a > nice way to do this. If you skip poll() or if you unsubscribe, then a > rebalance will be triggered and your partitions will be reassigned. > One way to achieve this would be to add two new methods to KafkaConsumer: > {code} > void pause(String... topics); > void unpause(String... topics); > {code} > When a topic is paused, a call to KafkaConsumer.poll will not initiate any > new fetches for that topic. After it is unpaused, fetches will begin again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)