Btw, I'd like to ask you to move on thread for KIP discussion, as it will make us reaching conclusion faster and have single channel to discuss.
On Sun, Aug 11, 2019 at 8:16 PM Jungtaek Lim <kabh...@gmail.com> wrote: > So we have some use case which we don't just rely on everything what Kafka > consumer provides. We want to know current assignment on this consumer, and > to get the latest assignment, we called the hack `poll(0)`. > > That said, we don't want to pull any records here, and there's no way to > accomplish this. Please correct me if I'm missing something. > > Thanks, > Jungtaek Lim (HeartSaVioR) > > On Sat, Aug 10, 2019 at 7:32 AM Colin McCabe <cmcc...@apache.org> wrote: > >> Hi Gabor, >> >> What is it that you want to do here? If you just want to check that the >> partitions exist, but not fetch any data, you could use >> AdminClient#describeTopics for that. If you want to create the topics, you >> could use AdminClient#createTopics. >> >> best, >> Colin >> >> >> On Fri, Aug 9, 2019, at 11:23, Gabor Somogyi wrote: >> > > Each KafkaConsumer method that returns metadata will already block >> until >> > such metadata is available >> > The old API was waiting infinitely but the new has a timeout which has >> > effect on the metadata fetch as well. >> > >> > Spark is interested in only the assigned partitions and/or >> > latest/earliest/... offsets. >> > Please see >> > >> https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala >> > At the moment the old poll(0) is used which can block infinitely and the >> > API is marked deprecated. >> > We would like to switch to the new API, see the background here: >> > https://github.com/apache/spark/pull/25135 >> > However in such case there is no need to fetch any kind of data since >> > driver is not doing any data processing. >> > >> > G >> > >> > >> > On Fri, Aug 9, 2019 at 7:39 PM Ryanne Dolan <ryannedo...@gmail.com> >> wrote: >> > >> > > > pull some records even they're only interested in metadata. >> > > >> > > Jungtaek, what is the use-case here? Each KafkaConsumer method that >> returns >> > > metadata will already block until such metadata is available... so why >> > > would you need to apply this "hack" in the first place? >> > > >> > > Ryanne >> > > >> > > On Wed, Aug 7, 2019 at 2:24 AM Jungtaek Lim <kabh...@gmail.com> >> wrote: >> > > >> > > > Hi devs, >> > > > >> > > > I'm trying to replace deprecated poll(long) with poll(Duration), and >> > > > realized there's no alternative which behaves exactly same as >> poll(0), as >> > > > poll(0) has been used as a hack to only update metadata instead of >> > > pulling >> > > > records. poll(Duration.ZERO) wouldn't behave same since even >> updating >> > > > metadata will be timed-out. So now end users would need to give more >> > > > timeout and even pull some records even they're only interested in >> > > > metadata. >> > > > >> > > > I looked back some KIPs which brought the change, and "discarded" >> KIP >> > > > (KIP-288 [1]) actually proposed a new API which only pulls metadata. >> > > > KIP-266 [2] is picked up instead but it didn't cover all the things >> what >> > > > KIP-288 proposed. I'm seeing some doc explaining poll(0) hasn't been >> > > > supported officially, but the hack has been widely used and they >> can't be >> > > > ignored. >> > > > >> > > > Kafka test code itself relies on either deprecated poll(0), >> > > > or updateAssignmentMetadataIfNeeded, which seems to be private API >> only >> > > for >> > > > testing. >> > > > (Btw, I'd try out replacing poll(0) to >> updateAssignmentMetadataIfNeeded >> > > as >> > > > avoiding deprecated method - if it works I'll submit a PR.) >> > > > >> > > > I'm feeling that it would be ideal to expose >> > > > `updateAssignmentMetadataIfNeeded` to the public API, maybe with >> renaming >> > > > as `waitForAssignment` which was proposed in KIP-288 if it feels too >> > > long. >> > > > >> > > > What do you think? If it sounds feasible I'd like to try out >> contribution >> > > > on this. I'm new to contribute Kafka community, so not sure it would >> > > > require a new KIP or not. >> > > > >> > > > Thanks, >> > > > Jungtaek Lim (HeartSaVioR) >> > > > >> > > > 1. >> > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method >> > > > 2. >> > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-266%3A+Fix+consumer+indefinite+blocking+behavior >> > > > >> > > >> > >> > > > -- > Name : Jungtaek Lim > Blog : http://medium.com/@heartsavior > Twitter : http://twitter.com/heartsavior > LinkedIn : http://www.linkedin.com/in/heartsavior > -- Name : Jungtaek Lim Blog : http://medium.com/@heartsavior Twitter : http://twitter.com/heartsavior LinkedIn : http://www.linkedin.com/in/heartsavior