> Each KafkaConsumer method that returns metadata will already block until such metadata is available

The old API waited indefinitely, but the new one has a timeout, and that timeout also applies to the metadata fetch.
Spark is interested only in the assigned partitions and/or the latest/earliest/... offsets. Please see
https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala

At the moment the old poll(0) is used, which can block indefinitely, and the API is marked deprecated. We would like to switch to the new API; see the background here:
https://github.com/apache/spark/pull/25135
However, in that case there is no need to fetch any data, since the driver does not do any data processing.

G

On Fri, Aug 9, 2019 at 7:39 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> > pull some records even they're only interested in metadata.
>
> Jungtaek, what is the use-case here? Each KafkaConsumer method that returns
> metadata will already block until such metadata is available... so why
> would you need to apply this "hack" in the first place?
>
> Ryanne
>
> On Wed, Aug 7, 2019 at 2:24 AM Jungtaek Lim <kabh...@gmail.com> wrote:
>
> > Hi devs,
> >
> > I'm trying to replace deprecated poll(long) with poll(Duration), and
> > realized there's no alternative which behaves exactly same as poll(0), as
> > poll(0) has been used as a hack to only update metadata instead of pulling
> > records. poll(Duration.ZERO) wouldn't behave same since even updating
> > metadata will be timed-out. So now end users would need to give more
> > timeout and even pull some records even they're only interested in
> > metadata.
> >
> > I looked back some KIPs which brought the change, and "discarded" KIP
> > (KIP-288 [1]) actually proposed a new API which only pulls metadata.
> > KIP-266 [2] is picked up instead but it didn't cover all the things what
> > KIP-288 proposed. I'm seeing some doc explaining poll(0) hasn't been
> > supported officially, but the hack has been widely used and they can't be
> > ignored.
> >
> > Kafka test code itself relies on either deprecated poll(0),
> > or updateAssignmentMetadataIfNeeded, which seems to be private API only for
> > testing.
> > (Btw, I'd try out replacing poll(0) to updateAssignmentMetadataIfNeeded as
> > avoiding deprecated method - if it works I'll submit a PR.)
> >
> > I'm feeling that it would be ideal to expose
> > `updateAssignmentMetadataIfNeeded` to the public API, maybe with renaming
> > as `waitForAssignment` which was proposed in KIP-288 if it feels too long.
> >
> > What do you think? If it sounds feasible I'd like to try out contribution
> > on this. I'm new to contribute Kafka community, so not sure it would
> > require a new KIP or not.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > 1.
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method
> > 2.
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-266%3A+Fix+consumer+indefinite+blocking+behavior
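
For readers following the thread, here is a rough sketch of the kind of bounded "wait for assignment" loop being discussed. It is only an illustration, not Kafka's or Spark's code: `MetadataConsumer`, `pollOnce`, `FakeConsumer` and `WaitForAssignmentSketch` are hypothetical names made up so the example runs without Kafka on the classpath. With a real consumer, `pollOnce` would presumably map to `poll(Duration)`, which may also return records; that is exactly the concern raised above.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Set;

class WaitForAssignmentSketch {

    /** Hypothetical stand-in for the two consumer calls the loop needs; not a Kafka type. */
    interface MetadataConsumer {
        /** Drives one bounded update step; may return before the assignment is known. */
        void pollOnce(Duration timeout);

        /** Currently assigned partitions; empty until the assignment has arrived. */
        Set<String> assignment();
    }

    /**
     * Repeats short, bounded update steps until an assignment is visible
     * or the overall deadline passes. Throws instead of blocking forever.
     */
    static Set<String> waitForAssignment(MetadataConsumer consumer, Duration maxWait, Duration step) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (consumer.assignment().isEmpty()) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                throw new IllegalStateException("no assignment within " + maxWait);
            }
            // Never wait longer than the time left before the deadline.
            Duration slice = Duration.ofNanos(Math.min(step.toNanos(), remaining));
            consumer.pollOnce(slice);
        }
        return consumer.assignment();
    }

    /** Fake used only for this sketch: reports an assignment after a fixed number of polls. */
    static final class FakeConsumer implements MetadataConsumer {
        private final int pollsNeeded;
        private int polls = 0;

        FakeConsumer(int pollsNeeded) {
            this.pollsNeeded = pollsNeeded;
        }

        @Override
        public void pollOnce(Duration timeout) {
            polls++; // returns immediately; a real consumer could block up to the timeout
        }

        @Override
        public Set<String> assignment() {
            return polls >= pollsNeeded ? Set.of("topic-0") : Collections.emptySet();
        }

        int polls() {
            return polls;
        }
    }

    public static void main(String[] args) {
        FakeConsumer consumer = new FakeConsumer(3);
        Set<String> assigned =
            waitForAssignment(consumer, Duration.ofSeconds(1), Duration.ofMillis(10));
        System.out.println(assigned + " after " + consumer.polls() + " polls");
    }
}
```

The loop trades the unbounded blocking of the old poll(0) for an explicit overall deadline, so the caller decides how long a missing assignment is tolerated.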