Hi Gabor, What is it that you want to do here? If you just want to check that the partitions exist, but not fetch any data, you could use AdminClient#describeTopics for that. If you want to create the topics, you could use AdminClient#createTopics.
best, Colin On Fri, Aug 9, 2019, at 11:23, Gabor Somogyi wrote: > > Each KafkaConsumer method that returns metadata will already block until > such metadata is available > The old API was waiting infinitely but the new has a timeout which has > effect on the metadata fetch as well. > > Spark is interested in only the assigned partitions and/or > latest/earliest/... offsets. > Please see > https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala > At the moment the old poll(0) is used which can block infinitely and the > API is marked deprecated. > We would like to switch to the new API, see the background here: > https://github.com/apache/spark/pull/25135 > However in such case there is no need to fetch any kind of data since > driver is not doing any data processing. > > G > > > On Fri, Aug 9, 2019 at 7:39 PM Ryanne Dolan <ryannedo...@gmail.com> wrote: > > > > pull some records even they're only interested in metadata. > > > > Jungtaek, what is the use-case here? Each KafkaConsumer method that returns > > metadata will already block until such metadata is available... so why > > would you need to apply this "hack" in the first place? > > > > Ryanne > > > > On Wed, Aug 7, 2019 at 2:24 AM Jungtaek Lim <kabh...@gmail.com> wrote: > > > > > Hi devs, > > > > > > I'm trying to replace deprecated poll(long) with poll(Duration), and > > > realized there's no alternative which behaves exactly same as poll(0), as > > > poll(0) has been used as a hack to only update metadata instead of > > pulling > > > records. poll(Duration.ZERO) wouldn't behave same since even updating > > > metadata will be timed-out. So now end users would need to give more > > > timeout and even pull some records even they're only interested in > > > metadata. > > > > > > I looked back some KIPs which brought the change, and "discarded" KIP > > > (KIP-288 [1]) actually proposed a new API which only pulls metadata. > > > KIP-266 [2] is picked up instead but it didn't cover all the things what > > > KIP-288 proposed. I'm seeing some doc explaining poll(0) hasn't been > > > supported officially, but the hack has been widely used and they can't be > > > ignored. > > > > > > Kafka test code itself relies on either deprecated poll(0), > > > or updateAssignmentMetadataIfNeeded, which seems to be private API only > > for > > > testing. > > > (Btw, I'd try out replacing poll(0) to updateAssignmentMetadataIfNeeded > > as > > > avoiding deprecated method - if it works I'll submit a PR.) > > > > > > I'm feeling that it would be ideal to expose > > > `updateAssignmentMetadataIfNeeded` to the public API, maybe with renaming > > > as `waitForAssignment` which was proposed in KIP-288 if it feels too > > long. > > > > > > What do you think? If it sounds feasible I'd like to try out contribution > > > on this. I'm new to contribute Kafka community, so not sure it would > > > require a new KIP or not. > > > > > > Thanks, > > > Jungtaek Lim (HeartSaVioR) > > > > > > 1. > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-288%3A+%5BDISCARDED%5D+Consumer.poll%28%29+timeout+semantic+change+and+new+waitForAssignment+method > > > 2. > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-266%3A+Fix+consumer+indefinite+blocking+behavior > > > > > >