Hi Jing,

Thanks for opening the discussion. I am not sure we are ready to remove
FlinkKafkaConsumer. The reason is that existing users of FlinkKafkaConsumer
who rely on KafkaDeserializationSchema::isEndOfStream() currently have no
migration path to KafkaSource.
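For reference, here is a rough sketch of the kind of schema such users have
today; the class name and the "END" sentinel value are made up for
illustration:

```java
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Hypothetical schema: the job stops consuming once a sentinel record arrives.
// FlinkKafkaConsumer honors isEndOfStream(); KafkaSource has no equivalent yet,
// which is the gap FLIP-208's RecordEvaluator is meant to close.
public class StopOnSentinelSchema implements KafkaDeserializationSchema<String> {

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        return new String(record.value(), StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // Dynamically stop the source based on the de-serialized record.
        return "END".equals(nextElement);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }
}
```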
This issue was explained in FLIP-208
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-208%3A+Add+RecordEvaluator+to+dynamically+stop+source+based+on+de-serialized+records>.
The design was pretty much ready. I didn't start the voting thread because
Fabian said he wanted more time to explore alternative solutions. My priority
changed recently and I don't plan to get this FLIP done for 1.17. It would be
great if someone could address this issue so that we can move forward with
removing FlinkKafkaConsumer.

Thanks,
Dong

On Fri, Nov 11, 2022 at 8:53 PM Jing Ge <j...@ververica.com.invalid> wrote:

> Hi all,
>
> Thank you all for the informative feedback. I figure there is a need to
> improve the documentation regarding the migration from FlinkKafkaConsumer
> to KafkaSource. I've filed a ticket [1] and linked it to [2]. This
> shouldn't be a blocker for removing FlinkKafkaConsumer.
>
> Given that there will be some ongoing SinkV2 upgrades, I will start a vote
> limited to removing FlinkKafkaConsumer and graduating the related APIs. As
> a follow-up task, I will sync with Yun Gao before the code freeze of the
> 1.17 release to check whether we can start a second vote to remove
> FlinkKafkaProducer with 1.17.
>
> Best regards,
> Jing
>
> [1] https://issues.apache.org/jira/browse/FLINK-29999
> [2] https://issues.apache.org/jira/browse/FLINK-28302
>
> On Wed, Nov 2, 2022 at 11:39 AM Martijn Visser <martijnvis...@apache.org>
> wrote:
>
> > Hi David,
> >
> > I believe that for the DataStream API this is indeed documented [1], but
> > it might be missed given that there is a lot of documentation and you
> > need to know that your problem is related to idleness. For the Table API
> > I think this is never mentioned, so it should definitely be at least
> > documented there.
> >
> > Thanks,
> >
> > Martijn
> >
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kafka/#idleness
> >
> > On Wed, Nov 2, 2022 at 11:28 AM David Anderson <dander...@apache.org>
> > wrote:
> >
> > > > For the partition idleness problem could you elaborate more about
> > > > it? I assume both FlinkKafkaConsumer and KafkaSource need a
> > > > WatermarkStrategy to decide whether to mark the partition as idle.
> > >
> > > As a matter of fact, no, that's not the case -- which is why I
> > > mentioned it.
> > >
> > > The FlinkKafkaConsumer automatically treats all initially empty (or
> > > non-existent) partitions as idle, while the KafkaSource only does this
> > > if the WatermarkStrategy specifies that idleness handling is desired
> > > by configuring withIdleness. This can be a source of confusion for
> > > folks upgrading to the new connector. It most often shows up in
> > > situations where the number of Kafka partitions is less than the
> > > parallelism of the connector, which is a rather common occurrence in
> > > development and testing environments.
> > >
> > > I believe this change in behavior was made deliberately, so as to
> > > create a more consistent experience across all FLIP-27 connectors.
> > > This isn't something that needs to be fixed, but it does need to be
> > > communicated more clearly. Unfortunately, the whole idleness mechanism
> > > remained significantly broken until 1.16 (considering the impact of
> > > [1] and [2]), further complicating the situation. Because of
> > > FLINK-28975 [2], users with partitions that are initially empty may
> > > have problems with versions before 1.15.3 (still unreleased) and
> > > 1.16.0.
> > > See [3] for an example of this confusion.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-18934 (idleness
> > > didn't work with connected streams)
> > > [2] https://issues.apache.org/jira/browse/FLINK-28975 (idle streams
> > > could never become active again)
> > > [3]
> > > https://stackoverflow.com/questions/70096166/parallelism-in-flink-kafka-source-causes-nothing-to-execute/70101290#70101290
> > >
> > > Best,
> > > David
> > >
> > > On Wed, Nov 2, 2022 at 5:26 AM Qingsheng Ren <re...@apache.org> wrote:
> > >
> > > > Thanks Jing for starting the discussion.
> > > >
> > > > +1 for removing FlinkKafkaConsumer, as KafkaSource has evolved over
> > > > many release cycles and should be stable enough. I have some
> > > > concerns about the new Kafka sink based on Sink V2, as Sink V2 still
> > > > has some ongoing work in 1.17 (maybe Yun Gao could provide some
> > > > input). Also, we found some issues with KafkaSink related to the
> > > > internal mechanism of Sink V2, like FLINK-29492.
> > > >
> > > > @David
> > > > About the ability of DeserializationSchema#isEndOfStream, FLIP-208
> > > > is trying to complete this piece of the puzzle, and Hang Ruan
> > > > (ruanhang1...@gmail.com) plans to work on it in 1.17. For the
> > > > partition idleness problem, could you elaborate more about it? I
> > > > assume both FlinkKafkaConsumer and KafkaSource need a
> > > > WatermarkStrategy to decide whether to mark the partition as idle.
> > > >
> > > > Best,
> > > > Qingsheng
> > > > Ververica (Alibaba)
> > > >
> > > > On Thu, Oct 27, 2022 at 8:06 PM Jing Ge <j...@ververica.com> wrote:
> > > >
> > > > > Hi Dev,
> > > > >
> > > > > I'd like to start a discussion about removing FlinkKafkaConsumer
> > > > > and FlinkKafkaProducer in 1.17.
> > > > >
> > > > > Their removal was originally announced for Flink 1.15, after Flink
> > > > > 1.14 had been released [1]. It was then postponed to the release
> > > > > after 1.15, which meant removing them with Flink 1.16, but the doc
> > > > > was not updated [2]. I have created PRs to fix that. Since the
> > > > > 1.16 release branch is under code freeze, it makes sense to, first
> > > > > of all, update the doc to say that FlinkKafkaConsumer will be
> > > > > removed with Flink 1.17 [3][4], and second, start the discussion
> > > > > about removing them from the current master branch, i.e. for the
> > > > > coming 1.17 release. I'm all ears and looking forward to your
> > > > > feedback. Thanks!
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > > [1]
> > > > > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > > > > [2]
> > > > > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > > > > [3] https://github.com/apache/flink/pull/21172
> > > > > [4] https://github.com/apache/flink/pull/21171
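Regarding the idleness behavior David describes above, here is a rough
sketch of how a job opts in explicitly when using KafkaSource; the broker
address, topic, group id, and durations are placeholders:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceIdlenessExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder brokers, topic, and group id.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("my-topic")
                .setGroupId("my-group")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Unlike FlinkKafkaConsumer, KafkaSource does not mark empty partitions
        // as idle on its own; idleness handling must be requested explicitly.
        WatermarkStrategy<String> strategy =
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withIdleness(Duration.ofSeconds(30));

        DataStream<String> stream = env.fromSource(source, strategy, "Kafka Source");
        stream.print();

        env.execute("KafkaSource idleness example");
    }
}
```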