Hi devs,

I'd like to start a new FLIP to introduce the Hybrid Source. The hybrid source is a source that contains a list of concrete sources. It reads from each contained source in the defined order, and switches from source A to the next source B when source A finishes.
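To make the switching behaviour concrete, here is a toy sketch in plain Java. It is only an illustration of "read the sources in order, move on when the current one finishes": it does not use the FLIP-27 interfaces, it models every source as a bounded Iterable (a real Kafka source would be unbounded), and the names HybridSourceSketch and drain are made up for this example rather than taken from the design doc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model of a hybrid source. Hypothetical; not the FLIP-150 API. */
final class HybridSourceSketch<T> implements Iterator<T> {
    // Sources that have not been started yet, in the user-defined order.
    private final Iterator<? extends Iterable<T>> remainingSources;
    // Reader over the source that is currently being consumed.
    private Iterator<T> current = Collections.emptyIterator();

    HybridSourceSketch(List<? extends Iterable<T>> sourcesInOrder) {
        this.remainingSources = sourcesInOrder.iterator();
    }

    @Override
    public boolean hasNext() {
        // Switch to the next source whenever the current one has finished.
        // The loop also skips over sources that turn out to be empty.
        while (!current.hasNext() && remainingSources.hasNext()) {
            current = remainingSources.next().iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("all contained sources are finished");
        }
        return current.next();
    }

    /** Convenience helper: reads every record from every source, in order. */
    static <T> List<T> drain(List<? extends Iterable<T>> sourcesInOrder) {
        List<T> out = new ArrayList<>();
        HybridSourceSketch<T> hybrid = new HybridSourceSketch<>(sourcesInOrder);
        while (hybrid.hasNext()) {
            out.add(hybrid.next());
        }
        return out;
    }
}
```

For example, draining a list that holds a "historical" list followed by a "real-time" list yields all historical records first and then the real-time ones, with the switch happening exactly when the first list is exhausted.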
In practice, many Flink jobs need to read data from multiple sources in sequential order. Change Data Capture (CDC) and machine learning feature backfill are two concrete scenarios of this consumption pattern. Today, users have to either run two different Flink jobs or resort to hacks in the SourceFunction to address such use cases. To support the above scenarios smoothly, a Flink job needs to first read historical data from HDFS and then switch to Kafka for real-time records.

The hybrid source has several benefits from the user's perspective:

- Switching among multiple sources is easy, based on the switchable source implementations of the different connectors.
- Automatic switching is supported for user-defined switchable sources that make up a hybrid source.
- There is a complete and effective mechanism to support smooth source migration between historical and real-time data.

Therefore, in this discussion, we propose to introduce a “Hybrid Source” API built on top of the new Source API (FLIP-27) to help users switch sources smoothly. For more detail, please refer to the FLIP design doc [1].

I'm looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source

Best,
Nicholas Jiang