Hi,

Recently I've been working on a real-time data stream processing pipeline with the DataStream API while preparing a new service for launch. Now it's time to develop a back-fill job that produces the same result by reading data stored in Hive, which we use for long-term storage.
Meanwhile, I watched Aljoscha's talk [1] and wondered whether I could reuse the major components of the pipeline written in the DataStream API. The pipeline conceptually looks as follows:

(A) reads input from Kafka
(B) performs Async I/O to Redis in order to enrich the input data
(C) assigns timestamps and emits watermarks before the time-based window
(D) keyBy followed by a session window with a custom trigger for early firing
(E) writes output to Kafka

I have simple (maybe stupid) questions on reusing these components:

(1) By replacing (A) with a bounded source, can I execute the pipeline in the new BATCH execution mode without modifying (B)~(E)? A rough sketch of what I have in mind follows at the end of this mail.

(2) Is there a bounded source for Hive available in the DataStream API? The closest thing I've found so far is also sketched below.
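To make (1) concrete, here is a minimal, self-contained sketch of what I have in mind, assuming Flink 1.12+. Everything named below is a placeholder rather than our actual code: fromElements() stands in for a bounded source replacing (A), FakeRedisLookup for the real Redis enrichment in (B), sum() for our session aggregation, and print() for the Kafka sink in (E). The only intended differences from the streaming job would be the source and setRuntimeMode(BATCH):

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.AsyncFunction;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.time.Duration;
import java.util.Collections;
import java.util.concurrent.TimeUnit;

public class BackfillSketch {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // The only intended change besides the source: batch semantics (Flink 1.12+).
    env.setRuntimeMode(RuntimeExecutionMode.BATCH);

    // (A) Bounded placeholder source; in the real back-fill job this would
    // read the long-term data from Hive instead of Kafka.
    DataStream<Tuple2<String, Long>> input = env.fromElements(
        Tuple2.of("user-1", 1_000L), Tuple2.of("user-1", 2_000L), Tuple2.of("user-2", 3_000L));

    // (B) Async enrichment; FakeRedisLookup stands in for the real Redis lookup.
    DataStream<Tuple2<String, Long>> enriched = AsyncDataStream.unorderedWait(
        input, new FakeRedisLookup(), 1, TimeUnit.SECONDS, 100);

    // (C) Assign timestamps and watermarks before the time-based window;
    // the 30-second out-of-orderness bound is a placeholder.
    DataStream<Tuple2<String, Long>> stamped = enriched.assignTimestampsAndWatermarks(
        WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(30))
            .withTimestampAssigner((e, ts) -> e.f1));

    stamped
        // (D) keyBy + session window; I left out our custom early-firing
        // trigger here, since my expectation (part of question (1)) is that
        // early firing is irrelevant in BATCH mode.
        .keyBy(e -> e.f0)
        .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
        .sum(1)
        // (E) print() standing in for the Kafka sink.
        .print();

    env.execute("backfill-sketch");
  }

  /** Trivial async stand-in for the Redis enrichment in step (B). */
  private static class FakeRedisLookup
      implements AsyncFunction<Tuple2<String, Long>, Tuple2<String, Long>> {
    @Override
    public void asyncInvoke(Tuple2<String, Long> in, ResultFuture<Tuple2<String, Long>> out) {
      out.complete(Collections.singleton(in));
    }
  }
}

For (2), the closest I've found is reading the table through a HiveCatalog with the Table API and bridging back to a DataStream, roughly as follows ("myhive", "mydb", the conf dir, and the "events" table are placeholders for our setup). I'm not sure this is the recommended way, which is partly why I'm asking:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;
import org.apache.flink.types.Row;

public class HiveBackfillSourceSketch {
  public static void main(String[] args) {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

    // Catalog name, default database, and Hive conf dir are placeholders.
    HiveCatalog hive = new HiveCatalog("myhive", "mydb", "/opt/hive/conf");
    tEnv.registerCatalog("myhive", hive);
    tEnv.useCatalog("myhive");

    // "events" stands in for the table we keep for long-term storage.
    Table events = tEnv.sqlQuery("SELECT * FROM events");

    // Bridge back to the DataStream API so that (B)~(E) could stay unchanged.
    DataStream<Row> input = tEnv.toAppendStream(events, Row.class);
  }
}

Best,
Dongwon

[1] https://www.youtube.com/watch?v=z9ye4jzp4DQ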