Hi Dongwon,

Unfortunately, it's not that easy right now because normal sinks that rely on checkpointing to write out data, such as Kafka, don't work in BATCH execution mode because we don't have checkpoints there. It will work, however, if you use a sink that doesn't rely on checkpointing. The FlinkKafkaProducer with Semantic.NONE should work, for example.
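Something like this should do it (an untested sketch; the topic, properties, and the resultStream variable are placeholders, not from your job):

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
    import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder

    // Typing the schema explicitly avoids constructor overload ambiguity.
    KafkaSerializationSchema<String> schema = (element, timestamp) ->
        new ProducerRecord<>("output-topic",
            element.getBytes(StandardCharsets.UTF_8));

    FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
        "output-topic",   // placeholder default topic
        schema,
        props,
        // NONE doesn't use checkpoint-based transactions, so the sink
        // does not depend on checkpointing and also works in BATCH mode.
        FlinkKafkaProducer.Semantic.NONE);

    resultStream.addSink(sink);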

There is HiveSource, which is built on the new Source API and will work well in both BATCH and STREAMING mode. It's quite new and was only added to be used by the Table/SQL connector, but you might have some success with it.
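Wiring a new-style source into the DataStream API then looks roughly like this (sketch only; I'm leaving out how the HiveSource instance itself is built, because that part of the API may still change):

    import org.apache.flink.api.common.RuntimeExecutionMode;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.data.RowData;

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    // The input is bounded, so the whole pipeline can run in BATCH mode.
    env.setRuntimeExecutionMode(RuntimeExecutionMode.BATCH);

    DataStream<RowData> rows = env.fromSource(
        hiveSource,                       // a HiveSource<RowData> built elsewhere
        WatermarkStrategy.noWatermarks(), // timestamps/watermarks are added later
        "hive-source");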

Best,
Aljoscha

On 18.11.20 07:03, Dongwon Kim wrote:
Hi,

Recently I've been working on a real-time data stream processing pipeline
with the DataStream API while preparing for a new service launch.
Now it's time to develop a back-fill job that produces the same result by
reading data stored in Hive, which we use for long-term storage.

Meanwhile, I watched Aljoscha's talk [1] and just wondered if I could reuse
major components of the pipeline written in DataStream API.
The pipeline conceptually looks as follows (a rough code skeleton follows the list):
(A) reads input from Kafka
(B) performs AsyncIO to Redis in order to enrich the input data
(C) appends timestamps and emits watermarks before time-based window
(D) keyBy followed by a session window with a custom trigger for early firing
(E) writes output to Kafka
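
Roughly, as a skeleton in code (Event, Result, EventSchema,
RedisEnrichmentFunction, EarlyFiringTrigger, and SessionAggregator below
are placeholders for our own classes):

    DataStream<Event> input = env.addSource(                           // (A)
        new FlinkKafkaConsumer<>("input-topic", new EventSchema(), kafkaProps));

    DataStream<Event> enriched = AsyncDataStream.unorderedWait(        // (B)
        input, new RedisEnrichmentFunction(), 1, TimeUnit.SECONDS, 100);

    DataStream<Event> withTs = enriched.assignTimestampsAndWatermarks( // (C)
        WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(10))
            .withTimestampAssigner((e, ts) -> e.getEventTime()));

    DataStream<Result> result = withTs                                 // (D)
        .keyBy(Event::getKey)
        .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
        .trigger(new EarlyFiringTrigger())
        .process(new SessionAggregator());

    result.addSink(kafkaSink);                                         // (E)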

I have some simple (maybe stupid) questions about reusing components of
the pipeline written in the DataStream API.
(1) By replacing (A) with a bounded source, can I execute the pipeline in
the new BATCH execution mode without modifying (B)~(E)?
(2) Is there a bounded source for Hive available for DataStream API?

Best,

Dongwon

[1] https://www.youtube.com/watch?v=z9ye4jzp4DQ

