Let's say that I have:

SQL Query One from data in PostgreSQL (200K records).
SQL Query Two from data in PostgreSQL (1000 records).
and Kafka Topic One.

Let's also say that the main data for this Flink job arrives in Kafka Topic One.

If I need SQL Query One and SQL Query Two to run just once, when the job
starts up, and afterwards perhaps store their results in Keyed State or
Broadcast State, since they are not really part of the stream, then what is
the best practice for supporting that in Flink?

The Flink job needs to stream data from Kafka Topic One, aggregate it, and
perform computations that require all of the data from SQL Query One and SQL
Query Two for its business logic.

I am using Flink 1.10.

Am I supposed to query the database before the job is submitted, and then
pass the results as parameters to a function?
Or am I supposed to use JDBCInputFormat for both queries to create two
streams, and somehow connect or broadcast both of them to the main stream
that consumes Kafka Topic One?

I would appreciate any guidance. Thank you.

Sincerely,

Marco A. Villalobos


