Hello all,

I need the community’s help with some ideas. I’m quite new to Flink and
could use some guidance regarding an implementation I’m working on.

What I need is a way to store a MySQL table in Flink and expose that data
to other jobs, as I need to query it to enrich records received from a
Kafka source.

The initial solution I’m working on is:

  1.  Have a processor that uses the Flink CDC connector over the table
that stores the information I need. (This is currently implemented and
working; a rough sketch follows this list.)
  2.  Find a way to store that stream source in a table inside Flink. (I
tried the approach of creating a MySQL JDBC catalog, but apparently only a
Postgres catalog can be created programmatically.) This is the question:
what API do I need to use to store the data retrieved by the CDC source in
a SQL table inside Flink?
  3.  The solution from point 2 needs to work in a way that lets me query
that table for each record I receive in a different job that has a Kafka
source as its entry point.
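
For reference, here is roughly what step 1 looks like on my side. This is
a sketch assuming the Ververica flink-cdc-connectors MySqlSource; the
hostname, credentials, database, and table names are placeholders:

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CdcReaderSketch {
    public static void main(String[] args) throws Exception {
        // CDC source over the MySQL table holding the reference data
        MySqlSource<String> source = MySqlSource.<String>builder()
                .hostname("mysql-host")            // placeholder
                .port(3306)
                .databaseList("mydb")              // placeholder database
                .tableList("mydb.reference_data")  // placeholder table
                .username("flink")
                .password("...")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // CDC sources commit binlog offsets on checkpoints
        env.enableCheckpointing(3000);

        // Each element is one change event (insert/update/delete) as JSON
        env.fromSource(source, WatermarkStrategy.noWatermarks(),
                       "MySQL CDC Source")
           .print(); // downstream enrichment would go here instead

        env.execute("mysql-cdc-reader");
    }
}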

I was thinking about having the CDC source inside the same job as the
Kafka source, and I’m testing whether this is feasible as we speak
(sketched below). The idea is that I need to get some information from the
MySQL database each time I process a record from the Kafka source. Would
this be a good option if I’m able to persist the data into a temporary
view inside the processor? I’m just worried that I might need to reuse
these data sets from the SQL database in future jobs, which is why I’d
like to have something decoupled and available to the entire cluster.
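
To make this concrete, here is the shape of what I’m about to test:
registering both sources in the same TableEnvironment and joining them.
The table names, fields, and connector options are placeholders, and I’m
not sure yet whether this is the right API for my case:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class EnrichmentJobSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Reference data as a changelog table backed by the CDC connector
        tEnv.executeSql(
            "CREATE TABLE reference_data (" +
            "  id INT," +
            "  info STRING," +
            "  PRIMARY KEY (id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +  // from flink-cdc-connectors
            "  'hostname' = 'mysql-host'," +  // placeholders from here on
            "  'port' = '3306'," +
            "  'username' = 'flink'," +
            "  'password' = '...'," +
            "  'database-name' = 'mydb'," +
            "  'table-name' = 'reference_data'" +
            ")");

        // Incoming records from the Kafka source
        tEnv.executeSql(
            "CREATE TABLE kafka_records (" +
            "  record_id STRING," +
            "  ref_id INT" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'records'," +
            "  'properties.bootstrap.servers' = 'kafka:9092'," +
            "  'scan.startup.mode' = 'latest-offset'," +
            "  'format' = 'json'" +
            ")");

        // Enrich each Kafka record with the current reference row; a
        // regular join against a CDC table keeps the result updated as
        // the underlying MySQL table changes.
        tEnv.executeSql(
            "SELECT r.record_id, d.info " +
            "FROM kafka_records AS r " +
            "JOIN reference_data AS d ON r.ref_id = d.id")
            .print();
    }
}

As far as I understand, tables registered this way live only in the job’s
own TableEnvironment, which is exactly my worry about reuse across jobs.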

Like I said, I’m new to Flink, and it has proven quite difficult for me to
work out the best solution for my situation. That is why I’m asking users
who have more experience and who may have run into the same issues in the
past.

Thank you in advance, guys!

Regards,
Dan Serb
