Hi Huyen,

That is not my problem statement. If it were just ingesting from A to B, I am sure there are enough tutorials for me to get it done. I also feel like the more I elaborate, the more confusing it gets, and I am not sure why.
I want to join two streams and see/query the results of that join in real time, but I have some constraints. I come from the Spark world. In Spark, if I do an inner join of two streams A and B, I see results only when there are matching rows between A and B (which makes sense by the definition of an inner join). But if I do an outer join of two streams A and B, I need to specify a time constraint, and only after that time has fully elapsed can I see the rows that did not match. This means that if my time constraint is one hour, I cannot query the intermediate results until the hour is up, and that is not the behavior I want. I want to be able to run all sorts of SQL queries on the intermediate results.

I like Flink's idea of externalizing the state, so that I don't have to worry about memory, but I am also trying to avoid writing a separate microservice that polls and displays the intermediate results of the join in real time. Instead, I am trying to see if there is a way to treat those constantly evolving intermediate results as a streaming source, perhaps do some more transformations, and push them out to another sink.

Hope that makes sense.

Thanks,
Kant

On Thu, Oct 31, 2019 at 2:43 AM Huyen Levan <lvhu...@gmail.com> wrote:

> Hi Kant,
>
> So your problem statement is "ingest 2 streams into a data warehouse". The
> main component of the solution, from my view, is the SQL server. You can
> have a sink function that inserts records from your two streams into two
> different tables (A and B), or upserts them into one single table C. That
> upsert action itself serves as a join function; there's no need to join in
> Flink at all.
>
> There are many tools out there that can be used for that ingestion. Flink,
> of course, can be used for that purpose. But to me, it's overkill.
>
> Regards,
> Averell
>
> On Thu, 31 Oct. 2019, 8:19 pm kant kodali, <kanth...@gmail.com> wrote:
>
>> Hi Averell,
>>
>> Yes, I want to run ad-hoc SQL queries on the joined data as well as data
>> that may join in the future. For example, take datasets A and B in
>> streaming mode: a row in A can join with a row in B at some time in the
>> future, but meanwhile, if I query the intermediate state using SQL, I want
>> the rows in A that have not yet joined with B to also be available to
>> query. So not just the joined results, but also data that might join in
>> the future, since it's all streaming.
>>
>> Thanks!
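[Editor's note] The timing difference Kant describes can be made concrete with a small sketch. This is not Spark or Flink code, just a hypothetical plain-Python simulation of the semantics: an inner join emits matches immediately, while a watermark-style outer join holds unmatched rows back until the time constraint has elapsed. The `StreamJoin` class, its method names, and the one-hour constraint are all illustrative assumptions; the raw `left`/`right` dicts stand in for the "intermediate state" Kant wants to be able to query.

```python
from dataclasses import dataclass, field

# Hypothetical sketch (not Spark/Flink API): simulates why an outer join
# with a time constraint cannot show unmatched rows before it elapses.

@dataclass
class StreamJoin:
    constraint: int                             # time constraint in seconds
    left: dict = field(default_factory=dict)    # unmatched rows of A: key -> arrival time
    right: dict = field(default_factory=dict)   # unmatched rows of B: key -> arrival time
    matched: list = field(default_factory=list)

    def on_left(self, key, ts):
        if key in self.right:
            self.right.pop(key)
            self.matched.append(key)            # inner-join result: visible immediately
        else:
            self.left[key] = ts                 # held in state, waiting for a match

    def on_right(self, key, ts):
        if key in self.left:
            self.left.pop(key)
            self.matched.append(key)
        else:
            self.right[key] = ts

    def outer_results(self, watermark):
        # Unmatched rows of A only become visible once the watermark has
        # passed their arrival time plus the constraint. Querying the raw
        # `left`/`right` state directly would expose the evolving
        # "intermediate results" the thread is about.
        expired = [k for k, ts in self.left.items()
                   if watermark >= ts + self.constraint]
        return self.matched + [(k, None) for k in expired]

j = StreamJoin(constraint=3600)        # one-hour constraint, as in the email
j.on_left("a", ts=0)
j.on_left("b", ts=0)
j.on_right("a", ts=10)                 # matches "a": inner-join output right away

print(j.matched)                       # ['a']
print(j.outer_results(1800))           # ['a'] -- "b" still hidden, hour not up
print(j.outer_results(3600))           # ['a', ('b', None)] -- unmatched row visible
```

The design point the sketch makes: the unmatched row "b" sits in `j.left` the whole time, so nothing prevents querying it in principle; it is only the outer-join *output* contract that hides it until the watermark passes, which is exactly the gap between Spark's behavior and what Kant is asking for.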