Hi Shamit,

unless you have some temporal relationship between the records to be joined, you have to use a regular join over stream1 and stream2. Since you cannot define any window, all data will be held in Flink's state. That is not a problem for a few million records, but it probably means you have to use the RocksDB state backend [1], or else you may run out of main memory.
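For illustration, a regular (non-windowed) join in Flink SQL could look like the sketch below. The table names, columns, topics, and connector options are made up and need to be adapted to your actual schemas:

CREATE TABLE stream1 (
  order_id STRING,
  user_id  STRING,
  amount   DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic1',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

CREATE TABLE stream2 (
  user_id STRING,
  name    STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic2',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);

-- Regular join: Flink keeps the rows of both inputs in state indefinitely,
-- so even year-old stream2 records can still be matched against new stream1 records.
SELECT s1.order_id, s1.amount, s2.name
FROM stream1 AS s1
JOIN stream2 AS s2
  ON s1.user_id = s2.user_id;

Because only the columns referenced in the query are kept, the state footprint stays as small as possible.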
I recommend using Flink SQL or the Table API, which will also prune all unnecessary columns from your data. If you want to use the DataStream API instead, I recommend dropping all unrelated columns prior to the join. A minimal sketch of the RocksDB configuration is included after the quoted message below.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend

On Tue, Feb 9, 2021 at 10:47 PM Shamit <jainsha...@gmail.com> wrote:
> Hello Flink Users,
>
> I am a newbie and have a question on joining two streams (stream1 and stream2)
> from Kafka topics based on some key.
>
> In my use case I need to join with stream2 data which might be a year old or
> more.
>
> Now if data arrives on stream1 today and I need to join it with stream2
> based on some key, please let me know how I can do this efficiently.
>
> stream2 might have lots of records (in the millions).
>
> Please help.
>
> Regards,
> Shamit Jain
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
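P.S. As mentioned above, here is a minimal sketch of enabling the RocksDB state backend from [1] in flink-conf.yaml; the checkpoint directory is just a placeholder and should point at your own checkpoint storage:

state.backend: rocksdb
# Placeholder path -- use your HDFS/S3/... checkpoint location.
state.checkpoints.dir: hdfs:///flink/checkpoints
# Incremental checkpoints keep checkpoint sizes manageable with large state.
state.backend.incremental: true

The same can also be done programmatically via StreamExecutionEnvironment#setStateBackend if you prefer to configure it per job.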