Hi Shamit,

Unless you have some temporal relationship between the records to be
joined, you have to use a regular join over stream1 and stream2.
Since you cannot define a window, all data will be held in Flink's state.
That is not an issue for a few million records, but it probably means you
should use the RocksDB state backend [1], or you may run out of main memory.
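
If you go with RocksDB, switching the state backend is mostly a matter of
configuration. A minimal sketch (the checkpoint directory is just a
placeholder; please double-check the option names against [1] for your
Flink version):

    # flink-conf.yaml
    state.backend: rocksdb
    state.backend.incremental: true
    # placeholder path, point this at your own durable storage
    state.checkpoints.dir: hdfs:///flink/checkpoints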

I recommend using Flink SQL or the Table API, which will also prune all
unnecessary columns from your data. If you want to use the DataStream API
instead, I recommend dropping all unrelated columns prior to the join.
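
For the SQL route, a rough sketch of the join in Java, assuming stream1 and
stream2 are already registered as tables backed by your Kafka topics (table
and column names here are made up for illustration):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // regular (unbounded) join: both inputs are kept in state,
    // so select only the columns you actually need
    Table joined = tEnv.sqlQuery(
        "SELECT s1.id, s1.payload, s2.enrichment "
            + "FROM stream1 AS s1 "
            + "JOIN stream2 AS s2 ON s1.id = s2.id");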

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#choose-the-right-state-backend

On Tue, Feb 9, 2021 at 10:47 PM Shamit <jainsha...@gmail.com> wrote:

> Hello Flink Users,
>
> I am a newbie and have a question on joining two streams (stream1 and stream2)
> from Kafka topics based on some key.
>
> In my use case I need to join with stream2 data which might be a year old or
> more.
>
> Now, if data arrives on stream1 today and I need to join it with stream2
> based on some key, please let me know how I can do this efficiently.
>
> stream2 might have lots of records (in the millions).
>
> Please help.
>
> Regards,
> Shamit Jain
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
