Hi Reminia,

What Hequn said is correct.

However, I would *not* use a regular join but model the problem as a
time-versioned table join.
A regular join will materialize both inputs, which is probably not what
you want for a stream.
For a time-versioned table join, only the time-versioned table is
stored (this should be your DataSet) and the stream is just streamed along.
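To make the difference concrete, here is a minimal pure-Python sketch of the time-versioned table join semantics described above -- this is NOT Flink API, just an illustration. Only the versioned table is held in state; each stream record probes the table version valid at its timestamp and is never stored itself (whereas a regular join would have to buffer both inputs). The class and function names are made up for this sketch.

```python
class VersionedTable:
    """Keyed table whose rows carry a validity timestamp (the 'dimension' side)."""

    def __init__(self):
        self._versions = {}  # key -> sorted list of (valid_from, value)

    def upsert(self, key, valid_from, value):
        self._versions.setdefault(key, []).append((valid_from, value))
        self._versions[key].sort()

    def lookup(self, key, ts):
        """Return the value valid at time ts, or None if no version exists yet."""
        result = None
        for valid_from, value in self._versions.get(key, []):
            if valid_from <= ts:
                result = value
            else:
                break
        return result


def temporal_join(stream, table):
    """Enrich each (ts, key, payload) record as it passes through.

    Note: the stream records themselves are never stored -- only the
    versioned table is kept in state.
    """
    for ts, key, payload in stream:
        yield (ts, key, payload, table.lookup(key, ts))


# Example: enrich orders with the exchange rate valid at the order's timestamp.
rates = VersionedTable()
rates.upsert("EUR", 0, 1.10)
rates.upsert("EUR", 5, 1.20)

orders = [(2, "EUR", 100), (7, "EUR", 200)]
print(list(temporal_join(orders, rates)))
# -> [(2, 'EUR', 100, 1.1), (7, 'EUR', 200, 1.2)]
```

In Flink itself this corresponds to registering the DataSet side as a temporal table (e.g. via a TemporalTableFunction in the Table API) and joining the stream against it; see the joins documentation linked below.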

Best, Fabian

On Mon, Apr 15, 2019 at 04:02 Hequn Cheng <chenghe...@gmail.com
> wrote:

> Hi Reminia,
>
> Currently, we can't join a DataStream with a DataSet in Flink. However,
> a DataSet is actually a kind of bounded stream. From this point of
> view, you can use a streaming job to achieve your goal. The Flink Table API &
> SQL support different kinds of joins[1]. You can take a closer look at them.
> Probably a regular join[2] is ok for you.
>
> Finally, I think you raised a very good point. It would be better if
> Flink could support this kind of join more directly and efficiently.
>
> Best, Hequn
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#regular-joins
>
> On Thu, Apr 11, 2019 at 5:16 PM Reminia Scarlet <reminia.scar...@gmail.com>
> wrote:
>
>> Spark Streaming supports directly joining a stream DataFrame with a batch
>> DataFrame, and it's
>> easy to implement an enrichment pipeline that joins a stream and a dimension
>> table.
>>
>> I checked the Flink docs, and it seems this feature is a JIRA ticket
>> which hasn't been resolved yet.
>>
>> So how can I implement such a pipeline easily in Flink?
>>