Hi Srikanth, that's an interesting use case. It's not possible to do something like this out of the box, but I'm actually working on an API for such cases.
In the meantime, I put together a short example that shows how something like this can be implemented with the API that is currently available. It requires writing a custom operator, but it is still fairly succinct: https://gist.github.com/aljoscha/c657b98b4017282693a67f1238c88906

Please let me know if you have any questions.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 03:06 Srikanth <srikanth...@gmail.com> wrote:

> Hello,
>
> I have a fairly typical streaming use case but am not able to figure out
> how best to implement it in Flink.
> I want to join records read from a Kafka stream with one (or more)
> dimension tables which are saved as flat files.
>
> As per this jira <https://issues.apache.org/jira/browse/FLINK-2320> it is
> not possible to join a DataStream with a DataSet.
> These tables are too big to collect() and join.
>
> It would be good to read these files during startup, do a partitionByHash,
> and keep it cached.
> On the DataStream, maybe do a keyBy and join.
> Is something like this possible?
>
> Srikanth
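For archive readers: conceptually, the pattern discussed here is a lookup join, where the dimension table is loaded once at startup and cached, and each streamed record is enriched by a key lookup. Below is a minimal plain-Java sketch of that idea (no Flink dependencies; the class, method names, and sample rows are hypothetical, not taken from the gist):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a stream-to-dimension-table lookup join:
// load the dimension table once at startup, then enrich each
// streamed record (e.g. read from Kafka) by a key lookup.
public class StreamTableJoinSketch {
    // Dimension table cached in memory, keyed by the join key.
    private final Map<String, String> dimensionTable = new HashMap<>();

    // In a real operator this would read the flat file(s) during
    // startup; here we just populate a few hypothetical rows.
    void loadDimensionTable() {
        dimensionTable.put("user1", "US");
        dimensionTable.put("user2", "DE");
    }

    // Called for every streamed record: look up the dimension row
    // for the record's key and emit the joined result.
    String enrich(String key, String value) {
        String dim = dimensionTable.getOrDefault(key, "UNKNOWN");
        return value + "," + dim;
    }

    public static void main(String[] args) {
        StreamTableJoinSketch join = new StreamTableJoinSketch();
        join.loadDimensionTable();
        System.out.println(join.enrich("user1", "click")); // click,US
        System.out.println(join.enrich("user3", "view"));  // view,UNKNOWN
    }
}
```

In Flink terms, the load step would happen in the operator's open() method and the enrich step in the per-record processing function; keying the stream by the join key (keyBy) lets the cached table be partitioned across parallel instances, as the original question suggests.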