Hi Srikanth, that's an interesting use case. It's not possible to do something like this out of the box, but I'm actually working on an API for such cases.
In the meantime, I put together a short example that shows how something like this can be implemented with the API that is currently available. It requires writing a custom operator, but it is still fairly succinct: https://gist.github.com/aljoscha/c657b98b4017282693a67f1238c88906

Please let me know if you have any questions.

Cheers,
Aljoscha

On Thu, 21 Apr 2016 at 03:06 Srikanth <srikanth...@gmail.com> wrote:

> Hello,
>
> I have a fairly typical streaming use case but am not able to figure out
> how best to implement it in Flink.
> I want to join records read from a Kafka stream with one (or more)
> dimension tables which are saved as flat files.
>
> As per this jira <https://issues.apache.org/jira/browse/FLINK-2320> it is
> not possible to join a DataStream with a DataSet.
> These tables are too big to collect() and join.
>
> It would be good to read these files during startup, do a partitionByHash,
> and keep it cached.
> On the DataStream, maybe do a keyBy and join.
> Is something like this possible?
>
> Srikanth
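For archive readers: conceptually, the pattern discussed here is a lookup join, where the dimension table is loaded once at startup and cached, and each streamed record is enriched by a key lookup. Below is a minimal plain-Java sketch of that idea (no Flink dependencies; the class, method names, and sample rows are hypothetical, not taken from the gist):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a stream-to-dimension-table lookup join:
// load the dimension table once at startup, then enrich each
// streamed record (e.g. read from Kafka) by a key lookup.
public class StreamTableJoinSketch {
    // Dimension table cached in memory, keyed by the join key.
    private final Map<String, String> dimensionTable = new HashMap<>();

    // In a real operator this would read the flat file(s) during
    // startup; here we just populate a few hypothetical rows.
    void loadDimensionTable() {
        dimensionTable.put("user1", "US");
        dimensionTable.put("user2", "DE");
    }

    // Called for every streamed record: look up the dimension row
    // for the record's key and emit the joined result.
    String enrich(String key, String value) {
        String dim = dimensionTable.getOrDefault(key, "UNKNOWN");
        return value + "," + dim;
    }

    public static void main(String[] args) {
        StreamTableJoinSketch join = new StreamTableJoinSketch();
        join.loadDimensionTable();
        System.out.println(join.enrich("user1", "click")); // click,US
        System.out.println(join.enrich("user3", "view"));  // view,UNKNOWN
    }
}
```

In Flink terms, the load step would happen in the operator's open() method and the enrich step in the per-record processing function; keying the stream by the join key (keyBy) lets the cached table be partitioned across parallel instances, as the original question suggests.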