Re: Flink - Process datastream in a bounded context (like Dataset) - Unifying stream & batch

Hequn Cheng Tue, 25 Sep 2018 05:51:02 -0700

Hi Bastien,

Flink features two relational APIs, the Table API and SQL. Both are
unified APIs for batch and stream processing, i.e., queries are executed
with the same semantics on unbounded, real-time streams or bounded,
recorded streams [1]. The SQL documentation also covers joins [2].
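
To make "same semantics" concrete, here is a minimal sketch in plain Python. It is not Flink code, and every name in it (batch_join, streaming_join, the sample records) is made up for illustration. It only shows that an incremental join over interleaved records ends up with the same rows as a join over the complete inputs, once all records have been seen.

```python
from collections import defaultdict

def batch_join(left, right):
    """Bounded join: both inputs are fully available up front."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return sorted((key, lv, rv) for key, lv in left for rv in index[key])

def streaming_join(events):
    """Incremental (symmetric hash) join over interleaved (side, key, value) events."""
    seen = {"L": defaultdict(list), "R": defaultdict(list)}
    out = []
    for side, key, value in events:
        other = "R" if side == "L" else "L"
        # Match the new record against everything already seen on the other side.
        for match in seen[other][key]:
            out.append((key, value, match) if side == "L" else (key, match, value))
        # Remember the record so later arrivals on the other side can match it.
        seen[side][key].append(value)
    return sorted(out)

# Hypothetical sample data: the same records, once as two complete inputs
# and once as a single interleaved stream.
left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y"), (3, "z")]
events = [("R", 1, "x"), ("L", 1, "a"), ("L", 2, "b"), ("R", 3, "z"), ("R", 1, "y")]

assert streaming_join(events) == batch_join(left, right)
```

The streaming variant has to keep both sides in state because a match can arrive at any time. This is the general trade-off of joining unbounded inputs and not a statement about how Flink implements its joins internally.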


Best, Hequn
[1] https://flink.apache.org/flink-applications.html#layered-apis
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins

On Tue, Sep 25, 2018 at 4:14 PM bastien dine <bastien.d...@gmail.com> wrote:

> Hello everyone,
>
> I need to join some files to perform some processing. The DataSet API is
> a perfect way to achieve this, and I am able to do it when I read the files
> in batch (csv).
>
> However, in the prod environment, I will receive those files as Kafka
> messages (one message = one line of a file).
> So I am considering using a global window + a custom trigger on an
> end-of-file message and a process window function.
> But I cannot go too far with that, as process is only one function and
> chaining functions will be a pain. I don't think that emitting a datastream
> & window / trigger on EOF before every process function is a good idea.
>
> However, I would like to work in a bounded way once I have received all of
> my elements (after the trigger on the global window), like the DataSet API,
> as I will join on my whole dataset.
>
> I thought maybe it would be a good idea to go for the Table API and a group
> window? But you cannot have a custom trigger and a global group window on a
> table (like the global window on a datastream)?
> The best alternative would be to create a dataset as a result of my process
> window function, but I don't think this is possible, is it?
>
> Best Regards,
> Bastien
>
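
For readers following the thread, the "global window + custom trigger on an end-of-file message" idea from the quoted message can be sketched in plain Python. This is not Flink code. The message shape (file_id, payload), the EOF marker value, and the function name are all assumptions made for illustration; the original post only says that one message is one line of a file.

```python
from collections import defaultdict

EOF = "<EOF>"  # hypothetical end-of-file marker, not from the original post

def files_from_messages(messages):
    """Buffer lines per file and emit (file_id, lines) when that file's EOF arrives."""
    buffers = defaultdict(list)
    for file_id, payload in messages:
        if payload == EOF:
            # The "trigger" fires: hand over the complete, bounded set of lines.
            yield file_id, buffers.pop(file_id, [])
        else:
            buffers[file_id].append(payload)

# Hypothetical interleaved messages for two files.
messages = [
    ("orders", "1,book"),
    ("users", "1,alice"),
    ("orders", "2,pen"),
    ("orders", EOF),
    ("users", EOF),
]

complete = dict(files_from_messages(messages))
assert complete["orders"] == ["1,book", "2,pen"]
```

Once a file is complete, its lines form an ordinary bounded collection, so any batch-style processing (including a join across several completed files) can run on it. Files whose EOF never arrives simply stay buffered, which is the state-growth risk of this approach.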
