Hi Bastien,

Could you give more details about your scenario? Do you want to load another file from the same Kafka topic after the current file has been processed?

I am curious why you want to join data in a bounded way when the input is a stream. A stream-stream join outputs the same results as a batch join, so you don't have to wait until all data has been received before joining.

If you do want to process data this way, I think you can use a UDTF <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/udfs.html#table-functions> to achieve this behavior. In the UDTF, data would be loaded from Kafka and you could perform the processing (join) once all of the elements have been received.
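As a rough illustration, a buffering table function could look like the sketch below. The class name, the String element type, and the "EOF" marker are assumptions for illustration only, not a definitive implementation:

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.functions.TableFunction;

// Sketch only: buffers incoming lines until an assumed "EOF" marker
// arrives, then emits everything at once so downstream operators see a
// "complete" file. The buffer lives per parallel instance, so this
// assumes parallelism 1 for correctness.
public class BufferUntilEof extends TableFunction<String> {

    private final List<String> buffer = new ArrayList<>();

    public void eval(String line) {
        if ("EOF".equals(line)) {      // assumed end-of-file marker
            for (String buffered : buffer) {
                collect(buffered);     // emit the buffered "dataset"
            }
            buffer.clear();
        } else {
            buffer.add(line);          // file not complete yet
        }
    }
}

You would register it with tableEnv.registerFunction("bufferUntilEof", new BufferUntilEof()) and apply it with a LATERAL TABLE join, as described in the linked docs.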
Best,
Hequn

On Tue, Sep 25, 2018 at 10:07 PM bastien dine <bastien.d...@gmail.com> wrote:

> Hi Hequn,
>
> Thanks for your response.
> Yes, I know about the Table API, but I am searching for a way to have a
> bounded context with a stream, somehow creating a dataset from a buffer
> stored in a window of a datastream.
>
> Regards,
> Bastien
>
> Le mar. 25 sept. 2018 à 14:50, Hequn Cheng <chenghe...@gmail.com> a
> écrit :
>
>> Hi Bastien,
>>
>> Flink features two relational APIs, the Table API and SQL. Both APIs are
>> unified APIs for batch and stream processing, i.e., queries are executed
>> with the same semantics on unbounded, real-time streams or bounded,
>> recorded streams [1]. There are also documents about joins [2].
>>
>> Best, Hequn
>> [1] https://flink.apache.org/flink-applications.html#layered-apis
>> [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins
>>
>> On Tue, Sep 25, 2018 at 4:14 PM bastien dine <bastien.d...@gmail.com>
>> wrote:
>>
>>> Hello everyone,
>>>
>>> I need to join some files to perform some processing. The DataSet API
>>> is a perfect way to achieve this, and I am able to do it when I read
>>> the files in batch (CSV).
>>>
>>> However, in the prod environment, I will receive those files in Kafka
>>> messages (one message = one line of a file), so I am considering using
>>> a global window plus a custom trigger on an end-of-file message, and a
>>> process window function.
>>> But I cannot go too far with that, as process is only one function and
>>> chaining functions will be a pain. I don't think that emitting a
>>> datastream and windowing/triggering on EOF before every process
>>> function is a good idea.
>>>
>>> However, I would like to work in a bounded way once I have received all
>>> of my elements (after the trigger on the global window), like the
>>> DataSet API, as I will join on my whole dataset.
>>>
>>> I thought maybe it would be a good idea to go for the Table API and a
>>> group window? But you cannot have a custom trigger and a global group
>>> window on a table (like the global window on a datastream)?
>>> The best alternative would be to create a dataset as the result of my
>>> process window function, but I don't think this is possible, is it?
>>>
>>> Best regards,
>>> Bastien
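For reference, the global-window approach described in the quoted message could look roughly like the sketch below. The "EOF" marker and the EofTrigger name are assumptions, and this is only a sketch of the idea, not a definitive implementation:

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;

// Sketch only: fires (and purges) the global window when an assumed
// "EOF" marker element is seen; otherwise keeps buffering elements.
public class EofTrigger extends Trigger<String, GlobalWindow> {

    @Override
    public TriggerResult onElement(String element, long timestamp,
                                   GlobalWindow window, TriggerContext ctx) {
        return "EOF".equals(element)
                ? TriggerResult.FIRE_AND_PURGE
                : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, GlobalWindow window,
                                          TriggerContext ctx) {
        return TriggerResult.CONTINUE;  // no time-based firing
    }

    @Override
    public TriggerResult onEventTime(long time, GlobalWindow window,
                                     TriggerContext ctx) {
        return TriggerResult.CONTINUE;  // no time-based firing
    }

    @Override
    public void clear(GlobalWindow window, TriggerContext ctx) {
        // no timers or custom state to clean up in this sketch
    }
}

It would be applied as stream.windowAll(GlobalWindows.create()).trigger(new EofTrigger()).process(...), where the process window function (hypothetical here) receives the whole buffered file at once. This also shows the limitation raised in the thread: all "bounded" processing has to happen inside that single process function.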