Fabian, could you please clarify the following statement:
However joining an append-only table with this view without adding a temporal join condition means that the stream is fully materialized as state. This is because previously emitted results must be updated when the view changes. It really depends on the semantics of the join and query that you need, how much state the query will need to maintain.

I am not sure I understand the problem. If I have two append-only tables and perform some join on them, what's the issue?

On Tue, Aug 6, 2019 at 8:03 PM Maatary Okouya <maatarioko...@gmail.com> wrote:

> Thank you for the clarification. Really appreciated.
>
> Is LAST_VAL part of the API?
>
> On Fri, Aug 2, 2019 at 10:49 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi,
>>
>> Flink does not distinguish between streams and tables. For the Table API / SQL, there are only tables that are changing over time, i.e., dynamic tables.
>> A Stream in the Kafka Streams or KSQL sense is, in Flink, a Table with append-only changes, i.e., records are only inserted and never deleted or modified.
>> A Table in the Kafka Streams or KSQL sense is, in Flink, a Table that has upsert and delete changes, i.e., the table has a unique key and records are inserted, deleted, or updated per key.
>>
>> In the current version, Flink does not have native support to ingest an upsert stream as a dynamic table (right now only append-only tables can be ingested; native support for upsert tables will be added soon).
>> However, you can create a view with the following SQL query on an append-only table that creates an upsert table:
>>
>> SELECT key, LAST_VAL(v1), LAST_VAL(v2), ...
>> FROM appendOnlyTable
>> GROUP BY key
>>
>> Given this view, you can run all kinds of SQL queries on it.
>> However, joining an append-only table with this view without adding a temporal join condition means that the stream is fully materialized as state.
>> This is because previously emitted results must be updated when the view changes.
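[Editor's note: Fabian's point about the stream being "fully materialized as state" can be sketched in plain Python. This is not Flink code; the names `on_stream`, `on_view_update`, `stream_state`, and `emitted` are invented for illustration. The idea: since a previously emitted result must be corrected whenever the view's value for its key changes, the join can never discard old stream records.]

```python
# Sketch of a non-temporal stream/view join. When the view's value for a
# key changes, every previously emitted result for that key must be
# updated, so all stream records must be kept in state indefinitely.

stream_state = []  # grows without bound: every stream record ever seen
view = {}          # latest value per key (the upsert view)
emitted = {}       # results emitted so far, indexed by stream record id

def on_stream(rec_id, key, payload):
    stream_state.append((rec_id, key, payload))
    if key in view:
        emitted[rec_id] = (payload, view[key])

def on_view_update(key, value):
    view[key] = value
    # correct every previously emitted result that joined on this key
    for rec_id, k, payload in stream_state:
        if k == key:
            emitted[rec_id] = (payload, value)

on_view_update("a", 10)
on_stream(1, "a", "x")       # emits ("x", 10)
on_view_update("a", 20)      # the result for record 1 must be updated
print(emitted)               # {1: ('x', 20)}
```

A temporal join condition avoids this growth by pinning each result to the view's value as of the stream record's time, so old results never need to be revised.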
>> It really depends on the semantics of the join and query that you need, how much state the query will need to maintain.
>>
>> An alternative to using the Table API / SQL and its dynamic table abstraction is to use Flink's DataStream API and ProcessFunctions.
>> These APIs are more low-level and expose access to state and timers, which are the core ingredients for stream processing.
>> You can implement pretty much all the logic of KStreams, and more, in these APIs.
>>
>> Best, Fabian
>>
>>
>> On Tue., Jul 23, 2019 at 13:06, Maatary Okouya <maatarioko...@gmail.com> wrote:
>>
>>> I would like to have a KTable, or maybe in Flink terms a dynamic Table, that only contains the latest value for each keyed record. This would allow me to perform aggregations and joins based on the latest state of every record, as opposed to every record over time, or over a period of time.
>>>
>>> On Sun, Jul 21, 2019 at 8:21 AM miki haiat <miko5...@gmail.com> wrote:
>>>
>>>> Can you elaborate more about your use case?
>>>>
>>>> On Sat, Jul 20, 2019 at 1:04 AM Maatary Okouya <maatarioko...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have been a user of Kafka Streams so far. However, I have been faced with several limitations, in particular in performing joins on KTables.
>>>>>
>>>>> I was wondering what the approach in Flink is to achieve (1) the concept of a KTable, i.e. a Table that represents a changelog, i.e. only the latest version of all keyed records, and (2) joining those.
>>>>>
>>>>> There are currently a lot of limitations around that in Kafka Streams, and I need it for performing an ETL process, where I need to mirror entire databases into Kafka and then do some joins on the tables to emit the logical entities into Kafka topics. I was hoping that somehow I could achieve that by using Flink as an intermediary.
>>>>>
>>>>> I can see that you support many kinds of joins, but I just don't see the notion of a KTable.
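[Editor's note: the LAST_VAL upsert view Fabian describes, and the KTable-style "latest value per key" semantics asked about in the original question, both boil down to upsert-by-key. A minimal plain-Python sketch of that semantics (not Flink API code; the data is made up for illustration):]

```python
# Semantics of: SELECT key, LAST_VAL(v) FROM appendOnlyTable GROUP BY key
# Each record in the append-only stream inserts or overwrites the value
# for its key, so the view always holds only the latest value per key.

append_only = [
    ("a", 1),
    ("b", 2),
    ("a", 3),  # newer value for key "a" replaces 1
]

upsert_view = {}
for key, value in append_only:
    upsert_view[key] = value  # upsert: insert or update per key

print(upsert_view)  # {'a': 3, 'b': 2}
```

Queries joining against such a view then see only the latest value per key, which matches the KTable behavior described above.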