Hi Shivam,

I think a non-window stream-stream join can solve your problem.
A non-window join stores all data from both inputs in state and emits joined
results as new rows arrive. The semantics of a non-window join are exactly
the same as those of a batch join.
One important thing to note is that the join state may grow indefinitely,
depending on the number of distinct input rows, so please provide a query
configuration with a valid retention interval [1] to prevent excessive state
size.
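To make the semantics concrete, here is a minimal plain-Java sketch (this is not Flink code; all class and method names are made up for illustration). It shows why the state grows: every row from each side must be retained so it can match rows that arrive later on the other side.

```java
import java.util.*;

// Plain-Java sketch (NOT Flink code) of non-window inner join semantics:
// every row of each input is kept in state indefinitely (hence the need for
// a retention interval), and each arriving row is joined against all rows
// seen so far on the other side. All names here are illustrative.
public class NonWindowJoinSketch {
    // Join state, keyed by the join key (userName).
    static final Map<String, List<Integer>> userIds = new HashMap<>();
    static final Map<String, List<Integer>> txAmounts = new HashMap<>();
    static final List<String> results = new ArrayList<>();

    // Left input: a {"userName": ..., "userId": ...} record arrives.
    static void onUserRecord(String userName, int userId) {
        userIds.computeIfAbsent(userName, k -> new ArrayList<>()).add(userId);
        for (int amount : txAmounts.getOrDefault(userName, Collections.emptyList())) {
            results.add(userName + "," + userId + "," + amount);
        }
    }

    // Right input: a {"user": ..., "transactionAmount": ...} record arrives.
    static void onTransaction(String userName, int amount) {
        txAmounts.computeIfAbsent(userName, k -> new ArrayList<>()).add(amount);
        for (int userId : userIds.getOrDefault(userName, Collections.emptyList())) {
            results.add(userName + "," + userId + "," + amount);
        }
    }

    public static void main(String[] args) {
        // Matching rows are joined regardless of arrival order.
        onUserRecord("shivam", 123);
        onTransaction("shivam", 3250);
        System.out.println(results); // [shivam,123,3250]
    }
}
```

In an actual Flink SQL job the equivalent state lives in the configured state backend (e.g. RocksDB) and is cleaned up according to the idle state retention time described in [1], rather than kept forever as in this sketch.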

Let me know if you have any other questions.

Best, Hequn

[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming.html#idle-state-retention-time

On Mon, Jul 16, 2018 at 5:18 PM, Shivam Sharma <28shivamsha...@gmail.com>
wrote:

> Hi Vino,
>
> First, I want to mention that we are working with Flink SQL, so we cannot
> use the DataStream API.
>
> Here is an example of my use case:
>
> Let's say we have two Kafka Topics:
>
>    1. UserName to UserId Mapping => {"userName": "shivam", "userId": 123}
>    2. User transactions information in which username is coming. => {"user":
>    "shivam", "transactionAmount": 3250}
>
> The final result should look like this => {"user": "shivam", "userId": 123,
> "transactionAmount": 3250}
>
> SQL query for this: SELECT t2.user, t1.userId, t2.transactionAmount FROM
> userTable AS t1 JOIN transactionTable AS t2 ON t1.userName = t2.user
>
> Now, whenever a transaction happens, we need to add the userId to the
> record using Flink SQL, which means joining these two streams. So we need
> to store the userName-to-userId mapping somewhere, e.g. in RocksDB
>
> Thanks
>
> On Mon, Jul 16, 2018 at 12:04 PM vino yang <yanghua1...@gmail.com> wrote:
>
>> Hi Shivam,
>>
>> Can you provide more details about your use case? Is the join for batch
>> or streaming? Which join type (window, non-window, or stream-dimension-table
>> join)?
>>
>> If it is a stream-dimension-table join and the table is huge, using Redis
>> or some other in-memory cache can help with your problem. You can also
>> customize Flink's physical plan (as Hequn said) and use an async operator
>> to optimize access to the third-party system.
>>
>> Thanks,
>> Vino yang.
>>
>> 2018-07-16 9:17 GMT+08:00 Hequn Cheng <chenghe...@gmail.com>:
>>
>>> Hi Shivam,
>>>
>>> Currently, Flink SQL/Table API supports window joins and non-window
>>> joins [1].
>>> If your requirements are not met by SQL/Table API, you can also
>>> use the DataStream API to implement your own logic. You can refer to the
>>> non-window join implementation as an example [2][3].
>>>
>>> Best, Hequn
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins
>>> [2] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamJoin.scala
>>> [3] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/NonWindowInnerJoin.scala
>>>
>>> On Sun, Jul 15, 2018 at 11:29 PM, Shivam Sharma <
>>> 28shivamsha...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have a use case in which we need to persist a table in Flink that
>>>> can later be joined with other tables. This table can be huge, so we
>>>> need to store it off-heap while keeping access fast. Any suggestions
>>>> regarding this?
>>>>
>>>> --
>>>> Shivam Sharma
>>>> Data Engineer @ Goibibo
>>>> Indian Institute Of Information Technology, Design and Manufacturing
>>>> Jabalpur
>>>> Mobile No- (+91) 8882114744
>>>> Email:- 28shivamsha...@gmail.com
>>>> LinkedIn:- https://www.linkedin.com/in/28shivamsharma
>>>>
>>>
>>>
>>
>
> --
> Shivam Sharma
> Data Engineer @ Goibibo
> Indian Institute Of Information Technology, Design and Manufacturing
> Jabalpur
> Mobile No- (+91) 8882114744
> Email:- 28shivamsha...@gmail.com
> LinkedIn:- https://www.linkedin.com/in/28shivamsharma
>
