Hi Vino,

First, I want to mention that we are working with Flink SQL, so there is no
option to use the DataStream API.

Here is an example of my use case:

Let's say we have two Kafka Topics:

   1. UserName-to-UserId mapping => {"userName": "shivam", "userId": 123}
   2. User transaction information, which carries the username => {"user":
   "shivam", "transactionAmount": 3250}

The final result should look like this => {"user": "shivam", "userId": 123,
"transactionAmount": 3250}

The SQL query for this:

  SELECT t2.user, t1.userId, t2.transactionAmount
  FROM userTable AS t1
  JOIN transactionTable AS t2 ON t1.userName = t2.user
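
To make it concrete, here is a rough sketch of how I imagine the job
(Flink 1.5-style Table API in Java; registering the two Kafka topics as
tables, e.g. via a Kafka JSON table source, is assumed to be done already
and is omitted here):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class UserEnrichmentJob {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

    // Assumed to be registered already from the two Kafka topics:
    //   userTable(userName STRING, userId INT)
    //   transactionTable(user STRING, transactionAmount INT)

    Table enriched = tEnv.sqlQuery(
        "SELECT t2.user, t1.userId, t2.transactionAmount "
      + "FROM userTable AS t1 JOIN transactionTable AS t2 ON t1.userName = t2.user");

    // For testing; in production this would be written to a sink table instead.
    tEnv.toRetractStream(enriched, Row.class).print();

    env.execute("user enrichment");
  }
}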

Now, whenever a transaction happens, we need to add the userId to the record
as well, using Flink SQL. We need to join these two streams, so the
userName-to-userId mapping has to be stored somewhere, e.g. in RocksDB.
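
As far as I understand, the state of such a non-window join lives in the
configured state backend, so my idea is to use the RocksDB state backend to
keep the mapping off-heap. Roughly, right after creating the
StreamExecutionEnvironment in the sketch above (the checkpoint URI is just a
placeholder):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;

// Keep operator state, including the join state, off-heap in RocksDB.
// The second argument enables incremental checkpoints.
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));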

Thanks

On Mon, Jul 16, 2018 at 12:04 PM vino yang <yanghua1...@gmail.com> wrote:

> Hi Shivam,
>
> Can you provide more details about your use case? Is the join for batch or
> streaming? Which join type (window, non-window, or stream-dimension table
> join)?
>
> If it is a stream-dimension table join and the table is huge, using Redis or
> some memory-based cache can help with your problem. You can also customize
> Flink's physical plan (like Hequn said) and use the async operator to
> optimize access to the third-party system.
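>
> A rough sketch of such an async lookup (Flink 1.5 async I/O; the lookup
> body is just a placeholder for the real Redis/cache call):
>
> import java.util.Collections;
> import java.util.concurrent.CompletableFuture;
> import org.apache.flink.api.java.tuple.Tuple2;
> import org.apache.flink.streaming.api.functions.async.ResultFuture;
> import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
>
> public class DimensionLookup extends RichAsyncFunction<String, Tuple2<String, Integer>> {
>   @Override
>   public void asyncInvoke(String key, ResultFuture<Tuple2<String, Integer>> result) {
>     // Run the lookup on a separate thread and complete the future when done.
>     CompletableFuture
>         .supplyAsync(() -> lookup(key))
>         .thenAccept(value -> result.complete(
>             Collections.singletonList(Tuple2.of(key, value))));
>   }
>
>   private Integer lookup(String key) {
>     // Placeholder: query Redis or another memory-based cache here.
>     return -1;
>   }
> }
>
> // Apply it with a timeout and a bound on in-flight requests, e.g.:
> // AsyncDataStream.unorderedWait(keys, new DimensionLookup(), 1000,
> //     TimeUnit.MILLISECONDS, 100);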
>
> Thanks,
> Vino yang.
>
> 2018-07-16 9:17 GMT+08:00 Hequn Cheng <chenghe...@gmail.com>:
>
>> Hi Shivam,
>>
>> Currently, Flink SQL / the Table API supports window joins and non-window
>> joins[1]. If your requirements are not met by SQL / the Table API, you can
>> also use the DataStream API to implement your own logic. You can refer to
>> the non-window join implementation as an example[2][3].
>>
>> Best, Hequn
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#joins
>> [2]
>> https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamJoin.scala
>> [3]
>> https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/NonWindowInnerJoin.scala
>>
>> On Sun, Jul 15, 2018 at 11:29 PM, Shivam Sharma <28shivamsha...@gmail.com
>> > wrote:
>>
>>> Hi,
>>>
>>> We have one use case in which we need to persist a table in Flink that
>>> can later be joined with other tables. This table can be huge, so we need
>>> to store it off-heap while still allowing fast access. Any suggestions
>>> regarding this?
>>>
>>> --
>>> Shivam Sharma
>>> Data Engineer @ Goibibo
>>> Indian Institute Of Information Technology, Design and Manufacturing
>>> Jabalpur
>>> Mobile No- (+91) 8882114744
>>> Email:- 28shivamsha...@gmail.com
>>> LinkedIn:- https://www.linkedin.com/in/28shivamsharma
>>>
>>
>>
>

-- 
Shivam Sharma
Data Engineer @ Goibibo
Indian Institute Of Information Technology, Design and Manufacturing
Jabalpur
Mobile No- (+91) 8882114744
Email:- 28shivamsha...@gmail.com
LinkedIn:- https://www.linkedin.com/in/28shivamsharma
