Re: Can I use lots of keyed states or should I use one big keyed state?

shashank agarwal Thu, 10 Aug 2017 10:39:37 -0700

Thanks, Aljoscha and Stephan, for clearing up my doubt.




On Wed, Aug 9, 2017 at 7:37 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
>
> If you have one keyed state, say "count by email id", and many different
> keys, you will only have one column family in RocksDB (or one hash table).
> In fact, a lot of users have hundreds of millions of different keys for some states.
>
> Best,
> Aljoscha
>
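To make Aljoscha's point concrete: the number of tables (hash tables on the heap, column families in RocksDB) follows the number of states you register, not the number of distinct keys flowing through them. The sketch below is a toy model of that scoping in plain Java, not Flink's actual state backend; the class `KeyedStateModel`, its methods, and the sample email ids are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of keyed-state scoping (NOT Flink's heap or RocksDB backend):
// one table per registered state name; every distinct stream key is one entry in that table.
class KeyedStateModel {
    // state name -> (stream key -> value)
    private final Map<String, Map<String, Long>> tables = new HashMap<>();
    private String currentKey;

    // Flink's runtime sets the current key before invoking user code for a record;
    // in this model the caller does it explicitly.
    void setCurrentKey(String key) {
        currentKey = key;
    }

    // Adds delta to the named state's value for the current key.
    void add(String stateName, long delta) {
        tables.computeIfAbsent(stateName, name -> new HashMap<>())
              .merge(currentKey, delta, Long::sum);
    }

    // Returns the named state's value for the current key, or null if never written.
    Long get(String stateName) {
        Map<String, Long> table = tables.get(stateName);
        return table == null ? null : table.get(currentKey);
    }

    int tableCount() {
        return tables.size();
    }

    int entryCount(String stateName) {
        Map<String, Long> table = tables.get(stateName);
        return table == null ? 0 : table.size();
    }

    public static void main(String[] args) {
        KeyedStateModel backend = new KeyedStateModel();
        // Ten distinct email ids, all written through the same "count by email id" state.
        for (int i = 0; i < 10; i++) {
            backend.setCurrentKey("user" + i + "@example.com");
            backend.add("count by email id", 1);
        }
        // A second event for one of the keys updates its existing entry.
        backend.setCurrentKey("user3@example.com");
        backend.add("count by email id", 1);

        System.out.println("tables: " + backend.tableCount());                     // tables: 1
        System.out.println("entries: " + backend.entryCount("count by email id")); // entries: 10
        System.out.println("user3: " + backend.get("count by email id"));          // user3: 2
    }
}
```

With 10 (or 10 million) email ids, the model still holds a single table; only the number of entries in it grows.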
> On 2. Aug 2017, at 14:59, shashank agarwal <shashank...@gmail.com> wrote:
>
> If I create a keyed state ("count by email id") and the keyed stream has 10
> unique email ids, will it create 1 column family or hash table?
>
> Or will it create 10 column families or hash tables?
>
> Can I have millions of unique email ids in that keyed state?
>
>
>
> On Tue, Aug 1, 2017 at 2:59 AM, shashank agarwal <shashank...@gmail.com>
> wrote:
>
>> OK, let me check that I understand correctly, with an example:
>>
>> If I create a keyed state named "total count by email" for the
>> key (project id + email), then it will create a single hash table or column
>> family "total count by email", and all the unique email ids will be rows of
>> that single hash table or column family, so I can store millions of
>> unique email ids in it.
>>
>> Does that mean it will create only a single state object for all unique email ids?
>>
>>
>>
>>
>> On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> Each keyed state in Flink is a hashtable or a column family in RocksDB.
>>> Having too many of those is not memory efficient.
>>>
>>> Having fewer states is better, if you can adapt your schema that way.
>>>
>>> I would also look into "MapState", which is an efficient way to have
>>> "sub keys" under a keyed state.
>>>
>>> Stephan
>>>
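Stephan's MapState suggestion, applied to the original question (transaction counts per email id over the last hour, day and month), might look roughly like the sketch below. It is a plain-Java model, not Flink code: it assumes the stream is keyed by email id and that each key's map uses an hour bucket as its sub-key. That bucketing scheme and the names `HourlyCounts`, `record`, `countLastHours` and `pruneOlderThan` are hypothetical. In a real job, the `HashMap` would be a `MapState<Long, Long>` obtained in a rich function via `getRuntimeContext().getMapState(new MapStateDescriptor<>("counts per hour", Long.class, Long.class))`.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the MapState<Long, Long> that a single key (one email id)
// would own in a Flink job keyed by email id. NOT Flink code: the HashMap plays the
// role of MapState. Sub-key = hour bucket (epoch millis / 1 hour), value = transaction count.
class HourlyCounts {
    static final long HOUR_MS = 60L * 60L * 1000L;

    private final Map<Long, Long> countsPerHour = new HashMap<>();

    // Record one transaction at timestampMs (analogous to a get + put on MapState).
    void record(long timestampMs) {
        countsPerHour.merge(timestampMs / HOUR_MS, 1L, Long::sum);
    }

    // Transactions in the last `hours` hour buckets, counting the bucket containing nowMs.
    // Granularity is the clock hour, so "last 1 hour" means "the current hour bucket".
    long countLastHours(long nowMs, int hours) {
        long currentBucket = nowMs / HOUR_MS;
        long total = 0;
        for (Map.Entry<Long, Long> entry : countsPerHour.entrySet()) {
            long bucket = entry.getKey();
            if (bucket > currentBucket - hours && bucket <= currentBucket) {
                total += entry.getValue();
            }
        }
        return total;
    }

    // Remove buckets older than the longest window still needed, keeping per-key state bounded.
    void pruneOlderThan(long nowMs, int maxHours) {
        long oldestKept = nowMs / HOUR_MS - maxHours + 1;
        countsPerHour.keySet().removeIf(bucket -> bucket < oldestKept);
    }

    int bucketCount() {
        return countsPerHour.size();
    }

    public static void main(String[] args) {
        HourlyCounts counts = new HourlyCounts();
        long now = 100 * HOUR_MS + 10 * 60 * 1000L; // 10 minutes into hour bucket 100
        counts.record(now);
        counts.record(now);
        counts.record(now - HOUR_MS);      // bucket 99
        counts.record(now - 30 * HOUR_MS); // bucket 70

        System.out.println(counts.countLastHours(now, 1));   // 2 (last hour)
        System.out.println(counts.countLastHours(now, 24));  // 3 (last day)
        System.out.println(counts.countLastHours(now, 720)); // 4 (last 30 days)

        counts.pruneOlderThan(now, 24);
        System.out.println(counts.bucketCount());            // 2
    }
}
```

The idea of this layout is that one registered state serves every email id and every time range: the day and month figures are sums over the same buckets, so no extra state per window is needed. The same pattern could be repeated for streams keyed by IP, mobile number or zip code.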
>>>
>>> On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have to compute results on the basis of a lot of historical data, using parameters
>>>> like total transactions in the last 1 month, last 1 day, last 1 hour, etc., by
>>>> email id, IP, mobile, name, address, zip code, etc.
>>>>
>>>> So my question is: is it the right approach to create keyed states by email,
>>>> mobile, zip code, etc., or should I create 1 big map state (BS) and then
>>>> process that BS, maybe in a process function or by applying some loop and
>>>> filter logic in a window or process function?
>>>>
>>>> My main worry is that I will end up with millions of states, because there
>>>> can be millions of unique emails, phone numbers or zip codes if I create keyed
>>>> states by email, phone, etc.
>>>>
>>>> Am I right? Does this impact performance, or is this the wrong
>>>> approach? Which approach would you suggest for this use case?
>>>>
>>>
>>
>


-- 
Thanks Regards

SHASHANK AGARWAL
 ---  Trying to mobilize the things....
