Thanks, Aljoscha and Stephan, for clearing up the doubt.

On Wed, Aug 9, 2017 at 7:37 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi,
>
> If you have one keyed state, say "count by email id", and many different
> keys, you will only have one column family in RocksDB (or one hash table).
> Actually, a lot of users have hundreds of millions of different keys for
> some states.
>
> Best,
> Aljoscha
>
> On 2. Aug 2017, at 14:59, shashank agarwal <shashank...@gmail.com> wrote:
>
> If I create a keyed state ("count by email id") and the keyed stream has
> 10 unique email IDs:
>
> Will it create 1 column family or hash table?
>
> Or will it create 10 column families or hash tables?
>
> Can I have millions of unique email IDs in that keyed state?
>
> On Tue, Aug 1, 2017 at 2:59 AM, shashank agarwal <shashank...@gmail.com>
> wrote:
>
>> OK, let me check with an example that I am understanding it right:
>>
>> If I create a keyed state named "total count by email" for
>> key (project id + email), then it will create a single hash table or
>> column family "total count by email", and all the unique email IDs will
>> be rows of that single hash table or column family, so I can store
>> millions of unique email IDs in it.
>>
>> Does that mean it will create only a single state object for all unique
>> email IDs?
>>
>> On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> Each keyed state in Flink is a hashtable or a column family in RocksDB.
>>> Having too many of those is not memory efficient.
>>>
>>> Having fewer states is better, if you can adapt your schema that way.
>>>
>>> I would also look into "MapState", which is an efficient way to have
>>> "sub keys" under a keyed state.
>>>
>>> Stephan
>>>
>>> On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank...@gmail.com
>>> > wrote:
>>>
>>>> Hello,
>>>>
>>>> I have to compute results on the basis of a lot of history data, with
>>>> parameters like total transactions in the last 1 month, last 1 day,
>>>> last 1 hour, etc., by email id, IP, mobile, name, address, zipcode, etc.
>>>>
>>>> So my question is: is it the right approach to create keyed state by
>>>> email, mobile, zipcode, etc., or should I create 1 big mapped state (BS)
>>>> and then process that BS, maybe in a process function or by applying
>>>> some loop and filter logic in a window or process function?
>>>>
>>>> My main worry is that I will end up with millions of states, because
>>>> there can be millions of unique emails, phone numbers or zipcodes if I
>>>> create keyed state by email, phone, etc.
>>>>
>>>> Am I right? Does this impact performance, or is this the wrong
>>>> approach? Which approach would you suggest for this use case?
>>>>
>>>> --
>>>> Thanks Regards
>>>>
>>>> SHASHANK AGARWAL
>>>> --- Trying to mobilize the things....
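
For anyone reading this thread later, here is a rough toy model of the point made above. It is plain Python, not Flink code, and it does not claim to show how Flink or RocksDB work internally. The class `ToyBackend`, its methods, the state name "tx count by email" and the sample email addresses are all made up for illustration. The only idea it tries to capture is that a table (hash table or column family) is created per registered state name, while each distinct key, and each MapState-style "sub key", is just another row inside that table.

```python
from collections import defaultdict


class ToyBackend:
    """Toy model only: one table per registered state name, one row per key."""

    def __init__(self):
        # state name -> {key: value}; a table appears when a state is first used
        self.tables = defaultdict(dict)

    def add(self, state_name, key, amount=1):
        """ValueState-like counter: the row for `key` lives in table `state_name`."""
        table = self.tables[state_name]
        table[key] = table.get(key, 0) + amount

    def add_sub(self, state_name, key, sub_key, amount=1):
        """MapState-like counter: a sub key is an extra row in the same table."""
        self.add(state_name, (key, sub_key), amount)


backend = ToyBackend()

# One keyed state, ten distinct email ids (hypothetical sample data).
emails = [f"user{i}@example.com" for i in range(10)]
for email in emails + emails[:3]:  # three of the ids are seen twice
    backend.add("count by email id", email)

# One MapState-like state holding several time buckets for one email.
for bucket in ("last_hour", "last_day", "last_month"):
    backend.add_sub("tx count by email", emails[0], bucket)

print(len(backend.tables))                       # 2 tables, one per state name
print(len(backend.tables["count by email id"]))  # 10 rows, one per email id
```

In this sketch, adding more email ids only adds rows, while registering more state names adds tables, which is the part Stephan describes as not memory efficient when there are too many.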