Hi,

If you have one keyed state, say "count by email id", and many different keys, you will still have only one column family in RocksDB (or one hash table). The individual keys are just entries inside it. In fact, a lot of users have hundreds of millions of different keys for some states.
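To make the "one state, many keys" point concrete, here is a minimal plain-Java sketch. It is a toy model only, not Flink's actual state backend code: the class name ToyKeyedStateBackend and its methods are made up for illustration, and the outer map entry merely stands in for one hash table or one RocksDB column family.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (hypothetical, not Flink code): one table per registered
 * state name, one row per key inside that table.
 */
class ToyKeyedStateBackend {
    // state name -> (key -> counter). Each outer entry stands in for one
    // hash table or one RocksDB column family.
    private final Map<String, Map<String, Long>> tables = new HashMap<>();

    /** Adds 1 to the counter stored under (stateName, key) and returns the new value. */
    long increment(String stateName, String key) {
        Map<String, Long> table =
                tables.computeIfAbsent(stateName, name -> new HashMap<>());
        return table.merge(key, 1L, Long::sum);
    }

    /** Number of tables, i.e. number of distinct state names seen so far. */
    int tableCount() {
        return tables.size();
    }

    /** Number of distinct keys stored under the given state name. */
    int rowCount(String stateName) {
        Map<String, Long> table = tables.get(stateName);
        return table == null ? 0 : table.size();
    }

    public static void main(String[] args) {
        ToyKeyedStateBackend backend = new ToyKeyedStateBackend();
        // Ten unique email ids, all written to the same named state.
        for (int i = 0; i < 10; i++) {
            backend.increment("count by email id", "user" + i + "@example.com");
        }
        System.out.println("tables = " + backend.tableCount());
        System.out.println("rows   = " + backend.rowCount("count by email id"));
    }
}
```

In this model, adding more email ids only adds rows to the existing table. The number of tables grows only when you register another state name.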

Best,
Aljoscha

> On 2. Aug 2017, at 14:59, shashank agarwal <shashank...@gmail.com> wrote:
>
> If I am creating KeyedState ("count by email id") and keyed stream has 10 unique email id's.
>
> Will it create 1 column family or hash table ?
>
> Or it will create 10 column family or hash table ?
>
> Can i have millions of unique email id in that keyed state ?
>
> On Tue, Aug 1, 2017 at 2:59 AM, shashank agarwal <shashank...@gmail.com> wrote:
>> Ok if i am taking it as right for an example :
>>
>> if i am creating a keyed state with name "total count by email" for key(project id + email) than it will create a single hash-table or column family "total count by email" and all the unique email id's will be rows of that single hash-table or column family and than i can store millions of unique email id's in that.
>>
>> Means it will create only single state object for all unique email id's ?
>>
>> On Tue, Aug 1, 2017 at 1:53 AM, Stephan Ewen <se...@apache.org> wrote:
>>> Each keyed state in Flink is a hashtable or a column family in RocksDB. Having too many of those is not memory efficient.
>>>
>>> Having fewer states is better, if you can adapt your schema that way.
>>>
>>> I would also look into "MapState", which is an efficient way to have "sub keys" under a keyed state.
>>>
>>> Stephan
>>>
>>> On Mon, Jul 31, 2017 at 6:01 PM, shashank agarwal <shashank...@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I have to compute results on basis of lot of history data, parameters like total transactions in last 1 month, last 1 day, last 1 hour etc. by email id, ip, mobile, name, address, zipcode etc.
>>>>
>>>> So my question is this right approach to create keyed state by email, mobile, zipcode etc. or should i create 1 big mapped state (BS) and than process that BS, may be in process function or by applying some loop and filter logic in window or process function.
>>>>
>>>> My main worry is i will end up with millions of states, because there can be millions unique emails, phone numbers or zipcode if i create keyed state by email, phone etc.
>>>>
>>>> am i right ? is this impact on the performance or is this wrong approach ? Which approach would you suggest in this use case.
>>>>
>>>> --
>>>> Thanks Regards
>>>>
>>>> SHASHANK AGARWAL
>>>> --- Trying to mobilize the things....