Re: Storing large lists into state per key

2017-12-20 Thread Stephan Ewen
Thanks a lot! On Tue, Dec 19, 2017 at 11:08 PM, Jan Lukavský wrote: > Hi, > > I filed a JIRA issue and pushed a PR for this. > > https://issues.apache.org/jira/browse/FLINK-8297 > > Best, > > Jan > > > On 12/14/2017 11:13 AM, Stephan Ewen wrote: > >> Hi Jan! >> >> One could implement the Rocks

Re: Storing large lists into state per key

2017-12-19 Thread Jan Lukavský
Hi, I filed a JIRA issue and pushed a PR for this. https://issues.apache.org/jira/browse/FLINK-8297 Best, Jan On 12/14/2017 11:13 AM, Stephan Ewen wrote: Hi Jan! One could implement the RocksDB ListState like you suggested. We did it the current way because that pattern is actually quit

Re: Storing large lists into state per key

2017-12-14 Thread Jan Lukavský
Hi Stephan, yes, definitely. I have put together a POC implementation that seems to work for my use case (not yet tested for performance, though). I have put together a PR, just for discussion of the topic, here: https://github.com/datadrivencz/flink/pull/1/ I know that the PR doesn't follo

Re: Storing large lists into state per key

2017-12-14 Thread Stephan Ewen
Hi Jan! One could implement the RocksDB ListState like you suggested. We did it the current way because that pattern is actually quite efficient if your list fits into memory: the list append is constant time, and the values are first concatenated only when the list is accessed. Especially for typical win
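
To illustrate the trade-off described here, the following is a rough sketch only, not the actual RocksDB ListState code. The class name `AppendedListSketch`, the comma delimiter, and the use of strings as elements are all assumptions made for the example. Appending just writes bytes to the end of one stored value, while reading has to materialize the whole list in memory at once:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: a list stored as one delimiter-separated byte blob. */
class AppendedListSketch {
    // Assumption for this sketch: elements never contain the delimiter.
    private static final byte DELIMITER = ',';

    private final ByteArrayOutputStream blob = new ByteArrayOutputStream();
    private int count = 0;

    /** Append is cheap: it only adds bytes to the end of the stored value. */
    void add(String value) {
        if (count > 0) {
            blob.write(DELIMITER);
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        blob.write(bytes, 0, bytes.length);
        count++;
    }

    /** Read is expensive for big lists: the whole blob is decoded into memory. */
    List<String> get() {
        List<String> result = new ArrayList<>();
        if (count == 0) {
            return result;
        }
        String all = new String(blob.toByteArray(), StandardCharsets.UTF_8);
        // Limit -1 keeps empty trailing elements, so the element count is preserved.
        for (String element : all.split(",", -1)) {
            result.add(element);
        }
        return result;
    }
}
```

With this layout there is no way to read a single element without decoding everything, which is why it works well only while the list fits into memory.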

Re: Storing large lists into state per key

2017-12-13 Thread Aljoscha Krettek
Hi, Using a MapState is a workaround that should work, but it would be nice if ListState worked for state that is too big to fit into memory. Best, Aljoscha > On 13. Dec 2017, at 17:40, Jan Lukavský wrote: > > Hi Aljoscha, > > thanks for the reply. Do you see any issues in implementing the li

Re: Storing large lists into state per key

2017-12-13 Thread Jan Lukavský
Hi Aljoscha, thanks for the reply. Do you see any issues in implementing the list state the way Fabian suggested (i.e. using the MapState)? I feel there are some open questions, mostly because the InternalListState (which I suppose the RocksDBListState should implement) extends InternalKvState, w

Re: Storing large lists into state per key

2017-12-13 Thread Aljoscha Krettek
Hi, If I remember correctly, there was actually an effort to change the RocksDB list state the way you described. I'm cc'ing Stephan, who was involved in that. This is the Jira issue: https://issues.apache.org/jira/browse/FLINK-5756 Best, A

Re: Storing large lists into state per key

2017-12-12 Thread Ovidiu-Cristian MARCU
Hi Jan, You could associate a key with each element of your key's list (e.g., by hashing the value), keep only the keys on the heap (e.g., in a list), and keep the associated key-value state in an external store like RocksDB/Redis, but you will notice large overheads due to de/serializing - a huge penalty f

Re: Storing large lists into state per key

2017-12-12 Thread Jan Lukavský
Hi Fabian, thanks for the quick reply. What you suggest seems to work at first sight; I will try it. Is there any reason not to implement a RocksDBListState this way in general? Is there any increased overhead of this approach? Thanks, Jan On 12/12/2017 11:17 AM, Fabian Hueske wrote: Hi Jan,

Re: Storing large lists into state per key

2017-12-12 Thread Fabian Hueske
Hi Jan, I cannot comment on the internal design, but you could put the data into a RocksDBStateBackend MapState where the value X is your data type and the key is the list index. You would need another ValueState for the current number of elements that you put into the MapState. A MapState allows
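
A minimal sketch of the pattern suggested above, in plain Java so that it runs without Flink on the classpath. The class name `IndexedListSketch` and its methods are hypothetical; the `HashMap` stands in for a `MapState` keyed by list index, and the `long` field stands in for the `ValueState` holding the current number of elements:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a list emulated by an index-keyed map plus an element counter. */
class IndexedListSketch<T> {
    // Stand-in for a MapState<Long, T>: list index -> element.
    private final Map<Long, T> entries = new HashMap<>();
    // Stand-in for a ValueState<Long>: number of elements stored so far.
    private long size = 0L;

    /** Appends by writing at the next free index and bumping the counter. */
    void add(T value) {
        entries.put(size, value);
        size++;
    }

    /** Reads a single element without touching the rest of the list. */
    T get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return entries.get(index);
    }

    long size() {
        return size;
    }

    /** Reads the elements in [from, to) one by one, so only that slice is materialized. */
    List<T> readRange(long from, long to) {
        if (from < 0 || to > size || from > to) {
            throw new IndexOutOfBoundsException("range [" + from + ", " + to + "), size " + size);
        }
        List<T> out = new ArrayList<>();
        for (long i = from; i < to; i++) {
            out.add(entries.get(i));
        }
        return out;
    }
}
```

The point of the layout is that each element is stored under its own key, so a reader can walk the list index by index instead of loading it as a single value. In a real job the two fields would be replaced by keyed state handles obtained from the runtime context.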

Storing large lists into state per key

2017-12-12 Thread Jan Lukavský
Hi all, I have a question that looks like a user@ question, but it brought me to the dev@ mailing list while I was browsing through Flink's source code. First I'll try to briefly describe my use case. I'm trying to do a group-by-key operation with a limited number of distinct keys (which I