Hi Kostas,

Thanks for responding. Details in-line below.

> On Apr 27, 2017, at 1:19am, Kostas Kloudas <k.klou...@data-artisans.com> 
> wrote:
> 
> Hi Ken,
> 
> Unfortunately, iterating over all keys is not currently supported.
> 
> Do you have your own custom operator (because you mention “from within the 
> operator…”) or
> you have a process function (because you mention the “onTimer” method)?

Currently it’s a process function, but I might be able to just use a regular 
operator.

> Also, could you describe your use case a bit more?  You have a periodic timer 
> per key and when
> a timer for a given key fires you want to have access to the state of all the 
> keys?

The timer is there because I’m filling an async queue, and thus need to trigger 
emitting tuples to the operator’s output stream independently of inbound tuples.

The main problems I’m trying to solve (without requiring a separate scalable DB 
infrastructure) are:

 - entries have an associated “earliest processing time”. I don’t want to send 
these through the system until that time has passed.
 - entries have an associated “score”. I want to favor processing high scoring 
entries over low scoring entries.
 - if an entry’s score is too low, I want to archive it, rather than constantly 
re-evaluating it using the above two factors.
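
To make the three rules above concrete, here is a minimal sketch in plain Java of one selection pass over a set of entries. It is not Flink API and not my actual DB code: the names Entry, Selection, EntrySelector, minScore and n are all made up for illustration, and it assumes the entries can be iterated at all, which is exactly the open question in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical entry type; the field names are illustrative only. */
final class Entry {
    final String key;
    final long earliestProcessingTime; // epoch millis
    final double score;

    Entry(String key, long earliestProcessingTime, double score) {
        this.key = key;
        this.earliestProcessingTime = earliestProcessingTime;
        this.score = score;
    }
}

/** Result of one selection pass (hypothetical helper type). */
final class Selection {
    final List<Entry> toEmit = new ArrayList<>();    // highest score first
    final List<Entry> toArchive = new ArrayList<>(); // score below the threshold
}

final class EntrySelector {
    /**
     * Archives entries scoring below minScore, skips entries that are not
     * yet due at time 'now', and keeps the n best-scoring remaining entries.
     */
    static Selection select(Iterable<Entry> entries, long now, double minScore, int n) {
        Selection result = new Selection();
        // Min-heap on score: the head is the weakest of the current best n.
        PriorityQueue<Entry> best =
                new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.score));
        for (Entry e : entries) {
            if (e.score < minScore) {
                result.toArchive.add(e); // too low: archive, never re-evaluate
                continue;
            }
            if (e.earliestProcessingTime > now) {
                continue; // not due yet: leave it for a later pass
            }
            best.add(e);
            if (best.size() > n) {
                best.poll(); // drop the lowest-scoring candidate
            }
        }
        // The heap drains in ascending score order, so reverse at the end.
        while (!best.isEmpty()) {
            result.toEmit.add(best.poll());
        }
        Collections.reverse(result.toEmit);
        return result;
    }
}
```

The bounded heap keeps memory at O(n) for the emit side, so the cost of a pass is dominated by the scan over all entries.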

I’ve got my own custom DB that is working for the above, and scales to target 
sizes of 1B+ entries per server by using a mixture of RAM and disk.

But having to checkpoint it isn’t trivial.

So I thought that if there was a way to (occasionally) iterate over the keys in 
the state backend, I could get what I needed with the minimum effort.

But it sounds like that’s not possible currently.

Thanks,

— Ken



>> On Apr 27, 2017, at 3:02 AM, Ken Krugler <kkrugler_li...@transpac.com 
>> <mailto:kkrugler_li...@transpac.com>> wrote:
>> 
>> Is there a way to iterate over all of the key/value entries in the state 
>> backend, from within the operator that’s making use of the same?
>> 
>> E.g. I’ve got a ReducingState, and on a timed interval (inside of the 
>> onTimer method) I need to iterate over all KV state and emit the N “best” 
>> entries.
>> 
>> What’s the recommended approach?
>> 
>> Thanks,
>> 
>> — Ken
>> 
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr


