Thanks Fang, will do some reading up.

/Uwe
On Tue, Feb 17, 2015 at 11:01 PM, Yan Fang <yanfang...@gmail.com> wrote:

> Hi Uwe,
>
> Your use case seems to me more like a state-management case. What comes
> to my mind is this:
>
> 1) Every time a song is played, you update the count for that song. You do
> not keep the map in memory since, as you said, it could get quite large.
> Instead, you use Samza's built-in key-value storage. (You do all this in
> the process method.)
>
> 2) You scan the whole key-value DB every, say, one hour. (You do all this
> in the window method.)
>
> * This could provide better fault tolerance: for example, if your machine
> goes down during that hour, you will not lose any counts, because the
> key-value DB is restored.
>
> Some relevant links:
> * http://samza.apache.org/learn/documentation/0.8/container/state-management.html#windowed-aggregation
> * http://samza.apache.org/learn/documentation/0.8/container/state-management.html#approaches-to-managing-task-state
> * http://samza.apache.org/learn/documentation/0.8/container/state-management.html#key-value-storage
>
> Hope this helps.
>
> Cheers,
>
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
>
> On Tue, Feb 17, 2015 at 11:35 AM, Uwe Dauernheim <u...@dauernheim.net> wrote:
>
>> I'm trying to model a music charts system to get familiar with Samza.
>> Charts are defined as the top N entries with the highest counts in a map
>> from unique track ID (basically a song) to counter (basically the
>> number of plays of that track) during a sliding time window.
>>
>> The problem I see is the ever-growing size of this map, as the ID
>> space of tracks can be quite large (let's pick 2E6). Not all of these
>> IDs will be played (and thus counted) within a given time window
>> (let's pick 1 hour), but it's not obvious to me when to prune the map
>> during this sliding time window.
>>
>> I assume dealing with sliding time windows is a common case for stream
>> processing, so I'd expect Samza to provide some useful API for it. Does an
>> example or tutorial for this kind of sliding time-window counting exist?
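A minimal sketch of the process/window split Yan describes, written as plain Java so it runs without Samza on the classpath. `CountStore`, `ChartTask`, `onPlay` and `onWindow` are hypothetical names: in a real job the counts would go into a Samza `KeyValueStore` obtained from the task context, `onPlay` would correspond to `StreamTask.process`, `onWindow` to `WindowableTask.window` (fired every `task.window.ms` milliseconds), and the chart would be sent through the `MessageCollector` rather than returned. It also assumes the counts are cleared after each emission, which makes this a tumbling one-hour chart rather than a sliding one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a Samza KeyValueStore<String, Integer>.
// Samza's real store is backed by a changelog and restored after a failure;
// this HashMap is not.
class CountStore {
    private final Map<String, Integer> data = new HashMap<>();

    Integer get(String key) { return data.get(key); }
    void put(String key, Integer value) { data.put(key, value); }
    Map<String, Integer> all() { return data; }
    void clear() { data.clear(); }
}

// Hypothetical task: count on every play, rank once per window.
class ChartTask {
    private final CountStore store = new CountStore();
    private final int topN;

    ChartTask(int topN) { this.topN = topN; }

    // Role of process(): called once per "track played" message.
    void onPlay(String trackId) {
        Integer current = store.get(trackId);
        store.put(trackId, current == null ? 1 : current + 1);
    }

    // Role of window(): scan every count, keep the top N (ties broken by
    // track ID), then clear the store so the next window starts from zero.
    Map<String, Integer> onWindow() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(store.all().entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> chart = new LinkedHashMap<>(); // keeps rank order
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(topN, entries.size()))) {
            chart.put(e.getKey(), e.getValue());
        }
        store.clear();
        return chart;
    }
}
```

Sorting every key on each window is the simplest correct choice; with around 2E6 track IDs, a bounded min-heap of size N would avoid sorting the whole key space.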
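For the pruning question, one common technique (not something proposed in this thread) is to split the sliding window into fixed-size time buckets: each play increments a count in the bucket covering its timestamp, and whole buckets are dropped once they fall out of the window, so a track that stops being played disappears when its last bucket expires. The sketch below is plain Java; `SlidingCounter`, `record`, `countsAt` and the bucket and window sizes are hypothetical. In a Samza job the per-bucket counts would more likely live in the key-value store, for example under a composite bucket-plus-track key so that expired buckets can be found with a range scan; that layout is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sliding-window counter: counts are kept per time bucket so that
// whole buckets can be pruned once they fall out of the window.
class SlidingCounter {
    private final long bucketMs;
    private final long windowMs;
    // bucket start time -> (track ID -> plays in that bucket), ordered by time
    private final TreeMap<Long, Map<String, Integer>> buckets = new TreeMap<>();

    SlidingCounter(long bucketMs, long windowMs) {
        this.bucketMs = bucketMs;
        this.windowMs = windowMs;
    }

    // Record one play of trackId at time tsMs (the process() side).
    void record(String trackId, long tsMs) {
        long bucket = tsMs - Math.floorMod(tsMs, bucketMs);
        buckets.computeIfAbsent(bucket, b -> new HashMap<>()).merge(trackId, 1, Integer::sum);
    }

    // Prune expired buckets, then sum what is left (the window() side).
    Map<String, Integer> countsAt(long nowMs) {
        long cutoff = nowMs - windowMs;
        // A bucket starting at b covers [b, b + bucketMs); it has expired
        // once b + bucketMs <= cutoff, i.e. b <= cutoff - bucketMs.
        buckets.headMap(cutoff - bucketMs, true).clear();
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> bucket : buckets.values()) {
            bucket.forEach((id, n) -> totals.merge(id, n, Integer::sum));
        }
        return totals;
    }
}
```

The window edge is only accurate to one bucket: the oldest bucket kept may start slightly before the cutoff, so smaller buckets trade memory for precision.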