Hi, Pan,
I was reading the 0.10.0 documentation on Samza state management. One
particular section that explains counting the number of page views for each
user stands out to me, as it also uses a full table scan to output
aggregation results:
Note that this job effectively pauses at the hour mark t
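For reference, a minimal sketch of that pattern, assuming the counts are reset each hour. It increments a per-user count on every message and, at the window, scans the whole table, emits every count, and clears the entries. This is not the code from the Samza documentation. A `TreeMap` stands in for Samza's `KeyValueStore`, where the scan would use an `all()` iterator that must be closed afterwards. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-user page-view counter; TreeMap stands in for a Samza KeyValueStore.
class PageViewCounter {
    private final TreeMap<String, Integer> store = new TreeMap<>();

    // Called once per incoming page-view message.
    void process(String userId) {
        store.merge(userId, 1, Integer::sum);
    }

    // Called at the hour mark: full table scan, emit each count, then clear the entry.
    List<String> window() {
        List<String> emitted = new ArrayList<>();
        Iterator<Map.Entry<String, Integer>> it = store.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            emitted.add(e.getKey() + "=" + e.getValue());
            it.remove(); // reset this user's count for the next hour
        }
        return emitted;
    }

    int size() {
        return store.size();
    }
}
```

The cost of `window()` grows with the number of distinct users in the store, not with the number of messages that arrived since the last window.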
Hi, David,
Generally speaking, an iterator takes a snapshot of the RocksDB key space,
so it carries some memory overhead. A more severe performance issue we have
seen before is that if you insert and delete tons of sessions in a short
time period, the iterator seek function can be extremely s
Hi, Yi,
Yes, the sessions are keyed by the sessionId.
In our case, iterating through all OPEN sessions is inevitable, since that
is precisely where we evaluate (based on timestamp) and close sessions. In
other words, the closed session queue you suggested cannot be constructed
without going throug
Hi, David,
I would recommend keeping a separate table of closed sessions as a "queue",
ordered by the time each session is closed. Then, in your window method,
create an iterator over the "queue", only advance it toward the end of
the "queue", and do a point deletion in the sessionStore, whi
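A minimal sketch of that layout, assuming two sorted stores. `sessionStore` is keyed by session ID, and `closedQueue` is keyed by close time and then session ID, so iteration order is close order and keys stay unique. `TreeMap` stands in for Samza's `KeyValueStore`. The names `closeSession`, `purgeClosed` and `queueKey` are hypothetical. The window step walks only the head of the queue and does point deletes in `sessionStore`, so it never scans open sessions.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the "closed sessions as a queue" layout; TreeMaps stand in for two Samza KeyValueStores.
class SessionTables {
    final TreeMap<String, String> sessionStore = new TreeMap<>(); // sessionId -> session state
    final TreeMap<String, String> closedQueue = new TreeMap<>();  // closeTime:sessionId -> sessionId

    // Zero-padded so lexicographic (byte) order matches numeric order; assumes closeTime >= 0.
    static String queueKey(long closeTime, String sessionId) {
        return String.format("%020d:%s", closeTime, sessionId);
    }

    // Enqueue a session at the moment it is decided to be closed.
    void closeSession(String sessionId, long closeTime) {
        closedQueue.put(queueKey(closeTime, sessionId), sessionId);
    }

    // window(): consume the queue from its head and point-delete each closed session.
    int purgeClosed() {
        int purged = 0;
        Iterator<Map.Entry<String, String>> it = closedQueue.entrySet().iterator();
        while (it.hasNext()) {
            sessionStore.remove(it.next().getValue()); // point deletion, no scan of open sessions
            it.remove();                               // consume the queue entry
            purged++;
        }
        return purged;
    }
}
```

The work per window is proportional to the number of sessions closed since the last window, rather than to the total number of sessions in the store.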
We use Samza's RocksDB key-value store to keep track of our user event
sessions. Samza periodically calls our task's window() method, which
updates all sessions in the store and purges all closed sessions.
We do all of this in the same iterator loop.
Here's how we are doing it:
public void window(MessageCollector collector, TaskCoor
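For comparison, here is a minimal sketch of the single-loop approach described in this post. One full scan evaluates every session by timestamp and deletes the ones that close, all inside the same iterator. It assumes a session closes after a fixed inactivity timeout. The `Session` class and the `timeoutMs` parameter are hypothetical, and a `TreeMap` stands in for the RocksDB-backed `KeyValueStore`. In Samza the loop would iterate `store.all()`, close the `KeyValueIterator` when done, and call `store.delete(key)`. Each of those deletes becomes a RocksDB tombstone, which is the churn Yi described.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Sketch of updating and purging sessions in one pass; TreeMap stands in for a Samza KeyValueStore.
class SingleLoopSessions {
    // Hypothetical session record: only the time of the last event seen.
    static final class Session {
        final long lastEventTime;

        Session(long lastEventTime) { this.lastEventTime = lastEventTime; }
    }

    final TreeMap<String, Session> sessionStore = new TreeMap<>();
    final long timeoutMs; // hypothetical inactivity timeout

    SingleLoopSessions(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // window(): one full scan that evaluates every session and deletes the ones that close.
    int window(long now) {
        int purged = 0;
        Iterator<Map.Entry<String, Session>> it = sessionStore.entrySet().iterator();
        while (it.hasNext()) {
            Session s = it.next().getValue();
            if (now - s.lastEventTime >= timeoutMs) {
                // A real job would emit the closed session to the collector here.
                it.remove(); // delete while iterating the same key space
                purged++;
            }
        }
        return purged;
    }
}
```

Every call touches all open sessions, which is the cost David describes as unavoidable when closing is decided by timestamp.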