Hey Ben,

Cool, please email if anything else comes up. Re: fresh-store, I think it should be possible to add a .clear() to the KV interface. This would create a new DB and delete the old one. Like the RocksDB TTL, it wouldn't result in any deletes being sent to the changelog, though. If this sounds useful, definitely open a JIRA for it.
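For illustration, a minimal sketch of what such a .clear() might look like, using a hypothetical in-memory stand-in rather than Samza's actual KeyValueStore interface (all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a clear() on a KV store: rather than deleting
// keys one by one, swap in a fresh backing store and discard the old
// one wholesale (analogous to "create a new DB and delete the old one").
class FreshableStore<K, V> {
    private Map<K, V> backing = new HashMap<>();

    public void put(K key, V value) { backing.put(key, value); }

    public V get(K key) { return backing.get(key); }

    // As with the RocksDB TTL, no per-key deletes are emitted here, so
    // an attached changelog would not observe these removals.
    public void clear() { backing = new HashMap<>(); }
}

public class ClearDemo {
    public static void main(String[] args) {
        FreshableStore<String, Integer> store = new FreshableStore<>();
        store.put("count", 42);
        store.clear();
        System.out.println(store.get("count")); // prints "null"
    }
}
```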
Cheers,
Chris

On Tue, Feb 17, 2015 at 12:10 AM, Benjamin Edwards <edwards.b...@gmail.com> wrote:

> I think, having followed along with the other thread, my initial approach
> was flawed. We use Cassandra in prod a ton (the classic Cassandra / Spark
> combo) at my job and have been running into a few issues with streaming,
> local state, etc. Hence my wanting to have a look at Samza. A very long
> way round to say that we use TTLs for lots of things! Thanks for the
> write-up about the interaction between the DB and the changelog. Very
> thorough. I might come back with a request about the fresh-store feature,
> but it definitely needs a bit more baking / experience with Samza.
>
> Ben
>
> On Tue, Feb 17, 2015 at 01:59:03, Chris Riccomini <criccom...@apache.org> wrote:
>
> > Hey Ben,
> >
> > The problem with TTL is that it's handled entirely internally in
> > RocksDB. There's no way for us to know when a key's been deleted. You
> > can work around this if you also alter the settings of your changelog
> > Kafka topic to be TTL-based, not log-compacted; then the two should
> > roughly match. For example, if you have a 1h TTL in RocksDB and a 1h
> > TTL in your Kafka changelog topic, then the semantics are ROUGHLY
> > equivalent. I say ROUGHLY because the two are going to be GC'ing
> > expired keys independently of one another.
> >
> > Also, during a restart, the TTLs in the RocksDB store will be fully
> > reset. For example, if you restart the job at minute 59 of a key's
> > life, then the Kafka topic will restore the key when the job starts,
> > and its TTL will reset back to 0 minutes in the RocksDB store (though,
> > a minute later, Kafka will drop it from the changelog). If you don't
> > need EXACT TTL guarantees, then this should be fine. If you do need
> > exact, then .all() is probably the way to go.
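As a concrete sketch of the TTL-based changelog workaround described above (the topic name and ZooKeeper address are placeholders, and the 1h values are just the example from the thread):

```
# Switch a changelog topic from log compaction to time-based retention,
# matching a 1h RocksDB TTL with a 1h Kafka retention.
kafka-topics.sh --zookeeper localhost:2181 --alter \
  --topic my-job-my-store-changelog \
  --config cleanup.policy=delete \
  --config retention.ms=3600000
```

Note that this gives up log compaction entirely for that topic, so restores after the retention window will only see keys written within the last hour.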
> > Cheers,
> > Chris
> >
> > On Mon, Feb 16, 2015 at 1:39 PM, Benjamin Edwards <edwards.b...@gmail.com> wrote:
> >
> > > Yes, I was using a changelog. You bring up a good point. I think I
> > > need to think harder about what I am trying to do. Maybe deleting all
> > > the keys isn't that bad, especially if I amortise it over the life of
> > > the next period.
> > >
> > > It seems like waiting for TTLs is probably the right thing to do
> > > ultimately.
> > >
> > > Thanks for the timely response!
> > >
> > > Ben
> > >
> > > On Sun, Feb 15, 2015 at 23:43:27, Chris Riccomini <criccom...@apache.org> wrote:
> > >
> > > > Hey Benjamin,
> > > >
> > > > You're right. Currently you have to call .all() and delete everything.
> > > >
> > > > RocksDB just committed TTL support to their Java library. This
> > > > feature allows data to be expired out automatically. Once RocksDB
> > > > releases their TTL patch (I believe in a few weeks, according to
> > > > Igor), we'll update Samza 0.9.0. Our tracker patch is here:
> > > >
> > > > https://issues.apache.org/jira/browse/SAMZA-537
> > > >
> > > > > Is there no way to just say I don't care about the old data,
> > > > > gimme a new store?
> > > >
> > > > We don't have this feature right now, but we could add it. This
> > > > feature is a bit more complicated when a changelog is attached,
> > > > since we will have to execute deletes for every key (we still need
> > > > to call .all()). Are you running with a changelog?
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > > > On Sun, Feb 15, 2015 at 10:41 AM, Benjamin Edwards <edwards.b...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trialling Samza for some windowed stream processing.
> > > > > Typically I want to aggregate a bunch of state over some window
> > > > > of messages, process the data, then drop the current state. The
> > > > > only way that I can see to do that at the moment is to delete
> > > > > every key. This seems expensive. Is there no way to just say I
> > > > > don't care about the old data, gimme a new store?
> > > > >
> > > > > Ben
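The delete-every-key approach discussed in the thread might be sketched like this, against a toy stand-in for the store (Samza's real KeyValueStore exposes a similar all()/delete() pair, but the class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy KV store with an all()/delete() pair, standing in for a store
// backed by RocksDB. Each delete() also records a changelog tombstone,
// to show why clearing this way is O(n) in the number of keys.
class WindowStore {
    private final Map<String, Integer> map = new HashMap<>();
    final List<String> changelogTombstones = new ArrayList<>();

    void put(String k, Integer v) { map.put(k, v); }
    Integer get(String k) { return map.get(k); }

    void delete(String k) {
        map.remove(k);
        changelogTombstones.add(k); // one tombstone record per key
    }

    List<String> all() { return new ArrayList<>(map.keySet()); }

    // Drop all window state by deleting every key individually -- the
    // only option today, and the reason a fresh-store .clear() that
    // skips per-key deletes would be cheaper.
    void dropWindowState() {
        for (String key : all()) {
            delete(key);
        }
    }
}
```

With a changelog attached, each of those per-key deletes has to be shipped to Kafka, which is what makes the fresh-store idea attractive for the "drop everything at window end" pattern.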