Thanks Sylvain.

Can you please point us to the interface that should be implemented in order to write our own custom compaction? And how is it supposed to be configured?

-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Tuesday, July 19, 2011 11:40 AM
To: user@cassandra.apache.org
Subject: Re: How to keep only exactly column of key

On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan <lio...@taboola.com> wrote:
> Can't this capping be done (approximately) during compaction?
> Something like:
>
> 1. Ability to define for a column family that it's a "capped
> collection" with at most N columns per row
>
> 2. During write - just add the column
>
> 3. During reads - get a slice with the most recent / top N
> columns (in terms of column order)
>
> 4. During compaction - if the number of columns in the row is
> more than N, trim it to the top N columns (by replacing the rest of
> the columns with a tombstone in the compacted row)
>
> Since I guess the purpose of this is for automated cleanup, and not
> for enforcing exactly N columns, I think this would be sufficient.

The problem with that is that we cannot enforce this on the query side.
Or more precisely, returning the first N columns is fine, but what about
queries like "M columns starting from 'b'"? Or columns by name? We cannot
do those efficiently while enforcing that we won't return any columns
after the first N. The only solution would be to always query the
first N columns and then filter afterwards, but that's not efficient.

What I mean here is that it is hard to add that as a column family option
given the limitation it would entail.

That being said, 1.0 will add pluggable compaction (it's already in trunk),
and it will be very easy to have a compaction that just drops columns after
the first N. It would then be up to the client side to deal with the
possibility of getting more than the first N columns, but as you said, if it
is for automated cleanup, that will be enough.

--
Sylvain

> From: Tupshin Harper [mailto:tups...@tupshin.com]
> Sent: Tuesday, July 19, 2011 10:04 AM
> To: user@cassandra.apache.org
> Subject: Re: How to keep only exactly column of key
>
> Speaking from practical experience, it is possible to simulate this
> feature by retrieving a slice of your row that only contains the most
> recent 100 items. You can then prevent the rows from growing out of
> control by checking the size of the row and pruning it back to 100
> every N writes, where N is small enough to prevent excessive growth,
> but large enough to prevent excessive overhead. A value of 50 or so
> for N worked reasonably well for me. If you do go down this path,
> though, keep in mind that rapid writes and deletes to a single column
> are basically a Cassandra anti-pattern due to performance problems
> with huge numbers of tombstones.
>
> I would love to see a feature added similar to MongoDB's "capped
> collections", but I don't believe there is any easy way to retrofit it
> into Cassandra's sstable approach.
> http://www.mongodb.org/display/DOCS/Capped+Collections
>
> -Tupshin
>
> On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight <beukni...@gmail.com>
> wrote:
>
> Dear all,
>
> I want to keep only 100 columns per key: when I add a column to a
> key that already has 100 columns, another column (by order) should be
> deleted.
>
> Does Cassandra have a setting for that?
>
> --
> Best regards,
> JKnight
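
For readers following the thread, the two capping ideas discussed above (trimming a row to its first N columns at compaction time, and pruning from the client every few writes) can be sketched in plain Python. This is only an in-memory model and makes no Cassandra calls: a row is a dict of column name to value, column names are assumed to be integer timestamps, and the comparator is assumed to be reversed so that the newest columns sort first. The names `first_n`, `trim_on_compaction`, and `CappedRow` are hypothetical and exist only for this illustration.

```python
def first_n(columns, n):
    """Return the first n column names in (assumed) reversed comparator
    order, i.e. the n newest timestamps, newest first."""
    return sorted(columns, reverse=True)[:n]


def trim_on_compaction(columns, n):
    """Model of a compaction pass that keeps only the first n columns
    of a row and drops the rest."""
    return {name: columns[name] for name in first_n(columns, n)}


class CappedRow:
    """Model of the client-side approach: write freely, read a slice of
    the first `cap` columns, and prune back to `cap` every `prune_every`
    writes."""

    def __init__(self, cap=100, prune_every=50):
        self.cap = cap
        self.prune_every = prune_every
        self.columns = {}            # column name (timestamp) -> value
        self.writes_since_prune = 0
        self.deletes_issued = 0      # each delete would be a tombstone

    def insert(self, name, value):
        """Add a column; trigger a prune once enough writes accumulate."""
        self.columns[name] = value
        self.writes_since_prune += 1
        if self.writes_since_prune >= self.prune_every:
            self.prune()

    def prune(self):
        """Delete every column beyond the first `cap`."""
        keep = set(first_n(self.columns, self.cap))
        excess = [name for name in self.columns if name not in keep]
        for name in excess:
            del self.columns[name]
        self.deletes_issued += len(excess)
        self.writes_since_prune = 0

    def latest(self, count=None):
        """Read a slice of at most `cap` columns, newest first, so that
        columns awaiting the next prune are never returned."""
        count = self.cap if count is None else min(count, self.cap)
        return [(name, self.columns[name])
                for name in first_n(self.columns, count)]
```

With `cap=100` and `prune_every=50`, the stored row in this model never holds more than 150 columns between prunes, while `latest()` never returns more than 100. The `deletes_issued` counter is there as a reminder of the tombstone cost mentioned in the thread: every pruned column corresponds to a delete.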