The ticket for pluggable compaction is https://issues.apache.org/jira/browse/CASSANDRA-1610. It has not been released yet, so there is no real documentation for it. But if you really want to look into it, you can start with AbstractCompactionStrategy in trunk.
--
Sylvain

On Wed, Jul 20, 2011 at 10:57 AM, Lior Golan <lio...@taboola.com> wrote:
> Thanks Sylvain
>
> Can you please point us to what interface should be implemented in order to
> write our own custom compaction? And how is it supposed to be configured?
>
> -----Original Message-----
> From: Sylvain Lebresne [mailto:sylv...@datastax.com]
> Sent: Tuesday, July 19, 2011 11:40 AM
> To: user@cassandra.apache.org
> Subject: Re: How to keep only exactly column of key
>
> On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan <lio...@taboola.com> wrote:
>> Can't this capping be done (approximately) during compaction?
>> Something like:
>>
>> 1. Ability to define for a column family that it's a "capped
>>    collection" with at most N columns per row
>>
>> 2. During write - just add the column
>>
>> 3. During reads - get a slice with the most recent / top N
>>    columns (in terms of column order)
>>
>> 4. During compaction - if the number of columns in the row is
>>    more than N, trim it to the top N columns (by replacing the rest of
>>    the columns with a tombstone in the compacted row)
>>
>> Since I guess the purpose of this is for automated cleanup, and not
>> for enforcing exactly N columns, I think this would be sufficient
>
> The problem with that is that we cannot enforce this on the query side.
> Or more precisely, returning the first N columns is fine, but what about a
> query like "M columns starting from 'b'"? Or columns by name?
> We cannot do those efficiently while enforcing that we won't return any
> columns after the N first ones. The only solution would be to always query
> the first N ones and then filter afterwards, but that's not efficient.
>
> What I mean here is that it is hard to add that as a column family option
> given the limitation it would entail. That being said, 1.0 will add pluggable
> compaction (it's already in trunk) and it will be very easy to have a
> compaction that just drops columns after the N first. It would then be on the
> client side to deal with the possibility of getting more than the first N ones,
> but as you said, if it is for automated cleanup, that will be enough.
>
> --
> Sylvain
>
>> From: Tupshin Harper [mailto:tups...@tupshin.com]
>> Sent: Tuesday, July 19, 2011 10:04 AM
>> To: user@cassandra.apache.org
>> Subject: Re: How to keep only exactly column of key
>>
>> Speaking from practical experience, it is possible to simulate this
>> feature by retrieving a slice of your row that only contains the most
>> recent 100 items. You can then prevent the rows from growing out of
>> control by checking the size of the row and pruning it back to 100
>> every N writes, where N is small enough to prevent excessive growth,
>> but large enough to prevent excessive overhead. A value of 50 or so
>> for N worked reasonably well for me. If you do go down this path,
>> though, keep in mind that rapid writes and deletes to a single column
>> are basically a Cassandra anti-pattern due to performance problems with huge
>> numbers of tombstones.
>>
>> I would love to see a feature added similar to MongoDB's "capped
>> collections", but I don't believe there is any easy way to retrofit it
>> into Cassandra's sstable approach.
>> http://www.mongodb.org/display/DOCS/Capped+Collections
>>
>> -Tupshin
>>
>> On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight <beukni...@gmail.com>
>> wrote:
>>
>> Dear all,
>>
>> I want to keep only 100 columns per key: when I add a column to a
>> key that already has 100 columns, another column (by order) should be
>> deleted.
>>
>> Does Cassandra have a setting for that?
>>
>> --
>> Best regards,
>> JKnight
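
For anyone reading this thread later, the four-step "capped collection" idea quoted above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: CappedRow, TOMBSTONE, read_top, slice_from and compact are hypothetical names invented here, and none of this is Cassandra's API or the AbstractCompactionStrategy interface. It also shows the caveat raised in the thread: between compactions, a slice that starts past the first N columns can still return columns beyond the cap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical marker for a deleted column (not Cassandra's tombstone format).
TOMBSTONE = object()


@dataclass
class CappedRow:
    """Toy model of one row capped at `cap` columns; not Cassandra code."""
    cap: int
    columns: Dict[str, object] = field(default_factory=dict)

    def write(self, name: str, value: object) -> None:
        # Step 2: a write just adds the column; no capping work happens here.
        self.columns[name] = value

    def _live_names(self) -> List[str]:
        # Column names in column order, skipping tombstoned columns.
        return [n for n in sorted(self.columns)
                if self.columns[n] is not TOMBSTONE]

    def read_top(self) -> List[Tuple[str, object]]:
        # Step 3: read a slice of the first `cap` live columns.
        return [(n, self.columns[n]) for n in self._live_names()[: self.cap]]

    def slice_from(self, start: str, count: int) -> List[Tuple[str, object]]:
        # "M columns starting from 'b'": nothing here enforces the cap, so
        # the caller may see columns past the first `cap` until compaction.
        names = [n for n in self._live_names() if n >= start]
        return [(n, self.columns[n]) for n in names[:count]]

    def compact(self) -> int:
        # Step 4: keep the first `cap` live columns, tombstone the rest.
        extra = self._live_names()[self.cap:]
        for name in extra:
            self.columns[name] = TOMBSTONE
        return len(extra)


row = CappedRow(cap=3)
for i, name in enumerate("abcde", start=1):
    row.write(name, i)

top_before = row.read_top()                # first 3 columns in column order
leak_before = row.slice_from("d", 2)       # columns past the cap are visible
trimmed = row.compact()                    # number of columns tombstoned
leak_after = row.slice_from("d", 2)        # nothing past the cap remains
```

The design point is the one made in the thread: the cap is only enforced approximately, at compaction time, so the client has to tolerate seeing more than N columns in the meantime.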