The ticket for pluggable compaction is
https://issues.apache.org/jira/browse/CASSANDRA-1610.
It's not released yet, so there is no real documentation for it
yet. But if you really want to look into it, you can start by looking at
AbstractCompactionStrategy in trunk.

--
Sylvain


On Wed, Jul 20, 2011 at 10:57 AM, Lior Golan <lio...@taboola.com> wrote:
> Thanks Sylvain
>
> Can you please point us to the interface that should be implemented in order
> to write our own custom compaction? And how is it supposed to be configured?
>
> -----Original Message-----
> From: Sylvain Lebresne [mailto:sylv...@datastax.com]
> Sent: Tuesday, July 19, 2011 11:40 AM
> To: user@cassandra.apache.org
> Subject: Re: How to keep only exactly column of key
>
> On Tue, Jul 19, 2011 at 10:15 AM, Lior Golan <lio...@taboola.com> wrote:
>> Can't this capping be done (approximately) during compaction?
>> Something like:
>>
>> 1.       Ability to define for a column family that it's a "capped
>> collection" with at most N columns per row
>>
>> 2.       During write - just add the column
>>
>> 3.       During reads - get a slice with the most recent / top N
>> columns (in terms of column order)
>>
>> 4.       During compaction - if the number of columns in the row is
>> more than N, trim it to the top N columns (by replacing the rest of
>> the columns with a tombstone in the compacted row)
>>
>> Since I guess the purpose of this is for automated cleanup, and not
>> for enforcing exactly N columns, I think this would be sufficient
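The compaction step (4) in the list above could be sketched roughly as follows. This is a minimal illustration and not Cassandra code: the class CappedRowTrimmer, the method trim, and the modelling of a row as a sorted map with a null value standing in for a tombstone are all assumptions made for the example.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not Cassandra code: a row is modelled as a sorted map of
// column name -> value, and a null value stands in for a tombstone.
final class CappedRowTrimmer {
    private CappedRowTrimmer() {}

    /**
     * Returns a compacted copy of the row in which only the first maxColumns
     * live columns (in column order) keep their value; every later column is
     * replaced by a tombstone (null).
     */
    static SortedMap<String, String> trim(SortedMap<String, String> row, int maxColumns) {
        SortedMap<String, String> compacted = new TreeMap<>(row.comparator());
        int kept = 0;
        for (Map.Entry<String, String> column : row.entrySet()) {
            boolean live = column.getValue() != null;
            if (live && kept < maxColumns) {
                compacted.put(column.getKey(), column.getValue());
                kept++;
            } else {
                // Either already deleted or beyond the cap: write a tombstone.
                compacted.put(column.getKey(), null);
            }
        }
        return compacted;
    }
}
```

With a row holding columns a, b and c and a cap of 2, the result keeps a and b and carries a tombstone for c. Writes stay untouched, so between compactions a row may temporarily hold more than N live columns.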
>
> The problem with that is that we cannot enforce this on the query side.
> Or more precisely, returning the first N columns is fine, but what about
> queries like "M columns starting from 'b'"? Or columns by name?
> We cannot do those efficiently while enforcing that we won't return any 
> columns after the N first ones. The only solution would be to always query 
> the first N ones and then filter afterwards, but that's not efficient.
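To illustrate the "query the first N and filter afterwards" fallback described above, here is a small sketch. It is not Cassandra code: CappedSlice and sliceWithinCap are hypothetical names, and the row is modelled as a naturally ordered sorted map of column names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

// Hypothetical sketch, not Cassandra code. Assumes the row uses the natural
// String ordering for its column names.
final class CappedSlice {
    private CappedSlice() {}

    /**
     * Returns up to m column names starting from 'start', but never a column
     * that lies beyond the first n columns of the row. Note that the scan always
     * begins at the first column of the row, not at 'start'.
     */
    static List<String> sliceWithinCap(SortedMap<String, String> row, String start, int m, int n) {
        List<String> result = new ArrayList<>();
        int scanned = 0;
        for (String name : row.keySet()) {
            if (scanned >= n) {
                break; // everything from here on is past the cap
            }
            scanned++;
            if (name.compareTo(start) >= 0 && result.size() < m) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For a row with columns a to e, a cap of 3 and a request for 5 columns starting from 'b', only b and c come back. The point of the sketch is the cost: knowing that d is past the cap requires walking the row from its first column, whereas an ordinary slice could seek straight to 'b'.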
>
> What I mean here is that it is hard to add that as a column family option 
> given the limitation it would entail. That being said, 1.0 will add pluggable 
> compaction (it's already in trunk) and it will be very easy to have a
> compaction that just drops columns after the first N. It would then be up to
> the client to deal with the possibility of getting more than the first N
> columns, but as you said, if it is for automated cleanup, that will be enough.
>
> --
> Sylvain
>
>> From: Tupshin Harper [mailto:tups...@tupshin.com]
>> Sent: Tuesday, July 19, 2011 10:04 AM
>> To: user@cassandra.apache.org
>> Subject: Re: How to keep only exactly column of key
>>
>>
>>
>> Speaking from practical experience, it is possible to simulate this
>> feature by retrieving a slice of your row that only contains the most
>> recent 100 items. You can then prevent the rows from growing out of
>> control by checking the size of the row and pruning it back to 100
>> every N writes, where N is small enough to prevent excessive growth,
>> but large enough to prevent excessive overhead. A value of 50 or so
>> for N worked reasonably well for me. If you do go down this path,
>> though, keep in mind that rapid writes and deletes to a single column
>> are basically a Cassandra anti-pattern due to performance problems with huge 
>> numbers of tombstones.
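The "prune every N writes" approach described above might look roughly like this on the client side. It is only an in-memory sketch under stated assumptions: PrunedRow and its methods are hypothetical, columns are keyed by a timestamp, and a TreeMap stands in for the actual Cassandra row, so no real client API is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one row; not a Cassandra client API.
final class PrunedRow {
    private final TreeMap<Long, String> columns = new TreeMap<>();
    private final int cap;        // e.g. 100 items to keep
    private final int pruneEvery; // e.g. prune once every 50 writes
    private int writesSincePrune = 0;

    PrunedRow(int cap, int pruneEvery) {
        this.cap = cap;
        this.pruneEvery = pruneEvery;
    }

    /** Adds a column and, every pruneEvery writes, trims the row back to cap. */
    void write(long timestamp, String value) {
        columns.put(timestamp, value);
        writesSincePrune++;
        if (writesSincePrune >= pruneEvery) {
            // In a real cluster each removal would be a delete, i.e. a tombstone.
            while (columns.size() > cap) {
                columns.pollFirstEntry();
            }
            writesSincePrune = 0;
        }
    }

    /** Reads only the most recent cap values, newest first. */
    List<String> readMostRecent() {
        List<String> recent = new ArrayList<>();
        for (String value : columns.descendingMap().values()) {
            if (recent.size() == cap) {
                break;
            }
            recent.add(value);
        }
        return recent;
    }

    /** Number of columns currently stored, which may exceed cap between prunes. */
    int storedColumns() {
        return columns.size();
    }
}
```

Between prunes the row can hold up to cap + pruneEvery - 1 columns, which is why the read side always slices the most recent cap items rather than reading the whole row.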
>>
>>
>>
>> I would love to see a feature added similar to MongoDB's "capped
>> collections", but I don't believe there is any easy way to retrofit it
>> into Cassandra's sstable approach.
>> http://www.mongodb.org/display/DOCS/Capped+Collections
>>
>>
>>
>> -Tupshin
>>
>> On Mon, Jul 18, 2011 at 8:22 AM, JKnight JKnight <beukni...@gmail.com>
>> wrote:
>>
>> Dear all,
>>
>>
>>
>> I want to keep only 100 columns per key: when I add a column to a
>> key that already has 100 columns, another column (by order) should be
>> deleted.
>>
>>
>>
>> Does Cassandra have a setting for that?
>>
>> --
>> Best regards,
>> JKnight
>>
>>
>
>
>
