Limiting by number of columns in a row will perform very poorly.

Limiting by the time a column has existed can perform quite well, and
was added by Sylvain for 0.7 in
https://issues.apache.org/jira/browse/CASSANDRA-699
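
To illustrate why expiring by age is cheap, here is a toy in-memory model in Python. It is only a sketch, not Cassandra's implementation or client API: the class ExpiringRow and its methods are hypothetical names made up for this example. The point is that the writer never counts or reads the row; each column simply carries its own expiry time and readers skip anything past it.

```python
import time


class ExpiringRow:
    """Toy model of a row whose columns each carry a time-to-live (hypothetical)."""

    def __init__(self):
        # column name -> (value, absolute expiry time in seconds)
        self._columns = {}

    def insert(self, name, value, ttl_seconds, now=None):
        # A write is a blind insert: no count and no read of existing columns.
        now = time.time() if now is None else now
        self._columns[name] = (value, now + ttl_seconds)

    def live_columns(self, now=None):
        # A read ignores every column whose expiry time has been reached.
        now = time.time() if now is None else now
        return {
            name: value
            for name, (value, expires_at) in sorted(self._columns.items())
            if expires_at > now
        }


feed = ExpiringRow()
feed.insert("tweet-001", "hello", ttl_seconds=60, now=0)
feed.insert("tweet-002", "world", ttl_seconds=60, now=30)
```

Note that this bounds how old the entries in a feed can be, not how many there are, so it fits the use case only if "the last N tweets" can be approximated by "tweets from the last T seconds".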

On Fri, Apr 16, 2010 at 1:50 PM, Chris Shorrock <ch...@shorrockin.com> wrote:
> I'm attempting to come up with a technique for limiting the number of
> columns a single key (or super column - doesn't matter too much for the
> context of this conversation) may contain at any one time. My actual
> use-case is a little too meaty to try to describe so an alternate use-case
> of this mechanism could be:
>
> Construct a twitter-esque feed which maintains a list of N tweets. Tweets (in
> this system - and in reality I suppose) occur at such a rate that you want
> to limit a given user's "feed" to N items. You do not have the ability to
> store an infinite number of tweets due to the physical constraints of your
> hardware.
>
> The "my first idea" answer is when a tweet is inserted into the feed of
> a given person, that you then do a count and delete of any outstanding
> tweets. In reality you could first count, then (if count >= N) do a batch
> mutate for the insertion of the new entry and the removal of the old. My
> issue with this approach is that after a certain point every new entry into
> the system will incur the removal of an old entry. The count, once a feed
> has reached N, will always be >= N on any subsequent queries. Depending on
> how you index the tweets you may need to actually do a read instead of count
> to get the row identifiers.
>
> My second approach was to utilize a "slot" system where you have a record
> stored somewhere that indicates the next slot for insertion. This can be
> thought of as a fixed length array where you store the next insertion point
> in some other column family. When a new tweet occurs you retrieve the
> current "slot" meta-data, insert into that index, then update the meta-data
> for the next insertion. My concerns with this relate to synchronization
> and losing entries due to concurrent operations. I'd rather not have to
> use something like ZooKeeper to synchronize in the application cluster.
>
> I have some other ideas but I'm mostly just spit-balling at this point. So I
> thought I'd reach out to the collective intelligence of the group to see if
> anyone has implemented something similar. Thanks in advance.
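
For reference, the "slot" scheme quoted above can be sketched in Python roughly as follows. This is an in-memory illustration under stated assumptions, not code against any Cassandra client: SlotFeed, slots and next_slot are hypothetical names, with next_slot standing in for the metadata kept in a separate column family. The last few lines replay the concurrency concern by hand: two writers that read the same slot pointer overwrite each other.

```python
class SlotFeed:
    """Fixed-length feed addressed by a separately stored insertion pointer (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}     # slot index -> tweet; stands in for the feed row
        self.next_slot = 0  # stands in for the metadata in another column family

    def append(self, tweet):
        slot = self.next_slot                        # 1. read the pointer
        self.slots[slot] = tweet                     # 2. overwrite that slot
        self.next_slot = (slot + 1) % self.capacity  # 3. advance the pointer


feed = SlotFeed(capacity=3)
for i in range(5):
    feed.append(f"tweet-{i}")
after_five = dict(feed.slots)

# Replay of a race: writers A and B both read the pointer before either advances it.
stale_slot = feed.next_slot
feed.slots[stale_slot] = "from-A"
feed.slots[stale_slot] = "from-B"  # A's entry is silently lost
feed.next_slot = (stale_slot + 1) % feed.capacity
```

The row never holds more than capacity columns and no delete is issued, but steps 1 to 3 are a read-modify-write, which is where the synchronization concern comes from.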
