I'm in the process of implementing a Totally Ordered Queue in Cassandra, and
wanted to bounce my ideas off the list and also see if there are any other
suggestions.

I've come up with an external source of IDs that are always increasing (but
not necessarily consecutive), and I've also used external synchronization to
ensure there is only one writer to a given queue. I handle de-duping in the app.


My current solution (simplified) is:

Use the "QueueId" as the key for a row in a CF.
Then every column in that row corresponds to a new entry in the queue, with
a custom Comparator that sorts the columns by my external ID, which is always
increasing.

Technically I never delete data from the Queue, and I just page through it
from a given ID using a SliceRange, etc.
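To make the paging idea concrete, here is a minimal in-memory sketch. It does not use the Cassandra API at all: a TreeMap stands in for one row whose columns are sorted by the external ID, and the class and method names (QueueRowModel, append, page) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one queue row; NOT the Cassandra client API.
final class QueueRowModel {
    // Column name (external ID) -> payload, kept sorted by ID like a comparator-sorted row.
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    void append(long id, String payload) {
        columns.put(id, payload);
    }

    // Returns up to 'limit' entries whose ID is strictly greater than 'afterId'.
    List<Map.Entry<Long, String>> page(long afterId, int limit) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : columns.tailMap(afterId, false).entrySet()) {
            if (out.size() == limit) {
                break;
            }
            out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        QueueRowModel queue = new QueueRowModel();
        queue.append(5L, "a");
        queue.append(9L, "b");
        queue.append(20L, "c");
        // A reader that last saw ID 5 asks for the next page of up to 10 entries.
        System.out.println(queue.page(5L, 10));
    }
}
```

A reader only has to remember the last ID it consumed and pass it as afterId on the next call, so gaps in the ID sequence do not matter.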

The obvious problem is that the row keeps growing and still needs to be
compacted, so I started bucketizing: multiple rows for a given queue, for
example one per day (again, I'm simplifying), so the key is now "QueueId+Day".
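A minimal sketch of how the bucketed row keys could be built, under assumptions the post does not state: buckets are cut on UTC day boundaries, the day is derived from a millisecond timestamp, and the key is written as queueId, a literal "+", then yyyyMMdd. QueueBuckets and its methods are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for "QueueId+Day" row keys; the key layout is an assumption.
final class QueueBuckets {
    static String bucketKey(long queueId, LocalDate day) {
        // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd.
        return queueId + "+" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Assumption: the bucket is chosen from a millisecond timestamp, in UTC.
    static String bucketKey(long queueId, long epochMillis) {
        LocalDate day = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
        return bucketKey(queueId, day);
    }

    // Row keys a reader would have to visit, in order, to page from 'first' to 'last'.
    static List<String> bucketKeys(long queueId, LocalDate first, LocalDate last) {
        List<String> keys = new ArrayList<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            keys.add(bucketKey(queueId, d));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(bucketKey(42L, 0L));
        System.out.println(bucketKeys(42L, LocalDate.of(2010, 3, 30), LocalDate.of(2010, 4, 1)));
    }
}
```

The extra work for a reader is mostly in bucketKeys: when one bucket is exhausted, it has to move on to the next day's row and continue paging there.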

Does this seem reasonable? It's solvable, but it is starting to seem
complicated to implement. It would be very easy if I didn't have to have
multiple buckets.



My other thought is to store one entry per row, and perform get_range_slices
with a KeyRange, using the OrderPreservingPartitioner.
But it isn't exactly clear to me what the order of the keys is in this
system, so I don't know how to construct my keys and queries appropriately.
Is it lexical string order? Or something else?

So, for example, assuming my QueueIds are longs and my IDs are also
longs, my key would be (in Java):

long queueId;
long msgId;

key = "" + queueId + ":" + msgId;

And if I wanted to do a query, my key range might be:
start = "" + queueId + ":0";
end = "" + queueId + ":" + Long.MAX_VALUE;

(Will I have to left-pad the msgIds with zeros?)
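A small sketch of why padding would matter if keys are compared as plain strings. Whether the OrderPreservingPartitioner actually orders keys this way is exactly the open question above, so treat string comparison here as an assumption. PaddedKeys and its methods are hypothetical names, and msgIds are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical key builder; assumes keys are ordered like Java Strings.
final class PaddedKeys {
    // Long.MAX_VALUE has 19 decimal digits, so 19 covers any non-negative long.
    static final int WIDTH = 19;

    static String unpaddedKey(long queueId, long msgId) {
        return "" + queueId + ":" + msgId;
    }

    static String paddedKey(long queueId, long msgId) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return queueId + ":" + String.format(Locale.ROOT, "%0" + WIDTH + "d", msgId);
    }

    public static void main(String[] args) {
        long queueId = 1L;
        long[] msgIds = {9L, 10L, 100L};

        List<String> unpadded = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (long id : msgIds) {
            unpadded.add(unpaddedKey(queueId, id));
            padded.add(paddedKey(queueId, id));
        }
        Collections.sort(unpadded);
        Collections.sort(padded);

        // Unpadded keys sort as text: "1:10", "1:100", "1:9" (9 ends up last).
        System.out.println(unpadded);
        // Padded keys sort in numeric msgId order.
        System.out.println(padded);

        // Range bounds covering every non-negative msgId of one queue.
        String start = paddedKey(queueId, 0L);
        String end = paddedKey(queueId, Long.MAX_VALUE);
        System.out.println(start + " .. " + end);
    }
}
```

Under that assumption, an unpadded key for msgId 10 sorts before the key for msgId 9, so a range scan would return entries out of queue order; fixed-width keys avoid this.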

And is this going to be efficient if my msgIds are increasing but not
consecutive?

Thanks,
-JD
