Hi all,
I'm trying to implement a priority queue for holding a large number (millions) of items that need to be processed in time order. My solution works, but it gets slower and slower until performance is unacceptable, even with a small number of items.

Each item essentially needs to be popped off the queue (some arbitrary work is then done) and then returned to the queue with a new timestamp indicating when it should be processed again. We thus cycle through all work items eventually, but some may come around more frequently than others.

I am implementing this as a single Cassandra row, in a CF with a TimeUUID comparator. Each column name is a TimeUUID, with an arbitrary column value describing the work item; the columns are thus sorted in time order.

To pop items, I do a get() such as:

cf.get(row_key, column_finish=now, column_start=yesterday, column_count=1000)

to get all the items at the head of the queue (if any) whose scheduled time has already passed.

For each item retrieved, I do a delete to remove the old column, then an insert with a fresh TimeUUID column name (system time + arbitrary increment), thus putting the item back somewhere in the queue (currently, the back of the queue). I do a batch_mutate for all these deletes and inserts, with a queue size of 2000. These are currently interleaved, i.e. delete1-insert1-delete2-insert2...

This all appears to work correctly, but the performance starts at around 8000 cycles/sec, falls to around 1800/sec over the first 250K cycles, and continues to fall over time, down to about 150/sec after a few million cycles. This happens regardless of the overall size of the row (I have tried sizes from 1000 to 100,000 items). My target performance is 1000 cycles/sec (but my data store will need to handle other work concurrently).

I am currently using just a single node running on localhost, with a pycassa client, on a 4-core, 4GB machine running Fedora 14.
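To make the pop-and-requeue cycle concrete, here is a minimal in-memory sketch of the logic in Python. It is not my actual pycassa code and does not talk to Cassandra: the class InMemoryQueueRow, the function run_cycle, and the use of plain float timestamps with a sequence counter in place of TimeUUIDs are all hypothetical stand-ins chosen for illustration.

```python
import bisect
import itertools


class InMemoryQueueRow:
    """Hypothetical stand-in for the single wide row.

    Column names are (timestamp, seq) tuples kept in sorted order,
    mimicking how a TimeUUID comparator sorts columns by time.
    """

    def __init__(self):
        self._names = []               # sorted column names
        self._values = {}              # column name -> work item
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def insert(self, timestamp, item):
        name = (timestamp, next(self._seq))
        bisect.insort(self._names, name)
        self._values[name] = item
        return name

    def remove(self, name):
        idx = bisect.bisect_left(self._names, name)
        del self._names[idx]
        del self._values[name]

    def get_slice(self, start, finish, count):
        """Return up to `count` (name, item) pairs with start <= timestamp <= finish."""
        lo = bisect.bisect_left(self._names, (start, -1))
        hi = bisect.bisect_right(self._names, (finish, float("inf")))
        return [(n, self._values[n]) for n in self._names[lo:hi][:count]]


def run_cycle(row, now, increment, lookback=86400.0, count=1000):
    """Pop every due item, then put it back with a later timestamp.

    The delete and insert are interleaved per item, as in the description:
    delete1-insert1-delete2-insert2...
    """
    due = row.get_slice(now - lookback, now, count)
    for name, item in due:
        row.remove(name)                   # delete the old column
        row.insert(now + increment, item)  # re-insert at the back of the queue
    return [item for _, item in due]


row = InMemoryQueueRow()
row.insert(1.0, "a")
row.insert(2.0, "b")
row.insert(50.0, "c")
first = run_cycle(row, now=10.0, increment=100.0)
second = run_cycle(row, now=60.0, increment=100.0)
third = run_cycle(row, now=120.0, increment=100.0)
```

In the real setup the slice corresponds to the cf.get() call and the remove/insert pairs are sent together through batch_mutate; the sketch only shows the ordering of operations, not their cost in Cassandra.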
Is this expected behaviour (is there just too much churn for a single row to perform well), or am I doing something wrong? Would https://issues.apache.org/jira/browse/CASSANDRA-2583 in version 0.8.1 fix this problem (I am using version 0.7.6)?

Thanks!
David.