Here's a slightly better version and a Python script. -ml
-- put this in and run using 'cqlsh -f
DROP KEYSPACE latest;
CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};
USE latest;
CREATE TABLE time_series (
bucket_userid text, --
Then you can do this. I handle millions of entries this way and it works
well if you are mostly interested in recent activity.
If you need to span all activity, you can use a separate table to
maintain the 'latest'. This table should also be sharded, as entries will be
'hot'. Sharding will spre
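To make the bucketing idea above a bit more concrete, here is a minimal sketch of how a client might build bucket keys and walk them from newest to oldest. The one-bucket-per-user-per-day scheme, the "YYYY-MM-DD:userid" key format, and the helper names bucket_key and recent_bucket_keys are all assumptions for illustration; the schema above does not say how bucket_userid is actually composed.

```python
from datetime import datetime, timedelta, timezone


def bucket_key(userid, when):
    """Build a hypothetical bucket_userid value: one partition per user per day.

    The "YYYY-MM-DD:userid" format is an assumption, not taken from the thread.
    """
    return f"{when:%Y-%m-%d}:{userid}"


def recent_bucket_keys(userid, now, days):
    """Yield bucket keys from newest to oldest, covering the last `days` days."""
    for offset in range(days):
        yield bucket_key(userid, now - timedelta(days=offset))


# Example: the three most recent daily buckets for user XYZ.
now = datetime(2013, 7, 16, 12, 0, tzinfo=timezone.utc)
keys = list(recent_bucket_keys("XYZ", now, 3))
print(keys)
```

A client interested in recent activity would query these keys one at a time, newest first, and stop as soon as it has enough rows, so older buckets are never read.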
Thanks Michael,
But I cannot sort the rows in memory, as the number of columns will be
very large.
From the Python script above:
select_stmt = "select * from time_series where userid = 'XYZ'"
This would return many hundreds of thousands of columns. I need to go in
time-series order using
If you have set up the table as described in my previous message, you could
run this Python snippet to return the desired result:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
logging.basicConfig()
from operator import itemgetter
import cassandra
from cassandra.cluster import Clus
You could try this. C* doesn't do it all for you, but it will efficiently
get you the right data.
-ml
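As a rough illustration of the client-side part that C* leaves to you, the sketch below keeps only the newest entry per PKID in a single pass. It assumes the rows arrive already ordered newest first and that each row is a (timestamp, pkid) pair; the function name latest_per_pkid and the sample values are hypothetical, and this is not necessarily what the script in this thread does.

```python
def latest_per_pkid(rows):
    """Return {pkid: timestamp} holding the newest timestamp for each pkid.

    Assumes `rows` is an iterable of (timestamp, pkid) pairs ordered newest
    first, so the first time a pkid is seen is its latest entry.
    """
    latest = {}
    for timestamp, pkid in rows:
        if pkid not in latest:
            latest[pkid] = timestamp
    return latest


# Made-up sample rows, newest first.
rows = [(5, "pk2"), (4, "pk1"), (3, "pk2"), (1, "pk1")]
result = latest_per_pkid(rows)
print(result)
```

No sort is needed, and memory grows with the number of distinct PKIDs rather than with the number of rows read.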
-- put this in and run using 'cqlsh -f
DROP KEYSPACE latest;
CREATE KEYSPACE latest WITH replication = {
'class': 'SimpleStrategy',
'replication_factor' : 1
};
USE latest;
CREATE TA
I am facing a problem grouping composites on their second part.
Let's say my CF contains this:
TimeSeriesCF
key:UserID
composite-col-name:TimeUUID:PKID
Some sample data
UserID = XYZ