Thanks Michael, But I cannot sort the rows in memory, as the number of columns will be quite huge.
>From the python script above: select_stmt = "select * from time_series where userid = 'XYZ'" This would return me many hundreds of thousands of columns. I need to go in time-series order using ranges [Pagination queries]. On Wed, Sep 11, 2013 at 7:06 AM, Laing, Michael <michael.la...@nytimes.com>wrote: > If you have set up the table as described in my previous message, you > could run this python snippet to return the desired result: > > #!/usr/bin/env python > # -*- coding: utf-8 -*- > import logging > logging.basicConfig() > > from operator import itemgetter > > import cassandra > from cassandra.cluster import Cluster > from cassandra.query import SimpleStatement > > cql_cluster = Cluster() > cql_session = cql_cluster.connect() > cql_session.set_keyspace('latest') > > select_stmt = "select * from time_series where userid = 'XYZ'" > query = SimpleStatement(select_stmt) > rows = cql_session.execute(query) > > results = [] > for row in rows: > max_time = max(row.colname.keys()) > results.append((row.userid, row.pkid, max_time, row.colname[max_time])) > > sorted_results = sorted(results, key=itemgetter(2), reverse=True) > for result in sorted_results: print result > > # prints: > > # (u'XYZ', u'1002', u'204', u'Col-Name-5') > # (u'XYZ', u'1000', u'203', u'Col-Name-4') > # (u'XYZ', u'1001', u'201', u'Col-Name-2') > > > > On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael <michael.la...@nytimes.com > > wrote: > >> You could try this. C* doesn't do it all for you, but it will efficiently >> get you the right data. >> >> -ml >> >> -- put this in <file> and run using 'cqlsh -f <file> >> >> DROP KEYSPACE latest; >> >> CREATE KEYSPACE latest WITH replication = { >> 'class': 'SimpleStrategy', >> 'replication_factor' : 1 >> }; >> >> USE latest; >> >> CREATE TABLE time_series ( >> userid text, >> pkid text, >> colname map<text, text>, >> PRIMARY KEY (userid, pkid) >> ); >> >> UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE >> userid = 'XYZ' AND pkid = '1000'; >> UPDATE time_series SET colname = colname + >> {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001'; >> UPDATE time_series SET colname = colname + >> {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000'; >> UPDATE time_series SET colname = colname + >> {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000'; >> UPDATE time_series SET colname = colname + >> {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002'; >> >> SELECT * FROM time_series WHERE userid = 'XYZ'; >> >> -- returns: >> -- userid | pkid | colname >> >> ----------+------+----------------------------------------------------------------- >> -- XYZ | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': >> 'Col-Name-4'} >> -- XYZ | 1001 | {'201': >> 'Col-Name-2'} >> -- XYZ | 1002 | {'204': >> 'Col-Name-5'} >> >> -- use an app to pop off the latest key/value from the map for each row, >> then sort by key desc. >> >> >> On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan < >> ravikumar.govindara...@gmail.com> wrote: >> >>> I have been faced with a problem of grouping composites on the >>> second-part. >>> >>> Lets say my CF contains this >>> >>> >>> TimeSeriesCF >>> key: UserID >>> composite-col-name: TimeUUID:PKID >>> >>> Some sample data >>> >>> UserID = XYZ >>> Time:PKID >>> Col-Name1 = 200:1000 >>> Col-Name2 = 201:1001 >>> Col-Name3 = 202:1000 >>> Col-Name4 = 203:1000 >>> Col-Name5 = 204:1002 >>> >>> Whenever a time-series query is issued, it should return the following >>> in time-desc order. >>> >>> UserID = XYZ >>> Col-Name5 = 204:1002 >>> Col-Name4 = 203:1000 >>> Col-Name2 = 201:1001 >>> >>> Is something like this possible in Cassandra? Is there a different way >>> to design and achieve the same objective? >>> >>> -- >>> Ravi >>> >>> >> >> >