If you have set up the table as described in my previous message, you could run this python snippet to return the desired result:
#!/usr/bin/env python # -*- coding: utf-8 -*- import logging logging.basicConfig() from operator import itemgetter import cassandra from cassandra.cluster import Cluster from cassandra.query import SimpleStatement cql_cluster = Cluster() cql_session = cql_cluster.connect() cql_session.set_keyspace('latest') select_stmt = "select * from time_series where userid = 'XYZ'" query = SimpleStatement(select_stmt) rows = cql_session.execute(query) results = [] for row in rows: max_time = max(row.colname.keys()) results.append((row.userid, row.pkid, max_time, row.colname[max_time])) sorted_results = sorted(results, key=itemgetter(2), reverse=True) for result in sorted_results: print result # prints: # (u'XYZ', u'1002', u'204', u'Col-Name-5') # (u'XYZ', u'1000', u'203', u'Col-Name-4') # (u'XYZ', u'1001', u'201', u'Col-Name-2') On Tue, Sep 10, 2013 at 6:32 PM, Laing, Michael <michael.la...@nytimes.com>wrote: > You could try this. C* doesn't do it all for you, but it will efficiently > get you the right data. > > -ml > > -- put this in <file> and run using 'cqlsh -f <file> > > DROP KEYSPACE latest; > > CREATE KEYSPACE latest WITH replication = { > 'class': 'SimpleStrategy', > 'replication_factor' : 1 > }; > > USE latest; > > CREATE TABLE time_series ( > userid text, > pkid text, > colname map<text, text>, > PRIMARY KEY (userid, pkid) > ); > > UPDATE time_series SET colname = colname + {'200':'Col-Name-1'} WHERE > userid = 'XYZ' AND pkid = '1000'; > UPDATE time_series SET colname = colname + > {'201':'Col-Name-2'} WHERE userid = 'XYZ' AND pkid = '1001'; > UPDATE time_series SET colname = colname + > {'202':'Col-Name-3'} WHERE userid = 'XYZ' AND pkid = '1000'; > UPDATE time_series SET colname = colname + > {'203':'Col-Name-4'} WHERE userid = 'XYZ' AND pkid = '1000'; > UPDATE time_series SET colname = colname + > {'204':'Col-Name-5'} WHERE userid = 'XYZ' AND pkid = '1002'; > > SELECT * FROM time_series WHERE userid = 'XYZ'; > > -- returns: > -- userid | pkid | colname > > ----------+------+----------------------------------------------------------------- > -- XYZ | 1000 | {'200': 'Col-Name-1', '202': 'Col-Name-3', '203': > 'Col-Name-4'} > -- XYZ | 1001 | {'201': > 'Col-Name-2'} > -- XYZ | 1002 | {'204': > 'Col-Name-5'} > > -- use an app to pop off the latest key/value from the map for each row, > then sort by key desc. > > > On Tue, Sep 10, 2013 at 9:21 AM, Ravikumar Govindarajan < > ravikumar.govindara...@gmail.com> wrote: > >> I have been faced with a problem of grouping composites on the >> second-part. >> >> Lets say my CF contains this >> >> >> TimeSeriesCF >> key: UserID >> composite-col-name: TimeUUID:PKID >> >> Some sample data >> >> UserID = XYZ >> Time:PKID >> Col-Name1 = 200:1000 >> Col-Name2 = 201:1001 >> Col-Name3 = 202:1000 >> Col-Name4 = 203:1000 >> Col-Name5 = 204:1002 >> >> Whenever a time-series query is issued, it should return the following in >> time-desc order. >> >> UserID = XYZ >> Col-Name5 = 204:1002 >> Col-Name4 = 203:1000 >> Col-Name2 = 201:1001 >> >> Is something like this possible in Cassandra? Is there a different way to >> design and achieve the same objective? >> >> -- >> Ravi >> >> > >