Hi, I have a column family storing very large blobs that I would not like to duplicate, if possible. Here's a simplified version:
CREATE TABLE timeline ( key text, a int, b int, value blob, PRIMARY KEY (key, a, b) ); On this, I run exactly two types of query. Both of them must have a query range on 'a', and just one must have 'b' restricted. First query: cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3; This one runs fine. Second query: cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3 and b = 12; code=2200 [Invalid query] message="PRIMARY KEY column "b" cannot be restricted (preceding column "ColumnDefinition{name=a, type=org.apache.cassandra.db.marshal.Int32Type, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)" This fails. Even if I create an index: CREATE INDEX timeline_b ON timeline (b); cqlsh> SELECT * FROM timeline where key = 'event' and a >= 2 and a <= 3 and b = 12; code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING" I solved this problem by duplicating the column family (in "timeline_by_a" and "timeline_by_b" where a and b are in opposite order), but I'm wondering if there's a better solution, as this tends to grow pretty big. In particular, from the little understanding that I have of the Cassandra internals, it seems like even the second query should be fairly efficient since the clustering columns are stored in order on disk, thus I don't understand the ALLOW FILTERING requirement. Another alternative that I'm thinking is just keeping another column family that will serve as an "index" and I'll manually manage it in the application. Thanks.