Hi all,

I posted this question on Stack Overflow (http://stackoverflow.com/questions/39218180/cql-collections-appear-slow). I'm having an issue with CQL collections; has anyone got any insight?

I'm playing around with storing data in Cassandra and I'm finding a significant performance problem with CQL collections. I started with this schema:

    CREATE TABLE TEST (
        date DATE,
        tranche TEXT,
        id INT,
        properties MAP<TEXT,FLOAT>,
        PRIMARY KEY ((date, tranche), id)
    );

If I run a query for all the data in this partition:

    SELECT * FROM TEST WHERE date = '2016-08-26' AND tranche = 'third';

tracing reports that it takes ~1.3 seconds to load 15K rows. There are about 85 entries in the map. Wall clock time from Python is ~5 seconds. This seems really slow for loading just one partition.

So I tried this schema instead, using MessagePack to store the entire map in a single cell:

    CREATE TABLE TEST (
        date DATE,
        tranche TEXT,
        id INT,
        properties BLOB,
        PRIMARY KEY ((date, tranche), id)
    );

Now the same query takes ~60 ms (as reported by tracing) and ~500 ms wall clock time (again using Python).

I get that there's more to do with the MAP version, but this seems like an unexpected performance degradation.

One oddity I noticed while testing: in both cases tracing reported that it was returning 15K cells (which corresponds to the number of rows). I'd expect this with the second schema, but my understanding was that each element in a map is stored in its own cell in current versions of Cassandra, so I'm a bit surprised by this.

I'm using version 3.7 of Cassandra and the DataStax Python driver.

Does anyone have any insight into what's happening here?

-Ben
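
P.S. For anyone who wants to reproduce the second schema, here is a minimal sketch of the "whole map in one cell" idea. It is illustrative only, not the exact code used for the timings above: it uses json from the Python standard library as a stand-in for MessagePack so that it runs without extra packages, and the helper names (pack_properties, unpack_properties) and the sample property keys are made up for the example.

```python
import json


def pack_properties(props):
    """Serialize the whole map into a single bytes value.

    The result is what would be written to the BLOB column, so each row
    carries one cell for 'properties' instead of one cell per map entry.
    json is a stand-in here; the post above used MessagePack.
    """
    return json.dumps(props, sort_keys=True).encode("utf-8")


def unpack_properties(blob):
    """Reverse of pack_properties: bytes read from the BLOB column -> dict."""
    return json.loads(blob.decode("utf-8"))


# Hypothetical sample row with 85 map entries, matching the size mentioned above.
row_properties = {"prop_%02d" % i: float(i) for i in range(85)}

blob = pack_properties(row_properties)
restored = unpack_properties(blob)
```

The trade-off is that the server can no longer read or update individual map entries; the client has to fetch and decode the whole blob for every row.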