The map version of the schema needs to deserialize, serialize, and then deserialize again about 85 times as many cells, given that your average map has about 85 elements. I would assume that's where most of the slowdown is coming from. If you can take the time to run this through a profiler, it would be useful to see whether there is some unexpected inefficiency.
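To put rough numbers on that, here is a back-of-the-envelope sketch using the figures from the original question (about 15K rows, about 85 map entries per row). It assumes each element of a non-frozen map is stored as its own cell and ignores every other column; the constant and function names are only illustrative.

```python
# Rough cell-count estimate; the figures come from the original question.
ROWS_IN_PARTITION = 15_000   # "~15K rows" in the partition
AVG_MAP_ENTRIES = 85         # "about 85 entries in the map"


def cells_to_process(rows: int, cells_per_row: int) -> int:
    """Total number of cells handled when reading one partition."""
    return rows * cells_per_row


# Non-frozen map: assumed to be one cell per map element.
map_cells = cells_to_process(ROWS_IN_PARTITION, AVG_MAP_ENTRIES)

# Blob: the whole map is packed into a single cell per row.
blob_cells = cells_to_process(ROWS_IN_PARTITION, 1)

print(map_cells, blob_cells, map_cells // blob_cells)
# prints: 1275000 15000 85
```

This only counts cells. It says nothing about per-cell cost, which is what a profiler run would show.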
I'll also point out that you could use a frozen map (e.g. frozen<map<text, float>>), and you'd probably get performance somewhere between the other two approaches.

On Tue, Aug 30, 2016 at 8:00 PM, Ben Frank <b...@airlust.com> wrote:

> Hi all, I posted this question on stackoverflow - I'm having an issue with
> CQL collections, anyone got any insight here?
>
> (http://stackoverflow.com/questions/39218180/cql-collections-appear-slow)
>
> I'm playing around with storing data in cassandra and I'm finding a
> significant performance problem with CQL collections. I started with this
> schema:
>
> CREATE TABLE TEST (
>     date DATE,
>     tranche TEXT,
>     id INT,
>     properties MAP<TEXT,FLOAT>,
>     PRIMARY KEY ((date,tranche), id))
>
> if I run a query for all data in this partition
>
> SELECT * FROM TEST where date = '2016-08-26' and tranche = 'third'
>
> tracing reports it takes ~1.3 seconds to load 15K rows. There are about 85
> entries in the map. Wall clock time from python is ~5 seconds. This seems
> really slow to load just one 'partition'
>
> So I tried this schema instead and used message pack to store the entire
> map in a single cell
>
> CREATE TABLE TEST (
>     date DATE,
>     tranche TEXT,
>     id INT,
>     properties blob,
>     PRIMARY KEY ((date,tranche), id))
>
> Now the same query takes ~60ms (as reported by tracing) and ~500ms wall
> clock time (again using python)
>
> I get that there's more to do with the MAP version, but this seems like an
> unexpected performance degradation.
>
> One oddity I noticed while testing this was that in both cases tracing
> reported it was returning 15K cells (which corresponds to the number of
> rows). I'd expect this in the second schema, but my understanding was that
> each element in a map was stored in its own cell in current versions of
> cassandra, so a bit surprised by this.
>
> I'm using version 3.7 of cassandra and the datastax python drivers. Anyone
> got any insight into what's happening here?
>
> -Ben
>

--
Tyler Hobbs
DataStax <http://datastax.com/>