Hi all, I posted this question on Stack Overflow. I'm having an issue with
CQL collections; does anyone have any insight?

(http://stackoverflow.com/questions/39218180/cql-collections-appear-slow)

I'm playing around with storing data in Cassandra and I'm finding a
significant performance problem with CQL collections. I started with this
schema:

CREATE TABLE TEST (
  date DATE,
  tranche TEXT,
  id INT,
  properties MAP<TEXT,FLOAT>,
  PRIMARY KEY ((date,tranche), id));

If I run a query for all the data in one partition:

SELECT * FROM TEST WHERE date = '2016-08-26' AND tranche = 'third';

tracing reports it takes ~1.3 seconds to load 15K rows. Each row's map has
about 85 entries. Wall clock time from Python is ~5 seconds. This seems
really slow for loading just one partition.

So I tried this schema instead and used MessagePack to store the entire
map in a single cell:

CREATE TABLE TEST (
  date DATE,
  tranche TEXT,
  id INT,
  properties BLOB,
  PRIMARY KEY ((date,tranche), id));

Now the same query takes ~60ms (as reported by tracing) and ~500ms wall
clock time (again using Python).
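
To make the blob approach concrete, here is a minimal sketch of the packing
step. It uses the standard library's struct module as a stand-in for
MessagePack so it runs without extra dependencies, so the byte layout is an
assumption and not what MessagePack produces. pack_properties,
unpack_properties and the prop_NN keys are hypothetical names for
illustration only.

```python
import struct


def pack_properties(props):
    """Serialize a {str: float} map into a single bytes value.

    Layout (an assumption, not MessagePack): uint16 entry count, then for
    each entry a uint16 key length, the UTF-8 key, and a 32-bit float value
    (CQL FLOAT is 32-bit).
    """
    parts = [struct.pack(">H", len(props))]
    for key, value in props.items():
        raw = key.encode("utf-8")
        parts.append(struct.pack(">H", len(raw)))
        parts.append(raw)
        parts.append(struct.pack(">f", value))
    return b"".join(parts)


def unpack_properties(blob):
    """Inverse of pack_properties: rebuild the {str: float} map."""
    (count,) = struct.unpack_from(">H", blob, 0)
    offset = 2
    props = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from(">H", blob, offset)
        offset += 2
        key = blob[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (value,) = struct.unpack_from(">f", blob, offset)
        offset += 4
        props[key] = value
    return props


# 85 entries, matching the map size described above; the keys are made up
# and the values are chosen to be exactly representable as 32-bit floats.
example = {"prop_%02d" % i: i * 0.5 for i in range(85)}
blob = pack_properties(example)
restored = unpack_properties(blob)
```

The resulting bytes value is what would be written to the properties blob
column, one per row, and decoded again on the client after the SELECT.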

I get that there's more to do with the MAP version, but this seems like an
unexpected performance degradation.
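
To put numbers on the gap, this is a quick calculation using only the
approximate timings quoted above. The variable names are made up, and the
inputs are rounded figures, so the ratios are rough.

```python
# Approximate timings from the two runs above, in seconds.
map_traced, map_wall = 1.3, 5.0       # MAP<TEXT,FLOAT> schema
blob_traced, blob_wall = 0.060, 0.5   # blob schema

# How many times slower the MAP schema is than the blob schema.
traced_ratio = round(map_traced / blob_traced, 1)
wall_ratio = round(map_wall / blob_wall, 1)

print("traced slowdown:", traced_ratio)
print("wall clock slowdown:", wall_ratio)
```

So the MAP version is roughly 20x slower server side and roughly 10x slower
end to end.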

One oddity I noticed while testing this was that in both cases tracing
reported it was returning 15K cells (which corresponds to the number of
rows). I'd expect this in the second schema, but my understanding was that
each element in a map is stored in its own cell in current versions of
Cassandra, so I'm a bit surprised by this.
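
For comparison, this is the cell count I would have expected, worked out as
a small sketch. It assumes exactly 15,000 rows and exactly 85 map entries
per row (both are approximate figures above), and that each map element
occupies its own cell. expected_cells is a hypothetical helper.

```python
ROWS = 15_000        # "15K rows", taken as an exact figure for the estimate
MAP_ENTRIES = 85     # "about 85 entries", likewise


def expected_cells(rows, cells_per_row):
    """Cells returned for a partition, ignoring primary key columns."""
    return rows * cells_per_row


map_cells = expected_cells(ROWS, MAP_ENTRIES)  # one cell per map element
blob_cells = expected_cells(ROWS, 1)           # one blob cell per row

print("expected cells, MAP schema:", map_cells)
print("expected cells, blob schema:", blob_cells)
```

Under those assumptions the MAP schema should report about 1.275 million
cells, not 15K.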

I'm using version 3.7 of Cassandra and the DataStax Python driver. Does
anyone have any insight into what's happening here?

-Ben
