The map version of the schema has to deserialize, serialize, and then
deserialize again about 85 times as many cells, assuming your average map
has 85 elements.  I would assume that's where most of the slowdown is
coming from.  If you can take the time to run it through a profiler, that
would be useful for seeing whether there is some unexpected inefficiency.
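
As a rough back-of-the-envelope sketch of that difference, using only the
approximate figures from the original post (15K rows, about 85 map entries
per row) and counting only the properties column:

```python
# Figures quoted in the original post; both are approximate.
ROWS_IN_PARTITION = 15_000
MAP_ENTRIES_PER_ROW = 85


def cells_to_process(rows, cells_per_row):
    """Number of 'properties' cells handled for one partition read."""
    return rows * cells_per_row


# Non-frozen map: every map entry is its own cell.
map_cells = cells_to_process(ROWS_IN_PARTITION, MAP_ENTRIES_PER_ROW)
# Blob: the whole serialized map is a single cell per row.
blob_cells = cells_to_process(ROWS_IN_PARTITION, 1)

print(map_cells, blob_cells, map_cells // blob_cells)
# prints: 1275000 15000 85
```

This only counts cells; it says nothing about per-cell cost, which is what
a profiler would show.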

I'll also point out that you could use a frozen map (e.g. frozen<map<text,
float>>) and you'd probably get performance that's somewhere in the middle
of the other two approaches.
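
A minimal sketch of what that could look like from the Python driver.  The
table name test_frozen and the helper create_frozen_table are made up for
illustration, and the session is assumed to be an already-connected
DataStax driver session with a keyspace set:

```python
# Hypothetical table name; the columns mirror the schema in the original post.
CREATE_FROZEN_TABLE = """
CREATE TABLE IF NOT EXISTS test_frozen (
  date DATE,
  tranche TEXT,
  id INT,
  properties FROZEN<MAP<TEXT, FLOAT>>,
  PRIMARY KEY ((date, tranche), id)
)
"""


def create_frozen_table(session):
    """Create the table through an existing driver session.

    `session` is assumed to be a cassandra.cluster.Session (or anything
    else exposing an execute(statement) method).
    """
    session.execute(CREATE_FROZEN_TABLE)
```

The trade-off is that a frozen map is written and read as a single value,
so changing one entry means rewriting the whole map.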

On Tue, Aug 30, 2016 at 8:00 PM, Ben Frank <b...@airlust.com> wrote:

> Hi all, I posted this question on stackoverflow - I'm having an issue with
> CQL collections, anyone got any insight here?
>
> (http://stackoverflow.com/questions/39218180/cql-collections-appear-slow)
>
> I'm playing around with storing data in cassandra and I'm finding a
> significant performance problem with CQL collections. I started with this
> schema:
>
> CREATE TABLE TEST (
>   date DATE,
>   tranche TEXT,
>   id INT,
>   properties MAP<TEXT,FLOAT>,
>   PRIMARY KEY ((date,tranche), id))
>
> if I run a query for all data in this partition
>
> SELECT * FROM TEST where date = '2016-08-26' and tranche = 'third'
>
> tracing reports it takes ~1.3 seconds to load 15K rows. There are about 85
> entries in the map. Wall clock time from python is ~5 seconds. This seems
> really slow to load just one 'partition'
>
> So I tried this schema instead and used message pack to store the entire
> map in a single cell
>
> CREATE TABLE TEST (
>   date DATE,
>   tranche TEXT,
>   id INT,
>   properties blob,
>   PRIMARY KEY ((date,tranche), id))
>
> Now the same query takes ~60ms (as reported by tracing) and ~500ms wall
> clock time (again using python)
>
> I get that there's more to do with the MAP version, but this seems like an
> unexpected performance degradation.
>
> One oddity I noticed while testing this was that in both cases tracing
> reported it was returning 15K cells (which corresponds to the number of
> rows). I'd expect this in the second schema, but my understanding was that
> each element in a map was stored in its own cell in current versions of
> cassandra, so a bit surprised by this.
>
> I'm using version 3.7 of cassandra and the datastax python drivers. Anyone
> got any insight into what's happening here?
>
> -Ben
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>
