Thanks Tyler!
I wasn't aware of frozen collections. The tracing shows pretty similar
timing characteristics between the frozen collection and binary (blob)
schemas. Interestingly, it's still dog slow while (presumably) doing the
deserialization in python: although the trace reports good results, it
still takes ~3 seconds of wall clock time to load the data into python.
Anyway, thanks for the answer, really appreciate it.
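
For anyone wanting to check where the client-side wall clock time goes, here
is a minimal sketch using only the Python standard library. The names
profile_call and fake_fetch are made up for illustration; fake_fetch is just
a stand-in that builds rows of 85-entry dicts, and in a real test it would be
replaced by something like lambda: list(session.execute(query)) against the
driver. Treat it as a rough starting point, not a benchmark.

```python
import cProfile
import io
import pstats
import time


def profile_call(fn, *args, top=15, **kwargs):
    """Run fn under cProfile; return (result, wall_seconds, report_text).

    Hypothetical helper: wall_seconds includes profiler overhead, so use it
    for relative comparisons only.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(fn, *args, **kwargs)
    wall = time.perf_counter() - start

    # Render the hottest functions, sorted by cumulative time, into a string.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return result, wall, out.getvalue()


def fake_fetch(rows=1000, entries=85):
    """Stand-in for fetching rows (assumption: not a real Cassandra call)."""
    return [
        {"k%d" % i: float(i) for i in range(entries)}
        for _ in range(rows)
    ]


if __name__ == "__main__":
    fetched, wall, report = profile_call(fake_fetch)
    print("fetched %d rows in %.3fs" % (len(fetched), wall))
    print(report)
```

If most of the cumulative time lands in the driver's decoding functions
rather than in socket reads, that would point at python-side deserialization
rather than the server.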

-Ben

On Wed, Aug 31, 2016 at 8:55 AM, Tyler Hobbs <ty...@datastax.com> wrote:

> The map version of the schema needs to deserialize, serialize, and then
> deserialize about 85 times more cells, if your average map has 85
> elements.  I would assume that's where most of the performance slowdown is
> coming from.  If you can take the time to run that through a profiler, that
> would be useful to see if there is some unexpected inefficiency.
>
> I'll also point out that you could use a frozen map (e.g.
> frozen<map<text, float>>) and you'd probably get performance that's
> somewhere in the middle of the other two approaches.
>
> On Tue, Aug 30, 2016 at 8:00 PM, Ben Frank <b...@airlust.com> wrote:
>
> > Hi all, I posted this question on stackoverflow - I'm having an issue
> > with CQL collections, anyone got any insight here?
> >
> > (http://stackoverflow.com/questions/39218180/cql-collections-appear-slow)
> >
> > I'm playing around with storing data in cassandra and I'm finding a
> > significant performance problem with CQL collections. I started with this
> > schema:
> >
> > CREATE TABLE TEST (
> >   date DATE,
> >   tranche TEXT,
> >   id INT,
> >   properties MAP<TEXT,FLOAT>,
> >   PRIMARY KEY ((date,tranche), id));
> >
> > if I run a query for all data in this partition
> >
> > SELECT * FROM TEST where date = '2016-08-26' and tranche = 'third';
> >
> > tracing reports it takes ~1.3 seconds to load 15K rows. There are about
> > 85 entries in each map. Wall clock time from python is ~5 seconds. This
> > seems really slow to load just one 'partition'.
> >
> > So I tried this schema instead and used MessagePack to store the entire
> > map in a single cell:
> >
> > CREATE TABLE TEST (
> >   date DATE,
> >   tranche TEXT,
> >   id INT,
> >   properties blob,
> >   PRIMARY KEY ((date,tranche), id));
> >
> > Now the same query takes ~60ms (as reported by tracing) and ~500ms wall
> > clock time (again using python)
> >
> > I get that there's more to do with the MAP version, but this seems like
> > an unexpected performance degradation.
> >
> > One oddity I noticed while testing this was that in both cases tracing
> > reported it was returning 15K cells (which corresponds to the number of
> > rows). I'd expect this in the second schema, but my understanding was
> > that each element in a map was stored in its own cell in current
> > versions of cassandra, so I'm a bit surprised by this.
> >
> > I'm using version 3.7 of cassandra and the datastax python drivers.
> > Anyone got any insight into what's happening here?
> >
> > -Ben
> >
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
