On Sun, Aug 31, 2014 at 2:59 AM, Todd Nine <toddn...@apache.org> wrote:

> Hi all,
>   I'm working on transferring our thrift DAOs over to CQL.  It's going
> well, except for 2 cases that both use multi get.  The use case is very
> simple.  It is a narrow row, by design, with only a few columns.  When I
> perform a multiget, I need to get up to 1k rows at a time.  I do not want
> to turn these into a wide row using scopeId and scopeType as the row key.
>
>
> On the physical level, my Column Family needs something similar to the
> following format.
>
>
> scopeId, scopeType, nodeId, nodeType :{ timestamp: 0x00 }
>
>
> I've defined my table with the following CQL.
>
>
> CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
> ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
> timestamp bigint,
> PRIMARY KEY ((scopeId , scopeType, nodeId, nodeType))
> )WITH caching = 'all'
>
>
> This works well for inserts, deletes, and single reads.  I always know the
> scopeId, scopeType, nodeId, and nodeType, so I want to return the timestamp
> columns.  I thought I could use the IN operation and specify the pairs of
> nodeId and nodeTypes I have as input, however this doesn't work.
>
> Can anyone give me a suggestion on how to perform a multiget when I have
> several values for the nodeId and the nodeType?  This read occurs on every
> read of edges so making 1k trips is not going to work from a performance
> perspective.
>
> Below is the query I've tried.
>
> SELECT timestamp FROM  Graph_Marked_Nodes WHERE scopeId = ? AND scopeType
> = ? AND nodeId IN (uuid1, uuid2, uuid3) AND nodeType IN ('foo','bar')
>

This is not currently supported by CQL. We do support the equivalent of
multiget in CQL through IN, but it's slightly limited in the case of
compound partition keys: you can currently only use an IN on the last
column of such a compound partition key (here, that's nodeType). There is
no good reason for that limitation other than historical ones, and I've
opened CASSANDRA-7855 to fix it.

That being said, I would argue that it's hardly a big deal, since using
multiget has always been slightly frowned upon. Multiget doesn't do much
optimization; the only thing it does is parallelize the queries on the
coordinator, which is something you can do just as easily client side. And
doing it client side has a few advantages: you get the result for each
partition as soon as its query completes, which lets you process things
sooner. Also, a multiget is more likely to time out than the same work
split into individual queries (and if even one of the subqueries times out,
you don't get any result at all). Lastly, while doing the parallelization
client side will use a tiny bit more network traffic between the client and
coordinator, you will save intra-cluster traffic provided you have a
token-aware client (because each query will be properly routed).
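
To make the client-side approach concrete, here is a minimal sketch in Python of fanning out one single-partition query per (nodeId, nodeType) pair and collecting results as they complete. It is only an illustration under stated assumptions: `run_query` is a hypothetical callable standing in for whatever execute call your driver provides, and `fetch_all` and `SINGLE_ROW_CQL` are names invented for this example. Token-aware routing is a driver configuration concern and is not shown.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Per-partition statement; a real client would prepare this once and reuse it.
SINGLE_ROW_CQL = (
    "SELECT timestamp FROM Graph_Marked_Nodes "
    "WHERE scopeId = ? AND scopeType = ? AND nodeId = ? AND nodeType = ?"
)


def fetch_all(run_query, scope_id, scope_type, nodes, max_workers=32):
    """Issue one single-partition query per (node_id, node_type) pair.

    run_query is a hypothetical callable (cql, params) -> timestamp or None;
    in a real client it would wrap the driver's execute call.
    Returns (found, failed): found maps each pair to its timestamp, and
    failed maps each pair to the exception raised for it.
    """
    found, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(
                run_query,
                SINGLE_ROW_CQL,
                (scope_id, scope_type, node_id, node_type),
            ): (node_id, node_type)
            for node_id, node_type in nodes
        }
        # Each result is handled as soon as its own query finishes.
        for future in as_completed(futures):
            key = futures[future]
            try:
                ts = future.result()
            except Exception as exc:
                # One failed or timed-out query does not discard the others.
                failed[key] = exc
            else:
                if ts is not None:
                    found[key] = ts
    return found, failed
```

With this shape, a timeout on one partition only lands that pair in `failed`, so the caller can retry just those pairs instead of repeating the whole batch.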

--
Sylvain


>
> I've found this issue, which looks like it's a solution to my problem.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6875
>
> However, I'm not able to get the syntax in the issue description to work
> either.  Any input would be appreciated!
>
> Cassandra: 2.0.10
> Datastax Driver: 2.1.0
>
> Thanks,
> Todd
>
>
>
>
>
