Hey Michael,

Thanks for the response. If I use the clustering columns the way you described, won't that make the row key of the column family (scopeId, scopeType)?
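To make the row-key question concrete, here is a simplified Python model of how a CQL primary key maps onto physical storage. It assumes Cassandra 2.0's composite cell names: the partition key becomes the physical row key, and each cell name is the clustering-column values followed by the CQL column name. It deliberately ignores row-marker cells, write timestamps, and tombstones. `physical_layout` is a hypothetical helper for illustration, not a driver or server API.

```python
from collections import defaultdict
from uuid import UUID


def physical_layout(rows, partition_cols, clustering_cols, value_cols):
    """Group CQL rows into physical rows keyed by the partition key.

    Simplified model (assumption): each cell name is the tuple of
    clustering values plus the CQL column name; row markers, write
    timestamps and tombstones are not modelled.
    """
    layout = defaultdict(dict)
    for row in rows:
        row_key = tuple(row[c] for c in partition_cols)
        prefix = tuple(row[c] for c in clustering_cols)
        for col in value_cols:
            layout[row_key][prefix + (col,)] = row[col]
    return dict(layout)


SCOPE = UUID(int=1)
N1, N2 = UUID(int=10), UUID(int=11)
rows = [
    {"scopeId": SCOPE, "scopeType": "user", "nodeId": N1, "nodeType": "foo", "timestamp": 100},
    {"scopeId": SCOPE, "scopeType": "user", "nodeId": N2, "nodeType": "bar", "timestamp": 200},
]

# PRIMARY KEY ((scopeId, scopeType), nodeId, nodeType): one physical row per scope.
wide = physical_layout(rows, ["scopeId", "scopeType"], ["nodeId", "nodeType"], ["timestamp"])

# PRIMARY KEY ((scopeId, scopeType, nodeId, nodeType)): one physical row per node.
narrow = physical_layout(rows, ["scopeId", "scopeType", "nodeId", "nodeType"], [], ["timestamp"])
```

In this model, the clustered schema puts every node of a scope into a single physical row, which is the growth Todd is worried about for graphs with billions of nodes; the all-partition-key schema gives each node its own physical row with a single `timestamp` cell.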
The scope fields represent a graph's owner, and the graph itself can have several billion nodes in it. When a lot of deletes start occurring on the same graph, I will quickly saturate the row capacity of the column family if the physical row key is only the scope. This is why I have each node on its own row key. As long as our cluster has the capacity to handle the load, we won't hit the upper bound on the maximum number of columns in a row.

I'm new to using CQL in our code; I've only been using it for administration. I've been using the Thrift interface in code since the 0.6 days, and I feel I have a strong understanding of the internals of the column family structure. I'm struggling to find documentation on the mapping from CQL to the physical layout that isn't a trivial example, especially around multiget use cases. Do you have any pointers to blogs or tutorials you've found helpful?

Thanks,
Todd

On Sunday, August 31, 2014, Laing, Michael <michael.l...@nytimes.com> wrote:

> Actually I think you do want to use scopeId, scopeType as the partition
> key (and drop row caching until you upgrade to 2.1 where "rows" are in fact
> rows and not partitions):
>
> CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
> (
>     scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
>     timestamp bigint,
>     PRIMARY KEY ((scopeId, scopeType), nodeId, nodeType)
> );
>
> Then you can select using IN on the cartesian product of your clustering
> keys:
>
> SELECT timestamp
> FROM Graph_Marked_Nodes
> WHERE scopeId = ?
> AND scopeType = ?
> AND (nodeId, nodeType) IN (
>     (uuid1, 'foo'), (uuid1, 'bar'),
>     (uuid2, 'foo'), (uuid2, 'bar'),
>     (uuid3, 'foo'), (uuid3, 'bar')
> );
>
> ml
>
> PS Of course you could boldly go to 2.1 now for a nice performance boost :)
>
> On Sat, Aug 30, 2014 at 8:59 PM, Todd Nine <toddn...@apache.org> wrote:
>
>> Hi all,
>>
>> I'm working on transferring our Thrift DAOs over to CQL.
>> It's going well, except for two cases that both use multiget. The use case
>> is very simple. It is a narrow row, by design, with only a few columns.
>> When I perform a multiget, I need to get up to 1k rows at a time. I do not
>> want to turn these into a wide row using scopeId and scopeType as the row
>> key.
>>
>> On the physical level, my column family needs something similar to the
>> following format:
>>
>> scopeId, scopeType, nodeId, nodeType : { timestamp: 0x00 }
>>
>> I've defined my table with the following CQL:
>>
>> CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
>> (
>>     scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
>>     timestamp bigint,
>>     PRIMARY KEY ((scopeId, scopeType, nodeId, nodeType))
>> ) WITH caching = 'all';
>>
>> This works well for inserts, deletes, and single reads. I always know the
>> scopeId, scopeType, nodeId, and nodeType, so I only want the timestamp
>> columns returned. I thought I could use the IN operator and specify the
>> pairs of nodeId and nodeType I have as input; however, this doesn't work.
>>
>> Can anyone give me a suggestion on how to perform a multiget when I have
>> several values for the nodeId and the nodeType? This read occurs on every
>> read of edges, so making 1k round trips is not going to work from a
>> performance perspective.
>>
>> Below is the query I've tried:
>>
>> SELECT timestamp FROM Graph_Marked_Nodes
>> WHERE scopeId = ? AND scopeType = ?
>> AND nodeId IN (uuid1, uuid2, uuid3) AND nodeType IN ('foo', 'bar')
>>
>> I've found this issue, which looks like a solution to my problem:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6875
>>
>> However, I'm not able to get the syntax in the issue description to work
>> either. Any input would be appreciated!
>>
>> Cassandra: 2.0.10
>> DataStax Driver: 2.1.0
>>
>> Thanks,
>> Todd
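Michael's tuple-IN query needs one `(?, ?)` marker per (nodeId, nodeType) pair when it is prepared with bind values instead of literals. Below is a minimal Python sketch that builds that statement string and the matching flat list of bind values. `build_marked_nodes_query` is a hypothetical helper, the `uuid1` and `scope-uuid` strings are placeholders rather than real UUIDs, and the sketch assumes the server accepts bind markers inside a multi-column IN list. That is worth verifying on 2.0.10, since it is the syntax Todd reports trouble with.

```python
from itertools import product


def build_marked_nodes_query(scope_id, scope_type, pairs):
    """Return (cql, values) for a tuple-IN lookup on Graph_Marked_Nodes.

    `pairs` is a sequence of (nodeId, nodeType) tuples; each one becomes a
    "(?, ?)" entry in the IN list. Bind values are ordered as
    scopeId, scopeType, nodeId1, nodeType1, nodeId2, nodeType2, ...
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (nodeId, nodeType) pair is required")
    placeholders = ", ".join("(?, ?)" for _ in pairs)
    cql = (
        "SELECT nodeId, nodeType, timestamp FROM Graph_Marked_Nodes "
        "WHERE scopeId = ? AND scopeType = ? "
        f"AND (nodeId, nodeType) IN ({placeholders})"
    )
    values = [scope_id, scope_type]
    for node_id, node_type in pairs:
        values.extend((node_id, node_type))
    return cql, values


# Cartesian product of node IDs and node types, in the same order as
# Michael's example: (uuid1, 'foo'), (uuid1, 'bar'), (uuid2, 'foo'), ...
pairs = list(product(["uuid1", "uuid2", "uuid3"], ["foo", "bar"]))
cql, values = build_marked_nodes_query("scope-uuid", "user", pairs)
```

Unlike Michael's query, the sketch also selects nodeId and nodeType. This is a deliberate change so each returned timestamp can be matched back to the pair that produced it, because pairs with no stored row are simply absent from the result. Passing only the pairs you actually hold, rather than the full product, also avoids looking up combinations that were never requested.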