You might want to take a look at Titan, a graph database that can use Cassandra 
as its storage engine, and see how it does these things.

-- Jack Krupansky

From: Todd Nine 
Sent: Sunday, August 31, 2014 11:06 AM
To: user@cassandra.apache.org 
Subject: Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

Hey Michael, 
   Thanks for the response.  If I use clustering columns in the way you 
described, won't that make the row key of the column family scopeId and 
scopeType?  

The scope fields represent a graph's owner.  The graph itself can have several 
billion nodes in it.  When a lot of deletes start occurring on the same graph, 
I will quickly saturate the row capacity of a column family if the physical row 
key is only the scope.  

This is why I have each node on its own row key.  As long as our cluster has 
the capacity to handle the load, we won't hit the upper bound on the number of 
columns in a row. 

I'm new to using CQL in our code; so far I've only used it for administration.  
I've been using the Thrift interface in code since the 0.6 days.  

I feel I have a strong understanding of the internals of the column family 
structure.  I'm struggling to find documentation on the CQL-to-physical-layout 
mapping that isn't a trivial example, especially around multiget use cases.  Do you 
have any pointers to blogs or tutorials you've found helpful? 
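
For reference, here's my rough mental model of how your proposed table would 
map to the physical (Thrift-level) layout; this is only a sketch, so please 
correct me if I have it wrong:

-- PRIMARY KEY ((scopeId, scopeType), nodeId, nodeType) maps to one wide
-- physical row per (scopeId, scopeType):
--   row key:      scopeId:scopeType                    (composite)
--   column name:  nodeId:nodeType:'timestamp'          (composite)
--   column value: the timestamp bigint
-- i.e. every marked node in a graph becomes a cell in a single partition,
-- sorted by nodeId, then nodeType.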

Thanks,
Todd

On Sunday, August 31, 2014, Laing, Michael <michael.l...@nytimes.com> wrote:

  Actually, I think you do want to use scopeId, scopeType as the partition key 
(and drop row caching until you upgrade to 2.1, where "rows" are in fact rows 
and not partitions): 

  CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes (
      scopeId uuid,
      scopeType varchar,
      nodeId uuid,
      nodeType varchar,
      timestamp bigint,
      PRIMARY KEY ((scopeId, scopeType), nodeId, nodeType)
  );
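
  Inserts and point reads keep the same shape as before; for example (just a 
sketch, with made-up uuid literals and a placeholder 'owner' scopeType):

  -- one cell per marked node, stored in the graph's partition
  INSERT INTO Graph_Marked_Nodes (scopeId, scopeType, nodeId, nodeType, timestamp)
  VALUES (11111111-1111-1111-1111-111111111111, 'owner',
          22222222-2222-2222-2222-222222222222, 'foo', 1409500000000);

  -- single-node read, same access pattern as before
  SELECT timestamp
  FROM Graph_Marked_Nodes
  WHERE scopeId = 11111111-1111-1111-1111-111111111111
    AND scopeType = 'owner'
    AND nodeId = 22222222-2222-2222-2222-222222222222
    AND nodeType = 'foo';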

  Then you can select using IN on the cartesian product of your clustering keys:

  SELECT timestamp 
  FROM  Graph_Marked_Nodes 
  WHERE scopeId = ? 
  AND scopeType = ? 
  AND (nodeId, nodeType) IN (
      (uuid1, 'foo'), (uuid1, 'bar'), 
      (uuid2, 'foo'), (uuid2, 'bar'), 
      (uuid3, 'foo'), (uuid3, 'bar')
  );

  ml

  PS Of course you could boldly go to 2.1 now for a nice performance boost :)





  On Sat, Aug 30, 2014 at 8:59 PM, Todd Nine <toddn...@apache.org> wrote:

    Hi all, 
      I'm working on transferring our Thrift DAOs over to CQL.  It's going 
well, except for two cases that both use multiget.  The use case is very simple: 
it is a narrow row, by design, with only a few columns.  When I perform a 
multiget, I need to fetch up to 1k rows at a time.  I do not want to turn these 
into a wide row using scopeId and scopeType as the row key.


    On the physical level, my Column Family needs something similar to the 
following format.


    scopeId, scopeType, nodeId, nodeType :{ timestamp: 0x00 }


    I've defined my table with the following CQL.


    CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes (
        scopeId uuid,
        scopeType varchar,
        nodeId uuid,
        nodeType varchar,
        timestamp bigint,
        PRIMARY KEY ((scopeId, scopeType, nodeId, nodeType))
    ) WITH caching = 'all';
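
    As I understand the physical mapping (a rough sketch, happy to be 
corrected), that definition gives exactly the layout above, one tiny physical 
row per node:

    -- row key:      scopeId:scopeType:nodeId:nodeType   (composite partition key)
    -- column:       'timestamp' -> the bigint value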


    This works well for inserts, deletes, and single reads.  I always know the 
scopeId, scopeType, nodeId, and nodeType, so I want to return the timestamp 
columns.  I thought I could use the IN operation and specify the pairs of 
nodeId and nodeType values I have as input; however, this doesn't work.  

    Can anyone give me a suggestion on how to perform a multiget when I have 
several values for the nodeId and the nodeType?  This read occurs on every read 
of edges, so making 1k trips is not going to work from a performance perspective.

    Below is the query I've tried.

    SELECT timestamp
    FROM Graph_Marked_Nodes
    WHERE scopeId = ?
      AND scopeType = ?
      AND nodeId IN (uuid1, uuid2, uuid3)
      AND nodeType IN ('foo', 'bar');
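
    For what it's worth, a variant like the one below should be accepted (if 
I'm reading the 2.0 restrictions right, IN is only allowed on the last column 
of the partition key), but it only covers a single nodeId per query, so it 
doesn't solve the multiget:

    SELECT timestamp
    FROM Graph_Marked_Nodes
    WHERE scopeId = ?
      AND scopeType = ?
      AND nodeId = ?
      AND nodeType IN ('foo', 'bar');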

    I've found this issue, which looks like it's a solution to my problem.

    https://issues.apache.org/jira/browse/CASSANDRA-6875


    However, I'm not able to get the syntax in the issue description to work 
either.  Any input would be appreciated!

    Cassandra: 2.0.10
    Datastax Driver: 2.1.0

    Thanks,
    Todd



