Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

Todd Nine Sun, 31 Aug 2014 16:50:37 -0700

Hey Jack,
  I don't want to use Titan, though it's a great product.  IIRC, they
require Order Preserving Partitioner to load edges that span multiple row
keys for their Cassandra storage engine.


Also, the graph is embedded as part of this open source product.

http://usergrid.incubator.apache.org/

I still have my original question: how can you perform multiget equivalents
in CQL?  So far, it seems like the feature I need is missing, and is still
only available via thrift.  If I'm missing the solution, please let me know.

Thanks,
Todd
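
For readers following the thread: one workaround sometimes suggested when
every node lives in its own partition is to issue one single-partition read
per key and run those reads concurrently rather than one after another.  The
sketch below is a minimal, driver-agnostic illustration in Python using only
the standard library.  It is not the Thrift multiget and not a feature of
CQL itself.  `fetch_timestamp` is a hypothetical stand-in for a prepared
single-row SELECT against Graph_Marked_Nodes, and `FAKE_TABLE` and the
`max_workers` value are assumptions made up for illustration, not measured
settings.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for Graph_Marked_Nodes, keyed by the full
# partition key (scopeId, scopeType, nodeId, nodeType) -> timestamp.
FAKE_TABLE = {
    ("scope-1", "app", "node-1", "foo"): 100,
    ("scope-1", "app", "node-2", "bar"): 200,
}


def fetch_timestamp(key):
    """Hypothetical single-partition read.

    A real version would execute a prepared SELECT with the four key
    components bound; here it only looks the key up in FAKE_TABLE.
    """
    return FAKE_TABLE.get(key)


def multiget(keys, max_workers=16):
    """Run one read per key concurrently; return {key: timestamp} for hits."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = list(pool.map(fetch_timestamp, keys))
    return {k: ts for k, ts in zip(keys, results) if ts is not None}
```

With a real driver the same shape would normally use that driver's
asynchronous execute call instead of a thread pool; the point of the sketch
is only that 1k reads need not mean 1k sequential round trips.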


On Sun, Aug 31, 2014 at 1:07 PM, Jack Krupansky <j...@basetechnology.com>
wrote:

>   You might want to take a look at Titan, a graph database that can use
> Cassandra as its storage engine, and see how it does these things.
>
> -- Jack Krupansky
>
>  *From:* Todd Nine <todd.n...@gmail.com>
> *Sent:* Sunday, August 31, 2014 11:06 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10
>
> Hey Michael,
>    Thanks for the response.  If I use the clustered columns in the way you
> described, won't that make the row key of the column family scopeId and
> scopeType?
>
> The scope fields represent a graph's owner.  The graph itself can have
> several billion nodes in it.  When a lot of deletes start occurring on the
> same graph, I will quickly saturate the row capacity of a column family if
> the physical row key is only the scope.
>
> This is why I have each node on its own row key.  As long as our cluster
> has the capacity to handle the load, we won't hit the upper bounds of the
> maximum columns in a row.
>
> I'm new to CQL in our code.  I've only been using it for administration.
> I've been using the thrift interface in code since the 0.6 days.
>
> I feel I have a strong understanding of the internals of the column family
> structure.  I'm struggling to find documentation on the CQL-to-physical
> layout mapping that isn't a trivial example, especially around multiget use
> cases.  Do you have any pointers to blogs or tutorials you've found
> helpful?
>
> Thanks,
> Todd
>
> On Sunday, August 31, 2014, Laing, Michael <michael.l...@nytimes.com>
> wrote:
>
>> Actually I think you do want to use scopeId, scopeType as the partition
>> key (and drop row caching until you upgrade to 2.1 where "rows" are in fact
>> rows and not partitions):
>>
>>  CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
>> (
>>     scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
>> timestamp bigint,
>>     PRIMARY KEY ((scopeId , scopeType), nodeId, nodeType)
>> );
>>
>> Then you can select using IN on the cartesian product of your clustering
>> keys:
>>
>>  SELECT timestamp
>> FROM  Graph_Marked_Nodes
>> WHERE scopeId = ?
>> AND scopeType = ?
>> AND (nodeId, nodeType) IN (
>>     (uuid1, 'foo'), (uuid1, 'bar'),
>>     (uuid2, 'foo'), (uuid2, 'bar'),
>>     (uuid3, 'foo'), (uuid3, 'bar')
>> );
>>
>> ml
>>
>> PS Of course you could boldly go to 2.1 now for a nice performance boost
>> :)
>>
>>
>>
>>
>> On Sat, Aug 30, 2014 at 8:59 PM, Todd Nine <toddn...@apache.org> wrote:
>>
>>> Hi all,
>>>   I'm working on transferring our thrift DAOs over to CQL.  It's going
>>> well, except for 2 cases that both use multiget.  The use case is very
>>> simple.  It is a narrow row, by design, with only a few columns.  When I
>>> perform a multiget, I need to get up to 1k rows at a time.  I do not want
>>> to turn these into a wide row using scopeId and scopeType as the row key.
>>>
>>>
>>> On the physical level, my Column Family needs something similar to the
>>> following format.
>>>
>>>
>>> scopeId, scopeType, nodeId, nodeType :{ timestamp: 0x00 }
>>>
>>>
>>> I've defined my table with the following CQL.
>>>
>>>
>>>  CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes
>>> ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar,
>>> timestamp bigint,
>>> PRIMARY KEY ((scopeId , scopeType, nodeId, nodeType))
>>> ) WITH caching = 'all';
>>>
>>>
>>> This works well for inserts, deletes, and single reads.  I always know the
>>> scopeId, scopeType, nodeId, and nodeType, so I want to return the timestamp
>>> columns.  I thought I could use the IN operation and specify the pairs of
>>> nodeId and nodeTypes I have as input, however this doesn't work.
>>>
>>> Can anyone give me a suggestion on how to perform a multiget when I have
>>> several values for the nodeId and the nodeType?  This read occurs on every
>>> read of edges so making 1k trips is not going to work from a performance
>>> perspective.
>>>
>>> Below is the query I've tried.
>>>
>>> SELECT timestamp FROM  Graph_Marked_Nodes WHERE scopeId = ? AND
>>> scopeType = ? AND nodeId IN (uuid1, uuid2, uuid3) AND nodeType IN
>>> ('foo','bar')
>>>
>>> I've found this issue, which looks like it's a solution to my problem.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6875
>>>
>>> However, I'm not able to get the syntax in the issue description to work
>>> either.  Any input would be appreciated!
>>>
>>> Cassandra: 2.0.10
>>> Datastax Driver: 2.1.0
>>>
>>> Thanks,
>>> Todd
>>>
>>>
>>>
>>>
>>>
>>
>>
>
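
The multi-column IN form quoted above from Michael's reply can be assembled
programmatically when the list of (nodeId, nodeType) pairs varies per
request.  The following is a minimal sketch in Python that only builds the
statement text and the flat list of bind values; it does not talk to
Cassandra.  It applies to the table layout from that reply, where nodeId and
nodeType are clustering columns.  The function name `build_multi_in_query`
is hypothetical, and whether a given server and driver version accepts
individual `?` markers inside each tuple is an assumption to verify.

```python
from itertools import product


def build_multi_in_query(scope_id, scope_type, pairs):
    """Return (cql, params) for a multi-column IN over (nodeId, nodeType).

    `pairs` is an iterable of (node_id, node_type) tuples.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (nodeId, nodeType) pair is required")
    # One "(?, ?)" placeholder group per requested pair.
    tuples = ", ".join("(?, ?)" for _ in pairs)
    cql = (
        "SELECT timestamp FROM Graph_Marked_Nodes "
        "WHERE scopeId = ? AND scopeType = ? "
        f"AND (nodeId, nodeType) IN ({tuples})"
    )
    # Bind values in the same order as the markers appear in the statement.
    params = [scope_id, scope_type]
    for node_id, node_type in pairs:
        params.extend([node_id, node_type])
    return cql, params


# Cartesian product of ids and types, as in the quoted query.
example_pairs = list(product(["uuid1", "uuid2", "uuid3"], ["foo", "bar"]))
example_cql, example_params = build_multi_in_query("scope", "app", example_pairs)
```

Passing only the pairs that are actually needed, rather than the full
cartesian product, keeps the IN list as small as possible.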