Aaron, thanks for the super-rapid response. That clarifies a lot for me,
but I am still wondering about one point, embedded below.

________________________________
> From: aa...@thelastpickle.com 
> Subject: Re: is the select result grouped by the value of the partition key? 
> Date: Thu, 12 Sep 2013 14:19:06 +1200 
> To: user@cassandra.apache.org 
>  
> GROUP BY "feature", 
> I would not think of it like that, this is about physical order of rows. 
>  
> since it seems really important yet does not seem to be mentioned in the 
> CQL reference documentation. 
> It's baked in, this is how the data is organised on the row. 

Yes, I see, and I absolutely get the relevance of where columns are stored
on disk to, say, doing INSERTs.
But what I am wondering about is that, in the context of a SELECT, we seem
to be relying on the Cassandra client API preserving that on-disk order
while returning rows.
My high-level understanding of how Cassandra handles a SELECT is as follows
(excuse any incorrect terminology):
  1. The client connects to some node N.
  2. Node N acts as a kind of coordinator and fires off the Thrift or
     binary-protocol messages to all other nodes to fetch rows off the
     memtables and/or disks.
  3. The coordinator merges, truncates, etc. the sets from the nodes and
     returns one answer set to the client.
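
To make the three steps concrete, here is a toy Python model of what I have in mind. It is only a sketch of my mental picture, not Cassandra's code: the names (token, node_read, coordinator_select, partitions_are_contiguous), the MD5-based token and the (timestamp, value) cell layout are all assumptions of mine. In this model the coordinator merges the per-node answers one partition at a time, so rows with the same partition key cannot get intermingled.

```python
import hashlib
from itertools import groupby


def token(partition_key):
    """Hypothetical stand-in for a partitioner: a stable hash of the key."""
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)


def node_read(node_data):
    """Step 2: a node returns each partition it holds, sorted by clustering column."""
    return {pk: sorted(cells.items()) for pk, cells in node_data.items()}


def coordinator_select(nodes):
    """Step 3: merge the per-node answers into one answer set."""
    merged = {}
    for node_data in nodes:
        for pk, rows in node_read(node_data).items():
            cells = merged.setdefault(pk, {})
            for clustering, (timestamp, value) in rows:
                # Replicas may disagree; in this toy model the newest timestamp wins.
                if clustering not in cells or cells[clustering][0] < timestamp:
                    cells[clustering] = (timestamp, value)
    result = []
    # Emit whole partitions in token order, rows in clustering order within each.
    for pk in sorted(merged, key=token):
        for clustering in sorted(merged[pk]):
            result.append((pk, clustering, merged[pk][clustering][1]))
    return result


def partitions_are_contiguous(rows):
    """True if every partition key appears as a single unbroken run of rows."""
    runs = [pk for pk, _ in groupby(rows, key=lambda r: r[0])]
    return len(runs) == len(set(runs))


# Two nodes holding different versions of the same partition ("sensor1").
node_a = {"sensor1": {1: (10, "a"), 2: (10, "b")}, "sensor2": {1: (10, "x")}}
node_b = {"sensor1": {2: (20, "b2"), 3: (20, "c")}, "sensor3": {1: (5, "z")}}
rows = coordinator_select([node_a, node_b])
print(partitions_are_contiguous(rows))
```

Whether the real coordinator behaves like coordinator_select here is exactly what I am asking.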

It is step 3 which has me wondering: does it explicitly preserve the
on-disk order?
In fact, does it simply keep each individual node's answer set separate?
Is that how it works?

>  
> http://www.datastax.com/dev/blog/thrift-to-cql3 
> We often say the PRIMARY KEY is the PARTITION KEY and the GROUPING COLUMNS 
> http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_reference/create_table_r.html
>  
>  
> See also http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html 
>  
> Is it something we can bet the farm and farmer's family on? 
> Sure. 
>  
> The kinds of scenarios where I am wondering if it's possible for  
> partition-key groups 
> to get intermingled are : 
> All instances of the table entity with the same value(s) for the  
> PARTITION KEY portion of the PRIMARY KEY existing in the same storage  
> engine row. 
>  
>    .   what if the node containing primary copy of a row is down 
> There is no primary copy of a row. 
>  
>    .   what if there is a heavy stream of UPDATE activity from  
> applications which 
>        connect to all nodes,   causing different nodes to have different  
> versions of replicas of same row? 
> That's fine with me. 
> It's only an issue when the data is read, and at that point the  
> Consistency Level determines what we do. 
>  
> Hope that helps. 
>  
>  
> ----------------- 
> Aaron Morton 
> New Zealand 
> @aaronmorton 
>  
> Co-Founder & Principal Consultant 
> Apache Cassandra Consulting 
> http://www.thelastpickle.com 
>  
> On 12/09/2013, at 7:43 AM, John Lumby  
> <johnlu...@hotmail.com> wrote: 
>  
> I would like to make quite sure about this implicit GROUP BY "feature", 
>  
> since it seems really important yet does not seem to be mentioned in the 
> CQL reference documentation. 
>  
>  
>  
> Aaron,   you said "yes"  --   is that "yes,  always,   in all scenarios  
> no matter what" 
>  
> or "yes usually"?      Is it something we can bet the farm and farmer's  
> family on? 
>  
>  
>  
> The kinds of scenarios where I am wondering if it's possible for  
> partition-key groups 
> to get intermingled are : 
>  
>  
>  
>    .   what if the node containing primary copy of a row is down 
>                  and 
> cassandra fetches this row from a replica on a different node 
>                 (e.g.  with CONSISTENCY ONE) 
>  
>    .   what if there is a heavy stream of UPDATE activity from  
> applications which 
>        connect to all nodes,   causing different nodes to have different  
> versions of replicas of same row? 
>  
>  
>  
> Can you point me to some place in the cassandra source code where this  
> grouping is ensured? 
>  
>  
>  
> Many thanks, 
>  
> John Lumby 
>                                         
