[ 
https://issues.apache.org/jira/browse/CASSANDRA-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937711#comment-17937711
 ] 

Nadav Har'El commented on CASSANDRA-20470:
------------------------------------------

As I said above, cql3/validation/operations/SelectTest.java has many tests that 
assume that SELECT without ORDER BY returns a specific order and not an 
unspecified order. I think the most enlightening example of why this assumption 
is usefu is in testClusteringOrderWithSlice():

This test creates a table with a descending WITH CLUSTERING ORDER BY, and then 
runs a SELECT query against this table without an explicit "ORDER BY". The 
expectation of the test is that the SELECT without ORDER BY will return the 
same on-disk order defined at table creation time (the descending order). The 
author of this test obviously assumed that the SELECT order is guaranteed to be 
the same on-disk order defined at table creation time, and an explicit ORDER BY 
is _not needed_ just to request this default order.

> Document SELECT order without ORDER BY
> --------------------------------------
>
>                 Key: CASSANDRA-20470
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20470
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Nadav Har'El
>            Priority: Normal
>
> In SQL, doing a "SELECT" without an ORDER BY does not guarantee any order of 
> the results.
> I've been using Cassandra for many years and until today, I was under the 
> impression that in Cassandra, things are different: I thought that SELECT 
> officially defaults to the "standard" order which sorts partitions according 
> to tokens (or the partition key in case of tie) and inside a partition rows 
> are sorted according to the clustering key(s). I was under the impression 
> that Cassandra users rarely, if ever, use "ORDER BY ASC(c)" in each SELECT 
> query - they just leave it out when wanting the default sort order.
> I think this is impression is also hard-coded in some of the unit tests. For 
> example, cql3/validation/operations/SelectTest.java checks the result order 
> with various ORDER BY, but its first check actually omits the ORDER BY and 
> verifies that the results come in ascending clustering key order. 
> But today I realized that this isn't documented anywhere I can find! I 
> couldn't find any explicit documentation of what happens in a SELECT 
> *without* an "ORDER BY". Is the order completely unspecified (as in SQL) or 
> can users expect the default sort order is the "standard" one which I 
> mentioned above? My request in this issue is to make the answer to this 
> question clear and explicit in the documentation.
> Given that Cassandra already goes to all the trouble to guarantee a specific 
> order of the rows on disk (and the on-disk part *is* documented), it seems 
> strange not to make the most efficient way to read it - in that same order - 
> be the default, and document that it is the default. But if this just "seems" 
> to be the default but for some reason is *not* the default, then this fact 
> should be documented, so users won't rely on it. Although to be honest, I'll 
> be surprised if most Cassandra users don't rely on this already - it seems I 
> always did... If this default sort order also can't be relied on, probably 
> the unit tests that assume it will need to be changed, to use 
> assertRowsIgnoringOrder() or something.
> Interestingly, DynamoDB which whose data model is more-or-less identical to 
> Cassandra's, made the explicit decision that the default sort order always 
> applies to queries: Their clustering key is called a "sort key" and rows are 
> always sorted by it - e.g., 
> [https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html]
>  says "Query results are always sorted by the sort key value.".
> Beyond the above, I think we also need to document how sort order applies 
> when SELECT uses a secondary index. When the query is restricted by an 
> indexed column, an explicit "ORDER BY" is outright forbidden (the error 
> message is "ORDER BY with 2ndary indexes is not supported, except for ANN 
> queries."), but more importantly it seems that although "usually" the 
> standard default sort order is maintained in such queries, this is not 
> guaranteed if one node returns a long list of results and the coordinator 
> decides not to even try other nodes. I think this fact can cause unpleasant 
> surprises for application developers - results will be sorted as expected 
> when testing on a a small installation, but when moving to a large many-node 
> cluster, suddenly the sort order will no longer be guaranteed. I think this 
> should be documented as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to