[ https://issues.apache.org/jira/browse/CASSANDRA-20470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17937711#comment-17937711 ]
Nadav Har'El commented on CASSANDRA-20470: ------------------------------------------ As I said above, cql3/validation/operations/SelectTest.java has many tests that assume that SELECT without ORDER BY returns a specific order and not an unspecified order. I think the most enlightening example of why this assumption is usefu is in testClusteringOrderWithSlice(): This test creates a table with a descending WITH CLUSTERING ORDER BY, and then runs a SELECT query against this table without an explicit "ORDER BY". The expectation of the test is that the SELECT without ORDER BY will return the same on-disk order defined at table creation time (the descending order). The author of this test obviously assumed that the SELECT order is guaranteed to be the same on-disk order defined at table creation time, and an explicit ORDER BY is _not needed_ just to request this default order. > Document SELECT order without ORDER BY > -------------------------------------- > > Key: CASSANDRA-20470 > URL: https://issues.apache.org/jira/browse/CASSANDRA-20470 > Project: Apache Cassandra > Issue Type: Improvement > Components: Documentation > Reporter: Nadav Har'El > Priority: Normal > > In SQL, doing a "SELECT" without an ORDER BY does not guarantee any order of > the results. > I've been using Cassandra for many years and until today, I was under the > impression that in Cassandra, things are different: I thought that SELECT > officially defaults to the "standard" order which sorts partitions according > to tokens (or the partition key in case of tie) and inside a partition rows > are sorted according to the clustering key(s). I was under the impression > that Cassandra users rarely, if ever, use "ORDER BY ASC(c)" in each SELECT > query - they just leave it out when wanting the default sort order. > I think this is impression is also hard-coded in some of the unit tests. For > example, cql3/validation/operations/SelectTest.java checks the result order > with various ORDER BY, but its first check actually omits the ORDER BY and > verifies that the results come in ascending clustering key order. > But today I realized that this isn't documented anywhere I can find! I > couldn't find any explicit documentation of what happens in a SELECT > *without* an "ORDER BY". Is the order completely unspecified (as in SQL) or > can users expect the default sort order is the "standard" one which I > mentioned above? My request in this issue is to make the answer to this > question clear and explicit in the documentation. > Given that Cassandra already goes to all the trouble to guarantee a specific > order of the rows on disk (and the on-disk part *is* documented), it seems > strange not to make the most efficient way to read it - in that same order - > be the default, and document that it is the default. But if this just "seems" > to be the default but for some reason is *not* the default, then this fact > should be documented, so users won't rely on it. Although to be honest, I'll > be surprised if most Cassandra users don't rely on this already - it seems I > always did... If this default sort order also can't be relied on, probably > the unit tests that assume it will need to be changed, to use > assertRowsIgnoringOrder() or something. > Interestingly, DynamoDB which whose data model is more-or-less identical to > Cassandra's, made the explicit decision that the default sort order always > applies to queries: Their clustering key is called a "sort key" and rows are > always sorted by it - e.g., > [https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html] > says "Query results are always sorted by the sort key value.". > Beyond the above, I think we also need to document how sort order applies > when SELECT uses a secondary index. When the query is restricted by an > indexed column, an explicit "ORDER BY" is outright forbidden (the error > message is "ORDER BY with 2ndary indexes is not supported, except for ANN > queries."), but more importantly it seems that although "usually" the > standard default sort order is maintained in such queries, this is not > guaranteed if one node returns a long list of results and the coordinator > decides not to even try other nodes. I think this fact can cause unpleasant > surprises for application developers - results will be sorted as expected > when testing on a a small installation, but when moving to a large many-node > cluster, suddenly the sort order will no longer be guaranteed. I think this > should be documented as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org