I created https://issues.apache.org/jira/browse/CASSANDRA-4915
On Mon, Nov 5, 2012 at 3:27 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>> A remark like "maybe we just shouldn't allow that and leave that to the
>> map-reduce side" would make sense, but I don't see how this is "misleading".
>
> Yes. Bingo.
>
> It is misleading because it is not useful in any other context besides
> someone playing around with a ten row table in cqlsh. CQL stops me
> from executing some queries that are not efficient, yet it allows this
> one. If I am new to Cassandra and developing, this query works and
> produces a result, then once my database gets real data it produces a
> different result (likely an empty one).
>
> When I first saw this query two things came to my mind.
>
> 1) CQL (and Cassandra) must be somehow indexing all the fields of a
> primary key to make this search optimal.
>
> 2) This is impossible; CQL must be gathering the first hundred random
> rows and finding this thing.
>
> What is happening is case #2. In a nutshell, CQL is just sampling
> some data and running the query on it. We could support all types of
> query constructs if we just took the first 100 rows and applied this
> logic to them, but these things are not helpful for anything but light
> ad-hoc data exploration.
>
> My suggestions:
> 1) force people to supply a LIMIT clause on any query that is going to
> page over get_range_slice
> 2) having some type of explain support so I can establish if this
> query will work in the
>
> I say this because as an end user I do not understand if a given query
> is actually going to return the same results with different data.
>
> On Mon, Nov 5, 2012 at 1:40 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>
>> On Mon, Nov 5, 2012 at 6:55 PM, Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>>>
>>> I see. It is fairly misleading because it is a query that does not
>>> work at scale. This syntax is only helpful if you have fewer than a few
>>> thousand rows in Cassandra.
>>
>>
>> Just for the sake of argument, how is that misleading? If you have billions
>> of rows and do the select statement from your initial mail, what did the
>> syntax lead you to believe it would return?
>>
>> A remark like "maybe we just shouldn't allow that and leave that to the
>> map-reduce side" would make sense, but I don't see how this is "misleading".
>>
>> But again, this translates directly to a get_range_slice (that doesn't scale if
>> you have billions of rows and don't limit the output either) so there is
>> nothing new here.
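
For readers following the thread, here is a minimal sketch of the behavior being debated. It is a plain Python simulation, not Cassandra or CQL code, and it assumes the semantics Edward describes: the first N rows of a range scan are fetched, and the filter is applied only to those. All names (`make_rows`, `scan_then_filter`, `filter_then_limit`, the `tag` column) are hypothetical and chosen for illustration.

```python
def make_rows(n):
    """Build a hypothetical table of n rows; only the last row is tagged 'rare'."""
    return [{"id": i, "tag": "rare" if i == n - 1 else "common"} for i in range(n)]


def is_rare(row):
    """The filter a user might put in a WHERE clause."""
    return row["tag"] == "rare"


def scan_then_filter(rows, predicate, scan_limit=100):
    """Assumed behavior from the thread: take the first scan_limit rows
    of the range scan, then filter only those rows."""
    return [row for row in rows[:scan_limit] if predicate(row)]


def filter_then_limit(rows, predicate, limit=100):
    """What a newcomer might expect: scan every row and stop only after
    limit matching rows have been collected."""
    matches = []
    for row in rows:
        if predicate(row):
            matches.append(row)
            if len(matches) == limit:
                break
    return matches


if __name__ == "__main__":
    small = make_rows(10)       # a "ten row table in cqlsh"
    large = make_rows(10_000)   # the same table once it holds real data

    print(len(scan_then_filter(small, is_rare)))   # match is inside the first 100 rows
    print(len(scan_then_filter(large, is_rare)))   # match lies beyond the first 100 rows
    print(len(filter_then_limit(large, is_rare)))  # full scan still finds it
```

Under these assumptions the same query returns the matching row on the ten-row table and an empty result on the larger one, which is the "works in development, empty in production" outcome described above. The full-scan variant always finds the row, but it has to read every row to do so, which is the scaling cost Sylvain attributes to get_range_slice.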