Re: [Rust][DataFusion] Inconsistent array ordering with "GROUP BY" SQL

2021-02-21 Thread Adam Hooper
On Sun, Feb 21, 2021 at 5:27 AM Andrew Lamb wrote:
> For what it is worth, my experience with some SQL databases has been the
> opposite -- ordering can and does differ from statement to statement if the
> clause has a GROUP BY but no ORDER BY.
>
Is this a security issue? If the GROUP BY result

Re: [Rust][DataFusion] Inconsistent array ordering with "GROUP BY" SQL

2021-02-21 Thread Andrew Lamb
For what it is worth, my experience with some SQL databases has been the opposite -- ordering can and does differ from statement to statement if the clause has a GROUP BY but no ORDER BY. As Andy mentioned, the core reason is performance -- as it can require additional computation to ensure the resu

Re: [Rust][DataFusion] Inconsistent array ordering with "GROUP BY" SQL

2021-02-20 Thread Marc Prud'hommeaux
I understand that GROUP BY ought not imply any particular ordering; it's just that working with other SQL databases, I've come to expect that ordering will be consistent between multiple runs of the same statement, at least within the context of a single transaction on a single connection. I

Re: [Rust][DataFusion] Inconsistent array ordering with "GROUP BY" SQL

2021-02-20 Thread Andy Grove
The SQL standard in general makes no guarantee of the order of resulting data unless there is an explicit ORDER BY clause. I would guess that there are two factors in play here:
1. The use of hash-based data structures, as you mention
2. If you have partitioned data then it is processed on multip
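
To illustrate the first factor, here is a minimal sketch using only the Rust standard library. It is not DataFusion's actual aggregation code; the function names and sample rows are hypothetical. It assumes a hash aggregate that collects group keys in a `std::collections::HashMap`, whose default hasher is randomly seeded, so the iteration order is unspecified and can change between runs. An explicit sort afterwards plays the role of ORDER BY.

```rust
use std::collections::HashMap;

/// Hypothetical helper: group rows by country with a hash map and return
/// the distinct keys in whatever order the map happens to yield them.
fn group_by_country_unordered(rows: &[&str]) -> Vec<String> {
    let mut groups: HashMap<&str, usize> = HashMap::new();
    for &country in rows {
        // Count rows per group; only the keys are returned below.
        *groups.entry(country).or_insert(0) += 1;
    }
    // HashMap iteration order is unspecified and may vary from run to run.
    groups.keys().map(|k| k.to_string()).collect()
}

/// Same grouping followed by an explicit sort, the analogue of ORDER BY.
fn group_by_country_ordered(rows: &[&str]) -> Vec<String> {
    let mut keys = group_by_country_unordered(rows);
    keys.sort();
    keys
}

fn main() {
    // Made-up sample rows, not taken from the CSV in the original question.
    let rows = ["Sweden", "Moldova", "Sweden", "China", "Moldova"];
    println!("unordered: {:?}", group_by_country_unordered(&rows));
    println!("ordered:   {:?}", group_by_country_ordered(&rows));
}
```

Only the sorted variant gives a stable result, which is why adding ORDER BY to the query is the usual way to get repeatable output.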

[Rust][DataFusion] Inconsistent array ordering with "GROUP BY" SQL

2021-02-20 Thread Marc Prud'hommeaux
When I group by a column in DataFusion SQL, the order of the results is different every time. For example, "select country from data group by country" against https://github.com/Teradata/kylo/blob/master/samples/sample-data/csv/userdata3.csv might return "Moldova" first one time, and then "Swed