I think so. If you want to scrutinize the results, sorting them and running a diff is probably the best way. If you want to check the results quickly and can tolerate a bit of uncertainty, I guess comparing the number of rows would be sufficient, because two different results are unlikely to contain exactly the same number of rows, e.g., 440704 rows.
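
To make the two options concrete, here is a rough Python sketch. It assumes each engine's result has already been fetched into a list of row tuples; the function names `quick_check` and `full_check` are made up for illustration and are not part of Hive, Presto, or Impala.

```python
import difflib


def quick_check(rows_a, rows_b):
    """Cheap check: compare only the number of rows."""
    return len(rows_a) == len(rows_b)


def full_check(rows_a, rows_b):
    """Thorough check: sort both results on all columns, then diff them.

    Each row is rendered as a tab-separated line, so the ordering is
    lexicographic on the text form. That is fine here because both
    sides are sorted the same way.
    """
    lines_a = sorted("\t".join(map(str, row)) for row in rows_a)
    lines_b = sorted("\t".join(map(str, row)) for row in rows_b)
    # An empty list means the two results match row for row.
    return list(difflib.unified_diff(lines_a, lines_b, lineterm=""))


if __name__ == "__main__":
    # Hypothetical results from two engines, for illustration only.
    result_a = [(1, "a"), (2, "b")]
    result_b = [(2, "b"), (1, "a")]  # same rows, different order
    result_c = [(1, "a"), (3, "c")]  # same row count, different content

    print(quick_check(result_a, result_c))  # row counts agree
    print(full_check(result_a, result_b))   # no differences after sorting
    print(full_check(result_a, result_c))   # diff shows the mismatched row
```

The last case is the uncertainty I mentioned: the row counts agree even though the contents differ, and only the sorted diff catches it.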
--- Sungwoo

On Wed, Jun 26, 2019 at 9:01 PM Edward Capriolo <edlinuxg...@gmail.com> wrote:

> I like the approach of applying an arbitrary limit. Hive's q files tend to
> add an ordering to everything. Would it make sense to simply order by
> multiple columns in the result set and conduct a large diff on them?
>
> On Wednesday, June 26, 2019, Sungwoo Park <glap...@gmail.com> wrote:
>
>> I have published a new article on the correctness of Hive on MR3, Presto,
>> and Impala:
>>
>> https://mr3.postech.ac.kr/blog/2019/06/26/correctness-hivemr3-presto-impala/
>>
>> Hope you enjoy reading the article.
>>
>> --- Sungwoo
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.