Re: Article on the correctness of Hive on MR3, Presto, and Impala

Sungwoo Park Wed, 26 Jun 2019 09:52:09 -0700

I think so. If you want to scrutinize the results, sorting them and running
a diff is probably the best way. If you want to check the results quickly
and can tolerate a bit of uncertainty, I guess comparing the number of rows
would be sufficient, because two different results are unlikely to contain
exactly the same number of rows, e.g., 440704 rows.
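
As a rough illustration of the two checks described above, here is a minimal
Python sketch. It assumes each engine's result has been dumped as
tab-separated text; the helper names (load_rows, quick_check, full_diff) are
hypothetical and are not taken from the article or from any of the engines.

```python
import difflib


def load_rows(text, delimiter="\t"):
    """Parse dumped query output into a list of tuples of strings, one per row."""
    return [tuple(line.split(delimiter)) for line in text.splitlines() if line]


def quick_check(rows_a, rows_b):
    """Fast check with some uncertainty: compare only the number of rows."""
    return len(rows_a) == len(rows_b)


def full_diff(rows_a, rows_b, delimiter="\t"):
    """Thorough check: sort both results on all columns, then diff them.

    Returns a list of unified-diff lines; an empty list means the sorted
    results are identical as text.
    """
    a = [delimiter.join(row) for row in sorted(rows_a)]
    b = [delimiter.join(row) for row in sorted(rows_b)]
    return list(
        difflib.unified_diff(a, b, fromfile="engine_a", tofile="engine_b", lineterm="")
    )
```

Rows are compared as strings here, so differences in how engines format
values (for example, trailing zeros in decimals) would show up as diffs and
may need normalizing first.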


--- Sungwoo

On Wed, Jun 26, 2019 at 9:01 PM Edward Capriolo <[email protected]>
wrote:

> I like the approach of applying an arbitrary limit. Hive's q files tend to
> add an ordering to everything. Would it make sense to simply order by
> multiple columns in the result set and conduct a large diff on them?
>
> On Wednesday, June 26, 2019, Sungwoo Park <[email protected]> wrote:
>
>> I have published a new article on the correctness of Hive on MR3, Presto,
>> and Impala:
>>
>>
>> https://mr3.postech.ac.kr/blog/2019/06/26/correctness-hivemr3-presto-impala/
>>
>> Hope you enjoy reading the article.
>>
>> --- Sungwoo
>>
>>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>
