I'm using HDP 2.5.0 with 1.2.1 Hive.
Performing some tests I noticed that my query works better if I don't use
limit clause.

My query is:

insert into table *results_table *partition (task_id=xxx)
select * from *data_table *
where dt=20171102
and .....
limit 1000000


This query runs in about 30 seconds, but without limit clause I can get
about 20 seconds.

Query execution plan with limit <https://pastebin.com/Cmp2rPNr>, and without
<https://pastebin.com/z1ps2EhG>.


I can't remove limit clause because in some cases there's to much results
and I don't whant to store them all in result table.
Why limit affects performance so much?  Intuitively, it seems that with
limit clause it should work faster. What can I do to improve prefomance?

Reply via email to