debug on hive

2017-06-14 Thread vergil
Hi, because I am going to rewrite and change some source code based on our requirements, I need to debug Hive in Eclipse. But it does not work when I follow https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide and https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ. Anyone could

Re: Hive query on ORC table is really slow compared to Presto

2017-06-14 Thread Gopal Vijayaraghavan
> SELECT COUNT(DISTINCT ip) FROM table - 71 seconds
> SELECT COUNT(DISTINCT id) FROM table - 12,399 seconds
Ok, I misunderstood your gist.
> While ip is more unique than id, ip runs many times faster than id.
>
> How can I debug this?
Nearly the same way - just replace "ip" with "id" in my exp

Re: Hive query on ORC table is really slow compared to Presto

2017-06-14 Thread Premal Shah
Hi Gopal, Thanks for the reply. I just want to clarify a few things.
1. The count distinct ip query runs fast, so it's not a problem.
2. I would not expect the ip column to use DICTIONARY encoding either.
3. I am more concerned about the count distinct id or count distinct master_id column which if