Re: Hive query on ORC table is really slow compared to Presto

2017-06-12 Thread Gopal Vijayaraghavan
Hi, I think this is worth fixing because this seems to be triggered by the data quality itself, so let me dig a bit into a couple more scenarios. > hive.optimize.distinct.rewrite is True by default FYI, we're tackling the count(1) + count(distinct col) case in the Optimizer now (which came

Re: Hive query on ORC table is really slow compared to Presto

2017-06-12 Thread Michael Segel
Silly question… What about using COUNT() and a GROUP BY() instead? I’m going from memory… this may or may not work. Since you want the row_id only in order to de-dupe, right? On Jun 12, 2017, at 3:59 PM, Premal Shah <premal.j.s...@gmail.com> wrote: Thanx Gopal. Sorry, took me a few d
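
For readers skimming the thread, here is a minimal sketch of the rewrite suggested above: counting distinct row_id values directly versus de-duplicating with GROUP BY first and then counting the groups. It is an illustration only. SQLite (via Python's sqlite3 module) stands in for Hive, the table name `events` and the column `payload` are hypothetical, and it says nothing about how either form performs on an ORC table.

```python
import sqlite3


def count_distinct_two_ways(rows):
    """Return (COUNT(DISTINCT ...) result, GROUP BY-then-count result).

    `rows` is a list of (row_id, payload) tuples. The table and the
    payload column are hypothetical; only row_id comes from the thread.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (row_id INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

    # Form 1: the direct distinct count.
    direct = conn.execute(
        "SELECT COUNT(DISTINCT row_id) FROM events"
    ).fetchone()[0]

    # Form 2: de-duplicate with GROUP BY, then count the resulting groups.
    grouped = conn.execute(
        "SELECT COUNT(*) FROM (SELECT row_id FROM events GROUP BY row_id) AS t"
    ).fetchone()[0]

    conn.close()
    return direct, grouped


if __name__ == "__main__":
    sample = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]
    print(count_distinct_two_ways(sample))  # prints (3, 3)
```

One caveat: the two forms agree only when row_id has no NULLs. COUNT(DISTINCT row_id) ignores NULL, while GROUP BY yields a NULL group that COUNT(*) counts, so the inner query would need a `WHERE row_id IS NOT NULL` filter to match.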

Re: Hive query on ORC table is really slow compared to Presto

2017-06-12 Thread Premal Shah
Thanx Gopal. Sorry, took me a few days to respond. Here are some findings. hive.optimize.distinct.rewrite is True by default. I do see Reducer 2 + 3. However, this might be worth mentioning. The distinct query on an ORC table takes a ton of time. I created a table with the TEXTFILE format from th

maven compiling cannot resolve project.parent.basedir

2017-06-12 Thread wuchang
I am using Maven to compile apache-hive-2.1.1-src for debugging, and I use the -X parameter to print the debug information. But in the end the compilation failed: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.3:single (assemble) on project hive-packaging: Failed