Hi,
I think this is worth fixing, because it seems to be triggered by the data
quality itself - so let me dig into a couple more scenarios.
> hive.optimize.distinct.rewrite is True by default
FYI, we're tackling the count(1) + count(distinct col) case in the Optimizer
now (which came
Silly question…
What about using COUNT() with a GROUP BY instead?
I’m going from memory, so this may or may not work. You only want the row_id
in order to de-dupe, right?
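
To make the suggestion concrete, here is a minimal sketch of the idea. It uses SQLite through Python purely so it can be run anywhere; it is not Hive, and the table name (events), the column col, and the sample rows are all made up for illustration. It compares count(1) + count(distinct row_id) against a version that de-dupes with GROUP BY first and then counts the groups.

```python
import sqlite3

# Hypothetical table, columns and data, for illustration only (not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (row_id INTEGER, col TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (3, "b"), (3, "b")],
)

# Direct form: total rows plus a distinct aggregate in a single query.
direct = conn.execute(
    "SELECT COUNT(1), COUNT(DISTINCT row_id) FROM events"
).fetchone()

# Rewritten form: de-dupe with GROUP BY first, then count the groups.
# SUM(cnt) recovers the total row count, COUNT(*) the number of distinct row_ids.
rewritten = conn.execute(
    """
    SELECT SUM(cnt), COUNT(*)
    FROM (SELECT row_id, COUNT(*) AS cnt FROM events GROUP BY row_id) t
    """
).fetchone()

print(direct, rewritten)
conn.close()
```

One caveat: the two forms only agree when row_id has no NULLs. COUNT(DISTINCT row_id) skips NULLs, while GROUP BY puts them in a group of their own. Whether the rewrite is actually faster on Hive is something you would have to measure on the real table.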
On Jun 12, 2017, at 3:59 PM, Premal Shah <premal.j.s...@gmail.com> wrote:
Thanx Gopal.
Sorry, took me a few days to respond. Here are some findings.
hive.optimize.distinct.rewrite is True by default
I do see Reducer 2 + 3.
However, this might be worth mentioning. The distinct query on an ORC table
takes a ton of time. I created a table with the TEXTFILE format from th
I am using Maven to compile apache-hive-2.1.1-src for debugging, and I use the
-X parameter to print the debug information. In the end, however, the
compilation failed:
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-assembly-plugin:2.3:single (assemble) on project
hive-packaging: Failed