Re: How to optimize multiple count( distinct col) in Hive SQL

2017-09-03 Thread panfei
-24 9:42 GMT+08:00 panfei : > by decreasing mapreduce.reduce.shuffle.parallelcopies from 20 to 5, it > seems that everything goes well, no OOM ~~ > > 2017-08-23 17:19 GMT+08:00 panfei : > >> The full error stack is (which is described here : >> https://issues.apache.org

Re: How to optimize multiple count( distinct col) in Hive SQL

2017-08-23 Thread panfei
By decreasing mapreduce.reduce.shuffle.parallelcopies from 20 to 5, it seems that everything goes well, no OOM ~~ 2017-08-23 17:19 GMT+08:00 panfei : > The full error stack is (which is described here: https://issues.apache.org/jira/browse/MAPREDUCE-6108) : > > this error can n
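
For anyone who wants to try the same workaround, here is a minimal sketch of applying the setting per session from the Hive CLI or Beeline, assuming Hive-on-MR as in this thread. The value 5 is the one reported in the message above (it is also the stock Hadoop default); whether it is the right number for another cluster is an assumption to check against the reducer heap size there.

```sql
-- Lower the number of parallel fetch threads each reducer uses during shuffle.
-- 20 -> 5 is the change reported in this thread; 5 is also the Hadoop default.
SET mapreduce.reduce.shuffle.parallelcopies=5;

-- Print the effective value to confirm the override is in place.
SET mapreduce.reduce.shuffle.parallelcopies;
```

A session-level SET only affects queries submitted afterwards in that session, so it is an easy way to test the change before touching cluster-wide configuration.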

Re: How to optimize multiple count( distinct col) in Hive SQL

2017-08-23 Thread panfei
] org.apache.hadoop.mapred.Task: Runnning cleanup for the task 2017-08-23 13:10 GMT+08:00 panfei : > Hi Gopal, Thanks for all the information and suggestions. > > The Hive version is 2.0.1 and uses Hive-on-MR as the execution engine. > > I think I should create an intermediate table whi

Re: How to optimize multiple count( distinct col) in Hive SQL

2017-08-22 Thread panfei
Hi Gopal, Thanks for all the information and suggestions. The Hive version is 2.0.1 and uses Hive-on-MR as the execution engine. I think I should create an intermediate table which includes all the dimensions (including the several kinds of ids), and then use spark-sql to calculate the distinct value

Fwd: How to optimize multiple count( distinct col) in Hive SQL

2017-08-22 Thread panfei
-- Forwarded message -- From: panfei Date: 2017-08-23 12:26 GMT+08:00 Subject: Fwd: How to optimize multiple count( distinct col) in Hive SQL To: hive-...@hadoop.apache.org -- Forwarded message -- From: panfei Date: 2017-08-23 12:26 GMT+08:00 Subject: How to