Have you tried increasing the heap size? It worked for me.
E.g.:
bash
mkdir t
awk 'BEGIN{OFS=",";for(i=0;i<10000000;++i){print i,i}}' > t/t.csv
hdfs dfs -put t /tmp
export HADOOP_OPTS="$HADOOP_OPTS -Xmx1024m"
hive
create external table t (i int,s string)
row format delimited fields terminated by ','
location '/tmp/t';
select i%10,collect_list(s) from t group by i%10;
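To get one string per group instead of an array, the collect_list result can be wrapped in concat_ws, which as far as I know is the closest built-in equivalent to GROUP_CONCAT. A minimal sketch, assuming the table t from above; the file name group_concat.hql and the column aliases k and joined are made up for the example:

```shell
# Write the query to a file so it can be run non-interactively.
# (group_concat.hql is a hypothetical file name.)
cat > group_concat.hql <<'EOF'
-- concat_ws joins the array returned by collect_list into one string per group
select i%10 as k, concat_ws(',', collect_list(s)) as joined
from t
group by i%10;
EOF

# Run it only if the hive CLI is on PATH, with the same heap setting as above.
if command -v hive >/dev/null 2>&1; then
  export HADOOP_OPTS="${HADOOP_OPTS:-} -Xmx1024m"
  hive -f group_concat.hql
fi
```

Each group is still collected in memory before it is joined, so the heap setting probably still matters for very large groups. The order of elements inside each string is not guaranteed.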
Dudu
From: Mahender Sarangam [mailto:[email protected]]
Sent: Thursday, June 16, 2016 1:47 AM
To: [email protected]
Subject: Is there any GROUP_CONCAT Function in Hive
Hi,
We have a Hive table with 3 GB of data, about 1000000 rows. We are looking for
any functionality in Hive that can perform a GROUP_CONCAT function.
We tried to implement a GROUP_CONCAT function using collect_list and
collect_set, but we are getting a heap space error, because each group key has
around 100000 rows that need to be concatenated.
Is there any direct way to concatenate row data into a single string column
with GROUP BY?