Please check out HIVE-11502. For your PoC, you can simply work around it by using other data types instead of double.
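If you want to keep the PoC close to the current schema, here is a minimal, untested sketch of that workaround, reusing the DDL from the quoted thread below but with DECIMAL columns instead of DOUBLE; the new table name and the DECIMAL(18,6) precision/scale are just illustrative choices:

-- Same layout as huge_numeric_table_orc2 below, but DECIMAL instead of DOUBLE.
-- Table name and precision/scale are illustrative, not from the thread.
CREATE TABLE huge_numeric_table_orc2_dec(
  col1 decimal(18,6), col2 decimal(18,6), col3 decimal(18,6), col4 decimal(18,6)
  -- ... repeat for col5 through col28, mirroring the original DDL
)
clustered by (col1) sorted by (col1) into 240 buckets
STORED AS ORC tblproperties ("orc.compress"="SNAPPY");

-- Reload from the source table, casting DOUBLE to DECIMAL on the way in.
from huge_numeric_table insert overwrite table huge_numeric_table_orc2_dec
select cast(col1 as decimal(18,6)), cast(col2 as decimal(18,6)),
       cast(col3 as decimal(18,6)), cast(col4 as decimal(18,6))
       -- ... and so on through col28
sort by col1;

The join query itself should then only need the table name swapped.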
On Thu, Aug 20, 2015 at 2:08 AM, Nishant Aggarwal <nishant....@gmail.com> wrote:
> Thanks for the reply, Noam. I have already tried the latter point of
> dividing the query, but the challenge comes during the joining of the table.
>
> Thanks and Regards
> Nishant Aggarwal, PMP
> Cell No:- +91 99588 94305
>
> On Thu, Aug 20, 2015 at 2:19 PM, Noam Hasson <noam.has...@kenshoo.com> wrote:
>
>> Hi,
>>
>> Have you looked at the counters on the Hadoop side? It's possible you are
>> dealing with a bad join which causes a multiplication of items; if you see a
>> huge number of input/output records in the map/reduce phases that keeps
>> increasing, that's probably the case.
>>
>> Another thing I would try is to divide the job into several smaller
>> queries, for example start with the filter only, after that the join, and
>> so on.
>>
>> Noam.
>>
>> On Thu, Aug 20, 2015 at 10:55 AM, Nishant Aggarwal <nishant....@gmail.com> wrote:
>>
>>> Dear Hive Users,
>>>
>>> I am in the process of running a PoC for one of my customers,
>>> demonstrating the huge performance benefits of Hadoop Big Data using Hive.
>>>
>>> The following is the problem statement I am stuck with.
>>>
>>> I have generated a large table with 28 columns (all double). The table
>>> size on disk is 70GB (I ultimately created a compressed table using the ORC
>>> format to save disk space, bringing the table size down to < 1GB) with more
>>> than 450 million records.
>>>
>>> In order to demonstrate a complex use case, I joined this table with
>>> itself. The following are the queries I used to create the table and the
>>> join query I am running.
>>>
>>> *Create Table and Loading Data, Hive parameter settings:*
>>> set hive.vectorized.execution.enabled = true;
>>> set hive.vectorized.execution.reduce.enabled = true;
>>> set mapred.max.split.size=100000000;
>>> set mapred.min.split.size=1000000;
>>> set hive.auto.convert.join=false;
>>> set hive.enforce.sorting=true;
>>> set hive.enforce.bucketing=true;
>>> set hive.exec.dynamic.partition=true;
>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>> set mapreduce.reduce.input.limit=-1;
>>> set hive.exec.parallel = true;
>>>
>>> CREATE TABLE huge_numeric_table_orc2(col1 double,col2 double,col3 double,
>>> col4 double,col5 double,col6 double,col7 double,col8 double,col9 double,
>>> col10 double,col11 double,col12 double,col13 double,col14 double,
>>> col15 double,col16 double,col17 double,col18 double,col19 double,
>>> col20 double,col21 double,col22 double,col23 double,col24 double,
>>> col25 double,col26 double,col27 double,col28 double)
>>> clustered by (col1) sorted by (col1) into 240 buckets
>>> STORED AS ORC tblproperties ("orc.compress"="SNAPPY");
>>>
>>> from huge_numeric_table insert overwrite table huge_numeric_table_orc2
>>> select * sort by col1;
>>>
>>> *JOIN QUERY:*
>>>
>>> select (avg(t1.col1)*avg(t1.col6))/(avg(t1.col11)*avg(t1.col16)) as AVG5
>>> from huge_numeric_table_orc2 t1 left outer join huge_numeric_table_orc2 t2
>>> on t1.col1=t2.col1 where (t1.col1) > 34.11 and (t2.col1) > 10.12
>>>
>>> *The problem is that this query gets stuck at the reducers (80-85%), goes
>>> into a loop, and never finishes.*
>>>
>>> The Hive version is 1.2.
>>>
>>> Please help.
>>>
>>> Thanks and Regards
>>> Nishant Aggarwal, PMP
>>> Cell No:- +91 99588 94305
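For what it's worth, here is a rough, untested sketch of the staged approach Noam suggests above (filter first, join afterwards), using the table and column names from the thread; the intermediate table names are made up. Note that the WHERE condition on t2.col1 in the original query already discards the NULL rows a LEFT OUTER JOIN would keep, so a plain JOIN over the pre-filtered tables gives the same result:

-- Stage 1: materialize the filtered inputs so each step can be checked on its own.
CREATE TABLE t1_filtered STORED AS ORC AS
SELECT col1, col6, col11, col16
FROM huge_numeric_table_orc2
WHERE col1 > 34.11;

CREATE TABLE t2_filtered STORED AS ORC AS
SELECT col1
FROM huge_numeric_table_orc2
WHERE col1 > 10.12;

-- Stage 2: join and aggregate over the much smaller intermediate tables.
SELECT (avg(t1.col1) * avg(t1.col6)) / (avg(t1.col11) * avg(t1.col16)) AS AVG5
FROM t1_filtered t1
JOIN t2_filtered t2 ON t1.col1 = t2.col1;

Comparing the record counters between the two stages should also show whether the self-join on col1 is exploding on duplicate key values, which is the multiplication effect Noam describes.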