Please check out HIVE-11502. For your PoC, you can simply work around it by
using other data types instead of double.
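
For example, a minimal sketch of the same ORC table declared with DECIMAL
columns instead of DOUBLE (the table name and the precision/scale here are only
illustrative assumptions for the PoC, and the column list is abbreviated):

CREATE TABLE huge_numeric_table_orc2_dec(
  col1 DECIMAL(20,10),
  col2 DECIMAL(20,10),
  -- ...repeat the same pattern for the remaining columns up to col28
  col28 DECIMAL(20,10))
CLUSTERED BY (col1) SORTED BY (col1) INTO 240 BUCKETS
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY");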

On Thu, Aug 20, 2015 at 2:08 AM, Nishant Aggarwal <nishant....@gmail.com>
wrote:

> Thanks for the reply, Noam. I have already tried the latter point of
> dividing the query, but the challenge comes when joining the tables.
>
>
> Thanks and Regards
> Nishant Aggarwal, PMP
> Cell No:- +91 99588 94305
>
>
> On Thu, Aug 20, 2015 at 2:19 PM, Noam Hasson <noam.has...@kenshoo.com>
> wrote:
>
>> Hi,
>>
>> Have you looked at the counters on the Hadoop side? It's possible you are
>> dealing with a bad join that causes a multiplication of rows. If you see a
>> huge number of input/output records in the map/reduce phase that keeps
>> increasing, that's probably the case.
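>>
>> As a quick check from the Hive side (just a sketch, reusing your table),
>> you could also look at how many duplicates the join key has, since a
>> self-join on col1 multiplies every duplicate group:
>>
>> SELECT col1, COUNT(*) AS cnt
>> FROM huge_numeric_table_orc2
>> GROUP BY col1
>> ORDER BY cnt DESC
>> LIMIT 20;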
>>
>> Another thing I would try is to divide the job into several smaller
>> queries: for example, start with the filter only, then the join, and so
>> on, as in the sketch below.
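>>
>> Something along these lines would be the idea (just a sketch; the staging
>> table names are made up, and I am reusing the filters from your query):
>>
>> -- stage 1: filter only, into small intermediate ORC tables
>> CREATE TABLE t1_stage STORED AS ORC AS
>> SELECT col1, col6, col11, col16 FROM huge_numeric_table_orc2 WHERE col1 > 34.11;
>>
>> CREATE TABLE t2_stage STORED AS ORC AS
>> SELECT col1 FROM huge_numeric_table_orc2 WHERE col1 > 10.12;
>>
>> -- stage 2: join and aggregate the pre-filtered tables
>> SELECT (avg(t1.col1)*avg(t1.col6))/(avg(t1.col11)*avg(t1.col16)) AS AVG5
>> FROM t1_stage t1 JOIN t2_stage t2 ON t1.col1 = t2.col1;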
>>
>> Noam.
>>
>> On Thu, Aug 20, 2015 at 10:55 AM, Nishant Aggarwal <nishant....@gmail.com
>> > wrote:
>>
>>> Dear Hive Users,
>>>
>>> I am in the process of running a PoC for one of my customers,
>>> demonstrating the huge performance benefits of Hadoop Big Data using Hive.
>>>
>>> Following is the problem statement I am stuck with.
>>>
>>> I have generated a large table with 28 columns (all doubles) and more
>>> than 450 million records. The table size on disk is 70GB (I ultimately
>>> created a compressed table using the ORC format to save disk space,
>>> bringing the table size down to < 1GB).
>>>
>>> In order to demonstrate a complex use case, I joined this table with
>>> itself. Following are the queries I used to create the table and the join
>>> query I am running.
>>>
>>> *Create Table and Load Data, Hive parameter settings:*
>>> set hive.vectorized.execution.enabled = true;
>>> set hive.vectorized.execution.reduce.enabled = true;
>>> set mapred.max.split.size=100000000;
>>> set mapred.min.split.size=1000000;
>>> set hive.auto.convert.join=false;
>>> set hive.enforce.sorting=true;
>>> set hive.enforce.bucketing=true;
>>> set hive.exec.dynamic.partition=true;
>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>> set mapreduce.reduce.input.limit=-1;
>>> set hive.exec.parallel = true;
>>>
>>> CREATE TABLE huge_numeric_table_orc2(col1 double,col2 double,col3
>>> double,col4 double,col5 double,col6 double,col7 double,col8 double,col9
>>> double,col10 double,col11 double,col12 double,col13 double,col14
>>> double,col15 double,col16 double,col17 double,col18 double,col19
>>> double,col20 double,col21 double,col22 double,col23 double,col24
>>> double,col25 double,col26 double,col27 double,col28 double)
>>> clustered by (col1) sorted by (col1) into 240 buckets
>>> STORED AS ORC tblproperties ("orc.compress"="SNAPPY");
>>>
>>> from huge_numeric_table insert overwrite table huge_numeric_table_orc2
>>> select * sort by col1;
>>>
>>>
>>> *JOIN QUERY:*
>>>
>>> select (avg(t1.col1)*avg(t1.col6))/(avg(t1.col11)*avg(t1.col16)) as AVG5
>>> from huge_numeric_table_orc2 t1 left outer join huge_numeric_table_orc2 t2
>>> on t1.col1=t2.col1 where (t1.col1) > 34.11 and (t2.col1) >10.12
>>>
>>>
>>> *The problem is that this query gets stuck at 80-85% of the reducers,
>>> goes into a loop, and never finishes.*
>>>
>>> Version of Hive is 1.2.
>>>
>>> Please help.
>>>
>>>
>>> Thanks and Regards
>>> Nishant Aggarwal, PMP
>>> Cell No:- +91 99588 94305
>>>
>>>
>>
>
>
>
