RE: Hive indexing optimization

Bennie Leo Tue, 30 Jun 2015 14:42:07 -0700

Thank you, I will do that.
B
 
Subject: Re: Hive indexing optimization
From: jpullokka...@hortonworks.com
To: user@hive.apache.org
Date: Tue, 30 Jun 2015 18:46:50 +0000







Index doesn’t seems to be kicking in this case.
Please file a bug for this.



Thanks
John





From: Bennie Leo <tben...@hotmail.com>

Reply-To: "user@hive.apache.org" <user@hive.apache.org>

Date: Monday, June 29, 2015 at 5:25 PM

To: "user@hive.apache.org" <user@hive.apache.org>

Subject: RE: Hive indexing optimization








I've attached the output. Thanks.

  
B

                    

 


Subject: Re: Hive indexing optimization

From: jpullokka...@hortonworks.com

To: user@hive.apache.org

Date: Mon, 29 Jun 2015 19:17:44 +0000



Could you post explain extended output?





From: Bennie Leo <tben...@hotmail.com>

Reply-To: "user@hive.apache.org" <user@hive.apache.org>

Date: Monday, June 29, 2015 at 10:35 AM

To: "user@hive.apache.org" <user@hive.apache.org>

Subject: RE: Hive indexing optimization







Here is the explain output:

 

STAGE PLANS:

  Stage: Stage-1

    Tez

      Edges:

        Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)

      Vertices:

        Map 1 

            Map Operator Tree:

                TableScan

                  alias: logontable

                  filterExpr: isipv4(ip) (type: boolean)

                  Statistics: Num rows: 0 Data size: 550 Basic stats: PARTIAL 
Column stats: NONE

                  Filter Operator

                    predicate: isipv4(ip) (type: boolean)

                    Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
Column stats: NONE

                    Select Operator

                      expressions: ip (type: bigint)

                      outputColumnNames: _col0

                      Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
Column stats: NONE

                      Reduce Output Operator

                        sort order: 

                        Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
Column stats: NONE

                        value expressions: _col0 (type: bigint)

        Map 3 

            Map Operator Tree:

                TableScan

                  alias: ipv4geotable

                  Statistics: Num rows: 41641243 Data size: 5144651200 Basic 
stats: COMPLETE Column stats: NONE

                  Select Operator

                    expressions: startip (type: bigint), endip (type: bigint), 
country (type: string)

                    outputColumnNames: _col0, _col1, _col2

                    Statistics: Num rows: 41641243 Data size: 5144651200 Basic 
stats: COMPLETE Column stats: NONE

                    Reduce Output Operator

                      sort order: 

                      Statistics: Num rows: 41641243 Data size: 5144651200 
Basic stats: COMPLETE Column stats: NONE

                      value expressions: _col0 (type: bigint), _col1 (type: 
bigint), _col2 (type: string)

        Reducer 2 

            Reduce Operator Tree:

              Join Operator

                condition map:

                     Left Outer Join0 to 1

                condition expressions:

                  0 {VALUE._col0} 

                  1 {VALUE._col0} {VALUE._col1} {VALUE._col2}

                filter predicates:

                  0 {isipv4(VALUE._col0)}

                  1 

                outputColumnNames: _col0, _col1, _col2, _col3

                Statistics: Num rows: 43281312 Data size: 5020632576 Basic 
stats: COMPLETE Column stats: NONE

                Filter Operator

                  predicate: ((_col1 <= _col0) and (_col0 <= _col2)) (type: 
boolean)

                  Statistics: Num rows: 5209034 Data size: 497855986 Basic 
stats: COMPLETE Column stats: NONE

                  Select Operator

                    expressions: _col0 (type: bigint), _col3 (type: string)

                    outputColumnNames: _col0, _col1

                    Statistics: Num rows: 5209034 Data size: 497855986 Basic 
stats: COMPLETE Column stats: NONE

                    File Output Operator

                      compressed: false

                      Statistics: Num rows: 5209034 Data size: 497855986 Basic 
stats: COMPLETE Column stats: NONE

                      table:

                          input format: org.apache.hadoop.mapred.TextInputFormat

                          output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

                          serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0

    Fetch Operator

      limit: -1


 

Thank you,

B

 

> Subject: Re: Hive indexing optimization

> From: jpullokka...@hortonworks.com

> To: user@hive.apache.org

> CC: tben...@hotmail.com

> Date: Sat, 27 Jun 2015 16:02:08 +0000

> 

> "SELECT StartIp, EndIp, Country FROM ipv4geotable” should have been

> rewritten as a scan against index table.

> 

> BitMap Indexes seems to support inequalities (<=, <, >=).

> 

> Post the explain plan.

> 

> On 6/26/15, 8:56 PM, "Gopal Vijayaraghavan" <gop...@apache.org> wrote:

> 

> >Hi,

> >

> >Hive indexes won¹t really help you speed up that query right now, because

> >of the plan it generates due to the <= clauses.

> >

> >> CREATETABLE ipv4table

> >> AS

> >> SELECT logon.IP, ipv4.Country

> >> FROM

> >> (SELECT * FROM logontable WHERE isIpv4(IP)) logon

> >> LEFT OUTER JOIN

> >> (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON

> >> isIpv4(logon.IP)

> >> WHERE ipv4.StartIp <=logon.IP AND logon.IP <= ipv4.EndIp;

> >

> >That¹s a cross-product join, which can¹t be distributed at all & will take

> >forever, even if you use Tez/LLAP.

> >

> >Range join queries have partial distribution rewrites, but AFAIK none of

> >them use filter indexes in hive.

> >

> >But before I suggest a rewrite, can you post the ³explain <query>;² for

> >the above query, so that I can check if this is indeed producing a

> >cross-product + filter?

> >

> >Cheers,

> >Gopal

> >

> >

> >

> >

> >

> >

> >

> >

>

RE: Hive indexing optimization

Reply via email to