Index doesn’t seems to be kicking in this case. Please file a bug for this.
Thanks John From: Bennie Leo <tben...@hotmail.com<mailto:tben...@hotmail.com>> Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Date: Monday, June 29, 2015 at 5:25 PM To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Subject: RE: Hive indexing optimization I've attached the output. Thanks. B ________________________________ Subject: Re: Hive indexing optimization From: jpullokka...@hortonworks.com<mailto:jpullokka...@hortonworks.com> To: user@hive.apache.org<mailto:user@hive.apache.org> Date: Mon, 29 Jun 2015 19:17:44 +0000 Could you post explain extended output? From: Bennie Leo <tben...@hotmail.com<mailto:tben...@hotmail.com>> Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Date: Monday, June 29, 2015 at 10:35 AM To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Subject: RE: Hive indexing optimization Here is the explain output: STAGE PLANS: Stage: Stage-1 Tez Edges: Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE) Vertices: Map 1 Map Operator Tree: TableScan alias: logontable filterExpr: isipv4(ip) (type: boolean) Statistics: Num rows: 0 Data size: 550 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: isipv4(ip) (type: boolean) Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator expressions: ip (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE value expressions: _col0 (type: bigint) Map 3 Map Operator Tree: TableScan alias: ipv4geotable Statistics: Num rows: 41641243 Data size: 5144651200 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: startip (type: bigint), endip (type: bigint), country (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 41641243 Data size: 5144651200 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 41641243 Data size: 5144651200 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: string) Reducer 2 Reduce Operator Tree: Join Operator condition map: Left Outer Join0 to 1 condition expressions: 0 {VALUE._col0} 1 {VALUE._col0} {VALUE._col1} {VALUE._col2} filter predicates: 0 {isipv4(VALUE._col0)} 1 outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 43281312 Data size: 5020632576 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: ((_col1 <= _col0) and (_col0 <= _col2)) (type: boolean) Statistics: Num rows: 5209034 Data size: 497855986 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: bigint), _col3 (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 5209034 Data size: 497855986 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 5209034 Data size: 497855986 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Thank you, B > Subject: Re: Hive indexing optimization > From: jpullokka...@hortonworks.com<mailto:jpullokka...@hortonworks.com> > To: user@hive.apache.org<mailto:user@hive.apache.org> > CC: tben...@hotmail.com<mailto:tben...@hotmail.com> > Date: Sat, 27 Jun 2015 16:02:08 +0000 > > "SELECT StartIp, EndIp, Country FROM ipv4geotable” should have been > rewritten as a scan against index table. > > BitMap Indexes seems to support inequalities (<=, <, >=). > > Post the explain plan. > > On 6/26/15, 8:56 PM, "Gopal Vijayaraghavan" > <gop...@apache.org<mailto:gop...@apache.org>> wrote: > > >Hi, > > > >Hive indexes won¹t really help you speed up that query right now, because > >of the plan it generates due to the <= clauses. > > > >> CREATETABLE ipv4table > >> AS > >> SELECT logon.IP, ipv4.Country > >> FROM > >> (SELECT * FROM logontable WHERE isIpv4(IP)) logon > >> LEFT OUTER JOIN > >> (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON > >> isIpv4(logon.IP) > >> WHERE ipv4.StartIp <=logon.IP AND logon.IP <= ipv4.EndIp; > > > >That¹s a cross-product join, which can¹t be distributed at all & will take > >forever, even if you use Tez/LLAP. > > > >Range join queries have partial distribution rewrites, but AFAIK none of > >them use filter indexes in hive. > > > >But before I suggest a rewrite, can you post the ³explain <query>;² for > >the above query, so that I can check if this is indeed producing a > >cross-product + filter? > > > >Cheers, > >Gopal > > > > > > > > > > > > > > > > >