Bennie Can created HIVE-11154: --------------------------------- Summary: Indexing not activated with left outer join and where clause Key: HIVE-11154 URL: https://issues.apache.org/jira/browse/HIVE-11154 Project: Hive Issue Type: Bug Components: Hive, Indexing, Tez Affects Versions: 0.13.1, tez-branch Environment: HDInsight Reporter: Bennie Can Assignee: Bennie Can
I am attempting to optimize a query using indexing. However, indexing is not being activated, and my query takes an impractical amount of time to complete. Note that John Pullokkaran at Hortonworks has had a look at the "explain extended" log and suggested that I file a bug report. Here is how I set up indexing: Set hive.optimize.index.filter=true; DROP INDEX IF EXISTS ipv4indexes ON ipv4geotable; CREATE INDEX ipv4indexes ON TABLE ipv4geotable (StartIp, EndIp) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD IDXPROPERTIES ('hive.index.compact.binary.search'='true'); ALTER INDEX ipv4indexes ON ipv4geotable REBUILD; And here is my query: DROP TABLE IF EXISTS ipv4table; CREATE TABLE ipv4table AS SELECT logon.IP, ipv4.Country FROM (SELECT * FROM logontable WHERE isIpv4(IP)) logon LEFT OUTER JOIN (SELECT StartIp, EndIp, Country FROM ipv4geotable) ipv4 ON isIpv4(logon.IP) WHERE ipv4.StartIp <= logon.IP AND logon.IP <= ipv4.EndIp; What the query is doing is extracting an IP from logontable and finding in which range it lies within a geolocation table (which is sorted). When a range is found, a country matching the IP range is returned. I need to set up indexing in order to conduct a binary search rather than going through millions of rows one at a time. Note that I use the Tez engine, but the problem also occurs with map-reduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)