[ 
https://issues.apache.org/jira/browse/HIVE-11033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yan updated HIVE-11033:
-----------------------------
    Description: 
There is a bug in the org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl class 
that prevents the bloom filter index saved in an ORC file from being used. The 
root cause is that the bloomFilterIndices variable defined in the SargApplier 
class hides the one defined in its parent class. As a result, in 
RecordReaderImpl.pickRowGroups():
{code}
  protected boolean[] pickRowGroups() throws IOException {
    // if we don't have a sarg or indexes, we read everything
    if (sargApp == null) {
      return null;
    }
    readRowIndex(currentStripe, included, sargApp.sargColumns);
    return sargApp.pickRowGroups(stripes.get(currentStripe), indexes);
  }
{code}

The bloomFilterIndices array populated by readRowIndex() is not picked up by 
the sargApp object. One solution is to make SargApplier.bloomFilterIndices a 
reference to the one defined in its parent class.
{noformat}
18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original
174d173
<     bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
178c177
<           sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices);
---
>           sarg, options.getColumnNames(), strideRate, types, included.length);
204a204
>     bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
673c673
<         List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) {
---
>         List<OrcProto.Type> types, int includedCount) {
677c677
<       this.bloomFilterIndices = bloomFilterIndices;
---
>       bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
{noformat}
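
The effect of the two separate arrays can be reproduced outside Hive. The 
following is a minimal sketch, not the Hive code: ShadowingDemo, Reader, 
Applier, hasBloomFilter() and applierSeesIndex() are hypothetical names, and a 
String array stands in for OrcProto.BloomFilterIndex[]. It only illustrates 
why an array allocated inside the helper never sees what the reader loads, and 
why passing a reference (as in the diff above) does.

```java
// Hypothetical stand-ins for illustration only; these are not the Hive classes.
class ShadowingDemo {

    /** Stand-in for SargApplier: it consults its own bloomFilterIndices field. */
    static class Applier {
        private final String[] bloomFilterIndices;

        Applier(String[] bloomFilterIndices) {
            this.bloomFilterIndices = bloomFilterIndices;
        }

        /** Returns true if a bloom filter entry is visible for the column. */
        boolean hasBloomFilter(int column) {
            return bloomFilterIndices[column] != null;
        }
    }

    /** Stand-in for the reader that owns the applier. */
    static class Reader {
        private final String[] bloomFilterIndices;
        final Applier applier;

        Reader(int columnCount, boolean shareArray) {
            bloomFilterIndices = new String[columnCount];
            // shareArray == false: the applier gets a fresh array that the
            // reader never fills (the behaviour described in this issue).
            // shareArray == true: the applier gets a reference to the
            // reader's array (the approach taken in the proposed change).
            applier = new Applier(
                shareArray ? bloomFilterIndices : new String[columnCount]);
        }

        /** Stand-in for readRowIndex(): fills only the reader's array. */
        void readRowIndex(int column) {
            bloomFilterIndices[column] = "bloom-filter-for-column-" + column;
        }
    }

    /** Loads one column's index and asks the applier whether it can see it. */
    static boolean applierSeesIndex(boolean shareArray) {
        Reader reader = new Reader(3, shareArray);
        reader.readRowIndex(1);
        return reader.applier.hasBloomFilter(1);
    }

    public static void main(String[] args) {
        System.out.println("separate arrays: " + applierSeesIndex(false)); // false
        System.out.println("shared array:    " + applierSeesIndex(true));  // true
    }
}
```

With separate arrays the applier reports no bloom filter even after the reader 
has loaded one, which corresponds to the index being silently ignored.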



  was:
There is a bug in the org.apache.hadoop.hive.ql.io.orc.ReaderImpl class that 
prevents the bloom filter index saved in an ORC file from being used. The root 
cause is that the bloomFilterIndices variable defined in the SargApplier class 
hides the one defined in its parent class. As a result, in 
ReaderImpl.pickRowGroups():
{code}
  protected boolean[] pickRowGroups() throws IOException {
    // if we don't have a sarg or indexes, we read everything
    if (sargApp == null) {
      return null;
    }
    readRowIndex(currentStripe, included, sargApp.sargColumns);
    return sargApp.pickRowGroups(stripes.get(currentStripe), indexes);
  }
{code}
The bloomFilterIndices array populated by readRowIndex() is not picked up by 
the sargApp object. One solution is to simply pass it to 
sargApp.pickRowGroups().
{noformat}
18:46 $ diff src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java.original
174d173
<     bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
178c177
<           sarg, options.getColumnNames(), strideRate, types, included.length, bloomFilterIndices);
---
>           sarg, options.getColumnNames(), strideRate, types, included.length);
204a204
>     bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
673c673
<         List<OrcProto.Type> types, int includedCount, OrcProto.BloomFilterIndex[] bloomFilterIndices) {
---
>         List<OrcProto.Type> types, int includedCount) {
677c677
<       this.bloomFilterIndices = bloomFilterIndices;
---
>       bloomFilterIndices = new OrcProto.BloomFilterIndex[types.size()];
{noformat}




> BloomFilter index is not honored by ORC reader
> ----------------------------------------------
>
>                 Key: HIVE-11033
>                 URL: https://issues.apache.org/jira/browse/HIVE-11033
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Allan Yan



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
