vaibhav created HIVE-21397:
------------------------------

             Summary: BloomFilter for hive Managed [ACID] table does not work as expected
                 Key: HIVE-21397
                 URL: https://issues.apache.org/jira/browse/HIVE-21397
             Project: Hive
          Issue Type: Bug
          Components: Hive, HiveServer2
    Affects Versions: 3.1.1
            Reporter: vaibhav

Steps to reproduce this issue:
-----------------------------------------
1. Create a Hive managed table as below:
-----------------------------------------
{code:java}
CREATE TABLE `bloomTest`(
  `msisdn` string,
  `imsi` varchar(20),
  `imei` bigint,
  `cell_id` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest'
TBLPROPERTIES (
  'bucketing_version'='2',
  'orc.bloom.filter.columns'='msisdn,cell_id,imsi',
  'orc.bloom.filter.fpp'='0.02',
  'transactional'='true',
  'transactional_properties'='default',
  'transient_lastDdlTime'='1551206683')
{code}
-----------------------------------------
2. Insert a few rows.
-----------------------------------------
3. Check whether the bloom filters are active: [ It does not show bloom filters for Hive managed tables ]
-----------------------------------------
{code:java}
[hive@c1162-node2 root]$ hive --orcfiledump hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest/delta_0000001_0000001_0000 | grep -i bloom
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Processing data file hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest/delta_0000001_0000001_0000/bucket_00000 [length: 791]
Structure for hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest/delta_0000001_0000001_0000/bucket_00000
{code}
-----------------------------------------
On the other hand, for Hive external tables it works:
-----------------------------------------
{code:java}
CREATE external TABLE `ext_bloomTest`(
  `msisdn` string,
  `imsi` varchar(20),
  `imei` bigint,
  `cell_id` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
  'bucketing_version'='2',
  'orc.bloom.filter.columns'='msisdn,cell_id,imsi',
  'orc.bloom.filter.fpp'='0.02')
{code}
-----------------------------------------
{code:java}
[hive@c1162-node2 root]$ hive --orcfiledump hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/external/hive/ext_bloomTest/000000_0 | grep -i bloom
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Processing data file hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/external/hive/ext_bloomTest/000000_0 [length: 755]
Structure for hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/external/hive/ext_bloomTest/000000_0
Stream: column 1 section BLOOM_FILTER_UTF8 start: 41 length 110
Stream: column 2 section BLOOM_FILTER_UTF8 start: 178 length 114
Stream: column 4 section BLOOM_FILTER_UTF8 start: 340 length 109
{code}
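
The manual grep check in the steps above could also be scripted when comparing many files. The following is only a minimal sketch: it assumes the output of hive --orcfiledump has already been captured as text, and that the stream lines keep the "Stream: column N section BLOOM_FILTER..." shape seen in the external-table dump. The helper name bloom_filter_columns is hypothetical and not part of Hive or ORC.

```python
import re

# Assumed line shape, taken from the external-table dump in this report:
#   Stream: column 1 section BLOOM_FILTER_UTF8 start: 41 length 110
STREAM_RE = re.compile(r"Stream:\s+column\s+(\d+)\s+section\s+(BLOOM_FILTER\w*)")


def bloom_filter_columns(dump_text):
    """Return the sorted ORC column ids that have a bloom filter stream.

    dump_text is the captured text output of an ORC file dump.
    An empty list means no bloom filter stream was found.
    """
    return sorted({int(match.group(1)) for match in STREAM_RE.finditer(dump_text)})


if __name__ == "__main__":
    # Lines copied from the external-table dump in this report.
    external_dump = (
        "Stream: column 1 section BLOOM_FILTER_UTF8 start: 41 length 110\n"
        "Stream: column 2 section BLOOM_FILTER_UTF8 start: 178 length 114\n"
        "Stream: column 4 section BLOOM_FILTER_UTF8 start: 340 length 109\n"
    )
    print(bloom_filter_columns(external_dump))
```

Run against the managed-table dump above, which contains no Stream lines mentioning a bloom filter, the helper would return an empty list, matching the behaviour reported here.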