[ 
https://issues.apache.org/jira/browse/HIVE-24162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24162:
----------------------------------
    Labels: pull-request-available  (was: )

> Query based compaction loses bloom filter
> -----------------------------------------
>
>                 Key: HIVE-24162
>                 URL: https://issues.apache.org/jira/browse/HIVE-24162
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
>   
> {noformat}
> +----------------------------------------------------+
> |                   createtab_stmt                   |
> +----------------------------------------------------+
> | CREATE TABLE `bloomTest`(                          |
> |   `msisdn` string,                                 |
> |   `imsi` varchar(20),                              |
> |   `imei` bigint,                                   |
> |   `cell_id` bigint)                                |
> | ROW FORMAT SERDE                                   |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      |
> | STORED AS INPUTFORMAT                              |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
> | OUTPUTFORMAT                                       |
> |   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
> | LOCATION                                           |
> |   's3a://dwxtpcds30-wwgq-dwx-managed/clusters/env-6cwwgq/warehouse-1580338415-7dph/warehouse/tablespace/managed/hive/del_db.db/bloomtest' |
> | TBLPROPERTIES (                                    |
> |   'bucketing_version'='2',                         |
> |   'orc.bloom.filter.columns'='msisdn,cell_id,imsi',  |
> |   'orc.bloom.filter.fpp'='0.02',                   |
> |   'transactional'='true',                          |
> |   'transactional_properties'='default',            |
> |   'transient_lastDdlTime'='1597222946')            |
> +----------------------------------------------------+
> insert into  bloomTest values ("a", "b", 10, 20);
> insert into  bloomTest values ("aa", "bb", 100, 200);
> insert into  bloomTest values ("aaa", "bbb", 1000, 2000);
> select * from bloomTest;
> +-------------------+-----------------+-----------------+--------------------+
> | bloomtest.msisdn  | bloomtest.imsi  | bloomtest.imei  | bloomtest.cell_id  |
> +-------------------+-----------------+-----------------+--------------------+
> | a                 | b               | 10              | 20                 |
> | aa                | bb              | 100             | 200                |
> | aaa               | bbb             | 1000            | 2000               |
> +-------------------+-----------------+-----------------+--------------------+
> {noformat}
>  - Compact the table
> {code:java}
> alter table bloomTest compact 'MAJOR';
> {code}
>  - Wait for the compaction to finish, then check the resulting files for bloom filters.
>  - The delta files contain the bloom filters, but the base files written by the compaction do not.
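
One possible way to do the bloom filter check from the last steps is to dump each ORC file with `hive --orcfiledump` and look for bloom filter streams in the stripe details. The sketch below is not part of the original report. It assumes the dump prints stream lines of the form `Stream: column N section KIND start: ... length ...`, with bloom filter streams using a `BLOOM_FILTER` or `BLOOM_FILTER_UTF8` kind, and the helper names `bloom_filter_columns` and `dump_orc_file` are hypothetical.

```python
import re
import subprocess

# Assumed shape of a stream line in the ORC file dump, e.g.
#   Stream: column 1 section BLOOM_FILTER_UTF8 start: 120 length 45
STREAM_RE = re.compile(r"Stream:\s+column\s+(\d+)\s+section\s+(BLOOM_FILTER\w*)")


def bloom_filter_columns(dump_text):
    """Return the sorted ORC column ids that have a bloom filter stream."""
    return sorted({int(match.group(1)) for match in STREAM_RE.finditer(dump_text)})


def dump_orc_file(path):
    """Run `hive --orcfiledump` on one file and return its text output.

    Assumes the `hive` launcher is on PATH and can read the given path.
    """
    result = subprocess.run(
        ["hive", "--orcfiledump", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `bloom_filter_columns(dump_orc_file(path))` on a file from a delta directory and on a file from the compacted base directory should, if the bug reproduces, give a non-empty list for the delta file and an empty list for the base file. The ids refer to ORC's flattened file schema, so they do not map one-to-one onto the table's column positions.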



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
