Mert Hocanin created HIVE-21419:
-----------------------------------

             Summary: Partition Pruning not happening when using Apache Ranger 
masking
                 Key: HIVE-21419
                 URL: https://issues.apache.org/jira/browse/HIVE-21419
             Project: Hive
          Issue Type: Bug
          Components: Physical Optimizer, Query Planning
    Affects Versions: 2.3.2
         Environment: I used an AWS CloudFormation script from the AWS Big Data 
blog [1]. The EMR AMI uses Hive 2.3.3 and Apache Ranger 1.0.0. 

Source Table:

CREATE EXTERNAL TABLE analyst1.lineitem_partitioned (
    `l_orderkey` int, 
    `l_partkey` int, 
    `l_suppkey` int, 
    `l_linenumber` int, 
    `l_quantity` double, 
    `l_extendedprice` double, 
    `l_discount` double, 
    `l_tax` double, 
    `l_returnflag` string, 
    `l_linestatus` string, 
    `l_commitdate` string, 
    `l_receiptdate` string, 
    `l_shipinstruct` string, 
    `l_shipmode` string, 
    `l_comment` string
) PARTITIONED BY (`l_shipdate` string)
STORED AS PARQUET
LOCATION '/user/analyst1/tpch/sf100/lineitem';

Destination Table:

CREATE EXTERNAL TABLE analyst1.test1(
   l_commitdate string,
   l_receiptdate string
) PARTITIONED BY (`l_shipdate` string)
STORED AS PARQUET
LOCATION '/user/analyst1/tpch/sf100/lineitem_parq_partitioned';

Query:

insert overwrite table analyst1.test1 PARTITION (l_shipdate)
select l_commitdate, l_receiptdate, l_shipdate
from analyst1.lineitem_partitioned 
where l_shipdate = '1992-01-02';

Ranger Masking Rule:

Hive Database: analyst1
Hive Table: lineitem_partitioned
Hive Column: l_commitdate
Mask Condition Option: Custom: "XXXXXX" (a static string replaces the column 
for simplicity; our real use case uses a complex UDF).


[1] https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/
 
            Reporter: Mert Hocanin
         Attachments: Operators-in-debugger-with-masking.png, 
Operators-in-debugger-without-masking.png, hive-jira-schema-explain-plan.txt

I have a partitioned table with a Ranger masking policy on a non-partition 
column. When a query on that table selects the masked column, partition 
pruning no longer occurs, even though the query filters on the partition column. 

To reproduce:

Create two partitioned tables. I used TPC-H tables because they are publicly 
available; the schemas and queries I used are listed in the Environment field. 
Then insert into the second table from the first table. For example:

insert overwrite table analyst1.test1 PARTITION (l_shipdate)
select l_commitdate, l_receiptdate, l_shipdate
from analyst1.lineitem_partitioned 
where l_shipdate = '1992-01-02';

I have attached the explain plans for this query both with and without a 
masking rule enabled on l_commitdate.
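
For anyone reproducing this, one way to compare the two cases without running 
the insert is to ask Hive which partitions the SELECT would read. The following 
is only a sketch using the tables above. EXPLAIN DEPENDENCY prints the inputs 
as JSON, including an "input_partitions" list, and EXPLAIN EXTENDED includes 
partition details in the plan. I am assuming both statements behave the same 
way when a masking policy is active; the exact output is not shown here 
because it depends on the data that is loaded.

-- Run once with the masking policy enabled and once with it disabled.
-- With pruning, only l_shipdate=1992-01-02 should be listed as an input.
EXPLAIN DEPENDENCY
select l_commitdate, l_receiptdate, l_shipdate
from analyst1.lineitem_partitioned
where l_shipdate = '1992-01-02';

-- The same check with the full operator plan.
EXPLAIN EXTENDED
select l_commitdate, l_receiptdate, l_shipdate
from analyst1.lineitem_partitioned
where l_shipdate = '1992-01-02';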

I dug into this in a debugger (see the attached screenshots of the operators) 
and found that the pruning expression is not being set when the masking rule 
is enabled. 
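
For context, here is a rough sketch of what the query may effectively look 
like once masking is applied. This assumes that masking is implemented by 
replacing the table reference with a subquery that projects the mask 
expression in place of the column. The rewritten text below is hypothetical 
and is not taken from the attached plans; the subquery alias and the omitted 
columns are illustrative only.

-- Hypothetical shape of the masked query (assumption, not actual Hive output).
select l_commitdate, l_receiptdate, l_shipdate
from (
    select 'XXXXXX' as l_commitdate,   -- static mask from the Ranger rule
           l_receiptdate,
           l_shipdate                  -- other columns omitted for brevity
    from analyst1.lineitem_partitioned
) lineitem_partitioned
where l_shipdate = '1992-01-02';

If that is the shape of the rewritten query, the filter on l_shipdate sits 
outside the subquery, which may be related to the pruning expression never 
reaching the table scan.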


