Hi Maruthi,

I think those versions do not contain column level encryption.  As I
remember this feature came with HIVE-26809: Upgrade ORC to 1.8.3. by difin
· Pull Request #4138 · apache/hive (github.com)
<https://github.com/apache/hive/pull/4138> (There were several PR around
this upgrade)
I believe this function you could only try out with Hive 4 (GA) soon.

-Attila

On Tue, May 9, 2023 at 2:45 AM Maruthi Inukonda <maruthi.inuko...@gmail.com>
wrote:

> hello Hive developer/users,
>
> We have a SOC (security ops center) that uses Hadoop (3.3.1) and Hive
> (3.1.2) community versions on ubuntu 20.04. We use hive external tables
> with ORC files stored in HDFS. We want to use the columnar encryption
> feature available in ORC 1.6. (
> https://issues.apache.org/jira/browse/ORC-14)  I figured out that as of
> today, Hive 3.1.3 ships ORC 1.5.8, and Hive 4.0.0-alpha2 ships ORC 1.6.9.
> The ORC files created by hive external tables are in 0.12 and ORC_517
>
> $ hive --orcfiledump <whateverpath>/000000_0
> ..
> File Version: 0.12 with ORC_517
>
> I am trying Hive 4.0.0-alpha2 in a staging environment to test column
> encryption using Hadoop KMS. Used these commands, but the data is still in
> plain text.
>
> I created a db siem_p_csv having tables in plain text files (csv), and
> another db siem_e_orc db to have tables in encrypted form (orc).
>
> 0: jdbc:hive2://> create external table siem_p_csv.mytable (a BIGINT, b
> STRING, c INT) PARTITIONED BY (year INT, month INT, day INT, hour INT) ROW
> FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE  LOCATION
> '/test/mytable/p/csv';
>
> 0: jdbc:hive2://> ALTER TABLE siem_p_csv.mytable ADD IF NOT EXISTS
> PARTITION (year=2023, month=5, day=5, hour=5) LOCATION
> '/test/mytable/p/csv/year=2023/month=5/day=5/hour=5';
>
> 0: jdbc:hive2://> create external table siem_e_orc.mytable (a BIGINT, b
> STRING, c INT) PARTITIONED BY (year INT, month INT, day INT, hour INT)
> STORED AS ORC LOCATION '/test/mytable/e/orc'
> TBLPROPERTIES('hadoop.security.key.provider.path' =
> "kms://http@hadoop-masternn0a:9600/kms", 'orc.key.provider' = "hadoop",
> 'orc.encrypt.pii' = "b");
>
> 0: jdbc:hive2://> SET hive.exec.dynamic.partition=true;
> 0: jdbc:hive2://> SET hive.exec.dynamic.partition.mode=nonstrict;
>
> 0: jdbc:hive2://> INSERT OVERWRITE TABLE siem_e_orc.mytable PARTITION
> (year, month, day, hour) SELECT a, b, c, year, month, day, hour FROM
> siem_p_csv.mytable;
>
> 0: jdbc:hive2://> select * from siem_p_csv.mytable;
>
> +-------------+------------+------------+---------------+----------------+--------------+---------------+
> |  mytable.a  | mytable.b  | mytable.c  | mytable.year  | mytable.month  |
> mytable.day  | mytable.hour  |
>
> +-------------+------------+------------+---------------+----------------+--------------+---------------+
> | 1683393894  | secret     | 50505      | 2023          | 5              |
> 5            | 5             |
>
> +-------------+------------+------------+---------------+----------------+--------------+---------------+
> 1 row selected (0.15 seconds)
>
> 0: jdbc:hive2://> select * from siem_e_orc.mytable;
>
> +-------------+------------+------------+---------------+----------------+--------------+---------------+
> |  mytable.a  | mytable.b  | mytable.c  | mytable.year  | mytable.month  |
> mytable.day  | mytable.hour  |
>
> +-------------+------------+------------+---------------+----------------+--------------+---------------+
> | 1683393894  | secret     | 50505      | 2023          | 5              |
> 5            | 5             |
>
> +-------------+------------+------------+---------------+----------------+--------------+---------------+
>
> 0: jdbc:hive2://> exit
>
> $ hive --orcfiledump
> /test/mytable/e/orc/year=2023/month=5/day=5/hour=5/000000_0
> .
> .
> Processing data file
> /test/mytable/e/orc/year=2023/month=5/day=5/hour=5/000000_0 [length: 407]
> Structure for /test/mytable/e/orc/year=2023/month=5/day=5/hour=5/000000_0
> File Version: 0.12 with ORC_14
>
> Rows: 1
>
> Compression: ZLIB
>
> Compression size: 262144
>
> Calendar: Julian/Gregorian
>
> Type: struct<a:bigint,b:string,c:int>
>
>
> Stripe Statistics:
>   Stripe 1:
>
>     Column 0: count: 1 hasNull: false
>     Column 1: count: 1 hasNull: false bytesOnDisk: 9 min: 1683393894 max:
> 1683393894 sum: 1683393894
>     Column 2: count: 1 hasNull: false bytesOnDisk: 15 min: secret max:
> secret sum: 6
>     Column 3: count: 1 hasNull: false bytesOnDisk: 8 min: 50505 max: 50505
> sum: 50505
>
>
> File Statistics:
>
>   Column 0: count: 1 hasNull: false
>
>   Column 1: count: 1 hasNull: false bytesOnDisk: 9 min: 1683393894 max:
> 1683393894 sum: 1683393894
>   Column 2: count: 1 hasNull: false bytesOnDisk: 15 min: secret max:
> secret sum: 6
>   Column 3: count: 1 hasNull: false bytesOnDisk: 8 min: 50505 max: 50505
> sum: 50505
>
> Stripes:
>
>   Stripe: offset: 3 data: 32 rows: 1 tail: 54 index: 103
>
>     Stream: column 0 section ROW_INDEX start: 3 length 11
>     Stream: column 1 section ROW_INDEX start: 14 length 31
>     Stream: column 2 section ROW_INDEX start: 45 length 32
>     Stream: column 3 section ROW_INDEX start: 77 length 29
>
>     Stream: column 1 section DATA start: 106 length 9
>
>     Stream: column 2 section DATA start: 115 length 9
>     Stream: column 2 section LENGTH start: 124 length 6
>     Stream: column 3 section DATA start: 130 length 8
>     Encoding column 0: DIRECT
>     Encoding column 1: DIRECT_V2
>     Encoding column 2: DIRECT_V2
>
>     Encoding column 3: DIRECT_V2
>
> File length: 407 bytes
>
> Padding length: 0 bytes
> Padding ratio: 0%
>
>
> _____________________________________________________________________________________
>
>
> I still see that the ORC file generated by 4.0.0-alpha2 also has plain
> text in column "b". Am I missing something? I appreciate any help you can
> give.
>
> best regards
> -- Maruthi
>

Reply via email to