Hi Maruthi, I think those versions do not contain column level encryption. As I remember this feature came with HIVE-26809: Upgrade ORC to 1.8.3. by difin · Pull Request #4138 · apache/hive (github.com) <https://github.com/apache/hive/pull/4138> (There were several PR around this upgrade) I believe this function you could only try out with Hive 4 (GA) soon.
-Attila On Tue, May 9, 2023 at 2:45 AM Maruthi Inukonda <maruthi.inuko...@gmail.com> wrote: > hello Hive developer/users, > > We have a SOC (security ops center) that uses Hadoop (3.3.1) and Hive > (3.1.2) community versions on ubuntu 20.04. We use hive external tables > with ORC files stored in HDFS. We want to use the columnar encryption > feature available in ORC 1.6. ( > https://issues.apache.org/jira/browse/ORC-14) I figured out that as of > today, Hive 3.1.3 ships ORC 1.5.8, and Hive 4.0.0-alpha2 ships ORC 1.6.9. > The ORC files created by hive external tables are in 0.12 and ORC_517 > > $ hive --orcfiledump <whateverpath>/000000_0 > .. > File Version: 0.12 with ORC_517 > > I am trying Hive 4.0.0-alpha2 in a staging environment to test column > encryption using Hadoop KMS. Used these commands, but the data is still in > plain text. > > I created a db siem_p_csv having tables in plain text files (csv), and > another db siem_e_orc db to have tables in encrypted form (orc). > > 0: jdbc:hive2://> create external table siem_p_csv.mytable (a BIGINT, b > STRING, c INT) PARTITIONED BY (year INT, month INT, day INT, hour INT) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION > '/test/mytable/p/csv'; > > 0: jdbc:hive2://> ALTER TABLE siem_p_csv.mytable ADD IF NOT EXISTS > PARTITION (year=2023, month=5, day=5, hour=5) LOCATION > '/test/mytable/p/csv/year=2023/month=5/day=5/hour=5'; > > 0: jdbc:hive2://> create external table siem_e_orc.mytable (a BIGINT, b > STRING, c INT) PARTITIONED BY (year INT, month INT, day INT, hour INT) > STORED AS ORC LOCATION '/test/mytable/e/orc' > TBLPROPERTIES('hadoop.security.key.provider.path' = > "kms://http@hadoop-masternn0a:9600/kms", 'orc.key.provider' = "hadoop", > 'orc.encrypt.pii' = "b"); > > 0: jdbc:hive2://> SET hive.exec.dynamic.partition=true; > 0: jdbc:hive2://> SET hive.exec.dynamic.partition.mode=nonstrict; > > 0: jdbc:hive2://> INSERT OVERWRITE TABLE siem_e_orc.mytable PARTITION > (year, month, day, hour) SELECT a, b, c, year, month, day, hour FROM > siem_p_csv.mytable; > > 0: jdbc:hive2://> select * from siem_p_csv.mytable; > > +-------------+------------+------------+---------------+----------------+--------------+---------------+ > | mytable.a | mytable.b | mytable.c | mytable.year | mytable.month | > mytable.day | mytable.hour | > > +-------------+------------+------------+---------------+----------------+--------------+---------------+ > | 1683393894 | secret | 50505 | 2023 | 5 | > 5 | 5 | > > +-------------+------------+------------+---------------+----------------+--------------+---------------+ > 1 row selected (0.15 seconds) > > 0: jdbc:hive2://> select * from siem_e_orc.mytable; > > +-------------+------------+------------+---------------+----------------+--------------+---------------+ > | mytable.a | mytable.b | mytable.c | mytable.year | mytable.month | > mytable.day | mytable.hour | > > +-------------+------------+------------+---------------+----------------+--------------+---------------+ > | 1683393894 | secret | 50505 | 2023 | 5 | > 5 | 5 | > > +-------------+------------+------------+---------------+----------------+--------------+---------------+ > > 0: jdbc:hive2://> exit > > $ hive --orcfiledump > /test/mytable/e/orc/year=2023/month=5/day=5/hour=5/000000_0 > . > . > Processing data file > /test/mytable/e/orc/year=2023/month=5/day=5/hour=5/000000_0 [length: 407] > Structure for /test/mytable/e/orc/year=2023/month=5/day=5/hour=5/000000_0 > File Version: 0.12 with ORC_14 > > Rows: 1 > > Compression: ZLIB > > Compression size: 262144 > > Calendar: Julian/Gregorian > > Type: struct<a:bigint,b:string,c:int> > > > Stripe Statistics: > Stripe 1: > > Column 0: count: 1 hasNull: false > Column 1: count: 1 hasNull: false bytesOnDisk: 9 min: 1683393894 max: > 1683393894 sum: 1683393894 > Column 2: count: 1 hasNull: false bytesOnDisk: 15 min: secret max: > secret sum: 6 > Column 3: count: 1 hasNull: false bytesOnDisk: 8 min: 50505 max: 50505 > sum: 50505 > > > File Statistics: > > Column 0: count: 1 hasNull: false > > Column 1: count: 1 hasNull: false bytesOnDisk: 9 min: 1683393894 max: > 1683393894 sum: 1683393894 > Column 2: count: 1 hasNull: false bytesOnDisk: 15 min: secret max: > secret sum: 6 > Column 3: count: 1 hasNull: false bytesOnDisk: 8 min: 50505 max: 50505 > sum: 50505 > > Stripes: > > Stripe: offset: 3 data: 32 rows: 1 tail: 54 index: 103 > > Stream: column 0 section ROW_INDEX start: 3 length 11 > Stream: column 1 section ROW_INDEX start: 14 length 31 > Stream: column 2 section ROW_INDEX start: 45 length 32 > Stream: column 3 section ROW_INDEX start: 77 length 29 > > Stream: column 1 section DATA start: 106 length 9 > > Stream: column 2 section DATA start: 115 length 9 > Stream: column 2 section LENGTH start: 124 length 6 > Stream: column 3 section DATA start: 130 length 8 > Encoding column 0: DIRECT > Encoding column 1: DIRECT_V2 > Encoding column 2: DIRECT_V2 > > Encoding column 3: DIRECT_V2 > > File length: 407 bytes > > Padding length: 0 bytes > Padding ratio: 0% > > > _____________________________________________________________________________________ > > > I still see that the ORC file generated by 4.0.0-alpha2 also has plain > text in column "b". Am I missing something? I appreciate any help you can > give. > > best regards > -- Maruthi >