Xinli Shang created HIVE-21848:
----------------------------------

             Summary: Table property name definition between ORC and Parquet 
                 Key: HIVE-21848
                 URL: https://issues.apache.org/jira/browse/HIVE-21848
             Project: Hive
          Issue Type: Task
          Components: Metastore
    Affects Versions: 3.0.0
            Reporter: Xinli Shang
            Assignee: Xinli Shang
             Fix For: 3.0.0


The goal of this Jira is to define a superset of unified table property names 
that can be used for both Parquet and ORC column encryption. There is no code 
change needed for this Jira. 

*Background:* 

ORC-14 and Parquet-1178 introduced column encryption to ORC and Parquet. Table 
properties can be used to configure the encryption, e.g. which columns are 
sensitive, which master key to use, and which algorithm to apply. It is 
important that Parquet and ORC use the same property names.

According to the slides at 
[https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
 ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. In the 
Parquet community, we are still discussing several ways to configure 
encryption. Table properties are one of the options, but we do not yet have a 
detailed design for the property names. 

So this is a good time for the two communities to discuss and agree on unified 
table property names as a superset. 

*Proposal:* 

There are several encryption properties that need to be specified for a table. 
Here is the list. This is the superset of Parquet and ORC. Some of them might 
not apply to both. 
 # PII columns, including nested columns
 # Column key metadata and master key metadata 
 # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR, and 
ORC might support AES_CTR
 # Footer encryption: Parquet allows the footer to be encrypted or plaintext
 # Footer key metadata 

Here are the proposed table properties.  
|*Table Property Name* |*Value* |*Notes* |
|encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption. |
|encrypt_footer_plaintext|true, false|Parquet supports both plaintext and encrypted footers. By default, the footer is encrypted.|
|encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to the KMS to define what key metadata is. The metadata should contain enough information for the KMS to find the corresponding key. |
|encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column name, for example ‘address.zipcode’. It is up to the KMS to define what key metadata is. The metadata should contain enough information for the KMS to find the corresponding key. |
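
To make the proposal concrete, here is a minimal Python sketch of how a client might assemble these properties and read the per-column entries back. The helper names (build_encryption_properties, encrypted_columns) and the sample key metadata bytes are hypothetical and used only for illustration; the actual content of key metadata is defined by the KMS, and neither Hive, ORC, nor Parquet is assumed to provide these functions.

```python
import base64

# Prefix for per-column properties, taken from the proposed name encrypt_col_xxx.
COLUMN_PREFIX = "encrypt_col_"
ALLOWED_ALGORITHMS = ("aes_ctr", "aes_gcm")


def build_encryption_properties(algorithm, footer_key_metadata,
                                column_key_metadata, footer_plaintext=False):
    """Build a table-property dict following the proposed names.

    footer_key_metadata: raw bytes (opaque, defined by the KMS).
    column_key_metadata: dict of column path -> raw bytes.
    """
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError("unsupported algorithm: %s" % algorithm)

    def b64(raw):
        # Table property values are strings, so key metadata is base64-encoded.
        return base64.b64encode(raw).decode("ascii")

    props = {
        "encrypt_algorithm": algorithm,
        # The proposal's default is an encrypted footer, i.e. "false" here.
        "encrypt_footer_plaintext": "true" if footer_plaintext else "false",
        "encrypt_footer_key_metadata": b64(footer_key_metadata),
    }
    for column, metadata in column_key_metadata.items():
        props[COLUMN_PREFIX + column] = b64(metadata)
    return props


def encrypted_columns(props):
    """Return column path -> decoded key metadata for every encrypt_col_* entry."""
    return {
        name[len(COLUMN_PREFIX):]: base64.b64decode(value)
        for name, value in props.items()
        if name.startswith(COLUMN_PREFIX)
    }


# Hypothetical usage: the key metadata bytes are placeholders, not a real KMS format.
props = build_encryption_properties(
    "aes_gcm",
    footer_key_metadata=b"footer-key-id",
    column_key_metadata={"address.zipcode": b"zip-key-id"},
)
columns = encrypted_columns(props)
```

Keeping the column path in the property name means nested columns such as ‘address.zipcode’ need no extra syntax, at the cost of readers having to scan every property for the encrypt_col_ prefix.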



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)