[ https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875247#comment-16875247 ]
Xinli Shang commented on HIVE-21848:
------------------------------------

Hi [~owen.omalley], yes, I looked at HadoopShims.java earlier. I still remember you had a very smart workaround that avoids two round trips to the KMS to generate and encrypt a working key. It cut the traffic in half.

For the nested column questions above, I generally agree that the approach makes sense. There are only a few corner cases that we need to discuss. For the example above, "name: struct<first:string,last:string>", if the table properties have the following entry,

"encrypt.columns" = "pii:name;other_category:name.first"

what do we do? Should we throw an exception? Or do we just ignore "other_category:name.first" and let the parent override it? Do we allow some leaf columns to be excluded from encryption if their parent is specified to be encrypted? I guess people will raise that feature request later, once this is rolled out.

With that said, I am not objecting to the proposal; these are just some thoughts on corner cases.

> Table property name definition between ORC and Parquet encrytion
> ----------------------------------------------------------------
>
>                 Key: HIVE-21848
>                 URL: https://issues.apache.org/jira/browse/HIVE-21848
>             Project: Hive
>          Issue Type: Task
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names
> that can be used for both Parquet and ORC column encryption. There is no code
> change needed for this Jira.
> *Background:*
> ORC-14 and PARQUET-1178 introduced column encryption to ORC and Parquet. Table
> properties can be used to configure the encryption, e.g. which columns are
> sensitive, which master key to use, the algorithm, etc. It is important that
> both Parquet and ORC can use unified names.
> According to the slides at
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
> ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The
> Parquet community is still discussing several options; table properties are
> one of them, but there is no detailed design of the table property names yet.
> So it is a good time for the two communities to agree on unified table
> property names as a superset.
> *Proposal:*
> There are several encryption properties that need to be specified for a
> table. Here is the list. It is the superset of Parquet and ORC, so some
> items might not apply to both.
> # PII columns, including nested columns
> # Column key metadata, master key metadata
> # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR.
> ORC might support AES_CTR.
> # Footer encryption - Parquet allows the footer to be encrypted or plaintext
> # Footer key metadata
> Here is the table properties proposal.
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to the KMS to define what key metadata is. The metadata should have enough information for the KMS to find the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column name, for example ‘address.zipcode’. It is up to the KMS to define what key metadata is. The metadata should have enough information for the KMS to find the corresponding key.|
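
To make the nested-column corner case concrete, here is a minimal sketch of how a reader of the "encrypt.columns" property could detect a child column that is assigned a different key than its parent. It is only an illustration, not ORC or Parquet code: the function names are hypothetical, and it assumes the value has the form "key:col[,col...];key:col[,col...]" with "." separating nested field names. Whether a detected conflict should raise an exception or be resolved in favor of the parent is exactly the open question.

```python
from typing import Dict, List, Tuple


def parse_encrypt_columns(value: str) -> Dict[str, str]:
    """Map each column path to its key name, e.g. {"name": "pii"}.

    Assumed format (not confirmed by any spec): "key:col[,col...];key:col".
    Entries without a ":" or without columns are silently skipped.
    """
    column_to_key: Dict[str, str] = {}
    for entry in value.split(";"):
        key_name, _, columns = entry.strip().partition(":")
        for column in columns.split(","):
            column = column.strip()
            if column:
                column_to_key[column] = key_name.strip()
    return column_to_key


def find_nested_conflicts(column_to_key: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return sorted (parent, child) pairs where a nested column uses a
    different key than one of its ancestors."""
    conflicts = []
    for child, child_key in column_to_key.items():
        for parent, parent_key in column_to_key.items():
            # "name.first" is nested under "name"; "nameplate" is not.
            if child.startswith(parent + ".") and child_key != parent_key:
                conflicts.append((parent, child))
    return sorted(conflicts)


if __name__ == "__main__":
    mapping = parse_encrypt_columns("pii:name;other_category:name.first")
    for parent, child in find_nested_conflicts(mapping):
        print(f"conflict: {child} ({mapping[child]}) vs {parent} ({mapping[parent]})")
```

With a check like this, "throw an exception" means failing when the conflict list is non-empty, while "let the parent override" means dropping the child entries it reports.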