[ https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867785#comment-16867785 ]
Gidon Gershinsky edited comment on HIVE-21848 at 6/19/19 4:26 PM:
------------------------------------------------------------------

[~sha...@uber.com], a few comments:
 * for either footer or columns, key metadata should not be passed as a property. Instead, it should be derived from the properties (such as key names, wrapping method, KMS type, etc).
 * on the other hand, a few substantial properties are missing in your list (like key names, token, etc)
 * actually, we have a draft that already defines the Parquet encryption properties, please have a look at [https://docs.google.com/document/d/1boH6HPkG0ZhgxcaRkGk3QpZ8X_J91uXZwVGwYN45St4/edit?usp=sharing]

It has not been reviewed by the community yet, so it's a bit early to try to unify ORC and Parquet properties. We might find in the end that the differences outweigh the commonalities. But in any case, I think this exercise of finding the common ground is helpful; it's just a bit early at this point.


was (Author: gershinsky):
[~sha...@uber.com], a few comments:
 * for either footer or columns, key metadata should not be passed as a property. Instead, it should be derived from the properties (such as key names, wrapping method, KMS type, etc).
 * on the other hand, a few substantial properties are missing in your list (like KMS client type, token, etc)
 * actually, we have a draft that already defines the Parquet encryption properties, please have a look at

> Table property name definition between ORC and Parquet encryption
> -----------------------------------------------------------------
>
>              Key: HIVE-21848
>              URL: https://issues.apache.org/jira/browse/HIVE-21848
>          Project: Hive
>       Issue Type: Task
>       Components: Metastore
> Affects Versions: 3.0.0
>         Reporter: Xinli Shang
>         Assignee: Xinli Shang
>         Priority: Major
>          Fix For: 3.0.0
>
>
> The goal of this Jira is to define a superset of unified table property names
> that can be used for both Parquet and ORC column encryption. There is no code
> change needed for this Jira.
> *Background:*
> ORC-14 and Parquet-1178 introduced column encryption to ORC and Parquet. Table
> properties can be used to configure the encryption, e.g. which columns are
> sensitive, which master key to use, which algorithm, etc. It is important that
> both Parquet and ORC can use unified names.
> According to the slides at
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
> ORC uses table properties like orc.encrypt.pii and orc.encrypt.credit. The
> Parquet community is still discussing several ways to configure encryption;
> table properties are one of the options, but there is no detailed design of
> the table property names yet.
> So it is a good time for the two communities to discuss a unified superset of
> table property names.
> *Proposal:*
> There are several encryption properties that need to be specified for a
> table. Here is the list. This is the superset of Parquet and ORC. Some of
> them might not apply to both.
> # PII columns, including nested columns
> # Column key metadata, master key metadata
> # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR.
> ORC might support AES_CTR.
> # Footer encryption - Parquet allows the footer to be encrypted or plaintext
> # Footer key metadata
> Here is the table properties proposal.
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to the KMS to define what key metadata is. The metadata should have enough information for the KMS to figure out the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column name, for example ‘address.zipcode’. It is up to the KMS to define what key metadata is. The metadata should have enough information for the KMS to figure out the corresponding key.|
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
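
For illustration only, the table properties proposed in the quoted issue description could be assembled roughly as in the sketch below. It follows the proposal as written in the description, not the alternative raised in the comment, where key metadata would be derived from other properties. The helper names (`encode_key_metadata`, `build_encryption_properties`, `to_tblproperties`) and the sample key metadata bytes are hypothetical; only the property names, the allowed values, and the `address.zipcode` column come from the proposal table.

```python
import base64


def encode_key_metadata(raw: bytes) -> str:
    """Base64-encode opaque key metadata, as the proposal's Value column describes."""
    return base64.b64encode(raw).decode("ascii")


def build_encryption_properties(algorithm, footer_plaintext,
                                footer_key_metadata, column_key_metadata):
    """Return a dict of table properties following the proposed naming scheme.

    column_key_metadata maps a column name (e.g. 'address.zipcode') to the
    raw key metadata bytes for that column.
    """
    if algorithm not in ("aes_ctr", "aes_gcm"):
        raise ValueError("unsupported algorithm: " + algorithm)
    props = {
        "encrypt_algorithm": algorithm,
        "encrypt_footer_plaintext": "true" if footer_plaintext else "false",
        "encrypt_footer_key_metadata": encode_key_metadata(footer_key_metadata),
    }
    # One 'encrypt_col_<column name>' property per encrypted column.
    for column, metadata in column_key_metadata.items():
        props["encrypt_col_" + column] = encode_key_metadata(metadata)
    return props


def to_tblproperties(props):
    """Render the properties as a Hive TBLPROPERTIES clause, sorted by name."""
    pairs = ", ".join("'{}' = '{}'".format(k, v) for k, v in sorted(props.items()))
    return "TBLPROPERTIES ({})".format(pairs)


# Hypothetical key metadata; what the bytes mean is up to the KMS.
example_props = build_encryption_properties(
    algorithm="aes_gcm",
    footer_plaintext=False,
    footer_key_metadata=b"footer-key-v1",
    column_key_metadata={"address.zipcode": b"pii-key-v1"},
)
print(to_tblproperties(example_props))
```

The rendered clause could then be attached to a `CREATE TABLE` or `ALTER TABLE ... SET` statement. Whether ORC or Parquet readers would honor these names depends on the outcome of the discussion above.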