[ https://issues.apache.org/jira/browse/HIVE-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867767#comment-16867767 ]
Xinli Shang edited comment on HIVE-21848 at 6/19/19 3:45 PM:
-------------------------------------------------------------

Thanks, Owen! I just have a slightly different view on "*encrypt.with.pii*" = "*col1,col2*". If a company needs "*encrypt.with.abc*" = "*col3,col4*" and '*abc*' is not predefined in Hive/ORC/Parquet, does that mean they need to change the code of Hive/ORC/Parquet? This is a real use case in production.

was (Author: sha...@uber.com):
Thanks Owen! I Just have a slight different thinking for "*encrypt.with.pii*" = "*col1,col2"*. In case of that a company needs "*encrypt.with.abc*" = "*col3,col4*" and '*abc*' is not predefined in Hive/ORC/Parquet, does it mean they need to change code? This is realy usage in production.

> Table property name definition between ORC and Parquet encryption
> -----------------------------------------------------------------
>
>                 Key: HIVE-21848
>                 URL: https://issues.apache.org/jira/browse/HIVE-21848
>             Project: Hive
>          Issue Type: Task
>          Components: Metastore
>   Affects Versions: 3.0.0
>            Reporter: Xinli Shang
>            Assignee: Xinli Shang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> The goal of this Jira is to define a unified superset of table property names
> that can be used for both Parquet and ORC column encryption. No code
> change is needed for this Jira.
> *Background:*
> ORC-14 and Parquet-1178 introduced column encryption to ORC and Parquet.
> Table properties can be used to configure the encryption, e.g. which columns
> are sensitive, which master key to use, which algorithm, etc. It is important
> that both Parquet and ORC can use unified names.
> According to the slides at
> [https://www.slideshare.net/oom65/fine-grain-access-control-for-big-data-orc-column-encryption-137308692],
> ORC uses table properties like orc.encrypt.pii, orc.encrypt.credit.
> In the Parquet community, several approaches are still under discussion.
> Table properties are one of the options, but there is no detailed design
> of the table property names yet.
> So it is a good time for the two communities to agree on unified table
> property names as a superset.
> *Proposal:*
> There are several encryption properties that need to be specified for a
> table. Here is the list. It is the superset of Parquet and ORC; some
> items might not apply to both.
> # PII columns, including nested columns
> # Column key metadata, master key metadata
> # Encryption algorithm; for example, Parquet supports AES_GCM and AES_CTR.
> ORC might support AES_CTR.
> # Footer encryption: Parquet allows the footer to be encrypted or plaintext
> # Footer key metadata
> Here are the proposed table properties.
> |*Table Property Name*|*Value*|*Notes*|
> |encrypt_algorithm|aes_ctr, aes_gcm|The algorithm to be used for encryption.|
> |encrypt_footer_plaintext|true, false|Parquet supports plaintext and encrypted footers. By default, the footer is encrypted.|
> |encrypt_footer_key_metadata|base64 string of footer key metadata|It is up to the KMS to define what key metadata is. The metadata should carry enough information for the KMS to locate the corresponding key.|
> |encrypt_col_xxx|base64 string of column key metadata|‘xxx’ is the column name, for example ‘address.zipcode’. It is up to the KMS to define what key metadata is. The metadata should carry enough information for the KMS to locate the corresponding key.|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
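
As a rough illustration of the property names proposed in the table above, the sketch below assembles them into a plain key/value map. It is not taken from Hive, ORC, or Parquet: the function name `build_encryption_properties`, the choice of `aes_gcm` as the default algorithm, and the key metadata byte strings are all assumptions made for the example. What the KMS actually expects as key metadata is left open by the proposal.

```python
import base64


def build_encryption_properties(column_key_metadata, footer_key_metadata,
                                algorithm="aes_gcm", footer_plaintext=False):
    """Hypothetical helper: build table properties using the proposed names.

    column_key_metadata maps a column name (e.g. 'address.zipcode') to the
    raw key metadata bytes for that column; footer_key_metadata is raw bytes.
    """
    if algorithm not in ("aes_ctr", "aes_gcm"):
        raise ValueError("unsupported algorithm: " + algorithm)

    props = {
        "encrypt_algorithm": algorithm,
        # The proposal says the footer is encrypted by default.
        "encrypt_footer_plaintext": "true" if footer_plaintext else "false",
        "encrypt_footer_key_metadata":
            base64.b64encode(footer_key_metadata).decode("ascii"),
    }
    # One 'encrypt_col_<column name>' property per sensitive column.
    for column, metadata in column_key_metadata.items():
        props["encrypt_col_" + column] = base64.b64encode(metadata).decode("ascii")
    return props


# Placeholder key metadata; real values would come from the KMS.
props = build_encryption_properties(
    {"address.zipcode": b"column-key-metadata"},
    b"footer-key-metadata",
)
for name, value in sorted(props.items()):
    print(name, "=", value)
```

Because the column name is part of the property name, adding another sensitive column only adds another `encrypt_col_` entry and needs no predefined key names.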