Vikas Kumar created HADOOP-19138:
------------------------------------

Summary: CSE-KMS S3A: Support for InstructionFile to store ECEK meta info
Key: HADOOP-19138
URL: https://issues.apache.org/jira/browse/HADOOP-19138
Project: Hadoop Common
Issue Type: New Feature
Components: command, tools
Reporter: Vikas Kumar
{*}Task{*}: Support for InstructionFile to store ECEK meta info

*Current implementation/Context:*
Hadoop-aws supports CSE-KMS. During client-side encryption (CSE), the key encryption info has to be stored somewhere. The AWS SDK supports two ways:
# *S3 object metadata*: the current integration in hadoop-aws only supports this approach.
## But S3 user-defined metadata is limited to 2 KB.
## Also, metadata cannot be updated independently. Changing only the metadata still requires a complete object read/write.
# *Instruction file approach:* a small file containing the meta info, stored in the same bucket at the same location as the object. This approach needs one extra S3 read/write round trip, but could be useful if the business needs frequent metadata changes.

*Use case:* implementing KMS RE-ENCRYPT, where only the CEK (DEK) needs to be re-encrypted with new key material. Here the instruction file approach could be useful. There could also be many other use cases depending on business needs.

*My analysis:* I tried to enable this by setting *CryptoStorageMode.InstructionFile* in CryptoConfigurationV2 while building AmazonS3EncryptionClientV2Builder. Note: ObjectMetadata is the default value.
{*}Result{*}: The write operation worked, but the read failed because the instruction file was missing.

*RCA:* On debugging, I found the following. On a put request, say for myfile.txt:
* First, S3AFileSystem writes the file to S3 as *myfile.txt_COPYING_*.
* Second, it writes the corresponding instruction file as *myfile.txt_COPYING_.instruction*.
* Third, it calls rename.
** Rename here means copying the file bytes to *myfile.txt* and
** deleting *myfile.txt_COPYING_*.
* This is where the problem occurs:
** after deleting any object, the AmazonS3EncryptionClientV2 class looks for the corresponding instruction file and, if it finds one, deletes it too. As a result, it deletes *myfile.txt_COPYING_.instruction* as well.
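The failure sequence above can be reproduced with a small in-memory model. This is a hypothetical simulation in plain Java, not S3A or AWS SDK code: `SimulatedEncryptedBucket` only mimics the reported behaviour of `AmazonS3EncryptionClientV2.deleteObject()` (deleting an object also deletes `<key>.instruction`). It also assumes that copying an object copies only its data and not its instruction file, which is consistent with the observed read failure.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of the put/rename sequence from the RCA (not real S3A/SDK code). */
class InstructionFileRenameDemo {

    static final String INSTRUCTION_SUFFIX = ".instruction";

    /** Simulated bucket: object key -> content. */
    static class SimulatedEncryptedBucket {
        final Map<String, String> objects = new TreeMap<>();

        /** InstructionFile mode: the data and its key material go to two sibling objects. */
        void put(String key, String data, String encryptedCek) {
            objects.put(key, data);
            objects.put(key + INSTRUCTION_SUFFIX, encryptedCek);
        }

        /** Assumption: a copy moves only the data object, not its instruction file. */
        void copy(String src, String dst) {
            objects.put(dst, objects.get(src));
        }

        /** Mimics the reported cascading delete of "<key>.instruction". */
        void delete(String key) {
            objects.remove(key);
            objects.remove(key + INSTRUCTION_SUFFIX);
        }

        /** Decryption needs both the data and its instruction file. */
        boolean canDecrypt(String key) {
            return objects.containsKey(key) && objects.containsKey(key + INSTRUCTION_SUFFIX);
        }
    }

    /** Rename as described in the RCA: copy the data object, then delete the source. */
    static void rename(SimulatedEncryptedBucket bucket, String src, String dst) {
        bucket.copy(src, dst);
        bucket.delete(src); // also removes src + ".instruction"
    }

    public static void main(String[] args) {
        SimulatedEncryptedBucket bucket = new SimulatedEncryptedBucket();
        bucket.put("myfile.txt_COPYING_", "ciphertext", "encrypted-cek");
        rename(bucket, "myfile.txt_COPYING_", "myfile.txt");
        System.out.println(bucket.objects.keySet());
        System.out.println("readable: " + bucket.canDecrypt("myfile.txt"));
    }
}
```

Running it should print `[myfile.txt]` followed by `readable: false`: the data object survives the rename, but its key material does not.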
Related code: com.amazonaws.services.s3.AmazonS3EncryptionClientV2.deleteObject() // part of the AWS SDK bundle

*Possible solution:* S3AFileSystem (part of hadoop-aws) needs to be updated to rename the instruction file first, then the original file. This way the deletion of the instruction file can be avoided. It also requires a config change to accept ObjectMetadata/InstructionFile as a config parameter.

Let's discuss whether there is a better solution that can be incorporated. Once we agree on a common solution, I can work on the implementation.
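A minimal sketch of the proposed ordering, again on a hypothetical in-memory model rather than the real S3AFileSystem or SDK classes. The `StorageMode` enum and its `fromConfig()` parsing stand in for the suggested config parameter; the enum, the method names and the accepted values are assumptions, with ObjectMetadata as the default as in the SDK. In InstructionFile mode the instruction file is copied and deleted first, so the later cascading delete of `myfile.txt_COPYING_` finds nothing left to remove.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of "rename the instruction file first" (not real S3A/SDK code). */
class InstructionFileFirstRename {

    static final String INSTRUCTION_SUFFIX = ".instruction";

    /** Hypothetical config value mirroring CryptoStorageMode; ObjectMetadata is the default. */
    enum StorageMode {
        OBJECT_METADATA, INSTRUCTION_FILE;

        static StorageMode fromConfig(String value) {
            if (value == null || value.trim().isEmpty()) {
                return OBJECT_METADATA;
            }
            switch (value.trim().toLowerCase(Locale.ROOT)) {
                case "objectmetadata": return OBJECT_METADATA;
                case "instructionfile": return INSTRUCTION_FILE;
                default: throw new IllegalArgumentException("Unknown storage mode: " + value);
            }
        }
    }

    /** Simulated bucket: delete() also removes "<key>.instruction", as reported for the SDK. */
    static class SimulatedEncryptedBucket {
        final Map<String, String> objects = new TreeMap<>();

        void copy(String src, String dst) {
            objects.put(dst, objects.get(src));
        }

        void delete(String key) {
            objects.remove(key);
            objects.remove(key + INSTRUCTION_SUFFIX);
        }
    }

    /** Rename that moves the instruction file before the data object. */
    static void rename(SimulatedEncryptedBucket bucket, StorageMode mode, String src, String dst) {
        if (mode == StorageMode.INSTRUCTION_FILE) {
            String srcInstruction = src + INSTRUCTION_SUFFIX;
            if (bucket.objects.containsKey(srcInstruction)) {
                bucket.copy(srcInstruction, dst + INSTRUCTION_SUFFIX);
                bucket.delete(srcInstruction);
            }
        }
        // Copy the data, then delete the source; the cascading delete now finds no instruction file.
        bucket.copy(src, dst);
        bucket.delete(src);
    }

    public static void main(String[] args) {
        SimulatedEncryptedBucket bucket = new SimulatedEncryptedBucket();
        bucket.objects.put("myfile.txt_COPYING_", "ciphertext");
        bucket.objects.put("myfile.txt_COPYING_.instruction", "encrypted-cek");
        rename(bucket, StorageMode.fromConfig("InstructionFile"), "myfile.txt_COPYING_", "myfile.txt");
        System.out.println(bucket.objects.keySet());
    }
}
```

Here the final listing should be `[myfile.txt, myfile.txt.instruction]`. A real implementation would also need to decide what happens if the process fails between the two renames, since the instruction file would then already sit at the destination key.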