Vikas Kumar created HADOOP-19138:
------------------------------------

             Summary: CSE-KMS S3A: Support for InstructionFile to store ECEK 
meta info
                 Key: HADOOP-19138
                 URL: https://issues.apache.org/jira/browse/HADOOP-19138
             Project: Hadoop Common
          Issue Type: New Feature
          Components: command, tools
            Reporter: Vikas Kumar


*Task*: Support for InstructionFile to store ECEK meta info

*Current implementation/Context:*  

hadoop-aws supports CSE-KMS. During client-side encryption (CSE), the key
encryption info has to be kept somewhere. The AWS SDK supports two ways:
 # *S3 object metadata:* The current integration in hadoop-aws supports only
this approach.
 ## But S3 metadata is limited to 2 KB in size.
 ## Also, metadata cannot be updated independently. Even if only the metadata
needs to change, it takes a complete object read/write operation.
 # *Instruction file approach:* A small file containing the meta-info, stored
in the same bucket at the same location as the object. This approach costs one
extra S3 read/write round trip, but could be useful if the business needs
frequent metadata changes.

*Use case:* implementing KMS RE-ENCRYPT, where only the CEK (DEK) needs to be
re-encrypted with new key material. The instruction file approach could be
useful here.

There could be many other use cases, depending on business needs.

*My analysis:* I tried to enable this by setting
*CryptoStorageMode.InstructionFile* in CryptoConfigurationV2 while building
the client with AmazonS3EncryptionClientV2Builder.

Note: ObjectMetadata is the default value.

*Result*: The write operation worked, but the read failed because the
instruction file was missing.

*RCA:* While debugging, I found the following.

On a put request for, say, myfile.txt:
 * First, S3AFileSystem writes the file to S3 as *myfile.txt_COPYING_*
 * Second, it writes the corresponding instruction file as
*myfile.txt_COPYING_.instruction*
 * Third, it calls rename.
 ** Rename here means copying the file bytes to *myfile.txt* and
 ** deleting *myfile.txt_COPYING_*
 * This is where the problem occurs:
 ** After deleting any file, the AmazonS3EncryptionClientV2 class looks for
the corresponding instruction file and, if found, deletes it as well. As a
result, it also deletes *myfile.txt_COPYING_.instruction*, so *myfile.txt* is
left without an instruction file.

Related code:

com.amazonaws.services.s3.AmazonS3EncryptionClientV2.deleteObject() // part of
the AWS SDK bundle
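
To make the sequence above concrete, here is a minimal sketch in plain Java. It does not use the AWS SDK: ToyEncryptedStore is a hypothetical in-memory stand-in whose delete() only mimics the cascading behaviour described for deleteObject(). It also assumes, as the observed read failure suggests, that the copy step transfers only the object bytes and not the instruction file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a bucket behind an encryption client.
// It is NOT the AWS SDK; it only mimics the behaviour described in this issue.
class ToyEncryptedStore {
    static final String SUFFIX = ".instruction";
    final Map<String, String> objects = new HashMap<>();

    // A CSE put in InstructionFile mode writes the object plus its instruction file.
    void put(String key, String ciphertext) {
        objects.put(key, ciphertext);
        objects.put(key + SUFFIX, "ECEK meta info for " + key);
    }

    // Assumption: copy transfers only the object bytes, not the instruction file.
    void copy(String src, String dst) {
        objects.put(dst, objects.get(src));
    }

    // Deleting an object also deletes its instruction file, if one exists.
    void delete(String key) {
        objects.remove(key);
        objects.remove(key + SUFFIX);
    }

    // A read can only decrypt when both the object and its instruction file exist.
    boolean readable(String key) {
        return objects.containsKey(key) && objects.containsKey(key + SUFFIX);
    }

    // Replays the put sequence from the RCA: write temp file, then rename (copy + delete).
    static ToyEncryptedStore replayPut() {
        ToyEncryptedStore s3 = new ToyEncryptedStore();
        s3.put("myfile.txt_COPYING_", "ciphertext");
        s3.copy("myfile.txt_COPYING_", "myfile.txt");
        s3.delete("myfile.txt_COPYING_"); // cascades to myfile.txt_COPYING_.instruction
        return s3;
    }

    public static void main(String[] args) {
        ToyEncryptedStore s3 = replayPut();
        System.out.println("objects left: " + s3.objects.keySet());
        System.out.println("myfile.txt readable: " + s3.readable("myfile.txt"));
    }
}
```

After the replay, only myfile.txt remains in the toy store and it has no instruction file, which matches the failed read reported above.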

*Possible solution:* S3AFileSystem (part of hadoop-aws) needs to be updated to
rename the instruction file first, then the original file. This way the
unwanted deletion of the instruction file can be avoided.
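
The proposed ordering could look roughly like the sketch below. This is plain Java over an in-memory map, not S3AFileSystem code. RenameOrderSketch and its method names are hypothetical, and the delete() helper assumes the same cascading behaviour described in the RCA.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed ordering; not S3AFileSystem code.
class RenameOrderSketch {
    static final String SUFFIX = ".instruction";

    // Assumed client behaviour from the RCA: delete cascades to the instruction file.
    static void delete(Map<String, String> bucket, String key) {
        bucket.remove(key);
        bucket.remove(key + SUFFIX);
    }

    // Rename = copy the bytes to the destination key, then delete the source key.
    static void renameObject(Map<String, String> bucket, String src, String dst) {
        bucket.put(dst, bucket.get(src));
        delete(bucket, src);
    }

    // Proposed order: move the instruction file first, then the data file.
    static void renameWithInstructionFile(Map<String, String> bucket, String src, String dst) {
        if (bucket.containsKey(src + SUFFIX)) {
            renameObject(bucket, src + SUFFIX, dst + SUFFIX);
        }
        // By now the source instruction file is gone, so the cascading
        // delete inside this rename has nothing extra to remove.
        renameObject(bucket, src, dst);
    }

    public static void main(String[] args) {
        Map<String, String> bucket = new HashMap<>();
        bucket.put("myfile.txt_COPYING_", "ciphertext");
        bucket.put("myfile.txt_COPYING_.instruction", "ECEK meta info");
        renameWithInstructionFile(bucket, "myfile.txt_COPYING_", "myfile.txt");
        System.out.println(bucket.keySet());
    }
}
```

In this toy model the bucket ends up with both myfile.txt and myfile.txt.instruction. Whether the real encryption client copies an instruction file cleanly, and how a failure between the two renames should be handled, are open points this sketch does not address.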

It also requires config changes so that ObjectMetadata/InstructionFile can be
passed as a config parameter.
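
As a rough illustration of such a config parameter, the sketch below resolves a storage mode from a property value. Everything here is an assumption for discussion: the key name fs.s3a.encryption.cse.storage.mode is hypothetical, java.util.Properties stands in for Hadoop's configuration object, and the local StorageMode enum merely mirrors the two mode names mentioned in this issue.

```java
import java.util.Properties;

// Hypothetical sketch: neither this key nor this class exists in hadoop-aws.
class CseStorageModeConfig {
    // Local stand-in mirroring the two storage modes named in this issue.
    enum StorageMode { ObjectMetadata, InstructionFile }

    // Hypothetical configuration key, for illustration only.
    static final String KEY = "fs.s3a.encryption.cse.storage.mode";

    // Resolves the mode case-insensitively; an unset key falls back to
    // ObjectMetadata, which the issue notes is the default.
    static StorageMode resolve(Properties conf) {
        String raw = conf.getProperty(KEY, StorageMode.ObjectMetadata.name()).trim();
        for (StorageMode mode : StorageMode.values()) {
            if (mode.name().equalsIgnoreCase(raw)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unsupported value for " + KEY + ": " + raw);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "InstructionFile");
        System.out.println(resolve(conf));
    }
}
```

Rejecting unknown values outright, rather than silently falling back to the default, is one possible design choice; it avoids a typo in the setting quietly changing where the key info is stored.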

Let's discuss whether there is a better solution that can be incorporated.

Once we agree on a common solution, I can work on the implementation.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
