Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20447 )
Change subject: IMPALA-13039: AES Encryption/ Decryption Support in Impala ...................................................................... Patch Set 9: (89 comments) Thanks Pranav for the new patch set. http://gerrit.cloudera.org:8080/#/c/20447/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20447/9//COMMIT_MSG@19 PS9, Line 19: 1. AES-GCM, AES-ECB, AES-CTR, and AES-CFB encryption and decryption functionalities. Many lines are longer than 72 chars. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@182 PS3, Line 182: "mul > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@551 PS3, Line 551: ic_a > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@854 PS3, Line 854: n UTF8 mode, po > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1076 PS3, Line 1076: > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1112 PS3, Line 1112: _ptr<re > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1344 PS3, Line 1344: > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1624 PS3, Line 1624: > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1625 PS3, Line 1625: le m = static_cast<double > Done Not done. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@21 PS9, Line 21: #include <cstring> What is the cstring header used for? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@23 PS9, Line 23: #include <openssl/bio.h> : #include <openssl/evp.h> Sorry, these should be in their own include group because they are not std lib includes. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@26 PS9, Line 26: #include <string> What is the string header used for? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1810 PS9, Line 1810: expression Why do you call it "expression"? I think "input" was better. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1812 PS9, Line 1812: If a particular mode passed by user is not supported then by default : // the default mode is chosen I find it a bit misleading, shouldn't we return an error in this case? Now the user could think that encrypting with some mode works but we're using another mode behind the scenes. It may also cause errors if the user wants to decrypt the data with another engine that actually supports the chosen mode. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1846 PS9, Line 1846: if (expr.is_null) { We should check if 'key' or 'iv' (in case of GCM) is null. Shouldn't that be an error? In these cases I don't think returning NULL is good because it may lead to some kind of information loss (from non-NULL plaintext to NULL ciphertext, no way to get it back). http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1857 PS9, Line 1857: if (iv.len > AES_BLOCK_SIZE) { Should we care about the IV when using e.g. ECB which doesn't use an IV? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1863 PS9, Line 1863: t Could use a more descriptive variable name, e.g. 'encryption' ('key' is already used for the key string). http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1868 PS9, Line 1868: else If there is an ELSE branch, always use braces ('{}') around the blocks. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1872 PS9, Line 1872: iv.ptr, iv.len IV could be NULL. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1876 PS9, Line 1876: t->StringToMode(reinterpret_cast<const char*>(mode.ptr), mode.len) This could be extracted to a variable before L1869 so it can be used there too. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1881 PS9, Line 1881: is multiple Nit: "is a multiple". http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1886 PS9, Line 1886: uint8_t out[expected_out_len]; Variable-length arrays are not part of the C++ standard (though GCC supports them as an extension). In this case we also don't want to store the output on the stack as it could be huge. Instead, create the 'result' StringVal variable here instead of on L1904, and use this constructor: StringVal(FunctionContext* context, int len); This way the resulting buffer will be allocated and you can pass this to EncryptionKey::Encrypt, no need to copy it afterwards. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1889 PS9, Line 1889: int len = expr.len; No need to extract this into a variable. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1891 PS9, Line 1891: (uint8_t*) No need for the cast as uint8_t and char are the same type (though technically not guaranteed to be the same). But even if you'd like to keep it, using C-style casts is less readable because it gives the reader no information about why the cast is necessary, i.e. reinterpret_cast vs const_cast etc. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1914 PS9, Line 1914: aes_decrypt The comments on aes_encrypt() also apply here. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h File be/src/exprs/string-functions.h: http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@28 PS9, Line 28: #include <cstring> Unneeded import. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@30 PS9, Line 30: #include <openssl/bio.h> Unneeded import. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@31 PS9, Line 31: #include <openssl/evp.h> Unneeded import. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@33 PS9, Line 33: #include <string> Unneeded import. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@52 PS9, Line 52: enum class AES_CIPHER_MODE; Unused forward declaration. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util-test.cc File be/src/util/openssl-util-test.cc: http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util-test.cc@68 PS9, Line 68: impala We're in the impala namespace, no need to specify it. Applies to the other elements and L98 and L149 too. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h File be/src/util/openssl-util.h: http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@25 PS9, Line 25: #include "exprs/string-functions.h" Unneeded include. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@72 PS9, Line 72: currently supported Are all of these supported? The comment of class EncryptionKey only lists GCM, CTR, CFB, and ECB. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@73 PS9, Line 73: enum class AES_CIPHER_MODE { Why are we removing AES_192_ECB? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@80 PS9, Line 80: def DEFAULT or AES_DEFAULT would be better. On the other hand I don't think we should add a default here at all. The default value will be one of the above and can be obtained by calling GetSupportedDefaultMode(). http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@127 PS9, Line 127: This configuration yields a stream cipher ECB is not a stream cipher mode, this is misleading now. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@138 PS9, Line 138: confidentiality As far as I know the AAD is not confidential, and the previous text didn't say that GCM protects its confidentiality. What is the reason for re-writing this part of the comment, down from "Notes for GCM"? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@145 PS9, Line 145: /// Initializes a key for temporary use with randomly generated data and clears the I don't see the need for rewriting this paragraph. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@153 PS9, Line 153: The Why change "This key" to "The key"? "key" here refers to the EncryptionKey object. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@153 PS9, Line 153: Exactly 'len' bytes will be written to 'out' This is no longer true for ECB because an extra block can be added. This should be mentioned in this comment. We should also include that the 'out' buffer should be big enough to hold the extra block too. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@157 PS9, Line 157: NULL Use nullptr instead. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@157 PS9, Line 157: data_read Isn't this the output length? It should be named accordingly and also mentioned in the comment. Remember to rename the parameter in the .cc file as well. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@160 PS9, Line 160: Exactly 'len' bytes will be written to 'out' See L153. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@161 PS9, Line 161: the See L153 (this vs. the). http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@164 PS9, Line 164: NULL Use nullptr instead. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@164 PS9, Line 164: data_read See L157. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@166 PS9, Line 166: While currently only used for testing purposes Is it still true after this patch? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@174 PS9, Line 174: impala We're inside the impala namespace, so no need to specify it. Also applies to L175 and L178-179. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@174 PS9, Line 174: return Start this on a new line. We only skip the newline after the opening "{" if the whole function fits on the same line. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@178 PS9, Line 178: return Add a newline. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@181 PS9, Line 181: InitializeFields Add function comment. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@197 PS9, Line 197: Exactly 'len' bytes will be written to 'out'. See L153. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@201 PS9, Line 201: data_read See L157. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@224 PS9, Line 224: int key_length; This should be right after L214. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@224 PS9, Line 224: h; Should end with an underscore: 'key_length_'. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@225 PS9, Line 225: h; Should end with an underscore: 'iv_length_'. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@225 PS9, Line 225: int iv_length; This should be right after L217. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc File be/src/util/openssl-util.cc: http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@77 PS9, Line 77: // OpenSSL 1.1+ doesn't allow runtime detection of the supported TLS version. It Why did you rewrite this? It just makes the patch bigger and increases the probability of conflicts. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@215 PS9, Line 215: If it is ECB mode then padding will be enabled : // for this ctx, otherwise it will be disabled. Instead of this comment, you could have a variable bool padding_enabled = IsEcbMode(); http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@221 PS9, Line 221: eliminating the requirement for multiples of 16 bytes Isn't ECB padded to a full block? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@226 PS9, Line 226: NULL, NULL, NULL Should use nullptr instead of NULL, also on the next line. On the other hand, why do you initialise it here without a key and IV and then also on L239? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@233 PS9, Line 233: NULL Now that we've touched the line, it should also be changed to nullptr. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@239 PS9, Line 239: success = encrypt ? EVP_EncryptInit_ex(ctx.ctx, NULL, NULL, key_, iv_) : See L226 about initialising it twice. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@245 PS9, Line 245: // The OpenSSL encryption APIs use INT for buffer lengths. To support larger buffers, No need to rewrite this. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@255 PS9, Line 255: return OpenSSLErr(encrypt ? "EVP_EncryptUpdate" : "EVP_DecryptUpdate", err_context); Why did you remove DCHECK_EQ(in_len, out_len)? If it is no longer the case that in_len == out_len, then we need to use different offsets for input and output on L252-253. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@257 PS9, Line 257: NULL Use nullptr instead. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@274 PS9, Line 274: NULL Use nullptr instead. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@277 PS9, Line 277: NULL Use nullptr instead. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@280 PS9, Line 280: // The error below needs to be uncommented whenever support is added to use Is it sure that we can't handle it without gcm_tag? It's not good if we can't check whether this step succeeded, I guess errors not related to the tag could also occur which are not reported now. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@289 PS9, Line 289: NULL Use nullptr instead. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@305 PS9, Line 305: impala We're in the impala namespace, no need to add "impala::" here. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@311 PS9, Line 311: ecb Why are we changing the default mode? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@316 PS9, Line 316: if (m == impala::AES_CIPHER_MODE::def) { As I commented at openssl-util.h:80, I don't think having a 'default' member in AES_CIPHER_MODE is a good choice. Here we could do the following: either the caller should call GetSupportedDefaultMode() and then call this function, or there could be a separate SetDefaultCipherMode() function. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@336 PS9, Line 336: if (iv_length != 0) Use braces around the IF and ELSE blocks. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@339 PS9, Line 339: memset(iv_, 0, sizeof(iv_)); Why do we need to null out the 'iv_' array if 'iv_len' is zero while when 0 < iv_len < sizeof(iv_) we don't need to null out the remaining, unused bytes? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@344 PS9, Line 344: impala No need to write "impala::". Applies to the other cases too. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@345 PS9, Line 345: // Ensuring compatibility with GCM mode presents a challenge due to OpenSSL's No need to rewrite this paragraph. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@361 PS9, Line 361: // If TLS1.2 is supported, then it indicates a version of OpenSSL that No need to rewrite this paragraph. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@365 PS9, Line 365: case impala::AES_CIPHER_MODE::AES_256_CFB: I don't think it's necessary to insert return statements after each case, we could continue to use switch fallthrough like we did before - we use it in many places in Impala. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@371 PS9, Line 371: AES_128_ECB What happened to AES_192_ECB? Did we remove support for it? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@374 PS9, Line 374: AES_128_GCM Doesn't AES_128_GCM have the same problem as AES_256_GCM, described on L345? http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@383 PS9, Line 383: impala No need for "impala::". http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@384 PS9, Line 384: return impala::AES_CIPHER_MODE::AES_256_GCM; If the whole IF statement doesn't fit on one line, use braces around the IF body. Applies also to L386. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@392 PS9, Line 392: GCM Shouldn't we include that it is the 256 bit version of GCM? We also have AES_128_GCM now. for ECB, the previous code included it. This applies to ECB in this version too. This is especially true as the other direction, StringToMode(), does include the bit width. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@392 PS9, Line 392: impala No need for "impala::". http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@403 PS9, Line 403: const Const doesn't make much sense when returning a value type (as opposed to pointers and references), like enum values. http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@404 PS9, Line 404: if (memcmp("AES_256_GCM", str, len) == 0 ) When we have ELSE [IF] branches, we always use braces around the statement bodies. http://gerrit.cloudera.org:8080/#/c/20447/9/common/function-registry/impala_functions.py File common/function-registry/impala_functions.py: http://gerrit.cloudera.org:8080/#/c/20447/9/common/function-registry/impala_functions.py@1077 PS9, Line 1077: [['to_utc_timestamp'], 'TIMESTAMP', ['TIMESTAMP', 'STRING', 'BOOLEAN'], Does this come from a rebase? http://gerrit.cloudera.org:8080/#/c/20447/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test File testdata/workloads/functional-query/queries/QueryTest/exprs.test: http://gerrit.cloudera.org:8080/#/c/20447/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3291 PS9, Line 3291: # AES encryption/ decryption examples: The commit message claims that also AES-CTR and AES-CFB are supported, is it true? If so, we should add tests for them. -- To view, visit http://gerrit.cloudera.org:8080/20447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3902f2b1d95da4d06995cbd687e79c48e16190c9 Gerrit-Change-Number: 20447 Gerrit-PatchSet: 9 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Comment-Date: Wed, 12 Jun 2024 12:04:39 +0000 Gerrit-HasComments: Yes
