Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20447 )

Change subject: IMPALA-13039: AES Encryption/ Decryption Support in Impala
......................................................................


Patch Set 9:

(89 comments)

Thanks Pranav for the new patch set.

http://gerrit.cloudera.org:8080/#/c/20447/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20447/9//COMMIT_MSG@19
PS9, Line 19: 1. AES-GCM, AES-ECB, AES-CTR, and AES-CFB encryption and 
decryption functionalities.
Many lines are longer than 72 chars.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@182
PS3, Line 182: "mul
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@551
PS3, Line 551: ic_a
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@854
PS3, Line 854: n UTF8 mode, po
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1076
PS3, Line 1076:
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1112
PS3, Line 1112: _ptr<re
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1344
PS3, Line 1344:
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1624
PS3, Line 1624:
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1625
PS3, Line 1625: le m = static_cast<double
> Done
Not done.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@21
PS9, Line 21: #include <cstring>
What is the cstring header used for?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@23
PS9, Line 23: #include <openssl/bio.h>
            : #include <openssl/evp.h>
Sorry, these should be in their own include group because they are not std lib 
includes.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@26
PS9, Line 26: #include <string>
What is the string header used for?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1810
PS9, Line 1810: expression
Why do you call it "expression"? I think "input" was better.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1812
PS9, Line 1812: If a particular mode passed by user is not supported then by 
default
              : // the default mode is chosen
I find it a bit misleading, shouldn't we return an error in this case? Now the 
user could think that encrypting with some mode works but we're using another 
mode behind the scenes. It may also cause errors if the user wants to decrypt 
the data with another engine that actually supports the chosen mode.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1846
PS9, Line 1846:   if (expr.is_null) {
We should check if 'key' or 'iv' (in case of GCM) is null. Shouldn't that be an 
error? In these cases I don't think returning NULL is good because it may lead 
to some kind of information loss (from non-NULL plaintext to NULL ciphertext, 
no way to get it back).


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1857
PS9, Line 1857:   if (iv.len > AES_BLOCK_SIZE) {
Should we care about the IV when using e.g. ECB which doesn't use an IV?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1863
PS9, Line 1863: t
Could use a more descriptive variable name, e.g. 'encryption' ('key' is already 
used for the key string).


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1868
PS9, Line 1868:   else
If there is an ELSE branch, always use braces ('{}') around the blocks.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1872
PS9, Line 1872: iv.ptr, iv.len
IV could be NULL.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1876
PS9, Line 1876: t->StringToMode(reinterpret_cast<const char*>(mode.ptr), 
mode.len)
This could be extracted to a variable before L1869 so it can be used there too.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1881
PS9, Line 1881: is multiple
Nit: "is a multiple".


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1886
PS9, Line 1886:   uint8_t out[expected_out_len];
Variable-length arrays are not part of the C++ standard (though GCC supports 
them as an extension). In this case we also don't want to store the output on 
the stack as it could be huge. Instead, create the 'result' StringVal variable 
here instead of on L1904, and use this constructor:
  StringVal(FunctionContext* context, int len);
This way the resulting buffer will be allocated and you can pass this to 
EncryptionKey::Encrypt, no need to copy it afterwards.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1889
PS9, Line 1889:   int len = expr.len;
No need to extract this into a variable.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1891
PS9, Line 1891: (uint8_t*)
No need for the cast as uint8_t and char are the same type (though technically 
not guaranteed to be the same). But even if you'd like to keep it, using 
C-style casts is less readable because it gives the reader no information about 
why the cast is necessary, i.e. reinterpret_cast vs const_cast etc.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions-ir.cc@1914
PS9, Line 1914: aes_decrypt
The comments on aes_encrypt() also apply here.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h
File be/src/exprs/string-functions.h:

http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@28
PS9, Line 28: #include <cstring>
Unneeded import.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@30
PS9, Line 30: #include <openssl/bio.h>
Unneeded import.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@31
PS9, Line 31: #include <openssl/evp.h>
Unneeded import.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@33
PS9, Line 33: #include <string>
Unneeded import.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/exprs/string-functions.h@52
PS9, Line 52: enum class AES_CIPHER_MODE;
Unused forward declaration.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util-test.cc
File be/src/util/openssl-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util-test.cc@68
PS9, Line 68: impala
We're in the impala namespace, no need to specify it. Applies to the other 
elements and L98 and L149 too.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h
File be/src/util/openssl-util.h:

http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@25
PS9, Line 25: #include "exprs/string-functions.h"
Unneeded include.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@72
PS9, Line 72: currently supported
Are all of these supported? The comment of class EncryptionKey only lists GCM, 
CTR, CFB, and ECB.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@73
PS9, Line 73: enum class AES_CIPHER_MODE {
Why are we removing AES_192_ECB?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@80
PS9, Line 80: def
DEFAULT or AES_DEFAULT would be better. On the other hand I don't think we 
should add a default here at all. The default value will be one of the above 
and can be obtained by calling GetSupportedDefaultMode().


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@127
PS9, Line 127: This configuration yields a stream cipher
ECB is not a stream cipher mode, this is misleading now.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@138
PS9, Line 138: confidentiality
As far as I know the AAD is not confidential, and the previous text didn't say 
that GCM protects its confidentiality. What is the reason for re-writing this 
part of the comment, down from "Notes for GCM"?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@145
PS9, Line 145:   /// Initializes a key for temporary use with randomly 
generated data and clears the
I don't see the need for rewriting this paragraph.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@153
PS9, Line 153: The
Why change "This key" to "The key"? "key" here refers to the EncryptionKey 
object.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@153
PS9, Line 153: Exactly 'len' bytes will be written to 'out'
This is no longer true for ECB because an extra block can be added.
This should be mentioned in this comment. We should also include that the 'out' 
buffer should be big enough to hold the extra block too.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@157
PS9, Line 157: NULL
Use nullptr instead.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@157
PS9, Line 157: data_read
Isn't this the output length? It should be named accordingly and also mentioned 
in the comment. Remember to rename the parameter in the .cc file as well.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@160
PS9, Line 160: Exactly 'len' bytes will be written to 'out'
See L153.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@161
PS9, Line 161: the
See L153 (this vs. the).


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@164
PS9, Line 164: NULL
Use nullptr instead.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@164
PS9, Line 164: data_read
See L157.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@166
PS9, Line 166: While currently only used for testing purposes
Is it still true after this patch?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@174
PS9, Line 174: impala
We're inside the impala namespace, so no need to specify it. Also applies to 
L175 and L178-179.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@174
PS9, Line 174: return
Start this on a new line. We only skip the newline after the opening "{" if the 
whole function fits on the same line.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@178
PS9, Line 178: return
Add a newline.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@181
PS9, Line 181: InitializeFields
Add function comment.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@197
PS9, Line 197: Exactly 'len' bytes will be written to 'out'.
See L153.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@201
PS9, Line 201: data_read
See L157.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@224
PS9, Line 224:   int key_length;
This should be right after L214.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@224
PS9, Line 224: h;
Should end with an underscore: 'key_length_'.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@225
PS9, Line 225: h;
Should end with an underscore: 'iv_length_'.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.h@225
PS9, Line 225:   int iv_length;
This should be right after L217.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc
File be/src/util/openssl-util.cc:

http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@77
PS9, Line 77:   // OpenSSL 1.1+ doesn't allow runtime detection of the 
supported TLS version. It
Why did you rewrite this? It just makes the patch bigger and increases the 
probability of conflicts.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@215
PS9, Line 215: If it is ECB mode then padding will be enabled
             :   // for this ctx, otherwise it will be disabled.
Instead of this comment, you could have a variable
bool padding_enabled = IsEcbMode();


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@221
PS9, Line 221: eliminating the requirement for multiples of 16 bytes
Isn't ECB padded to a full block?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@226
PS9, Line 226: NULL, NULL, NULL
Should use nullptr instead of NULL, also on the next line.
On the other hand, why do you initialise it here without a key and IV and then 
also on L239?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@233
PS9, Line 233: NULL
Now that we've touched the line, it should also be changed to nullptr.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@239
PS9, Line 239:   success = encrypt ? EVP_EncryptInit_ex(ctx.ctx, NULL, NULL, 
key_, iv_) :
See L226 about initialising it twice.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@245
PS9, Line 245:   // The OpenSSL encryption APIs use INT for buffer lengths. To 
support larger buffers,
No need to rewrite this.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@255
PS9, Line 255:       return OpenSSLErr(encrypt ? "EVP_EncryptUpdate" : 
"EVP_DecryptUpdate", err_context);
Why did you remove DCHECK_EQ(in_len, out_len)? If it is no longer the case that 
in_len == out_len, then we need to use different offsets for input and output 
on L252-253.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@257
PS9, Line 257: NULL
Use nullptr instead.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@274
PS9, Line 274: NULL
Use nullptr instead.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@277
PS9, Line 277: NULL
Use nullptr instead.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@280
PS9, Line 280:   // The error below needs to be uncommented whenever support is 
added to use
Is it sure that we can't handle it without gcm_tag? It's not good if we can't 
check whether this step succeeded, I guess errors not related to the tag could 
also occur which are not reported now.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@289
PS9, Line 289: NULL
Use nullptr instead.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@305
PS9, Line 305: impala
We're in the impala namespace, no need to add "impala::" here.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@311
PS9, Line 311: ecb
Why are we changing the default mode?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@316
PS9, Line 316:   if (m == impala::AES_CIPHER_MODE::def) {
As I commented at openssl-util.h:80, I don't think having a 'default' member in 
AES_CIPHER_MODE is a good choice. Here we could do the following: either the 
caller should call GetSupportedDefaultMode() and then call this function, or 
there could be a separate SetDefaultCipherMode() function.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@336
PS9, Line 336:   if (iv_length != 0)
Use braces around the IF and ELSE blocks.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@339
PS9, Line 339:     memset(iv_, 0, sizeof(iv_));
Why do we need to null out the 'iv_' array if 'iv_len' is zero while when 0 < 
iv_len < sizeof(iv_) we don't need to null out the remaining, unused bytes?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@344
PS9, Line 344: impala
No need to write "impala::". Applies to the other cases too.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@345
PS9, Line 345:       // Ensuring compatibility with GCM mode presents a 
challenge due to OpenSSL's
No need to rewrite this paragraph.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@361
PS9, Line 361:       // If TLS1.2 is supported, then it indicates a version of 
OpenSSL that
No need to rewrite this paragraph.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@365
PS9, Line 365:     case impala::AES_CIPHER_MODE::AES_256_CFB:
I don't think it's necessary to insert return statements after each case, we 
could continue to use switch fallthrough like we did before - we use it in many 
places in Impala.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@371
PS9, Line 371: AES_128_ECB
What happened to AES_192_ECB? Did we remove support for it?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@374
PS9, Line 374: AES_128_GCM
Doesn't AES_128_GCM have the same problem as AES_256_GCM, described on L345?


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@383
PS9, Line 383: impala
No need for "impala::".


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@384
PS9, Line 384:     return impala::AES_CIPHER_MODE::AES_256_GCM;
If the whole IF statement doesn't fit on one line, use braces around the IF 
body. Applies also to L386.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@392
PS9, Line 392: GCM
Shouldn't we include that it is the 256 bit version of GCM? We also have 
AES_128_GCM now. for ECB, the previous code included it. This applies to ECB in 
this version too.
This is especially true as the other direction, StringToMode(), does include 
the bit width.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@392
PS9, Line 392: impala
No need for "impala::".


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@403
PS9, Line 403: const
Const doesn't make much sense when returning a value type (as opposed to 
pointers and references), like enum values.


http://gerrit.cloudera.org:8080/#/c/20447/9/be/src/util/openssl-util.cc@404
PS9, Line 404:   if (memcmp("AES_256_GCM", str, len) == 0 )
When we have ELSE [IF] branches, we always use braces around the statement 
bodies.


http://gerrit.cloudera.org:8080/#/c/20447/9/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/20447/9/common/function-registry/impala_functions.py@1077
PS9, Line 1077:   [['to_utc_timestamp'], 'TIMESTAMP', ['TIMESTAMP', 'STRING', 
'BOOLEAN'],
Does this come from a rebase?


http://gerrit.cloudera.org:8080/#/c/20447/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/20447/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3291
PS9, Line 3291: # AES encryption/ decryption examples:
The commit message claims that also AES-CTR and AES-CFB are supported, is it 
true? If so, we should add tests for them.



--
To view, visit http://gerrit.cloudera.org:8080/20447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3902f2b1d95da4d06995cbd687e79c48e16190c9
Gerrit-Change-Number: 20447
Gerrit-PatchSet: 9
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Comment-Date: Wed, 12 Jun 2024 12:04:39 +0000
Gerrit-HasComments: Yes

Reply via email to