Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/20447 )
Change subject: IMPALA-12418: Include crypto functions supported by Hive ...................................................................... Patch Set 3: (43 comments) http://gerrit.cloudera.org:8080/#/c/20447/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20447/3//COMMIT_MSG@9 PS3, Line 9: AES (Advanced Encryption Standard) is a widely recognized This paragraph is a bit like an advert for AES :) Instead, we should write what new Impala functions are added by this commit, how they can be used etc, but of course a short description of AES should also be included. http://gerrit.cloudera.org:8080/#/c/20447/3//COMMIT_MSG@19 PS3, Line 19: ECB Could describe in short what ECB is. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@40 PS3, Line 40: #include "util/openssl-util.h" : #include "udf/udf.h" The includes should be in alphabetical order. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@43 PS3, Line 43: #include <string> : #include <openssl/bio.h> : #include <openssl/evp.h> : #include <cstring> Put these together with the other std library includes (L20), in alphabetical order. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@182 PS3, Line 182: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@551 PS3, Line 551: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@854 PS3, Line 854: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1076 PS3, Line 1076: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1112 PS3, Line 1112: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1344 PS3, Line 1344: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1624 PS3, Line 1624: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1625 PS3, Line 1625: Unnecessary spaces. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1806 PS3, Line 1806: unsigned char* Could be const. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1806 PS3, Line 1806: ctx1 Why call it 'ctx1'? It could simply be 'ctx'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1806 PS3, Line 1806: encrypt_ecb Add a doc comment describing what this function does, especially what 'ecb' is, and what the function returns. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1807 PS3, Line 1807: , Nit: add space after the comma. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1807 PS3, Line 1807: unsigned char* Could be const. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1808 PS3, Line 1808: int cipher_len = 0; : int len = 0; No need to declare these here. 'len' could be done just before L1823, and 'cipher_len' could be defined on L1828 as "int cipher_len = len;" http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1811 PS3, Line 1811: ctx This could be 'cipher_ctx'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1811 PS3, Line 1811: EVP_CIPHER_CTX_new We should check whether we can reuse the EVP_CIPHER_CTX between calls to the UDF, in the FunctionContext. I haven't checked OpenSSL yet, I don't know if it supports multiple initialisations. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1814 PS3, Line 1814: \n Do we need the newline at the end of error messages? Applies to all other error messages too. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1818 PS3, Line 1818: NULL Use 'nullptr' instead of 'NULL'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1818 PS3, Line 1818: NULL Use 'nullptr' instead of 'NULL'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1836 PS3, Line 1836: EVP_CIPHER_CTX_free(ctx); We have some early returns in the function, in which case this "free" function is not called. If the EVP_CIPHER_CTX cannot be reused (see L1811), we should either call this free function in before each return statement OR wrap 'cipher_ctx' in an object that calls this function in its destructor. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1841 PS3, Line 1841: unsigned char* Could be const. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1841 PS3, Line 1841: ctx1 See L1806. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1842 PS3, Line 1842: * We usually put the '*' on the type, not the variable name. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1842 PS3, Line 1842: unsigned char* Could be const. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1843 PS3, Line 1843: int cipher_len = 0; : int len = 0; See L1808. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1853 PS3, Line 1853: NULL Use 'nullptr' instead of 'NULL'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1853 PS3, Line 1853: NULL Use 'nullptr' instead of 'NULL'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1871 PS3, Line 1871: EVP_CIPHER_CTX_free(ctx); See L1836. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1884 PS3, Line 1884: key_len I don't think extracting 'key.len' into a variable is useful. Could simply use 'key.len'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1885 PS3, Line 1885: input_len See L1884 about 'key_len'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1886 PS3, Line 1886: % Surround '%' with spaces. A comment should explain why we have this length. If you'd like to round it up to the next 16 bytes, this won't work well because if 'input_len' is already a multiple of 16, it will add 16 extra bytes. Or is it intentional? http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1887 PS3, Line 1887: input_vec No need for this vector, could simply use the memory of 'input', i.e. "input.ptr" http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1889 PS3, Line 1889: key_vec No need for this vector, could simply use the memory of 'key', i.e. "key.ptr" http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1904 PS3, Line 1904: NULL Use 'nullptr' instead of 'NULL'. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1912 PS3, Line 1912: ret_size != output_len We should add a DCHECK for the specific case where "ret_size > output_len" because it is a buffer overflow, which is undefined behaviour. The DCHECK should be in addition to the normal check. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1916 PS3, Line 1916: StringVal result = StringVal::CopyFrom(ctx, output_vec, output_len); You could allocate the result directly as a StringVal with a length of 'output_len' and use its buffer in the call to encrypt_ecb(). It would save us a copy. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1920 PS3, Line 1920: StringVal StringFunctions::aes_decrypt(FunctionContext* ctx, See comments for aes_encrypt() above, they apply also here. http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1954 PS3, Line 1954: StringVal result = StringVal::CopyFrom(ctx, output_vec, ret_size); In the case of decryption, this copy may be worth keeping because the result may be shorter than 'output_len'. http://gerrit.cloudera.org:8080/#/c/20447/3/bin/cmake_aux/create_py3_virtualenv.sh File bin/cmake_aux/create_py3_virtualenv.sh: http://gerrit.cloudera.org:8080/#/c/20447/3/bin/cmake_aux/create_py3_virtualenv.sh@48 PS3, Line 48: # Remove the directory that Python3 venv created, so impala-virtualenv can start Did this come from a rebase? -- To view, visit http://gerrit.cloudera.org:8080/20447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3902f2b1d95da4d06995cbd687e79c48e16190c9 Gerrit-Change-Number: 20447 Gerrit-PatchSet: 3 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Comment-Date: Tue, 14 Nov 2023 10:05:26 +0000 Gerrit-HasComments: Yes
