Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/23730 )
Change subject: IMPALA-14566: Add euclidean_distance and cosine_similarity functions for ARRAY<FLOAT> ...................................................................... Patch Set 10: (10 comments) Thanks for working on this! http://gerrit.cloudera.org:8080/#/c/23730/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23730/10//COMMIT_MSG@9 PS10, Line 9: ations: nit: commit message body should be no longer than 72 chars http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions-ir.cc File be/src/exprs/vector-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions-ir.cc@49 PS10, Line 49: tuple_size Is tuple_size equals TUPLE_SIZE_FOR_FLOAT_ARRAY? In that case we could just remove the parameter, and use TUPLE_SIZE_FOR_FLOAT_ARRAY. Same goes for SLOT_OFFSET. http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions-ir.cc@51 PS10, Line 51: // This function should only be called after checking for NULL elements. : // If an element is NULL, the calling function should return NULL instead. : DCHECK(!IsArrayElementNull(array_ptr, index, tuple_size, NULL_INDICATOR_OFFSET)); nit: You could move this to the beginning. http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions-ir.cc@61 PS10, Line 61: int null_indicator_offset Is it possible to call it with any other value than NULL_INDICATOR_OFFSET? If no, we could remove this argument. http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions-ir.cc@64 PS10, Line 64: // if Bit 0 = 1 then element is NULL : //If Bit 0 = 0 then the element is NOT NULL nit: could be just: // Bit 0 is the null-indicator bit http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions-ir.cc@72 PS10, Line 72: if (vec1.is_null || vec2.is_null) { : return false; : } nit: fits single line Same goes for the following 2 ifs as well http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions-ir.cc@89 PS10, Line 89: s nit: missing space http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions.h File be/src/exprs/vector-functions.h: http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions.h@32 PS10, Line 32: // nit: should be /// http://gerrit.cloudera.org:8080/#/c/23730/10/be/src/exprs/vector-functions.h@53 PS10, Line 53: 0.0 if the element is NULL I don't think it's guarenteed that 0.0 is returned in such case. Anyway, it should never happen as GetFloatFromArray() is only called if none of the elements are null. http://gerrit.cloudera.org:8080/#/c/23730/10/testdata/workloads/functional-query/queries/QueryTest/vector-distance-functions.test File testdata/workloads/functional-query/queries/QueryTest/vector-distance-functions.test: http://gerrit.cloudera.org:8080/#/c/23730/10/testdata/workloads/functional-query/queries/QueryTest/vector-distance-functions.test@228 PS10, Line 228: ==== You should add negative tests for the case when euclidean_distance/cosine_similarity is invoked on different types such as scalar values, complex types other than ARRAY<FLOAT>. In such cases we just expect a useful error message. -- To view, visit http://gerrit.cloudera.org:8080/23730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id305acc87530d5d0e53613fe8df9a631ea4e1080 Gerrit-Change-Number: 23730 Gerrit-PatchSet: 10 Gerrit-Owner: Raghav Jindal <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Pranav Lodha <[email protected]> Gerrit-Reviewer: Raghav Jindal <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 08 Jan 2026 16:49:31 +0000 Gerrit-HasComments: Yes
