Yida Wu has posted comments on this change. ( http://gerrit.cloudera.org:8080/23730 )
Change subject: IMPALA-14566: Add euclidean_distance and cosine_similarity functions for ARRAY<FLOAT> ...................................................................... Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/23730/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23730/5//COMMIT_MSG@16 PS5, Line 16: I think we can add some tests. One place maybe be/src/exprs/expr-test.cc http://gerrit.cloudera.org:8080/#/c/23730/5/be/src/exprs/vector-functions-ir.cc File be/src/exprs/vector-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/23730/5/be/src/exprs/vector-functions-ir.cc@35 PS5, Line 35: Followed nit. followed http://gerrit.cloudera.org:8080/#/c/23730/5/be/src/exprs/vector-functions-ir.cc@116 PS5, Line 116: // Handle NULL inputs : if (vec1.is_null || vec2.is_null) { : return DoubleVal::null(); : } : : // Handle empty arrays : if (vec1.num_tuples == 0 || vec2.num_tuples == 0) { : ctx->SetError("Cosine similarity requires non-empty vectors"); : return DoubleVal::null(); : } : : // Validate that vectors have the same length : if (vec1.num_tuples != vec2.num_tuples) { : ctx->SetError( : "Vectors must have the same length for similarity calculation"); : return DoubleVal::null(); : } It seems this logic is quite similar to the checks in EuclideanDistance(). Maybe we can use a helper function for these checks to reduce duplication? http://gerrit.cloudera.org:8080/#/c/23730/5/be/src/exprs/vector-functions-ir.cc@169 PS5, Line 169: void VectorFunctions::VectorDistancePrepare(FunctionContext* ctx, : FunctionContext::FunctionStateScope scope) { : } : : void VectorFunctions::VectorDistanceClose(FunctionContext* ctx, : FunctionContext::FunctionStateScope scope) { : } Maybe we can remove them if these are not currently being used to avoid dead code http://gerrit.cloudera.org:8080/#/c/23730/5/be/src/exprs/vector-functions.h File be/src/exprs/vector-functions.h: http://gerrit.cloudera.org:8080/#/c/23730/5/be/src/exprs/vector-functions.h@18 PS5, Line 18: #ifndef IMPALA_EXPRS_VECTOR_FUNCTIONS_H : #define IMPALA_EXPRS_VECTOR_FUNCTIONS_H For new headers, I think we can use "#pragma once" which is cleaner -- To view, visit http://gerrit.cloudera.org:8080/23730 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id305acc87530d5d0e53613fe8df9a631ea4e1080 Gerrit-Change-Number: 23730 Gerrit-PatchSet: 5 Gerrit-Owner: Raghav Jindal <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Pranav Lodha <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Wed, 03 Dec 2025 10:14:37 +0000 Gerrit-HasComments: Yes
