Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23730 )

Change subject: IMPALA-14566: Add euclidean_distance and cosine_similarity 
functions for ARRAY<FLOAT>
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/23730/7/be/src/exprs/vector-functions-ir.cc
File be/src/exprs/vector-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/23730/7/be/src/exprs/vector-functions-ir.cc@65
PS7, Line 65:   memcpy(&value, tuple_ptr + slot_offset, sizeof(float));
I wonder if you could avoid this memcpy by returning a float&, as in

  return (float) (tuple_ptr + slot_offset);

and doing arithmetic with the references. I'm not 100% sure that will work. 
Also not sure if it'd be faster, but it'd be kind of fun to benchmark.


http://gerrit.cloudera.org:8080/#/c/23730/7/be/src/exprs/vector-functions-ir.cc@119
PS7, Line 119:     sum_squared_diff += static_cast<double>(diff * diff);
This should avoid overflow. Please add some boundary tests to verify that.


http://gerrit.cloudera.org:8080/#/c/23730/7/be/src/exprs/vector-functions.h
File be/src/exprs/vector-functions.h:

http://gerrit.cloudera.org:8080/#/c/23730/7/be/src/exprs/vector-functions.h@46
PS7, Line 46:   /// Declaring the Helper functions under private to get a float 
value
nit: this sentence is a bit verbose, declaring helper functions under private 
is obvious.

If you're using significant contributions from generative AI tools, please 
review https://www.apache.org/legal/generative-tooling.html and add

  Generated-By: <tool> (<model>)

to the commit message. An example would be 
https://gerrit.cloudera.org/c/22462/20//COMMIT_MSG#23.


http://gerrit.cloudera.org:8080/#/c/23730/7/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/23730/7/common/function-registry/impala_functions.py@668
PS7, Line 668:   # Vector distance functions for semantic search
The description of this format would make more sense at the top of this list, 
it's not specific to the new functions.



--
To view, visit http://gerrit.cloudera.org:8080/23730
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id305acc87530d5d0e53613fe8df9a631ea4e1080
Gerrit-Change-Number: 23730
Gerrit-PatchSet: 7
Gerrit-Owner: Raghav Jindal <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Pranav Lodha <[email protected]>
Gerrit-Reviewer: Raghav Jindal <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Tue, 09 Dec 2025 23:44:56 +0000
Gerrit-HasComments: Yes

Reply via email to