[
https://issues.apache.org/jira/browse/IMPALA-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041150#comment-18041150
]
Raghav Jindal commented on IMPALA-14566:
----------------------------------------
I tried to push my code to a private branch, but I do not have the required
permissions:
{code:java}
git push origin semanticsearchtest
remote: Permission to apache/impala.git denied to rajindal8.
fatal: unable to access 'https://github.com/apache/impala.git/': The requested
URL returned error: 403 {code}
Initial code for vector-functions.h:
{code:java}
#ifndef IMPALA_EXPRS_VECTOR_FUNCTIONS_H
#define IMPALA_EXPRS_VECTOR_FUNCTIONS_H

#include <cstdint>

#include "udf/udf.h"

namespace impala {

using impala_udf::FunctionContext;
using impala_udf::DoubleVal;
using impala_udf::CollectionVal;

class VectorFunctions {
 public:
  /// The distance functions below return DOUBLE rather than FLOAT for better
  /// precision: distance calculations usually involve square roots, which
  /// benefit from the 15-17 digits of precision in DOUBLE vs. 7 digits in FLOAT.

  /// Returns the Euclidean distance as a DOUBLE, or NULL if the inputs are
  /// invalid.
  /// ctx: function context for memory allocation and error reporting
  /// vec1: the first vector, as ARRAY<FLOAT>
  /// vec2: the second vector, as ARRAY<FLOAT>
  static DoubleVal EuclideanDistance(FunctionContext* ctx,
      const CollectionVal& vec1, const CollectionVal& vec2);

  static DoubleVal CosineSimilarity(FunctionContext* ctx,
      const CollectionVal& vec1, const CollectionVal& vec2);

  /// Prepare function to initialize the function state.
  static void VectorDistancePrepare(FunctionContext* ctx,
      FunctionContext::FunctionStateScope scope);

  /// Close function to clean up the function state.
  static void VectorDistanceClose(FunctionContext* ctx,
      FunctionContext::FunctionStateScope scope);

 private:
  /// Helper function to get a float value from an array element.
  /// For ARRAY<FLOAT>, elements are stored as tuples, and this function
  /// extracts the float value from the tuple at the given index.
  /// array_ptr: pointer to the start of the array tuple data
  /// index: index of the element to retrieve
  /// tuple_size: size of each tuple in bytes
  /// slot_offset: offset of the float slot within the tuple
  /// Returns the float value, or 0.0 if the element is NULL.
  static float GetFloatFromArray(const uint8_t* array_ptr, int index,
      int tuple_size, int slot_offset);

  /// Helper function to check if an array element is NULL.
  static bool IsArrayElementNull(const uint8_t* array_ptr, int index,
      int tuple_size, int null_indicator_offset);
};

} // namespace impala

#endif // IMPALA_EXPRS_VECTOR_FUNCTIONS_H
{code}
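To make the intent of the two private helpers concrete, here is a minimal standalone sketch of reading floats out of a tuple-strided buffer and computing the Euclidean distance in double precision. It does not use the Impala UDF types: `ArrayView`, `IsElementNull`, `GetFloat`, and `EuclideanDistanceSketch` are hypothetical names, the null indicator is assumed to be bit 0 of the byte at `null_indicator_offset`, and a NULL element is assumed to make the whole result NULL. The actual tuple layout in Impala may differ. It needs C++17 for `std::optional`.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical view over an array stored as fixed-size tuples laid out back to
// back. The field names mirror the parameters in the header above; the layout
// itself is an assumption made for illustration.
struct ArrayView {
  const uint8_t* data;        // start of the tuple data
  int num_tuples;             // number of elements
  int tuple_size;             // size of each tuple in bytes
  int slot_offset;            // offset of the float slot within a tuple
  int null_indicator_offset;  // offset of the null-indicator byte within a tuple
};

// Returns a pointer to the start of the tuple at `index`.
inline const uint8_t* TupleAt(const ArrayView& a, int index) {
  return a.data + static_cast<std::ptrdiff_t>(index) * a.tuple_size;
}

// Assumption: bit 0 of the null-indicator byte marks a NULL element.
inline bool IsElementNull(const ArrayView& a, int index) {
  return (TupleAt(a, index)[a.null_indicator_offset] & 1) != 0;
}

// Copies the float out with memcpy to avoid unaligned or aliasing reads.
inline float GetFloat(const ArrayView& a, int index) {
  float value;
  std::memcpy(&value, TupleAt(a, index) + a.slot_offset, sizeof(value));
  return value;
}

// Returns the Euclidean distance, or std::nullopt (standing in for SQL NULL)
// if the inputs are missing, empty, of different lengths, or contain a NULL.
inline std::optional<double> EuclideanDistanceSketch(
    const ArrayView& x, const ArrayView& y) {
  if (x.data == nullptr || y.data == nullptr) return std::nullopt;
  if (x.num_tuples == 0 || x.num_tuples != y.num_tuples) return std::nullopt;
  double sum = 0.0;  // accumulate in double, as the header comment motivates
  for (int i = 0; i < x.num_tuples; ++i) {
    if (IsElementNull(x, i) || IsElementNull(y, i)) return std::nullopt;
    const double d =
        static_cast<double>(GetFloat(x, i)) - static_cast<double>(GetFloat(y, i));
    sum += d * d;
  }
  return std::sqrt(sum);
}
```

Treating any NULL element as invalid input is one possible choice; the header's `GetFloatFromArray` instead documents returning 0.0 for a NULL element, which would silently change the distance.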
> Add support for cosine similarity function
> ------------------------------------------
>
> Key: IMPALA-14566
> URL: https://issues.apache.org/jira/browse/IMPALA-14566
> Project: IMPALA
> Issue Type: Task
> Reporter: Abhishek Rawat
> Assignee: Raghav Jindal
> Priority: Major
>
> The cosine similarity function measures the cosine of the angle between two
> vectors, regardless of their length (magnitude). Use cases include measuring
> text similarity; it is ideal when the direction (semantic meaning/concept) is
> more important than the magnitude.
> Impala doesn't support a native vector data type yet, so we could possibly
> use an ARRAY<FLOAT> data type for representing vectors.
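
As an illustration of the description above, cosine similarity is dot(a, b) / (|a| * |b|). The following is a minimal standalone sketch that uses `std::vector<float>` in place of ARRAY<FLOAT>. The function name `CosineSimilaritySketch` is hypothetical, and returning `std::nullopt` (standing in for SQL NULL) for empty, mismatched, or zero-magnitude inputs is an assumption, not behavior specified in this issue. It needs C++17 for `std::optional`.

```cpp
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the cosine of the angle between a and b, in [-1, 1] up to rounding,
// or std::nullopt if the vectors are empty, differ in length, or either one
// has zero magnitude (the angle is undefined in that case).
inline std::optional<double> CosineSimilaritySketch(
    const std::vector<float>& a, const std::vector<float>& b) {
  if (a.empty() || a.size() != b.size()) return std::nullopt;
  double dot = 0.0;
  double norm_a = 0.0;  // squared magnitude of a
  double norm_b = 0.0;  // squared magnitude of b
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double x = a[i];
    const double y = b[i];
    dot += x * y;
    norm_a += x * x;
    norm_b += y * y;
  }
  if (norm_a == 0.0 || norm_b == 0.0) return std::nullopt;
  return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

Because only direction matters, scaling either vector by a positive constant leaves the result unchanged: {1, 2, 3} and {2, 4, 6} have a similarity of approximately 1, while orthogonal vectors such as {1, 0} and {0, 1} have a similarity of 0.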