Attached is a proof-of-concept/work-in-progress patch set that adds functions for "vectors" repreѕented with one-dimensional float8 arrays. These functions may be used in a variety of applications, but I am proposing them with the AI/ML use-cases in mind. I am posting this early in the v17 cycle in hopes of gathering feedback prior to PGCon.
With the accessibility of AI/ML tools such as large language models (LLMs), there has been a demand for storing and manipulating high-dimensional vectors in PostgreSQL, particularly around nearest-neighbor queries. Many of these vectors have more than 1500 dimensions. The cube extension [0] provides some of the distance functionality (e.g., taxicab, Euclidean, and Chebyshev), but it is missing some popular functions (e.g., cosine similarity, dot product), and it is limited to 100 dimensions. We could extend cube to support more dimensions, but this would require reworking its indexing code and filling in gaps between the cube data type and the array types. For some previous discussion about using the cube extension for this kind of data, see [1]. float8[] is well-supported and allows for effectively unlimited dimensions of data. float8 matches the common output format of many AI embeddings, and it allows us or extensions to implement indexing methods around these functions. This patch set does not yet contain indexing support, but we are exploring using GiST or GIN for the use-cases in question. It might also be desirable to add support for other linear algebra operations (e.g., operations on matrices). The attached patches likely only scratch the surface of the "vector search" use-case. The patch set is broken up as follows: * 0001 does some minor refactoring of dsqrt() in preparation for 0002. * 0002 adds several vector-related functions, including distance functions and a kmeans++ implementation. * 0003 adds support for optionally using the OpenBLAS library, which is an implementation of the Basic Linear Algebra Subprograms [2] specification. Basic testing with this library showed a small performance boost, although perhaps not enough to justify giving this patch serious consideration. Of course, there are many open questions. For example, should PostgreSQL support this stuff out-of-the-box in the first place? And should we introduce a vector data type or SQL domains for treating float8[] as vectors? IMHO these vector search use-cases are an exciting opportunity for the PostgreSQL project, so I am eager to hear what folks think. [0] https://www.postgresql.org/docs/current/cube.html [1] https://postgr.es/m/2271927.1593097400%40sss.pgh.pa.us [2] https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
>From b54b938bf35f26bc8e07cd57d4bf616b7af36709 Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Sun, 19 Mar 2023 14:37:54 -0700 Subject: [PATCH v1 1/3] Refactor dsqrt(). This moves most of the logic for dsqrt() to a helper function in float.h so that it can be reused elsewhere. --- src/backend/utils/adt/float.c | 16 +--------------- src/include/utils/float.h | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+), 15 deletions(-) diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 9b51da2382..4f655c281a 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -1445,21 +1445,7 @@ dtrunc(PG_FUNCTION_ARGS) Datum dsqrt(PG_FUNCTION_ARGS) { - float8 arg1 = PG_GETARG_FLOAT8(0); - float8 result; - - if (arg1 < 0) - ereport(ERROR, - (errcode(ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION), - errmsg("cannot take square root of a negative number"))); - - result = sqrt(arg1); - if (unlikely(isinf(result)) && !isinf(arg1)) - float_overflow_error(); - if (unlikely(result == 0.0) && arg1 != 0.0) - float_underflow_error(); - - PG_RETURN_FLOAT8(result); + PG_RETURN_FLOAT8(float8_sqrt(PG_GETARG_FLOAT8(0))); } diff --git a/src/include/utils/float.h b/src/include/utils/float.h index 7529899d63..b4e50cccaf 100644 --- a/src/include/utils/float.h +++ b/src/include/utils/float.h @@ -250,6 +250,25 @@ float8_div(const float8 val1, const float8 val2) return result; } +static inline float8 +float8_sqrt(const float8 val) +{ + float8 result; + + if (unlikely(val < 0)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION), + errmsg("cannot take square root of a negative number"))); + + result = sqrt(val); + if (unlikely(isinf(result)) && !isinf(val)) + float_overflow_error(); + if (unlikely(result == 0.0) && val != 0.0) + float_underflow_error(); + + return result; +} + /* * Routines for NaN-aware comparisons * -- 2.25.1
>From 351e5e05b471255d0f831d330f48ce95f16095eb Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Sun, 19 Mar 2023 23:21:58 -0700 Subject: [PATCH v1 2/3] Introduce basic vector support. This change introduces a variety of SQL-callable functions for "vectors", or 1-D float8 arrays with no NULL values and at least one element. --- doc/src/sgml/func.sgml | 241 +++++++++++ src/backend/utils/adt/Makefile | 1 + src/backend/utils/adt/meson.build | 1 + src/backend/utils/adt/vector.c | 589 ++++++++++++++++++++++++++ src/include/catalog/pg_proc.dat | 49 +++ src/test/regress/expected/vectors.out | 154 +++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/vectors.sql | 61 +++ 8 files changed, 1097 insertions(+), 1 deletion(-) create mode 100644 src/backend/utils/adt/vector.c create mode 100644 src/test/regress/expected/vectors.out create mode 100644 src/test/regress/sql/vectors.sql diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5a47ce4343..1bd71ccefe 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18950,6 +18950,247 @@ SELECT NULLIF(value, '(none)') ... </programlisting> </para></entry> </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>array_is_vector</primary> + </indexterm> + <function>array_is_vector</function> ( <type>float8[]</type> ) + <returnvalue>boolean</returnvalue> + </para> + <para> + Returns whether the array is a vector (i.e., a one-dimensional array of + <type>float8</type> with no NULL values and at least one element). The + values in such arrays are treated as Cartesian coordinates in + <literal>n</literal>-dimensional Euclidean space, where + <literal>n</literal> is the cardinality of the array. + </para> + <para> + <literal>array_is_vector{'1,2,3}')</literal> + <returnvalue>t</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>euclidean_norm</primary> + </indexterm> + <function>euclidean_norm</function> ( <type>float8[]</type> ) + <returnvalue>float8</returnvalue> + </para> + <para> + Returns the Euclidean norm of the array. The array must be a vector + (see <function>array_is_vector</function> for details about what + constitutes a vector). + </para> + <para> + <literal>euclidean_norm('{1,2,3}')</literal> + <returnvalue>3.7416573867739413</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>normalize_vector</primary> + </indexterm> + <function>normalize_vector</function> ( <type>float8[]</type> ) + <returnvalue>float8[]</returnvalue> + </para> + <para> + Returns a normalized version of the array. The array must be a vector + (see <function>array_is_vector</function> for details about what + constitutes a vector), and its Euclidean norm must be nonzero. + </para> + <para> + <literal>normalize_vector('{1,2,3}')</literal> + <returnvalue>{0.2672612419124244,0.5345224838248488,0.8017837257372732}</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>dot_product</primary> + </indexterm> + <function>dot_product</function> ( <type>float8[]</type>, <type>float8[]</type> ) + <returnvalue>float8</returnvalue> + </para> + <para> + Returns the dot product of the arrays. The arrays must be vectors (see + <function>array_is_vector</function> for details about what constitutes + a vector). + </para> + <para> + <literal>dot_product('{1,2,3}', '{4,5,6}')</literal> + <returnvalue>32</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>squared_euclidean_distance</primary> + </indexterm> + <function>squared_euclidean_distance</function> ( <type>float8[]</type>, <type>float8[]</type> ) + <returnvalue>float8</returnvalue> + </para> + <para> + Returns the squared Euclidean distance between the arrays. The arrays + must be vectors (see <function>array_is_vector</function> for details + about what constitutes a vector). + </para> + <para> + <literal>squared_euclidean_distance('{1,2,3}', '{4,5,6}')</literal> + <returnvalue>27</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>euclidean_distance</primary> + </indexterm> + <function>euclidean_distance</function> ( <type>float8[]</type>, <type>float8[]</type> ) + <returnvalue>float8</returnvalue> + </para> + <para> + Returns the Euclidean distance between the arrays. The arrays must be + vectors (see <function>array_is_vector</function> for details about + what constitutes a vector). + </para> + <para> + <literal>euclidean_distance('{1,2,3}', '{4,5,6}')</literal> + <returnvalue>5.196152422706632</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>cosine_distance</primary> + </indexterm> + <function>cosine_distance</function> ( <type>float8[]</type>, <type>float8[]</type> ) + <returnvalue>float8</returnvalue> + </para> + <para> + Returns the cosine distance between the arrays. The arrays must be + vectors (see <function>array_is_vector</function> for details about + what constitutes a vector). + </para> + <para> + <literal>cosine_distance('{1,2,3}', '{4,5,6}')</literal> + <returnvalue>0.025368153802923787</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>taxicab_distance</primary> + </indexterm> + <function>taxicab_distance</function> ( <type>float8[]</type>, <type>float8[]</type> ) + <returnvalue>float8</returnvalue> + </para> + <para> + Returns the taxicab distance between the arrays. The arrays must be + vectors (see <function>array_is_vector</function> for details about + what constitutes a vector). + </para> + <para> + <literal>taxicab_distance('{1,2,3}', '{4,5,6}')</literal> + <returnvalue>9</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>chebyshev_distance</primary> + </indexterm> + <function>chebyshev_distance</function> ( <type>float8[]</type>, <type>float8[]</type> ) + <returnvalue>float8</returnvalue> + </para> + <para> + Returns the Chebyshev distance between the arrays. The arrays must be + vectors (see <function>array_is_vector</function> for details about + what constitutes a vector). + </para> + <para> + <literal>chebyshev_distance('{1,2,3}', '{4,5,6}')</literal> + <returnvalue>3</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>standard_unit_vector</primary> + </indexterm> + <function>standard_unit_vector</function> ( <parameter>d</parameter> <type>integer</type>, <parameter>n</parameter> <type>integer</type> ) + <returnvalue>float8[]</returnvalue> + </para> + <para> + Returns an array of <type>float8</type> with <parameter>d</parameter> + elements. All elements are set to <literal>0</literal> except for the + <parameter>n</parameter>th element, which is set to + <literal>1</literal>. + </para> + <para> + <literal>standard_unit_vector(3, 1)</literal> + <returnvalue>{1,0,0}</returnvalue> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>kmeans</primary> + </indexterm> + <function>kmeans</function> ( <parameter>n</parameter> <type>integer</type>, <parameter>v</parameter> <type>float8[]</type>, <parameter>i</parameter> <type>integer</type> ) + <returnvalue>setof float8[]</returnvalue> + </para> + <para> + Returns <parameter>n</parameter> centers calculated via + <parameter>i</parameter> iterations of the <literal>kmeans++</literal> + algorithm on the array of vectors <parameter>v</parameter> (see + <function>array_is_vector</function> for details about what constitutes + a vector). + </para> + <para> + <literal>kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1)</literal> + <returnvalue></returnvalue> +<programlisting> + kmeans +---------------- + {2.5,3.5,4.5} + {8.5,9.5,10.5} +</programlisting> + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>closest_vector</primary> + </indexterm> + <function>closest_vector</function> ( <parameter>c</parameter> <type>float8[]</type>, <parameter>v</parameter> <type>float8[]</type> ) + <returnvalue>integer</returnvalue> + </para> + <para> + Returns the index of the vector in <parameter>c</parameter> that is + closest to <parameter>v</parameter>. <parameter>c</parameter> must be + an array of vectors, and <parameter>v</parameter> must be a vector (see + <function>array_is_vector</function> for details about what constitutes + a vector). + </para> + <para> + <literal>closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{10,11,12}')</literal> + <returnvalue>2</returnvalue> + </para></entry> + </row> </tbody> </tgroup> </table> diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 0de0bbb1b8..ffda4038f1 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -114,6 +114,7 @@ OBJS = \ varbit.o \ varchar.o \ varlena.o \ + vector.o \ version.o \ windowfuncs.o \ xid.o \ diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build index 8515cd9365..dfa4eeb52f 100644 --- a/src/backend/utils/adt/meson.build +++ b/src/backend/utils/adt/meson.build @@ -101,6 +101,7 @@ backend_sources += files( 'varbit.c', 'varchar.c', 'varlena.c', + 'vector.c', 'version.c', 'windowfuncs.c', 'xid.c', diff --git a/src/backend/utils/adt/vector.c b/src/backend/utils/adt/vector.c new file mode 100644 index 0000000000..dba1fd7eef --- /dev/null +++ b/src/backend/utils/adt/vector.c @@ -0,0 +1,589 @@ +/*------------------------------------------------------------------------- + * + * vector.c + * Support functions for float8 arrays treated as vectors. + * + * For the purposes of this file, a "vector" is a one-dimensional float8 + * array with no NULL values and at least one element. The values in such + * arrays are treated as Cartesian coordinates in n-dimensional Euclidean + * space, where "n" is the cardinality of the array. For more information, + * see https://en.wikipedia.org/wiki/Cartesian_coordinate_system. + * + * Copyright (c) 2023, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/utils/adt/vector.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include <float.h> + +#include "catalog/pg_type.h" +#include "common/int.h" +#include "common/pg_prng.h" +#include "funcapi.h" +#include "utils/array.h" +#include "utils/float.h" +#include "utils/fmgrprotos.h" + +static inline int float8_array_is_vector_internal(ArrayType *v, bool error); +static inline float8 euclidean_norm_internal(int n, const float8 *a); +static inline float8 squared_euclidean_distance_internal(int n, const float8 *a, const float8 *b); +static inline float8 euclidean_distance_internal(int n, const float8 *a, const float8 *b); + +/* + * Checks that an array is a vector (as described at the top of this file) and + * returns its cardinality. If "error" is true, this function ERRORs if the + * array is not a vector. If "error" is false, this function returns -1 if the + * array is not a vector. + */ +static inline int +float8_array_is_vector_internal(ArrayType *v, bool error) +{ + int n = ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)); + + if (unlikely(n == 0)) + { + if (error) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have at least one element"))); + return -1; + } + + if (unlikely(ARR_NDIM(v) != 1)) + { + if (error) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must be one-dimensional arrays"))); + return -1; + } + + if (unlikely(ARR_HASNULL(v) && array_contains_nulls(v))) + { + if (error) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("vectors must not contain nulls"))); + return -1; + } + + return n; +} + +/* + * SQL-callable version of float8_array_is_vector_internal(). Unlike the + * internal version, this does not return the array's cardinality. + */ +Datum +float8_array_is_vector(PG_FUNCTION_ARGS) +{ + ArrayType *v = PG_GETARG_ARRAYTYPE_P(0); + bool r = (float8_array_is_vector_internal(v, false) != -1); + + PG_RETURN_BOOL(r); +} + +/* + * Workhorse for euclidean_norm() and normalize_vector(). + */ +static inline float8 +euclidean_norm_internal(int n, const float8 *a) +{ + float8 en = 0.0; + + for (int i = 0; i < n; i++) + en = float8_pl(en, float8_mul(a[i], a[i])); + + en = float8_sqrt(en); + + return en; +} + +/* + * Returns the Euclidean norm of a vector. For more information, see + * https://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm + */ +Datum +euclidean_norm(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + int an = float8_array_is_vector_internal(a, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + + PG_RETURN_FLOAT8(euclidean_norm_internal(an, ad)); +} + +/* + * Returns a normalized version of the given vector. For more information, see + * https://en.wikipedia.org/wiki/Unit_vector. + */ +Datum +normalize_vector(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *ua; + int an = float8_array_is_vector_internal(a, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + float8 en = euclidean_norm_internal(an, ad); + Datum *ud = palloc(an * sizeof(Datum)); + + for (int i = 0; i < an; i++) + ud[i] = Float8GetDatumFast(float8_div(ad[i], en)); + + ua = construct_array(ud, an, FLOAT8OID, sizeof(float8), + FLOAT8PASSBYVAL, TYPALIGN_DOUBLE); + + PG_RETURN_ARRAYTYPE_P(ua); +} + +/* + * Returns the dot product of two vectors. For more information, see + * https://en.wikipedia.org/wiki/Dot_product. + */ +Datum +dot_product(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *b = PG_GETARG_ARRAYTYPE_P(1); + int an = float8_array_is_vector_internal(a, true); + int bn = float8_array_is_vector_internal(b, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + float8 *bd = (float8 *) ARR_DATA_PTR(b); + float8 dp = 0.0; + + if (an != bn) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have the same number of elements"))); + + for (int i = 0; i < an; i++) + dp = float8_pl(dp, float8_mul(ad[i], bd[i])); + + PG_RETURN_FLOAT8(dp); +} + +/* + * Workhorse for squared_euclidean_distance() and + * euclidean_distance_internal(). + */ +static inline float8 +squared_euclidean_distance_internal(int n, const float8 *a, const float8 *b) +{ + float8 sed = 0.0; + + for (int i = 0; i < n; i++) + { + float8 d = float8_mi(a[i], b[i]); + + sed = float8_pl(sed, float8_mul(d, d)); + } + + return sed; +} + +/* + * Returns the squared Euclidean distance between two vectors. For more + * information, see + * https://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance. + */ +Datum +squared_euclidean_distance(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *b = PG_GETARG_ARRAYTYPE_P(1); + int an = float8_array_is_vector_internal(a, true); + int bn = float8_array_is_vector_internal(b, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + float8 *bd = (float8 *) ARR_DATA_PTR(b); + float8 sed; + + if (an != bn) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have the same number of elements"))); + + sed = squared_euclidean_distance_internal(an, ad, bd); + + PG_RETURN_FLOAT8(sed); +} + +/* + * Workhorse for euclidean_distance(). + */ +static inline float8 +euclidean_distance_internal(int n, const float8 *a, const float8 *b) +{ + float8 sed = squared_euclidean_distance_internal(n, a, b); + + return float8_sqrt(sed); +} + +/* + * Returns the Euclidean distance between vectors. For more information, see + * https://en.wikipedia.org/wiki/Euclidean_distance. + */ +Datum +euclidean_distance(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *b = PG_GETARG_ARRAYTYPE_P(1); + int an = float8_array_is_vector_internal(a, true); + int bn = float8_array_is_vector_internal(b, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + float8 *bd = (float8 *) ARR_DATA_PTR(b); + float8 ed; + + if (an != bn) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have the same number of elements"))); + + ed = euclidean_distance_internal(an, ad, bd); + + PG_RETURN_FLOAT8(ed); +} + +/* + * Returns the cosine distance between two vectors. For more information, see + * https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_Distance. + */ +Datum +cosine_distance(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *b = PG_GETARG_ARRAYTYPE_P(1); + int an = float8_array_is_vector_internal(a, true); + int bn = float8_array_is_vector_internal(b, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + float8 *bd = (float8 *) ARR_DATA_PTR(b); + float8 dp = 0.0; + float8 aen = 0.0; + float8 ben = 0.0; + float8 cd; + + if (an != bn) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have the same number of elements"))); + + for (int i = 0; i < an; i++) + { + dp = float8_pl(dp, float8_mul(ad[i], bd[i])); + aen = float8_pl(aen, float8_mul(ad[i], ad[i])); + ben = float8_pl(ben, float8_mul(bd[i], bd[i])); + } + + cd = float8_mi(1.0, float8_div(dp, float8_sqrt(float8_mul(aen, ben)))); + + PG_RETURN_FLOAT8(cd); +} + +/* + * Returns the taxicab distance between two vectors. For more information, see + * https://en.wikipedia.org/wiki/Taxicab_geometry. + */ +Datum +taxicab_distance(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *b = PG_GETARG_ARRAYTYPE_P(1); + int an = float8_array_is_vector_internal(a, true); + int bn = float8_array_is_vector_internal(b, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + float8 *bd = (float8 *) ARR_DATA_PTR(b); + float8 td = 0.0; + + if (an != bn) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have the same number of elements"))); + + for (int i = 0; i < an; i++) + td = float8_pl(td, fabs(float8_mi(ad[i], bd[i]))); + + PG_RETURN_FLOAT8(td); +} + +/* + * Returns the Chebyshev distance between two vectors. For more information, + * see https://en.wikipedia.org/wiki/Chebyshev_distance. + */ +Datum +chebyshev_distance(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *b = PG_GETARG_ARRAYTYPE_P(1); + int an = float8_array_is_vector_internal(a, true); + int bn = float8_array_is_vector_internal(b, true); + float8 *ad = (float8 *) ARR_DATA_PTR(a); + float8 *bd = (float8 *) ARR_DATA_PTR(b); + float8 cd = 0.0; + + if (an != bn) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have the same number of elements"))); + + for (int i = 0; i < an; i++) + cd = float8_max(cd, fabs(float8_mi(ad[i], bd[i]))); + + PG_RETURN_FLOAT8(cd); +} + +/* + * Returns a standard unit vector with the nth element set to 1. All other + * elements are set to 0. For more information, see + * https://en.wikipedia.org/wiki/Standard_basis. + */ +Datum +standard_unit_vector(PG_FUNCTION_ARGS) +{ + int d = PG_GETARG_INT32(0); + int n = PG_GETARG_INT32(1); + Datum *ud = palloc(d * sizeof(Datum)); + ArrayType *ua; + + if (d <= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("standard unit vectors must have at least one element"))); + + if (n == 0 || n > d) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set nonexistent element to one"))); + + for (int i = 0; i < d; i++) + ud[i] = Float8GetDatumFast(0.0); + + ud[n - 1] = Float8GetDatumFast(1.0); + + ua = construct_array(ud, d, FLOAT8OID, sizeof(float8), + FLOAT8PASSBYVAL, TYPALIGN_DOUBLE); + + PG_RETURN_ARRAYTYPE_P(ua); +} + +/* + * Returns centers generated via the kmeans++ algorithm. For more information, + * see https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf. + * + * XXX: Accelerate using triange inequality (see + * https://cseweb.ucsd.edu/~elkan/kmeansicml03.pdf). + */ +Datum +kmeans(PG_FUNCTION_ARGS) +{ + int ncenters = PG_GETARG_INT32(0); + ArrayType *a = PG_GETARG_ARRAYTYPE_P(1); + int iterations = PG_GETARG_INT32(2); + int nsamples = ARR_NDIM(a) > 1 ? ARR_DIMS(a)[0] : ARR_NDIM(a); + int dimensions; + int *clusters = palloc(nsamples * sizeof(int)); + int *pops = palloc(ncenters * sizeof(int)); + float8 **centers = palloc(ncenters * sizeof(float8 *)); + float8 **vectors = palloc(nsamples * sizeof(float8 *)); + float8 *weights = palloc(nsamples * sizeof(float8)); + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + Datum *cdata; + pg_prng_state rstate; + + InitMaterializedSRF(fcinfo, MAT_SRF_USE_EXPECTED_DESC); + + if (ncenters < 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("must request at least one center"))); + + if (ncenters > nsamples) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("number of sample vectors must be greater than or equal to the number of requested centers"))); + + if (ARR_HASNULL(a) && array_contains_nulls(a)) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("vectors must not contain nulls"))); + + if (iterations < 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("must request at least one iteration"))); + + dimensions = ARR_NDIM(a) > 1 ? ARR_DIMS(a)[1] : ARR_DIMS(a)[0]; + for (int i = 0; i < nsamples; i++) + { + int r = -1; + + if (pg_mul_s32_overflow(i, dimensions, &r)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("sample vectors too large"))); + + vectors[i] = ((float8 *) ARR_DATA_PTR(a)) + r; + } + + /* choose initial center randomly */ + pg_prng_seed(&rstate, 0x5eed); + centers[0] = vectors[(int) float8_mul(pg_prng_double(&rstate), nsamples)]; + + for (int i = 0; i < nsamples; i++) + weights[i] = DBL_MAX; + + /* calculate initial values for the other centers */ + for (int i = 1; i < ncenters; i++) + { + float8 sum = 0.0; + float8 wi; + + for (int j = 0; j < nsamples; j++) + { + float8 distance = euclidean_distance_internal(dimensions, + centers[i - 1], + vectors[j]); + + weights[j] = float8_min(distance, weights[j]); + sum = float8_pl(sum, weights[j]); + } + + wi = float8_mul(pg_prng_double(&rstate), sum); + for (int j = 0; j < nsamples; j++) + { + wi = float8_mi(wi, weights[j]); + if (float8_le(wi, 0)) + { + centers[i] = vectors[j]; + break; + } + } + } + + /* XXX: exit early if converged */ + for (int i = 0; i < iterations; i++) + { + /* assign each sample to a center */ + for (int j = 0; j < nsamples; j++) + { + float8 min_distance = DBL_MAX; + + for (int k = 0; k < ncenters; k++) + { + float8 distance = euclidean_distance_internal(dimensions, + centers[k], + vectors[j]); + + if (float8_lt(distance, min_distance)) + { + min_distance = distance; + clusters[j] = k; + } + } + } + + /* clear centers */ + for (int j = 0; j < ncenters; j++) + { + if (i == 0) + centers[j] = palloc0(dimensions * sizeof(float8)); + else + memset(centers[j], 0, dimensions * sizeof(float8)); + + pops[j] = 0; + } + + /* set each center to average of all vectors in cluster */ + for (int j = 0; j < nsamples; j++) + { + int cluster = clusters[j]; + float8 *center = centers[cluster]; + + for (int k = 0; k < dimensions; k++) + center[k] = float8_pl(center[k], vectors[j][k]); + + pops[cluster]++; + } + + for (int j = 0; j < ncenters; j++) + { + for (int k = 0; k < dimensions; k++) + centers[j][k] = float8_div(centers[j][k], pops[j]); + } + } + + cdata = palloc(dimensions * sizeof(Datum)); + for (int i = 0; i < ncenters; i++) + { + Datum values[1]; + bool nulls[1]; + + for (int j = 0; j < dimensions; j++) + cdata[j] = Float8GetDatumFast(centers[i][j]); + + values[0] = PointerGetDatum(construct_array(cdata, dimensions, + FLOAT8OID, sizeof(float8), + FLOAT8PASSBYVAL, + TYPALIGN_DOUBLE)); + nulls[0] = false; + + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + } + + return (Datum) 0; +} + +/* + * Returns the index of the closest vector. + */ +Datum +closest_vector(PG_FUNCTION_ARGS) +{ + ArrayType *a = PG_GETARG_ARRAYTYPE_P(0); + ArrayType *v = PG_GETARG_ARRAYTYPE_P(1); + int alen = ARR_NDIM(a) > 1 ? ARR_DIMS(a)[0] : ARR_NDIM(a); + int dimensions = float8_array_is_vector_internal(v, true); + float8 min_dist = DBL_MAX; + float8 *vd = (float8 *) ARR_DATA_PTR(v); + int idx = -1; + + if (alen == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("must provide at least one candidate vector"))); + + if (ARR_HASNULL(a) && array_contains_nulls(a)) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("vectors must not contain nulls"))); + + if ((ARR_NDIM(a) > 1 ? ARR_DIMS(a)[1] : ARR_DIMS(a)[0]) != dimensions) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("vectors must have the same number of elements"))); + + for (int i = 0; i < alen; i++) + { + float8 *c; + float8 d; + int r = -1; + + if (pg_mul_s32_overflow(i, dimensions, &r)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("candidate vectors too large"))); + + c = ((float8 *) ARR_DATA_PTR(a)) + r; + d = euclidean_distance_internal(dimensions, c, vd); + + if (r == -1 || float8_lt(d, min_dist)) + { + min_dist = d; + idx = i; + } + } + + PG_RETURN_INT32(idx); +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index b2bc81b15f..a07c5b10b0 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -1735,6 +1735,55 @@ proargtypes => 'internal oid internal int2 internal', prosrc => 'arraycontjoinsel' }, +{ oid => '4642', descr => 'verify float8 array is a vector', + proname => 'float8_array_is_vector', prorettype => 'bool', + proargtypes => '_float8', + prosrc => 'float8_array_is_vector' }, +{ oid => '4643', descr => 'euclidean norm', + proname => 'euclidean_norm', prorettype => 'float8', + proargtypes => '_float8', + prosrc => 'euclidean_norm' }, +{ oid => '4644', descr => 'dot product', + proname => 'dot_product', prorettype => 'float8', + proargtypes => '_float8 _float8', + prosrc => 'dot_product' }, +{ oid => '4645', descr => 'normalize a vector', + proname => 'normalize_vector', prorettype => '_float8', + proargtypes => '_float8', + prosrc => 'normalize_vector' }, +{ oid => '4646', descr => 'squared euclidean distance', + proname => 'squared_euclidean_distance', prorettype => 'float8', + proargtypes => '_float8 _float8', + prosrc => 'squared_euclidean_distance' }, +{ oid => '4647', descr => 'euclidean distance', + proname => 'euclidean_distance', prorettype => 'float8', + proargtypes => '_float8 _float8', + prosrc => 'euclidean_distance' }, +{ oid => '4648', descr => 'cosine distance', + proname => 'cosine_distance', prorettype => 'float8', + proargtypes => '_float8 _float8', + prosrc => 'cosine_distance' }, +{ oid => '4649', descr => 'taxicab distance', + proname => 'taxicab_distance', prorettype => 'float8', + proargtypes => '_float8 _float8', + prosrc => 'taxicab_distance' }, +{ oid => '4650', descr => 'chebyshev distance', + proname => 'chebyshev_distance', prorettype => 'float8', + proargtypes => '_float8 _float8', + prosrc => 'chebyshev_distance' }, +{ oid => '4651', descr => 'standard unit vector', + proname => 'standard_unit_vector', prorettype => '_float8', + proargtypes => 'int4 int4', + prosrc => 'standard_unit_vector' }, +{ oid => '4652', descr => 'kmeans', + proname => 'kmeans', prorettype => '_float8', proretset => 't', + proargtypes => 'int4 _float8 int4', prorows => '1000', + prosrc => 'kmeans' }, +{ oid => '4653', descr => 'closest vector', + proname => 'closest_vector', prorettype => 'int4', + proargtypes => '_float8 _float8', + prosrc => 'closest_vector' }, + { oid => '764', descr => 'large object import', proname => 'lo_import', provolatile => 'v', proparallel => 'u', prorettype => 'oid', proargtypes => 'text', prosrc => 'be_lo_import' }, diff --git a/src/test/regress/expected/vectors.out b/src/test/regress/expected/vectors.out new file mode 100644 index 0000000000..846f02433c --- /dev/null +++ b/src/test/regress/expected/vectors.out @@ -0,0 +1,154 @@ +SET extra_float_digits = -1; +-- float8_array_is_vector +SELECT float8_array_is_vector(ARRAY[]::float8[]); + float8_array_is_vector +------------------------ + f +(1 row) + +SELECT float8_array_is_vector('{{1.0},{1.0}}'); + float8_array_is_vector +------------------------ + f +(1 row) + +SELECT float8_array_is_vector(ARRAY[NULL]::float8[]); + float8_array_is_vector +------------------------ + f +(1 row) + +SELECT float8_array_is_vector('{1,2,3}'); + float8_array_is_vector +------------------------ + t +(1 row) + +-- euclidean_norm +SELECT euclidean_norm('{1,2,3}'); + euclidean_norm +----------------- + 3.7416573867739 +(1 row) + +-- normalize_vector +SELECT normalize_vector('{0,0,0}'); +ERROR: division by zero +SELECT normalize_vector('{1,2,3}'); + normalize_vector +------------------------------------------------------ + {0.26726124191242,0.53452248382485,0.80178372573727} +(1 row) + +-- dot_product +SELECT dot_product('{1,2,3}', '{4,5,6}'); + dot_product +------------- + 32 +(1 row) + +-- squared_euclidean_distance +SELECT squared_euclidean_distance('{1,2}', '{4,5,6}'); +ERROR: vectors must have the same number of elements +SELECT squared_euclidean_distance('{1,2,3}', '{4,5,6}'); + squared_euclidean_distance +---------------------------- + 27 +(1 row) + +-- euclidean_distance +SELECT euclidean_distance('{1,2}', '{4,5,6}'); +ERROR: vectors must have the same number of elements +SELECT euclidean_distance('{1,2,3}', '{4,5,6}'); + euclidean_distance +-------------------- + 5.1961524227066 +(1 row) + +-- cosine_distance +SELECT cosine_distance('{1,2}', '{4,5,6}'); +ERROR: vectors must have the same number of elements +SELECT cosine_distance('{1,2,3}', '{4,5,6}'); + cosine_distance +------------------- + 0.025368153802924 +(1 row) + +-- taxicab_distance +SELECT taxicab_distance('{1,2}', '{4,5,6}'); +ERROR: vectors must have the same number of elements +SELECT taxicab_distance('{1,2,3}', '{4,5,6}'); + taxicab_distance +------------------ + 9 +(1 row) + +-- chebyshev_distance +SELECT chebyshev_distance('{1,2}', '{4,5,6}'); +ERROR: vectors must have the same number of elements +SELECT chebyshev_distance('{1,2,3}', '{4,5,6}'); + chebyshev_distance +-------------------- + 3 +(1 row) + +-- standard_unit_vector +SELECT standard_unit_vector(0, 0); +ERROR: standard unit vectors must have at least one element +SELECT standard_unit_vector(1, 0); +ERROR: cannot set nonexistent element to one +SELECT standard_unit_vector(1, 2); +ERROR: cannot set nonexistent element to one +SELECT standard_unit_vector(3, 1); + standard_unit_vector +---------------------- + {1,0,0} +(1 row) + +SELECT standard_unit_vector(3, 3); + standard_unit_vector +---------------------- + {0,0,1} +(1 row) + +-- kmeans +SELECT * FROM kmeans(0, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1); +ERROR: must request at least one center +SELECT * FROM kmeans(5, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1); +ERROR: number of sample vectors must be greater than or equal to the number of requested centers +SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,NULL}}', 1); +ERROR: vectors must not contain nulls +SELECT * FROM kmeans(2, ARRAY[[],[]]::float8[][], 1); +ERROR: number of sample vectors must be greater than or equal to the number of requested centers +SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', -1); +ERROR: must request at least one iteration +SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1) ORDER BY 1; + kmeans +---------------- + {2.5,3.5,4.5} + {8.5,9.5,10.5} +(2 rows) + +-- closest_vector +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', ARRAY[]::float8[]); +ERROR: vectors must have at least one element +SELECT closest_vector(ARRAY[]::float8[], '{1,2,3}'); +ERROR: must provide at least one candidate vector +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,NULL}}', '{1,2,3}'); +ERROR: vectors must not contain nulls +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{1,2}'); +ERROR: vectors must have the same number of elements +SELECT closest_vector('{1,2,3}', '{1,2}'); +ERROR: vectors must have the same number of elements +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{10,11,12}'); + closest_vector +---------------- + 2 +(1 row) + +SELECT closest_vector('{1,2,3}', '{2,3,4}'); + closest_vector +---------------- + 0 +(1 row) + diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 3624035639..db453b4cdd 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -78,7 +78,7 @@ test: brin_bloom brin_multi # psql depends on create_am # amutils depends on geometry, create_index_spgist, hash_index, brin # ---------- -test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort create_role +test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort create_role vectors # collate.*.utf8 tests cannot be run in parallel with each other test: rules psql psql_crosstab amutils stats_ext collate.linux.utf8 collate.windows.win1252 diff --git a/src/test/regress/sql/vectors.sql b/src/test/regress/sql/vectors.sql new file mode 100644 index 0000000000..c2f45fff61 --- /dev/null +++ b/src/test/regress/sql/vectors.sql @@ -0,0 +1,61 @@ +SET extra_float_digits = -1; + +-- float8_array_is_vector +SELECT float8_array_is_vector(ARRAY[]::float8[]); +SELECT float8_array_is_vector('{{1.0},{1.0}}'); +SELECT float8_array_is_vector(ARRAY[NULL]::float8[]); +SELECT float8_array_is_vector('{1,2,3}'); + +-- euclidean_norm +SELECT euclidean_norm('{1,2,3}'); + +-- normalize_vector +SELECT normalize_vector('{0,0,0}'); +SELECT normalize_vector('{1,2,3}'); + +-- dot_product +SELECT dot_product('{1,2,3}', '{4,5,6}'); + +-- squared_euclidean_distance +SELECT squared_euclidean_distance('{1,2}', '{4,5,6}'); +SELECT squared_euclidean_distance('{1,2,3}', '{4,5,6}'); + +-- euclidean_distance +SELECT euclidean_distance('{1,2}', '{4,5,6}'); +SELECT euclidean_distance('{1,2,3}', '{4,5,6}'); + +-- cosine_distance +SELECT cosine_distance('{1,2}', '{4,5,6}'); +SELECT cosine_distance('{1,2,3}', '{4,5,6}'); + +-- taxicab_distance +SELECT taxicab_distance('{1,2}', '{4,5,6}'); +SELECT taxicab_distance('{1,2,3}', '{4,5,6}'); + +-- chebyshev_distance +SELECT chebyshev_distance('{1,2}', '{4,5,6}'); +SELECT chebyshev_distance('{1,2,3}', '{4,5,6}'); + +-- standard_unit_vector +SELECT standard_unit_vector(0, 0); +SELECT standard_unit_vector(1, 0); +SELECT standard_unit_vector(1, 2); +SELECT standard_unit_vector(3, 1); +SELECT standard_unit_vector(3, 3); + +-- kmeans +SELECT * FROM kmeans(0, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1); +SELECT * FROM kmeans(5, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1); +SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,NULL}}', 1); +SELECT * FROM kmeans(2, ARRAY[[],[]]::float8[][], 1); +SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', -1); +SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1) ORDER BY 1; + +-- closest_vector +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', ARRAY[]::float8[]); +SELECT closest_vector(ARRAY[]::float8[], '{1,2,3}'); +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,NULL}}', '{1,2,3}'); +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{1,2}'); +SELECT closest_vector('{1,2,3}', '{1,2}'); +SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{10,11,12}'); +SELECT closest_vector('{1,2,3}', '{2,3,4}'); -- 2.25.1
>From 5db7d86a9a1b519ee8d8bdc5ce82a09f995cc929 Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Mon, 20 Mar 2023 14:23:39 -0700 Subject: [PATCH v1 3/3] Add support for OpenBLAS. If PostgreSQL is built with OpenBLAS support, euclidean_norm() and dot_product() use the optimized OpenBLAS routines instead of the open-coded formulas. --- configure | 269 ++++++++++++++++++++++++++++++ configure.ac | 35 ++++ doc/src/sgml/install-windows.sgml | 9 + doc/src/sgml/installation.sgml | 19 +++ meson.build | 21 +++ meson_options.txt | 3 + src/Makefile.global.in | 1 + src/backend/utils/adt/vector.c | 12 ++ src/include/pg_config.h.in | 6 + src/makefiles/meson.build | 1 + src/tools/msvc/Solution.pm | 13 ++ src/tools/msvc/config_default.pl | 1 + 12 files changed, 390 insertions(+) diff --git a/configure b/configure index 15daccc87f..d8f35ec6a3 100755 --- a/configure +++ b/configure @@ -649,6 +649,7 @@ PG_CRC32C_OBJS CFLAGS_CRC LIBOBJS OPENSSL +OPENBLAS ZSTD LZ4 UUID_LIBS @@ -695,6 +696,9 @@ STRIP_STATIC_LIB STRIP LDFLAGS_SL LDFLAGS_EX +OPENBLAS_LIBS +OPENBLAS_CFLAGS +with_openblas ZSTD_LIBS ZSTD_CFLAGS with_zstd @@ -872,6 +876,7 @@ with_system_tzdata with_zlib with_lz4 with_zstd +with_openblas with_ssl with_openssl enable_largefile @@ -902,6 +907,8 @@ LZ4_CFLAGS LZ4_LIBS ZSTD_CFLAGS ZSTD_LIBS +OPENBLAS_CFLAGS +OPENBLAS_LIBS LDFLAGS_EX LDFLAGS_SL PERL @@ -1584,6 +1591,7 @@ Optional Packages: --without-zlib do not use Zlib --with-lz4 build with LZ4 support --with-zstd build with ZSTD support + --with-openblas build with OpenBLAS support --with-ssl=LIB use LIB for SSL/TLS support (openssl) --with-openssl obsolete spelling of --with-ssl=openssl @@ -1614,6 +1622,10 @@ Some influential environment variables: LZ4_LIBS linker flags for LZ4, overriding pkg-config ZSTD_CFLAGS C compiler flags for ZSTD, overriding pkg-config ZSTD_LIBS linker flags for ZSTD, overriding pkg-config + OPENBLAS_CFLAGS + C compiler flags for OPENBLAS, overriding pkg-config + OPENBLAS_LIBS + linker flags for OPENBLAS, overriding pkg-config LDFLAGS_EX extra linker flags for linking executables only LDFLAGS_SL extra linker flags for linking shared libraries only PERL Perl program @@ -9605,6 +9617,148 @@ fi esac done fi + +# +# OpenBLAS +# +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with OpenBLAS support" >&5 +$as_echo_n "checking whether to build with OpenBLAS support... " >&6; } + + + +# Check whether --with-openblas was given. +if test "${with_openblas+set}" = set; then : + withval=$with_openblas; + case $withval in + yes) + +$as_echo "#define USE_OPENBLAS 1" >>confdefs.h + + ;; + no) + : + ;; + *) + as_fn_error $? "no argument expected for --with-openblas option" "$LINENO" 5 + ;; + esac + +else + with_openblas=no + +fi + + +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_openblas" >&5 +$as_echo "$with_openblas" >&6; } + + +if test "$with_openblas" = yes; then + +pkg_failed=no +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for openblas" >&5 +$as_echo_n "checking for openblas... " >&6; } + +if test -n "$OPENBLAS_CFLAGS"; then + pkg_cv_OPENBLAS_CFLAGS="$OPENBLAS_CFLAGS" + elif test -n "$PKG_CONFIG"; then + if test -n "$PKG_CONFIG" && \ + { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"openblas\""; } >&5 + ($PKG_CONFIG --exists --print-errors "openblas") 2>&5 + ac_status=$? + $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; then + pkg_cv_OPENBLAS_CFLAGS=`$PKG_CONFIG --cflags "openblas" 2>/dev/null` + test "x$?" != "x0" && pkg_failed=yes +else + pkg_failed=yes +fi + else + pkg_failed=untried +fi +if test -n "$OPENBLAS_LIBS"; then + pkg_cv_OPENBLAS_LIBS="$OPENBLAS_LIBS" + elif test -n "$PKG_CONFIG"; then + if test -n "$PKG_CONFIG" && \ + { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"openblas\""; } >&5 + ($PKG_CONFIG --exists --print-errors "openblas") 2>&5 + ac_status=$? + $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; then + pkg_cv_OPENBLAS_LIBS=`$PKG_CONFIG --libs "openblas" 2>/dev/null` + test "x$?" != "x0" && pkg_failed=yes +else + pkg_failed=yes +fi + else + pkg_failed=untried +fi + + + +if test $pkg_failed = yes; then + { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 +$as_echo "no" >&6; } + +if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then + _pkg_short_errors_supported=yes +else + _pkg_short_errors_supported=no +fi + if test $_pkg_short_errors_supported = yes; then + OPENBLAS_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "openblas" 2>&1` + else + OPENBLAS_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "openblas" 2>&1` + fi + # Put the nasty error message in config.log where it belongs + echo "$OPENBLAS_PKG_ERRORS" >&5 + + as_fn_error $? "Package requirements (openblas) were not met: + +$OPENBLAS_PKG_ERRORS + +Consider adjusting the PKG_CONFIG_PATH environment variable if you +installed software in a non-standard prefix. + +Alternatively, you may set the environment variables OPENBLAS_CFLAGS +and OPENBLAS_LIBS to avoid the need to call pkg-config. +See the pkg-config man page for more details." "$LINENO" 5 +elif test $pkg_failed = untried; then + { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 +$as_echo "no" >&6; } + { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it +is in your PATH or set the PKG_CONFIG environment variable to the full +path to pkg-config. + +Alternatively, you may set the environment variables OPENBLAS_CFLAGS +and OPENBLAS_LIBS to avoid the need to call pkg-config. +See the pkg-config man page for more details. + +To get pkg-config, see <http://pkg-config.freedesktop.org/>. +See \`config.log' for more details" "$LINENO" 5; } +else + OPENBLAS_CFLAGS=$pkg_cv_OPENBLAS_CFLAGS + OPENBLAS_LIBS=$pkg_cv_OPENBLAS_LIBS + { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +$as_echo "yes" >&6; } + +fi + # We only care about -I, -D, and -L switches; + # note that -lopenblas will be added by AC_CHECK_LIB below. + for pgac_option in $OPENBLAS_CFLAGS; do + case $pgac_option in + -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";; + esac + done + for pgac_option in $OPENBLAS_LIBS; do + case $pgac_option in + -L*) LDFLAGS="$LDFLAGS $pgac_option";; + esac + done +fi + # # Assignments # @@ -13276,6 +13430,56 @@ fi fi +if test "$with_openblas" = yes ; then + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for cblas_dnrm2 in -lopenblas" >&5 +$as_echo_n "checking for cblas_dnrm2 in -lopenblas... " >&6; } +if ${ac_cv_lib_openblas_cblas_dnrm2+:} false; then : + $as_echo_n "(cached) " >&6 +else + ac_check_lib_save_LIBS=$LIBS +LIBS="-lopenblas $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +#ifdef __cplusplus +extern "C" +#endif +char cblas_dnrm2 (); +int +main () +{ +return cblas_dnrm2 (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + ac_cv_lib_openblas_cblas_dnrm2=yes +else + ac_cv_lib_openblas_cblas_dnrm2=no +fi +rm -f core conftest.err conftest.$ac_objext \ + conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_openblas_cblas_dnrm2" >&5 +$as_echo "$ac_cv_lib_openblas_cblas_dnrm2" >&6; } +if test "x$ac_cv_lib_openblas_cblas_dnrm2" = xyes; then : + cat >>confdefs.h <<_ACEOF +#define HAVE_LIBOPENBLAS 1 +_ACEOF + + LIBS="-lopenblas $LIBS" + +else + as_fn_error $? "library 'openblas' is required for OPENBLAS support" "$LINENO" 5 +fi + +fi + # Note: We can test for libldap_r only after we know PTHREAD_LIBS; # also, on AIX, we may need to have openssl in LIBS for this step. if test "$with_ldap" = yes ; then @@ -14064,6 +14268,71 @@ else fi +fi + +if test -z "$OPENBLAS"; then + for ac_prog in openblas +do + # Extract the first word of "$ac_prog", so it can be a program name with args. +set dummy $ac_prog; ac_word=$2 +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +$as_echo_n "checking for $ac_word... " >&6; } +if ${ac_cv_path_OPENBLAS+:} false; then : + $as_echo_n "(cached) " >&6 +else + case $OPENBLAS in + [\\/]* | ?:[\\/]*) + ac_cv_path_OPENBLAS="$OPENBLAS" # Let the user override the test with a path. + ;; + *) + as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + test -z "$as_dir" && as_dir=. + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + ac_cv_path_OPENBLAS="$as_dir/$ac_word$ac_exec_ext" + $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + + ;; +esac +fi +OPENBLAS=$ac_cv_path_OPENBLAS +if test -n "$OPENBLAS"; then + { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OPENBLAS" >&5 +$as_echo "$OPENBLAS" >&6; } +else + { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 +$as_echo "no" >&6; } +fi + + + test -n "$OPENBLAS" && break +done + +else + # Report the value of OPENBLAS in configure's output in all cases. + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for OPENBLAS" >&5 +$as_echo_n "checking for OPENBLAS... " >&6; } + { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OPENBLAS" >&5 +$as_echo "$OPENBLAS" >&6; } +fi + +if test "$with_openblas" = yes; then + ac_fn_c_check_header_mongrel "$LINENO" "cblas.h" "ac_cv_header_cblas_h" "$ac_includes_default" +if test "x$ac_cv_header_cblas_h" = xyes; then : + +else + as_fn_error $? "cblas.h header file is required for OPENBLAS" "$LINENO" 5 +fi + + fi if test "$with_gssapi" = yes ; then diff --git a/configure.ac b/configure.ac index 97f5be6c73..e20ba144ec 100644 --- a/configure.ac +++ b/configure.ac @@ -1147,6 +1147,32 @@ if test "$with_zstd" = yes; then esac done fi + +# +# OpenBLAS +# +AC_MSG_CHECKING([whether to build with OpenBLAS support]) +PGAC_ARG_BOOL(with, openblas, no, [build with OpenBLAS support], + [AC_DEFINE([USE_OPENBLAS], 1, [Define to 1 to build with OpenBLAS support. (--with-openblas)])]) +AC_MSG_RESULT([$with_openblas]) +AC_SUBST(with_openblas) + +if test "$with_openblas" = yes; then + PKG_CHECK_MODULES(OPENBLAS, openblas) + # We only care about -I, -D, and -L switches; + # note that -lopenblas will be added by AC_CHECK_LIB below. + for pgac_option in $OPENBLAS_CFLAGS; do + case $pgac_option in + -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";; + esac + done + for pgac_option in $OPENBLAS_LIBS; do + case $pgac_option in + -L*) LDFLAGS="$LDFLAGS $pgac_option";; + esac + done +fi + # # Assignments # @@ -1418,6 +1444,10 @@ if test "$with_zstd" = yes ; then AC_CHECK_LIB(zstd, ZSTD_compress, [], [AC_MSG_ERROR([library 'zstd' is required for ZSTD support])]) fi +if test "$with_openblas" = yes ; then + AC_CHECK_LIB(openblas, cblas_dnrm2, [], [AC_MSG_ERROR([library 'openblas' is required for OPENBLAS support])]) +fi + # Note: We can test for libldap_r only after we know PTHREAD_LIBS; # also, on AIX, we may need to have openssl in LIBS for this step. if test "$with_ldap" = yes ; then @@ -1563,6 +1593,11 @@ if test "$with_zstd" = yes; then AC_CHECK_HEADER(zstd.h, [], [AC_MSG_ERROR([zstd.h header file is required for ZSTD])]) fi +PGAC_PATH_PROGS(OPENBLAS, openblas) +if test "$with_openblas" = yes; then + AC_CHECK_HEADER(cblas.h, [], [AC_MSG_ERROR([cblas.h header file is required for OPENBLAS])]) +fi + if test "$with_gssapi" = yes ; then AC_CHECK_HEADERS(gssapi/gssapi.h, [], [AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])]) diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml index cbc70a039c..a6409913d5 100644 --- a/doc/src/sgml/install-windows.sgml +++ b/doc/src/sgml/install-windows.sgml @@ -309,6 +309,15 @@ $ENV{MSBFLAGS}="/m"; </para></listitem> </varlistentry> + <varlistentry> + <term><productname>OpenBLAS</productname></term> + <listitem><para> + Required for supporting optimized linear algebra routines using + <productname>OpenBLAS</productname>. Binaries and source can be + downloaded from <ulink url="https://www.openblas.net/"></ulink>. + </para></listitem> + </varlistentry> + <varlistentry> <term><productname>OpenSSL</productname></term> <listitem><para> diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 75dc81a0a9..aa1191ced4 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -1009,6 +1009,15 @@ build-postgresql: </listitem> </varlistentry> + <varlistentry id="configure-option-with-openblas"> + <term><option>--with-openblas</option></term> + <listitem> + <para> + Build with <productname>OpenBLAS</productname> support. + </para> + </listitem> + </varlistentry> + <varlistentry id="configure-option-with-ssl"> <term><option>--with-ssl=<replaceable>LIBRARY</replaceable></option> <indexterm> @@ -2481,6 +2490,16 @@ ninja install </listitem> </varlistentry> + <varlistentry id="configure-with-openblas-meson"> + <term><option>-Dopenblas={ auto | enabled | disabled }</option></term> + <listitem> + <para> + Build with <productname>OpenBLAS</productname> support. Defaults to + auto. + </para> + </listitem> + </varlistentry> + <varlistentry id="configure-with-ssl-meson"> <term><option>-Dssl={ auto | <replaceable>LIBRARY</replaceable> }</option> <indexterm> diff --git a/meson.build b/meson.build index 096044628c..28ec293a7e 100644 --- a/meson.build +++ b/meson.build @@ -838,6 +838,25 @@ endif +############################################################### +# Library: OpenBLAS +############################################################### + +openblasopt = get_option('openblas') +if not openblasopt.disabled() + openblas = dependency('openblas', required: openblasopt) + + if openblas.found() + cdata.set('USE_OPENBLAS', 1) + cdata.set('HAVE_OPENBLAS', 1) + endif + +else + openblas = not_found_dep +endif + + + ############################################################### # Library: Tcl (for pltcl) # @@ -2848,6 +2867,7 @@ backend_both_deps += [ libintl, libxml, lz4, + openblas, pam, ssl, systemd, @@ -3378,6 +3398,7 @@ if meson.version().version_compare('>=0.57') 'llvm': llvm, 'lz4': lz4, 'nls': libintl, + 'openblas': openblas, 'openssl': ssl, 'pam': pam, 'plperl': perl_dep, diff --git a/meson_options.txt b/meson_options.txt index 5b44a8829d..fbe0cf2ca1 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -118,6 +118,9 @@ option('lz4', type : 'feature', value: 'auto', option('nls', type: 'feature', value: 'auto', description: 'native language support') +option('openblas', type: 'feature', value: 'auto', + description: 'OpenBLAS support') + option('pam', type : 'feature', value: 'auto', description: 'build with PAM support') diff --git a/src/Makefile.global.in b/src/Makefile.global.in index 772b91261d..8956f9a9fc 100644 --- a/src/Makefile.global.in +++ b/src/Makefile.global.in @@ -196,6 +196,7 @@ with_llvm = @with_llvm@ with_system_tzdata = @with_system_tzdata@ with_uuid = @with_uuid@ with_zlib = @with_zlib@ +with_openblas = @with_openblas@ enable_rpath = @enable_rpath@ enable_nls = @enable_nls@ enable_debug = @enable_debug@ diff --git a/src/backend/utils/adt/vector.c b/src/backend/utils/adt/vector.c index dba1fd7eef..2cec20c4c5 100644 --- a/src/backend/utils/adt/vector.c +++ b/src/backend/utils/adt/vector.c @@ -20,6 +20,10 @@ #include <float.h> +#ifdef USE_OPENBLAS +#include "cblas.h" +#endif + #include "catalog/pg_type.h" #include "common/int.h" #include "common/pg_prng.h" @@ -95,10 +99,14 @@ euclidean_norm_internal(int n, const float8 *a) { float8 en = 0.0; +#ifdef USE_OPENBLAS + en = cblas_dnrm2(n, a, 1); +#else for (int i = 0; i < n; i++) en = float8_pl(en, float8_mul(a[i], a[i])); en = float8_sqrt(en); +#endif return en; } @@ -160,8 +168,12 @@ dot_product(PG_FUNCTION_ARGS) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("vectors must have the same number of elements"))); +#ifdef USE_OPENBLAS + dp = cblas_ddot(an, ad, 1, bd, 1); +#else for (int i = 0; i < an; i++) dp = float8_pl(dp, float8_mul(ad[i], bd[i])); +#endif PG_RETURN_FLOAT8(dp); } diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index 6d572c3820..5d1452f346 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -322,6 +322,9 @@ /* Define to 1 if you have the `mkdtemp' function. */ #undef HAVE_MKDTEMP +/* Define to 1 if you have the 'OpenBLAS' library (-lopenblas). */ +#undef HAVE_OPENBLAS + /* Define to 1 if you have the `OPENSSL_init_ssl' function. */ #undef HAVE_OPENSSL_INIT_SSL @@ -730,6 +733,9 @@ /* Define to select named POSIX semaphores. */ #undef USE_NAMED_POSIX_SEMAPHORES +/* Define to 1 to build with OpenBLAS support. (--with-openblas) */ +#undef USE_OPENBLAS + /* Define to 1 to build with OpenSSL support. (--with-ssl=openssl) */ #undef USE_OPENSSL diff --git a/src/makefiles/meson.build b/src/makefiles/meson.build index 13045cbd6e..47c75e1446 100644 --- a/src/makefiles/meson.build +++ b/src/makefiles/meson.build @@ -231,6 +231,7 @@ pgxs_deps = { 'llvm': llvm, 'lz4': lz4, 'nls': libintl, + 'openblas': openblas, 'pam': pam, 'perl': perl_dep, 'python': python3_dep, diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm index ef10cda576..a45dff7b30 100644 --- a/src/tools/msvc/Solution.pm +++ b/src/tools/msvc/Solution.pm @@ -304,6 +304,7 @@ sub GenerateFiles HAVE_MEMORY_H => 1, HAVE_MEMSET_S => undef, HAVE_MKDTEMP => undef, + HAVE_OPENBLAS => undef, HAVE_OPENSSL_INIT_SSL => undef, HAVE_OSSP_UUID_H => undef, HAVE_PAM_PAM_APPL_H => undef, @@ -436,6 +437,7 @@ sub GenerateFiles USE_LDAP => $self->{options}->{ldap} ? 1 : undef, USE_LLVM => undef, USE_NAMED_POSIX_SEMAPHORES => undef, + USE_OPENBLAS => undef, USE_OPENSSL => undef, USE_PAM => undef, USE_SLICING_BY_8_CRC32C => undef, @@ -485,6 +487,11 @@ sub GenerateFiles $define{HAVE_LIBZSTD} = 1; $define{USE_ZSTD} = 1; } + if ($self->{options}->{openblas}) + { + $define{HAVE_OPENBLAS} = 1; + $define{USE_OPENBLAS} = 1; + } if ($self->{options}->{openssl}) { $define{USE_OPENSSL} = 1; @@ -1092,6 +1099,11 @@ sub AddProject $proj->AddIncludeDir($self->{options}->{zstd} . '\include'); $proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib'); } + if ($self->{options}->{openblas}) + { + $proj->AddIncludeDir($self->{options}->{openblas} . '\include'); + $proj->AddLibrary($self->{options}->{openblas} . '\lib\openblas.lib'); + } if ($self->{options}->{uuid}) { $proj->AddIncludeDir($self->{options}->{uuid} . '\include'); @@ -1205,6 +1217,7 @@ sub GetFakeConfigure $cfg .= ' --with-libxslt' if ($self->{options}->{xslt}); $cfg .= ' --with-lz4' if ($self->{options}->{lz4}); $cfg .= ' --with-zstd' if ($self->{options}->{zstd}); + $cfg .= ' --with-openblas' if ($self->{options}->{openblas}); $cfg .= ' --with-gssapi' if ($self->{options}->{gss}); $cfg .= ' --with-icu' if ($self->{options}->{icu}); $cfg .= ' --with-tcl' if ($self->{options}->{tcl}); diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl index 70b44d1531..8689439c0b 100644 --- a/src/tools/msvc/config_default.pl +++ b/src/tools/msvc/config_default.pl @@ -16,6 +16,7 @@ our $config = { icu => undef, # --with-icu=<path> lz4 => undef, # --with-lz4=<path> zstd => undef, # --with-zstd=<path> + openblas => undef, # --with-openblas=<path> nls => undef, # --enable-nls=<path> tap_tests => undef, # --enable-tap-tests tcl => undef, # --with-tcl=<path> -- 2.25.1