vector search support

Nathan Bossart Fri, 21 Apr 2023 17:07:46 -0700

Attached is a proof-of-concept/work-in-progress patch set that adds
functions for "vectors" repreѕented with one-dimensional float8 arrays.
These functions may be used in a variety of applications, but I am
proposing them with the AI/ML use-cases in mind.  I am posting this early
in the v17 cycle in hopes of gathering feedback prior to PGCon.


With the accessibility of AI/ML tools such as large language models (LLMs),
there has been a demand for storing and manipulating high-dimensional
vectors in PostgreSQL, particularly around nearest-neighbor queries.  Many
of these vectors have more than 1500 dimensions.  The cube extension [0]
provides some of the distance functionality (e.g., taxicab, Euclidean, and
Chebyshev), but it is missing some popular functions (e.g., cosine
similarity, dot product), and it is limited to 100 dimensions.  We could
extend cube to support more dimensions, but this would require reworking
its indexing code and filling in gaps between the cube data type and the
array types.  For some previous discussion about using the cube extension
for this kind of data, see [1].

float8[] is well-supported and allows for effectively unlimited dimensions
of data.  float8 matches the common output format of many AI embeddings,
and it allows us or extensions to implement indexing methods around these
functions.  This patch set does not yet contain indexing support, but we
are exploring using GiST or GIN for the use-cases in question.  It might
also be desirable to add support for other linear algebra operations (e.g.,
operations on matrices).  The attached patches likely only scratch the
surface of the "vector search" use-case.

The patch set is broken up as follows:

 * 0001 does some minor refactoring of dsqrt() in preparation for 0002.
 * 0002 adds several vector-related functions, including distance functions
   and a kmeans++ implementation.
 * 0003 adds support for optionally using the OpenBLAS library, which is an
   implementation of the Basic Linear Algebra Subprograms [2]
   specification.  Basic testing with this library showed a small
   performance boost, although perhaps not enough to justify giving this
   patch serious consideration.

Of course, there are many open questions.  For example, should PostgreSQL
support this stuff out-of-the-box in the first place?  And should we
introduce a vector data type or SQL domains for treating float8[] as
vectors?  IMHO these vector search use-cases are an exciting opportunity
for the PostgreSQL project, so I am eager to hear what folks think.

[0] https://www.postgresql.org/docs/current/cube.html
[1] https://postgr.es/m/2271927.1593097400%40sss.pgh.pa.us
[2] https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From b54b938bf35f26bc8e07cd57d4bf616b7af36709 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Sun, 19 Mar 2023 14:37:54 -0700
Subject: [PATCH v1 1/3] Refactor dsqrt().

This moves most of the logic for dsqrt() to a helper function in
float.h so that it can be reused elsewhere.
---
 src/backend/utils/adt/float.c | 16 +---------------
 src/include/utils/float.h     | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 9b51da2382..4f655c281a 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -1445,21 +1445,7 @@ dtrunc(PG_FUNCTION_ARGS)
 Datum
 dsqrt(PG_FUNCTION_ARGS)
 {
-	float8		arg1 = PG_GETARG_FLOAT8(0);
-	float8		result;
-
-	if (arg1 < 0)
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION),
-				 errmsg("cannot take square root of a negative number")));
-
-	result = sqrt(arg1);
-	if (unlikely(isinf(result)) && !isinf(arg1))
-		float_overflow_error();
-	if (unlikely(result == 0.0) && arg1 != 0.0)
-		float_underflow_error();
-
-	PG_RETURN_FLOAT8(result);
+	PG_RETURN_FLOAT8(float8_sqrt(PG_GETARG_FLOAT8(0)));
 }
 
 
diff --git a/src/include/utils/float.h b/src/include/utils/float.h
index 7529899d63..b4e50cccaf 100644
--- a/src/include/utils/float.h
+++ b/src/include/utils/float.h
@@ -250,6 +250,25 @@ float8_div(const float8 val1, const float8 val2)
 	return result;
 }
 
+static inline float8
+float8_sqrt(const float8 val)
+{
+	float8		result;
+
+	if (unlikely(val < 0))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_ARGUMENT_FOR_POWER_FUNCTION),
+				 errmsg("cannot take square root of a negative number")));
+
+	result = sqrt(val);
+	if (unlikely(isinf(result)) && !isinf(val))
+		float_overflow_error();
+	if (unlikely(result == 0.0) && val != 0.0)
+		float_underflow_error();
+
+	return result;
+}
+
 /*
  * Routines for NaN-aware comparisons
  *
-- 
2.25.1

>From 351e5e05b471255d0f831d330f48ce95f16095eb Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Sun, 19 Mar 2023 23:21:58 -0700
Subject: [PATCH v1 2/3] Introduce basic vector support.

This change introduces a variety of SQL-callable functions for
"vectors", or 1-D float8 arrays with no NULL values and at least
one element.
---
 doc/src/sgml/func.sgml                | 241 +++++++++++
 src/backend/utils/adt/Makefile        |   1 +
 src/backend/utils/adt/meson.build     |   1 +
 src/backend/utils/adt/vector.c        | 589 ++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  49 +++
 src/test/regress/expected/vectors.out | 154 +++++++
 src/test/regress/parallel_schedule    |   2 +-
 src/test/regress/sql/vectors.sql      |  61 +++
 8 files changed, 1097 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/utils/adt/vector.c
 create mode 100644 src/test/regress/expected/vectors.out
 create mode 100644 src/test/regress/sql/vectors.sql

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a47ce4343..1bd71ccefe 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18950,6 +18950,247 @@ SELECT NULLIF(value, '(none)') ...
 </programlisting>
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>array_is_vector</primary>
+        </indexterm>
+        <function>array_is_vector</function> ( <type>float8[]</type> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Returns whether the array is a vector (i.e., a one-dimensional array of
+        <type>float8</type> with no NULL values and at least one element).  The
+        values in such arrays are treated as Cartesian coordinates in
+        <literal>n</literal>-dimensional Euclidean space, where
+        <literal>n</literal> is the cardinality of the array.
+       </para>
+       <para>
+        <literal>array_is_vector{'1,2,3}')</literal>
+        <returnvalue>t</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>euclidean_norm</primary>
+        </indexterm>
+        <function>euclidean_norm</function> ( <type>float8[]</type> )
+        <returnvalue>float8</returnvalue>
+       </para>
+       <para>
+        Returns the Euclidean norm of the array.  The array must be a vector
+        (see <function>array_is_vector</function> for details about what
+        constitutes a vector).
+       </para>
+       <para>
+        <literal>euclidean_norm('{1,2,3}')</literal>
+        <returnvalue>3.7416573867739413</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>normalize_vector</primary>
+        </indexterm>
+        <function>normalize_vector</function> ( <type>float8[]</type> )
+        <returnvalue>float8[]</returnvalue>
+       </para>
+       <para>
+        Returns a normalized version of the array.  The array must be a vector
+        (see <function>array_is_vector</function> for details about what
+        constitutes a vector), and its Euclidean norm must be nonzero.
+       </para>
+       <para>
+        <literal>normalize_vector('{1,2,3}')</literal>
+        <returnvalue>{0.2672612419124244,0.5345224838248488,0.8017837257372732}</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>dot_product</primary>
+        </indexterm>
+        <function>dot_product</function> ( <type>float8[]</type>, <type>float8[]</type> )
+        <returnvalue>float8</returnvalue>
+       </para>
+       <para>
+        Returns the dot product of the arrays.  The arrays must be vectors (see
+        <function>array_is_vector</function> for details about what constitutes
+        a vector).
+       </para>
+       <para>
+        <literal>dot_product('{1,2,3}', '{4,5,6}')</literal>
+        <returnvalue>32</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>squared_euclidean_distance</primary>
+        </indexterm>
+        <function>squared_euclidean_distance</function> ( <type>float8[]</type>, <type>float8[]</type> )
+        <returnvalue>float8</returnvalue>
+       </para>
+       <para>
+        Returns the squared Euclidean distance between the arrays.  The arrays
+        must be vectors (see <function>array_is_vector</function> for details
+        about what constitutes a vector).
+       </para>
+       <para>
+        <literal>squared_euclidean_distance('{1,2,3}', '{4,5,6}')</literal>
+        <returnvalue>27</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>euclidean_distance</primary>
+        </indexterm>
+        <function>euclidean_distance</function> ( <type>float8[]</type>, <type>float8[]</type> )
+        <returnvalue>float8</returnvalue>
+       </para>
+       <para>
+        Returns the Euclidean distance between the arrays.  The arrays must be
+        vectors (see <function>array_is_vector</function> for details about
+        what constitutes a vector).
+       </para>
+       <para>
+        <literal>euclidean_distance('{1,2,3}', '{4,5,6}')</literal>
+        <returnvalue>5.196152422706632</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>cosine_distance</primary>
+        </indexterm>
+        <function>cosine_distance</function> ( <type>float8[]</type>, <type>float8[]</type> )
+        <returnvalue>float8</returnvalue>
+       </para>
+       <para>
+        Returns the cosine distance between the arrays.  The arrays must be
+        vectors (see <function>array_is_vector</function> for details about
+        what constitutes a vector).
+       </para>
+       <para>
+        <literal>cosine_distance('{1,2,3}', '{4,5,6}')</literal>
+        <returnvalue>0.025368153802923787</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>taxicab_distance</primary>
+        </indexterm>
+        <function>taxicab_distance</function> ( <type>float8[]</type>, <type>float8[]</type> )
+        <returnvalue>float8</returnvalue>
+       </para>
+       <para>
+        Returns the taxicab distance between the arrays.  The arrays must be
+        vectors (see <function>array_is_vector</function> for details about
+        what constitutes a vector).
+       </para>
+       <para>
+        <literal>taxicab_distance('{1,2,3}', '{4,5,6}')</literal>
+        <returnvalue>9</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>chebyshev_distance</primary>
+        </indexterm>
+        <function>chebyshev_distance</function> ( <type>float8[]</type>, <type>float8[]</type> )
+        <returnvalue>float8</returnvalue>
+       </para>
+       <para>
+        Returns the Chebyshev distance between the arrays.  The arrays must be
+        vectors (see <function>array_is_vector</function> for details about
+        what constitutes a vector).
+       </para>
+       <para>
+        <literal>chebyshev_distance('{1,2,3}', '{4,5,6}')</literal>
+        <returnvalue>3</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>standard_unit_vector</primary>
+        </indexterm>
+        <function>standard_unit_vector</function> ( <parameter>d</parameter> <type>integer</type>, <parameter>n</parameter> <type>integer</type> )
+        <returnvalue>float8[]</returnvalue>
+       </para>
+       <para>
+        Returns an array of <type>float8</type> with <parameter>d</parameter>
+        elements.  All elements are set to <literal>0</literal> except for the
+        <parameter>n</parameter>th element, which is set to
+        <literal>1</literal>.
+       </para>
+       <para>
+        <literal>standard_unit_vector(3, 1)</literal>
+        <returnvalue>{1,0,0}</returnvalue>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>kmeans</primary>
+        </indexterm>
+        <function>kmeans</function> ( <parameter>n</parameter> <type>integer</type>, <parameter>v</parameter> <type>float8[]</type>, <parameter>i</parameter> <type>integer</type> )
+        <returnvalue>setof float8[]</returnvalue>
+       </para>
+       <para>
+        Returns <parameter>n</parameter> centers calculated via
+        <parameter>i</parameter> iterations of the <literal>kmeans++</literal>
+        algorithm on the array of vectors <parameter>v</parameter> (see
+        <function>array_is_vector</function> for details about what constitutes
+        a vector).
+       </para>
+       <para>
+        <literal>kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1)</literal>
+        <returnvalue></returnvalue>
+<programlisting>
+     kmeans
+----------------
+ {2.5,3.5,4.5}
+ {8.5,9.5,10.5}
+</programlisting>
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>closest_vector</primary>
+        </indexterm>
+        <function>closest_vector</function> ( <parameter>c</parameter> <type>float8[]</type>, <parameter>v</parameter> <type>float8[]</type> )
+        <returnvalue>integer</returnvalue>
+       </para>
+       <para>
+        Returns the index of the vector in <parameter>c</parameter> that is
+        closest to <parameter>v</parameter>.  <parameter>c</parameter> must be
+        an array of vectors, and <parameter>v</parameter> must be a vector (see
+        <function>array_is_vector</function> for details about what constitutes
+        a vector).
+       </para>
+       <para>
+        <literal>closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{10,11,12}')</literal>
+        <returnvalue>2</returnvalue>
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 0de0bbb1b8..ffda4038f1 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -114,6 +114,7 @@ OBJS = \
 	varbit.o \
 	varchar.o \
 	varlena.o \
+	vector.o \
 	version.o \
 	windowfuncs.o \
 	xid.o \
diff --git a/src/backend/utils/adt/meson.build b/src/backend/utils/adt/meson.build
index 8515cd9365..dfa4eeb52f 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -101,6 +101,7 @@ backend_sources += files(
   'varbit.c',
   'varchar.c',
   'varlena.c',
+  'vector.c',
   'version.c',
   'windowfuncs.c',
   'xid.c',
diff --git a/src/backend/utils/adt/vector.c b/src/backend/utils/adt/vector.c
new file mode 100644
index 0000000000..dba1fd7eef
--- /dev/null
+++ b/src/backend/utils/adt/vector.c
@@ -0,0 +1,589 @@
+/*-------------------------------------------------------------------------
+ *
+ * vector.c
+ *      Support functions for float8 arrays treated as vectors.
+ *
+ * For the purposes of this file, a "vector" is a one-dimensional float8
+ * array with no NULL values and at least one element.  The values in such
+ * arrays are treated as Cartesian coordinates in n-dimensional Euclidean
+ * space, where "n" is the cardinality of the array.  For more information,
+ * see https://en.wikipedia.org/wiki/Cartesian_coordinate_system.
+ *
+ * Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/backend/utils/adt/vector.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include <float.h>
+
+#include "catalog/pg_type.h"
+#include "common/int.h"
+#include "common/pg_prng.h"
+#include "funcapi.h"
+#include "utils/array.h"
+#include "utils/float.h"
+#include "utils/fmgrprotos.h"
+
+static inline int float8_array_is_vector_internal(ArrayType *v, bool error);
+static inline float8 euclidean_norm_internal(int n, const float8 *a);
+static inline float8 squared_euclidean_distance_internal(int n, const float8 *a, const float8 *b);
+static inline float8 euclidean_distance_internal(int n, const float8 *a, const float8 *b);
+
+/*
+ * Checks that an array is a vector (as described at the top of this file) and
+ * returns its cardinality.  If "error" is true, this function ERRORs if the
+ * array is not a vector.  If "error" is false, this function returns -1 if the
+ * array is not a vector.
+ */
+static inline int
+float8_array_is_vector_internal(ArrayType *v, bool error)
+{
+	int			n = ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v));
+
+	if (unlikely(n == 0))
+	{
+		if (error)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("vectors must have at least one element")));
+		return -1;
+	}
+
+	if (unlikely(ARR_NDIM(v) != 1))
+	{
+		if (error)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("vectors must be one-dimensional arrays")));
+		return -1;
+	}
+
+	if (unlikely(ARR_HASNULL(v) && array_contains_nulls(v)))
+	{
+		if (error)
+			ereport(ERROR,
+					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+					 errmsg("vectors must not contain nulls")));
+		return -1;
+	}
+
+	return n;
+}
+
+/*
+ * SQL-callable version of float8_array_is_vector_internal().  Unlike the
+ * internal version, this does not return the array's cardinality.
+ */
+Datum
+float8_array_is_vector(PG_FUNCTION_ARGS)
+{
+	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);
+	bool		r = (float8_array_is_vector_internal(v, false) != -1);
+
+	PG_RETURN_BOOL(r);
+}
+
+/*
+ * Workhorse for euclidean_norm() and normalize_vector().
+ */
+static inline float8
+euclidean_norm_internal(int n, const float8 *a)
+{
+	float8		en = 0.0;
+
+	for (int i = 0; i < n; i++)
+		en = float8_pl(en, float8_mul(a[i], a[i]));
+
+	en = float8_sqrt(en);
+
+	return en;
+}
+
+/*
+ * Returns the Euclidean norm of a vector.  For more information, see
+ * https://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm
+ */
+Datum
+euclidean_norm(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	int			an = float8_array_is_vector_internal(a, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+
+	PG_RETURN_FLOAT8(euclidean_norm_internal(an, ad));
+}
+
+/*
+ * Returns a normalized version of the given vector.  For more information, see
+ * https://en.wikipedia.org/wiki/Unit_vector.
+ */
+Datum
+normalize_vector(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *ua;
+	int			an = float8_array_is_vector_internal(a, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+	float8		en = euclidean_norm_internal(an, ad);
+	Datum	   *ud = palloc(an * sizeof(Datum));
+
+	for (int i = 0; i < an; i++)
+		ud[i] = Float8GetDatumFast(float8_div(ad[i], en));
+
+	ua = construct_array(ud, an, FLOAT8OID, sizeof(float8),
+						 FLOAT8PASSBYVAL, TYPALIGN_DOUBLE);
+
+	PG_RETURN_ARRAYTYPE_P(ua);
+}
+
+/*
+ * Returns the dot product of two vectors.  For more information, see
+ * https://en.wikipedia.org/wiki/Dot_product.
+ */
+Datum
+dot_product(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *b = PG_GETARG_ARRAYTYPE_P(1);
+	int			an = float8_array_is_vector_internal(a, true);
+	int			bn = float8_array_is_vector_internal(b, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+	float8	   *bd = (float8 *) ARR_DATA_PTR(b);
+	float8		dp = 0.0;
+
+	if (an != bn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("vectors must have the same number of elements")));
+
+	for (int i = 0; i < an; i++)
+		dp = float8_pl(dp, float8_mul(ad[i], bd[i]));
+
+	PG_RETURN_FLOAT8(dp);
+}
+
+/*
+ * Workhorse for squared_euclidean_distance() and
+ * euclidean_distance_internal().
+ */
+static inline float8
+squared_euclidean_distance_internal(int n, const float8 *a, const float8 *b)
+{
+	float8		sed = 0.0;
+
+	for (int i = 0; i < n; i++)
+	{
+		float8		d = float8_mi(a[i], b[i]);
+
+		sed = float8_pl(sed, float8_mul(d, d));
+	}
+
+	return sed;
+}
+
+/*
+ * Returns the squared Euclidean distance between two vectors.  For more
+ * information, see
+ * https://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance.
+ */
+Datum
+squared_euclidean_distance(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *b = PG_GETARG_ARRAYTYPE_P(1);
+	int			an = float8_array_is_vector_internal(a, true);
+	int			bn = float8_array_is_vector_internal(b, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+	float8	   *bd = (float8 *) ARR_DATA_PTR(b);
+	float8		sed;
+
+	if (an != bn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("vectors must have the same number of elements")));
+
+	sed = squared_euclidean_distance_internal(an, ad, bd);
+
+	PG_RETURN_FLOAT8(sed);
+}
+
+/*
+ * Workhorse for euclidean_distance().
+ */
+static inline float8
+euclidean_distance_internal(int n, const float8 *a, const float8 *b)
+{
+	float8		sed = squared_euclidean_distance_internal(n, a, b);
+
+	return float8_sqrt(sed);
+}
+
+/*
+ * Returns the Euclidean distance between vectors.  For more information, see
+ * https://en.wikipedia.org/wiki/Euclidean_distance.
+ */
+Datum
+euclidean_distance(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *b = PG_GETARG_ARRAYTYPE_P(1);
+	int			an = float8_array_is_vector_internal(a, true);
+	int			bn = float8_array_is_vector_internal(b, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+	float8	   *bd = (float8 *) ARR_DATA_PTR(b);
+	float8		ed;
+
+	if (an != bn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("vectors must have the same number of elements")));
+
+	ed = euclidean_distance_internal(an, ad, bd);
+
+	PG_RETURN_FLOAT8(ed);
+}
+
+/*
+ * Returns the cosine distance between two vectors.  For more information, see
+ * https://en.wikipedia.org/wiki/Cosine_similarity#Cosine_Distance.
+ */
+Datum
+cosine_distance(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *b = PG_GETARG_ARRAYTYPE_P(1);
+	int			an = float8_array_is_vector_internal(a, true);
+	int			bn = float8_array_is_vector_internal(b, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+	float8	   *bd = (float8 *) ARR_DATA_PTR(b);
+	float8		dp = 0.0;
+	float8		aen = 0.0;
+	float8		ben = 0.0;
+	float8		cd;
+
+	if (an != bn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("vectors must have the same number of elements")));
+
+	for (int i = 0; i < an; i++)
+	{
+		dp = float8_pl(dp, float8_mul(ad[i], bd[i]));
+		aen = float8_pl(aen, float8_mul(ad[i], ad[i]));
+		ben = float8_pl(ben, float8_mul(bd[i], bd[i]));
+	}
+
+	cd = float8_mi(1.0, float8_div(dp, float8_sqrt(float8_mul(aen, ben))));
+
+	PG_RETURN_FLOAT8(cd);
+}
+
+/*
+ * Returns the taxicab distance between two vectors.  For more information, see
+ * https://en.wikipedia.org/wiki/Taxicab_geometry.
+ */
+Datum
+taxicab_distance(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *b = PG_GETARG_ARRAYTYPE_P(1);
+	int			an = float8_array_is_vector_internal(a, true);
+	int			bn = float8_array_is_vector_internal(b, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+	float8	   *bd = (float8 *) ARR_DATA_PTR(b);
+	float8		td = 0.0;
+
+	if (an != bn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("vectors must have the same number of elements")));
+
+	for (int i = 0; i < an; i++)
+		td = float8_pl(td, fabs(float8_mi(ad[i], bd[i])));
+
+	PG_RETURN_FLOAT8(td);
+}
+
+/*
+ * Returns the Chebyshev distance between two vectors.  For more information,
+ * see https://en.wikipedia.org/wiki/Chebyshev_distance.
+ */
+Datum
+chebyshev_distance(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *b = PG_GETARG_ARRAYTYPE_P(1);
+	int			an = float8_array_is_vector_internal(a, true);
+	int			bn = float8_array_is_vector_internal(b, true);
+	float8	   *ad = (float8 *) ARR_DATA_PTR(a);
+	float8	   *bd = (float8 *) ARR_DATA_PTR(b);
+	float8		cd = 0.0;
+
+	if (an != bn)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("vectors must have the same number of elements")));
+
+	for (int i = 0; i < an; i++)
+		cd = float8_max(cd, fabs(float8_mi(ad[i], bd[i])));
+
+	PG_RETURN_FLOAT8(cd);
+}
+
+/*
+ * Returns a standard unit vector with the nth element set to 1.  All other
+ * elements are set to 0.  For more information, see
+ * https://en.wikipedia.org/wiki/Standard_basis.
+ */
+Datum
+standard_unit_vector(PG_FUNCTION_ARGS)
+{
+	int			d = PG_GETARG_INT32(0);
+	int			n = PG_GETARG_INT32(1);
+	Datum	   *ud = palloc(d * sizeof(Datum));
+	ArrayType  *ua;
+
+	if (d <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("standard unit vectors must have at least one element")));
+
+	if (n == 0 || n > d)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("cannot set nonexistent element to one")));
+
+	for (int i = 0; i < d; i++)
+		ud[i] = Float8GetDatumFast(0.0);
+
+	ud[n - 1] = Float8GetDatumFast(1.0);
+
+	ua = construct_array(ud, d, FLOAT8OID, sizeof(float8),
+						 FLOAT8PASSBYVAL, TYPALIGN_DOUBLE);
+
+	PG_RETURN_ARRAYTYPE_P(ua);
+}
+
+/*
+ * Returns centers generated via the kmeans++ algorithm.  For more information,
+ * see https://theory.stanford.edu/~sergei/papers/kMeansPP-soda.pdf.
+ *
+ * XXX: Accelerate using triange inequality (see
+ * https://cseweb.ucsd.edu/~elkan/kmeansicml03.pdf).
+ */
+Datum
+kmeans(PG_FUNCTION_ARGS)
+{
+	int			ncenters = PG_GETARG_INT32(0);
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(1);
+	int			iterations = PG_GETARG_INT32(2);
+	int			nsamples = ARR_NDIM(a) > 1 ? ARR_DIMS(a)[0] : ARR_NDIM(a);
+	int			dimensions;
+	int		   *clusters = palloc(nsamples * sizeof(int));
+	int		   *pops = palloc(ncenters * sizeof(int));
+	float8	   **centers = palloc(ncenters * sizeof(float8 *));
+	float8	   **vectors = palloc(nsamples * sizeof(float8 *));
+	float8	   *weights = palloc(nsamples * sizeof(float8));
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	Datum	   *cdata;
+	pg_prng_state rstate;
+
+	InitMaterializedSRF(fcinfo, MAT_SRF_USE_EXPECTED_DESC);
+
+	if (ncenters < 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("must request at least one center")));
+
+	if (ncenters > nsamples)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("number of sample vectors must be greater than or equal to the number of requested centers")));
+
+	if (ARR_HASNULL(a) && array_contains_nulls(a))
+		ereport(ERROR,
+				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+				 errmsg("vectors must not contain nulls")));
+
+	if (iterations < 1)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("must request at least one iteration")));
+
+	dimensions = ARR_NDIM(a) > 1 ? ARR_DIMS(a)[1] : ARR_DIMS(a)[0];
+	for (int i = 0; i < nsamples; i++)
+	{
+		int			r = -1;
+
+		if (pg_mul_s32_overflow(i, dimensions, &r))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("sample vectors too large")));
+
+		vectors[i] = ((float8 *) ARR_DATA_PTR(a)) + r;
+	}
+
+	/* choose initial center randomly */
+	pg_prng_seed(&rstate, 0x5eed);
+	centers[0] = vectors[(int) float8_mul(pg_prng_double(&rstate), nsamples)];
+
+	for (int i = 0; i < nsamples; i++)
+		weights[i] = DBL_MAX;
+
+	/* calculate initial values for the other centers */
+	for (int i = 1; i < ncenters; i++)
+	{
+		float8		sum = 0.0;
+		float8		wi;
+
+		for (int j = 0; j < nsamples; j++)
+		{
+			float8		distance = euclidean_distance_internal(dimensions,
+															   centers[i - 1],
+															   vectors[j]);
+
+			weights[j] = float8_min(distance, weights[j]);
+			sum = float8_pl(sum, weights[j]);
+		}
+
+		wi = float8_mul(pg_prng_double(&rstate), sum);
+		for (int j = 0; j < nsamples; j++)
+		{
+			wi = float8_mi(wi, weights[j]);
+			if (float8_le(wi, 0))
+			{
+				centers[i] = vectors[j];
+				break;
+			}
+		}
+	}
+
+	/* XXX: exit early if converged */
+	for (int i = 0; i < iterations; i++)
+	{
+		/* assign each sample to a center */
+		for (int j = 0; j < nsamples; j++)
+		{
+			float8		min_distance = DBL_MAX;
+
+			for (int k = 0; k < ncenters; k++)
+			{
+				float8		distance = euclidean_distance_internal(dimensions,
+																   centers[k],
+																   vectors[j]);
+
+				if (float8_lt(distance, min_distance))
+				{
+					min_distance = distance;
+					clusters[j] = k;
+				}
+			}
+		}
+
+		/* clear centers */
+		for (int j = 0; j < ncenters; j++)
+		{
+			if (i == 0)
+				centers[j] = palloc0(dimensions * sizeof(float8));
+			else
+				memset(centers[j], 0, dimensions * sizeof(float8));
+
+			pops[j] = 0;
+		}
+
+		/* set each center to average of all vectors in cluster */
+		for (int j = 0; j < nsamples; j++)
+		{
+			int			cluster = clusters[j];
+			float8	   *center = centers[cluster];
+
+			for (int k = 0; k < dimensions; k++)
+				center[k] = float8_pl(center[k], vectors[j][k]);
+
+			pops[cluster]++;
+		}
+
+		for (int j = 0; j < ncenters; j++)
+		{
+			for (int k = 0; k < dimensions; k++)
+				centers[j][k] = float8_div(centers[j][k], pops[j]);
+		}
+	}
+
+	cdata = palloc(dimensions * sizeof(Datum));
+	for (int i = 0; i < ncenters; i++)
+	{
+		Datum		values[1];
+		bool		nulls[1];
+
+		for (int j = 0; j < dimensions; j++)
+			cdata[j] = Float8GetDatumFast(centers[i][j]);
+
+		values[0] = PointerGetDatum(construct_array(cdata, dimensions,
+													FLOAT8OID, sizeof(float8),
+													FLOAT8PASSBYVAL,
+													TYPALIGN_DOUBLE));
+		nulls[0] = false;
+
+		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+	}
+
+	return (Datum) 0;
+}
+
+/*
+ * Returns the index of the closest vector.
+ */
+Datum
+closest_vector(PG_FUNCTION_ARGS)
+{
+	ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
+	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(1);
+	int			alen = ARR_NDIM(a) > 1 ? ARR_DIMS(a)[0] : ARR_NDIM(a);
+	int			dimensions = float8_array_is_vector_internal(v, true);
+	float8		min_dist = DBL_MAX;
+	float8	   *vd = (float8 *) ARR_DATA_PTR(v);
+	int			idx = -1;
+
+	if (alen == 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("must provide at least one candidate vector")));
+
+	if (ARR_HASNULL(a) && array_contains_nulls(a))
+		ereport(ERROR,
+				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+				 errmsg("vectors must not contain nulls")));
+
+	if ((ARR_NDIM(a) > 1 ? ARR_DIMS(a)[1] : ARR_DIMS(a)[0]) != dimensions)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("vectors must have the same number of elements")));
+
+	for (int i = 0; i < alen; i++)
+	{
+		float8	   *c;
+		float8		d;
+		int			r = -1;
+
+		if (pg_mul_s32_overflow(i, dimensions, &r))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("candidate vectors too large")));
+
+		c = ((float8 *) ARR_DATA_PTR(a)) + r;
+		d = euclidean_distance_internal(dimensions, c, vd);
+
+		if (r == -1 || float8_lt(d, min_dist))
+		{
+			min_dist = d;
+			idx = i;
+		}
+	}
+
+	PG_RETURN_INT32(idx);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b2bc81b15f..a07c5b10b0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1735,6 +1735,55 @@
   proargtypes => 'internal oid internal int2 internal',
   prosrc => 'arraycontjoinsel' },
 
+{ oid => '4642', descr => 'verify float8 array is a vector',
+  proname => 'float8_array_is_vector', prorettype => 'bool',
+  proargtypes => '_float8',
+  prosrc => 'float8_array_is_vector' },
+{ oid => '4643', descr => 'euclidean norm',
+  proname => 'euclidean_norm', prorettype => 'float8',
+  proargtypes => '_float8',
+  prosrc => 'euclidean_norm' },
+{ oid => '4644', descr => 'dot product',
+  proname => 'dot_product', prorettype => 'float8',
+  proargtypes => '_float8 _float8',
+  prosrc => 'dot_product' },
+{ oid => '4645', descr => 'normalize a vector',
+  proname => 'normalize_vector', prorettype => '_float8',
+  proargtypes => '_float8',
+  prosrc => 'normalize_vector' },
+{ oid => '4646', descr => 'squared euclidean distance',
+  proname => 'squared_euclidean_distance', prorettype => 'float8',
+  proargtypes => '_float8 _float8',
+  prosrc => 'squared_euclidean_distance' },
+{ oid => '4647', descr => 'euclidean distance',
+  proname => 'euclidean_distance', prorettype => 'float8',
+  proargtypes => '_float8 _float8',
+  prosrc => 'euclidean_distance' },
+{ oid => '4648', descr => 'cosine distance',
+  proname => 'cosine_distance', prorettype => 'float8',
+  proargtypes => '_float8 _float8',
+  prosrc => 'cosine_distance' },
+{ oid => '4649', descr => 'taxicab distance',
+  proname => 'taxicab_distance', prorettype => 'float8',
+  proargtypes => '_float8 _float8',
+  prosrc => 'taxicab_distance' },
+{ oid => '4650', descr => 'chebyshev distance',
+  proname => 'chebyshev_distance', prorettype => 'float8',
+  proargtypes => '_float8 _float8',
+  prosrc => 'chebyshev_distance' },
+{ oid => '4651', descr => 'standard unit vector',
+  proname => 'standard_unit_vector', prorettype => '_float8',
+  proargtypes => 'int4 int4',
+  prosrc => 'standard_unit_vector' },
+{ oid => '4652', descr => 'kmeans',
+  proname => 'kmeans', prorettype => '_float8', proretset => 't',
+  proargtypes => 'int4 _float8 int4', prorows => '1000',
+  prosrc => 'kmeans' },
+{ oid => '4653', descr => 'closest vector',
+  proname => 'closest_vector', prorettype => 'int4',
+  proargtypes => '_float8 _float8',
+  prosrc => 'closest_vector' },
+
 { oid => '764', descr => 'large object import',
   proname => 'lo_import', provolatile => 'v', proparallel => 'u',
   prorettype => 'oid', proargtypes => 'text', prosrc => 'be_lo_import' },
diff --git a/src/test/regress/expected/vectors.out b/src/test/regress/expected/vectors.out
new file mode 100644
index 0000000000..846f02433c
--- /dev/null
+++ b/src/test/regress/expected/vectors.out
@@ -0,0 +1,154 @@
+SET extra_float_digits = -1;
+-- float8_array_is_vector
+SELECT float8_array_is_vector(ARRAY[]::float8[]);
+ float8_array_is_vector 
+------------------------
+ f
+(1 row)
+
+SELECT float8_array_is_vector('{{1.0},{1.0}}');
+ float8_array_is_vector 
+------------------------
+ f
+(1 row)
+
+SELECT float8_array_is_vector(ARRAY[NULL]::float8[]);
+ float8_array_is_vector 
+------------------------
+ f
+(1 row)
+
+SELECT float8_array_is_vector('{1,2,3}');
+ float8_array_is_vector 
+------------------------
+ t
+(1 row)
+
+-- euclidean_norm
+SELECT euclidean_norm('{1,2,3}');
+ euclidean_norm  
+-----------------
+ 3.7416573867739
+(1 row)
+
+-- normalize_vector
+SELECT normalize_vector('{0,0,0}');
+ERROR:  division by zero
+SELECT normalize_vector('{1,2,3}');
+                   normalize_vector                   
+------------------------------------------------------
+ {0.26726124191242,0.53452248382485,0.80178372573727}
+(1 row)
+
+-- dot_product
+SELECT dot_product('{1,2,3}', '{4,5,6}');
+ dot_product 
+-------------
+          32
+(1 row)
+
+-- squared_euclidean_distance
+SELECT squared_euclidean_distance('{1,2}', '{4,5,6}');
+ERROR:  vectors must have the same number of elements
+SELECT squared_euclidean_distance('{1,2,3}', '{4,5,6}');
+ squared_euclidean_distance 
+----------------------------
+                         27
+(1 row)
+
+-- euclidean_distance
+SELECT euclidean_distance('{1,2}', '{4,5,6}');
+ERROR:  vectors must have the same number of elements
+SELECT euclidean_distance('{1,2,3}', '{4,5,6}');
+ euclidean_distance 
+--------------------
+    5.1961524227066
+(1 row)
+
+-- cosine_distance
+SELECT cosine_distance('{1,2}', '{4,5,6}');
+ERROR:  vectors must have the same number of elements
+SELECT cosine_distance('{1,2,3}', '{4,5,6}');
+  cosine_distance  
+-------------------
+ 0.025368153802924
+(1 row)
+
+-- taxicab_distance
+SELECT taxicab_distance('{1,2}', '{4,5,6}');
+ERROR:  vectors must have the same number of elements
+SELECT taxicab_distance('{1,2,3}', '{4,5,6}');
+ taxicab_distance 
+------------------
+                9
+(1 row)
+
+-- chebyshev_distance
+SELECT chebyshev_distance('{1,2}', '{4,5,6}');
+ERROR:  vectors must have the same number of elements
+SELECT chebyshev_distance('{1,2,3}', '{4,5,6}');
+ chebyshev_distance 
+--------------------
+                  3
+(1 row)
+
+-- standard_unit_vector
+SELECT standard_unit_vector(0, 0);
+ERROR:  standard unit vectors must have at least one element
+SELECT standard_unit_vector(1, 0);
+ERROR:  cannot set nonexistent element to one
+SELECT standard_unit_vector(1, 2);
+ERROR:  cannot set nonexistent element to one
+SELECT standard_unit_vector(3, 1);
+ standard_unit_vector 
+----------------------
+ {1,0,0}
+(1 row)
+
+SELECT standard_unit_vector(3, 3);
+ standard_unit_vector 
+----------------------
+ {0,0,1}
+(1 row)
+
+-- kmeans
+SELECT * FROM kmeans(0, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1);
+ERROR:  must request at least one center
+SELECT * FROM kmeans(5, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1);
+ERROR:  number of sample vectors must be greater than or equal to the number of requested centers
+SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,NULL}}', 1);
+ERROR:  vectors must not contain nulls
+SELECT * FROM kmeans(2, ARRAY[[],[]]::float8[][], 1);
+ERROR:  number of sample vectors must be greater than or equal to the number of requested centers
+SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', -1);
+ERROR:  must request at least one iteration
+SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1) ORDER BY 1;
+     kmeans     
+----------------
+ {2.5,3.5,4.5}
+ {8.5,9.5,10.5}
+(2 rows)
+
+-- closest_vector
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', ARRAY[]::float8[]);
+ERROR:  vectors must have at least one element
+SELECT closest_vector(ARRAY[]::float8[], '{1,2,3}');
+ERROR:  must provide at least one candidate vector
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,NULL}}', '{1,2,3}');
+ERROR:  vectors must not contain nulls
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{1,2}');
+ERROR:  vectors must have the same number of elements
+SELECT closest_vector('{1,2,3}', '{1,2}');
+ERROR:  vectors must have the same number of elements
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{10,11,12}');
+ closest_vector 
+----------------
+              2
+(1 row)
+
+SELECT closest_vector('{1,2,3}', '{2,3,4}');
+ closest_vector 
+----------------
+              0
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 3624035639..db453b4cdd 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -78,7 +78,7 @@ test: brin_bloom brin_multi
 # psql depends on create_am
 # amutils depends on geometry, create_index_spgist, hash_index, brin
 # ----------
-test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort create_role
+test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.icu.utf8 incremental_sort create_role vectors
 
 # collate.*.utf8 tests cannot be run in parallel with each other
 test: rules psql psql_crosstab amutils stats_ext collate.linux.utf8 collate.windows.win1252
diff --git a/src/test/regress/sql/vectors.sql b/src/test/regress/sql/vectors.sql
new file mode 100644
index 0000000000..c2f45fff61
--- /dev/null
+++ b/src/test/regress/sql/vectors.sql
@@ -0,0 +1,61 @@
+SET extra_float_digits = -1;
+
+-- float8_array_is_vector
+SELECT float8_array_is_vector(ARRAY[]::float8[]);
+SELECT float8_array_is_vector('{{1.0},{1.0}}');
+SELECT float8_array_is_vector(ARRAY[NULL]::float8[]);
+SELECT float8_array_is_vector('{1,2,3}');
+
+-- euclidean_norm
+SELECT euclidean_norm('{1,2,3}');
+
+-- normalize_vector
+SELECT normalize_vector('{0,0,0}');
+SELECT normalize_vector('{1,2,3}');
+
+-- dot_product
+SELECT dot_product('{1,2,3}', '{4,5,6}');
+
+-- squared_euclidean_distance
+SELECT squared_euclidean_distance('{1,2}', '{4,5,6}');
+SELECT squared_euclidean_distance('{1,2,3}', '{4,5,6}');
+
+-- euclidean_distance
+SELECT euclidean_distance('{1,2}', '{4,5,6}');
+SELECT euclidean_distance('{1,2,3}', '{4,5,6}');
+
+-- cosine_distance
+SELECT cosine_distance('{1,2}', '{4,5,6}');
+SELECT cosine_distance('{1,2,3}', '{4,5,6}');
+
+-- taxicab_distance
+SELECT taxicab_distance('{1,2}', '{4,5,6}');
+SELECT taxicab_distance('{1,2,3}', '{4,5,6}');
+
+-- chebyshev_distance
+SELECT chebyshev_distance('{1,2}', '{4,5,6}');
+SELECT chebyshev_distance('{1,2,3}', '{4,5,6}');
+
+-- standard_unit_vector
+SELECT standard_unit_vector(0, 0);
+SELECT standard_unit_vector(1, 0);
+SELECT standard_unit_vector(1, 2);
+SELECT standard_unit_vector(3, 1);
+SELECT standard_unit_vector(3, 3);
+
+-- kmeans
+SELECT * FROM kmeans(0, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1);
+SELECT * FROM kmeans(5, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1);
+SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,NULL}}', 1);
+SELECT * FROM kmeans(2, ARRAY[[],[]]::float8[][], 1);
+SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', -1);
+SELECT * FROM kmeans(2, '{{1,2,3},{4,5,6},{7,8,9},{10,11,12}}', 1) ORDER BY 1;
+
+-- closest_vector
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', ARRAY[]::float8[]);
+SELECT closest_vector(ARRAY[]::float8[], '{1,2,3}');
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,NULL}}', '{1,2,3}');
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{1,2}');
+SELECT closest_vector('{1,2,3}', '{1,2}');
+SELECT closest_vector('{{1,2,3},{4,5,6},{7,8,9}}', '{10,11,12}');
+SELECT closest_vector('{1,2,3}', '{2,3,4}');
-- 
2.25.1

>From 5db7d86a9a1b519ee8d8bdc5ce82a09f995cc929 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Mon, 20 Mar 2023 14:23:39 -0700
Subject: [PATCH v1 3/3] Add support for OpenBLAS.

If PostgreSQL is built with OpenBLAS support, euclidean_norm() and
dot_product() use the optimized OpenBLAS routines instead of the
open-coded formulas.
---
 configure                         | 269 ++++++++++++++++++++++++++++++
 configure.ac                      |  35 ++++
 doc/src/sgml/install-windows.sgml |   9 +
 doc/src/sgml/installation.sgml    |  19 +++
 meson.build                       |  21 +++
 meson_options.txt                 |   3 +
 src/Makefile.global.in            |   1 +
 src/backend/utils/adt/vector.c    |  12 ++
 src/include/pg_config.h.in        |   6 +
 src/makefiles/meson.build         |   1 +
 src/tools/msvc/Solution.pm        |  13 ++
 src/tools/msvc/config_default.pl  |   1 +
 12 files changed, 390 insertions(+)

diff --git a/configure b/configure
index 15daccc87f..d8f35ec6a3 100755
--- a/configure
+++ b/configure
@@ -649,6 +649,7 @@ PG_CRC32C_OBJS
 CFLAGS_CRC
 LIBOBJS
 OPENSSL
+OPENBLAS
 ZSTD
 LZ4
 UUID_LIBS
@@ -695,6 +696,9 @@ STRIP_STATIC_LIB
 STRIP
 LDFLAGS_SL
 LDFLAGS_EX
+OPENBLAS_LIBS
+OPENBLAS_CFLAGS
+with_openblas
 ZSTD_LIBS
 ZSTD_CFLAGS
 with_zstd
@@ -872,6 +876,7 @@ with_system_tzdata
 with_zlib
 with_lz4
 with_zstd
+with_openblas
 with_ssl
 with_openssl
 enable_largefile
@@ -902,6 +907,8 @@ LZ4_CFLAGS
 LZ4_LIBS
 ZSTD_CFLAGS
 ZSTD_LIBS
+OPENBLAS_CFLAGS
+OPENBLAS_LIBS
 LDFLAGS_EX
 LDFLAGS_SL
 PERL
@@ -1584,6 +1591,7 @@ Optional Packages:
   --without-zlib          do not use Zlib
   --with-lz4              build with LZ4 support
   --with-zstd             build with ZSTD support
+  --with-openblas         build with OpenBLAS support
   --with-ssl=LIB          use LIB for SSL/TLS support (openssl)
   --with-openssl          obsolete spelling of --with-ssl=openssl
 
@@ -1614,6 +1622,10 @@ Some influential environment variables:
   LZ4_LIBS    linker flags for LZ4, overriding pkg-config
   ZSTD_CFLAGS C compiler flags for ZSTD, overriding pkg-config
   ZSTD_LIBS   linker flags for ZSTD, overriding pkg-config
+  OPENBLAS_CFLAGS
+              C compiler flags for OPENBLAS, overriding pkg-config
+  OPENBLAS_LIBS
+              linker flags for OPENBLAS, overriding pkg-config
   LDFLAGS_EX  extra linker flags for linking executables only
   LDFLAGS_SL  extra linker flags for linking shared libraries only
   PERL        Perl program
@@ -9605,6 +9617,148 @@ fi
     esac
   done
 fi
+
+#
+# OpenBLAS
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with OpenBLAS support" >&5
+$as_echo_n "checking whether to build with OpenBLAS support... " >&6; }
+
+
+
+# Check whether --with-openblas was given.
+if test "${with_openblas+set}" = set; then :
+  withval=$with_openblas;
+  case $withval in
+    yes)
+
+$as_echo "#define USE_OPENBLAS 1" >>confdefs.h
+
+      ;;
+    no)
+      :
+      ;;
+    *)
+      as_fn_error $? "no argument expected for --with-openblas option" "$LINENO" 5
+      ;;
+  esac
+
+else
+  with_openblas=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_openblas" >&5
+$as_echo "$with_openblas" >&6; }
+
+
+if test "$with_openblas" = yes; then
+
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for openblas" >&5
+$as_echo_n "checking for openblas... " >&6; }
+
+if test -n "$OPENBLAS_CFLAGS"; then
+    pkg_cv_OPENBLAS_CFLAGS="$OPENBLAS_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"openblas\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "openblas") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_OPENBLAS_CFLAGS=`$PKG_CONFIG --cflags "openblas" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+if test -n "$OPENBLAS_LIBS"; then
+    pkg_cv_OPENBLAS_LIBS="$OPENBLAS_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+    if test -n "$PKG_CONFIG" && \
+    { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"openblas\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "openblas") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_OPENBLAS_LIBS=`$PKG_CONFIG --libs "openblas" 2>/dev/null`
+		      test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+    pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+        _pkg_short_errors_supported=yes
+else
+        _pkg_short_errors_supported=no
+fi
+        if test $_pkg_short_errors_supported = yes; then
+	        OPENBLAS_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "openblas" 2>&1`
+        else
+	        OPENBLAS_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "openblas" 2>&1`
+        fi
+	# Put the nasty error message in config.log where it belongs
+	echo "$OPENBLAS_PKG_ERRORS" >&5
+
+	as_fn_error $? "Package requirements (openblas) were not met:
+
+$OPENBLAS_PKG_ERRORS
+
+Consider adjusting the PKG_CONFIG_PATH environment variable if you
+installed software in a non-standard prefix.
+
+Alternatively, you may set the environment variables OPENBLAS_CFLAGS
+and OPENBLAS_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details." "$LINENO" 5
+elif test $pkg_failed = untried; then
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+	{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error $? "The pkg-config script could not be found or is too old.  Make sure it
+is in your PATH or set the PKG_CONFIG environment variable to the full
+path to pkg-config.
+
+Alternatively, you may set the environment variables OPENBLAS_CFLAGS
+and OPENBLAS_LIBS to avoid the need to call pkg-config.
+See the pkg-config man page for more details.
+
+To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+See \`config.log' for more details" "$LINENO" 5; }
+else
+	OPENBLAS_CFLAGS=$pkg_cv_OPENBLAS_CFLAGS
+	OPENBLAS_LIBS=$pkg_cv_OPENBLAS_LIBS
+        { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+
+fi
+  # We only care about -I, -D, and -L switches;
+  # note that -lopenblas will be added by AC_CHECK_LIB below.
+  for pgac_option in $OPENBLAS_CFLAGS; do
+    case $pgac_option in
+      -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+    esac
+  done
+  for pgac_option in $OPENBLAS_LIBS; do
+    case $pgac_option in
+      -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+    esac
+  done
+fi
+
 #
 # Assignments
 #
@@ -13276,6 +13430,56 @@ fi
 
 fi
 
+if test "$with_openblas" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for cblas_dnrm2 in -lopenblas" >&5
+$as_echo_n "checking for cblas_dnrm2 in -lopenblas... " >&6; }
+if ${ac_cv_lib_openblas_cblas_dnrm2+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lopenblas  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char cblas_dnrm2 ();
+int
+main ()
+{
+return cblas_dnrm2 ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_openblas_cblas_dnrm2=yes
+else
+  ac_cv_lib_openblas_cblas_dnrm2=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_openblas_cblas_dnrm2" >&5
+$as_echo "$ac_cv_lib_openblas_cblas_dnrm2" >&6; }
+if test "x$ac_cv_lib_openblas_cblas_dnrm2" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBOPENBLAS 1
+_ACEOF
+
+  LIBS="-lopenblas $LIBS"
+
+else
+  as_fn_error $? "library 'openblas' is required for OPENBLAS support" "$LINENO" 5
+fi
+
+fi
+
 # Note: We can test for libldap_r only after we know PTHREAD_LIBS;
 # also, on AIX, we may need to have openssl in LIBS for this step.
 if test "$with_ldap" = yes ; then
@@ -14064,6 +14268,71 @@ else
 fi
 
 
+fi
+
+if test -z "$OPENBLAS"; then
+  for ac_prog in openblas
+do
+  # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
+$as_echo_n "checking for $ac_word... " >&6; }
+if ${ac_cv_path_OPENBLAS+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  case $OPENBLAS in
+  [\\/]* | ?:[\\/]*)
+  ac_cv_path_OPENBLAS="$OPENBLAS" # Let the user override the test with a path.
+  ;;
+  *)
+  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
+for as_dir in $PATH
+do
+  IFS=$as_save_IFS
+  test -z "$as_dir" && as_dir=.
+    for ac_exec_ext in '' $ac_executable_extensions; do
+  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
+    ac_cv_path_OPENBLAS="$as_dir/$ac_word$ac_exec_ext"
+    $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
+    break 2
+  fi
+done
+  done
+IFS=$as_save_IFS
+
+  ;;
+esac
+fi
+OPENBLAS=$ac_cv_path_OPENBLAS
+if test -n "$OPENBLAS"; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OPENBLAS" >&5
+$as_echo "$OPENBLAS" >&6; }
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+
+
+  test -n "$OPENBLAS" && break
+done
+
+else
+  # Report the value of OPENBLAS in configure's output in all cases.
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for OPENBLAS" >&5
+$as_echo_n "checking for OPENBLAS... " >&6; }
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OPENBLAS" >&5
+$as_echo "$OPENBLAS" >&6; }
+fi
+
+if test "$with_openblas" = yes; then
+  ac_fn_c_check_header_mongrel "$LINENO" "cblas.h" "ac_cv_header_cblas_h" "$ac_includes_default"
+if test "x$ac_cv_header_cblas_h" = xyes; then :
+
+else
+  as_fn_error $? "cblas.h header file is required for OPENBLAS" "$LINENO" 5
+fi
+
+
 fi
 
 if test "$with_gssapi" = yes ; then
diff --git a/configure.ac b/configure.ac
index 97f5be6c73..e20ba144ec 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1147,6 +1147,32 @@ if test "$with_zstd" = yes; then
     esac
   done
 fi
+
+#
+# OpenBLAS
+#
+AC_MSG_CHECKING([whether to build with OpenBLAS support])
+PGAC_ARG_BOOL(with, openblas, no, [build with OpenBLAS support],
+              [AC_DEFINE([USE_OPENBLAS], 1, [Define to 1 to build with OpenBLAS support. (--with-openblas)])])
+AC_MSG_RESULT([$with_openblas])
+AC_SUBST(with_openblas)
+
+if test "$with_openblas" = yes; then
+  PKG_CHECK_MODULES(OPENBLAS, openblas)
+  # We only care about -I, -D, and -L switches;
+  # note that -lopenblas will be added by AC_CHECK_LIB below.
+  for pgac_option in $OPENBLAS_CFLAGS; do
+    case $pgac_option in
+      -I*|-D*) CPPFLAGS="$CPPFLAGS $pgac_option";;
+    esac
+  done
+  for pgac_option in $OPENBLAS_LIBS; do
+    case $pgac_option in
+      -L*) LDFLAGS="$LDFLAGS $pgac_option";;
+    esac
+  done
+fi
+
 #
 # Assignments
 #
@@ -1418,6 +1444,10 @@ if test "$with_zstd" = yes ; then
   AC_CHECK_LIB(zstd, ZSTD_compress, [], [AC_MSG_ERROR([library 'zstd' is required for ZSTD support])])
 fi
 
+if test "$with_openblas" = yes ; then
+  AC_CHECK_LIB(openblas, cblas_dnrm2, [], [AC_MSG_ERROR([library 'openblas' is required for OPENBLAS support])])
+fi
+
 # Note: We can test for libldap_r only after we know PTHREAD_LIBS;
 # also, on AIX, we may need to have openssl in LIBS for this step.
 if test "$with_ldap" = yes ; then
@@ -1563,6 +1593,11 @@ if test "$with_zstd" = yes; then
   AC_CHECK_HEADER(zstd.h, [], [AC_MSG_ERROR([zstd.h header file is required for ZSTD])])
 fi
 
+PGAC_PATH_PROGS(OPENBLAS, openblas)
+if test "$with_openblas" = yes; then
+  AC_CHECK_HEADER(cblas.h, [], [AC_MSG_ERROR([cblas.h header file is required for OPENBLAS])])
+fi
+
 if test "$with_gssapi" = yes ; then
   AC_CHECK_HEADERS(gssapi/gssapi.h, [],
 	[AC_CHECK_HEADERS(gssapi.h, [], [AC_MSG_ERROR([gssapi.h header file is required for GSSAPI])])])
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index cbc70a039c..a6409913d5 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -309,6 +309,15 @@ $ENV{MSBFLAGS}="/m";
      </para></listitem>
     </varlistentry>
 
+    <varlistentry>
+     <term><productname>OpenBLAS</productname></term>
+     <listitem><para>
+      Required for supporting optimized linear algebra routines using
+      <productname>OpenBLAS</productname>.  Binaries and source can be
+      downloaded from <ulink url="https://www.openblas.net/";></ulink>.
+     </para></listitem>
+    </varlistentry>
+
     <varlistentry>
      <term><productname>OpenSSL</productname></term>
      <listitem><para>
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 75dc81a0a9..aa1191ced4 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -1009,6 +1009,15 @@ build-postgresql:
        </listitem>
       </varlistentry>
 
+      <varlistentry id="configure-option-with-openblas">
+       <term><option>--with-openblas</option></term>
+       <listitem>
+        <para>
+         Build with <productname>OpenBLAS</productname> support.
+        </para>
+       </listitem>
+      </varlistentry>
+
       <varlistentry id="configure-option-with-ssl">
        <term><option>--with-ssl=<replaceable>LIBRARY</replaceable></option>
        <indexterm>
@@ -2481,6 +2490,16 @@ ninja install
       </listitem>
      </varlistentry>
 
+     <varlistentry id="configure-with-openblas-meson">
+      <term><option>-Dopenblas={ auto | enabled | disabled }</option></term>
+      <listitem>
+       <para>
+        Build with <productname>OpenBLAS</productname> support.  Defaults to
+        auto.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="configure-with-ssl-meson">
       <term><option>-Dssl={ auto | <replaceable>LIBRARY</replaceable> }</option>
       <indexterm>
diff --git a/meson.build b/meson.build
index 096044628c..28ec293a7e 100644
--- a/meson.build
+++ b/meson.build
@@ -838,6 +838,25 @@ endif
 
 
 
+###############################################################
+# Library: OpenBLAS
+###############################################################
+
+openblasopt = get_option('openblas')
+if not openblasopt.disabled()
+  openblas = dependency('openblas', required: openblasopt)
+
+  if openblas.found()
+    cdata.set('USE_OPENBLAS', 1)
+    cdata.set('HAVE_OPENBLAS', 1)
+  endif
+
+else
+  openblas = not_found_dep
+endif
+
+
+
 ###############################################################
 # Library: Tcl (for pltcl)
 #
@@ -2848,6 +2867,7 @@ backend_both_deps += [
   libintl,
   libxml,
   lz4,
+  openblas,
   pam,
   ssl,
   systemd,
@@ -3378,6 +3398,7 @@ if meson.version().version_compare('>=0.57')
       'llvm': llvm,
       'lz4': lz4,
       'nls': libintl,
+      'openblas': openblas,
       'openssl': ssl,
       'pam': pam,
       'plperl': perl_dep,
diff --git a/meson_options.txt b/meson_options.txt
index 5b44a8829d..fbe0cf2ca1 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -118,6 +118,9 @@ option('lz4', type : 'feature', value: 'auto',
 option('nls', type: 'feature', value: 'auto',
   description: 'native language support')
 
+option('openblas', type: 'feature', value: 'auto',
+  description: 'OpenBLAS support')
+
 option('pam', type : 'feature', value: 'auto',
   description: 'build with PAM support')
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 772b91261d..8956f9a9fc 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -196,6 +196,7 @@ with_llvm	= @with_llvm@
 with_system_tzdata = @with_system_tzdata@
 with_uuid	= @with_uuid@
 with_zlib	= @with_zlib@
+with_openblas = @with_openblas@
 enable_rpath	= @enable_rpath@
 enable_nls	= @enable_nls@
 enable_debug	= @enable_debug@
diff --git a/src/backend/utils/adt/vector.c b/src/backend/utils/adt/vector.c
index dba1fd7eef..2cec20c4c5 100644
--- a/src/backend/utils/adt/vector.c
+++ b/src/backend/utils/adt/vector.c
@@ -20,6 +20,10 @@
 
 #include <float.h>
 
+#ifdef USE_OPENBLAS
+#include "cblas.h"
+#endif
+
 #include "catalog/pg_type.h"
 #include "common/int.h"
 #include "common/pg_prng.h"
@@ -95,10 +99,14 @@ euclidean_norm_internal(int n, const float8 *a)
 {
 	float8		en = 0.0;
 
+#ifdef USE_OPENBLAS
+	en = cblas_dnrm2(n, a, 1);
+#else
 	for (int i = 0; i < n; i++)
 		en = float8_pl(en, float8_mul(a[i], a[i]));
 
 	en = float8_sqrt(en);
+#endif
 
 	return en;
 }
@@ -160,8 +168,12 @@ dot_product(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("vectors must have the same number of elements")));
 
+#ifdef USE_OPENBLAS
+	dp = cblas_ddot(an, ad, 1, bd, 1);
+#else
 	for (int i = 0; i < an; i++)
 		dp = float8_pl(dp, float8_mul(ad[i], bd[i]));
+#endif
 
 	PG_RETURN_FLOAT8(dp);
 }
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 6d572c3820..5d1452f346 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -322,6 +322,9 @@
 /* Define to 1 if you have the `mkdtemp' function. */
 #undef HAVE_MKDTEMP
 
+/* Define to 1 if you have the 'OpenBLAS' library (-lopenblas). */
+#undef HAVE_OPENBLAS
+
 /* Define to 1 if you have the `OPENSSL_init_ssl' function. */
 #undef HAVE_OPENSSL_INIT_SSL
 
@@ -730,6 +733,9 @@
 /* Define to select named POSIX semaphores. */
 #undef USE_NAMED_POSIX_SEMAPHORES
 
+/* Define to 1 to build with OpenBLAS support. (--with-openblas) */
+#undef USE_OPENBLAS
+
 /* Define to 1 to build with OpenSSL support. (--with-ssl=openssl) */
 #undef USE_OPENSSL
 
diff --git a/src/makefiles/meson.build b/src/makefiles/meson.build
index 13045cbd6e..47c75e1446 100644
--- a/src/makefiles/meson.build
+++ b/src/makefiles/meson.build
@@ -231,6 +231,7 @@ pgxs_deps = {
   'llvm': llvm,
   'lz4': lz4,
   'nls': libintl,
+  'openblas': openblas,
   'pam': pam,
   'perl': perl_dep,
   'python': python3_dep,
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index ef10cda576..a45dff7b30 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -304,6 +304,7 @@ sub GenerateFiles
 		HAVE_MEMORY_H               => 1,
 		HAVE_MEMSET_S               => undef,
 		HAVE_MKDTEMP                => undef,
+		HAVE_OPENBLAS               => undef,
 		HAVE_OPENSSL_INIT_SSL       => undef,
 		HAVE_OSSP_UUID_H            => undef,
 		HAVE_PAM_PAM_APPL_H         => undef,
@@ -436,6 +437,7 @@ sub GenerateFiles
 		USE_LDAP                   => $self->{options}->{ldap} ? 1 : undef,
 		USE_LLVM                   => undef,
 		USE_NAMED_POSIX_SEMAPHORES => undef,
+		USE_OPENBLAS               => undef,
 		USE_OPENSSL                => undef,
 		USE_PAM                    => undef,
 		USE_SLICING_BY_8_CRC32C    => undef,
@@ -485,6 +487,11 @@ sub GenerateFiles
 		$define{HAVE_LIBZSTD} = 1;
 		$define{USE_ZSTD}     = 1;
 	}
+	if ($self->{options}->{openblas})
+	{
+		$define{HAVE_OPENBLAS} = 1;
+		$define{USE_OPENBLAS}  = 1;
+	}
 	if ($self->{options}->{openssl})
 	{
 		$define{USE_OPENSSL} = 1;
@@ -1092,6 +1099,11 @@ sub AddProject
 		$proj->AddIncludeDir($self->{options}->{zstd} . '\include');
 		$proj->AddLibrary($self->{options}->{zstd} . '\lib\libzstd.lib');
 	}
+	if ($self->{options}->{openblas})
+	{
+		$proj->AddIncludeDir($self->{options}->{openblas} . '\include');
+		$proj->AddLibrary($self->{options}->{openblas} . '\lib\openblas.lib');
+	}
 	if ($self->{options}->{uuid})
 	{
 		$proj->AddIncludeDir($self->{options}->{uuid} . '\include');
@@ -1205,6 +1217,7 @@ sub GetFakeConfigure
 	$cfg .= ' --with-libxslt'       if ($self->{options}->{xslt});
 	$cfg .= ' --with-lz4'           if ($self->{options}->{lz4});
 	$cfg .= ' --with-zstd'          if ($self->{options}->{zstd});
+	$cfg .= ' --with-openblas'      if ($self->{options}->{openblas});
 	$cfg .= ' --with-gssapi'        if ($self->{options}->{gss});
 	$cfg .= ' --with-icu'           if ($self->{options}->{icu});
 	$cfg .= ' --with-tcl'           if ($self->{options}->{tcl});
diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl
index 70b44d1531..8689439c0b 100644
--- a/src/tools/msvc/config_default.pl
+++ b/src/tools/msvc/config_default.pl
@@ -16,6 +16,7 @@ our $config = {
 	icu       => undef,    # --with-icu=<path>
 	lz4       => undef,    # --with-lz4=<path>
 	zstd      => undef,    # --with-zstd=<path>
+	openblas  => undef,    # --with-openblas=<path>
 	nls       => undef,    # --enable-nls=<path>
 	tap_tests => undef,    # --enable-tap-tests
 	tcl       => undef,    # --with-tcl=<path>
-- 
2.25.1

vector search support

Reply via email to