[Issue] Can't recompile cube extension as PGXS, utils/float.h is not installed
1. Currently, cube extension has CUBE_MAX_DIM set as 100. A recent github issue. [1] 2. To compile a custom version of the extension off the tree requires: ``` make -C contrib/custom_cube USE_PGXS=1 ``` 3. But utils/float.h required by cube.c and cubeparse.y is not installed. It's not present in the latest release file [2], nor being installed when running make install when compiling from git. 4. Current workaround is to use ``` #include "../../src/include/utils/float.h" ``` in cube.c and cubeparse.y when compiling in git tree. [1] https://github.com/postgres/postgres/pull/38 [2] https://github.com/postgres/postgres/archive/REL_11_2.tar.gz
Fwd: [Issue] Can't recompile cube extension as PGXS, utils/float.h is not installed
-- Forwarded message - From: Siarhei Siniak Date: Tue, 5 Mar 2019 at 23:31 Subject: Re: [Issue] Can't recompile cube extension as PGXS, utils/float.h is not installed To: Tom Lane >AFAICT, that file only exists in HEAD, not in any released branch, and >it is installed during "make install" from HEAD. Please be sure you >are using installed files that match whatever branch you're trying >to build from. Yeah june and july 2018, a month in between. Just thought a release was not so long ago. ``` git log REL_11_BETA2..6bf0bc842bd75 --format=oneline -- | wc -l # 175 ``` Ok, then probably no more questions.
GiST limits on contrib/cube with dimension > 100?
I've been using cube extension recompiled with #define MAX_DIM 256. But with a version 11.3 I'm getting the following error:failed to add item to index page in There's a regression unit test in contrib/cube/expected/cube.out: CREATE TABLE test_cube (c cube); \copy test_cube from 'data/test_cube.data' CREATE INDEX test_cube_ix ON test_cube USING gist (c); SELECT * FROM test_cube WHERE c && '(3000,1000),(0,0)' ORDER BY c; I've created gist index in the same way, i.e. create index on using gist(); If MAX_DIM equals to 512, btree index complaints as:index row size 4112 exceeds maximum 2712 for index HINT: Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing. That's why 256 has been set. But gist doesn't provide explanation on its error. These are the places where the message might have been generated:src/backend/access/gist/gist.c:418: elog(ERROR, "failed to add item to index page in \"%s\"", RelationGetRelationName(rel)); src/backend/access/gist/gist.c:540: elog(ERROR, "failed to add item to index page in \"%s\"", Question is what restrains from setting MAX_DIM bigger than 100 in a custom recompiled cube extension version?In practice the error messages are too cryptic. contrib/cube/cube.c has the following methods regarding GIST:/* ** GiST support methods */ PG_FUNCTION_INFO_V1(g_cube_consistent); PG_FUNCTION_INFO_V1(g_cube_compress); PG_FUNCTION_INFO_V1(g_cube_decompress); PG_FUNCTION_INFO_V1(g_cube_penalty); PG_FUNCTION_INFO_V1(g_cube_picksplit); PG_FUNCTION_INFO_V1(g_cube_union); PG_FUNCTION_INFO_V1(g_cube_same); PG_FUNCTION_INFO_V1(g_cube_distance); g_cube_compress has the following body: PG_RETURN_DATUM(PG_GETARG_DATUM(0)); Does it just returns void pointer to the underlying x array? cube data structure: typedef struct NDBOX { /* varlena header (do not touch directly!) */ int32 vl_len_; /*-- * Header contains info about NDBOX. For binary compatibility with old * versions, it is defined as "unsigned int". * * Following information is stored: * * bits 0-7 : number of cube dimensions; * bits 8-30 : unused, initialize to zero; * bit 31 : point flag. If set, the upper right coordinates are not * stored, and are implicitly the same as the lower left * coordinates. *-- */ unsigned int header; /* * The lower left coordinates for each dimension come first, followed by * upper right coordinates unless the point flag is set. */ double x[FLEXIBLE_ARRAY_MEMBER]; } NDBOX; Can it be a problem of not fitting into some limits when building or updating gist index for cube with MAX_DIM > 100?
Re: GiST limits on contrib/cube with dimension > 100?
Can you point out a failling unit test in the codebase? P.S sorry for a late reply, has got this message in the spam folder ) Le lundi 10 juin 2019 à 14:57:32 UTC+3, Daniel Gustafsson a écrit : > On 9 Jun 2019, at 20:05, Siarhei Siniak wrote: > > I've been using cube extension recompiled with > #define MAX_DIM 256. > > But with a version 11.3 I'm getting the following error: > failed to add item to index page in This sounds like a variant of the issue reported on -bugs in am6pr06mb57318c9882c021879dd4101ea3...@am6pr06mb5731.eurprd06.prod.outlook.com and is also reproducible on HEAD. cheers ./daniel
Re: GiST limits on contrib/cube with dimension > 100?
I've added debug prints to cube extension.g_custom_cube_a_f8 g_custom_cube_picksplit are the only called methods after that it prints import psycopg2 import logging import numpy import unittest import python.utils.logging import python.custom_db.backends import python.custom_db.backends.postgresql class TestPostgresql(unittest.TestCase): def test_gist(self): b = python.custom_db.backends.postgresql.Postgresql( databases=dict( test=dict( minconn=1, maxconn=1 ) ) ) b.connect() try: c = b.get_connection(use='test') c2 = c[0] with c2.cursor() as cur: cur.execute(r''' drop table if exists test; create table test(image_id integer primary key, latent_code custom_cube); create index lc_idx on test using gist(latent_code); ''') c2.commit() with self.assertRaises(psycopg2.errors.InternalError_): for k in range(10): logging.info('test_postgresql.test_gist, k = %d' % k) cur.execute( r''' insert into test (image_id, latent_code) values (%s, custom_cube(%s)) ''', [ k, [float(x) for x in numpy.random.uniform(0, 1, 512)], ] ) c2.commit() finally: b.put_connection(c2, 'test') ```
Re: GiST limits on contrib/cube with dimension > 100?
I've added debug prints to cube extension. g_custom_cube_a_f8 and g_custom_cube_picksplit are the only called methods. After that it prints: ERROR: failed to add item to index page in "lc_idx" Cube extension modifications: #define MAX_DIM (512) Python test source code has been attached to the letter. P.S. sorry for the previous letter, didn't configure plain text composition import psycopg2 import logging import numpy import unittest import python.utils.logging import python.custom_db.backends import python.custom_db.backends.postgresql class TestPostgresql(unittest.TestCase): def test_gist(self): b = python.custom_db.backends.postgresql.Postgresql( databases=dict( test=dict( minconn=1, maxconn=1 ) ) ) b.connect() try: c = b.get_connection(use='test') c2 = c[0] with c2.cursor() as cur: cur.execute(r''' drop table if exists test; create table test(image_id integer primary key, latent_code custom_cube); create index lc_idx on test using gist(latent_code); ''') c2.commit() with self.assertRaises(psycopg2.errors.InternalError_): for k in range(10): logging.info('test_postgresql.test_gist, k = %d' % k) cur.execute( r''' insert into test (image_id, latent_code) values (%s, custom_cube(%s)) ''', [ k, [float(x) for x in numpy.random.uniform(0, 1, 512)], ] ) c2.commit() finally: b.put_connection(c2, 'test')
Re: GiST limits on contrib/cube with dimension > 100?
A uniform set of points with a dimension 128 and type cube. That has a size of 50 ** 3. Can be queried for a nearest neighbor at a speed of 10 queries per second with limit varying from 1 to 25. It works better than when no index used at all. So gist gives here a speed up. The documentation of postgresql doesn't mention complexity of such an index. I've got confused as to its speed. Does postgresql allow for an approximate nearest neighbor search? https://github.com/erikbern/ann-benchmarks has a lot of efficient implementations.
Re: GiST limits on contrib/cube with dimension > 100?
>ANN is beyond concepts of SQL standard: database with index must return same >results as without index. >I can add ANN to github.com/x4m/ags which is a fork of GiST.How to recompile >that extension and not to get a name conflict with a standard one? I've renamed everything for cube extension. When I needed to fork it.But it's impractical.