[Issue] Can't recompile cube extension as PGXS, utils/float.h is not installed

2019-03-05 Thread Siarhei Siniak
1. Currently, cube extension has CUBE_MAX_DIM set as 100.
A recent github issue. [1]
2. To compile a custom version of the extension off the tree requires:
```
   make -C contrib/custom_cube USE_PGXS=1
```
3. But utils/float.h required by cube.c and cubeparse.y is not installed.
It's not present in the latest release file [2],
nor being installed when running
make install when compiling from git.
4. Current workaround is to use
```
#include "../../src/include/utils/float.h"
```
in cube.c and cubeparse.y when compiling in git tree.

[1] https://github.com/postgres/postgres/pull/38
[2] https://github.com/postgres/postgres/archive/REL_11_2.tar.gz


Fwd: [Issue] Can't recompile cube extension as PGXS, utils/float.h is not installed

2019-03-05 Thread Siarhei Siniak
-- Forwarded message -
From: Siarhei Siniak 
Date: Tue, 5 Mar 2019 at 23:31
Subject: Re: [Issue] Can't recompile cube extension as PGXS, utils/float.h
is not installed
To: Tom Lane 


>AFAICT, that file only exists in HEAD, not in any released branch, and
>it is installed during "make install" from HEAD.  Please be sure you
>are using installed files that match whatever branch you're trying
>to build from.
Yeah june and july 2018, a month in between. Just thought a release was not
so long ago.
```
git log  REL_11_BETA2..6bf0bc842bd75 --format=oneline -- | wc -l
# 175
```
Ok, then probably no more questions.


GiST limits on contrib/cube with dimension > 100?

2019-06-09 Thread Siarhei Siniak
I've been using cube extension recompiled with #define MAX_DIM 256.
But with a version 11.3 I'm getting the following error:failed to add item to 
index page in 
There's a regression unit test in contrib/cube/expected/cube.out:
CREATE TABLE test_cube (c cube);
\copy test_cube from 'data/test_cube.data'
CREATE INDEX test_cube_ix ON test_cube USING gist (c);
SELECT * FROM test_cube WHERE c && '(3000,1000),(0,0)' ORDER BY c;
I've created gist index in the same way, i.e. create index  on 
 using gist();
If MAX_DIM equals to 512, btree index complaints as:index row size 4112 exceeds 
maximum 2712 for index 
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.   
   
Consider a function index of an MD5 hash of the value, or use full text 
indexing.    

That's why 256 has been set.
But gist doesn't provide explanation on its error.
These are the places where the message might have been 
generated:src/backend/access/gist/gist.c:418:   
  elog(ERROR, "failed to add item to index page in \"%s\"", 
RelationGetRelationName(rel));
src/backend/access/gist/gist.c:540: 
elog(ERROR, "failed to add item to index page in \"%s\"",

Question is what restrains from setting MAX_DIM bigger than 100 in a custom 
recompiled cube extension version?In practice the error messages are too 
cryptic.
contrib/cube/cube.c has the following methods regarding GIST:/*
** GiST support methods
*/

PG_FUNCTION_INFO_V1(g_cube_consistent);
PG_FUNCTION_INFO_V1(g_cube_compress);
PG_FUNCTION_INFO_V1(g_cube_decompress);
PG_FUNCTION_INFO_V1(g_cube_penalty);
PG_FUNCTION_INFO_V1(g_cube_picksplit);
PG_FUNCTION_INFO_V1(g_cube_union);
PG_FUNCTION_INFO_V1(g_cube_same);
PG_FUNCTION_INFO_V1(g_cube_distance);

g_cube_compress has the following body:    PG_RETURN_DATUM(PG_GETARG_DATUM(0));

Does it just returns void pointer to the underlying x array?
cube data structure:
typedef struct NDBOX
{
    /* varlena header (do not touch directly!) */
    int32        vl_len_;

    /*--
     * Header contains info about NDBOX. For binary compatibility with old
     * versions, it is defined as "unsigned int".
     *
     * Following information is stored:
     *
     *    bits 0-7  : number of cube dimensions;
     *    bits 8-30 : unused, initialize to zero;
     *    bit  31   : point flag. If set, the upper right coordinates are not
     *                stored, and are implicitly the same as the lower left
     *                coordinates.
     *--
     */
    unsigned int header;

    /*
     * The lower left coordinates for each dimension come first, followed by
     * upper right coordinates unless the point flag is set.
     */
    double        x[FLEXIBLE_ARRAY_MEMBER];
} NDBOX;

Can it be a problem of not fitting into some limits when building or updating 
gist index for cube with MAX_DIM > 100?  


Re: GiST limits on contrib/cube with dimension > 100?

2019-06-11 Thread Siarhei Siniak
 Can you point out a failling unit test in the codebase?
P.S sorry for a late reply, has got this message in the spam folder )
Le lundi 10 juin 2019 à 14:57:32 UTC+3, Daniel Gustafsson  
a écrit :  
 
 > On 9 Jun 2019, at 20:05, Siarhei Siniak  wrote:
> 
> I've been using cube extension recompiled with
> #define MAX_DIM 256.
> 
> But with a version 11.3 I'm getting the following error:
> failed to add item to index page in 

This sounds like a variant of the issue reported on -bugs in
am6pr06mb57318c9882c021879dd4101ea3...@am6pr06mb5731.eurprd06.prod.outlook.com
and is also reproducible on HEAD.

cheers ./daniel  

Re: GiST limits on contrib/cube with dimension > 100?

2019-06-12 Thread Siarhei Siniak
 I've added debug prints to cube extension.g_custom_cube_a_f8
g_custom_cube_picksplit
are the only called methods
after that it prints
import psycopg2
import logging
import numpy
import unittest


import python.utils.logging
import python.custom_db.backends
import python.custom_db.backends.postgresql


class TestPostgresql(unittest.TestCase):
    def test_gist(self):
    b = python.custom_db.backends.postgresql.Postgresql(
    databases=dict(
    test=dict(
    minconn=1,
    maxconn=1
    )
    )
    )

    b.connect()

    try:
    c = b.get_connection(use='test')

    c2 = c[0]

    with c2.cursor() as cur:
    cur.execute(r'''
    drop table if exists test;
    create table test(image_id integer primary key, latent_code 
custom_cube);
    create index lc_idx on test using gist(latent_code);
    ''')
    c2.commit()

    with self.assertRaises(psycopg2.errors.InternalError_):
    for k in range(10):
    logging.info('test_postgresql.test_gist, k = %d' % k)
    cur.execute(
    r'''
    insert into test (image_id, latent_code)
    values (%s, custom_cube(%s))
    ''',
    [
    k,
    [float(x) for x in numpy.random.uniform(0, 1, 
512)],
    ]
    )
    c2.commit()
    finally:
    b.put_connection(c2, 'test')

```   

Re: GiST limits on contrib/cube with dimension > 100?

2019-06-12 Thread Siarhei Siniak
I've added debug prints to cube extension. g_custom_cube_a_f8 and 
g_custom_cube_picksplit are the only called methods.
After that it prints:
 ERROR:  failed to add item to index page in "lc_idx" 
Cube extension modifications:
    #define MAX_DIM (512)
Python test source code has been attached to the letter.

P.S.
sorry for the previous letter, didn't configure plain text composition

import psycopg2
import logging
import numpy
import unittest


import python.utils.logging
import python.custom_db.backends
import python.custom_db.backends.postgresql


class TestPostgresql(unittest.TestCase):
def test_gist(self):
b = python.custom_db.backends.postgresql.Postgresql(
databases=dict(
test=dict(
minconn=1,
maxconn=1
)
)
)

b.connect()

try:
c = b.get_connection(use='test')

c2 = c[0]

with c2.cursor() as cur:
cur.execute(r'''
drop table if exists test;
create table test(image_id integer primary key, latent_code custom_cube);
create index lc_idx on test using gist(latent_code);
''')
c2.commit()

with self.assertRaises(psycopg2.errors.InternalError_):
for k in range(10):
logging.info('test_postgresql.test_gist, k = %d' % k)
cur.execute(
r'''
insert into test (image_id, latent_code)
values (%s, custom_cube(%s))
''',
[
k,
[float(x) for x in numpy.random.uniform(0, 1, 512)],
]
)
c2.commit()
finally:
b.put_connection(c2, 'test')


Re: GiST limits on contrib/cube with dimension > 100?

2019-06-12 Thread Siarhei Siniak
A uniform set of points with a dimension 128 and type cube. That has a size of 
50 ** 3. Can be queried for a nearest neighbor at a speed of 10 queries per 
second with limit varying from 1 to 25.
It works better than when no index used at all. So gist gives here a speed up.
The documentation of postgresql doesn't mention complexity of such an index. 
I've got confused as to its speed.
Does postgresql allow for an approximate nearest neighbor search?
 https://github.com/erikbern/ann-benchmarks has a lot of efficient 
implementations.

Re: GiST limits on contrib/cube with dimension > 100?

2019-06-12 Thread Siarhei Siniak
 
>ANN is beyond concepts of SQL standard: database with index must return same 
>results as without index.
>I can add ANN to github.com/x4m/ags which is a fork of GiST.How to recompile 
>that extension and not to get a name conflict with a standard one?
I've renamed everything for cube extension. When I needed to fork it.But it's 
impractical.