On 8/21/24 10:29 AM, Robert Haas wrote:
I went ahead and committed these patches. I know there's some debate over whether we want to show the # of disabled nodes and if so whether it should be controlled by COSTS, and I suspect I haven't completely allayed David's concerns about the initial_cost_XXX functions although I think that I did the right thing. But, I don't have the impression that anyone is desperately opposed to the basic concept, so I think it makes sense to put these into the tree and see what happens. We have quite a bit of time left in this release cycle to uncover bugs, hear from users or other developers, etc. about what problems there may be with this. If we end up deciding to reverse course or need to fix a bunch of stuff, so be it, but let's see what the feedback is.
We hit an issue with pgvector[0] where a regular `SELECT count(*) FROM table`[1] is attempting to scan the index on the vector column when `enable_seqscan` is disabled. Credit to Andrew Kane (CC'd) for flagging it.
I was able to trace this back to e2225346. Here is a reproducer: Setup ===== CREATE EXTENSION vector;CREATE OR REPLACE FUNCTION public.generate_random_normalized_vector(dim integer)
RETURNS vector LANGUAGE SQL AS $$ SELECT public.l2_normalize(array_agg(random()::real)::vector) FROM generate_series(1, $1); $$; CREATE TABLE test (id int, embedding vector(128)); INSERT INTO test SELECT n, public.generate_random_normalized_vector(128) FROM generate_series(1,5) n; CREATE INDEX ON test USING hnsw (embedding vector_cosine_ops); Test ==== SET enable_seqscan TO off; EXPLAIN ANALYZE SELECT count(*) FROM test; Before e2225346: ----------------Aggregate (cost=10000041965.00..10000041965.01 rows=1 width=8) (actual time=189.864..189.864 rows
=1 loops=1)-> Seq Scan on test (cost=10000000000.00..10000040715.00 rows=5 width=0) (actual time=0.01
8..168.294 rows=5 loops=1) (4 rows) With e2225346: ------------- ERROR: cannot scan hnsw index without orderSome things to note with the ivfflat/hnsw index AMs[3] in pgvector are that they're used for "ORDER BY" scans exclusively. They currently don't support index only scans (noting as I tried reproducing the issue with GIST and couldn't do so because of that), but we wouldn't want to do a full table "count(*)" on a IVFFlat/HNSW index anyway as it'd be more expensive than just a full table scan.
Thanks, Jonathan [0] https://github.com/pgvector/pgvector [1] https://github.com/pgvector/pgvector/actions/runs/10519052945 [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e2225346 [3] https://github.com/pgvector/pgvector/blob/master/src/hnsw.c#L192
OpenPGP_signature.asc
Description: OpenPGP digital signature