Re: [HACKERS] high-dimensional knn-GIST tests (was Re: Cube extension kNN support)

Gordon Mohr Sat, 26 Oct 2013 20:36:24 -0700

On 10/23/13 9:05 PM, Alvaro Herrera wrote:

> Gordon Mohr wrote:
>
>> Thanks for this! I decided to give the patch a try at the bleeding
>> edge with some high-dimensional vectors, specifically the 1.4
>> million 1000-dimensional Freebase entity vectors from the Google
>> 'word2vec' project:
>>
>> https://code.google.com/p/word2vec/#Pre-trained_entity_vectors_with_Freebase_naming
>>
>> Unfortunately, here's what I found:
>
> I wonder if these results would improve with this patch:
> http://www.postgresql.org/message-id/efedc2bf-ab35-4e2c-911f-fc88da647...@gmail.com

Thanks for the pointer; I'd missed that relevant update from Stas Kelvich. I applied that patch, and reindexed.


On the 100-dimension, 850K vector set:

indexing:  1137s (vs. 1344s)
DATA size: 4.7G (vs. 5.0G)
top-11-nearest-neighbor query: 32s (vs ~57s)

On the 500-dimension, 100K vector set:

indexing: 756s (vs. 977s)
DATA size: 4.5G (vs. 4.8G)
top-11-nearest-neighbor query: 18s (vs ~46s)

So, moderate (5-20%) improvements in indexing time and size, and larger (40-60%) speedups in index-assisted (<->) queries... but those index-assisted queries are still ~10X+ slower than the sequence-scan (distance_euclid()) queries, so the existence of the knn-GIST index is still harming rather than helping performance.
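
For anyone who wants to check those percentage ranges against the raw numbers, here is a minimal Python sketch. The figures are copied from the results above; the dictionary keys and the function names (reduction_pct, summarize) are labels chosen only for this illustration, and the unpatched query times (~57s, ~46s) were approximate to begin with, so the derived query speedups are rough.

```python
# Figures copied from the results above, as (patched, unpatched) pairs.
# The unpatched query times were reported as approximate (~57s, ~46s).
results = {
    "100-dim, 850K vectors": {
        "indexing_s": (1137, 1344),
        "size_gb": (4.7, 5.0),
        "knn_query_s": (32, 57),
    },
    "500-dim, 100K vectors": {
        "indexing_s": (756, 977),
        "size_gb": (4.5, 4.8),
        "knn_query_s": (18, 46),
    },
}


def reduction_pct(patched, unpatched):
    """Percentage by which the patched figure is lower than the unpatched one."""
    return 100.0 * (unpatched - patched) / unpatched


def summarize(data):
    """Map each dataset to {metric: percent reduction}, rounded to one decimal."""
    return {
        dataset: {
            metric: round(reduction_pct(*pair), 1)
            for metric, pair in metrics.items()
        }
        for dataset, metrics in data.items()
    }


if __name__ == "__main__":
    for dataset, metrics in summarize(results).items():
        print(dataset)
        for metric, pct in metrics.items():
            print(f"  {metric}: {pct}% lower with the patch")
```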

Will update if my understanding changes; still interested to hear if I've missed a key factor/switch needed for these indexes to work well.


- Gordon Mohr