Tom Lane wrote:
Jay Levitt<jay.lev...@gmail.com> writes:
- I'm not sure how to represent arbitrary column-like features without
reinventing the wheel and putting a database in the database.
ISTM you could define a composite type and then create operators and an
operator class over that type. If you were trying to make a btree
opclass there might be a conflict with the built-in record_ops opclass,
but since you're only interested in GIST I don't see any real
roadblocks.
Perfect. Composite types are exactly what I need here; the application can
declare its composite type and provide distance functions for each member,
and the extension can use those to calculate similarity. How do I introspect
the composite type's pg_class to see what it contains? I assume there's a
better way than SPI on system catalogs :) Should I be using systable_*
functions from genam, or is there an in-memory tree? I feel like funcapi
gets me partway there but there's magic in the middle.
Can you think of any code that would serve as a sample, maybe whatever
creates the output for psql's \d?
The main potential disadvantage of this is that you'd have
the standard tuple header as overhead in index entries --- but maybe the
entries are large enough that that doesn't matter, and in any case you
could probably make use of the GIST "compress" method to get rid of most
of the header. Maybe convert to MinimalTuple, for instance, if you want
to still be able to leverage existing support code for field extraction.
Probably not worth it to save the 8 bytes; we're starting out at about 20
floats per row. But good to know for later optimization...
- Can domains have operators, or are operators defined on types?
I think the current state of play is that you can have such things but
the system will only consider them for exact type matches, so you might
need more explicit casts than you ordinarily would. However, we only
support domains over base types not composites, so this isn't really
going to be a profitable direction for you anyway.
Actually, as mentioned to Alexander, I'm thinking of domains per feature,
not for the overall tuple, so birthdate<->birthdate differs from
now()<->posting_date. Sounds like that might work - I'll play.
- Does KNN-GiST run into problems when<-> returns values that don't "make
sense" in the physical world?
Wouldn't surprise me. In general, non-strict index operators are a bad
idea. However, if the indexed entities are records, it would be
entirely your own business how you handled individual fields being NULL.
Yeah, that example conflated NULLs in the feature fields (we don't know your
birthdate) with <-> on the whole tuple. Oops.
I guess I can just test this by verifying that KNN-GiST ordered by distance
returns the same results as without the index.
Thanks for your help here.
Jay
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers