I've been poking at the FTS problem example recently submitted by Artur Dabrowski. It contains an index declared like so:
CREATE INDEX idx_keywords_ger ON search_tab USING gin (to_tsvector('german'::regconfig, keywords)); which can be queried like so select * from search_tab where to_tsvector('german', keywords) @@ to_tsquery('german', 'whatever'); which is all fine and matches suggestions in our documentation. But I was dismayed to discover that ANALYZE wasn't storing any statistics for this expression index, which rather hobbles the planner's ability to produce useful estimates for the selectivity of such WHERE clauses. The reason for that turns out to be this bit in analyze.c: /* * Can't analyze if the opclass uses a storage type * different from the expression result type. We'd get * confused because the type shown in pg_attribute for * the index column doesn't match what we are getting * from the expression. Perhaps this can be fixed * someday, but for now, punt. */ if (exprType(indexkey) != Irel[ind]->rd_att->attrs[i]->atttypid) continue; Since a tsvector index does have a storage type different from tsvector (in fact, it's shown as TEXT), this code decides to punt and not analyze the column. I think it's past time we did something about this. After a bit of study of the code, it appears to me that it's not too difficult to fix: we just have to use the expression's result type rather than the index column's atttypid in the subsequent processing. ANALYZE never actually looks at the index column contents (indeed cannot, since our index AMs provide no API for extracting the contents), so atttypid isn't really all that relevant to it. However, this'll imply an API/ABI break for custom typanalyze functions: we'll need to store the actual type OID in struct VacAttrStats, and use that rather than looking to attr->atttypid or any of the other type-dependent fields of the Form_pg_attribute. So that seems to make it not practical to back-patch. I thought of an ugly hack that would avoid an API/ABI break: since VacAttrStats.attr is a copy of the pg_attribute data, we could scribble on it, changing atttypid, attlen, attbyval, etc to match the index expression result type. This seems pretty grotty, but it would allow the fix to be back-patched into existing branches. Comments? I'm not sure which way to jump here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers