Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
> > Bruce,
> >> The ordering of the lexems was changed:
> >
> > What does that get us in terms of performance etc.?
>
> It was changed to support partial-match tsvector queries.  Without it,
> a partial match query would have to scan entire tsvectors instead
> of applying binary search.  I don't know if Oleg and Teodor did any
> actual performance tests on the size of the hit, but it seems like
> it could be pretty awful for large documents.

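To make Tom's point concrete, here is a minimal sketch of the difference between scanning and binary searching for a prefix match. This is not PostgreSQL code: the array of C strings stands in for a tsvector's lexemes, the function names are hypothetical, and plain strcmp() ordering is an assumption rather than the comparison the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tsvector: an array of C-string lexemes. */

/* qsort comparator: plain byte-wise order (an assumption for this sketch). */
int
cmp_lexeme(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Putting lexemes into sorted order costs a full O(n log n) sort. */
void
sort_lexemes(const char **lex, size_t n)
{
    qsort(lex, n, sizeof(lex[0]), cmp_lexeme);
}

/* Any order: may have to examine every lexeme, O(n).
 * Returns the index of the first match, or -1 if there is none. */
long
prefix_match_scan(const char *const *lex, size_t n, const char *prefix)
{
    size_t plen;

    assert(prefix != NULL);
    plen = strlen(prefix);
    for (size_t i = 0; i < n; i++)
        if (strncmp(lex[i], prefix, plen) == 0)
            return (long) i;
    return -1;
}

/* Sorted order only: binary search for the first lexeme >= prefix.
 * If any lexeme starts with the prefix, that position holds one.
 * Returns its index, or -1 if there is no match. O(log n). */
long
prefix_match_sorted(const char *const *lex, size_t n, const char *prefix)
{
    size_t plen, lo = 0, hi = n;

    assert(prefix != NULL);
    plen = strlen(prefix);
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (strcmp(lex[mid], prefix) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < n && strncmp(lex[lo], prefix, plen) == 0)
        return (long) lo;
    return -1;
}
```

The binary search is only valid once sort_lexemes() has run. On data stored in some other order, the choice is between the linear scan and paying for the sort first.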
I started thinking about the performance issues of the tsvector changes.
Teodor gave me this code for the conversion, which basically does:

    qsort_arg((void *) ARRPTR(t), t->size, sizeof(WordEntry), cmpLexeme, (void *) t);

So every cast has to do a sort, which would perform poorly for a large
document, and because we do not store the sorted result, the sort is
repeated on every access.  That might be an unacceptable performance
burden.

One idea would be, instead of a cast, to have pg_migrator rebuild the
tsvector columns with ALTER TABLE so that the 8.4 index code could be
used.  But then we might as well just tell users to migrate the
tsvector tables themselves, which is how pg_migrator behaves now.

Obviously we are still trying to figure out the best way to handle data
type changes;  I think as soon as we figure out a plan for tsvector we
can use that method for future changes.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +