Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Jan Urbański wrote:
>> Oh, one important thing. You need to choose a bucket width for the LC
>> algorithm, that is decide after how many elements will you prune your
>> data structure. I chose to prune after every twenty tsvectors.
> Do you prune after X tsvectors regardless of the numbers of lexemes in
> them? I don't think that preserves the algorithm properties; if there's
> a bunch of very short tsvectors and then long tsvectors, the pruning
> would take place too early for the initial lexemes. I think you should
> count lexemes, not tsvectors.

Yeah. I haven't read the Lossy Counting paper in detail yet, but I
suspect that the mathematical proof of limited error doesn't work if the
pruning is done on a variable spacing. I don't see anything very wrong
with pruning intra-tsvector; the effects ought to average out, since the
point where you prune is going to move around with respect to the
tsvector boundaries.

			regards, tom lane
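
For illustration only, here is a rough sketch of what counting lexemes
rather than tsvectors could look like. It follows the textbook Lossy
Counting bookkeeping (a frequency plus a maximum-error term per entry,
pruned once per bucket) and is not the code from the patch under
discussion. The name lossy_count, the list-of-lists stand-in for
tsvectors, and the bucket width of 4 are all assumptions made up for
the example.

```python
from typing import Dict, Iterable, Tuple


def lossy_count(
    tsvectors: Iterable[Iterable[str]], bucket_width: int
) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """Hypothetical sketch: Lossy Counting with a fixed bucket width in lexemes.

    Returns (entries, n), where entries maps lexeme -> (frequency, max_error)
    and n is the total number of lexemes seen.
    """
    entries: Dict[str, Tuple[int, int]] = {}
    n = 0        # lexemes seen so far, across all tsvectors
    bucket = 1   # id of the current bucket

    for vec in tsvectors:
        for lexeme in vec:
            n += 1
            if lexeme in entries:
                freq, err = entries[lexeme]
                entries[lexeme] = (freq + 1, err)
            else:
                # A new entry may have been missed in every earlier bucket.
                entries[lexeme] = (1, bucket - 1)

            # Prune on a fixed lexeme spacing, even in the middle of a tsvector.
            if n % bucket_width == 0:
                entries = {
                    lex: (freq, err)
                    for lex, (freq, err) in entries.items()
                    if freq + err > bucket
                }
                bucket += 1

    return entries, n


# Toy input: short "tsvectors" first, then longer ones.
docs = [
    ["cat"],
    ["dog"],
    ["bird", "cat", "fish"],  # the first prune lands inside this one
    ["cat", "dog", "emu"],
    ["cat", "fox"],
    ["cat", "gnu"],
]
result, total = lossy_count(docs, bucket_width=4)
# result == {"cat": (5, 0)} and total == 12
```

Because the prune point depends only on the running lexeme count, the
result is the same however the lexemes are grouped into tsvectors, and
the spacing between prunes stays constant as the algorithm expects.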