Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-03 Thread Kevin Grittner
I wrote: > Tom Lane wrote: >> But the particular example shown here doesn't make a very good case >> for that, because it's hard to tell how much of a penalty would be >> taken in more realistic examples. > > Fair enough. We're in the early stages of moving to tsearch2 and I > haven't run acr

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-03 Thread Jesper Krogh
Tom Lane wrote: > It may well be that Jesper's identified a place where the GIN code could > be improved --- it seems like having the top-level search logic be more > aware of the AND/OR structure of queries would be useful. But the > particular example shown here doesn't make a very good case for

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-03 Thread Kevin Grittner
"Kevin Grittner" wrote: > I'm wondering if anyone has ever confirmed that probing for the more > frequent term through the index is *ever* a win, versus using the > index for the most common of the top level AND conditions and doing > the rest on recheck. s/most/least/ -Kevin -- Sent via p

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-03 Thread Kevin Grittner
Tom Lane wrote: > The answer to that clearly is to not index common terms My understanding is that we don't currently get statistics on how common the terms in a tsvector column are until we ANALYZE the *index* created from it. Seems like sort of a Catch-22. Also, if we exclude words which a

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-03 Thread Tom Lane
"Kevin Grittner" writes: > Perhaps I'm missing something. My point was that there are words > which are too common to be useful for index searches, yet uncommon > enough to usefully limit the results. These words could typically > benefit from tsearch2 style parsing and dictionaries; so declarin

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-03 Thread Kevin Grittner
Tom Lane wrote: > "Kevin Grittner" writes: >> Tom Lane wrote: >>> Any sane text search application is going to try to filter out >>> common words as stopwords; it's only the failure to do that that's >>> making this run slow. > >> I'd rather have the index used for the selective test, and appl

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-02 Thread Tom Lane
"Kevin Grittner" writes: > Tom Lane wrote: >> Any sane text search application is going to try to filter out >> common words as stopwords; it's only the failure to do that that's >> making this run slow. > I'd rather have the index used for the selective test, and apply the > remaining tests to

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-11-02 Thread Kevin Grittner
Tom Lane wrote: > Any sane text search application is going to try to filter out > common words as stopwords; it's only the failure to do that that's > making this run slow. Imagine a large table with a GIN index on a tsvector. The user wants a particular document, and is sure four words are

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-31 Thread Greg Stark
On Fri, Oct 30, 2009 at 8:11 PM, Tom Lane wrote: > But having said that, this particular test case is far from compelling. > Any sane text search application is going to try to filter out > common words as stopwords; it's only the failure to do that that's > making this run slow. Well it would be

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-30 Thread Jesper Krogh
Tom Lane wrote: > But having said that, this particular test case is far from compelling. > Any sane text search application is going to try to filter out > common words as stopwords; it's only the failure to do that that's > making this run slow. Below are test runs not with a "commonterm" but an

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-30 Thread Tom Lane
Jesper Krogh writes: > I've now got a test-set that can reproduce the problem where the two > fully equivalent queries ( > body_fts @@ to_tsquery('commonterm & nonexistingterm') > and > body_fts @@ to_tsquery('commonterm') AND body_fts @@ > to_tsquery('nonexistingterm')) > give a difference of x30

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-30 Thread Jesper Krogh
Hi. I've now got a test-set that can reproduce the problem where the two fully equivalent queries (body_fts @@ to_tsquery('commonterm & nonexistingterm') and body_fts @@ to_tsquery('commonterm') AND body_fts @@ to_tsquery('nonexistingterm')) give a difference of x300 in execution time. (grows wi

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Jeff Davis
On Fri, 2009-10-23 at 17:27 +0100, Richard Huxton wrote: > Returns an array of keys given a value to be queried; that is, query is > the value on the right-hand side of an indexable operator whose > left-hand side is the indexed column > > So - that is presumably two separate arrays of keys being

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Richard Huxton
Jeff Davis wrote: > On Fri, 2009-10-23 at 09:26 +0100, Richard Huxton wrote: >> That structure isn't exposed to the planner though, so it doesn't >> benefit from any re-ordering the planner would normally do for normal >> (exposed) AND/OR clauses. > > I don't think that explains it, because in the

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Jeff Davis
On Fri, 2009-10-23 at 09:45 +0200, jes...@krogh.cc wrote: > No, it definitely has to go visit the index/table to confirm findings, but > that's why I wrote Queryplan in the subject line, because this is only about > the strategy to pursue to obtain the results. And a strategy about > limiting the amo

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Jeff Davis
On Fri, 2009-10-23 at 09:26 +0100, Richard Huxton wrote: > That structure isn't exposed to the planner though, so it doesn't > benefit from any re-ordering the planner would normally do for normal > (exposed) AND/OR clauses. I don't think that explains it, because in the second plan you only see a

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Richard Huxton
jes...@krogh.cc wrote: >> That structure isn't exposed to the planner though, so it doesn't >> benefit from any re-ordering the planner would normally do for normal >> (exposed) AND/OR clauses. >> >> Now, to_tsquery() can't re-order the search terms because it doesn't >> know what column it's being

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread jesper
> jes...@krogh.cc wrote: >> >> So getting them with AND inbetween gives x100 better performance. All >> queries are run on "hot disk" repeated 3-5 times and the number are from >> the last run, so disk-read effects should be filtered away. >> >> Shouldn't it somehow just do what it allready are cap

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread Richard Huxton
jes...@krogh.cc wrote: > > So getting them with AND in between gives x100 better performance. All > queries are run on "hot disk" repeated 3-5 times and the numbers are from > the last run, so disk-read effects should be filtered away. > > Shouldn't it somehow just do what it already is capable o

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-23 Thread jesper
> On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote: >> > In effect, what you want are words that aren't searched (or stored) in >> > the index, but are included in the tsvector (so the RECHECK still >> > works). That sounds like it would solve your problem and it would >> reduce >> > index siz

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-22 Thread Jeff Davis
On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote: > This is indeed information on individual terms from the statistics that > enable this. My mistake, I didn't know it was that smart about it. > > In effect, what you want are words that aren't searched (or stored) in > > the index, but are i

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-22 Thread Jesper Krogh
Jeff Davis wrote: > On Thu, 2009-10-22 at 18:28 +0200, Jesper Krogh wrote: >> I somehow would expect the index-search to take advantage of the MCV >> information in the statistics that sort of translate it into a search >> and post-filtering (as PG's queryplanner usually does at the SQL-level).

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-22 Thread Jeff Davis
On Thu, 2009-10-22 at 18:28 +0200, Jesper Krogh wrote: > I somehow would expect the index-search to take advantage of the MCV > information in the statistics that sort of translate it into a search > and post-filtering (as PG's queryplanner usually does at the SQL-level). MCVs are full values t

Re: [PERFORM] Queryplan within FTS/GIN index -search.

2009-10-22 Thread Kevin Grittner
Jesper Krogh wrote: > I'm searching the gin-index for 1-5 terms, where all of them match > the same document. TERM1 is unique by itself, TERM2 is a bit more > common (52 rows), TERM3 more common, TERM4 close to all and TERM5 > all records. > Recheck Cond: (ftsbody_body_fts @@
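
The strategy debated throughout this thread (probe the index only for the least common term of a top-level AND, then recheck the remaining terms against each candidate row) can be illustrated with a toy model. The sketch below is not PostgreSQL's GIN code; the inverted index is a plain dictionary, and the function names and the "posting entries read" cost measure are assumptions made for illustration only. It also assumes posting-list lengths are known for free, which is exactly the statistics question raised in the thread.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search_all_terms(index, terms):
    """Fetch every term's posting list and intersect them.

    Returns (matching ids, number of posting entries read).
    """
    entries_read = 0
    result = None
    for term in terms:
        postings = index.get(term, set())
        entries_read += len(postings)
        result = set(postings) if result is None else result & postings
    return (result or set()), entries_read


def search_rarest_then_recheck(index, docs, terms):
    """Fetch only the shortest posting list, then recheck the other
    terms against each candidate document.

    Returns (matching ids, number of posting entries read).
    """
    rarest = min(terms, key=lambda t: len(index.get(t, set())))
    candidates = index.get(rarest, set())
    wanted = set(terms)
    matches = {d for d in candidates if wanted <= docs[d]}
    return matches, len(candidates)


if __name__ == "__main__":
    # 1000 documents contain the common term; none contain the other term.
    docs = {i: {"commonterm"} for i in range(1000)}
    index = build_index(docs)
    query = ["commonterm", "nonexistingterm"]
    print(search_all_terms(index, query))
    print(search_rarest_then_recheck(index, docs, query))
```

In this model both functions return the same matches, but the second reads only the shortest posting list, which is the asymmetry the commonterm/nonexistingterm test case is meant to expose. Whether the same trade-off holds in a real GIN index depends on recheck cost and on how term frequencies are obtained, neither of which the toy captures.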