I wrote:
> Tom Lane wrote:
>> But the particular example shown here doesn't make a very good case
>> for that, because it's hard to tell how much of a penalty would be
>> taken in more realistic examples.
>
> Fair enough. We're in the early stages of moving to tsearch2 and I
> haven't run acr
Tom Lane wrote:
> It may well be that Jesper's identified a place where the GIN code could
> be improved --- it seems like having the top-level search logic be more
> aware of the AND/OR structure of queries would be useful. But the
> particular example shown here doesn't make a very good case for
"Kevin Grittner" wrote:
> I'm wondering if anyone has ever confirmed that probing for the more
> frequent term through the index is *ever* a win, versus using the
> index for the most common of the top level AND conditions and doing
> the rest on recheck.
s/most/least/
-Kevin
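
To make the trade-off above concrete, here is a toy Python sketch. It is
not how GIN works internally: the posting-list model and the helper names
(build_index, search_all_terms, search_rarest_then_recheck) are made up
for illustration. It only counts how many index entries each strategy
touches for an AND of a common term and a rare term.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids containing it (toy posting lists)."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search_all_terms(index, terms):
    """Probe the index for every term and intersect the posting lists."""
    touched = 0
    result = None
    for term in terms:
        posting = index.get(term, set())
        touched += len(posting)  # every entry of every posting list is read
        result = posting if result is None else result & posting
    return (result or set()), touched


def search_rarest_then_recheck(index, docs, terms):
    """Probe only the least common term, then recheck the rest per document."""
    rarest = min(terms, key=lambda t: len(index.get(t, ())))
    candidates = index.get(rarest, set())
    matches = {d for d in candidates if all(t in docs[d] for t in terms)}
    return matches, len(candidates)


# Hypothetical data: 1000 documents share "commonterm", one also has "rareterm".
docs = {i: {"commonterm"} for i in range(1000)}
docs[7] = {"commonterm", "rareterm"}
index = build_index(docs)

print(search_all_terms(index, ["commonterm", "rareterm"]))            # ({7}, 1001)
print(search_rarest_then_recheck(index, docs, ["commonterm", "rareterm"]))  # ({7}, 1)
```

In this model both strategies return the same document, but the second
reads one index entry instead of 1001. Whether a real index scan behaves
this way depends on costs the sketch ignores (heap fetches for the
recheck, lossy pages, and so on).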
Tom Lane wrote:
> The answer to that clearly is to not index common terms
My understanding is that we don't currently get statistics on how
common the terms in a tsvector column are until we ANALYZE the *index*
created from it. Seems like sort of a Catch 22. Also, if we exclude
words which a
"Kevin Grittner" writes:
> Perhaps I'm missing something. My point was that there are words
> which are too common to be useful for index searches, yet uncommon
> enough to usefully limit the results. These words could typically
> benefit from tsearch2 style parsing and dictionaries; so declarin
Tom Lane wrote:
> "Kevin Grittner" writes:
>> Tom Lane wrote:
>>> Any sane text search application is going to try to filter out
>>> common words as stopwords; it's only the failure to do that that's
>>> making this run slow.
>
>> I'd rather have the index used for the selective test, and appl
"Kevin Grittner" writes:
> Tom Lane wrote:
>> Any sane text search application is going to try to filter out
>> common words as stopwords; it's only the failure to do that that's
>> making this run slow.
> I'd rather have the index used for the selective test, and apply the
> remaining tests to
Tom Lane wrote:
> Any sane text search application is going to try to filter out
> common words as stopwords; it's only the failure to do that that's
> making this run slow.
Imagine a large table with a GIN index on a tsvector. The user wants
a particular document, and is sure four words are
On Fri, Oct 30, 2009 at 8:11 PM, Tom Lane wrote:
> But having said that, this particular test case is far from compelling.
> Any sane text search application is going to try to filter out
> common words as stopwords; it's only the failure to do that that's
> making this run slow.
Well it would be
Tom Lane wrote:
> But having said that, this particular test case is far from compelling.
> Any sane text search application is going to try to filter out
> common words as stopwords; it's only the failure to do that that's
> making this run slow.
Below are test runs, not with a "commonterm" but an
Jesper Krogh writes:
> I've now got a test-set that can reproduce the problem where the two
> fully equivalent queries (
> body_fts @@ to_tsquery('commonterm & nonexistingterm')
> and
> body_fts @@ to_tsquery('commonterm') AND body_fts @@
> to_tsquery('nonexistingterm')
> give a difference of x30
Hi.
I've now got a test-set that can reproduce the problem where the two
fully equivalent queries (
body_fts @@ to_tsquery('commonterm & nonexistingterm')
and
body_fts @@ to_tsquery('commonterm') AND body_fts @@
to_tsquery('nonexistingterm')
give a difference of x300 in execution time. (grows wi
On Fri, 2009-10-23 at 17:27 +0100, Richard Huxton wrote:
> Returns an array of keys given a value to be queried; that is, query is
> the value on the right-hand side of an indexable operator whose
> left-hand side is the indexed column
>
> So - that is presumably two separate arrays of keys being
Jeff Davis wrote:
> On Fri, 2009-10-23 at 09:26 +0100, Richard Huxton wrote:
>> That structure isn't exposed to the planner though, so it doesn't
>> benefit from any re-ordering the planner would normally do for normal
>> (exposed) AND/OR clauses.
>
> I don't think that explains it, because in the
On Fri, 2009-10-23 at 09:45 +0200, jes...@krogh.cc wrote:
> No, it definitely has to go visit the index/table to confirm findings, but
> that's why I wrote Queryplan in the subject line, because this is only about
> the strategy to pursue to obtain the results. And a strategy about
> limiting the amo
On Fri, 2009-10-23 at 09:26 +0100, Richard Huxton wrote:
> That structure isn't exposed to the planner though, so it doesn't
> benefit from any re-ordering the planner would normally do for normal
> (exposed) AND/OR clauses.
I don't think that explains it, because in the second plan you only see
a
jes...@krogh.cc wrote:
>> That structure isn't exposed to the planner though, so it doesn't
>> benefit from any re-ordering the planner would normally do for normal
>> (exposed) AND/OR clauses.
>>
>> Now, to_tsquery() can't re-order the search terms because it doesn't
>> know what column it's being
> jes...@krogh.cc wrote:
>>
>> So getting them with AND in between gives x100 better performance. All
>> queries are run on "hot disk" repeated 3-5 times and the numbers are from
>> the last run, so disk-read effects should be filtered away.
>>
>> Shouldn't it somehow just do what it already is cap
jes...@krogh.cc wrote:
>
> So getting them with AND in between gives x100 better performance. All
> queries are run on "hot disk" repeated 3-5 times and the numbers are from
> the last run, so disk-read effects should be filtered away.
>
> Shouldn't it somehow just do what it already is capable o
> On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote:
>> > In effect, what you want are words that aren't searched (or stored) in
>> > the index, but are included in the tsvector (so the RECHECK still
>> > works). That sounds like it would solve your problem and it would
>> > reduce index siz
On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote:
> This is indeed information on individual terms from the statistics that
> enable this.
My mistake, I didn't know it was that smart about it.
> > In effect, what you want are words that aren't searched (or stored) in
> > the index, but are i
Jeff Davis wrote:
> On Thu, 2009-10-22 at 18:28 +0200, Jesper Krogh wrote:
>> I somehow would expect the index-search to take advantage of the MCV
>> information in the statistics and sort of translate it into a search
>> and post-filtering (as PG's queryplanner usually does at the SQL-level).
On Thu, 2009-10-22 at 18:28 +0200, Jesper Krogh wrote:
> I somehow would expect the index-search to take advantage of the MCV
> information in the statistics and sort of translate it into a search
> and post-filtering (as PG's queryplanner usually does at the SQL-level).
MCVs are full values t
Jesper Krogh wrote:
> I'm searching the gin-index for 1-5 terms, where all of them match
> the same document. TERM1 is unique by itself, TERM2 is a bit more
> common (52 rows), TERM3 more common, TERM4 close to all and TERM5
> all records.
>Recheck Cond: (ftsbody_body_fts @@