Re: [BUGS] query very slow when enable_seqscan=on

Tomasz Ostrowski Tue, 04 Jul 2006 07:45:10 -0700

On Tue, 04 Jul 2006, Tom Lane wrote:

> I think the real problem here is that regex matching is the wrong
> tool for the job.  Have you looked into a full-text index
> (tsearch2)?


So much to do with so little time...

I've briefly looked into it but:

- it's complicated;

- it is not needed - basic scan is good enough for the amount of data
  we have (if a sane query plan is chosen by a database);

- we have data in many languages (including based on cyryllic
  alphabet) - languages which use different forms of the same word
  based on context, for example:
    Warszawa
    Warszawy
    Warszawie
    Warszawę
    Warszawą
    Warszawo
  All of the above could be translated to "Warsaw". So we need to
  support matching parts of words ("warszaw"), which I haven't seen
  in tsearch2 (maybe I've overlooked). We also have words, which
  different forms look like this: "stół" "stole" "stołu" (Polish for
  "table") - when we need to find it we'd need to list every possible
  form (about 10) or use a regex like: 'st[oó][lł]'.

> With something like that, the index operator has at least got the
> correct conceptual model, ie, looking for indexed words.  I'm not sure
> if they have any decent statistical support for it :-( but in theory
> that seems doable, whereas regex estimation will always be a crapshoot.

So why estimate regex expressions if there is no estimation possible?
Let's set this estimate to be pessimistic (match everything or
everything not null) and it will choose better plans. At least until
somebody will figure out better approach.

Pozdrawiam
Tometzky
-- 
Best of prhn - najzabawniejsze teksty polskiego UseNet-u
http://prhn.dnsalias.org/
  Chaos zawsze pokonuje porządek, gdyż jest lepiej zorganizowany.
                                              [ Terry Pratchett ]

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: [BUGS] query very slow when enable_seqscan=on

Reply via email to