On Tue, 04 Jul 2006, Tom Lane wrote: > I think the real problem here is that regex matching is the wrong > tool for the job. Have you looked into a full-text index > (tsearch2)?
So much to do with so little time... I've briefly looked into it but: - it's complicated; - it is not needed - basic scan is good enough for the amount of data we have (if a sane query plan is chosen by a database); - we have data in many languages (including based on cyryllic alphabet) - languages which use different forms of the same word based on context, for example: Warszawa Warszawy Warszawie Warszawę Warszawą Warszawo All of the above could be translated to "Warsaw". So we need to support matching parts of words ("warszaw"), which I haven't seen in tsearch2 (maybe I've overlooked). We also have words, which different forms look like this: "stół" "stole" "stołu" (Polish for "table") - when we need to find it we'd need to list every possible form (about 10) or use a regex like: 'st[oó][lł]'. > With something like that, the index operator has at least got the > correct conceptual model, ie, looking for indexed words. I'm not sure > if they have any decent statistical support for it :-( but in theory > that seems doable, whereas regex estimation will always be a crapshoot. So why estimate regex expressions if there is no estimation possible? Let's set this estimate to be pessimistic (match everything or everything not null) and it will choose better plans. At least until somebody will figure out better approach. Pozdrawiam Tometzky -- Best of prhn - najzabawniejsze teksty polskiego UseNet-u http://prhn.dnsalias.org/ Chaos zawsze pokonuje porządek, gdyż jest lepiej zorganizowany. [ Terry Pratchett ] ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org