On Mon, Mar 25, 2013 at 2:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorot...@gmail.com> writes: > > On Mon, Mar 25, 2013 at 1:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > >> Similarly, pushing PG-specific declarations like RE_compile_and_cache() > >> into regex/regex.h is completely not the right thing for preserving a > >> clear library boundary (even positing that we want to expose that > >> function outside adt/regexp.c, which I'd rather we didn't). > >> > >> Perhaps we could avoid these issues by defining a library API that > >> provides accessors like these for the opaque regex_t struct: > >> > >> * get the number of states in the CNFA > >> > >> * get the numbers of the initial and final states > >> > >> * get the number of out-arcs for the N'th state > >> > >> * get the out-arcs for the N'th state into a caller-provided array > >> (sized using the previous function), where each out-arc is represented > >> by a color and an end-state > >> > >> * get the number of character codes represented by color C > >> > >> * get the wchar codes for color C into a caller-provided array > >> > >> (The reason for letting the caller allocate the result arrays is so we > >> can use palloc for that; if we allocate it in backend/regex/ we must > >> use malloc, which will greatly increase the risk of leakages. Also, > >> as far as the color API goes, the above lets the caller decide how > >> many characters is "too many" to bother with.) > > > I like the this idea. Seems like clear and not over-engineered API. I can > > implement it. Could you propose something particular to do with > > RE_compile_and_cache in this patch? With this API we still need a way to > > get regex_t or equivalent from string. > > Well, the brute force answer is for pg_trgm to not go through > RE_compile_and_cache at all, but just call the regex library for itself. > That would lose the ability to cache regex compilations, but I'm not > sure we care. The motivation for RE_compile_and_cache is mainly to > prevent having to compile the regex again for *every row*, which is what > would otherwise happen in a simple SELECT using a regex function. > Unless I'm misunderstanding something, pg_trgm would only need to > compile the regex once per indexscan, which is probably tolerable. > For me it makes sense. Patch with proposed API implementation is attached. ------ With best regards, Alexander Korotkov.
trgm-regexp-0.14.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers