Alexander Korotkov <aekorot...@gmail.com> writes: > Now I have working implemetation of this API. Comments still need rework. > Could you give me any feedback?
I looked at this a little bit, but it's not very far along at all towards resolving my API worries. The basic point that I'm concerned about is that we would like to split off the regex library (ie, backend/regex/) as a standalone project someday. There are already some problems to be resolved to make that possible, and I don't want this patch to introduce new ones. From that standpoint, the proposed regextract.c file is a disaster, because it depends on a boatload of Postgres-specific stuff (at least StringInfo, List, HTAB, and pg_wchar; to say nothing of palloc). We can't consider that to be on the regex side of the fence. Similarly, pushing PG-specific declarations like RE_compile_and_cache() into regex/regex.h is completely not the right thing for preserving a clear library boundary (even positing that we want to expose that function outside adt/regexp.c, which I'd rather we didn't). Given that you've already got a notion of callbacks provided by contrib/pg_trgm, perhaps this can be fixed by pushing more of the work into those callbacks, so that the heavy-duty data structures like the hash table live over there and the API exposed by backend/regex/ is at a much simpler level than what you have here. But right now I don't see any usable library API here. Perhaps we could avoid these issues by defining a library API that provides accessors like these for the opaque regex_t struct: * get the number of states in the CNFA * get the numbers of the initial and final states * get the number of out-arcs for the N'th state * get the out-arcs for the N'th state into a caller-provided array (sized using the previous function), where each out-arc is represented by a color and an end-state * get the number of character codes represented by color C * get the wchar codes for color C into a caller-provided array (The reason for letting the caller allocate the result arrays is so we can use palloc for that; if we allocate it in backend/regex/ we must use malloc, which will greatly increase the risk of leakages. Also, as far as the color API goes, the above lets the caller decide how many characters is "too many" to bother with.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers