On Mon, Mar 25, 2013 at 1:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Alexander Korotkov <aekorot...@gmail.com> writes: > > Now I have working implemetation of this API. Comments still need rework. > > Could you give me any feedback? > > I looked at this a little bit, but it's not very far along at all > towards resolving my API worries. The basic point that I'm concerned > about is that we would like to split off the regex library (ie, > backend/regex/) as a standalone project someday. There are already > some problems to be resolved to make that possible, and I don't want > this patch to introduce new ones. From that standpoint, the proposed > regextract.c file is a disaster, because it depends on a boatload of > Postgres-specific stuff (at least StringInfo, List, HTAB, and pg_wchar; > to say nothing of palloc). We can't consider that to be on the regex > side of the fence. > Now I can see your position about API much more clearly. Previously I thought only about algorithmic side of things. Similarly, pushing PG-specific declarations like RE_compile_and_cache() > into regex/regex.h is completely not the right thing for preserving a > clear library boundary (even positing that we want to expose that > function outside adt/regexp.c, which I'd rather we didn't). > > Given that you've already got a notion of callbacks provided by > contrib/pg_trgm, perhaps this can be fixed by pushing more of the work > into those callbacks, so that the heavy-duty data structures like the > hash table live over there and the API exposed by backend/regex/ is at > a much simpler level than what you have here. But right now I don't > see any usable library API here. > > Perhaps we could avoid these issues by defining a library API that > provides accessors like these for the opaque regex_t struct: > > * get the number of states in the CNFA > > * get the numbers of the initial and final states > > * get the number of out-arcs for the N'th state > > * get the out-arcs for the N'th state into a caller-provided array > (sized using the previous function), where each out-arc is represented > by a color and an end-state > > * get the number of character codes represented by color C > > * get the wchar codes for color C into a caller-provided array > > (The reason for letting the caller allocate the result arrays is so we > can use palloc for that; if we allocate it in backend/regex/ we must > use malloc, which will greatly increase the risk of leakages. Also, > as far as the color API goes, the above lets the caller decide how > many characters is "too many" to bother with.) > I like the this idea. Seems like clear and not over-engineered API. I can implement it. Could you propose something particular to do with RE_compile_and_cache in this patch? With this API we still need a way to get regex_t or equivalent from string. ------ With best regards, Alexander Korotkov.