On Mon, Nov 26, 2012 at 4:55 PM, Heikki Linnakangas <hlinnakan...@vmware.com > wrote: > > Great, that top-level comment helped tremendously! I feel enlightened. > > I fixed some spelling, formatting etc. trivial stuff while reading through > the patch, see attached. Below is some feedback on the details: > > * I don't like the PG_TRY/CATCH trick. It's not generally safe to catch an > error, without propagating it further or rolling back the whole > (sub)transation. It might work in this case, as you're only suppressing > errors with the special sqlcode that are used in the same file, but it > nevertheless feels naughty. I believe none of the limits that are being > checked are strict; it's OK to exceed the limits somewhat, as long as you > terminate the processing in a reasonable time, in case of pathological > input. I'd suggest putting an explicit check for the limits somewhere, and > not rely on ereport(). Something like this, in the code that recurses: > > if (trgmCNFA->arcsCount > MAX_RESULT_ARCS || > hash_get_num_entries(trgmCNFA-**>states) > MAX_RESULT_STATES) > { > trgmCNFA->overflowed = true; > return; > } >
PG_TRY/CATCH trick is replaced with some number of if/return. Code becomes a bit more complicated, but your notes does matter. And then check for the overflowed flag at the top level. > > * This part of the high-level comment was not clear to me: > > * States of the graph produced in the first stage are marked with >> "keys". Key is a pair >> * of a "prefix" and a state of the original automaton. "Prefix" is a last >> * characters. So, knowing the prefix is enough to know what is a trigram >> when we read some new >> * character. However, we can know single character of prefix or don't >> know any >> * characters of it. Each state of resulting graph have an "enter key" >> (with that >> * key we've entered this state) and a set of keys which are reachable >> without >> * reading any predictable trigram. The algorithm of processing each state >> * of resulting graph are so: >> * 1) Add all keys which achievable without reading of any predictable >> trigram. >> * 2) Add outgoing arcs labeled with trigrams. >> * Step 2 leads to creation of new states and recursively processing >> them. So, >> * we use depth-first algorithm. >> > > I didn't understand that. Can you elaborate? It might help to work through > an example, with some ascii art depicting the graph. > This comment is extended with example. > * It would be nice to add some comments to TrgmCNFA struct, explaining > which fields are valid at which stages. For example, it seems that 'trgms' > array is calculated only after building the CNFA, by getTrgmVector() > function, while arcsCount is updated on the fly, while recursing in the > getState() function. > TrgmCNFA comment is extended with this. > * What is the representation used for the path matrix? Needs a comment. > Comments are added to PackedTrgmPaths and TrgmStatePath. > * What do the getColorinfo() getColorInfo comment now references to ColorInfo struct which have comment. and scanColorMap() functions do? scanColorMap comment now have description of colormap structure. > What exactly does a color represent? This is added to the top comment. > What's the tradeoff in choosing MAX_COLOR_CHARS? Comment is added to the macro. ------ With best regards, Alexander Korotkov.
trgm-regexp-0.6.patch.gz
Description: GNU Zip compressed data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers