Re: [HACKERS] WIP: index support for regexp search

Alexander Korotkov Mon, 26 Nov 2012 11:50:18 -0800

On Mon, Nov 26, 2012 at 4:55 PM, Heikki Linnakangas <hlinnakan...@vmware.com
> wrote:
>
> Great, that top-level comment helped tremendously! I feel enlightened.
>
> I fixed some spelling, formatting etc. trivial stuff while reading through
> the patch, see attached. Below is some feedback on the details:
>
> * I don't like the PG_TRY/CATCH trick. It's not generally safe to catch an
> error, without propagating it further or rolling back the whole
> (sub)transation. It might work in this case, as you're only suppressing
> errors with the special sqlcode that are used in the same file, but it
> nevertheless feels naughty. I believe none of the limits that are being
> checked are strict; it's OK to exceed the limits somewhat, as long as you
> terminate the processing in a reasonable time, in case of pathological
> input. I'd suggest putting an explicit check for the limits somewhere, and
> not rely on ereport(). Something like this, in the code that recurses:
>
> if (trgmCNFA->arcsCount > MAX_RESULT_ARCS ||
>     hash_get_num_entries(trgmCNFA-**>states) > MAX_RESULT_STATES)
> {
>         trgmCNFA->overflowed = true;
>         return;
> }
>


PG_TRY/CATCH trick is replaced with some number of if/return. Code becomes
a bit more complicated, but your notes does matter.

And then check for the overflowed flag at the top level.
>
> * This part of the high-level comment was not clear to me:
>
>   * States of the graph produced in the first stage are marked with
>> "keys". Key is a pair
>>  * of a "prefix" and a state of the original automaton. "Prefix" is a last
>>  * characters. So, knowing the prefix is enough to know what is a trigram
>> when we read some new
>>  * character. However, we can know single character of prefix or don't
>> know any
>>  * characters of it. Each state of resulting graph have an "enter key"
>> (with that
>>  * key we've entered this state) and a set of keys which are reachable
>> without
>>  * reading any predictable trigram. The algorithm of processing each state
>>  * of resulting graph are so:
>>  * 1) Add all keys which achievable without reading of any predictable
>> trigram.
>>  * 2) Add outgoing arcs labeled with trigrams.
>>  * Step 2 leads to creation of new states and recursively processing
>> them. So,
>>  * we use depth-first algorithm.
>>
>
> I didn't understand that. Can you elaborate? It might help to work through
> an example, with some ascii art depicting the graph.
>

This comment is extended with example.


> * It would be nice to add some comments to TrgmCNFA struct, explaining
> which fields are valid at which stages. For example, it seems that 'trgms'
> array is calculated only after building the CNFA, by getTrgmVector()
> function, while arcsCount is updated on the fly, while recursing in the
> getState() function.
>

TrgmCNFA comment is extended with this.


> * What is the representation used for the path matrix? Needs a comment.
>

Comments are added to PackedTrgmPaths and TrgmStatePath.


> * What do the getColorinfo()


getColorInfo comment now references to ColorInfo struct which have comment.

and scanColorMap() functions do?


 scanColorMap comment now have description of colormap structure.


> What exactly does a color represent?


This is added to the top comment.


> What's the tradeoff in choosing MAX_COLOR_CHARS?


Comment is added to the macro.

------
With best regards,
Alexander Korotkov.

trgm-regexp-0.6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: index support for regexp search

Reply via email to