Hi Tom,

The following example does not work as expected:

-- Should return TRUE but returns FALSE
SELECT 'Programmer' ~ '(\w).*?\1' as t;

-- Should return P, a and er, i.e. 3 rows, but returns just one row with
-- the value Programmer
SELECT REGEXP_SPLIT_TO_TABLE('Programmer','(\w).*?\1');

Initially I thought that back-references are not supported and that this is
why we get these results. But while trying a few cases related to
back-references, I saw that it gives the error "invalid back-reference
number", which means we do have support for back-references.

So I tried a few more scenarios. I observed that with the input string
'rogrammer' we get perfect results, i.e. when the very first character is
back-referenced. It fails when the first character is not part of the
back-reference.

This happens only with the shorter (non-greedy) match. The longer match
'(\w).*\1' works well.

Clearly, the above example has two matching substrings, 'rogr' and 'mm'.

So I started debugging to find the root cause. It is too complex to
understand exactly what is happening here, but while debugging I found this
chunk in the regexec.c:cfindloop() function, from where we return
REG_NOMATCH:

    {
        /* no point in trying again */
        *coldp = cold;
        return REG_NOMATCH;
    }

The search was starting at 'P' and ending in the above block. It seemed
strange that it does not continue with the next character, i.e. from 'r'.
So I replaced the above chunk with a break statement so that it continues
from the next character. This trick worked well.

Since I have very little knowledge of this area of the code, I am unsure
whether this is indeed a correct fix, and so I thought of mailing hackers.

I have attached a patch which makes the above change, along with a few tests
in regex.sql.

Your valuable insights please...

Thanks
-- 
Jeevan B Chalke
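For reference, the expected results described above can be cross-checked in an independent regex engine. The sketch below uses Python's standard `re` module, which is not PostgreSQL's regex code and says nothing about cfindloop(); it only illustrates what a non-greedy back-reference search over 'Programmer' would be expected to find. The helper names are hypothetical and introduced only for this illustration.

```python
import re

# Same pattern as in the report: one word character, a non-greedy run,
# then a back-reference to that first character.
PATTERN = re.compile(r'(\w).*?\1')


def shortest_backref_matches(text):
    """Return every non-overlapping match of PATTERN, scanning left to right."""
    return [m.group(0) for m in PATTERN.finditer(text)]


def split_on_backref_matches(text):
    """Return the pieces of text left between matches of PATTERN
    (roughly what a split-to-table call is expected to yield)."""
    pieces = []
    last = 0
    for m in PATTERN.finditer(text):
        pieces.append(text[last:m.start()])
        last = m.end()
    pieces.append(text[last:])
    return pieces


if __name__ == "__main__":
    print(bool(PATTERN.search('Programmer')))      # is there any match at all?
    print(shortest_backref_matches('Programmer'))  # the matched substrings
    print(split_on_backref_matches('Programmer'))  # the pieces between matches
```

In this engine the search does not stop after failing at 'P'; it retries from 'r', which is the behaviour the report expects from the PostgreSQL matcher as well.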
Attachment: regexp_backref_shorter.patch