> On Aug 22, 2021, at 7:47 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Yeah: the POSIX standard says what the error codes from regcomp() are.

I'm not sure how to interpret them.  The language "The implementation may 
define additional macros or constants using names beginning with REG_" at the 
bottom of the docs might imply that one can add to the list.

> POSIX defines
> 
>       REG_ESUBREG
>               Number in \digit invalid or in error.
> 
> which does seem to cover this case,

Hmm.  The number is neither invalid nor in error.  The only thing arguing in 
favor of using this code is that the error message contains the word 
"backreference":

        "REG_ESUBREG", "invalid backreference number"

which gives the reader a clue that the problem has something to do with a 
backreference in the pattern. But the POSIX wording "Number in \digit invalid 
or in error." doesn't even have that advantage.  We seem to be using the wrong 
return code.  I would think a more generic 

        REG_BADPAT
                Invalid regular expression.

would be the correct code, though arguably far less informative.

> so what I'd argue is that we should
> improve the "invalid backreference number" text rather than invent
> a nonstandard error code.  Maybe about like "backreference number does
> not exist or cannot be referenced from here"?

Assuming we leave the error codes alone, how about, "backreference number 
invalid or cannot be referenced from here"?

> (Admittedly, there's not a huge reason why src/backend/regex/ needs to
> stay compliant with the POSIX API today.  But I still have ambitions to
> split that out as a free-standing library someday, as Henry Spencer had
> originally planned to do.  So I'd rather stick to the API spec.)

That's fine.  Something else might kill that ambition, but this quibble over 
error messages isn't nearly important enough to do so.

> It might be worth checking what text is attributed to this error code
> by PCRE and other implementations of the POSIX spec.

Reading the docs at pcre.org, it appears that capture groups are allowed in 
look-around assertions.  Our engine doesn't do that, instead treating all 
groups within assertions as non-capturing.  I don't see anything about whether 
backreferences are allowed within pcre assertions, but I know that perl itself 
does allow them.  So maybe the error text used by other implementations is 
irrelevant?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply via email to