The following bug has been logged online: Bug reference: 4044 Logged by: Rui Martins Email address: [EMAIL PROTECTED] PostgreSQL version: 8.3 Operating system: Win2000 Description: Incorrect RegExp substring Output Details:
To setup for test, run these SQL commands CREATE TABLE TEST_REGEXP ( BedNo VARCHAR(20) ); INSERT INTO TEST_REGEXP ( BedNo ) VALUES ( '123' ); INSERT INTO TEST_REGEXP ( BedNo ) VALUES ( '4325:1' ); INSERT INTO TEST_REGEXP ( BedNo ) VALUES ( '2464M' ); INSERT INTO TEST_REGEXP ( BedNo ) VALUES ( '5678M:2' ); INSERT INTO TEST_REGEXP ( BedNo ) VALUES ( '453L:1' ); >From the following link http://www.postgresql.org/docs/8.3/static/functions-matching.html We can read the following text: The substring function with two parameters, substring(string from pattern), provides extraction of a substring that matches a POSIX regular expression pattern. It returns null if there is no match, otherwise the portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned. You can put parentheses around the whole expression if you want to use parentheses within it without triggering this exception. If you need parentheses in the pattern before the subexpression you want to extract, see the non-capturing parentheses described below. -- -- -- -- -- No suppose we want to split up the "BedNo" column, into its parts (DoorNumber, RoomSize and BedNumber) SELECT BedNo, SUBSTRING( BedNo FROM '^([[:digit:]]+)[a-zA-Z]*(:[[:digit:]]+)?$' ) AS DoorNumber, SUBSTRING( BedNo FROM '^[[:digit:]]+([a-zA-Z]*)(:[[:digit:]]+)?$' ) AS RoomSize, SUBSTRING( BedNo FROM '^[[:digit:]]+[a-zA-Z]*(:[[:digit:]]+)?$' ) AS BedNumber FROM TEST_REGEXP; Or in it's other form SELECT BedNo, SUBSTRING( BedNo, '^([[:digit:]]+)[a-zA-Z]*(:[[:digit:]]+)?$' ) AS DoorNumber, SUBSTRING( BedNo, '^[[:digit:]]+([a-zA-Z]*)(:[[:digit:]]+)?$' ) AS RoomSize, SUBSTRING( BedNo, '^[[:digit:]]+[a-zA-Z]*(:[[:digit:]]+)?$' ) AS BedNumber FROM TEST_REGEXP; Both will return the same result: BedNo DoorNumber RoomSize BedNumber 123 123 123 4325:1 4325 :1 2464M 2464 M 2464M 5678M:2 5678 M :2 453L:1 453 L :1 NOTE: Tabs may note be seen correctly on a web interface. This is clearly, not the expected result, in particular for BedNo IN ( '123', '2464M' ) The BedNumber returned in these cases is NOT logical! However, the manual is NOT EXPLICIT in what happens, if the returned MATCHED parentheses part is the equivalent of an empty string! Although it states: "But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the one whose left parenthesis comes first) is returned." Apparently, the function is returning the entire MATCHED string, instead of just the parenthesized subexpression. I would expect the result for BedNumber to be either NULL or the EMPTY String, and the later seems more logical. But the documentation doesn't state which should be returned! Not withstanding, the expected result should be: BedNo DoorNumber RoomSize BedNumber 123 123 4325:1 4325 :1 2464M 2464 M 5678M:2 5678 M :2 453L:1 453 L :1 NOTE: Tabs may note be seen correctly on a web interface. -- Hack Note for this specific case ! For this specific case, we can do a hack, and change the RegExp for BedNumber, like in the next SELECT. SELECT BedNo, SUBSTRING( BedNo FROM '^([[:digit:]]+)[a-zA-Z]*(:[[:digit:]]+)?$' ) AS DoorNumber, SUBSTRING( BedNo FROM '^[[:digit:]]+([a-zA-Z]*)(:[[:digit:]]+)?$' ) AS RoomSize, SUBSTRING( BedNo FROM '^[[:digit:]]+[a-zA-Z]*:([[:digit:]]+)?$' ) AS BedNumber FROM TEST_REGEXP; or alternatively SELECT BedNo, SUBSTRING( BedNo, '^([[:digit:]]+)[a-zA-Z]*(:[[:digit:]]+)?$' ) AS DoorNumber, SUBSTRING( BedNo, '^[[:digit:]]+([a-zA-Z]*)(:[[:digit:]]+)?$' ) AS RoomSize, SUBSTRING( BedNo, '^[[:digit:]]+[a-zA-Z]*:([[:digit:]]+)?$' ) AS BedNumber FROM TEST_REGEXP This will return NULL when there is no BedNumber, by forcing the regExp to fail the match. But this only works, because the format uses ":" in that specific location, if there was no ":" anywhere, we would NOT get away so easily. Hope to have bee of help, in finding this documentation issue and implementation bug. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs