On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing. I think the following should not
> error, but does:
>
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR:  range lower bound must be less than or equal to range upper bound

Doh! How stupid of me. I realize now I had an off-by-one thinko in my 0001
patch using int4range.

I didn't use the raw "so" and "eo" values in regexp.c like I should have;
instead, I incorrectly used (so + 1) as the startpos, and just eo as the
endpos. This is what caused all the problems.

The fix is simple:

-       lower.val = Int32GetDatum(so + 1);
+       lower.val = Int32GetDatum(so);

The example that gave the error now works properly:

SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
 regexp_positions
------------------
 {"[6,7)"}
(1 row)

I've also created a SQL PoC of the composite range type idea,
and convenience wrapper functions for int4range and int8range.

CREATE TYPE range AS (start int8, stop int8);

Helper functions:

range(start int8, stop int8) -> range
range(int8range) -> range
range(int4range) -> range
range(int8range[]) -> range[]
range(int4range[]) -> range[]

Demo:

regexp_positions() returns setof int4range[]:

SELECT r FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
           r
-----------------------
 {"[3,7)","[6,12)"}
 {"[11,17)","[16,21)"}
(2 rows)

Convert int4range[] -> range[]:

SELECT range(r) FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
         range
-----------------------
 {"(3,6)","(6,11)"}
 {"(11,16)","(16,20)"}
(2 rows)

"start" and "stop" fields:

SELECT (range(r[1])).* FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
 start | stop
-------+------
     3 |    6
    11 |   16
(2 rows)

Zero-length match at beginning:

SELECT r FROM regexp_positions('','^','g') AS r;
     r
-----------
 {"[0,1)"}
(1 row)

SELECT (range(r[1])).* FROM regexp_positions('','^','g') AS r;
 start | stop
-------+------
     0 |    0
(1 row)

My conclusion is that we should use setof int4range[] as the return value for
regexp_positions().

New patch attached.

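For readers without the attachments, here is a rough SQL sketch of the offset mapping described above. It is not the attached range.sql or the patch itself. It assumes the patch builds each range with an inclusive upper bound, [so, eo], which int4range canonicalizes to [so, eo + 1); that is what the outputs above suggest, but it is an assumption. The helper bodies are a hypothetical reconstruction that merely reproduces the demo output, and only one of the two array variants is shown.

```sql
-- Zero-length match from the failing example: so = eo = 6.
-- Assumption: bounds are passed as an inclusive '[]' range.

-- Old code used (so + 1) as the lower bound, so lower > upper and this raises
-- "range lower bound must be less than or equal to range upper bound":
SELECT int4range(6 + 1, 6, '[]');

-- Fixed code uses the raw so; the result is canonicalized to [6,7):
SELECT int4range(6, 6, '[]');

-- Hypothetical reconstruction of the PoC helpers; range.sql may differ.
CREATE TYPE range AS (start int8, stop int8);

CREATE FUNCTION range(int8, int8) RETURNS range
LANGUAGE sql IMMUTABLE
AS $$ SELECT ROW($1, $2)::range $$;

-- Canonical discrete ranges are [lower, upper), so the inclusive stop
-- is upper - 1. Empty or unbounded ranges yield NULL fields.
CREATE FUNCTION range(int8range) RETURNS range
LANGUAGE sql IMMUTABLE
AS $$ SELECT ROW(lower($1), upper($1) - 1)::range $$;

CREATE FUNCTION range(int4range) RETURNS range
LANGUAGE sql IMMUTABLE
AS $$ SELECT ROW(lower($1)::int8, upper($1)::int8 - 1)::range $$;

-- Array variant: convert element by element, preserving order.
CREATE FUNCTION range(int4range[]) RETURNS range[]
LANGUAGE sql IMMUTABLE
AS $$
    SELECT array_agg(range(r) ORDER BY ord)
    FROM unnest($1) WITH ORDINALITY AS u(r, ord)
$$;
```

With these definitions, SELECT range('[3,7)'::int4range) gives (3,6), matching the first element of the demo output above.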
The composite range type and helper functions are of course not at all
necessary, but I think they would be a nice addition to make it easier to
work with ranges for composite types. I intentionally didn't create anyrange
versions of them, since they can only support composite types, since they
don't require the inclusive/exclusive semantics.

/Joel

range.sql
0003-regexp-positions.patch