On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing.  I think the following should not 
> error, but does:
> 
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR:  range lower bound must be less than or equal to range upper bound

Doh! How stupid of me. I realize now I had an off-by-one thinko in my 0001 patch
using int4range.

I didn't use the raw "so" and "eo" values in regexp.c like I should have;
instead, I incorrectly used (so + 1) as the startpos
and just eo as the endpos.

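This is what caused all the problems. For a zero-length match such as the
lookahead above, so == eo, so the lower bound (so + 1) ended up greater than
the upper bound (eo), hence the error.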

The fix is simple:
- lower.val = Int32GetDatum(so + 1);
+ lower.val = Int32GetDatum(so);
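Here is a minimal C model of the bounds check. It assumes only what is stated above: the lower bound comes from so, the upper bound from eo, and the rule in the error message that the lower bound must not exceed the upper bound. BoundPair, bounds_valid(), bounds_old() and bounds_fixed() are hypothetical helpers, not code from regexp.c or the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of range bounds built from a match's so/eo offsets. */
typedef struct
{
    int lower;
    int upper;
} BoundPair;

/* Models the rule behind "range lower bound must be less than or equal
 * to range upper bound". */
static bool
bounds_valid(BoundPair b)
{
    return b.lower <= b.upper;
}

/* Old construction: lower = so + 1, upper = eo.
 * A zero-length match (so == eo) yields lower > upper. */
static BoundPair
bounds_old(int so, int eo)
{
    BoundPair b = {so + 1, eo};
    return b;
}

/* Fixed construction: use the raw so and eo values. */
static BoundPair
bounds_fixed(int so, int eo)
{
    assert(so <= eo);           /* a match never ends before it starts */
    BoundPair b = {so, eo};
    return b;
}
```

For the lookahead example, so == eo == 6. The old bounds (7, 6) fail the check, while the fixed bounds (6, 6) pass.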

The example that gave the error now works properly:

SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
regexp_positions
------------------
{"[6,7)"}
(1 row)

I've also created a SQL PoC of the composite range type idea,
along with convenience wrapper functions for int4range and int8range.

CREATE TYPE range AS (start int8, stop int8);

Helper functions:
range(start int8, stop int8) -> range
range(int8range) -> range
range(int4range) -> range
range(int8range[]) -> range[]
range(int4range[]) -> range[]
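The following is a rough C mirror of the composite type and of the int4range conversion, for illustration only. The mapping start = lower, stop = upper - 1 is inferred from the demo output below ([3,7) -> (3,6), [0,1) -> (0,0)); it is not taken from range.sql. Range, Int4RangeCanon and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of: CREATE TYPE range AS (start int8, stop int8); */
typedef struct
{
    int64_t start;
    int64_t stop;
} Range;

/* Hypothetical stand-in for a non-empty int4range in canonical [lower,upper) form. */
typedef struct
{
    int32_t lower;              /* inclusive */
    int32_t upper;              /* exclusive */
} Int4RangeCanon;

/* range(start int8, stop int8) -> range */
static Range
range_make(int64_t start, int64_t stop)
{
    Range r = {start, stop};
    return r;
}

/*
 * range(int4range) -> range.  Assumed mapping, inferred from the demo
 * output: start = lower, stop = upper - 1, e.g. [3,7) -> (3,6).
 */
static Range
range_from_int4range(Int4RangeCanon r)
{
    assert(r.lower < r.upper);  /* an empty range has no position to convert */
    return range_make(r.lower, (int64_t) r.upper - 1);
}

/* range(int4range[]) -> range[]: element-wise conversion of n entries. */
static void
range_from_int4range_array(const Int4RangeCanon *in, Range *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = range_from_int4range(in[i]);
}
```

Under the same assumption, the int8range variants would differ only in the bound type.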

Demo:

regexp_positions() returns setof int4range[]:

SELECT r FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
           r
-----------------------
{"[3,7)","[6,12)"}
{"[11,17)","[16,21)"}
(2 rows)

Convert int4range[] -> range[]:

SELECT range(r) FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
         range
-----------------------
{"(3,6)","(6,11)"}
{"(11,16)","(16,20)"}
(2 rows)
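The start/stop pairs above line up with the raw 0-based so/eo offsets of each capture group. As an illustration only, the sketch below recomputes those offsets with POSIX <regex.h>. It stands in for PostgreSQL's own regex engine that regexp.c uses. regmatch_t reports byte offsets, while regexp.c works in characters; the two coincide for this ASCII input.

The names positions_global() and MatchPositions are hypothetical. The "g"-loop advancement is also an assumption: it steps one character past a zero-length match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS  8
#define MAX_MATCHES 16

/* One match: 0-based, half-open [so, eo) offsets per capture group. */
typedef struct
{
    regoff_t so[MAX_GROUPS];
    regoff_t eo[MAX_GROUPS];
} MatchPositions;

/*
 * Hypothetical "g"-flag loop.  Stores up to max_out matches in out[] and
 * returns how many were found, or -1 if the pattern does not compile.
 * *nentries gets the number of entries per match: one per capture group,
 * or 1 (the whole match) when the pattern has no groups.
 */
static int
positions_global(const char *str, const char *pattern,
                 MatchPositions *out, int max_out, size_t *nentries)
{
    regex_t     re;
    regmatch_t  pm[MAX_GROUPS + 1];     /* pm[0] is the whole match */
    size_t      len = strlen(str);
    size_t      start = 0;
    size_t      ngroups;
    int         n = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    ngroups = re.re_nsub;
    assert(ngroups <= MAX_GROUPS);
    *nentries = (ngroups > 0) ? ngroups : 1;

    while (n < max_out)
    {
        /* REG_NOTBOL stops '^' from matching at later starting offsets. */
        int         eflags = (start > 0) ? REG_NOTBOL : 0;
        size_t      next;

        if (regexec(&re, str + start, ngroups + 1, pm, eflags) != 0)
            break;

        for (size_t e = 0; e < *nentries; e++)
        {
            /* With no groups, report the whole match (pm[0]). */
            regmatch_t  m = pm[(ngroups > 0) ? e + 1 : 0];

            /* Offsets are relative to str + start; -1 means "group unmatched". */
            out[n].so[e] = (m.rm_so < 0) ? -1 : m.rm_so + (regoff_t) start;
            out[n].eo[e] = (m.rm_eo < 0) ? -1 : m.rm_eo + (regoff_t) start;
        }
        n++;

        next = start + (size_t) pm[0].rm_eo;
        if (pm[0].rm_eo == pm[0].rm_so)
            next++;                     /* step past a zero-length match */
        if (next > len)
            break;
        start = next;
    }

    regfree(&re);
    return n;
}
```

For the call above, this gives [3,6) and [6,11) for the first match and [11,16) and [16,20) for the second. For '' with '^', it gives a single zero-length match with so = eo = 0.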

"start" and "stop" fields:

SELECT (range(r[1])).* FROM regexp_positions('foobarbequebazilbarfbonk', 
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
start | stop
-------+------
     3 |    6
    11 |   16
(2 rows)

Zero-length match at the beginning:

SELECT r FROM regexp_positions('','^','g') AS r;
     r
-----------
{"[0,1)"}
(1 row)

SELECT (range(r[1])).* FROM regexp_positions('','^','g') AS r;
start | stop
-------+------
     0 |    0
(1 row)

My conclusion is that we should use setof int4range[] as the return value for 
regexp_positions().

New patch attached.

The composite range type and helper functions are of course not at all
necessary, but I think they would be a nice addition, making it easier to
work with ranges in composite form. I intentionally didn't create anyrange
versions of them, since they can only support composite types, which don't
require the inclusive/exclusive semantics.

/Joel

Attachment: range.sql

Attachment: 0003-regexp-positions.patch
