On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote:
> "Joel Jacobson" <j...@compiler.org> writes:
> > Having abandoned the cute idea that didn't work,
> > here comes a new patch with a regexp_positions() instead returning
> > setof record (start_pos integer[], end_pos integer[]).
> 
> I wonder if a 2-D integer array wouldn't be a better idea,
> ie {{startpos1,length1},{startpos2,length2},...}.  My experience
> with working with parallel arrays in SQL has been unpleasant.

I considered it, but I prefer two separate simple arrays for two reasons:

a) more pedagogic, it's at least then obvious what values are start and end 
positions,
then you only have to understand what the values means.

b) 2-D doesn't arrays don't work well with unnest().
If you would unnest() the 2-D array you couldn't separate the start positions 
from the end positions,
whereas with two separate, you could do:

SELECT unnest(start_pos) AS start_pos, unnest(end_pos) AS end_pos FROM 
regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g');
start_pos | end_pos
-----------+---------
         3 |       6
         6 |      11
        11 |      16
        16 |      20
(4 rows)

Can give some details on your unpleasant experiences of parallel arrays?


> 
> Also, did you see
> 
> https://www.postgresql.org/message-id/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net
> 
> 
> Seems like there may be some overlap in these proposals.

Yes, I saw it, it was sent shortly after my proposal, so I couldn't take it 
into account.
Seems useful, except regexp_instr() seems redundant, I would rather have 
regexp_positions(),
but maybe regexp_instr() should also be added for compatibility reasons.

/Joel

Reply via email to