On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote: > "Joel Jacobson" <j...@compiler.org> writes: > > Having abandoned the cute idea that didn't work, > > here comes a new patch with a regexp_positions() instead returning > > setof record (start_pos integer[], end_pos integer[]). > > I wonder if a 2-D integer array wouldn't be a better idea, > ie {{startpos1,length1},{startpos2,length2},...}. My experience > with working with parallel arrays in SQL has been unpleasant.
I considered it, but I prefer two separate simple arrays for two reasons: a) more pedagogic, it's at least then obvious what values are start and end positions, then you only have to understand what the values means. b) 2-D doesn't arrays don't work well with unnest(). If you would unnest() the 2-D array you couldn't separate the start positions from the end positions, whereas with two separate, you could do: SELECT unnest(start_pos) AS start_pos, unnest(end_pos) AS end_pos FROM regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g'); start_pos | end_pos -----------+--------- 3 | 6 6 | 11 11 | 16 16 | 20 (4 rows) Can give some details on your unpleasant experiences of parallel arrays? > > Also, did you see > > https://www.postgresql.org/message-id/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net > > > Seems like there may be some overlap in these proposals. Yes, I saw it, it was sent shortly after my proposal, so I couldn't take it into account. Seems useful, except regexp_instr() seems redundant, I would rather have regexp_positions(), but maybe regexp_instr() should also be added for compatibility reasons. /Joel