On Tue, Mar 9, 2021, at 17:42, Tom Lane wrote: > "Joel Jacobson" <j...@compiler.org> writes: > > Tom - can you please give details on your unpleasant experiences with > > parallel arrays? > > The problems I can recall running into were basically down to not having > an easy way to iterate through parallel arrays. There are ways to do > that in SQL, certainly, but they all constrain how you write the query, > and usually force ugly stuff like splitting it into sub-selects.
I see now what you mean, many thanks for explaining. > > As an example, presuming that regexp_positions is defined along the > lines of > > regexp_positions(str text, pat text, out starts int[], out lengths int[]) > returns setof record +1 I think this is the most feasible best option so far. Attached is a patch implementing it this way. I changed the start to begin at 1, since this is how position ( substring text IN string text ) → integer works. SELECT * FROM regexp_positions('foobarbequebaz', '^', 'g'); starts | lengths --------+--------- {1} | {0} (1 row) SELECT * FROM regexp_positions('foobarbequebaz', 'ba.', 'g'); starts | lengths --------+--------- {4} | {3} {12} | {3} (2 rows) Mark's examples: SELECT * FROM regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i'); starts | lengths --------+--------- {7} | {0} (1 row) SELECT * FROM regexp_positions('foobarbequebaz', '(?<=z)', 'g'); starts | lengths --------+--------- {15} | {0} (1 row) I've also tested your template queries: > > then to actually get the identified substrings you'd have to do something > like > > select > substring([input string] from starts[i] for lengths[i]) > from > regexp_positions([input string], [pattern]) r, > lateral > generate_series(1, array_length(starts, 1)) i; select substring('foobarbequebaz' from starts[i] for lengths[i]) from regexp_positions('foobarbequebaz', 'ba.', 'g') r, lateral generate_series(1, array_length(starts, 1)) i; substring ----------- bar baz (2 rows) > I think the last time I confronted this, we didn't have multi-array > UNNEST. Now that we do, we can get rid of the generate_series(), > but it's still not beautiful: > > select > substring([input string] from s for l) > from > regexp_positions([input string], [pattern]) r, > lateral > unnest(starts, lengths) u(s,l); select substring('foobarbequebaz' from s for l) from regexp_positions('foobarbequebaz', 'ba.', 'g') r, lateral unnest(starts, lengths) u(s,l); substring ----------- bar baz (2 rows) > Having said that, the other alternative with a 2-D array: > > regexp_positions(str text, pat text) returns setof int[] > > seems to still need UNNEST, though now it's not the magic multi-array > UNNEST but this slicing version: > > select > substring([input string] from u[1] for u[2]) > from > regexp_positions([input string], [pattern]) r, > lateral > unnest_slice(r, 1) u; Unable to test this one since there is no unnest_slice() (yet) > > Anyway, I'd counsel trying to write out SQL implementations > of regexp_matches() and other useful things based on any > particular regexp_positions() API you might be thinking about. > Can we do anything useful without a LATERAL UNNEST thingie? > Are some of them more legible than others? Hmm, I cannot think of a way. /Joel
0004-regexp-positions.patch
Description: Binary data