Re: Should we optimize the `ORDER BY random() LIMIT x` case?

Andrei Lepikhov Thu, 15 May 2025 01:01:36 -0700

On 15/5/2025 01:41, Aleksander Alekseev wrote:

One can do `SELECT (the query above) ORDER BY random() LIMIT x` but
this produces an inefficient plan. Alternatively one could create
temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl
TABLESAMPLE BERNOULLI(20)` but this is inconvenient and would be
suboptimal even if we supported global temporary tables.


1. Do you think there might be value in addressing this issue?
2. If yes, how would you suggest addressing it from the UI point of
view - by adding a special syntax, some sort of aggregate function, or
...?

I think I got your point, but just to be sure:

Do you want to have some random sampling from an arbitrary subquery withthe guarantee that N distinct (by tid) tuples will be produced or allthe tuples if the underlying subquery produces less than N?

What kind of optimisation trick may the optimiser use here to provide anoptimal plan? As I see it, it will need to think that all the tuplesshould be returned from the subquery. The only profit is to skip sortingthe massive sample.

As a palliative, you may laterally join your subquery with a storedprocedure, which will process the incoming tuple and implement the logicof random sampling.

Implementation of that in the core will need a new "skip result" nodeand new syntax, which may be too much if a workaround is found.


--
regards, Andrei Lepikhov

Re: Should we optimize the `ORDER BY random() LIMIT x` case?

Reply via email to