Re: [GENERAL] Selecting K random rows - efficiently!

Patrick TJ McPhee Mon, 29 Oct 2007 19:37:36 -0800

In article <[EMAIL PROTECTED]>, cluster  <[EMAIL PROTECTED]> wrote:
% > How important is true randomness?
% 
% The goal is an even distribution but currently I have not seen any way 
% to produce any kind of random sampling efficiently. Notice the word


How about generating the ctid randomly? You can get the number of pages
from pg_class (relpages) and estimate the number of rows per page either
from the number of tuples in pg_class (reltuples) or just based on what
you know about the data. Then just generate two series of random numbers,
one from 0 to the number of pages minus one and the other from 1 to the
number of rows per page, and keep picking rows until you have enough of
them. Assuming there aren't too many dead tuples and your estimates are
good, this should retrieve n rows with roughly n look-ups.
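
As a rough sketch of the number generation only (not the lookup itself),
something like the following Python would do. The function names are
made up for illustration, and the relpages/reltuples values are assumed
to have been read from pg_class beforehand. The rows-per-page estimate is
rounded up on purpose, since a low estimate leaves some tuples unreachable.

```python
import random


def rows_per_page_estimate(reltuples, relpages):
    """Estimate rows per page from pg_class-style statistics, rounded up."""
    if relpages <= 0:
        raise ValueError("relpages must be positive")
    # Ceiling division, so the estimate errs on the high side.
    return max(1, -(-int(reltuples) // relpages))


def random_tids(n, relpages, rows_per_page, rng=random):
    """Return n distinct candidate (page, offset) pairs.

    Pages are numbered from 0, offsets within a page from 1.
    A candidate may still point at a dead or missing tuple, so the
    caller has to keep asking for more until enough live rows turn up.
    """
    if n > relpages * rows_per_page:
        raise ValueError("asked for more tids than the estimates allow")
    seen = set()
    picked = []
    while len(picked) < n:
        tid = (rng.randrange(relpages), rng.randint(1, rows_per_page))
        if tid not in seen:
            seen.add(tid)
            picked.append(tid)
    return picked


def format_tid(tid):
    """Render a (page, offset) pair in the text form of a tid, e.g. (3,7)."""
    page, offset = tid
    return "({},{})".format(page, offset)


if __name__ == "__main__":
    per_page = rows_per_page_estimate(reltuples=1000, relpages=30)
    for tid in random_tids(5, 30, per_page):
        print(format_tid(tid))
```

How the candidates get turned into actual row look-ups is left out here.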

If your estimates are low, there will be tuples which can never be
selected. Also, so far as I know, there's no way to construct a random
ctid in a stock postgres database. Apart from that, it seems like a good
plan. If efficiency is important, you could create a C function which
returns a series of random tids and join on that.
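
To put a number on the "estimates are low" problem, here is a small
back-of-the-envelope calculation in Python. It assumes, purely for
illustration, that every page holds the same number of live tuples at
offsets 1..k, which a real table won't quite do. The function name is
hypothetical.

```python
from fractions import Fraction


def unreachable_fraction(actual_pages, actual_rows_per_page,
                         est_pages, est_rows_per_page):
    """Fraction of tuples that random (page, offset) picks can never hit.

    Simplifying assumption: each page holds exactly actual_rows_per_page
    tuples. Pages at or beyond est_pages, and offsets above
    est_rows_per_page, are never generated.
    """
    reachable_pages = min(actual_pages, est_pages)
    reachable_rows = min(actual_rows_per_page, est_rows_per_page)
    total = actual_pages * actual_rows_per_page
    return Fraction(total - reachable_pages * reachable_rows, total)


if __name__ == "__main__":
    # 50 tuples per page in reality, but an estimate of only 40.
    print(unreachable_fraction(100, 50, 100, 40))
```

Over-estimating costs some wasted look-ups but leaves nothing
unreachable, so erring high is the safer direction.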
-- 

Patrick TJ McPhee
North York  Canada
[EMAIL PROTECTED]

