In article <[EMAIL PROTECTED]>, cluster <[EMAIL PROTECTED]> wrote:
% > How important is true randomness?
%
% The goal is an even distribution but currently I have not seen any way
% to produce any kind of random sampling efficiently. Notice the word
How about generating the ctid randomly?
All you're doing is picking random =subsequences= from the same
permutation of the original data.
You have some good points in your reply. I am very much aware of the
non-random behavior you point out for the "static random-value column"
approach, but at least it is fast, which is a requirement.
As far as I can tell, all of the proposed solutions lack sample
independence. Take the OP's suggested approach of doing something
like this:
SELECT * FROM mydata
WHERE mydata.random_number >= (SELECT RANDOM() OFFSET 0)
ORDER BY mydata.random_number ASC LIMIT 100
All you're doing is picking random =subsequences= from the same
permutation of the original data.
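The subsequence effect is easy to demonstrate outside the database. Below is a small Python simulation (my own sketch, not from the thread; the table and helper names are made up) of the static random_number-column query:

```python
import random

# Simulated table: each row gets a static random_number, assigned once.
rows = [{"id": i, "random_number": random.random()} for i in range(1000)]

def sample(table, k=100):
    # Mimics: WHERE random_number >= (SELECT RANDOM() OFFSET 0)
    #         ORDER BY random_number ASC LIMIT k
    threshold = random.random()
    eligible = [r for r in table if r["random_number"] >= threshold]
    return sorted(eligible, key=lambda r: r["random_number"])[:k]

a = sample(rows)
b = sample(rows)
# Every sample is a contiguous run of the one fixed ordering by
# random_number, so two samples with nearby thresholds overlap heavily.
```

Because the column is assigned once, every query walks the same permutation; only the starting point varies.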
Here's how I would do it. This assumes a static table that doesn't
change a lot.
1: find the row count n of the table.
2: randomly assign 1 through n to each row. How to do this
is a whole other post.
3: create a sequence. If you always need 10 or 100 random rows, set
the increment to that number.
cluster wrote:
It has been suggested [1] that a good way to select K random rows
efficiently from a table is to
1) add a new column, random_number, to the table
and initialize it with random()
2) perform the following query:
SELECT *
FROM mydata
WHERE random_number >= (SELECT RANDOM() OFFSET 0)
ORDER BY random_number ASC LIMIT 100
How important is true randomness?
The goal is an even distribution but currently I have not seen any way
to produce any kind of random sampling efficiently. Notice the word
"efficiently". The naive way of taking a random sample of size K:
(SELECT * FROM mydata ORDER BY random() LIMIT K)
is far too inefficient for large tables.
On Wed, Oct 24, 2007 at 10:59:46AM +0200, cluster wrote:
> Another way to look at the problem is: How do I sample a subset of size
> K efficiently? A query like
>
>SAMPLE 1000 OF
>(SELECT * FROM mydata WHERE )
How important is true randomness? To get the best possible distribution
Another way to look at the problem is: How do I sample a subset of size
K efficiently? A query like
SAMPLE 1000 OF
(SELECT * FROM mydata WHERE )
should return 1000 random rows from the select statement so that two
consecutive evaluations of the query would only with very little
probability return the same rows.
It has been suggested [1] that a good way to select K random rows
efficiently from a table is to
1) add a new column, random_number, to the table
and initialize it with random()
2) perform the following query:
SELECT *
FROM mydata
WHERE random_number >= (SELECT RANDOM() OFFSET 0)
ORDER BY random_number ASC LIMIT 100