On Sun, Mar 29, 2009 at 10:24 AM, marcin mank wrote:
> I think (a part of) Your problem is that order by random() is O(N
> logN) complexity, while You are after O(N) .
>
> The solution (in pseudocode)
>
[snip]
OK, I may be guiding You the wrong way
select g,g,g,g from generate_series(1,2500
I think (a part of) Your problem is that order by random() is O(N
logN) complexity, while You are after O(N) .
The solution (in pseudocode)
random_sample(resultset,K):
result := first K rows from resultset
resultset.scrollto(K+1)
p = K+1
while(resultset.hasMoreRows())
row = result
On Mar 27, 2009, at 8:38 PM, Mike Charnoky wrote:
Hi Alban,
I experimented with your sample() solution and was rather baffled by
the results, as it was actually 2x-3x slower than doing an ORDER BY
RANDOM() LIMIT n. I even precalculated the size of the result set,
so that only one sequen
On Mar 25, 2009, at 5:09 PM, Mike Charnoky wrote:
Due to the nature of the sampling (need to limit using several
parameters with a WHERE clause), I can't just generate random
numbers to select data that I need. Looks like I am stuck using
ORDER BY RANDOM(). The only other option at this p
Mike Charnoky wrote:
Scott Marlowe wrote:
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky wrote:
The random sampling query is normally pretty snappy. It usually takes on
the order of 1 second to sample a few thousand rows of data out of a few
million. The sampling is consistently quick, too.
Scott Marlowe wrote:
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky wrote:
The random sampling query is normally pretty snappy. It usually takes on
the order of 1 second to sample a few thousand rows of data out of a few
million. The sampling is consistently quick, too. However, on some days,
Gregory Stark wrote:
Tom Lane writes:
Mike Charnoky writes:
The sampling query which runs really slow on some days looks something
like this:
INSERT INTO sampled_data
(item_name, timestmp, ... )
SELECT item_name, timestmp, ... )
FROM raw_data
WHERE timestmp >= ? and timestmp < ?
Tom Lane writes:
> Mike Charnoky writes:
>> The sampling query which runs really slow on some days looks something
>> like this:
>
>> INSERT INTO sampled_data
>>(item_name, timestmp, ... )
>>SELECT item_name, timestmp, ... )
>>FROM raw_data
>>WHERE timestmp >= ? and timestmp < ?
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky wrote:
> The random sampling query is normally pretty snappy. It usually takes on
> the order of 1 second to sample a few thousand rows of data out of a few
> million. The sampling is consistently quick, too. However, on some days,
> the sampling st
The random sampling query is normally pretty snappy. It usually takes
on the order of 1 second to sample a few thousand rows of data out of a
few million. The sampling is consistently quick, too. However, on some
days, the sampling starts off quick, then when the process starts
sampling from
Yeah, I wish I didn't have to resort to using ORDER BY RANDOM(). I
really wanted to use something like TABLESAMPLE, but that is not
implemented in PostgreSQL. Unfortunately, I cannot use use the random
sampling technique you mentioned, since I need to select samples across
various strata of t
Mike Charnoky writes:
> The sampling query which runs really slow on some days looks something
> like this:
> INSERT INTO sampled_data
>(item_name, timestmp, ... )
>SELECT item_name, timestmp, ... )
>FROM raw_data
>WHERE timestmp >= ? and timestmp < ?
>AND item_name=?
>AND
On Mon, Mar 9, 2009 at 1:55 PM, Mike Charnoky wrote:
> Hello,
>
> I'm looking for some insight on an intermittent PostgreSQL performance
> problem that has been very troublesome. Using PG 8.3.5 on a server
> running CentOS 5 2.6.18-8.el5 (Dual Xeon 2.00 GHz, 4 GB RAM, RAID-10
> SCSI 600GB array).
13 matches
Mail list logo