Re: [GENERAL] intermittant performance problem

2009-03-29 Thread marcin mank
On Sun, Mar 29, 2009 at 10:24 AM, marcin mank wrote: > I think (a part of) Your problem is that order by random() is O(N > logN) complexity, while You are after O(N) . > > The solution (in pseudocode) > [snip] OK, I may be guiding You the wrong way select g,g,g,g from generate_series(1,2500

Re: [GENERAL] intermittant performance problem

2009-03-29 Thread marcin mank
I think (a part of) Your problem is that order by random() is O(N logN) complexity, while You are after O(N) . The solution (in pseudocode) random_sample(resultset,K): result := first K rows from resultset resultset.scrollto(K+1) p = K+1 while(resultset.hasMoreRows()) row = result

Re: [GENERAL] intermittant performance problem

2009-03-28 Thread Alban Hertroys
On Mar 27, 2009, at 8:38 PM, Mike Charnoky wrote: Hi Alban, I experimented with your sample() solution and was rather baffled by the results, as it was actually 2x-3x slower than doing an ORDER BY RANDOM() LIMIT n. I even precalculated the size of the result set, so that only one sequen

Re: [GENERAL] intermittant performance problem

2009-03-26 Thread Alban Hertroys
On Mar 25, 2009, at 5:09 PM, Mike Charnoky wrote: Due to the nature of the sampling (need to limit using several parameters with a WHERE clause), I can't just generate random numbers to select data that I need. Looks like I am stuck using ORDER BY RANDOM(). The only other option at this p

Re: [GENERAL] intermittant performance problem

2009-03-25 Thread Mike Charnoky
Mike Charnoky wrote: Scott Marlowe wrote: On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky wrote: The random sampling query is normally pretty snappy. It usually takes on the order of 1 second to sample a few thousand rows of data out of a few million. The sampling is consistently quick, too.

Re: [GENERAL] intermittant performance problem

2009-03-16 Thread Mike Charnoky
Scott Marlowe wrote: On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky wrote: The random sampling query is normally pretty snappy. It usually takes on the order of 1 second to sample a few thousand rows of data out of a few million. The sampling is consistently quick, too. However, on some days,

Re: [GENERAL] intermittant performance problem

2009-03-10 Thread Mike Charnoky
Gregory Stark wrote: Tom Lane writes: Mike Charnoky writes: The sampling query which runs really slow on some days looks something like this: INSERT INTO sampled_data (item_name, timestmp, ... ) SELECT item_name, timestmp, ... ) FROM raw_data WHERE timestmp >= ? and timestmp < ?

Re: [GENERAL] intermittant performance problem

2009-03-10 Thread Gregory Stark
Tom Lane writes: > Mike Charnoky writes: >> The sampling query which runs really slow on some days looks something >> like this: > >> INSERT INTO sampled_data >>(item_name, timestmp, ... ) >>SELECT item_name, timestmp, ... ) >>FROM raw_data >>WHERE timestmp >= ? and timestmp < ?

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Scott Marlowe
On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky wrote: > The random sampling query is normally pretty snappy.  It usually takes on > the order of 1 second to sample a few thousand rows of data out of a few > million.  The sampling is consistently quick, too.  However, on some days, > the sampling st

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Mike Charnoky
The random sampling query is normally pretty snappy. It usually takes on the order of 1 second to sample a few thousand rows of data out of a few million. The sampling is consistently quick, too. However, on some days, the sampling starts off quick, then when the process starts sampling from

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Mike Charnoky
Yeah, I wish I didn't have to resort to using ORDER BY RANDOM(). I really wanted to use something like TABLESAMPLE, but that is not implemented in PostgreSQL. Unfortunately, I cannot use use the random sampling technique you mentioned, since I need to select samples across various strata of t

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Tom Lane
Mike Charnoky writes: > The sampling query which runs really slow on some days looks something > like this: > INSERT INTO sampled_data >(item_name, timestmp, ... ) >SELECT item_name, timestmp, ... ) >FROM raw_data >WHERE timestmp >= ? and timestmp < ? >AND item_name=? >AND

Re: [GENERAL] intermittant performance problem

2009-03-09 Thread Scott Marlowe
On Mon, Mar 9, 2009 at 1:55 PM, Mike Charnoky wrote: > Hello, > > I'm looking for some insight on an intermittent PostgreSQL performance > problem that has been very troublesome.  Using PG 8.3.5 on a server > running CentOS 5 2.6.18-8.el5 (Dual Xeon 2.00 GHz, 4 GB RAM, RAID-10 > SCSI 600GB array).