Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some

2006-05-17 Thread Jignesh K. Shah

Your user time is way too high for a T2000...

If you have the 6-core machine with 24 threads, the iostat output reports 
all 24 threads as busy.


The best way to debug this is to use

prstat -amL
(or, if you are dumping it to a file, prstat -amLc > prstat.txt)

then find the pids with high user CPU time and run usrcall.d on a few of 
those pids.


Also, how many database connections do you have, and what type of query does 
each connection run?


-Jignesh



Arjen van der Meijden wrote:

Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the recommended 
settings on the website you provided. But I don't see much difference in 
performance with any of the adjustments; it appears to be more or less 
the same.


Here are a few iostat lines by the way:

sd0   sd1   sd2   nfs1   cpu
kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
  7   1   12  958  50   350   070   00   13  1  0 85
  0   00  2353 29630   000   00   92  7  0  1
  0   00  2062 32620   000   00   93  7  0  0
  1   11  1575 35000   000   00   92  7  0  1
  0   00  1628 36200   000   00   92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my 
own postgresql.conf settings (default WAL/checkpoint settings, 20k buffers, 
2k work_mem).


Is it possible to trace the stacks for semsys, like the memcpy traces, 
or are those of no interest here?


Best regards,

Arjen


On 16-5-2006 17:52, Jignesh K. Shah wrote:


Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that its timings are 
relative: if system CPU usage is very low, it only shows what portion of 
that small amount is associated with which call. So even if you have, say, 
1% system time, it reports that most of it was I/O related or semsys 
related. Using iostat with the -c option to include CPU times helps put 
things in the right perspective.



Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh


Arjen van der Meijden wrote:


Hi Jignesh,

Jignesh K. Shah wrote:


Hi Arjen,

Looking at your syscall and usrcall outputs, it looks like:

* You are spending too much time in semsys, which means you have too 
many connections contending for a lock, potentially the WAL log lock.


* llseek is high, which means you can obviously gain a bit with the right 
file system/file tuning, by caching the right things.


Have you set the Solaris values for the T2000 that are tuned for PostgreSQL?




Not particularly; we got a "special T2000 Solaris dvd" from your 
colleagues here in the Netherlands and installed that (actually, one of 
your colleagues did). With that install, all the "better default" 
/etc/system settings are supposed to be set. I haven't really checked 
that they are, since two of your colleagues have been working on the 
mysql version of the benchmark and I assumed they'd have verified that.



Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see 
if they change/improve your performance.




I will see that those tunables are verified to be set.

I am a bit surprised by your remarks, though, since they'd point at 
I/O being in the way? But we only have about 600k/sec of I/O according 
to vmstat, and the database easily fits in memory.
In total I logged about 500k queries, of which only 70k were altering 
queries, and of those almost all were inserts into log tables which 
aren't actively read in this benchmark.


But I'll give it a try.

Best regards,

Arjen



Arjen van der Meijden wrote:


Hi List,

In the past few weeks we have been developing a read-heavy 
mysql benchmark to get an alternative take on cpu/platform 
performance, not really to look at how fast mysql can be.


This benchmark runs on mysql 4.1.x, 5.0.x and 5.1.x and is modelled 
after our website's production database; the load generated on it is 
modelled after a simplified version of our visitor behaviour.


Long story short, we think the test is a nice example of the 
relatively lightweight, read-heavy web applications out there and 
therefore decided to have a go at postgresql as well.
Of course the queries and indexes have been adjusted to best suit 
postgresql (to the best of our knowledge), while maintaining the same 
output to the application/interface layer. While the initial structure 
only got postgresql to about half the performance of mysql 4.1.x, the 
current version of our postgresql benchmark has quite similar results 
to mysql 4.1.x, but both are quite a bit slower than 5.0.x (I think 
it's about 30-40% faster).


Since the results from those benchmarks are not yet public (they 
will be put together in a story at our website), I won't go into 
too much

Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some

2006-05-17 Thread Arjen van der Meijden
We have the 4-core machine. However, these numbers were taken during a 
benchmark, not a normal workload, so the output should show the system 
working fully ;)


So it's postgres doing a lot of work, and you already had a look at the 
usrcall output for that.


The benchmark just tries to do the queries for "random page visits". 
This totals up to about 80 different queries being executed with 
mostly random parameters. The workload is generated using php, so there 
are no connection pools, nor prepared statements.


The queries vary, but are all relatively lightweight, with fewer than 
6 or 7 joined tables. Almost all queries can use indexes. Most tables 
are under a few MB of data, although there are a few larger than that. 
Most records are relatively small, consisting mostly of numbers (id's 
and such).


The results presented here were with 25 concurrent connections.

Best regards,

Arjen


Jignesh K. Shah wrote:

Your user time is way too high for a T2000...

If you have the 6-core machine with 24 threads, the iostat output reports 
all 24 threads as busy.


The best way to debug this is to use

prstat -amL
(or, if you are dumping it to a file, prstat -amLc > prstat.txt)

then find the pids with high user CPU time and run usrcall.d on a few of 
those pids.


Also, how many database connections do you have, and what type of query does 
each connection run?


-Jignesh



Arjen van der Meijden wrote:

Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the recommended 
settings on the website you provided. But I don't see much difference in 
performance with any of the adjustments; it appears to be more or less 
the same.


Here are a few iostat lines by the way:

sd0   sd1   sd2   nfs1   cpu
kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
  7   1   12  958  50   350   070   00   13  1  0 85
  0   00  2353 29630   000   00   92  7  0  1
  0   00  2062 32620   000   00   93  7  0  0
  1   11  1575 35000   000   00   92  7  0  1
  0   00  1628 36200   000   00   92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my 
own postgresql.conf settings (default WAL/checkpoint settings, 20k buffers, 
2k work_mem).


Is it possible to trace the stacks for semsys, like the memcpy traces, 
or are those of no interest here?


Best regards,

Arjen


On 16-5-2006 17:52, Jignesh K. Shah wrote:


Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that its timings are 
relative: if system CPU usage is very low, it only shows what portion of 
that small amount is associated with which call. So even if you have, say, 
1% system time, it reports that most of it was I/O related or semsys 
related. Using iostat with the -c option to include CPU times helps put 
things in the right perspective.



Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh


Arjen van der Meijden wrote:


Hi Jignesh,

Jignesh K. Shah wrote:


Hi Arjen,

Looking at your syscall and usrcall outputs, it looks like:

* You are spending too much time in semsys, which means you have too 
many connections contending for a lock, potentially the WAL log lock.


* llseek is high, which means you can obviously gain a bit with the right 
file system/file tuning, by caching the right things.


Have you set the Solaris values for the T2000 that are tuned for PostgreSQL?




Not particularly; we got a "special T2000 Solaris dvd" from your 
colleagues here in the Netherlands and installed that (actually, one of 
your colleagues did). With that install, all the "better default" 
/etc/system settings are supposed to be set. I haven't really checked 
that they are, since two of your colleagues have been working on the 
mysql version of the benchmark and I assumed they'd have verified that.



Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see 
if they change/improve your performance.




I will see that those tunables are verified to be set.

I am a bit surprised by your remarks, though, since they'd point at 
I/O being in the way? But we only have about 600k/sec of I/O according 
to vmstat, and the database easily fits in memory.
In total I logged about 500k queries, of which only 70k were altering 
queries, and of those almost all were inserts into log tables which 
aren't actively read in this benchmark.


But I'll give it a try.

Best regards,

Arjen



Arjen van der Meijden wrote:


Hi List,

In the past few weeks we have been developing a read-heavy 
mysql benchmark to get an alternative take on cpu/platform 
performance, not really to look at how fast mysql can be.


This benchmark runs on 

Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some

2006-05-17 Thread Jignesh K. Shah
I have seen MemoryContextSwitchTo taking time before. However, I am not 
sure why it would take so much CPU time.

Maybe that function does not work efficiently on Solaris?

Also, I do not have much insight into slot_getattr.

Anybody else? (Another option is to run "collect -p $pid" experiments to 
gather the data and figure out which instruction is causing the high CPU 
usage.) Maybe the Sun engineers out there can help out.


-Jignesh


Arjen van der Meijden wrote:
We have the 4-core machine. However, these numbers were taken during a 
benchmark, not a normal workload, so the output should show the system 
working fully ;)


So it's postgres doing a lot of work, and you already had a look at the 
usrcall output for that.


The benchmark just tries to do the queries for "random page visits". 
This totals up to about 80 different queries being executed with 
mostly random parameters. The workload is generated using php, so there 
are no connection pools, nor prepared statements.


The queries vary, but are all relatively lightweight, with fewer than 
6 or 7 joined tables. Almost all queries can use indexes. Most tables 
are under a few MB of data, although there are a few larger than that. 
Most records are relatively small, consisting mostly of numbers (id's 
and such).


The results presented here were with 25 concurrent connections.

Best regards,

Arjen


Jignesh K. Shah wrote:


Your user time is way too high for a T2000...

If you have the 6-core machine with 24 threads, the iostat output reports 
all 24 threads as busy.


The best way to debug this is to use

prstat -amL
(or, if you are dumping it to a file, prstat -amLc > prstat.txt)

then find the pids with high user CPU time and run usrcall.d on a few of 
those pids.


Also, how many database connections do you have, and what type of query does 
each connection run?


-Jignesh



Arjen van der Meijden wrote:


Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the recommended 
settings on the website you provided. But I don't see much difference in 
performance with any of the adjustments; it appears to be more or less 
the same.


Here are a few iostat lines by the way:

sd0   sd1   sd2   nfs1   cpu
kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
  7   1   12  958  50   350   070   00   13  1  0 85
  0   00  2353 29630   000   00   92  7  0  1
  0   00  2062 32620   000   00   93  7  0  0
  1   11  1575 35000   000   00   92  7  0  1
  0   00  1628 36200   000   00   92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my 
own postgresql.conf settings (default WAL/checkpoint settings, 20k buffers, 
2k work_mem).


Is it possible to trace the stacks for semsys, like the memcpy traces, 
or are those of no interest here?


Best regards,

Arjen


On 16-5-2006 17:52, Jignesh K. Shah wrote:


Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that its timings are 
relative: if system CPU usage is very low, it only shows what portion of 
that small amount is associated with which call. So even if you have, say, 
1% system time, it reports that most of it was I/O related or semsys 
related. Using iostat with the -c option to include CPU times helps put 
things in the right perspective.



Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh


Arjen van der Meijden wrote:


Hi Jignesh,

Jignesh K. Shah wrote:


Hi Arjen,

Looking at your syscall and usrcall outputs, it looks like:

* You are spending too much time in semsys, which means you have too 
many connections contending for a lock, potentially the WAL log lock.


* llseek is high, which means you can obviously gain a bit with the right 
file system/file tuning, by caching the right things.


Have you set the Solaris values for the T2000 that are tuned for PostgreSQL?





Not particularly; we got a "special T2000 Solaris dvd" from your 
colleagues here in the Netherlands and installed that (actually, one of 
your colleagues did). With that install, all the "better default" 
/etc/system settings are supposed to be set. I haven't really checked 
that they are, since two of your colleagues have been working on the 
mysql version of the benchmark and I assumed they'd have verified that.



Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp 



Try especially the /etc/system and postgresql.conf changes and see 
if they change/improve your performance.





I will see that those tunables are verified to be set.

I am a bit surprised by your remarks, though, since they'd point at 
I/O being in the way? But we only have about 600k/sec of I/O according 
to vmstat, and the database easily fits in memory.
In total 

[PERFORM] SQL CPU time usage

2006-05-17 Thread Ruben Rubio Rey

Hi,

I have a web page that executes several SQL queries.

So, I would like to know which of those queries consumes the most CPU.
For example,
I have SQL1, which executes in 1.2 secs, and SQL2, which executes in 
200 ms.


But SQL2 is executed 25 times and SQL1 only once, so SQL2 really 
consumes more CPU time.


Is there any way to know this?
I had thought that logging all the SQL and then checking it is one way to 
do it ... any other ideas?
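
The kind of logging I have in mind would be something like this (a rough 
sketch only, assuming a release recent enough to have these parameters; 
setting log_min_duration_statement needs superuser rights):

  -- Log every statement together with its runtime for this session
  -- (0 means "log all statements, whatever their duration"):
  SET log_min_duration_statement = 0;

  -- Or enable it for all sessions in postgresql.conf and reload:
  --   log_min_duration_statement = 0
  --   log_line_prefix = '%t [%p] '  -- timestamp and pid, to group per session

With that in place each query shows up in the server log with its duration, 
and the per-query totals can be added up afterwards.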


Thanks in advance

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] IMMUTABLE?

2006-05-17 Thread Jim C. Nasby
On Tue, May 16, 2006 at 07:08:51PM -0700, David Wheeler wrote:
> On May 16, 2006, at 18:29, Christopher Kings-Lynne wrote:
> 
> >>Yes, but there are definitely programming cases where memoization/ 
> >>caching definitely helps. And it's easy to tell for a given  
> >>function whether or not it really helps by simply trying it with  
> >>CACHED and without.
> >>Would this be a simple thing to implement?
> >
> >It's called a "table" :)
> 
>   http://www.justatheory.com/computers/databases/postgresql/ 
> higher_order_plpgsql.html
> 
> Yes, I know. :-P But it'd be easier to have a CACHED keyword, of course.

Rather than worrying about a generic form of memoization, what would be
extremely valuable would be to improve detection of the same function
being used multiple times in a query, i.e.:

SELECT moo(x), moo(x)/2 FROM table;

AFAIK PostgreSQL will currently execute moo(x) twice. Even if it knows
how to optimize this brain-dead example, I think there are other
examples it can't optimize right now. Having a much simpler memoization
scheme that only works on a tuple-by-tuple basis would probably
eliminate a lot of those. (It wouldn't work for any executor node that
has to read its entire input before returning anything, though, such as
sort.)
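
For what it's worth, a hand-rolled workaround that often gets single
evaluation today is to compute the expression once in a subquery and keep
the planner from flattening it with OFFSET 0 (a sketch only; moo is the
hypothetical function from the example above, and tbl stands in for the
table, since "table" itself is a reserved word):

  -- Evaluate moo(x) once per row by materializing it in a subquery;
  -- OFFSET 0 is the usual trick to stop the planner from pulling the
  -- subquery up and duplicating the expression again.
  SELECT m, m / 2
  FROM (SELECT moo(x) AS m FROM tbl OFFSET 0) AS s;
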
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Craig A. James

Here's a "corner case" that might interest someone.  It tripped up one of our 
programmers.

We have a table with > 10 million rows.  The ID column is indexed, the table 
has been vacuum/analyzed.  Compare these two queries:

  select * from tbl where id >= 1000 limit 1;
  select * from tbl where id >= 1000 order by id limit 1;

The first takes 4 seconds, and uses a full table scan.  The second takes 32 msec and uses the index.  
Details are below.


I understand why the planner makes the choices it does -- the "id > 1000" isn't very 
selective and under normal circumstances a full table scan is probably the right choice.  But the 
"limit 1" apparently doesn't alter the planner's strategy at all.  We were surprised by this.

Adding the "order by" was a simple solution.

Craig



pg=> explain analyze select url, url_digest from url_queue where priority >= 
1000 limit 1;
  QUERY PLAN 
--

Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 
rows=1 loops=1)
  ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108) 
(actual time=4036.101..4036.101 rows=1 loops=1)
Filter: (priority >= 1000)
Total runtime: 4036.200 ms
(4 rows)

pg=> explain analyze select url, url_digest from url_queue where priority >= 
1000 order by priority limit 1;
  QUERY PLAN 
--

Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 rows=1 
loops=1)
  ->  Index Scan using url_queue_priority on url_queue  (cost=0.00..1440200.41 
rows=606176 width=112) (actual time=32.434..32.434 rows=1 loops=1)
Index Cond: (priority >= 1000)
Total runtime: 32.566 ms

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] SQL CPU time usage

2006-05-17 Thread john_oshea

On 17 May 2006, at 16:21, Ruben Rubio Rey wrote:


I have a web page, that executes several SQLs.

So, I would like to know witch one of those SQLs consumes more CPU.
For example,
I have SQL1 that is executed in 1.2 secs and a SQL2 that is  
executed in 200 ms.


But SQL2 is executed 25 times and SQL1 is executed 1 time, so  
really SQL2 consumes more CPU time.


Is there any way to know this?
I have think that logging all SQLs and then cheking it is a way to  
do it ... any other idea?


Practical Query Analysis does exactly that (it scans historic logs). Very 
nice indeed, and more than worth the money (it's BSD-licensed).


--
John O'Shea
Wordbank Limited
33 Charlotte Street, London W1T 1RR
Direct line: +44 (0) 20 7903 8829
Fax: +44 (0) 20 7903 




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Bruno Wolff III
Please don't reply to previous messages to start new threads. This makes it
harder to find stuff in the archives and may keep people from noticing your
message.

On Wed, May 17, 2006 at 08:54:52 -0700,
  "Craig A. James" <[EMAIL PROTECTED]> wrote:
> Here's a "corner case" that might interest someone.  It tripped up one of 
> our programmers.
> 
> We have a table with > 10 million rows.  The ID column is indexed, the 
> table has been vacuum/analyzed.  Compare these two queries:
> 
>   select * from tbl where id >= 1000 limit 1;
>   select * from tbl where id >= 1000 order by id limit 1;
> 
> The first takes 4 seconds, and uses a full table scan.  The second takes 32 
> msec and uses the index.  Details are below.

I suspect it wasn't intended to be a full table scan, but rather a sequential
scan until it found a matching row. If the data in the table is ordered
by id, this strategy may not work out well, whereas if the data is randomly
ordered, it would be expected to find a match quickly.

Have you analyzed the table recently? If the planner has bad stats on the
table, that is going to make it more likely to choose a bad plan.


> I understand why the planner makes the choices it does -- the "id > 
> 1000" isn't very selective and under normal circumstances a full table 
> scan is probably the right choice.  But the "limit 1" apparently doesn't 
> alter the planner's strategy at all.  We were surprised by this.
> 
> Adding the "order by" was a simple solution.
> 
> Craig
> 
> 
> 
> pg=> explain analyze select url, url_digest from url_queue where priority 
> >= 1000 limit 1;
>   QUERY PLAN
> --
> Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 
> rows=1 loops=1)
>   ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108) 
>   (actual time=4036.101..4036.101 rows=1 loops=1)
> Filter: (priority >= 1000)
> Total runtime: 4036.200 ms
> (4 rows)
> 
> pg=> explain analyze select url, url_digest from url_queue where priority 
> >= 1000 order by priority limit 1;
>   QUERY PLAN
> --
> Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 
> rows=1 loops=1)
>   ->  Index Scan using url_queue_priority on url_queue  
>   (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434 
>   rows=1 loops=1)
> Index Cond: (priority >= 1000)
> Total runtime: 32.566 ms
> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Performance/Maintenance test result collection

2006-05-17 Thread Chris Mckenzie

Hi.


I'm trying to plan for a performance test session where a large database is 
subject to regular hits from my application while both regular and full 
database maintenance are being performed. The idea is to get a better sense 
of the impact maintenance will have on regular usage, and of when to 
reasonably schedule both regular and full maintenance.

Is the verbose option for the VACUUM command, plus physical disk space usage, 
enough? What's a good rule of thumb for verifying that the space supposedly 
recovered by a FULL vacuum is real? I can turn on verbose for a FULL vacuum 
and watch the "Total free space (including removable row versions) is 7032 
bytes." details, but can that reasonably be correlated against Linux disk 
tools (like du)? Or is it only a guide that some maintenance is being 
performed?
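
For instance, the sort of correlation I have in mind would be something like 
this (a rough sketch only; it assumes the default 8 kB block size, and 
relpages is only refreshed by VACUUM/ANALYZE):

  -- Pages the catalogs think the table occupies; times the block size this
  -- should shrink after a VACUUM FULL and roughly match what du reports
  -- for the table's data files.
  SELECT relname, relpages, relpages::bigint * 8192 AS approx_bytes
  FROM pg_class
  WHERE relname = 'queuedemails';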

Any other stat collection points I should be watching?


Here's an example lazy vacuum verbose output from an empty schema table: (not that you guys haven't seen this stuff enough)

VACUUM VERBOSE app.queuedemails;
INFO:  vacuuming "app.queuedemails"
INFO:  index "queuedemails1" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


INFO:  index "queuedemails2" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


INFO:  "queuedemails": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


INFO:  vacuuming "pg_toast.pg_toast_17595"
INFO:  index "pg_toast_17595_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


INFO:  "pg_toast_17595": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


THANKS!!!


- Chris
 





Re: [PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Simon Riggs
On Wed, 2006-05-17 at 08:54 -0700, Craig A. James wrote:
> Here's a "corner case" that might interest someone.  It tripped up one of our 
> programmers.
> 
> We have a table with > 10 million rows.  The ID column is indexed, the table 
> has been vacuum/analyzed.  Compare these two queries:
> 
>    select * from tbl where id >= 1000 limit 1;
>    select * from tbl where id >= 1000 order by id limit 1;
> 
> The first takes 4 seconds, and uses a full table scan.  The second takes 32 
> msec and uses the index.  
> Details are below.

The rows are not randomly distributed, so the SeqScan takes longer to
find 1 row than the index scan.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Tom Lane
Bruno Wolff III <[EMAIL PROTECTED]> writes:
> I suspect it wasn't intended to be a full table scan, but rather a sequential
> scan until it found a matching row. If the data in the table is ordered
> by id, this strategy may not work out well, whereas if the data is randomly
> ordered, it would be expected to find a match quickly.

Right.  You can see from the differential in the estimates for the
SeqScan and the Limit nodes that the planner is not expecting the
seqscan to run to completion, but rather to find a matching row quite
quickly.

There is not anything in there that considers whether the table's
physical order is so nonrandom that the search will take much longer
than it would given uniform distribution.  It might be possible to do
something with the correlation statistic in simple cases ...
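
The statistic in question is the per-column correlation kept in the pg_stats 
view; for the table from the plans upthread, a quick way to look at it would 
be something like:

  -- Planner statistics for the column used in the filter; correlation near
  -- +1/-1 means the heap order closely follows this column, near 0 means
  -- the matching rows are physically scattered.
  SELECT tablename, attname, correlation
  FROM pg_stats
  WHERE tablename = 'url_queue' AND attname = 'priority';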

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Craig A. James

Tom Lane wrote:

There is not anything in there that considers whether the table's
physical order is so nonrandom that the search will take much longer
than it would given uniform distribution.  It might be possible to do
something with the correlation statistic in simple cases ...


In this case, the rows are not random at all; in fact, they're inserted from 
a sequence, and rows are then deleted as they are processed.  If the planner 
is hoping for a random physical distribution, this particular case is 
exactly wrong.


Craig

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Speed Up Offset and Limit Clause

2006-05-17 Thread Jim C. Nasby
On Tue, May 16, 2006 at 07:20:12PM -0700, Craig A. James wrote:
> >Why I want to use offset and limit is for me to create a threaded
> >application so that they will not get the same results.
> 
> In order to return rows 1 to 15000, it must select all rows from zero 
> to 15000 and then discard the first 1 -- probably not what you were 
> hoping for.
> 
> You might add a "thread" column.  Say you want to run ten threads:

Another possibility is partitioning the table. If you do that using
inheritance-based partitioning, you could just select directly from the
different partition tables, which would probably be even faster than using a
single table. The downside is that it's more work to set up; something along
the lines of the sketch below.
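
(Illustration only; the table and column names here are made up, and on 8.1
you would also turn on constraint_exclusion so the CHECK constraints let the
planner skip non-matching partitions when you query the parent table.)

  -- One child table per worker thread; each worker reads its own partition
  -- directly instead of paging through one big table with OFFSET/LIMIT.
  CREATE TABLE work_items (id integer, payload text, thread integer);

  CREATE TABLE work_items_t0 (CHECK (thread = 0)) INHERITS (work_items);
  CREATE TABLE work_items_t1 (CHECK (thread = 1)) INHERITS (work_items);
  -- ... one child per thread ...

  -- Worker 0 touches only its own partition:
  SELECT id, payload FROM work_items_t0;
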
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Jim C. Nasby
On Wed, May 17, 2006 at 08:54:52AM -0700, Craig A. James wrote:
> Here's a "corner case" that might interest someone.  It tripped up one of 
> our programmers.
> 
> We have a table with > 10 million rows.  The ID column is indexed, the 
> table has been vacuum/analyzed.  Compare these two queries:
> 
>   select * from tbl where id >= 1000 limit 1;
>   select * from tbl where id >= 1000 order by id limit 1;
> 
> The first takes 4 seconds, and uses a full table scan.  The second takes 32 
> msec and uses the index.  Details are below.
> 
> I understand why the planner makes the choices it does -- the "id > 
> 1000" isn't very selective and under normal circumstances a full table 
> scan is probably the right choice.  But the "limit 1" apparently doesn't 
> alter the planner's strategy at all.  We were surprised by this.

Is it really not very selective? If there's 1000 rows in the table,
and id starts at 1 with very few gaps, then >= 1000 should actually
be very selective...

Also, I hope you understand there's a big difference between a limit
query that does and doesn't have an order by.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance/Maintenance test result collection

2006-05-17 Thread Jim C. Nasby
On Wed, May 17, 2006 at 01:50:22PM -0400, Chris Mckenzie wrote:
> Hi.
> 
> I'm trying to plan for a performance test session where a large database is
> subject to regular hits from my application while both regular and full
> database maintenance is being performed. The idea is to gain a better idea
> on the impact maintenance will have on regular usage, and when to reasonably
> schedule both regular and full maintenance.

What do you mean by "regular and full maintenance"? Do you mean VACUUM
FULL?

If you're vacuuming appropriately you shouldn't have any need to ever
VACUUM FULL...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Lot'sa joins - performance tip-up, please?

2006-05-17 Thread Mario Splivalo
On Wed, 2006-05-10 at 17:10 -0500, Jim C. Nasby wrote:
> On Thu, May 04, 2006 at 04:45:57PM +0200, Mario Splivalo wrote:
> Well, here's the problem...
> 
> >  ->  Nested Loop  (cost=0.00..176144.30 rows=57925 width=26)
> > (actual time=1074.984..992536.243 rows=57925 loops=1)
> >->  Seq Scan on ticketing_codes_played
> > (cost=0.00..863.25 rows=57925 width=8) (actual time=74.479..2047.993
> > rows=57925 loops=1)
> >->  Index Scan using ticketing_codes_pk on
> > ticketing_codes  (cost=0.00..3.01 rows=1 width=18) (actual
> > time=17.044..17.052 rows=1 loops=57925)
> >  Index Cond: (ticketing_codes.code_id =
> > "outer".code_id)
> 
> Anyone have any idea why on earth it's doing that instead of a hash or
> merge join?
> 
> In any case, try swapping the order of ticketing_codes_played and
> ticketing_codes. Actually, that'd probably make it worse.

I tried that, no luck. The best performance I achieve is by creating a
temporary table. And...

> 
> Try SET enable_nestloop = off;

This helps also. I don't get sequential scans any more. I'd like a tip
on how to set 'enable_nestloop = off' through JDBC?
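
Presumably it's just a matter of sending it as an ordinary statement on the
same connection before the query (e.g. through a plain Statement.execute());
the SQL side would be something like:

  -- Disable nested loops for this connection only:
  SET enable_nestloop = off;

  -- ... run the query that was getting the slow nested-loop plan ...

  -- Put it back so later queries on the same (possibly pooled) connection
  -- keep the default planner behaviour:
  RESET enable_nestloop;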

Mario
-- 
"I can do it quick, I can do it cheap, I can do it well. Pick any two."

Mario Splivalo
[EMAIL PROTECTED]



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-17 Thread Zeugswetter Andreas DCP SD

> Something else worth considering is not using the normal 
> catalog methods
> for storing information about temp tables, but hacking that together
> would probably be a rather large task.

But the timings suggest that it cannot be the catalogs in the worst
case he showed.

> 0.101 ms BEGIN
> 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER
NOT  
> NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP

1.4 ms is not great for create table, is that what we expect?

> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id
DESC  
> LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT
> 
>   CREATING the table is OK, but what happens on COMMIT ? I hear
the disk  
> seeking frantically.

The 32 ms for commit can hardly be catalog related. It seems the
file is fsynced before it is dropped.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq