Re: [PERFORM] Performance problems with a higher number of clients

2003-12-11 Thread Jeff
On Thu, 11 Dec 2003 04:13:28 +
Alfranio Correia Junior <[EMAIL PROTECTED]> wrote:

>  r  b  w   swpd   free   buff  cache  si   so    bi    bo   in    cs  us sy id
>  2 29  1 106716   9576   7000 409876  32  154  5888  1262  616  1575   8 12 80

On Linux I've found that as soon as it has to swap, its oh-so-wonderful
VM brings the machine to a screeching halt.


> sort_mem = 10240
> 
Here's a big problem

This gives _EACH SORT_ 10MB (no more, no less) to play with.
10MB * 500 connections == 5000MB in the worst case.  Some queries may
have more than one sort step, so it is possible one connection could be
using 30-40MB of sort_mem. You'll need to bring that value down to
prevent swapping.

If you have a few "common" queries that are run a lot, check out their
EXPLAIN ANALYZE output. You can see about how much sort_mem you'll
need: look at the sort step; it should tell you the width and the
number of rows. Multiply those, and that is roughly how much memory
the sort needs (I'd round it up a bit).
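As a concrete illustration (table and column names here are invented, and the plan numbers are made up), the estimate comes straight out of the Sort node:

```sql
-- Hypothetical example: estimating sort_mem from an EXPLAIN ANALYZE sort step.
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at;

--   Sort  (cost=1234.56..1250.00 rows=50000 width=120) (actual ...)
--
-- rows * width = 50000 * 120 bytes ~= 6MB of sort data, so a
-- sort_mem of 8192 (sort_mem is measured in KB) would leave some
-- headroom for this query while staying far below 10MB per sort.
```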

If your DB is swapping under normal workload, you have problems. You'll
need to either tune your config or get bigger hardware.  You may also
want to consider an OS that deals with that situation a bit better.

good luck.

-- 
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance problems with a higher number of clients

2003-12-11 Thread Tom Lane
Alfranio Correia Junior <[EMAIL PROTECTED]> writes:
> I am facing a problem trying to put 500 concurrent users accessing
> a postgresql instance.

I think you're going to need to buy more RAM.  1Gb of RAM means there
is a maximum of 2Mb available per Postgres process before you start
to go into swap hell --- in practice a lot less, since you have to allow
for other things like the kernel and other applications.

AFAIR TPC-C doesn't involve any complex queries, so it's possible you
could run it with only 1Mb of workspace per process, but not when
you've configured

> sort_mem = 10240

That's ten times more than your configuration can possibly support.
(I don't recall whether TPC-C uses any queries that would sort, so
it's possible this setting isn't affecting you; but if you are doing
any sorts then it's killing you.)

Bottom line is you probably need more RAM.

regards, tom lane



[PERFORM] hints in Postgres?

2003-12-11 Thread sandra ruiz
Hi list,

I need to know if there is anything like Oracle's hints in
Postgres.  Otherwise, I wish to find a way to force a query plan to use
the indexes, or to tell the optimizer things like "optimize based on
statistics", "I want to define the order of a join", or "optimize based
on an execution plan that I consider the best"...

thanks.




Re: [PERFORM] hints in Postgres?

2003-12-11 Thread Pavel Stehule
hello

maybe

http://www.gtsm.com/oscon2003/toc.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

bye
Pavel


On Thu, 11 Dec 2003, sandra ruiz wrote:

> Hi list,
> 
> I need to know if there is anything like hints of Oracle in 
> Postgres..otherwise..I wish to find a way to force a query plan to use the 
> indexes or tell the optimizer things like "optimize based in statistics", "I 
> want to define the order of the a join" , "optimize based on a execution 
> plan that I consider the best" ...
> 
> thanks.




[PERFORM] Command

2003-12-11 Thread nbarraza
show



Re: [PERFORM] hints in Postgres?

2003-12-11 Thread Christopher Browne
Quoth [EMAIL PROTECTED] ("sandra ruiz"):
> I need to know if there is anything like hints of Oracle in
> Postgres..otherwise..I wish to find a way to force a query plan to use
> the indexes or tell the optimizer things like "optimize based in
> statistics", "I want to define the order of the a join" , "optimize
> based on a execution plan that I consider the best" ...

It is commonly considered a MISFEATURE of Oracle that it forces you to
tweak all of those sorts of 'knobs.'

The approach taken with PostgreSQL is to use reported problems to
improve the quality of the query optimizer.  It is usually clever
enough to do a good job, and if it can be improved to notice that
"better" plan automatically, then that is a better outcome than
imposing the burden of tuning each query on you.

Tom Lane is "Doctor Optimization," and if you look at past discussion
threads of this sort, you'll see that he tends to rather strongly
oppose the introduction of "hints."
-- 
select 'aa454' || '@' || 'freenet.carleton.ca';
http://www3.sympatico.ca/cbbrowne/linux.html
As of next Monday, COMSAT will be flushed in favor of a string and two tin
cans.  Please update your software.



[PERFORM] Hardware suggestions for Linux/PGSQL server

2003-12-11 Thread Jeff Bohmer
Hi everyone,

I want to pick your brains for hardware suggestions about a 
Linux-based PostgreSQL 7.4 server.  It will be a dedicated DB server 
backing our web sites and hit by application servers (which do 
connection pooling).  I've hopefully provided all relevant 
information below.  Any thoughts, comments or suggestions are welcome.

Our current server and database:
Mac OS X Server 10.2.8
single 1.25GHz G4
2 GB 333MHz RAM
7200 rpm SCSI drive for OS, logs
15k rpm SCSI drive for data
PostgreSQL 7.3.4
1 database, 1.1 GB in size, growing by ~15 MB / week
60 tables, 1 schema, largest is 1m rows, 1 at 600k, 3 at 100k
Peak traffic:
500 UPDATEs, INSERTs and DELETEs / minute
6000 SELECTs / minutes
90 connections
Performance is fine most of the time, but not during peak loads. 
We're never swapping and disk IO during the SELECT peaks is hardly 
anything (under 3MB/sec).  I think UPDATE peaks might be saturating 
disk IO.  Normally, most queries finish in under .05 seconds.  Some 
take 2-3 seconds.  During peaks, the fast queries are just OK and the 
slower ones take too long (like over 8 seconds).

We're moving to Linux from OS X for improved stability and more 
hardware options.  We need to do this soon.  The current server is 
max'd out at 2GB RAM and I'm afraid might start swapping in a month.

Projected database/traffic in 12 months:
Database size will be at least 2.5 GB
Largest table still 1m rows, but 100k tables will grow to 250k
Will be replicated to a suitable standby slave machine
Peak traffic:
2k UPDATEs, INSERTs, DELETEs / minute
20k SELECTs / minute
150 - 200 connections
We're willing to shell out extra bucks to get something that will 
undoubtedly handle the projected peak load in 12 months with 
excellent performance.  But we're not familiar with PG's performance 
on Linux and don't like to waste money.

I've been thinking of this (overkill? not enough?):
2 Intel 32-bit CPUs
Lowest clock speed chip for the fastest available memory bus
4 GB RAM (maybe we only need 3 GB to start with?)
SCSI RAID 1 for OS
For PostgreSQL data and logs ...
15k rpm SCSI disks
RAID 5, 7 disks, 256MB battery-backed write cache
(Should we save $ and get a 4-disk RAID 10 array?)
I wonder about the 32bit+bigmem vs. 64bit question.  At what database 
size will we need more than 4GB RAM?

We'd like to always have enough RAM to cache the entire database. 
While 64bit is in our long-term future, we're willing to stick with 
32bit Linux until 64bit Linux on Itanium/Opteron and 64bit PostgreSQL 
"settle in" to proven production-quality.

TIA,
- Jeff
--

Jeff Bohmer
VisionLink, Inc.
_
303.402.0170
www.visionlink.org
_
People. Tools. Change. Community.


Re: [PERFORM] hints in Postgres?

2003-12-11 Thread Bruno Wolff III
On Thu, Dec 11, 2003 at 11:00:19 -0500,
  sandra ruiz <[EMAIL PROTECTED]> wrote:
> Hi list,
> 
> I need to know if there is anything like hints of Oracle in 
> Postgres..otherwise..I wish to find a way to force a query plan to use the 
> indexes or tell the optimizer things like "optimize based in statistics", 
> "I want to define the order of the a join" , "optimize based on a execution 
> plan that I consider the best" ...

There are a few things you can do.

You can explicitly fix the join order using INNER JOIN (in 7.4 you have to set
a GUC variable for this to force join order).

You can disable specific plan types (though a disabled plan type, such
as sequential scan, just becomes very expensive, since sometimes there
is no other way to do things).

You can set tuning values to properly express the relative cost of things
like CPU time, sequential disk reads and random disk reads.

These are done by setting GUC variables, either in the postgres config
file or using SET commands. They are per backend, so some queries can
be run with one set of values while other queries going on at the same
time use different values.
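A minimal sketch of what a per-session setup like this looks like (GUC names as they existed around PostgreSQL 7.4; the values are illustrative, not recommendations):

```sql
-- Per-backend planner controls, affecting this session only.
SET join_collapse_limit = 1;   -- honor the join order as written in the query
SET enable_seqscan = off;      -- make sequential scans very expensive
SET random_page_cost = 2.0;    -- random reads cost less than the default of 4
                               -- (e.g. when most data is cached)

-- ... run the queries that need these settings ...

RESET ALL;                     -- return to the config-file defaults
```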



Re: [PERFORM] Hardware suggestions for Linux/PGSQL server

2003-12-11 Thread William Yu
Jeff Bohmer wrote:
> We're willing to shell out extra bucks to get something that will
> undoubtedly handle the projected peak load in 12 months with excellent
> performance.  But we're not familiar with PG's performance on Linux and
> don't like to waste money.

Properly tuned, PG on Linux runs really nice. A few people have
mentioned the VM swapping algorithm on Linux is semi-dumb. I get around
that problem by having a ton of memory and almost no swap.

> I've been thinking of this (overkill? not enough?):
> 2 Intel 32-bit CPUs
> Lowest clock speed chip for the fastest available memory bus
> 4 GB RAM (maybe we only need 3 GB to start with?)
> SCSI RAID 1 for OS
> For PostgreSQL data and logs ...
> 15k rpm SCSI disks
> RAID 5, 7 disks, 256MB battery-backed write cache
> (Should we save $ and get a 4-disk RAID 10 array?)
> I wonder about the 32bit+bigmem vs. 64bit question.  At what database
> size will we need more than 4GB RAM?

With 4GB of RAM, you're already running into bigmem. By default, Linux
gives 2GB of address space to programs and 2GB to kernel. I usually see
people quote a 5%-15% penalty in general for using PAE versus a flat
address space. I've seen simple MySQL benchmarks where 64-bit versions
run 35%+ faster versus 32-bit+PAE, but how that translates to PG, I
dunno yet.

> We'd like to always have enough RAM to cache the entire database. While
> 64bit is in our long-term future, we're willing to stick with 32bit
> Linux until 64bit Linux on Itanium/Opteron and 64bit PostgreSQL "settle
> in" to proven production-quality.

Well if this is the case, you probably should get an Opteron server
*now* and just run 32-bit Linux on it until you're sure about the
software. No point in buying a Xeon and then throwing the machine away
in a year when you decide you need 64-bit for more speed.



Re: [PERFORM] Hardware suggestions for Linux/PGSQL server

2003-12-11 Thread Jeff Bohmer

> Properly tuned, PG on Linux runs really nice. A few people have
> mentioned the VM swapping algorithm on Linux is semi-dumb. I get
> around that problem by having a ton of memory and almost no swap.

I think we want your approach: enough RAM to avoid swapping altogether.

> With 4GB of RAM, you're already running into bigmem. By default,
> Linux gives 2GB of address space to programs and 2GB to kernel.

It seems I don't fully understand the bigmem situation.  I've
searched the archives, googled, checked RedHat's docs, etc.  But I'm
getting conflicting, incomplete and/or out-of-date information.  Does
anyone have pointers to bigmem info or configuration for the 2.4
kernel?

If Linux is set up with 2GB for kernel and 2GB for user, would that be
OK with a DB size of 2-2.5 GB?  I'm figuring the kernel will cache
most/all of the DB in its 2GB and there's 2GB left for PG processes.
Where do PG's SHM buffers live, kernel or user?  (I don't plan on
going crazy with buffers, but would guess we'd need about 128MB, 256MB
at most.)

> I usually see people quote 5%-15% penalty in general for using PAE
> versus a flat address space. I've seen simple MySQL benchmarks where
> 64-bit versions run 35%+ faster versus 32-bit+PAE but how that
> translates to PG, I dunno yet.
>
>> We'd like to always have enough RAM to cache the entire database.
>> While 64bit is in our long-term future, we're willing to stick with
>> 32bit Linux until 64bit Linux on Itanium/Opteron and 64bit
>> PostgreSQL "settle in" to proven production-quality.
>
> Well if this is the case, you probably should get an Opteron server
> *now* and just run 32-bit Linux on it until you're sure about the
> software. No point in buying a Xeon and then throwing the machine
> away in a year when you decide you need 64-bit for more speed.

That's a good point.  I had forgotten about the option to run 32bit
on an Opteron.  If we had 3GB or 4GB initially on an Opteron, we'd
need bigmem for 32bit Linux, right?

This might work nicely since we'd factor in the penalty from PAE for
now and have the performance boost from moving to 64bit available on
demand.  Not having to build another DB server in a year would also
be nice.

FYI, we need stability first and performance second.

Thank you,
- Jeff
--

Jeff Bohmer
VisionLink, Inc.
_
303.402.0170
www.visionlink.org
_
People. Tools. Change. Community.


Re: [PERFORM] Hardware suggestions for Linux/PGSQL server

2003-12-11 Thread scott.marlowe

Just one more piece of advice, you might want to look into a good battery 
backed cache hardware RAID controller.  They work quite well for heavily 
updated databases.  The more drives you throw at the RAID array the faster 
it will be.




Re: [PERFORM] Hardware suggestions for Linux/PGSQL server

2003-12-11 Thread William Yu
Jeff Bohmer wrote:
> It seems I don't fully understand the bigmem situation.  I've searched
> the archives, googled, checked RedHat's docs, etc.  But I'm getting
> conflicting, incomplete and/or out of date information.  Does anyone
> have pointers to bigmem info or configuration for the 2.4 kernel?

Bigmem is the name for Linux's PAE support.

> If Linux is setup with 2GB for kernel and 2GB for user, would that be OK
> with a DB size of 2-2.5 GB?  I'm figuring the kernel will cache most/all
> of the DB in its 2GB and there's 2GB left for PG processes. Where does
> PG's SHM buffers live, kernel or user?  (I don't plan on going crazy
> with buffers, but will guess we'd need about 128MB, 256MB at most.)

PG's SHM buffers live in user space. As for whether Linux's OS cache
lives in user or kernel space: I think it's in kernel, and I remember
reading a limit of roughly 950MB without bigmem, which means your 3.5GB
of available OS memory will definitely have to be swapped in and out of
kernel space using PAE.

>> Well if this is the case, you probably should get an Opteron server
>> *now* and just run 32-bit Linux on it until you're sure about the
>> software. No point in buying a Xeon and then throwing the machine away
>> in a year when you decide you need 64-bit for more speed.
>
> That's a good point.  I had forgotten about the option to run 32bit on
> an Opteron.  If we had 3GB or 4GB initially on an Opteron, we'd need
> bigmem for 32bit Linux, right?
>
> This might work nicely since we'd factor in the penalty from PAE for now
> and have the performance boost from moving to 64bit available on
> demand.  Not having to build another DB server in a year would also be
> nice.
>
> FYI, we need stability first and performance second.

We ordered a 2x Opteron server the moment the CPU was released and it's
been perfect -- except for one incident where the PCI riser card had
drifted out of the PCI slot due to the heavy SCSI cables connected to
the card.

I think most of the Opteron server MBs are pretty solid, but if you want
extra peace-of-mind, you could get a server from Newisys, as they pack
in a cartload of extra monitoring features.



[PERFORM] Measuring execution time for sql called from PL/pgSQL

2003-12-11 Thread Aram Kananov
Hi,

I've got very slow insert performance on a table which has a trigger
based on a complex PL/pgSQL function.
Apparently the insert is slow due to some slow SQL inside that function,
since CPU load is very high and disk usage is low during the insert.
I run Red Hat 9
Athlon 2.6
1GB RAM
Fast IDE disk

Setting the following in postgresql.conf apparently doesn't help:
log_statement = true
log_duration = true
since it logs only the SQL issued by the client. It logs the SQL text
once per session at the call to the PL/pgSQL function, but of course
no durations for the statements inside.

Due to the complexity of the PL/pgSQL function, stepping through the
execution plans one by one is very time consuming.

Q1) Is there any way to see which statements are called from PL/pgSQL,
and their durations?

I've tried to measure the duration of the SQL by printing out
"localtimestamp", but for some reason it returns the same value
throughout the same PL/pgSQL call:

Example: the following function gets and prints out the localtimestamp
value in a loop:

create or replace function foobar()
  returns integer as '
  declare
    v timestamp;
  begin
    for i in 1..5 loop   -- bounded loop so the function returns
      select localtimestamp into v;
      raise notice ''Timestamp: %'', v;
    end loop;
    return null;
  end; ' language 'plpgsql';

and as the result of "select foobar();" I constantly get the same value:
NOTICE:  Timestamp: 2003-12-12 01:51:35.768053
NOTICE:  Timestamp: 2003-12-12 01:51:35.768053
NOTICE:  Timestamp: 2003-12-12 01:51:35.768053
NOTICE:  Timestamp: 2003-12-12 01:51:35.768053
NOTICE:  Timestamp: 2003-12-12 01:51:35.768053

Q2) What am I doing wrong here, and what is the "proper way" to measure
the execution time of SQL called inside PL/pgSQL?
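One workaround sketch (the function name is my own): now() and localtimestamp are frozen at the start of the transaction, which is why every iteration prints the same value, but timeofday() reads the wall clock each time it is called (it returns text, so it must be cast), and can therefore time individual statements inside a function:

```sql
-- Sketch: timing a statement inside PL/pgSQL with timeofday(),
-- which advances within a transaction, unlike localtimestamp.
create or replace function time_step() returns interval as '
  declare
    t0 timestamp;
    t1 timestamp;
  begin
    t0 := timeofday()::timestamp;      -- wall clock before
    perform count(*) from pg_class;    -- the statement being timed
    t1 := timeofday()::timestamp;      -- wall clock after
    raise notice ''Elapsed: %'', t1 - t0;
    return t1 - t0;
  end; ' language 'plpgsql';
```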

Thanks in advance 

WBR
--
Aram






Re: [PERFORM] Performance problems with a higher number of clients

2003-12-11 Thread Alfranio Tavares Correia Junior
Thanks for the advice.
The performance is a bit better now. Unfortunately, the machine cannot
handle more than roughly 200-250 users without descending into swap hell.
I have to face the fact that I don't have enough memory.

I used the following configuration:

effective_cache_size = 65000
shared_buffers = 1
random_page_cost = 2
cpu_index_tuple_cost = 0.0005
sort_mem = 512   -- I tested each query to see the amount of space
                 -- required to sort, as Jeff suggested; nothing above this value

I tested the system with 100, 200, 300, 400, 500 and finally 250 users.
Up to ~250 users the system presents good response times and almost no
swapping.
During these experiments, I also started psql and tried to run some
queries.
Unfortunately, even with ~250 users there is one query that takes too
long to finish...
In fact, I canceled its execution after waiting 5 minutes without seeing
any result.

This is the query:

select count(distinct(s_i_id))
  from stock, order_line
  where ol_w_id = _xx_ and
  ol_d_id = _xx_ and
  ol_o_id between _xx_ and
  _xx_ and
  s_w_id = ol_w_id and
  s_i_id = ol_i_id and
  s_quantity < _xx_;
When the system has no load, after a vacuum -f, I can execute the query 
and the plan produced is presented as follows:
Aggregate  (cost=49782.16..49782.16 rows=1 width=4) (actual time=52361.573..52361.574 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..49780.24 rows=768 width=4) (actual time=101.554..52328.913 rows=952 loops=1)
        ->  Index Scan using pk_order_line on order_line o  (cost=0.00..15779.32 rows=8432 width=4) (actual time=84.352..151.345 rows=8964 loops=1)
              Index Cond: ((ol_w_id = 4) AND (ol_d_id = 4) AND (ol_o_id >= 100) AND (ol_o_id <= 1000))
        ->  Index Scan using pk_stock on stock  (cost=0.00..4.02 rows=1 width=4) (actual time=5.814..5.814 rows=0 loops=8964)
              Index Cond: ((stock.s_w_id = 4) AND (stock.s_i_id = "outer".ol_i_id))
              Filter: (s_quantity < 20)
Total runtime: 52403.673 ms
(8 rows)

The tables are designed as follows:

--ROWS ~500
CREATE TABLE stock (
    s_i_id int NOT NULL,
    s_w_id int NOT NULL,
    s_quantity int NULL,
    s_dist_01 char(24) NULL,
    s_dist_02 char(24) NULL,
    s_dist_03 char(24) NULL,
    s_dist_04 char(24) NULL,
    s_dist_05 char(24) NULL,
    s_dist_06 char(24) NULL,
    s_dist_07 char(24) NULL,
    s_dist_08 char(24) NULL,
    s_dist_09 char(24) NULL,
    s_dist_10 char(24) NULL,
    s_ytd int NULL,
    s_order_cnt int NULL,
    s_remote_cnt int NULL,
    s_data char(50) NULL
);

--ROWS ~15196318
CREATE TABLE order_line (
    ol_o_id int NOT NULL,
    ol_d_id int NOT NULL,
    ol_w_id int NOT NULL,
    ol_number int NOT NULL,
    ol_i_id int NULL,
    ol_supply_w_id int NULL,
    ol_delivery_d timestamp NULL,
    ol_quantity int NULL,
    ol_amount numeric(6, 2) NULL,
    ol_dist_info char(24) NULL
);

ALTER TABLE stock ADD
    CONSTRAINT PK_stock PRIMARY KEY
    (
    s_w_id,
    s_i_id
    );

ALTER TABLE order_line ADD
    CONSTRAINT PK_order_line PRIMARY KEY
    (
    ol_w_id,
    ol_d_id,
    ol_o_id,
    ol_number
    );

CREATE INDEX IX_order_line ON order_line(ol_i_id);
Any suggestions?

Tom Lane wrote:

> Alfranio Correia Junior <[EMAIL PROTECTED]> writes:
>> I am facing a problem trying to put 500 concurrent users accessing
>> a postgresql instance.
>
> I think you're going to need to buy more RAM.  1Gb of RAM means there
> is a maximum of 2Mb available per Postgres process before you start
> to go into swap hell --- in practice a lot less, since you have to allow
> for other things like the kernel and other applications.
>
> AFAIR TPC-C doesn't involve any complex queries, so it's possible you
> could run it with only 1Mb of workspace per process, but not when
> you've configured
>
>> sort_mem = 10240
>
> That's ten times more than your configuration can possibly support.
> (I don't recall whether TPC-C uses any queries that would sort, so
> it's possible this setting isn't affecting you; but if you are doing
> any sorts then it's killing you.)
>
> Bottom line is you probably need more RAM.
>
> regards, tom lane





Re: [PERFORM] Hardware suggestions for Linux/PGSQL server

2003-12-11 Thread Shridhar Daithankar
Jeff Bohmer wrote:
>> Well if this is the case, you probably should get an Opteron server
>> *now* and just run 32-bit Linux on it until you're sure about the
>> software. No point in buying a Xeon and then throwing the machine away
>> in a year when you decide you need 64-bit for more speed.
>
> That's a good point.  I had forgotten about the option to run 32bit on
> an Opteron.  If we had 3GB or 4GB initially on an Opteron, we'd need
> bigmem for 32bit Linux, right?
>
> This might work nicely since we'd factor in the penalty from PAE for now
> and have the performance boost from moving to 64bit available on
> demand.  Not having to build another DB server in a year would also be
> nice.

FWIW, there are only two pieces of software that need to be 64-bit aware
for a typical server job: the kernel and glibc. The rest of the apps can
do fine as 32-bit, unless you are Oracle and insist on outsmarting the
OS.

In fact, running 32-bit apps on a 64-bit OS has plenty of advantages,
such as using the CPU cache effectively. Unless you need 64-bit, going
for 64-bit software is not advised.

 Shridhar

--
-
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:[EMAIL PROTECTED]
Phone:- +91-20-5676700 Extn.270
Fax  :- +91-20-5676701
-


[PERFORM] fsync method checking

2003-12-11 Thread Bruce Momjian
Mark Kirkwood wrote:
> This is a well-worn thread title - apologies, but these results seemed 
> interesting, and hopefully useful in the quest to get better performance 
> on Solaris:
> 
> I was curious to see if the rather uninspiring pgbench performance 
> obtained from a Sun 280R (see General: ATA Disks and RAID controllers 
> for database servers) could be improved if more time was spent 
> tuning.
> 
> With the help of a fellow workmate who is a bit of a Solaris guy, we 
> decided to have a go.
> 
> The major performance killer appeared to be mounting the filesystem with 
> the logging option. The next most significant seemed to be the choice of 
> sync_method for Pg - the default (open_datasync), which we initially 
> thought should be the best - appears noticeably slower than fdatasync.

I thought the default was fdatasync, but looking at the code it seems
the default is open_datasync if O_DSYNC is available.

I assume the logic is that we usually do only one write() before
fsync(), so open_datasync should be faster.  But why do we not prefer
O_FSYNC over fsync()?

Looking at the code:

#if defined(O_SYNC)
#define OPEN_SYNC_FLAG          O_SYNC
#else
#if defined(O_FSYNC)
#define OPEN_SYNC_FLAG          O_FSYNC
#endif
#endif

#if defined(OPEN_SYNC_FLAG)
#if defined(O_DSYNC) && (O_DSYNC != OPEN_SYNC_FLAG)
#define OPEN_DATASYNC_FLAG      O_DSYNC
#endif
#endif

#if defined(OPEN_DATASYNC_FLAG)
#define DEFAULT_SYNC_METHOD_STR "open_datasync"
#define DEFAULT_SYNC_METHOD     SYNC_METHOD_OPEN
#define DEFAULT_SYNC_FLAGBIT    OPEN_DATASYNC_FLAG
#else
#if defined(HAVE_FDATASYNC)
#define DEFAULT_SYNC_METHOD_STR "fdatasync"
#define DEFAULT_SYNC_METHOD     SYNC_METHOD_FDATASYNC
#define DEFAULT_SYNC_FLAGBIT    0
#else
#define DEFAULT_SYNC_METHOD_STR "fsync"
#define DEFAULT_SYNC_METHOD     SYNC_METHOD_FSYNC
#define DEFAULT_SYNC_FLAGBIT    0
#endif
#endif

I think the problem is that we prefer O_DSYNC over fdatasync, but do not
prefer O_FSYNC over fsync.

Running the attached test program shows on BSD/OS 4.3:

write  0.000360
write & fsync  0.001391
write, close & fsync   0.001308
open o_fsync, write0.000924

showing O_FSYNC faster than fsync().

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
/*
 *  test_fsync.c
 *  tests if fsync can be done from another process than the original write
 */

#include <sys/types.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void die(char *str);
void print_elapse(struct timeval start_t, struct timeval elapse_t);

int main(int argc, char *argv[])
{
struct timeval start_t;
struct timeval elapse_t;
int tmpfile;
char strout[200] = "";   /* 200-byte payload buffer */

/* write only */
gettimeofday(&start_t, NULL);
if ((tmpfile = open("/var/tmp/test_fsync.out", O_RDWR | O_CREAT, 0600)) == -1)
die("can't open /var/tmp/test_fsync.out");
write(tmpfile, &strout, 200);
close(tmpfile); 
gettimeofday(&elapse_t, NULL);
unlink("/var/tmp/test_fsync.out");
printf("write  ");
print_elapse(start_t, elapse_t);
printf("\n");

/* write & fsync */
gettimeofday(&start_t, NULL);
if ((tmpfile = open("/var/tmp/test_fsync.out", O_RDWR | O_CREAT, 0600)) == -1)
die("can't open /var/tmp/test_fsync.out");
write(tmpfile, &strout, 200);
fsync(tmpfile);
close(tmpfile); 
gettimeofday(&elapse_t, NULL);
unlink("/var/tmp/test_fsync.out");
printf("write & fsync  ");
print_elapse(start_t, elapse_t);
printf("\n");

/* write, close & fsync */
gettimeofday(&start_t, NULL);
if ((tmpfile = open("/var/tmp/test_fsync.out", O_RDWR | O_CREAT, 0600)) == -1)
die("can't open /var/tmp/test_fsync.out");
write(tmpfile, &strout, 200);
close(tmpfile);
/* reopen file */
if ((tmpfile = open("/var/tmp/test_fsync.out", O_RDWR | O_CREAT, 0600)) == -1)
die("can't open /var/tmp/test_fsync.out");
fsync(tmpfile);
close(tmpfile); 
gettimeofday(&elapse_t, NULL);
unlink("/var/tmp/test_fsync.out");
printf("write, close & fsync   ");
print_elapse(start_t, elapse_t);
printf(