[GENERAL] PGDay.EU 2009 - Call for lightning talks

2009-10-02 Thread Dave Page
The PGDay.EU 2009 conference in Paris will include a series of
lightning talks on Saturday the 7th November.

This is your chance for five minutes of fame and glory, presenting any
PostgreSQL-related topic you like!

Interested, but not sure what you could speak on? How about:

- A PostgreSQL tool or add-on that you've been hacking on
- The activities of your local PostgreSQL User Group
- How you use PostgreSQL at work or home
- How you manage your PostgreSQL installations
- Why you love PostgreSQL
- Your PostgreSQL pet-peeve

Please send your name, a couple of sentences about yourself and the
title and abstract for your proposed talk to pap...@pgday.eu. The
closing date for submissions is Sunday 25th October. You will have
five minutes (and not a second more!) to present your talk in English.

For more information about the project, including the talk schedule
and the registration information, please see the website at:

 http://2009.pgday.eu/

Regards, Dave

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Greg Smith

On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

Lowering bgwriter_delay is possible, but I think overhead is too much and 
still there is a limit of 800MB/s involved:


Stuff written by the background writer turns into largely random I/O. 
800MB/s of random writes is so large of a number it's only recently become 
remotely possible; a RAID0 of SSD devices might manage it.  No hardware 
available until very recently had any hope of getting that sort of 
performance.


In any case, I would wager you'll run into one of many other bottlenecks 
in PostgreSQL and/or currently available system/disk hardware long before 
the background writer limit gets important.


So in fact I think bgwriter_lru_maxpages should be limited to 10 if 
limited at all.


The current limit is based on the assumption that people will set it to 
values way too high if allowed, to the point where it's counterproductive. 
That's exactly what people used to do with early background writer 
designs.  I think you're wandering down the same road, where what it 
actually does and what you think it does are not the same thing at all. 
Much of the important disk I/O coming out of the database should be 
related to checkpoints, not the background writer, and there is no limit 
on that I/O.


If you think you've got a situation where the current limits are not 
sufficient, the path to argue that would start with showing what you're 
seeing in pg_stat_bgwriter.  I can imagine some results from there on a 
system with a very high rate of I/O available that would suggest the 
current limits are too small.  I've never come close to actually seeing 
such results in the real world though, and if you're not already 
monitoring those numbers on a real system I'd suggest you start there 
rather than presuming there's a design limitation here.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Greg Smith

On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

I think this is one of the most critical performance showstoppers of 
PostgreSQL on the I/O side.


I wish, this is an easy problem compared to the real important ones that 
need to be resolved.  Situations where the OS is capable of faster 
sequential I/O performance than PostgreSQL appears to deliver doing reads 
are often caused by something other than what the person doing said 
benchmarking believes they are.  For example, the last time I thought I 
had a smoking gun situation just like the one you're describing, it turns 
out the background operation I didn't know was going on that slowed things 
down were hint bit updates:  http://wiki.postgresql.org/wiki/Hint_Bits


Background checkpoints can also cause this, typically if you set 
checkpoint_segments really high and watch when they're happening you can 
avoid that interfering with results too.


It's hard to isolate out the cause of issues like this.  Since most people 
seem to get something close to real disk speed from sequential scans when 
measured properly, I would suggest starting with the assumption there's 
something wrong with your test case rather than PostgreSQL.  The best way 
to do that is to construct a test case others can run that shows the same 
problem on other systems using the database itself.  The easiest way to 
build one of those is using generate_series to create some bogus test 
data, SELECT everything in there with \timing on, and then use the size of 
the relation on disk to estimate MB/s.
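
For example, a minimal sketch of such a test case (table name, row count and
padding width are arbitrary choices, not figures from this thread):

-- Create some bogus test data; pick a row count that makes the table
-- comfortably larger than RAM if you want to measure actual disk reads.
CREATE TABLE seqscan_test AS
  SELECT i, repeat('x', 200) AS filler
  FROM generate_series(1, 5000000) AS s(i);

-- With \timing on in psql, time a full scan:
SELECT count(*) FROM seqscan_test;

-- Estimate MB/s as (size of the relation on disk) / (elapsed time):
SELECT pg_size_pretty(pg_relation_size('seqscan_test'));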


Regardless, it's easy enough to build PostgreSQL with larger block sizes 
if you think that really matters for your situation.  You're never going 
to see that in the mainstream version though, because there are plenty of 
downsides to using larger blocks.  And since the database doesn't actually 
know where on disk things are at, it's not really in a good position to 
make decisions about I/O scheduling anyway.  More on that below.


What's the current status of the patch of Gregory Stark? Any timeframes to 
integrate?


There needs to be a fairly major rearchitecting of how PostgreSQL handles 
incoming disk I/O for that to go anywhere else, and I don't believe that's 
expected to be ready in the near future.


Does it also work for sequence scans? Any plans for a generic "multi block 
read count" solution?


There was a similar patch for sequential scans submitted by someone else 
based on that work.  It was claimed to help performance on a Linux system 
with a rather poor disk I/O setup.  No one else was able to replicate any 
performance improvement using the patch though.  As far as I've been able 
to tell, the read-ahead logic being done by the Linux kernel and in some 
hardware is already doing this sort of optimization for you on that OS, 
whether or not your app knows enough to recognize it's sequentially 
scanning the disk it's working against.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Greg Smith

On Sun, 27 Sep 2009, Sam Mason wrote:


The first run of:
 select count(*) from benchmark;
Will cause the "hint" bits to get set and will cause a lot of writing to
happen.  Subsequent runs will be testing read performance.


You just need to be careful of caching effects here.  Either stop the 
database and clear the system caches before doing the second count(*), or 
pick a table size that's much larger than total system RAM so it can't 
possibly cache everything.  Otherwise you'll just be reading back from 
cached memory instead on the second read (and the first one, too, but 
because of hint bits that result doesn't mean much anyway).
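
As a quick sanity check before trusting the second run, something like this
(a sketch; the table name follows Sam's example, and the comparison against
total RAM has to be done by eye):

SELECT pg_size_pretty(pg_total_relation_size('benchmark')) AS table_size,
       current_setting('shared_buffers')                   AS shared_buffers;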


Sam's results represent the status quo as I'm used to seeing it:  you 
should see about the "wire speed" of the disk when pulling in data this 
way, but both hint bits and checkpoints can slow results if you're not 
careful to account for them.  I keep meaning to add something just like 
this as a second level example on top of dd/bonnie++ on my disk testing 
page.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Programming interfaces when using MD5 authentication

2009-10-02 Thread Willy-Bas Loos
Hi,

I can only answer for the database part. This is on postgres 8.3.8.
The passwords can be checked against the table pg_shadow using this expression:
'md5' || md5(password || username)
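
For example (a sketch; the role name and password here are made up):

-- pg_shadow stores 'md5' followed by md5(password || username), so a stored
-- hash can be verified like this:
SELECT usename,
       passwd = 'md5' || md5('secretpassword' || usename) AS password_matches
FROM pg_shadow
WHERE usename = 'someuser';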

HTH,

WBL

On Fri, Oct 2, 2009 at 1:48 AM, Preston de Guise  wrote:
> Hi,
>
> I apologise in advance if this is considered the wrong list to post onto. I
> couldn't find specific details for joining a DBD::Pg style mailing list so
> I'm hoping this is something that's relatively well known about by general
> PostgreSQL developers anyway.
>
> Using Perl to program interfaces to PostgreSQL, and had previously
> misunderstood how md5 password authentication worked, so I'm now re-doing
> it, but struggling to find out how DBD::Pg might be used to actually
> authenticate by passing an md5 of the password instead of the real thing.
>
> I understand from various reading that the md5 should be a double-process
> consisting of:
>
> phase1 = md5(password . username)
> password_to_use = md5(phase1 . salt)
>
> What I don't understand is how to "extract" the salt required to complete
> the second phase.
>
> Effectively what I'm hoping for is to be able to do something along the
> lines of:
>
> ---
> #!/usr/local/bin/perl -w
>
> use strict;
> use DBI;
> use Digest::MD5 qw(md5_hex);
> use DBD::Pg;
>
> my $user = "currentuser";
> my $pass = md5_hex("supersecretpassword" . $user);
>
> my $dbh = DBI->connect("dbi:Pg:dbname=monitoring","$user",$pass, {
> PrintError => 1 });
> if (defined($dbh)) {
>        $dbh->disconnect();
>        print "Successful\n";
> } else {
>        print "Failed!!\n";
> }
> ---
>
> In the above, if I prepend "md5" to the $pass variable I obviously get what
> exactly matches the content of the pg_shadow table entry for the given user
> ... however, either way the connection isn't successful because (from what
> I've been able to discern) I actually need to submit:
>
> "md5" . md5($pass . salt)
>
> Can DBD::Pg be used for these connections? If anyone has experience in this
> I'd much appreciate your thoughts or suggestions. (I realise the "connect"
> function is from DBI, but it seems to me that the use of the salt required
> to properly authenticate will be specific somehow to DBD::Pg usage.)
>
> Cheers,
>
> Preston.
>
> --
> Preston de Guise
> http://www.enterprisesystemsbackup.com
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



-- 
"Patriotism is the conviction that your country is superior to all
others because you were born in it." -- George Bernard Shaw

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [pgeu-general] PGDay.EU 2009 - Call for lightning talks

2009-10-02 Thread Hans-Juergen Schoenig

hi dave,

here are some suggestions:

Artificial intelligence and PostgreSQL:
We use PostgreSQL to run database driven neural networks as well as 
classification algorithms for a leading edge application.
Our neural networks are running directly inside PostgreSQL. This shows 
that PostgreSQL is not just a database but a wonderful development 
environment as well.



Full-Text-Indexing:
We use PostgreSQL to run a full-text price search engine.


PostgreSQL modifications:
We are porting industrial applications from Informix to PostgreSQL.
The goal is to run entire factories using PostgreSQL.


   Many thanks,

  hans




Dave Page wrote:

The PGDay.EU 2009 conference in Paris will include a series of
lightning talks on Saturday the 7th November.

This is your chance for five minutes of fame and glory, presenting any
PostgreSQL-related topic you like!

Interested, but not sure what you could speak on? How about:

- A PostgreSQL tool or add-on that you've been hacking on
- The activities of your local PostgreSQL User Group
- How you use PostgreSQL at work or home
- How you manage your PostgreSQL installations
- Why you love PostgreSQL
- Your PostgreSQL pet-peeve

Please send your name, a couple of sentences about yourself and the
title and abstract for your proposed talk to pap...@pgday.eu. The
closing date for submissions is Sunday 25th October. You will have
five minutes (and not a second more!) to present your talk in English.

For more information about the project, including the talk schedule
and the registration information, please see the website at:

 http://2009.pgday.eu/

Regards, Dave

  



--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Adding modified and creation datetime to system catalogs

2009-10-02 Thread Thom Brown
I was looking around for ways to find out the creation date of a database or
a table, but there doesn't appear to be any functions or available metadata
to provide this information.  Unless there's a way I haven't seen yet, does
anyone see any problem with adding a creationdatetime and modifieddatetime
column to catalogs such as pg_database and pg_tables (but also maybe
pg_views, pg_language, pg_type etc)?  Or should this be stored separately,
indexed by OID, with associated datetimes?  I imagine it would be useful,
and I've seen this on other RDBMSs (except for the modified date and time).

An example of where I was intending to use this a short while ago was to get
a list of tables and sort them by date created to review the oldest tables
for relevance.

Hopefully my suggestion is completely unnecessary and there will already be
a way to identify these pieces of information.

Thanks

Thom


Re: [GENERAL] Adding modified and creation datetime to system catalogs

2009-10-02 Thread Tom Lane
Thom Brown  writes:
> I was looking around for ways to find out the creation date of a database or
> a table, but there doesn't appear to be any functions or available metadata
> to provide this information.  Unless there's a way I haven't seen yet, does
> anyone see any problem with adding a creationdatetime and modifieddatetime
> column to catalogs such as pg_database and pg_tables (but also maybe
> pg_views, pg_language, pg_type etc)?

This has been proposed before, and rejected before, many times.  Please
see the archives.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Programming interfaces when using MD5 authentication

2009-10-02 Thread Tom Lane
Preston de Guise  writes:
> Using Perl to program interfaces to PostgreSQL, and had previously  
> misunderstood how md5 password authentication worked, so I'm now re- 
> doing it, but struggling to find out how DBD::Pg might be used to  
> actually authenticate by passing an md5 of the password instead of the  
> real thing.

You should always pass the cleartext password.  Any md5-ification will
be done by the low-level driver.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Procedure for feature requests?

2009-10-02 Thread Sam Mason
On Thu, Oct 01, 2009 at 11:35:25PM +, Tim Landscheidt wrote:
> suppose I thought that PostgreSQL would benefit greatly from
> a "generate_series(DATE, DATE[, INT]) RETURNS DATE" function

8.4 has a generate_series(timestamp,timestamp,interval) which would seem
to be a bit more flexible than you want.
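
For example (a sketch, casting the result back to DATE):

SELECT d::date
FROM generate_series('2009-10-01'::timestamp,
                     '2009-10-07'::timestamp,
                     '1 day'::interval) AS g(d);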

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Adding modified and creation datetime to system catalogs

2009-10-02 Thread Thom Brown
>
>
> This has been proposed before, and rejected before, many times.  Please
> see the archives.
>
>regards, tom lane
>

I couldn't find any record of this being mentioned before when I had a look
before writing my message, but I'll take your word for it.

Thanks

Thom


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Teodor Sigaev

APseudoUtopia  writes:

Here's what happened:

$ vacuumdb --all --full --analyze --no-password
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "web_main"
vacuumdb: vacuuming of database "web_main" failed: ERROR: б═huge tuple



PostgreSQL 8.4.0 on i386-portbld-freebsd7.2, compiled by GCC cc (GCC)
4.2.1 20070719  [FreeBSD], 32-bit
Please apply the attached patch. It increases the max size from approximately
500 bytes up to 2700 bytes, so vacuum will be able to finish.




This is evidently coming out of ginHeapTupleFastCollect because it's
formed a GIN tuple that is too large (either too long a word, or too
many postings, or both).  I'd say that this represents a serious
degradation in usability from pre-8.4 releases: before, you would have
gotten the error upon attempting to insert the table row that triggers
the problem.  Now, with the "fast insert" stuff, you don't find out
until VACUUM fails, and you have no idea where the bad data is.  Not cool.

Oleg, Teodor, what can we do about this?  Can we split an oversize
tuple into multiple entries?  Can we apply suitable size checks
before instead of after the fast-insert queue?
ginHeapTupleFastCollect and ginEntryInsert both check the tuple's size against
TOAST_INDEX_TARGET, but ginHeapTupleFastCollect checks it without the extra
ItemPointer that ginEntryInsert includes. So ginHeapTupleFastCollect could
produce a tuple 6 bytes larger than ginEntryInsert allows. ginEntryInsert is
called during pending-list cleanup.


The patch removes the TOAST_INDEX_TARGET check and checks only against
GinMaxItemSize, which is greater than TOAST_INDEX_TARGET. All size checks are
now in GinFormTuple.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


patch.gz
Description: Unix tar archive

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Tom Lane
Teodor Sigaev  writes:
> Patch removes checking of TOAST_INDEX_TARGET and use checking only by 
> GinMaxItemSize which is greater than TOAST_INDEX_TARGET. All size's check is 
> now 
> in GinFormTuple.

Looks reasonable, although since the error is potentially user-facing
I think we should put a bit more effort into the error message
(use ereport and make it mention the index name, at least --- is there
any other useful information we could give?)

Will you apply this, or do you want me to?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Understanding 'could not read block'

2009-10-02 Thread Joshua D. Drake
On Thu, 2009-09-24 at 05:14 -0700, stevesub wrote:
> Hi,
> 
> I have a table of about 693 million rows (80gb) of position data (standard
> object,timestamp,position,etc).
> 
> Every time I try to build some statistics by creating a table, such as:
> > create table pos_stats1 as
> > select id,year,month,count(1) from positions group by id,year,month;
> 
> I get an error:
> > ERROR:  could not read block 8519713 of temporary file: Permission denied
> 
> I get a similar error sometimes on indexes.  What problem am I hitting here? 
> The drive and permissions all seem fine & the query runs for a long time
> before hitting this error.  I've turned off virus scan & it seems to have no
> effect.

It means the permissions on the physical file are such that your
postgres service user can't read them.

Joshua D. Drake

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - 
Salamander


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Teodor Sigaev

Looks reasonable, although since the error is potentially user-facing
I think we should put a bit more effort into the error message
(use ereport and make it mention the index name, at least --- is there
any other useful information we could give?)

Only the sizes, as is done in B-tree, I suppose.


Will you apply this, or do you want me to?


I'm not able to provide a good error message in good English :(


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Tom Lane
Teodor Sigaev  writes:
>> Will you apply this, or do you want me to?

> I'm not able to provide a good error message in good English :(

OK, I'll take care of it later today.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Weird behavior with "sensitive" cursors.

2009-10-02 Thread Alvaro Herrera
Alvaro Herrera wrote:
> 
> This one really works and includes a basic test case.  You were right
> that the extra Register was bogus :-(  I had to expose CopySnapshot,
> which I still don't like but ...  (I could have added an extra
> Unregister somewhere during portal close, but it would have meant making
> everything messier).

Applied (to HEAD and 8.4).

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

I think this is one of the most critical performance showstoppers of 
PostgreSQL on the I/O side.


I wish, this is an easy problem compared to the real important ones that need 
to be resolved.  Situations where the OS is capable of faster sequential I/O 
performance than PostgreSQL appears to deliver doing reads are often caused 
by something other than what the person doing said benchmarking believes they 
are.  For example, the last time I thought I had a smoking gun situation just 
like the one you're describing, it turns out the background operation I 
didn't know was going on that slowed things down were hint bit updates: 
http://wiki.postgresql.org/wiki/Hint_Bits


Background checkpoints can also cause this, typically if you set 
checkpoint_segments really high and watch when they're happening you can 
avoid that interfering with results too.


It's hard to isolate out the cause of issues like this.  Since most people 
seem to get something close to real disk speed from sequential scans when 
measured properly, I would suggest starting with the assumption there's 
something wrong with your test case rather than PostgreSQL.  The best way to 
do that is to construct a test case others can run that shows the same 
problem on other systems using the database itself.  The easiest way to build 
one of those is using generate_series to create some bogus test data, SELECT 
everything in there with \timing on, and then use the size of the relation on 
disk to estimate MB/s.


Regardless, it's easy enough to build PostgreSQL with larger block sizes if 
you think that really matters for your situation.  You're never going to see 
that in the mainstream version though, because there are plenty of downsides 
to using larger blocks.  And since the database doesn't actually know where 
on disk things are at, it's not really in a good position to make decisions 
about I/O scheduling anyway.  More on that below.


What's the current status of the patch of Gregory Stark? Any timeframes to 
integrate?


There needs to be a fairly major rearchitecting of how PostgreSQL handles 
incoming disk I/O for that to go anywhere else, and I don't believe that's 
expected to be ready in the near future.


Does it also work for sequence scans? Any plans for a generic "multi block 
read count" solution?


There was a similar patch for sequential scans submitted by someone else 
based on that work.  It was claimed to help performance on a Linux system 
with a rather poor disk I/O setup.  No one else was able to replicate any 
performance improvement using the patch though.  As far as I've been able to 
tell, the read-ahead logic being done by the Linux kernel and in some 
hardware is already doing this sort of optimization for you on that OS, 
whether or not your app knows enough to recognize it's sequentially scanning 
the disk it's working against.




I've enhanced the pgiosim project http://pgfoundry.org/projects/pgiosim/ 
with a patch for larger blocksizes independent from PostgreSQL: http://www.wiesinger.com/opensource/pgiosim/pgiosim-0.2-blocksizes.diff


You'll find some detailed results below and can verify this on your 
platforms with the patch above. Maybe someone can verify this on different 
HW/SW platforms. If you have any questions regarding pgiosim and the 
patch, just feel free to ask.


Summary:
RANDOM I/O with block sizes of e.g. 128k (e.g. for a BITMAP HEAP SCAN) performs 
better than reading the same blocks with 8k block sizes (factor 1.5).


Conclusion:
In the test scenario the proposed solution would give a performance gain 
of a factor of 1.5 for typical BITMAP HEAP SCANS. For other scenarios no 
performance gain from larger block sizes over contiguous blocks could be 
measured. Therefore I'm assuming that prefetching works well on Linux with 
sequential I/O but not with random I/O.


I hope I can convince someone that such optimizations make sense, as 
commercial database vendors have implemented such features for performance 
reasons.


BTW: Prefetch is enabled on the raid and blockdevices.

Ciao,
Gerhard

--
http://www.wiesinger.com/

# RANDOM I/O 8k blocksize
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 test.txt
Arg: 1
Added test.txt
blocksize=8192, reading block as a whole
Elapsed: 135.92
Read 1 blocks Wrote 0 blocks
73.57 op/sec, 588.60kB/sec

# RANDOM I/O 8k blocksize (for verification only), in fact same test as below
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 -r test.txt
Arg: 1
Added test.txt
blocksize=8192, doing single read requests with chunk size of 8192 bytes
Elapsed: 136.30
Read 1 blocks Wrote 0 blocks
73.37 op/sec, 586.94kB/sec

# RANDOM I/O 128k blocksize, read as one 128k block
echo 3 > /proc/sys/vm/drop_caches;./pgiosim -b 1 -o 131072 test.txt
Arg: 1
Added test.txt
blocksize=131072, reading block as a whole
Elapsed: 160.56
Read 1 blocks Wrote 0 blocks
62.28 op/sec, 7972.15

Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

I think this is one of the most critical performance showstoppers of 
PostgreSQL on the I/O side.


I wish, this is an easy problem compared to the real important ones that need 
to be resolved.  Situations where the OS is capable of faster sequential I/O 
performance than PostgreSQL appears to deliver doing reads are often caused 
by something other than what the person doing said benchmarking believes they 
are.  For example, the last time I thought I had a smoking gun situation just 
like the one you're describing, it turns out the background operation I 
didn't know was going on that slowed things down were hint bit updates: 
http://wiki.postgresql.org/wiki/Hint_Bits


Background checkpoints can also cause this, typically if you set 
checkpoint_segments really high and watch when they're happening you can 
avoid that interfering with results too.


It's hard to isolate out the cause of issues like this.  Since most people 
seem to get something close to real disk speed from sequential scans when 
measured properly, I would suggest starting with the assumption there's 
something wrong with your test case rather than PostgreSQL.  The best way to 
do that is to construct a test case others can run that shows the same 
problem on other systems using the database itself.  The easiest way to build 
one of those is using generate_series to create some bogus test data, SELECT 
everything in there with \timing on, and then use the size of the relation on 
disk to estimate MB/s.


Regardless, it's easy enough to build PostgreSQL with larger block sizes if 
you think that really matters for your situation.  You're never going to see 
that in the mainstream version though, because there are plenty of downsides 
to using larger blocks.  And since the database doesn't actually know where 
on disk things are at, it's not really in a good position to make decisions 
about I/O scheduling anyway.  More on that below.


What's the current status of the patch of Gregory Stark? Any timeframes to 
integrate?


There needs to be a fairly major rearchitecting of how PostgreSQL handles 
incoming disk I/O for that to go anywhere else, and I don't believe that's 
expected to be ready in the near future.


Does it also work for sequence scans? Any plans for a generic "multi block 
read count" solution?


There was a similar patch for sequential scans submitted by someone else 
based on that work.  It was claimed to help performance on a Linux system 
with a rather poor disk I/O setup.  No one else was able to replicate any 
performance improvement using the patch though.  As far as I've been able to 
tell, the read-ahead logic being done by the Linux kernel and in some 
hardware is already doing this sort of optimization for you on that OS, 
whether or not your app knows enough to recognize it's sequentially scanning 
the disk it's working against.


I forgot to mention:
Larger block sizes also reduce IOPS (I/Os per second), which can be a 
critical threshold on storage systems (e.g. Fibre Channel systems). With 
large block sizes you would get the full throughput from the storage (fewer 
IOPS), but with small block sizes the IOPS limit is reached and throughput 
goes down.


Example:
To reach 100MB/s with 8k blocks you need 12500 IOPS, which is a lot (at 
least 90 disks at 140 IOPS each)!
When the same data can be read in 128k blocks, 781 IOPS are sufficient 
(6 disks are sufficient)!


So this makes a major difference.
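
For reference, the arithmetic behind the example (decimal units, 100MB/s taken
as 100,000kB/s, integer division):

SELECT 100000 / 8   AS iops_needed_8k,    -- 12500
       100000 / 128 AS iops_needed_128k,  -- 781
       12500 / 140  AS disks_at_8k,       -- 89, rounded up to 90 in the text
       781 / 140    AS disks_at_128k;     -- 5 (5.6 exactly, i.e. 6 disks)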

Ciao,
Gerhard

--
http://www.wiesinger.com/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Gerhard Wiesinger

On Fri, 2 Oct 2009, Greg Smith wrote:


On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

Lowering bgwriter_delay is possible, but I think overhead is too much and 
still there is a limit of 800MB/s involved:


Stuff written by the background writer turns into largely random I/O. 800MB/s 
of random writes is so large of a number it's only recently become remotely 
possible; a RAID0 of SSD devices might manage it.  No hardware available 
until very recently had any hope of getting that sort of performance.


In any case, I would wager you'll run into one of many other bottlenecks in 
PostgreSQL and/or currently available system/disk hardware long before the 
background writer limit gets important.




Of course, 800MB/s is a theoretical maximum limit I came up with. But 
with SSDs this might be possible.


So in fact I think bgwriter_lru_maxpages should be limited to 10 if 
limited at all.


The current limit is based on the assumption that people will set it to 
values way too high if allowed, to the point where it's counterproductive. 
That's exactly what people used to do with early background writer designs. 
I think you're wandering down the same road, where what it actually does and 
what you think it does are not the same thing at all. Much of the important 
disk I/O coming out of the database should be related to checkpoints, not the 
background writer, and there is no limit on that I/O.




In my experience flushing I/O as soon as possible is the best solution. 
Think of the following scenario: you currently limit the bgwriter to 4MB/s, 
but you have about 10MB/s of random I/O capacity (a normal low-cost system). 
So utilization would be only 40% and you could write even more. 
At checkpoint time you would get a spike which the I/O system couldn't 
handle at all, and performance goes down to nearly zero because of the I/O 
spike (e.g. 500% of the available I/O needed). IMHO such scenarios should be 
avoided.


If you think you've got a situation where the current limits are not 
sufficient, the path to argue that would start with showing what you're 
seeing in pg_stat_bgwriter.  I can imagine some results from there on a 
system with a very high rate of I/O available that would suggest the current 
limits are too small.  I've never come close to actually seeing such results 
in the real world though, and if you're not already monitoring those numbers 
on a real system I'd suggest you start there rather than presuming there's a 
design limitation here.




On a nearly idle database, with some occasional performance tests:

SELECT
  buffers_checkpoint/buffers_clean AS checkpoint_spike,
  ROUND(100.0*buffers_checkpoint/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS checkpoint_percentage,
  ROUND(100.0*buffers_clean/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS pg_writer_percentage,
  ROUND(100.0*buffers_backend/(buffers_checkpoint + buffers_clean + buffers_backend),2) AS backend_percentage
FROM
  pg_stat_bgwriter
;

 checkpoint_spike | checkpoint_percentage | pg_writer_percentage | backend_percentage
------------------+-----------------------+----------------------+--------------------
               31 |                 90.58 |                 2.92 |               6.50

So flushing typically happens at checkpoint time. 6.5% of all blocks 
were written to disk by the backends themselves, which IMHO says: the bgwriter 
is too slow, so the backends have to do the work instead.


So I'd like to do some tests with fresh statistics. Is there any fast way to 
reset the pg_stat_bgwriter statistics for all databases?


Thnx.

Ciao,
Gerhard

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance evaluation of PostgreSQL's historic releases

2009-10-02 Thread Simon Riggs

On Tue, 2009-09-29 at 09:36 +0200, György Vilmos wrote:

> I've done a benchmark of recent versions of PostgreSQL's last five
> major releases to see, how performance has changed during the past
> years from version to version.
> You can find the article here:
> http://suckit.blog.hu/2009/09/26/postgresql_history
> 
> Thanks for working on this great piece of software!

Thanks for doing the benchmarks. I'd been meaning to write up something
about performance increases over that period from a development
perspective. It's good to see some numbers around that.

Your graphs tail off steeply as # threads increases. I guess they would
on a logarithmic graph, though I would guess that has more to do with
using 24 cores and contention than with a true limitation of capacity.

Have the FreeBSD folks got DTrace working in userspace yet? Maybe we can
examine the contention. Not right now though, fairly busy.

8.4 numbers seem about right, though the #threads at peak seems slightly
off. I think you should look at the point where performance drops down
to 95% or less of peak, which would give a more stable and comparable
figure than just looking at a single peak value.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Time Management - Training Seminar in Cape Town

2009-10-02 Thread Simon Riggs

On Wed, 2009-09-30 at 23:48 -0500, Training wrote:

> A training seminar that will put more time back in your life. 

Attending PostgreSQL training will help you to target your prospective
customers more effectively and stop wasting their time in the first
place.


-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Boolean storage takes up 1 byte?

2009-10-02 Thread Simon Riggs

On Thu, 2009-10-01 at 12:03 +0100, Sam Mason wrote:

> > However, I see that a boolean takes up 1
> > byte of storage, which is 8 bits.  Is this due to the fact that the value
> > can be null?
> 
> I believe it's more to do with the fact that if you add a boolean column
> and then subsequently an int column then you're going to struggle to
> "pack" them efficiently.  PG always puts columns on the "end" so that you
> can add a column in constant time (i.e. no need to rewrite the table
> in some common situations).  Once you start doing this then packing is
> awkward and a single byte becomes much easier.  Whether the value is
> NULL is stored elsewhere in the row.

It might be possible to make BOOLEAN NOT NULL use the null bit to
represent the actual data value and then have the column use no
additional bytes, except when we don't store the null bitmap at all.
Just needs people to make it happen cleanly, if that's possible.

Don't like booleans myself. They tend to end up as 3+ values eventually.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Simon Riggs

On Sun, 2009-09-27 at 18:05 +0200, Gerhard Wiesinger wrote:

> So I saw that even on sequential reads (and also on bitmap heap scan access) 
> PostgreSQL uses only 8k blocks. I think that's a major I/O bottleneck.
> 
> A commercial software database vendor solved the problem by reading multiple 
> contiguous 8k blocks at once, up to a maximum threshold. Output per 5 
> seconds on an equivalent "sequence scan":

Is systemtap counting actual I/Os, or just requests to access 8192-byte blocks
that are already in the OS cache? Postgres doesn't read more than one block at
a time into its buffer pool, so that number of requests looks about right.

There is belief here that multi-block I/O was introduced prior to OS
doing this as a standard mechanism. Linux expands its read ahead window
in response to sequential scans and so this seems like something we
don't want to do in the database.

It's possible this is wrong. Is the table being scanned fairly sizable
and was it allocated contiguously? i.e. was it a large table loaded via
COPY? 

I also wonder if more L2 cache effects exist.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Time Management - Training Seminar in Cape Town

2009-10-02 Thread David Fetter
On Fri, Oct 02, 2009 at 08:32:50PM +0100, Simon Riggs wrote:
> 
> On Wed, 2009-09-30 at 23:48 -0500, Training wrote:
> 
> > A training seminar that will put more time back in your life. 
> 
> Attending PostgreSQL training will help you to target your
> prospective customers more effectively and stop wasting their time
> in the first place.

LOL!

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Greg Smith

On Fri, 2 Oct 2009, Gerhard Wiesinger wrote:


In my experience flushing I/O as soon as possible is the best solution.


That what everyone assumes, but detailed benchmarks of PostgreSQL don't 
actually support that view given how the database operates.  We went 
through a lot of work in 8.3 related to how to optimize the database as a 
system that disproved some of the theories about what would work well 
here.


What happens if you're really aggressive about writing blocks out as soon 
as they're dirty is that you waste a lot of I/O on things that just get 
dirty again later.  Since checkpoint time is the only period where blocks 
*must* get written, the approach that worked the best for reducing 
checkpoint spikes was to spread the checkpoint writes out over a very wide 
period.  The only remaining work that made sense for the background writer 
was to tightly focus the background writer's I/O on blocks that are 
about to be evicted due to low usage no matter what.


In most cases where people think they need more I/O from the background 
writer, what you actually want is to increase checkpoint_segments, 
checkpoint_completion_target, and checkpoint_timeout in order to spread 
the checkpoint I/O out over a longer period.  The stats you provided 
suggest this is working exactly as intended.
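
For reference, the settings involved can be inspected like this (a small
aside, not from the original mail):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_segments',
               'checkpoint_completion_target',
               'checkpoint_timeout');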


As far as work to improve the status quo, IMHO the next thing to improve 
is getting the fsync calls made at checkpoint time more intelligently 
spread over the whole period.  That's got a better payback than trying to 
make the background writer more aggressive, which is basically a doomed 
cause.


So I'd like to do some tests with new statistics. Any fast way to reset 
statistics for all databases for pg_stat_pgwriter?


No, that's an open TODO item I keep meaning to fix; we lost that 
capability at one point.  What I do is create a table that looks just like 
it, but with a time stamp, and save snapshots to that table.  Then a view 
on top can generate just the deltas between two samples to show activity 
during that time.  It's handy to have such a history anyway.
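
A minimal sketch of that snapshot idea (table and view names are made up, and
the window function requires 8.4):

-- One row per sample of pg_stat_bgwriter, stamped with the sample time.
CREATE TABLE bgwriter_snapshot AS
  SELECT now() AS sample_time, * FROM pg_stat_bgwriter;

-- Take further samples periodically, e.g. from cron:
--   INSERT INTO bgwriter_snapshot SELECT now(), * FROM pg_stat_bgwriter;

-- Deltas between consecutive samples show the activity in each interval.
CREATE VIEW bgwriter_activity AS
  SELECT sample_time,
         buffers_checkpoint - lag(buffers_checkpoint) OVER w AS checkpoint_writes,
         buffers_clean      - lag(buffers_clean)      OVER w AS bgwriter_writes,
         buffers_backend    - lag(buffers_backend)    OVER w AS backend_writes
  FROM bgwriter_snapshot
  WINDOW w AS (ORDER BY sample_time);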


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

2009-10-02 Thread Greg Smith

On Fri, 2 Oct 2009, Gerhard Wiesinger wrote:

Larger blocksizes also reduce IOPS (I/Os per second) which might be a critical 
threshold on storage systems (e.g. Fibre Channel systems).


True to some extent, but don't forget that IOPS is always relative to a 
block size in the first place.  If you're getting 200 IOPS with 8K blocks, 
increasing your block size to 128K will not result in your getting 200 
IOPS at that larger size; the IOPS number at the larger block size is 
going to drop too.  And you'll pay the penalty for that IOPS number 
dropping every time you're accessing something that would have only been 
an 8K bit of I/O before.


The trade-off is very application dependent.  The position you're 
advocating, preferring larger blocks, only makes sense if your workload 
consists mainly of larger scans.  Someone who is pulling scattered records 
from throughout a larger table will suffer with that same change, because 
they'll be reading a minimum of 128K even if all they really needed was a 
few bytes.  That penalty ripples all the way from the disk I/O upwards 
through the buffer cache.


It's easy to generate a synthetic benchmark workload that models some 
real-world applications and see performance plunge with a larger block 
size.  There certainly are others where a larger block would work better. 
Testing either way is complicated by the way RAID devices usually have 
their own stripe sizes to consider on top of the database block size.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Scott Marlowe
On Fri, Oct 2, 2009 at 2:19 PM, Greg Smith  wrote:
> On Fri, 2 Oct 2009, Gerhard Wiesinger wrote:
>
>> In my experience flushing I/O as soon as possible is the best solution.
>
> That what everyone assumes, but detailed benchmarks of PostgreSQL don't
> actually support that view given how the database operates.  We went through
> a lot of work in 8.3 related to how to optimize the database as a system
> that disproved some of the theories about what would work well here.
>
> What happens if you're really aggressive about writing blocks out as soon as
> they're dirty is that you waste a lot of I/O on things that just get dirty
> again later.  Since checkpoint time is the only period where blocks *must*
> get written, the approach that worked the best for reducing checkpoint
> spikes was to spread the checkpoint writes out over a very wide period.

The session servers we have at work are a perfect match for this.  By
increasing checkpoint segments to 100 (or more), timeout to 60
minutes, and setting completion target lower (currently 0.25) we have
reduced our IO wait from 10 to 15% to nearly nothing.  These are
databases that update the same rows over and over with session data as
the user navigates the system, so writing things out as early as
possible is a REAL bad idea.

> In most cases where people think they need more I/O from the background
> writer, what you actually want is to increase checkpoint_segments,
> checkpoint_completion_target, and checkpoint_timeout in order to spread the
> checkpoint I/O out over a longer period.  The stats you provided suggest
> this is working exactly as intended.

I found that lowering checkpoint completion target was what helped.
Does that seem counter-intuitive to you?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Vacuumdb Fails: Huge Tuple

2009-10-02 Thread Tom Lane
Teodor Sigaev  writes:
> ginHeapTupleFastCollect and ginEntryInsert checked tuple's size for 
> TOAST_INDEX_TARGET, but ginHeapTupleFastCollect checks without one 
> ItemPointer, 
> as ginEntryInsert does it. So ginHeapTupleFastCollect could produce a tuple 
> which 6-bytes larger than allowed by ginEntryInsert. ginEntryInsert is called 
> during pending list cleanup.

I applied this patch after improving the error reporting a bit --- but
I was unable to get the unpatched code to fail in vacuum as the OP
reported was happening for him.  It looks to me like the original coding
limits the tuple size to TOAST_INDEX_TARGET (512 bytes) during
collection, but checks only the much larger GinMaxItemSize limit during
final insertion.  So while this is a good cleanup, I am suspicious that
it may not actually explain the trouble report.

I notice that the complaint was about a VACUUM FULL not a plain VACUUM,
which means that the vacuum would have been moving tuples around and
hence inserting brand new index entries.  Is there any possible way that
we could extract a larger index tuple from a moved row than we had
extracted from the original version?

It would be nice to see an actual test case that makes 8.4 fail this way
...

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Procedure for feature requests?

2009-10-02 Thread Tim Landscheidt
Alvaro Herrera  wrote:

> [...]
>>   Suppose the feature request was not a trivial one, but
>> maybe a "DEPENDS ON " clause for "CREATE FUNCTION"
>> to allow PostgreSQL to deny requests to drop a table/view/
>> function that is needed by a function - where would I pro-
>> pose that?

> On -hackers, just like any other feature request, trivial or not.

Thanks. Any particular form? A quick glance at the archives
did not reveal any feature requests that were not accompa-
nied by a patch :-).

Tim


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Procedure for feature requests?

2009-10-02 Thread Tim Landscheidt
Sam Mason  wrote:

>> suppose I thought that PostgreSQL would benefit greatly from
>> a "generate_series(DATE, DATE[, INT]) RETURNS DATE" function

> 8.4 has a generate_series(timestamp,timestamp,interval) which would seem
> to be a bit more flexible than you want.

Yes, I know :-). But as "generate_series(A, B, C)" can also
be written as "A + generate_series(0, (C - B) / C) * C" (or
something "flexible" like that :-)), a
"generate_series(DATE, DATE)" would inter alia get rid off
the need to cast the result from TIMESTAMP to DATE and to
explicitly specify "'1 day'". Just a small, trivial enhance-
ment for a popular use case :-).
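
The kind of one-liner such a DATE variant would tidy up, as a sketch (not
from the original mail): today through six days from now, one row per day,
without the casts or the explicit '1 day':

SELECT current_date + g AS day
FROM generate_series(0, 6) AS g;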

Tim


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Boolean storage takes up 1 byte?

2009-10-02 Thread Tom Lane
Simon Riggs  writes:
> It might be possible to make BOOLEAN NOT NULL use the null bit to
> represent the actual data value and then have the column use no
> additional bytes, except when we don't store the null bitmap at all.
> Just needs people to make it happen cleanly, if that's possible.

I really doubt that any scheme to pack booleans tighter would be a
net win.  It'd make the core tuple-assembly and -disassembly loops more
complicated, hence slower and harder to maintain.  Everybody would pay
that price whether or not they ever saved a byte from it.

It's worth noting also that you don't save anything from packing a bool
unless the *next* field in the row has a weak enough alignment
requirement that it can be moved over.  In a majority of cases this
would mean that you'd need at least five adjacent bool columns before
you have any shot at winning anything --- with four or less, a following
column with int alignment will stay right where it is.

If you do have lots and lots of bool columns, it might possibly be worth
the trouble to represent them as a combined BIT(n) column ...
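
A sketch of what that could look like (table name and flag layout are made up
for illustration):

CREATE TABLE widget_flags (
    id    serial PRIMARY KEY,
    flags bit(8) NOT NULL DEFAULT B'00000000'
);

-- Test the third flag (counting from the left):
SELECT substring(flags from 3 for 1) = B'1' AS flag3 FROM widget_flags;

-- Set it with a bitwise OR against a mask of the same width:
UPDATE widget_flags SET flags = flags | B'00100000';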

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Procedure for feature requests?

2009-10-02 Thread Joshua D. Drake
On Fri, 2009-10-02 at 21:37 +, Tim Landscheidt wrote:
> Alvaro Herrera  wrote:
> 
> > [...]
> >>   Suppose the feature request was not a trivial one, but
> >> maybe a "DEPENDS ON " clause for "CREATE FUNCTION"
> >> to allow PostgreSQL to deny requests to drop a table/view/
> >> function that is needed by a function - where would I pro-
> >> pose that?
> 
> > On -hackers, just like any other feature request, trivial or not.
> 
> Thanks. Any particular form? A quick glance at the archives
> did not reveal any feature requests that were not accompa-
> nied by a patch :-).
> 

There is not a specific format but a good thing to do is:

Request Feature
Provide Use Case
Provide Example syntax (if applicable)

Joshua D. Drake



> Tim
> 
> 
-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - 
Salamander


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Tom Lane
Scott Marlowe  writes:
> The session servers we have at work are a perfect match for this.  By
> increasing checkpoint segments to 100 (or more), timeout to 60
> minutes, and setting completion target lower (currently 0.25) we have
> reduced our IO wait from 10 to 15% to nearly nothing.  These are
> databases that update the same rows over and over with session data as
> the user navigates the system, so writing things out as early as
> possible is a REAL bad idea.

> I found that lowering checkpoint completion target was what helped.
> Does that seem counter-intuitive to you?

Once the checkpoint completion target time is high enough that the
checkpoint-induced I/O is just background noise for you, increasing the
target further won't make for any noticeable further improvement.  I'm
not sure I see how it would make things *worse* though.  Maybe, even
though the I/O wait is "nearly nothing", the I/O is still forcing enough
extra seeks to slow normal disk operations?  If so, getting the
checkpoint out of the way sooner so that you can get back to full speed
operation sooner might be better than reducing the rate of checkpoint
I/Os below the nearly-noise level.  I'm just guessing about that though.
What were you measuring --- the performance within checkpoint, the
performance outside it, or the whole-cycle average?

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Bruce Momjian
Tom Lane wrote:
> Scott Marlowe  writes:
> > The session servers we have at work are a perfect match for this.  By
> > increasing checkpoint segments to 100 (or more), timeout to 60
> > minutes, and setting completion target lower (currently 0.25) we have
> > reduced our IO wait from 10 to 15% to nearly nothing.  These are
> > databases that update the same rows over and over with session data as
> > the user navigates the system, so writing things out as early as
> > possible is a REAL bad idea.
> 
> > I found that lowering checkpoint completion target was what helped.
> > Does that seem counter-intuitive to you?
> 
> Once the checkpoint completion target time is high enough that the
> checkpoint-induced I/O is just background noise for you, increasing the
> target further won't make for any noticeable further improvement.  I'm
> not sure I see how it would make things *worse* though.  Maybe, even
> though the I/O wait is "nearly nothing", the I/O is still forcing enough
> extra seeks to slow normal disk operations?  If so, getting the
> checkpoint out of the way sooner so that you can get back to full speed
> operation sooner might be better than reducing the rate of checkpoint
> I/Os below the nearly-noise level.  I'm just guessing about that though.
> What were you measuring --- the performance within checkpoint, the
> performance outside it, or the whole-cycle average?

My guess is that having a very long fuzzy checkpoint time means that
when you fsync you are fsync'ing lots of data, both your checkpoint data
and other writes performed by backends.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Tom Lane
Bruce Momjian  writes:
> My guess is that having a very long fuzzy checkpoint time means that
> when you fsync you are fsync'ing lots of data, both your checkpoint data
> and other writes performed by backends.

Hmm, could be ... although that would imply that shared_buffers should
be kicked up some more, so the backends aren't doing so many writes for
themselves.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Scott Marlowe
On Fri, Oct 2, 2009 at 3:55 PM, Tom Lane  wrote:
> Scott Marlowe  writes:
>> The session servers we have at work are a perfect match for this.  By
>> increasing checkpoint segments to 100 (or more), timeout to 60
>> minutes, and setting completion target lower (currently 0.25) we have
>> reduced our IO wait from 10 to 15% to nearly nothing.  These are
>> databases that update the same rows over and over with session data as
>> the user navigates the system, so writing things out as early as
>> possible is a REAL bad idea.
>
>> I found that lowering checkpoint completion target was what helped.
>> Does that seem counter-intuitive to you?
>
> Once the checkpoint completion target time is high enough that the
> checkpoint-induced I/O is just background noise for you, increasing the
> target further won't make for any noticeable further improvement.  I'm
> not sure I see how it would make things *worse* though.  Maybe, even
> though the I/O wait is "nearly nothing", the I/O is still forcing enough
> extra seeks to slow normal disk operations?  If so, getting the
> checkpoint out of the way sooner so that you can get back to full speed
> operation sooner might be better than reducing the rate of checkpoint
> I/Os below the nearly-noise level.  I'm just guessing about that though.
> What were you measuring --- the performance within checkpoint, the
> performance outside it, or the whole-cycle average?

I was measuring it over an extended period, say a few hours.  This db
is small enough to fit in memory easily (1.5Gig on a machine with 6Gig
ram doing nothing else) so all the io is basically blocks out, with
none in.  Since the checkpoints should have a LOT of the same records
updated over and over, I'm guessing that a very low completion target
lets it collect a lot of those together and just write out the last
one.  The nice thing is I can benchmark one of these machines against
the other, since they're basically identical twins doing the same job,
and see how changes like this affect them.  At first I was at
something like 0.5 completion target, and increasing the checkpoint
segments did have an effect, but lowering the completion target was as
much of a gain as the increased checkpoint segments.

So, I think the tuning of a small db like this that can fit in memory
is a whole different ball game than one that's several times larger
than memory.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Limit of bgwriter_lru_maxpages of max. 1000?

2009-10-02 Thread Greg Smith

On Fri, 2 Oct 2009, Scott Marlowe wrote:


I found that lowering checkpoint completion target was what helped.
Does that seem counter-intuitive to you?


Generally, but there are plenty of ways you can get into a state where a 
short but not immediate checkpoint is better.  For example, consider a 
case where your buffer cache is filled with really random stuff.  There's 
a sorting horizon in effect, where your OS and/or controller makes 
decisions about what order to write things based on the data it already 
has around, not really knowing what's coming in the near future.


Let's say you've got 256MB of cache in the disk controller, you have 1GB 
of buffer cache to write out, and there's 8GB of RAM in the server so it 
can cache the whole write.  If you wrote it out in a big burst, the OS 
would elevator sort things and feed them to the controller in disk order. 
Very efficient, one pass over the disk to write everything out.


But if you broke that up into 256MB write pieces instead on the database 
side, pausing after each chunk was written, the OS would only be sorting 
across 256MB at a time, and would basically fill the controller cache up 
with that before it saw the larger picture.  The disk controller can end 
up making seek decisions with that small of a planning window now that are 
not really optimal, making more passes over the disk to write the same 
data out.  If the timing between the DB write cache and the OS is 
pathologically out of sync here, the result can end up being slower than 
had you just written out in bigger chunks instead.  This is one reason I'd 
like to see fsync calls happen earlier and more evenly than they do now, 
to reduce these edge cases.


The usual approach I take in this situation is to reduce the amount of 
write caching the OS does, so at least things get more predictable.  A 
giant write cache always gives the best average performance, but the 
worst-case behavior increases at the same time.


There was a patch floating around at one point that sorted all the 
checkpoint writes by block order, which would reduce how likely it is 
you'll end up in one of these odd cases.  That turned out to be hard to 
nail down the benefit of though, because in a typical case the OS caching 
here trumps any I/O scheduling you try to do in user land, and it's hard 
to repeatably generate scattered data in a benchmark situation.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] [ANN] VTD-XML 2.7

2009-10-02 Thread jimmy Zhang

VTD-XML 2.7 is released and can be downloaded at
http://sourceforge.net/projects/vtd-xml/files/

Below is a summary of the new features and enhancements.

Expanded VTD-XML's Core API

* VTDNav: toStringUpperCase, toStringLowerCase, contains(), endsWith(),
startsWith()
* Extended VTD added in-memory buffer support

Improved Xpath

* added the following XPath 2.0 functions: abs(), ends-with(), upper-case(),
lower-case()
* added support for variable reference
* significantly enhanced XPath syntax checking and error reporting (Special
thanks to Mark Swanson)
* Internal performance tuning

Bug fixes and Code Enhancement

* C version: significantly reduced warning messages, fixed a memory leak during
XPath expression parsing
* Various bug fixes (Special thanks to Jon Roberts, John Zhu, Matej Spiller,
Steve Polson, and Romain La Tellier)



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Where can I get the number of plans that are considered by the planner?

2009-10-02 Thread 纪晓曦
Since I also need to consider GEQO, is this the best way to do it?

2009/9/30 Tom Lane 

> 纪晓曦 writes:
> > Where can I add an integer counter to count the plans considered by the
> planner.
>
> Well, you could count the number of calls to add_path, but a path is
> hardly the same thing as a complete plan.
>
>regards, tom lane
>