Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-17 Thread Stefan Kaltenbrunner

Marc Mamin wrote:


Postgres configuration for 64 CPUs, 128 GB RAM...


there are probably not that many installations out there that large - 
comments below




Hello,

We have the opportunity to benchmark our application on a large server. I 
have to prepare the Postgres configuration and I'd appreciate some 
comments on it as I am not experienced with servers of such a scale. 
Moreover the configuration should be fail-proof as I won't be able to 
attend the tests.


Our application (java + perl) and Postgres will run on the same server, 
whereas the application activity is low when Postgres has large 
transactions to process.


There is a large gap between our current production server (Linux, 4GB 
RAM, 4 cpus) and the benchmark server; one of the targets of this  
benchmark is to verify the scalability of our application.




[...]

Postgres version: 8.2.1


upgrade to 8.2.4


File system:

http://en.wikipedia.org/wiki/ZFS


way more important is what kind of disk-IO subsystem you have attached ...





Planned configuration:


# we don't expect more than 150 parallel connections,
# but I suspect a leak in our application that leaves some idle connections 
open


max_connections=2000

ssl = off

#maximum allowed
shared_buffers= 262143


this is probably on the lower side for a 128GB box



# on our current best production server with 4GB RAM (not dedicated to 
Postgres), work_mem is set to 600 MB
# this limitation is probably the bottleneck for our application as the 
files in pgsql_tmp grow up to 15 GB 
# during large aggregations (we have a locking mechanism to avoid 
parallel processing of such transactions)

work_mem = 31457280  # (30 GB)


this is simply ridiculous - work_mem is allocated PER SORT, so if your query 
requires 8 sorts it is free to use 8x30GB, and that in turn has to be 
multiplied by the number of concurrent connections.
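
A less risky pattern (an illustrative sketch only - the table name and value
are made up) is to keep a moderate default in postgresql.conf and raise
work_mem just for the transaction that runs the big aggregation:

  BEGIN;
  SET LOCAL work_mem = '1GB';   -- hypothetical value, applies to this transaction only
  SELECT loc_id, count(*) FROM big_fact_table GROUP BY loc_id;
  COMMIT;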




# index creation time is also an issue for us; the process is locking 
other large processes too.

# our largest table so far is 13 GB + 11 GB indexes
maintenance_work_mem = 31457280  # (30 GB)


this is ridiculous too - testing has shown that there is not much point 
in going beyond 1GB or so




# more than the max number of tables +indexes expected during the benchmark
max_fsm_relations = 10

max_fsm_pages = 180


this is probably way too low for a database the size of yours - watch the 
output of VACUUM VERBOSE on a database-wide vacuum for some stats on that.
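
To illustrate (the wording is approximate and the numbers are invented), the
summary at the very end of a database-wide VACUUM VERBOSE looks roughly like:

  VACUUM VERBOSE;
  ...
  INFO:  free space map contains 154679 pages in 89 relations
  DETAIL:  A total of 160000 page slots are in use (including overhead).
  1800000 page slots are required to track all free space.
  Current limits are:  180000 page slots, 1000 relations, using 1131 kB.

If the "page slots required" figure is above the current limit, max_fsm_pages
needs to be raised at least that high (and max_fsm_relations likewise for the
relation count).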




# don't know if I should modify this.
# seems to be sufficient on our production servers
max_stack_depth = 2MB

# vacuum will be done per hand between each test session
autovacuum = off



# required to analyse the benchmark
log_min_duration_statement = 1000


max_prepared_transactions = 100


# seems to be required to drop schemas/roles containing a large number of 
objects

max_locks_per_transaction = 128




# I use the default for the bgwriter as I couldn't find recommendations on 
those


#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5  # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5  # 0-1000 buffers max written/round


#WAL

fsync = on

#use default
#wal_sync_method

# we are using 32 on our production system
wal_buffers=64


values up to 512 or so have been reported to help on systems with very 
high concurrency



what is missing here is your settings for:

effective_cache_size

and

random_page_cost
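
For what it's worth, an illustrative starting point (values are guesses for a
128 GB machine, not recommendations from this thread) might be:

  effective_cache_size = 12582912   # ~96 GB expressed in 8 kB pages
  random_page_cost = 3.0            # below the default 4.0 if the I/O subsystem is fast

Both would still have to be validated against the actual disk subsystem and
the amount of memory left over for the OS cache.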



Stefan

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-17 Thread Jim C. Nasby
On Tue, Jul 17, 2007 at 04:10:30PM +0200, Marc Mamin wrote:
> shared_buffers= 262143
 
You should at least try some runs with this set far, far larger. At
least 10% of memory, but it'd be nice to see what happens with this set
to 50% or higher as well (though don't set it larger than the database
since it'd be a waste).
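
For scale (illustrative values only; shared_buffers is expressed in 8 kB
pages unless a unit is given), 10% and 50% of a 128 GB box would be roughly:

  shared_buffers = 1638400   # ~12.5 GB, about 10% of RAM
  shared_buffers = 8388608   # ~64 GB, about 50% of RAM - only useful if the
                             # database is at least that large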

How big is the database, anyway?

> # on our current best production server with 4GB RAM (not dedicated to
> Postgres), work_mem is set to 600 MB
> # this limitation is probably the bottleneck for our application as the
> files in pgsql_tmp grow up to 15 GB 
> # during large aggregations (we have a locking mechanism to avoid
> parallel processing of such transactions)

Keep in mind that a good filesystem will be caching most of pgsql_tmp if
it can.

> max_prepared_transactions = 100
 
Are you using 2PC? If not, there's no reason to touch this (you could
just set it to 0).

> # I use the default for the bgwriter as I couldn't find recommendations on
> those
> 
> #bgwriter_delay = 200ms # 10-10000ms between rounds
> #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round
> #bgwriter_lru_maxpages = 5  # 0-1000 buffers max written/round
> #bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
> #bgwriter_all_maxpages = 5  # 0-1000 buffers max written/round
 
You'll probably want to increase both maxpages parameters substantially,
assuming that you've got good IO hardware.
 
> #CHECKPOINT
> 
> # xlog will be  on a separate disk
> checkpoint_segments=256
> 
> checkpoint_timeout = 5min

The further apart your checkpoints, the better. Might want to look at 10
minutes. I'd also set checkpoint_warning to just a bit below
checkpoint_timeout and watch for warnings to make sure you're not
checkpointing a lot more frequently than you're expecting.
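
Something along these lines (illustrative values, using the time-unit syntax
that 8.2 accepts):

  checkpoint_segments = 256
  checkpoint_timeout = 10min
  checkpoint_warning = 9min    # log a warning if checkpoints come more often than this
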
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-17 Thread Gregory Stark

"Marc Mamin" <[EMAIL PROTECTED]> writes:

> We have the opportunity to benchmark our application on a large server. I
> have to prepare the Postgres configuration and I'd appreciate some
> comments on it as I am not experienced with servers of such a scale.
> Moreover the configuration should be fail-proof as I won't be able to
> attend the tests. 

I really think that's a recipe for disaster. Even on a regular machine you
need to treat tuning as an on-going feedback process. There's no such thing as
a fail-proof configuration since every application is different.

On an exotic machine like this you're going to run into unique problems that
nobody here can anticipate with certainty.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-17 Thread Greg Smith

On Tue, 17 Jul 2007, Marc Mamin wrote:

Moreover the configuration should be fail-proof as I won't be able to 
attend the tests.


This is unreasonable.  The idea that you'll get a magic perfect 
configuration in one shot suggests a fundamental misunderstanding of how 
work like this is done.  If there's any way you could adjust things so 
that, say, you were allowed to give at least 4 different tuning setups and 
you got a report back with each of the results for them, that would let 
you design a much better test set.



Postgres version: 8.2.1


This has already been mentioned, but it really is critical for your type 
of test to run 8.2.4 instead so I wanted to emphasize it.  There is a 
major scalability bug in 8.2.1.  I'm going to ignore the other things that 
other people have already commented on (all the suggestions Stefan and 
Jim already made are good ones you should heed) and try to fill in the 
remaining gaps instead.



# I use the default for the bgwriter as I couldn't find recommendations on
those


The defaults are so small that it will barely do anything on a server of 
your size.  Tuning it properly so that it's effective but doesn't waste a 
lot of resources is tricky, which is why you haven't found such 
recommendations--they're fairly specific to what you're doing and require 
some testing to get right.  If you want to see an example from a big 
server, look at


http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070606-00065.html#DBDatabase_SW_Config0

That's tuned for a very specific benchmark though.  Here's a fairly 
generic set of parameters that would be much more aggressive than the 
defaults, while not going so far as to waste too many resources if the 
writer is just getting in the way on your server:


bgwriter_delay = 200ms
bgwriter_lru_percent = 3.0
bgwriter_lru_maxpages = 500
bgwriter_all_percent = 1.0
bgwriter_all_maxpages = 250


#WAL
fsync = on
#use default
#wal_sync_method


I'd expect wal_sync_method=open_datasync would outperform the default, but 
you'd really want to test both ways here to be sure.  The fact that the 
Sun results I referenced above use the default of fdatasync makes me 
hesitate to recommend that change too strongly, as I haven't worked with 
this particular piece of hardware. See 
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm for more 
information about this parameter.
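
The change itself is a one-liner to toggle between runs (on this platform the
default to compare against is fdatasync, as noted above):

  wal_sync_method = open_datasync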


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-17 Thread Marc Mamin

Postgres configuration for 64 CPUs, 128 GB RAM...

Hello,

We have the opportunity to benchmark our application on a large server. I
have to prepare the Postgres configuration and I'd appreciate some
comments on it as I am not experienced with servers of such a scale.
Moreover the configuration should be fail-proof as I won't be able to
attend the tests. 

Our application (java + perl) and Postgres will run on the same server,
whereas the application activity is low when Postgres has large
transactions to process.

There is a large gap between our current production server (Linux, 4GB
RAM, 4 cpus) and the benchmark server; one of the targets of this
benchmark is to verify the scalability of our application. 


And you have no reason to be envious as the server doesn't belong to us :-)


Thanks for your comments,

Marc Mamin





Postgres version: 8.2.1



Server Specifications:
--

Sun SPARC Enterprise M8000 Server:

http://www.sun.com/servers/highend/m8000/specs.xml

File system:

http://en.wikipedia.org/wiki/ZFS



Planned configuration:


# we don't expect more than 150 parallel connections, 
# but I suspect a leak in our application that leaves some idle connections
open

max_connections=2000

ssl = off 

#maximum allowed
shared_buffers= 262143

# on our current best production server with 4GB RAM (not dedicated to
Postgres), work_mem is set to 600 MB
# this limitation is probably the bottleneck for our application as the
files in pgsql_tmp grow up to 15 GB 
# during large aggregations (we have a locking mechanism to avoid
parallel processing of such transactions)
work_mem = 31457280  # (30 GB)

# index creation time is also an issue for us; the process is locking
other large processes too.
# our largest table so far is 13 GB + 11 GB indexes
maintenance_work_mem = 31457280  # (30 GB)

# more than the max number of tables +indexes expected during the
benchmark
max_fsm_relations = 10

max_fsm_pages = 180

# don't know if I should modify this.
# seems to be sufficient on our production servers
max_stack_depth = 2MB

# vacuum will be done per hand between each test session
autovacuum = off 



# required to analyse the benchmark
log_min_duration_statement = 1000


max_prepared_transactions = 100


# seems to be required to drop schemas/roles containing a large number of
objects
max_locks_per_transaction = 128 




# I use the default for the bgwriter as I couldn't find recommendations on
those

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5  # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5  # 0-1000 buffers max written/round


#WAL

fsync = on

#use default
#wal_sync_method

# we are using 32 on our production system
wal_buffers=64


# we haven't done any testing with this parameter so far, but this
shouldn't be a relevant
# point as our performance focus is on large transactions
commit_delay = 0 

#CHECKPOINT

# xlog will be  on a separate disk
checkpoint_segments=256

checkpoint_timeout = 5min


Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-17 Thread Strong, David
>"Marc Mamin" <[EMAIL PROTECTED]> writes:
>
>> We have the opportunity to benchmark our application on a large server. I
>> have to prepare the Postgres configuration and I'd appreciate some
>> comments on it as I am not experienced with servers of such a scale.
>> Moreover the configuration should be fail-proof as I won't be able to
>> attend the tests. 
>
>I really think that's a recipe for disaster. Even on a regular machine you
>need to treat tuning as an on-going feedback process. There's no such thing as
>a fail-proof configuration since every application is different.
>
>On an exotic machine like this you're going to run into unique problems that
>nobody here can anticipate with certainty.
>
>-- 
>  Gregory Stark
>  EnterpriseDB  http://www.enterprisedb.com
>

Marc,

You're getting a lot of good advice for your project. Let me be another
to reiterate that upgrading to Postgres 8.2.4 will bring added
performance and scalability benefits.

Others have mentioned that you do have to be data driven and
unfortunately that is true. All you can really do is pick a reasonable
starting point and run a test to create a baseline number. Then,
monitor, make small changes and test again. That's the only way you're
going to find the best configuration for your system. This will take
time and effort.

In addition, everything involved in your testing must scale - not just
Postgres. For example, if your driver hardware or driver software does
not scale, you won't be able to generate enough throughput for your
application or Postgres. The same goes for all of your networking
equipment and any other hardware servers/software that might be involved
in the test environment. So, you really have to monitor at all levels
i.e. don't just focus on the database platform.

I took a quick look at the Sun M8000 server link you provided. I don't
know the system specifically so I might be mistaken, but it looks like
it is configured with 4 sockets per CPU board and 4 CPU boards per
system. Each CPU board looks like it has the ability to take 128GB RAM.
In this case, you will have to keep an eye on how Solaris is binding
(affinitizing) processes to CPU cores and/or boards. Any time a process
is bound to a new CPU board it's likely that there will be a number of
cache invalidations to move data the process was working on from the old
board to the new board. In addition, the moved process may still
continue to refer to memory it allocated on the old board. This can be
quite expensive. Typically, the more CPU cores/CPU boards you have, the
more likely this will happen. I'm no Solaris expert so I don't know if
there is a better way of doing this, but you might consider using the
psrset or pbind commands to bind Postgres backend processes to a
specific CPU core or range of cores. If choosing a range of cores, these
should be on the same CPU board. Again, through monitoring, you'll have
to determine how many CPU cores each backend really needs and then
you'll have to determine how best to spread the backends out over each
of the CPU boards.

Good luck.

David

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-17 Thread PFC
On Tue, 17 Jul 2007, Marc Mamin wrote:

We have the opportunity to benchmark our application on a large server. I
have to prepare the Postgres configuration and I'd appreciate some
comments on it as I am not experienced with servers of such a scale.
Moreover the configuration should be fail-proof as I won't be able to
attend the tests.

Our application (java + perl) and Postgres will run on the same server,
whereas the application activity is low when Postgres has large
transactions to process.


Please, can you be more specific about your application :

- what does it do ?
- what kind of workload does it generate ?
	[ie: many concurrent small queries (website) ; few huge queries,  
reporting, warehousing, all of the above, something else ?]

- percentage and size of update queries ?
	- how many concurrent threads / connections / clients do you serve on a  
busy day ?
	(I don't mean online users on a website, but ACTIVE concurrent database  
connections)


	I assume you find your current server is too slow or foresee it will  
become too slow soon and want to upgrade, so :


	- what makes the current server's performance inadequate ? is it IO, CPU,  
RAM, a mix ? which proportions in the mix ?


	This is very important. If you go to the dealer and ask "I need a better  
vehicle", he'll sell you a Porsche. But if you say "I need a better vehicle  
to carry two tons of cinderblocks" he'll sell you something else I guess.  
Same with database servers. You could need some humongous CPU power, but  
you might as well not. Depends.



There is a large gap between our current production server (Linux, 4GB
RAM, 4 cpus) and the benchmark server; one of the targets of this
benchmark is to verify the scalability of our application.


	Define scalability. (no this isn't a joke, I mean, you know your  
application, how would you like it to "scale" ? How do you think it will  
scale ? Why ? What did you do so it would scale well ? etc.)





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] large number of connected connections to postgres database (v8.0)

2007-07-17 Thread Fei Liu
It appears my multi-threaded application (100 connections every 5 seconds) 
is stalled when working with the postgresql database server. I have limited 
the number of connections in my connection pool to postgresql to 20. At the 
beginning, connections are allocated and released from the connection pool as 
postgres serves data requests. The pool can recover from exhaustion. But 
very quickly (after about 400 client requests), it seems the postgres server 
stops serving and connections to the postgres server are not released any more, 
resulting in resource exhaustion for clients.


Anyone have experience with the performance aspect of this?

Fei
unix  3  [ ] STREAM CONNECTED 1693655 
31976/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693654 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693653 
31975/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693652 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693651 
31974/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693650 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693649 
31973/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693648 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693647 
31972/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693646 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693645 
31971/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693644 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693641 
31969/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693640 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693639 
31968/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693638 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693637 
31967/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693636 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693585 
31941/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693584 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693583 
31940/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693582 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693581 
31939/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693580 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693579 
31938/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693578 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693577 
31937/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693576 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693575 
31936/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693574 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693573 
31935/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693572 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693571 
31934/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693570 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693427 
31851/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693426 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693425 
31777/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693424 
31740/ns_ge_classif
unix  3  [ ] STREAM CONNECTED 1693419 
31764/postgres: pos /tmp/.s.PGSQL.5583
unix  3  [ ] STREAM CONNECTED 1693418 
31740/ns_ge_classif




[PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid

Hi

I was doing some testing on "insert" compared to "select into". I 
inserted 100 000 rows (with 8 column values) into a table, which took 14 
seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all 
happened on the local machine)


Now I am wondering why the select into is that much faster?
Does the select into translate into a specially optimised function in c 
that can cut corners which an insert can not do (e.g. lazy copying), or 
is it some other reason?


The reason I am asking is that select into shows that a number of rows 
can be inserted into a table quite a lot faster than one would think was 
possible with ordinary sql. If that is the case, it means that if I 
write a pl-pgsql insert function in C instead of sql, then I can have 
my db perform an order of magnitude faster.


Any comments?

regards

thomas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Michael Glaesemann


On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:

I was doing some testing on "insert" compared to "select into". I  
inserted 100 000 rows (with 8 column values) into a table, which  
took 14 seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it  
all happened on the local machine)


Now I am wondering why the select into is that much faster?


It would be helpful if you included the actual queries you're using,  
as there are a number of variables:


1) If there are any constraints on the original table, the INSERT  
will be checking those constraints. AIUI, SELECT INTO does not  
generate any table constraints.


2a) Are you using INSERT INTO foo (foo1, foo2, foo2) SELECT foo1,  
foo2, foo3 FROM pre_foo or individual inserts for each row? The  
former would be faster than the latter.


2b) If you are doing individual inserts, are you wrapping them in a  
transaction? The latter would be faster.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Arjen van der Meijden
Have you also tried the COPY-statement? Afaik select into is similar to 
what happens in there.


Best regards,

Arjen

On 17-7-2007 21:38 Thomas Finneid wrote:

Hi

I was doing some testing on "insert" compared to "select into". I 
inserted 100 000 rows (with 8 column values) into a table, which took 14 
seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all 
happened on the local machine)


Now I am wondering why the select into is that much faster?
Does the select into translate into a specially optimised function in c 
that can cut corners which a insert can not do (e.g. lazy copying), or 
is it some other reason?


The reason I am asking is that select into shows that a number of rows 
can be inserted into a table quite a lot faster than one would think was 
possible with ordinary sql. If that is the case, it means that if I 
write an pl-pgsql insert function in C instead of sql, then I can have 
my db perform order of magnitude faster.


Any comments?

regards

thomas

---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Tom Lane
Michael Glaesemann <[EMAIL PROTECTED]> writes:
> It would be helpful if you included the actual queries you're using,  
> as there are a number of variables:

Not to mention which PG version he's testing.  Since (I think) 8.1,
SELECT INTO knows that it can substitute one fsync for WAL-logging
the individual row inserts, since if there's a crash the new table
will disappear anyway.
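
For reference, the SELECT INTO form Tom is describing, and its equivalent
CREATE TABLE AS spelling, look like this (table names are placeholders):

  SELECT * INTO new_copy FROM source_table;
  CREATE TABLE new_copy2 AS SELECT * FROM source_table;  -- equivalent form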

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



Michael Glaesemann wrote:


On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:

I was doing some testing on "insert" compared to "select into". I 
inserted 100 000 rows (with 8 column values) into a table, which took 
14 seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all 
happened on the local machine)


Now I am wondering why the select into is that much faster?


It would be helpful if you included the actual queries you're using, as 
there are a number of variables:


create table ciu_data_type
(
id  integer,
loc_id  integer,
value1  integer,
value2  real,
value3  integer,
value4  real,
value5  real,
value6  char(2),
value7  char(3),
value8  bigint,
value9  bigint,
value10 real,
value11 bigint,
value12 smallint,
value13 double precision,
value14 real,
value15 real,
value16 char(1),
value17 varchar(18),
value18 bigint,
value19 char(4)
);

performed with JDBC

insert into ciu_data_type (id, loc_id, value3, value5, value8, value9, 
value10, value11 ) values (?,?,?,?,?,?,?,?)


select * into ciu_data_type_copy from ciu_data_type

1) If there are any constraints on the original table, the INSERT will 
be checking those constraints. AIUI, SELECT INTO does not generate any 
table constraints.


No constraints in this test.

2a) Are you using INSERT INTO foo (foo1, foo2, foo2) SELECT foo1, foo2, 
foo3 FROM pre_foo or individual inserts for each row? The former would 
be faster than the latter.


2b) If you are doing individual inserts, are you wrapping them in a 
transaction? The latter would be faster.


disabling autocommit, but nothing more than that


I haven't done this test in a stored function yet, nor have I tried it 
with a C client so far, so there is the chance that it is java/jdbc that 
makes the insert so slow. I'll get to that test soon if there is any 
chance my theory makes sense.


regards

thomas


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



Tom Lane wrote:

Michael Glaesemann <[EMAIL PROTECTED]> writes:
It would be helpful if you included the actual queries you're using,  
as there are a number of variables:


Not to mention which PG version he's testing. 


It's pg 8.1 for now; I'll be upgrading to a compile-optimised 8.2 when I 
do the real test on the real server.


(it's on kubuntu 6.10 running on a Thinkpad T60 with dual core, 1.5 GB RAM 
and 100GB SATA, just in case anybody feels that is of any interest.)




Since (I think) 8.1,
SELECT INTO knows that it can substitute one fsync for WAL-logging
the individual row inserts, since if there's a crash the new table
will disappear anyway.


I am not sure I understand you correctly here, are you saying that 
SELECT INTO in 8.1 disables WAL logging and uses just a single fsync at 
the end? In that case it means that I could disable WAL as well and 
achieve the same performance, does it not?


regards

thomas


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Michael Stone

On Tue, Jul 17, 2007 at 10:50:22PM +0200, Thomas Finneid wrote:
I haven't done this test in a stored function yet, nor have I tried it 
with a C client so far, so there is the chance that it is java/jdbc that 
makes the insert so slow. I'll get to that test soon if there is any 
chance my theory makes sense.


What you're seeing is perfectly normal. Switch to COPY for fast inserts. 
(When you use inserts you need to wait for a round-trip for each row, 
instead of sending data to the server as fast as possible.)
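
A minimal sketch of what that looks like (the file path and CSV format are
hypothetical; the column list matches the test table from earlier in the
thread):

  COPY ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
      FROM '/tmp/ciu_data.csv' WITH CSV;

From a client, psql's \copy variant (or the JDBC COPY patches mentioned
elsewhere in the thread) streams the file from the client side instead of
requiring it to be readable by the server.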


Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Mark Lewis
If you're performing via JDBC, are you using addBatch/executeBatch, or
are you directly executing each insert?  If you directly execute each
insert, then your code will wait for a server round-trip between each
insert.

That still won't get you to the speed of select into, but it should
help.  You could also look at the pgsql-jdbc archives for the JDBC
driver patches which allow you to use COPY-style bulk loading, which
should get you to the performance level of COPY, which should be
reasonably close to the performance of select into.

-- Mark Lewis

On Tue, 2007-07-17 at 22:50 +0200, Thomas Finneid wrote:
> 
> Michael Glaesemann wrote:
> > 
> > On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:
> > 
> >> I was doing some testing on "insert" compared to "select into". I 
> >> inserted 100 000 rows (with 8 column values) into a table, which took 
> >> 14 seconds, compared to a select into, which took 0.8 seconds.
> >> (fyi, the inserts were batched, autocommit was turned off and it all 
> >> happened on the local machine)
> >>
> >> Now I am wondering why the select into is that much faster?
> > 
> > It would be helpful if you included the actual queries you're using, as 
> > there are a number of variables:
> 
> create table ciu_data_type
> (
>   id  integer,
>   loc_id  integer,
>   value1  integer,
>   value2  real,
>   value3  integer,
>   value4  real,
>   value5  real,
>   value6  char(2),
>   value7  char(3),
>   value8  bigint,
>   value9  bigint,
>   value10 real,
>   value11 bigint,
>   value12 smallint,
>   value13 double precision,
>   value14 real,
>   value15 real,
>   value16 char(1),
>   value17 varchar(18),
>   value18 bigint,
>   value19 char(4)
> );
> 
> performed with JDBC
> 
> insert into ciu_data_type (id, loc_id, value3, value5, value8, value9, 
> value10, value11 ) values (?,?,?,?,?,?,?,?)
> 
> select * into ciu_data_type_copy from ciu_data_type
> 
> > 1) If there are any constraints on the original table, the INSERT will 
> > be checking those constraints. AIUI, SELECT INTO does not generate any 
> > table constraints.
> 
> No constraints in this test.
> 
> > 2a) Are you using INSERT INTO foo (foo1, foo2, foo2) SELECT foo1, foo2, 
> > foo3 FROM pre_foo or individual inserts for each row? The former would 
> > be faster than the latter.
> > 
> > 2b) If you are doing individual inserts, are you wrapping them in a 
> > transaction? The latter would be faster.
> 
> disabling autocommit, but nothing more than that
> 
> 
> I haven't done this test in a stored function yet, nor have I tried it 
> with a C client so far, so there is the chance that it is java/jdbc that 
> makes the insert so slow. I'll get to that test soon if there is any 
> chance my theory makes sense.
> 
> regards
> 
> thomas
> 
> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



Arjen van der Meijden wrote:
Have you also tried the COPY-statement? Afaik select into is similar to 
what happens in there.


No, because it only works from file to db or vice versa, not table to table.

regards

thomas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Michael Glaesemann


On Jul 17, 2007, at 15:50 , Thomas Finneid wrote:


Michael Glaesemann wrote:


2a) Are you using INSERT INTO foo (foo1, foo2, foo2) SELECT foo1,  
foo2, foo3 FROM pre_foo or individual inserts for each row? The  
former would be faster than the latter.



performed with JDBC

insert into ciu_data_type (id, loc_id, value3, value5, value8,  
value9, value10, value11 ) values (?,?,?,?,?,?,?,?)


As they're individual inserts, I think what you're seeing is overhead  
from calling this statement 100,000 times, not just on the server but  
also the overhead through JDBC. For comparison, try


CREATE TABLE ciu_data_type_copy (LIKE ciu_data_type);

INSERT INTO ciu_data_type_copy (id, loc_id, value3, value5, value8,  
value9, value10, value11)

SELECT id, loc_id, value3, value5, value8, value9, value10, value11
FROM ciu_data_type;

I think this would be more comparable to what you're seeing.

I haven't done this test in a stored function yet, nor have I tried  
it with a C client so far, so there is the chance that it is java/ 
jdbc that makes the insert so slow. I'll get to that test soon if  
there is any chance my theory makes sense.


Just testing in psql with \timing should be fairly easy.
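
For instance (a sketch reusing the tables from this thread):

  \timing
  INSERT INTO ciu_data_type_copy (id, loc_id, value3, value5, value8, value9, value10, value11)
      SELECT id, loc_id, value3, value5, value8, value9, value10, value11
      FROM ciu_data_type;
  SELECT * INTO ciu_data_type_copy2 FROM ciu_data_type;

psql prints a "Time: ... ms" line after each statement, which makes the
comparison straightforward without involving JDBC at all.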

Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



Mark Lewis wrote:

If you're performing via JDBC, are you using addBatch/executeBatch, or
are you directly executing each insert?  If you directly execute each
insert, then your code will wait for a server round-trip between each
insert.


I tested both and I found almost no difference in the time it took to 
perform it. Mind you this was on a local machine, but I still thought 
that it was a bit strange.



That still won't get you to the speed of select into, but it should
help.  You could also look at the pgsql-jdbc archives for the JDBC
driver patches which allow you to use COPY-style bulk loading, which
should get you to the performance level of COPY, which should be
reasonably close to the performance of select into.


Yes, someone else on the list suggested this a couple of weeks ago. I 
haven't had a chance to test it yet, but I am hopeful that I can use it.


The only issue I have is that the tests I have done are rather 
simplistic, because they are just speed trials. The real system will 
probably use 5-10 tables, with up to 100 columns for all tables, which 
means I need a stored function that goes through all the bulked data and 
reinserts it into the real tables. I am worried that this might hurt 
the performance so much that almost the entire bulk copy advantage 
disappears. This is why I am wondering about the details of SELECT INTO 
and C functions etc.


regards

thomas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread PFC


I was doing some testing on "insert" compared to "select into". I  
inserted 100 000 rows (with 8 column values) into a table, which took 14  
seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all  
happened on the local machine)


Did you use prepared statements ?
	Did you use INSERT INTO ... VALUES () with a long list of values, or just  
100K insert statements ?


	It's the time to parse statements, plan, execute, roundtrips with the  
client, context switches, time for your client library to escape the data  
and encode it and for postgres to decode it, etc. In a word : OVERHEAD.


By the way which language and client library are you using ?

	FYI 14s / 100k = 140 microseconds per individual SQL query. That ain't  
slow at all.


Does the select into translate into a specially optimised function in c  
that can cut corners which a insert can not do (e.g. lazy copying), or  
is it some other reason?


	Yeah : instead of your client having to encode 100K * 8 values, send it  
over a socket, and postgres decoding it, INSERT INTO SELECT just takes the  
data, and writes the data. Same thing as writing a file a byte at a time  
versus using a big buffer.


The reason I am asking is that select into shows that a number of rows  
can be inserted into a table quite a lot faster than one would think was  
possible with ordinary sql. If that is the case, it means that if I  
write an pl-pgsql insert function in C instead of sql, then I can have  
my db perform order of magnitude faster.


	Fortunately this is already done for you : there is the PREPARE  
statement, which will remove the parsing overhead. If you must insert many  
rows, use VALUES (),(),()...
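
Roughly what those two look like (a sketch using the test table from earlier
in the thread; note the multi-row VALUES form needs 8.2, while the test above
ran on 8.1):

  PREPARE ins (integer, integer, integer, real, bigint, bigint, real, bigint) AS
      INSERT INTO ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
      VALUES ($1, $2, $3, $4, $5, $6, $7, $8);
  EXECUTE ins (1, 10, 42, 3.14, 100, 200, 2.5, 300);

  -- 8.2+: many rows per INSERT, far fewer round trips
  INSERT INTO ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
  VALUES (1, 10, 42, 3.14, 100, 200, 2.5, 300),
         (2, 11, 43, 2.71, 101, 201, 3.5, 301);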


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



PFC wrote:


I was doing some testing on "insert" compared to "select into". I 
inserted 100 000 rows (with 8 column values) into a table, which took 
14 seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all 
happened on the local machine)


Did you use prepared statements ?
Did you use INSERT INTO ... VALUES () with a long list of values, or 
just 100K insert statements ?


It was prepared statements and I tried it both batched and non-batched 
(not much difference on a local machine)


It's the time to parse statements, plan, execute, roundtrips with 
the client, context switches, time for your client library to escape the 
data and encode it and for postgres to decode it, etc. In a word : 
OVERHEAD.


I know there is some overhead, but that much when running it batched...?


By the way which language and client library are you using ?

FYI 14s / 100k = 140 microseconds per individual SQL query. That 
ain't slow at all.


Unfortunately it's not fast enough; it needs to be done in no more than 
1-2 seconds, ( and in production it will be maybe 20-50 columns of data, 
perhaps divided over 5-10 tables.)
Additionally it needs to scale to perhaps three times as many columns 
and perhaps 2 - 3 times as many rows in some situations within 1 second.
Further on it needs to allow for about 20 - 50 clients reading much of 
that data before the next batch of data arrives.


I know the computer is going to be a much faster one than the one I am 
testing with, but I need to make sure the solution scales well.



regards

thomas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[PERFORM] importance of fast disks with pg

2007-07-17 Thread Thomas Finneid

Hi

During the tests I did I noticed that it does not necessarily seem to be 
true that one needs the fastest disks to have a pg system that is fast.


It seems to me that its more important to:
- choose the correct methods to use for the operation
- tune the pg memory settings
- tune/disable pg xlog/wal etc

It also seems to me that fast disks are more important for db systems of 
the OLTP type applications with real concurrency of both readers and 
writes across many, possibly larger, tables etc.


Are the above statements close to having any truth in them?

regards

thomas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] importance of fast disks with pg

2007-07-17 Thread Dan Harris

Thomas Finneid wrote:

Hi

During the tests I did I noticed that it does not necessarily seem to be 
true that one needs the fastest disks to have a pg system that is fast.


It seems to me that its more important to:
- choose the correct methods to use for the operation
- tune the pg memory settings
- tune/disable pg xlog/wal etc

It also seems to me that fast disks are more important for db systems of 
the OLTP type applications with real concurrency of both readers and 
writes across many, possibly larger, tables etc.


Are the above statements close to having any truth in them?

regards

thomas


I'd say that "it depends".  We run an OLAP workload on 350+ gigs of database on 
a system with 64GB of RAM.  I can tell you for certain that fetching non-cached 
data is very sensitive to disk throughput!


Different types of workloads will find different bottlenecks in the system..

-Dan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] ionice to make vacuum friendlier?

2007-07-17 Thread Ron Mayer
Seems Linux has IO scheduling through a program called ionice.

Has anyone here experimented with using it rather than
vacuum sleep settings?

http://linux.die.net/man/1/ionice
   This program sets the io scheduling class and priority
   for a program. As of this writing, Linux supports 3 scheduling
   classes:

   Idle. A program running with idle io priority will only get disk
   time when no other program has asked for disk io for a defined
   grace period. The impact of idle io processes on normal system
   activity should be zero.[...]

   Best effort. This is the default scheduling class for any process
   that hasn't asked for a specific io priority. Programs inherit the
   CPU nice setting for io priorities. [...]

http://friedcpu.wordpress.com/2007/07/17/why-arent-you-using-ionice-yet/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org