Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
Marc Mamin wrote:
> Postgres configuration for 64 CPUs, 128 GB RAM...

there are probably not that many installations out there that large - comments below

> Hello,
>
> We have the opportunity to benchmark our application on a large server. I have to prepare the Postgres configuration and I'd appreciate some comments on it as I am not experienced with servers of such a scale. Moreover the configuration should be fail-proof as I won't be able to attend the tests.
>
> Our application (java + perl) and Postgres will run on the same server, whereas the application activity is low when Postgres has large transactions to process.
>
> There is a large gap between our current production server (Linux, 4GB RAM, 4 cpus) and the benchmark server; one of the targets of this benchmark is to verify the scalability of our application.
>
> [...]
>
> Postgres version: 8.2.1

upgrade to 8.2.4

> File system: http://en.wikipedia.org/wiki/ZFS

way more important is what kind of disk-IO subsystem you have attached ...

> Planned configuration:
>
> # we don't expect more than 150 parallel connections,
> # but I suspect a leak in our application that lets some idle connections open
> max_connections=2000
>
> ssl = off
>
> # maximum allowed
> shared_buffers= 262143

this is probably on the lower side for a 128GB box

> # on our current best production server with 4GB RAM (not dedicated to Postgres), work_mem is set to 600 MB
> # this limitation is probably the bottleneck for our application as the files in pgsql_tmp grow up to 15 GB
> # during large aggregations (we have a locking mechanism to avoid parallel processing of such transactions)
> work_mem = 31457280  # (30 GB)

this is simply ridiculous - work_mem is PER SORT - so if your query requires 8 sorts it will feel free to use 8x30GB, and that needs to be multiplied by the number of concurrent connections.

> # index creation time is also an issue for us; the process is locking other large processes too.
> # our largest table so far is 13 GB + 11 GB indexes
> maintenance_work_mem = 31457280  # (30 GB)

this is ridiculous too - testing has shown that there is not much point in going beyond 1GB or so

> # more than the max number of tables + indexes expected during the benchmark
> max_fsm_relations = 10
>
> max_fsm_pages = 180

this is probably way too low for a database the size of yours - watch the output of VACUUM VERBOSE on a database-wide vacuum for some stats on that.

> # don't know if I should modify this.
> # seems to be sufficient on our production servers
> max_stack_depth = 2MB
>
> # vacuum will be done by hand between each test session
> autovacuum = off
>
> # required to analyse the benchmark
> log_min_duration_statement = 1000
>
> max_prepared_transactions = 100
>
> # seems to be required to drop schema/roles containing large numbers of objects
> max_locks_per_transaction = 128
>
> # I use the defaults for the bgwriter as I couldn't find recommendations on those
> #bgwriter_delay = 200ms          # 10-10000ms between rounds
> #bgwriter_lru_percent = 1.0      # 0-100% of LRU buffers scanned/round
> #bgwriter_lru_maxpages = 5       # 0-1000 buffers max written/round
> #bgwriter_all_percent = 0.333    # 0-100% of all buffers scanned/round
> #bgwriter_all_maxpages = 5       # 0-1000 buffers max written/round
>
> #WAL
> fsync = on
>
> #use default
> #wal_sync_method
>
> # we are using 32 on our production system
> wal_buffers=64

values up to 512 or so have been reported to help on systems with very high concurrency

what is missing here is your settings for: effective_cache_size and random_page_cost

Stefan

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
       http://www.postgresql.org/about/donate
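One practical alternative to a 30 GB global work_mem: since the large aggregations are already serialized by the application's own lock, the setting can be raised only in the session or transaction that runs them and left modest everywhere else. A minimal sketch, assuming 8.2's unit syntax; the value, table and column names are placeholders, not from the original post:

  BEGIN;
  SET LOCAL work_mem = '1GB';   -- applies only to this transaction's sorts/hashes
  SELECT some_key, count(*), sum(some_measure)
  FROM big_fact_table
  GROUP BY some_key;
  COMMIT;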
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
On Tue, Jul 17, 2007 at 04:10:30PM +0200, Marc Mamin wrote:
> shared_buffers= 262143

You should at least try some runs with this set far, far larger. At least 10% of memory, but it'd be nice to see what happens with this set to 50% or higher as well (though don't set it larger than the database since it'd be a waste). How big is the database, anyway?

> # on our current best production server with 4GB RAM (not dedicated to Postgres), work_mem is set to 600 MB
> # this limitation is probably the bottleneck for our application as the files in pgsql_tmp grow up to 15 GB
> # during large aggregations (we have a locking mechanism to avoid parallel processing of such transactions)

Keep in mind that a good filesystem will be caching most of pgsql_tmp if it can.

> max_prepared_transactions = 100

Are you using 2PC? If not, there's no reason to touch this (you could just set it to 0).

> # I use the defaults for the bgwriter as I couldn't find recommendations on those
>
> #bgwriter_delay = 200ms          # 10-10000ms between rounds
> #bgwriter_lru_percent = 1.0      # 0-100% of LRU buffers scanned/round
> #bgwriter_lru_maxpages = 5       # 0-1000 buffers max written/round
> #bgwriter_all_percent = 0.333    # 0-100% of all buffers scanned/round
> #bgwriter_all_maxpages = 5       # 0-1000 buffers max written/round

You'll probably want to increase both maxpages parameters substantially, assuming that you've got good IO hardware.

> #CHECKPOINT
>
> # xlog will be on a separate disk
> checkpoint_segments=256
>
> checkpoint_timeout = 5min

The further apart your checkpoints, the better. Might want to look at 10 minutes. I'd also set checkpoint_warning to just a bit below checkpoint_timeout and watch for warnings to make sure you're not checkpointing a lot more frequently than you're expecting.
--
Jim Nasby                      [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
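To answer the "how big is the database" question directly on the server, the built-in size functions (available in 8.1 and later) give numbers that are easy to compare against shared_buffers; the table name below is only an example:

  SELECT pg_size_pretty(pg_database_size(current_database()));      -- whole database on disk
  SELECT pg_size_pretty(pg_total_relation_size('some_big_table'));  -- one table including its indexes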
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
"Marc Mamin" <[EMAIL PROTECTED]> writes: > We have the oppotunity to benchmark our application on a large server. I > have to prepare the Postgres configuration and I'd appreciate some > comments on it as I am not experienced with servers of such a scale. > Moreover the configuration should be fail-proof as I won't be able to > attend the tests. I really think that's a recipe for disaster. Even on a regular machine you need to treat tuning as an on-going feedback process. There's no such thing as a fail-proof configuration since every application is different. On an exotic machine like this you're going to run into unique problems that nobody here can anticipate with certainty. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
On Tue, 17 Jul 2007, Marc Mamin wrote:

> Moreover the configuration should be fail-proof as I won't be able to attend the tests.

This is unreasonable. The idea that you'll get a magic perfect configuration in one shot suggests a fundamental misunderstanding of how work like this is done. If there's any way you could adjust things so that, say, you were allowed to give at least 4 different tuning setups and you got a report back with the results for each of them, that would let you design a much better test set.

> Postgres version: 8.2.1

This has already been mentioned, but it really is critical for your type of test to run 8.2.4 instead, so I wanted to emphasize it. There is a major scalability bug in 8.2.1.

I'm going to ignore the other things that other people have already commented on (all the suggestions Stefan and Jim already made are good ones you should heed) and try to fill in the remaining gaps instead.

> # I use the defaults for the bgwriter as I couldn't find recommendations on those

The defaults are so small that it will barely do anything on a server of your size. Tuning it properly so that it's effective but doesn't waste a lot of resources is tricky, which is why you haven't found such recommendations--they're fairly specific to what you're doing and require some testing to get right. If you want to see an example from a big server, look at http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070606-00065.html#DBDatabase_SW_Config0 That's tuned for a very specific benchmark though. Here's a fairly generic set of parameters that would be much more aggressive than the defaults, while not going so far as to waste too many resources if the writer is just getting in the way on your server:

bgwriter_delay = 200ms
bgwriter_lru_percent = 3.0
bgwriter_lru_maxpages = 500
bgwriter_all_percent = 1.0
bgwriter_all_maxpages = 250

> #WAL
> fsync = on
>
> #use default
> #wal_sync_method

I'd expect wal_sync_method=open_datasync would outperform the default, but you'd really want to test both ways here to be sure. The fact that the Sun results I referenced above use the default of fdatasync makes me hesitate to recommend that change too strongly, as I haven't worked with this particular piece of hardware. See http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm for more information about this parameter.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
Postgres configuration for 64 CPUs, 128 GB RAM...

Hello,

We have the opportunity to benchmark our application on a large server. I have to prepare the Postgres configuration and I'd appreciate some comments on it as I am not experienced with servers of such a scale. Moreover the configuration should be fail-proof as I won't be able to attend the tests.

Our application (java + perl) and Postgres will run on the same server, whereas the application activity is low when Postgres has large transactions to process.

There is a large gap between our current production server (Linux, 4GB RAM, 4 cpus) and the benchmark server; one of the targets of this benchmark is to verify the scalability of our application.

And you have no reason to be envious as the server doesn't belong to us :-)

Thanks for your comments,

Marc Mamin

Postgres version: 8.2.1

Server Specifications:
--
Sun SPARC Enterprise M8000 Server: http://www.sun.com/servers/highend/m8000/specs.xml

File system: http://en.wikipedia.org/wiki/ZFS

Planned configuration:

# we don't expect more than 150 parallel connections,
# but I suspect a leak in our application that lets some idle connections open
max_connections=2000

ssl = off

# maximum allowed
shared_buffers= 262143

# on our current best production server with 4GB RAM (not dedicated to Postgres), work_mem is set to 600 MB
# this limitation is probably the bottleneck for our application as the files in pgsql_tmp grow up to 15 GB
# during large aggregations (we have a locking mechanism to avoid parallel processing of such transactions)
work_mem = 31457280  # (30 GB)

# index creation time is also an issue for us; the process is locking other large processes too.
# our largest table so far is 13 GB + 11 GB indexes
maintenance_work_mem = 31457280  # (30 GB)

# more than the max number of tables + indexes expected during the benchmark
max_fsm_relations = 10

max_fsm_pages = 180

# don't know if I should modify this.
# seems to be sufficient on our production servers
max_stack_depth = 2MB

# vacuum will be done by hand between each test session
autovacuum = off

# required to analyse the benchmark
log_min_duration_statement = 1000

max_prepared_transactions = 100

# seems to be required to drop schema/roles containing large numbers of objects
max_locks_per_transaction = 128

# I use the defaults for the bgwriter as I couldn't find recommendations on those
#bgwriter_delay = 200ms          # 10-10000ms between rounds
#bgwriter_lru_percent = 1.0      # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5       # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333    # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5       # 0-1000 buffers max written/round

#WAL
fsync = on

#use default
#wal_sync_method

# we are using 32 on our production system
wal_buffers=64

# we didn't make any testing with this parameter until now, but this shouldn't be a relevant
# point as our performance focus is on large transactions
commit_delay = 0

#CHECKPOINT

# xlog will be on a separate disk
checkpoint_segments=256

checkpoint_timeout = 5min
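As a follow-up to the free-space-map comments elsewhere in the thread: a database-wide VACUUM VERBOSE reports at the end how many FSM page slots are actually needed, which is the number to size max_fsm_pages against. The output shape below is from memory and may differ slightly by version; it is only a guide to what to look for:

  VACUUM VERBOSE;
  -- the final lines of the output summarize FSM usage, roughly:
  --   INFO:  free space map contains ... pages in ... relations
  --   DETAIL:  A total of ... page slots are in use (including overhead).
  --            ... page slots are required to track all free space.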
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
>"Marc Mamin" <[EMAIL PROTECTED]> writes: > >> We have the oppotunity to benchmark our application on a large server. I >> have to prepare the Postgres configuration and I'd appreciate some >> comments on it as I am not experienced with servers of such a scale. >> Moreover the configuration should be fail-proof as I won't be able to >> attend the tests. > >I really think that's a recipe for disaster. Even on a regular machine you >need to treat tuning as an on-going feedback process. There's no such thing >as >a fail-proof configuration since every application is different. > >On an exotic machine like this you're going to run into unique problems >that >nobody here can anticipate with certainty. > >-- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > Marc, You're getting a lot of good advice for your project. Let me be another to reiterate that upgrading to Postgres 8.2.4 will bring added performance and scalability benefits. Others have mentioned that you do have to be data driven and unfortunately that is true. All you can really do is pick a reasonable starting point and run a test to create a baseline number. Then, monitor, make small changes and test again. That's the only way you're going to find the best configuration for your system. This will take time and effort. In addition, everything involved in your testing must scale - not just Postgres. For example, if your driver hardware or driver software does not scale, you won't be able to generate enough throughput for your application or Postgres. The same goes for your all of your networking equipment and any other hardware servers/software that might be involved in the test environment. So, you really have to monitor at all levels i.e. don't just focus on the database platform. I took a quick look at the Sun M8000 server link you provided. I don't know the system specifically so I might be mistaken, but it looks like it is configured with 4 sockets per CPU board and 4 CPU boards per system. Each CPU board looks like it has the ability to take 128GB RAM. In this case, you will have to keep an eye on how Solaris is binding (affinitizing) processes to CPU cores and/or boards. Any time a process is bound to a new CPU board it's likely that there will be a number of cache invalidations to move data the process was working on from the old board to the new board. In addition, the moved process may still continue to refer to memory it allocated on the old board. This can be quite expensive. Typically, the more CPU cores/CPU boards you have, the more likely this will happen. I'm no Solaris expert so I don't know if there is a better way of doing this, but you might consider using the psrset or pbind commands to bind Postgres backend processes to a specific CPU core or range of cores. If choosing a range of cores, these should be on the same CPU board. Again, through monitoring, you'll have to determine how many CPU cores each backend really needs and then you'll have to determine how best to spread the backends out over each of the CPU boards. Good luck. David ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
On Tue, 17 Jul 2007, Marc Mamin wrote:

> We have the opportunity to benchmark our application on a large server. I have to prepare the Postgres configuration and I'd appreciate some comments on it as I am not experienced with servers of such a scale. Moreover the configuration should be fail-proof as I won't be able to attend the tests. Our application (java + perl) and Postgres will run on the same server, whereas the application activity is low when Postgres has large transactions to process.

Please, can you be more specific about your application:

- what does it do?
- what kind of workload does it generate? [ie: many concurrent small queries (website); few huge queries, reporting, warehousing, all of the above, something else?]
- percentage and size of update queries?
- how many concurrent threads / connections / clients do you serve on a busy day? (I don't mean online users on a website, but ACTIVE concurrent database connections)

I assume you find your current server is too slow or foresee it will become too slow soon and want to upgrade, so:

- what makes the current server's performance inadequate? is it IO, CPU, RAM, a mix? which proportions in the mix?

This is very important. If you go to the dealer and ask "I need a better vehicle", he'll sell you a Porsche. But if you say "I need a better vehicle to carry two tons of cinderblocks" he'll sell you something else I guess. Same with database servers. You could need some humongous CPU power, but you might as well not. Depends.

> There is a large gap between our current production server (Linux, 4GB RAM, 4 cpus) and the benchmark server; one of the targets of this benchmark is to verify the scalability of our application.

Define scalability. (no this isn't a joke, I mean, you know your application, how would you like it to "scale"? How do you think it will scale? Why? What did you do so it would scale well? etc.)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
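To put a number on the "ACTIVE concurrent database connections" question, pg_stat_activity on the current production server can be sampled during a busy period. A sketch, assuming an 8.x server with stats_command_string enabled so that current_query is populated:

  SELECT count(*) AS active_backends
  FROM pg_stat_activity
  WHERE current_query NOT LIKE '<IDLE>%';   -- exclude idle and idle-in-transaction sessions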
[PERFORM] large number of connected connections to postgres database (v8.0)
It appears my multi-threaded application (100 connections every 5 seconds) stalls when working with the postgresql database server. I have limited the number of connections in my connection pool to postgresql to 20. At the beginning, connections are allocated and released from the connection pool as postgres serves data requests, and the pool can recover from exhaustion. But very quickly (after about 400 client requests), it seems postgres stops serving and connections to the postgres server are not released any more, resulting in resource exhaustion for the clients. Anyone have experience with the performance aspect of this?

Fei

[netstat output trimmed: about twenty Unix-domain socket pairs in state CONNECTED between the client process 31740/ns_ge_classif and individual postgres backends on /tmp/.s.PGSQL.5583, e.g.:]

unix 3 [ ] STREAM CONNECTED 1693655 31976/postgres: pos /tmp/.s.PGSQL.5583
unix 3 [ ] STREAM CONNECTED 1693654 31740/ns_ge_classif
unix 3 [ ] STREAM CONNECTED 1693653 31975/postgres: pos /tmp/.s.PGSQL.5583
unix 3 [ ] STREAM CONNECTED 1693652 31740/ns_ge_classif
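One thing worth checking when a pool stops handing connections back is whether the backends are stuck in "idle in transaction" (a transaction was opened but never committed or rolled back, so the pool keeps waiting on that connection). If stats_command_string is enabled, pg_stat_activity shows this; a sketch using the 8.0-era column names:

  SELECT procpid, usename, current_query
  FROM pg_stat_activity
  WHERE current_query = '<IDLE> in transaction';   -- backends holding an open transaction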
[PERFORM] insert vs select into performance
Hi

I was doing some testing on "insert" compared to "select into". I inserted 100 000 rows (with 8 column values) into a table, which took 14 seconds, compared to a select into, which took 0.8 seconds. (fyi, the inserts were batched, autocommit was turned off and it all happened on the local machine)

Now I am wondering why the select into is that much faster? Does the select into translate into a specially optimised function in c that can cut corners which an insert can not do (e.g. lazy copying), or is it some other reason?

The reason I am asking is that select into shows that a number of rows can be inserted into a table quite a lot faster than one would think was possible with ordinary sql. If that is the case, it means that if I write a pl/pgsql insert function in C instead of sql, then I can have my db perform an order of magnitude faster.

Any comments?

regards

thomas

---(end of broadcast)---
TIP 6: explain analyze is your friend
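For reference, the two operations being compared look roughly like this when run by hand in psql (table and column names are invented; \timing prints the elapsed time of each statement):

  \timing

  -- row-at-a-time load, which is what the client-side insert loop amounts to:
  INSERT INTO target_table (id, val) VALUES (1, 'a');
  -- ... repeated once per row ...

  -- single-statement bulk copy of an existing table:
  SELECT * INTO target_table_copy FROM source_table;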
Re: [PERFORM] insert vs select into performance
On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:

> I was doing some testing on "insert" compared to "select into". I inserted 100 000 rows (with 8 column values) into a table, which took 14 seconds, compared to a select into, which took 0.8 seconds. (fyi, the inserts were batched, autocommit was turned off and it all happened on the local machine)
>
> Now I am wondering why the select into is that much faster?

It would be helpful if you included the actual queries you're using, as there are a number of variables:

1) If there are any constraints on the original table, the INSERT will be checking those constraints. AIUI, SELECT INTO does not generate any table constraints.

2a) Are you using INSERT INTO foo (foo1, foo2, foo3) SELECT foo1, foo2, foo3 FROM pre_foo or individual inserts for each row? The former would be faster than the latter.

2b) If you are doing individual inserts, are you wrapping them in a transaction? The latter would be faster.

Michael Glaesemann
grzm seespotcode net

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert vs select into performance
Have you also tried the COPY statement? Afaik select into is similar to what happens in there.

Best regards,

Arjen

On 17-7-2007 21:38 Thomas Finneid wrote:
> Hi
>
> I was doing some testing on "insert" compared to "select into". I inserted 100 000 rows (with 8 column values) into a table, which took 14 seconds, compared to a select into, which took 0.8 seconds. (fyi, the inserts were batched, autocommit was turned off and it all happened on the local machine)
>
> Now I am wondering why the select into is that much faster? Does the select into translate into a specially optimised function in c that can cut corners which an insert can not do (e.g. lazy copying), or is it some other reason?
>
> The reason I am asking is that select into shows that a number of rows can be inserted into a table quite a lot faster than one would think was possible with ordinary sql. If that is the case, it means that if I write a pl/pgsql insert function in C instead of sql, then I can have my db perform an order of magnitude faster.
>
> Any comments?
>
> regards
>
> thomas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert vs select into performance
Michael Glaesemann <[EMAIL PROTECTED]> writes: > It would be helpful if you included the actual queries you're using, > as there are a number of variables: Not to mention which PG version he's testing. Since (I think) 8.1, SELECT INTO knows that it can substitute one fsync for WAL-logging the individual row inserts, since if there's a crash the new table will disappear anyway. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert vs select into performance
Michael Glaesemann wrote:
> On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:
>> I was doing some testing on "insert" compared to "select into". I inserted 100 000 rows (with 8 column values) into a table, which took 14 seconds, compared to a select into, which took 0.8 seconds. (fyi, the inserts were batched, autocommit was turned off and it all happened on the local machine)
>>
>> Now I am wondering why the select into is that much faster?
>
> It would be helpful if you included the actual queries you're using, as there are a number of variables:

create table ciu_data_type
(
  id        integer,
  loc_id    integer,
  value1    integer,
  value2    real,
  value3    integer,
  value4    real,
  value5    real,
  value6    char(2),
  value7    char(3),
  value8    bigint,
  value9    bigint,
  value10   real,
  value11   bigint,
  value12   smallint,
  value13   double precision,
  value14   real,
  value15   real,
  value16   char(1),
  value17   varchar(18),
  value18   bigint,
  value19   char(4)
);

performed with JDBC:

insert into ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11) values (?,?,?,?,?,?,?,?)

select * into ciu_data_type_copy from ciu_data_type

> 1) If there are any constraints on the original table, the INSERT will be checking those constraints. AIUI, SELECT INTO does not generate any table constraints.

No constraints in this test.

> 2a) Are you using INSERT INTO foo (foo1, foo2, foo3) SELECT foo1, foo2, foo3 FROM pre_foo or individual inserts for each row? The former would be faster than the latter.
>
> 2b) If you are doing individual inserts, are you wrapping them in a transaction? The latter would be faster.

disabling autocommit, but nothing more than that

I haven't done this test in a stored function yet, nor have I tried it with a C client so far, so there is the chance that it is java/jdbc that makes the insert so slow. I'll get to that test soon if there is any chance my theory makes sense.

regards

thomas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert vs select into performance
Tom Lane wrote:
> Michael Glaesemann <[EMAIL PROTECTED]> writes:
>> It would be helpful if you included the actual queries you're using, as there are a number of variables:
>
> Not to mention which PG version he's testing.

It's pg 8.1, for now; I'll be upgrading to a compile-optimised 8.2 when I do the real test on the real server. (It's on kubuntu 6.10 running on a Thinkpad T60 with dual core 1.5, GB RAM and 100GB SATA, just in case anybody feels that is of any interest.)

> Since (I think) 8.1, SELECT INTO knows that it can substitute one fsync for WAL-logging the individual row inserts, since if there's a crash the new table will disappear anyway.

I am not sure I understand you correctly here: are you saying that SELECT INTO in 8.1 disables WAL logging and uses just a single fsync at the end? In that case it means that I could disable WAL as well and achieve the same performance, does it not?

regards

thomas

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
Re: [PERFORM] insert vs select into performance
On Tue, Jul 17, 2007 at 10:50:22PM +0200, Thomas Finneid wrote:
> I haven't done this test in a stored function yet, nor have I tried it with a C client so far, so there is the chance that it is java/jdbc that makes the insert so slow. I'll get to that test soon if there is any chance my theory makes sense.

What you're seeing is perfectly normal. Switch to COPY for fast inserts. (When you use inserts you need to wait for a round-trip for each row, instead of sending data to the server as fast as possible.)

Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
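For what it's worth, a COPY-based load of the same column subset could look like the following; the file path is a placeholder, and COPY ... FROM STDIN is the form that client-side bulk loaders (including the JDBC COPY patches mentioned elsewhere in this thread) drive over an open connection:

  -- server reads the file itself (it must be readable by the postgres user):
  COPY ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
  FROM '/path/to/data.csv' WITH CSV;

  -- or stream rows from the client over the existing connection:
  COPY ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
  FROM STDIN;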
Re: [PERFORM] insert vs select into performance
If you're performing via JDBC, are you using addBatch/executeBatch, or are you directly executing each insert? If you directly execute each insert, then your code will wait for a server round-trip between each insert.

That still won't get you to the speed of select into, but it should help. You could also look at the pgsql-jdbc archives for the JDBC driver patches which allow you to use COPY-style bulk loading, which should get you to the performance level of COPY, which should be reasonably close to the performance of select into.

-- Mark Lewis

On Tue, 2007-07-17 at 22:50 +0200, Thomas Finneid wrote:
> Michael Glaesemann wrote:
> > On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:
> >
> >> I was doing some testing on "insert" compared to "select into". I
> >> inserted 100 000 rows (with 8 column values) into a table, which took
> >> 14 seconds, compared to a select into, which took 0.8 seconds.
> >> (fyi, the inserts were batched, autocommit was turned off and it all
> >> happened on the local machine)
> >>
> >> Now I am wondering why the select into is that much faster?
> >
> > It would be helpful if you included the actual queries you're using, as
> > there are a number of variables:
>
> create table ciu_data_type
> (
>   id        integer,
>   loc_id    integer,
>   value1    integer,
>   value2    real,
>   value3    integer,
>   value4    real,
>   value5    real,
>   value6    char(2),
>   value7    char(3),
>   value8    bigint,
>   value9    bigint,
>   value10   real,
>   value11   bigint,
>   value12   smallint,
>   value13   double precision,
>   value14   real,
>   value15   real,
>   value16   char(1),
>   value17   varchar(18),
>   value18   bigint,
>   value19   char(4)
> );
>
> performed with JDBC
>
> insert into ciu_data_type (id, loc_id, value3, value5, value8, value9,
> value10, value11) values (?,?,?,?,?,?,?,?)
>
> select * into ciu_data_type_copy from ciu_data_type
>
> > 1) If there are any constraints on the original table, the INSERT will
> > be checking those constraints. AIUI, SELECT INTO does not generate any
> > table constraints.
>
> No constraints in this test.
>
> > 2a) Are you using INSERT INTO foo (foo1, foo2, foo3) SELECT foo1, foo2,
> > foo3 FROM pre_foo or individual inserts for each row? The former would
> > be faster than the latter.
> >
> > 2b) If you are doing individual inserts, are you wrapping them in a
> > transaction? The latter would be faster.
>
> disabling autocommit, but nothing more than that
>
> I haven't done this test in a stored function yet, nor have I tried it
> with a C client so far, so there is the chance that it is java/jdbc that
> makes the insert so slow. I'll get to that test soon if there is any
> chance my theory makes sense.
>
> regards
>
> thomas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert vs select into performance
Arjen van der Meijden wrote:
> Have you also tried the COPY statement? Afaik select into is similar to what happens in there.

No, because it only works on file to db or vice versa, not table to table.

regards

thomas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
       http://www.postgresql.org/about/donate
Re: [PERFORM] insert vs select into performance
On Jul 17, 2007, at 15:50 , Thomas Finneid wrote:

> Michael Glaesemann wrote:
>> 2a) Are you using INSERT INTO foo (foo1, foo2, foo3) SELECT foo1, foo2, foo3 FROM pre_foo or individual inserts for each row? The former would be faster than the latter.
>
> performed with JDBC
>
> insert into ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11) values (?,?,?,?,?,?,?,?)

As they're individual inserts, I think what you're seeing is overhead from calling this statement 100,000 times, not just on the server but also the overhead through JDBC. For comparison, try

CREATE TABLE ciu_data_type_copy (LIKE ciu_data_type);

INSERT INTO ciu_data_type_copy (id, loc_id, value3, value5, value8, value9, value10, value11)
SELECT id, loc_id, value3, value5, value8, value9, value10, value11
FROM ciu_data_type;

I think this would be more comparable to what you're seeing.

> I haven't done this test in a stored function yet, nor have I tried it with a C client so far, so there is the chance that it is java/jdbc that makes the insert so slow. I'll get to that test soon if there is any chance my theory makes sense.

Just testing in psql with \timing should be fairly easy.

Michael Glaesemann
grzm seespotcode net

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert vs select into performance
Mark Lewis wrote:
> If you're performing via JDBC, are you using addBatch/executeBatch, or are you directly executing each insert? If you directly execute each insert, then your code will wait for a server round-trip between each insert.

I tested both and I found almost no difference in the time it took to perform it. Mind you this was on a local machine, but I still thought that it was a bit strange.

> That still won't get you to the speed of select into, but it should help. You could also look at the pgsql-jdbc archives for the JDBC driver patches which allow you to use COPY-style bulk loading, which should get you to the performance level of COPY, which should be reasonably close to the performance of select into.

Yes, someone else on the list suggested this a couple of weeks ago. I haven't had a chance to test it yet, but I am hopeful that I can use it.

The only issue I have is that the tests I have done are rather simplistic, because they are just speed trials. The real system will probably use 5-10 tables, with up to 100 columns across all tables, which means I need a stored function that goes through all the bulked data and reinserts it into the real tables. I am worried that this might hurt performance so much that almost the entire bulk copy advantage disappears. This is why I am wondering about the details of SELECT INTO and C functions etc.

regards

thomas

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
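A sketch of the staging-table approach described above: bulk-load one wide table, then redistribute the rows with set-based statements so the second step also avoids per-row round-trips. All table and column names here are invented for illustration:

  -- 1) bulk-load the raw rows into a staging table
  COPY staging_bulk (id, loc_id, a_value, b_value) FROM STDIN;

  -- 2) fan them out to the real tables in one transaction
  BEGIN;
  INSERT INTO table_a (id, loc_id, a_value)
    SELECT id, loc_id, a_value FROM staging_bulk;
  INSERT INTO table_b (id, b_value)
    SELECT id, b_value FROM staging_bulk;
  TRUNCATE staging_bulk;
  COMMIT;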
Re: [PERFORM] insert vs select into performance
> I was doing some testing on "insert" compared to "select into". I inserted 100 000 rows (with 8 column values) into a table, which took 14 seconds, compared to a select into, which took 0.8 seconds. (fyi, the inserts were batched, autocommit was turned off and it all happened on the local machine)

Did you use prepared statements? Did you use INSERT INTO ... VALUES () with a long list of values, or just 100K insert statements?

It's the time to parse statements, plan, execute, do roundtrips with the client, context switches, time for your client library to escape the data and encode it and for postgres to decode it, etc. In a word: OVERHEAD.

By the way, which language and client library are you using?

FYI 14s / 100k = 140 microseconds per individual SQL query. That ain't slow at all.

> Does the select into translate into a specially optimised function in c that can cut corners which an insert can not do (e.g. lazy copying), or is it some other reason?

Yeah: instead of your client having to encode 100K * 8 values, send them over a socket, and postgres decoding them, INSERT INTO SELECT just takes the data and writes the data. Same thing as writing a file a byte at a time versus using a big buffer.

> The reason I am asking is that select into shows that a number of rows can be inserted into a table quite a lot faster than one would think was possible with ordinary sql. If that is the case, it means that if I write a pl/pgsql insert function in C instead of sql, then I can have my db perform an order of magnitude faster.

Fortunately this is already done for you: there is the PREPARE statement, which will remove the parsing overhead. If you must insert many rows, use VALUES (),(),()...

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
       http://www.postgresql.org/about/donate
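A minimal sketch of the two techniques PFC mentions, using the column subset from earlier in the thread (the values are made up; multi-row VALUES requires 8.2 or later):

  -- parse and plan once, execute many times
  PREPARE ins (integer, integer, integer, real, bigint, bigint, real, bigint) AS
    INSERT INTO ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8);
  EXECUTE ins(1, 10, 42, 3.14, 100, 200, 1.5, 300);

  -- or send many rows per statement (8.2+)
  INSERT INTO ciu_data_type (id, loc_id, value3, value5, value8, value9, value10, value11)
  VALUES (1, 10, 42, 3.14, 100, 200, 1.5, 300),
         (2, 11, 43, 2.71, 101, 201, 2.5, 301);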
Re: [PERFORM] insert vs select into performance
PFC wrote:
>> I was doing some testing on "insert" compared to "select into". I inserted 100 000 rows (with 8 column values) into a table, which took 14 seconds, compared to a select into, which took 0.8 seconds. (fyi, the inserts were batched, autocommit was turned off and it all happened on the local machine)
>
> Did you use prepared statements? Did you use INSERT INTO ... VALUES () with a long list of values, or just 100K insert statements?

It was prepared statements, and I tried it both batched and non-batched (not much difference on a local machine).

> It's the time to parse statements, plan, execute, roundtrips with the client, context switches, time for your client library to escape the data and encode it and for postgres to decode it, etc. In a word: OVERHEAD.

I know there is some overhead, but that much when running it batched...?

> By the way which language and client library are you using?
>
> FYI 14s / 100k = 140 microseconds per individual SQL query. That ain't slow at all.

Unfortunately it's not fast enough; it needs to be done in no more than 1-2 seconds (and in production it will be maybe 20-50 columns of data, perhaps divided over 5-10 tables). Additionally it needs to scale to perhaps three times as many columns and perhaps 2-3 times as many rows in some situations, within 1 second. Further on, it needs to allow for about 20-50 clients reading much of that data before the next batch of data arrives.

I know the computer is going to be a much faster one than the one I am testing with, but I need to make sure the solution scales well.

regards

thomas

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
[PERFORM] importance of fast disks with pg
Hi

During some tests I did I noticed that it does not necessarily seem to be true that one needs the fastest disks to have a pg system that is fast. It seems to me that it's more important to:

- choose the correct methods to use for the operation
- tune the pg memory settings
- tune/disable pg xlog/wal etc

It also seems to me that fast disks are more important for db systems of the OLTP type of application, with real concurrency of both readers and writers across many, possibly larger, tables etc.

Are the above statements close to having any truth in them?

regards

thomas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
       http://www.postgresql.org/about/donate
Re: [PERFORM] importance of fast disks with pg
Thomas Finneid wrote:
> Hi
>
> During some tests I did I noticed that it does not necessarily seem to be true that one needs the fastest disks to have a pg system that is fast. It seems to me that it's more important to:
>
> - choose the correct methods to use for the operation
> - tune the pg memory settings
> - tune/disable pg xlog/wal etc
>
> It also seems to me that fast disks are more important for db systems of the OLTP type of application, with real concurrency of both readers and writers across many, possibly larger, tables etc.
>
> Are the above statements close to having any truth in them?
>
> regards
>
> thomas

I'd say that "it depends". We run an OLAP workload on 350+ gigs of database on a system with 64GB of RAM. I can tell you for certain that fetching non-cached data is very sensitive to disk throughput!

Different types of workloads will find different bottlenecks in the system.

-Dan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
[PERFORM] ionice to make vacuum friendlier?
Seems Linux has IO scheduling through a program called ionice. Has anyone here experimented with using it rather than vacuum sleep settings? http://linux.die.net/man/1/ionice This program sets the io scheduling class and priority for a program. As of this writing, Linux supports 3 scheduling classes: Idle. A program running with idle io priority will only get disk time when no other program has asked for disk io for a defined grace period. The impact of idle io processes on normal system activity should be zero.[...] Best effort. This is the default scheduling class for any process that hasn't asked for a specific io priority. Programs inherit the CPU nice setting for io priorities. [...] http://friedcpu.wordpress.com/2007/07/17/why-arent-you-using-ionice-yet/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org