Re: [PERFORM] index file bloating still in 7.4 ?
On Mon, Oct 20, 2003 at 05:42:52PM -0400, Tom Lane wrote:
> hold of these dead rows instead of recycling them.  I suspect you have
> a client process somewhere that is holding an open transaction for a
> long time ... probably not doing anything, just sitting there with an
> unclosed BEGIN ...

Which could be because you're doing something nasty with one of the
"autocommit=off" clients. Most of the client libraries implement this by
doing "commit;begin;" at every commit. This means you have way more
idle-in-transaction connections than you think. Look in pg_stat_activity,
assuming you've turned on query echoing.

A

--
Andrew Sullivan
Afilias Canada, Toronto, Ontario
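A minimal sketch of that pg_stat_activity check (this assumes the
stats_command_string parameter is turned on in postgresql.conf, which is
what Andrew means by "query echoing" - without it the current_query
column stays empty):

    -- Idle transactions show up with this marker in 7.4
    SELECT procpid, usename, query_start, current_query
    FROM pg_stat_activity
    WHERE current_query LIKE '<IDLE> in transaction%';

Any backend that sits in that state for a long time is what keeps VACUUM
from recycling dead rows.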
Re: [PERFORM] Performance weirdness with/without vacuum analyze
Josh Berkus writes:
> Harry,

Many thanks for your response,

> > It has been suggested to me that I resubmit this question to this list,
> > rather than the GENERAL list it was originally sent to.
> >
> > I asked earlier about ways of doing an UPDATE involving a left outer
> > join and got some very useful feedback.
>
> The query you posted will always be somewhat slow due to the forced join
> order, which is unavoidable with a left outer join.

Yes - I rather suspected that! It is a shame it takes two joins to do
the work.

> However, regarding your peculiar behaviour, please post:
>
> 1) Your random_page_cost and effective_cache_size settings

#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost

i.e. - still set to their defaults.

> 2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN

First the case with no vacuum analyze:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
   Merge Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)
   ->  Sort  (cost=99.32..101.82 rows=1000 width=95) (actual time=18578.71..21155.65 rows=335671 loops=1)
         Sort Key: un.cdr_id
         ->  Hash Join  (cost=6.99..49.49 rows=1000 width=95) (actual time=4.70..10011.35 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..20.00 rows=1000 width=49) (actual time=0.02..4265.63 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.57..4.57 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.12..2.77 rows=239 loops=1)
 Total runtime: 80408.42 msec
(12 rows)

And now the case *with* the vacuum analyze:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
   Hash Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)
   ->  Hash  (cost=10398.73..10398.73 rows=335671 width=76) (actual time=12371.13..12371.13 rows=0 loops=1)
         ->  Hash Join  (cost=6.99..10398.73 rows=335671 width=76) (actual time=4.91..9412.55 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..8496.71 rows=335671 width=30) (actual time=0.09..3813.54 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.71..4.71 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.22..2.90 rows=239 loops=1)
 Total runtime: 432543.73 msec
(11 rows)

Please note that since I first posted I have been slightly adjusting the
schema of the tables, but the disparity remains.

Many thanks for your assistance.

Regards,
    Harry.
Re: [PERFORM] Performance weirdness with/without vacuum analyze
Harry Broomhall wrote:
> #effective_cache_size = 1000    # typically 8KB each
> #random_page_cost = 4           # units are one sequential page fetch cost

You must tune the first one at least. Try
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune
these parameters.

>> 2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN
>
> First the case with no vacuum analyze:
>
>  Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
[SNIP]
>  Total runtime: 80408.42 msec
>  (12 rows)

You are lucky to get a better plan here, because the planner is way off
w.r.t. the estimated number of rows.

> And now the case *with* the vacuum analyze:
>
>  Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
[SNIP]
>  Total runtime: 432543.73 msec
>  (11 rows)

What happens if you turn off hash joins? Also bump sort memory to
something good - around 16MB - and see what difference it makes to
performance.

Shridhar
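A quick way to try both of Shridhar's suggestions in a single psql
session, without editing postgresql.conf (the SELECT here is a
placeholder for Harry's actual query; sort_mem is given in KB):

    SET enable_hashjoin = off;   -- steer the planner away from hash joins
    SET sort_mem = 16384;        -- roughly 16MB per sort, this session only
    EXPLAIN ANALYZE SELECT ...;  -- re-run the problem query here

    RESET enable_hashjoin;       -- restore the defaults afterwards
    RESET sort_mem;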
Re: [PERFORM] index file bloating still in 7.4 ?
Andrew Sullivan wrote:
> On Mon, Oct 20, 2003 at 05:42:52PM -0400, Tom Lane wrote:
>> hold of these dead rows instead of recycling them.  I suspect you have
>> a client process somewhere that is holding an open transaction for a
>> long time ... probably not doing anything, just sitting there with an
>> unclosed BEGIN ...
>
> Which could be because you're doing something nasty with one of the
> "autocommit=off" clients. Most of the client libraries implement this
> by doing "commit;begin;" at every commit. This means you have way more
> idle-in-transaction connections than you think. Look in
> pg_stat_activity, assuming you've turned on query echoing.

Or is it enough to do a

    ps -eafwww | grep post

to see the state of the connections?

Regards
Gaetano Mendola
Re: [PERFORM] Performance weirdness with/without vacuum analyze
Shridhar Daithankar writes:
> Harry Broomhall wrote:
> > #effective_cache_size = 1000    # typically 8KB each
> > #random_page_cost = 4           # units are one sequential page fetch cost
>
> You must tune the first one at least. Try
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these
> parameters.

Wow. Many thanks for the pointer. I'm going to be spending some time
trying to get my head around all of that!

[SNIP]

> > Total runtime: 80408.42 msec
> > (12 rows)
>
> You are lucky to get a better plan here because planner is way off w.r.t
> estimated number of rows.

Yes! I thought that. Which was why I was so surprised at the
difference.

> > And now the case *with* the vacuum analyze:

[SNIP]

> What happens if you turn off hash joins? Also bump sort memory to something
> good.. around 16MB and see what difference does it make to performance..

Lots of things to try there. It will probably take me some time.

Regards,
    Harry.
Re: [PERFORM] PostgreSQL data on a NAS device ?
The machine is going to be used for a pretty large database (well over
100 tables, with some of them containing over a million records from the
start; the number of tables and records will grow (much?) larger in the
future). This database is going to be used by a pretty large number of
employees. The number of concurrent users will vary between 1 and 100 or
so, depending on the time of day etc. This will be a database containing
client and supplier data as well as product descriptions and
prices/ingredients/labels/brands etc. Database use will include lots of
SELECTs but also lots of INSERTs/UPDATEs, i.e. the database will be
pretty active during business hours...

I think you (Scott and Will) are right when you say that NAS devices are
not ideal for this kind of thing. I have been thinking about the
hardware configuration for this machine for some time now (and had a lot
of hints through this list already) and decided to go for a SCSI RAID
config after all. The extra costs will be worth it :)

The machine I have in mind now is like this:

Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
This machine will contain a PERC4/Di RAID controller with 128MB of
battery-backed cache memory. The O/S and logfiles will be placed on a
RAID-1 setup of two 36GB SCSI-U320 drives (15,000rpm). Database data will
be placed on a Dell PowerVault 220S rack-module containing six 36GB
SCSI-U320 drives (15,000rpm) in a RAID-10 setup. This PowerVault will be
connected to the DB server via a SCSI cable...

This machine will be a bit more expensive than I thought at first (it's
going to be about EUR 14,000), but that's including 3 years of on-site
support from Dell (24x7, 4-hour response) and peripherals like UPS etc...

Do you think this machine will be OK for this task?

Thanks for your help so far :)

Kind regards,
Alexander Priem.
Re: [PERFORM] PostgreSQL data on a NAS device ?
Alexander Priem wrote:
> Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
> This machine will contain a PERC4/Di RAID controller with 128MB of
> battery-backed cache memory. The O/S and logfiles will be placed on a
> RAID-1 setup of two 36GB SCSI-U320 drives (15,000rpm). Database data will
> be placed on a Dell PowerVault 220S rack-module containing six 36GB
> SCSI-U320 drives (15,000rpm) in a RAID-10 setup. This PowerVault will be
> connected to the DB server via a SCSI cable...
>
> This machine will be a bit more expensive than I thought at first (it's
> going to be about EUR 14,000), but that's including 3 years of on-site
> support from Dell (24x7, 4-hour response) and peripherals like UPS etc...

Check Opteron as well. I don't know much about European resellers. IBM
sells the eServer 325, which has Opterons. Apparently they scale much
better at higher load. Of course pricing, availability and support are
most important.

http://theregister.co.uk/content/61/33378.html
http://www.pc.ibm.com/us/eserver/opteron/325/

Any concrete benchmarks for PostgreSQL w.r.t. Xeons and Opterons? A
collection would be nice to have... :-)

Shridhar
Re: [PERFORM] PostgreSQL data on a NAS device ?
I have considered Opteron, yes. But I think there are too many
uncertainties, like:

* It's a new CPU that has not proven itself yet.
* I don't think I can buy directly from IBM (according to their site),
  so how about support (24x7)? This will be very important to our client.
* I need to install and configure a 64-bit Linux flavour which I don't
  know (yet).

Any suggestions about the usability of the system I described before?
Here is the description again:

Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
This machine will contain a PERC4/Di RAID controller with 128MB of
battery-backed cache memory. The O/S and logfiles will be placed on a
RAID-1 setup of two 36GB SCSI-U320 drives (15,000rpm). Database data will
be placed on a Dell PowerVault 220S rack-module containing six 36GB
SCSI-U320 drives (15,000rpm) in a RAID-10 setup. This PowerVault will be
connected to the DB server via a SCSI cable...

I have never worked with a Xeon CPU before. Does anyone know how it
performs running PostgreSQL 7.3.4 / 7.4 on RedHat 9? Is it faster than a
Pentium 4? I believe the main difference is cache memory, right? Aside
from cache mem, it's basically a Pentium 4, or am I wrong?

Kind regards,
Alexander.
Re: [PERFORM] PostgreSQL data on a NAS device ?
Alexander Priem wrote:
> I have considered Opteron, yes. But I think there are too many
> uncertainties, like:
>
> * It's a new CPU that has not proven itself yet.
> * I don't think I can buy directly from IBM (according to their site),
>   so how about support (24x7)? This will be very important to our client.
> * I need to install and configure a 64-bit Linux flavour which I don't
>   know (yet).

See http://www.monarchcomputer.com/ - they custom build Opteron systems
and preload them with Linux. You don't pay the Microsoft tax.

--
Until later, Geoffrey
Re: [PERFORM] Performance weirdness with/without vacuum analyze
Shridhar Daithankar writes:

First - many thanks for your suggestions and pointers to further info.
I have been trying some of them with some interesting results!

> Harry Broomhall wrote:
> > #effective_cache_size = 1000    # typically 8KB each
> > #random_page_cost = 4           # units are one sequential page fetch cost
>
> You must tune the first one at least. Try
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these
> parameters.

Changing effective_cache_size seemed to have very little effect. I took
it in steps up to 300MB (the machine has 640MB memory), and the
differences in speed were less than 10%.

[SNIP]

> What happens if you turn off hash joins?

This makes the non-vacuum version about 40% slower, and the vacuum
version go to the same speed (i.e. about 4X faster than it had been!).

> Also bump sort memory to something
> good.. around 16MB and see what difference does it make to performance..

This was interesting. Taking it to 10MB made a slight improvement. Up
to 20MB and the vacuum case improved by 5X speed, but the non-vacuum
version slowed down. Putting it up to 40MB slowed both down again.

I will need to test with some of the other scripts and functions I have
written, but it looks as if selective use of more sort memory will be
useful.

Regards,
    Harry.
Re: [PERFORM] SRFs ... no performance penalty?
On Mon, 2003-10-20 at 20:55, Josh Berkus wrote:
> Folks,
>
> I'm working on the demo session for our upcoming presentation at PHPCon.
>
> As a side issue, we ended up comparing 3 versions of the same search
> screen:
>
> 1) All in PHP with views;
> 2) Using a function to build a query and count results but executing that
>    query directly and sorting, paging in PHP;
> 3) Using a Set Returning function to handle row-returning, sorting, and
>    paging.
>
> All three methods were executing a series of moderately complex queries
> against a medium-sized data set (only about 20,000 rows but it's on a
> laptop). The postgresql.conf was tuned like a webserver; e.g. low
> sort_mem, high max_connections.
>
> So far, on the average of several searches, we have:
>
> 1) 0.19687 seconds
> 2) 0.20667 seconds
> 3) 0.20594 seconds

Is this measuring time in the back-end or total time of script execution?

> In our tests, using any kind of PL/pgSQL function seems to carry a 0.01
> second penalty over using PHP to build the search query. I'm not sure if
> this is comparative time for string-parsing or something else; the 0.01
> seems to be consistent regardless of scale.
>
> The difference between using a PL/pgSQL function as a query-builder only
> (the 7.2.x method) and using SRFs was small enough not to be significant.
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
[PERFORM] Tuning for mid-size server
Hi,

Pretty soon, a PowerEdge 6650 with 4 x 2GHz Xeons and 8GB of memory, with
internal drives on RAID5, will be delivered. Postgres will be from RH8.0.

I am planning for these values for the postgres configuration - to begin
with:

Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144

Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that -
167772

Effective_cache_size = 262144 (same as shared_buffers - 25%)

In the /etc/sysctl file:
=================================
kernel.shmall = 536870912 (512MB)   SHMALL - total amount of shared
                                    memory available (bytes or pages)
kernel.shmmax = 536870912 (512MB)   SHMMAX - maximum size of a shared
                                    memory segment (bytes)

In a generic sense, these are recommended values I found in some
documents. The database will be small in size and will gradually grow
over time from a few thousand to a few million records, or more. The
activity will be mostly SELECT statements from a few tables, with joins,
ORDER BY and GROUP BY clauses. The web application is based on
Apache/Resin and HotSpot JVM 1.4.0.

Are the above settings OK to begin with? Are there any other parameters
that I should configure now, or monitor later on? In other words, am I
missing anything here to take full advantage of 4 CPUs and 8GB of RAM?

Appreciate any help.

Thanks,
Anjan
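One detail worth double-checking in those kernel settings: on Linux,
SHMMAX is measured in bytes but SHMALL is normally measured in pages
(4KB on IA32), so giving both parameters the same number does not give
both the same size. A sketch of /etc/sysctl.conf under that assumption
(example values, not tuned figures):

    # Shared memory limits for PostgreSQL
    kernel.shmmax = 536870912    # largest single segment: 512MB, in bytes
    kernel.shmall = 131072       # total shared memory: 512MB, in 4KB pages

    # apply without rebooting: sysctl -p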
Re: [PERFORM] PostgreSQL data on a NAS device ?
On Tue, 21 Oct 2003, Alexander Priem wrote:

> The machine is going to be used for a pretty large database (well over
> 100 tables with some of them containing over a million records from the
> start, number of tables and records will grow (much?) larger in the
> future). ...

[SNIP]

> The machine I have in mind now is like this:
>
> Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of
> RAM. This machine will contain a PERC4/Di RAID controller with 128MB of
> battery-backed cache memory. The O/S and logfiles will be placed on a
> RAID-1 setup of two 36GB SCSI-U320 drives (15,000rpm). Database data
> will be placed on a Dell PowerVault 220S rack-module containing six
> 36GB SCSI-U320 drives (15,000rpm) in a RAID-10 setup. This PowerVault
> will be connected to the DB server via a SCSI cable...

Funny, we're looking at the same basic type of system here, but with a
Perc3/CI controller. We have a local supplier who gives us machines with
a 3 year warranty and looks to be $1,000 to $2,000 lower than the Dell.
We're just going to run two 73 Gig drives in a RAID1 to start with, with
battery backed RAM.

So that brings up my question: which is better, the Perc4 or Perc3
controllers, and what's the difference between them? I find Dell's
tendency to hide other people's hardware behind their own model numbers
mildly bothersome, as it makes it hard to comparison shop.
Re: [PERFORM] Tuning for mid-size server
On Tuesday 21 October 2003 15:28, Anjan Dave wrote:
> Hi,
>
> Pretty soon, a PowerEdge 6650 with 4 x 2GHz Xeons and 8GB of memory,
> with internal drives on RAID5, will be delivered. Postgres will be from
> RH8.0.

You'll want to upgrade PG to v7.3.4.

> I am planning for these values for the postgres configuration - to
> begin with:
>
> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
>
> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that -
> 167772
>
> Effective_cache_size = 262144 (same as shared_buffers - 25%)

My instincts would be to lower the first two substantially, and increase
the effective cache once you know load levels. I'd probably start with
something like the values below and work up (sketched out in conf form
after this message):

shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the caching)
sort_mem = 4,000 - 8,000 (don't forget this is for each sort)

You'll find the annotated postgresql.conf and performance tuning
articles useful:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> In a generic sense, these are recommended values I found in some
> documents. The database will be small in size and will gradually grow
> over time from a few thousand to a few million records, or more. The
> activity will be mostly SELECT statements from a few tables, with
> joins, ORDER BY and GROUP BY clauses. The web application is based on
> Apache/Resin and HotSpot JVM 1.4.0.

You'll need to figure out how many concurrent users you'll have and how
much memory will be required by apache/java. If your database grows
radically, you'll probably want to re-tune as it grows.

--
Richard Huxton
Archonet Ltd
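Richard's starting points, written out as postgresql.conf lines (7.3
syntax; shared_buffers is counted in 8KB pages and sort_mem in KB, and
these are deliberately conservative figures to work up from, not tuned
values):

    shared_buffers = 10000   # ~80MB; leave the bulk of RAM to the OS cache
    sort_mem = 4096          # 4MB - allocated per sort, not per connection
    # leave effective_cache_size alone until real load levels are known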
Re: [PERFORM] Tuning for mid-size server
Anjan,

> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with
> internal drives on RAID5 will be delivered. Postgres will be from RH8.0.

How many drives? RAID5 sucks for heavy read-write databases, unless you
have 5+ drives. Or a large battery-backed cache.

Also, last I checked, you can't address 8GB of RAM without a 64-bit
processor. Since when are the Xeons 64-bit?

> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144

That's too high. Cut it in half at least. Probably down to 5% of
available RAM.

> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that -
> 167772

Fine if you're running a few-user-large-operation database. If this is a
webserver, you want a much, much lower value.

> Effective_cache_size = 262144 (same as shared_buffers - 25%)

Much too low. Where did you get these calculations, anyway?

> In a generic sense, these are recommended values I found in some
> documents.

Where? We need to contact the author of the "documents" and tell them to
correct things.

> joins, orderby, groupby clauses. The web application is based on
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to estimate the memory consumed by Java & Apache to have
realistic figures to work with.

> Are the above settings ok to begin with? Are there any other parameters
> that I should configure now, or monitor lateron?

No, they're not. See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
to tune these parameters.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] SRFs ... no performance penalty?
Robert,

> > 1) 0.19687 seconds
> > 2) 0.20667 seconds
> > 3) 0.20594 seconds
>
> Is this measuring time in the back-end or total time of script
> execution?

Total time of execution, e.g. from clicking the "enter" button to
displaying the list of matches. Any other comparison would be
misleading.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] index file bloating still in 7.4 ?
Hi Tom,

1.) OK. We have narrowed it down. We did a few (like 5 to 8 times)
vacuum analyze (no full), and the pg_statistic relfilenode grew. There
was no database operation when we did this, and no other client
connections except the one that does the vacuum.

If we do a plain simple "vacuum" (again no full), we see the
pg_statistic_relid_att_index relfilenode grow instead of pg_statistic.

So, over time, these files will grow if we do vacuum. Is this expected?
The question now is, if we are not doing anything to the database, why
would they grow after a few vacuums?

2.) The other problem we have is with

DETAIL: 101802 dead row versions cannot be removed yet.
DETAIL: 110900 dead row versions cannot be removed yet.
DETAIL: 753064 dead row versions cannot be removed yet.
DETAIL: 765328 dead row versions cannot be removed yet.

We will collect more data and see what we can get from the process.
Offhand, the process is connecting to the database through ODBC and we
don't use any BEGIN in our updates, just doing plain UPDATE repeatedly
with different keys randomly. The database is defaulted to
autocommit=true in postgresql.conf.

Thanks.

Gan

At 5:25 pm -0400 2003/10/20, Tom Lane wrote:
> Seum-Lim Gan <[EMAIL PROTECTED]> writes:
> > We tried one more thing: with the table not being updated at all and
> > we did vacuum. Each time a vacuum is done, the index file becomes
> > bigger.
>
> It is not possible for plain vacuum to make the index bigger.
>
> VACUUM FULL possibly could make the index bigger, since it has to
> transiently create duplicate index entries for every row it moves.
>
> If you want any really useful comments on your situation, you're going
> to have to offer considerably more detail than you have done so far ---
> preferably, a test case that lets someone else reproduce your results.
> So far, all we can do is guess on the basis of very incomplete
> information. When you aren't even bothering to mention whether a vacuum
> is FULL or not, I have to wonder whether I have any realistic picture
> of what's going on.
>
>             regards, tom lane

--
Seum-Lim GAN
Lucent Technologies
2000 N. Naperville Road, 6B-403F, Naperville, IL 60566, USA.
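A direct way to watch the growth Gan describes, using the system
catalogs themselves (note the catalog is pg_statistic, without a
trailing "s"; relpages is only refreshed by VACUUM/ANALYZE, so run this
between rounds):

    SELECT relname, relfilenode, relpages
    FROM pg_class
    WHERE relname IN ('pg_statistic', 'pg_statistic_relid_att_index');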
Re: [PERFORM] index file bloating still in 7.4 ?
Seum-Lim Gan <[EMAIL PROTECTED]> writes:
> We did a few (like 5 to 8 times) vacuum analyze (no full), the
> pg_statistic relfilenode grew.

Well, sure. ANALYZE puts new rows into pg_statistic, and obsoletes old
ones. You need to vacuum pg_statistic every so often (not to mention the
other system catalogs).

> If we do plain simple "vacuum" (again no full), we see
> pg_statistic_relid_att_index relfilenode grow instead of
> pg_statistic.

Don't think I believe that. Plain vacuum won't touch pg_statistic at all
(unless it's the target table of course). I'd expect ANALYZE to make
both the stats table and its index grow, though.

            regards, tom lane
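The maintenance Tom describes is ordinary vacuuming aimed at the system
catalogs; a sketch of the periodic housekeeping:

    VACUUM ANALYZE;          -- whole database, system catalogs included
    VACUUM pg_statistic;     -- or target the stats catalog directly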
Re: [PERFORM] Performance weirdness with/without vacuum analyze
Harry Broomhall <[EMAIL PROTECTED]> writes:
>  ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs
>      (cost=0.00..52.00 rows=1000 width=164)
>      (actual time=0.42..11479.51 rows=335671 loops=1)

>  ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126)
>      (actual time=0.15..9504.24 rows=335671 loops=1)

Hm. The planner's default cost parameters assume that a full-table index
scan will be much slower than a full-table seq scan. That's evidently
not the case in your test situation. You could probably bring the
estimates more in line with reality (and thereby improve the choice of
plan) by reducing random_page_cost towards 1 and increasing
effective_cache_size to represent some realistic fraction of your
available RAM (though I concur with your observation that the latter
doesn't change the estimates all that much).

Beware however that test-case reality and production reality are not the
same thing. You are evidently testing with tables that fit in RAM. If
your production tables will not, you'd better be wary of being overly
aggressive about reducing random_page_cost. I believe the default value
(4.0) is fairly representative for situations where many actual disk
fetches are needed, ie, the tables are much larger than RAM. 1.0 would
be appropriate if all your tables are always fully cached in RAM (since
RAM has by definition no random-access penalty). In intermediate cases
you need to select intermediate values.

            regards, tom lane
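A session-level version of Tom's experiment (the numbers here are
illustrative only; 38400 pages of 8KB works out to the ~300MB Harry
mentioned trying earlier):

    SET random_page_cost = 1.5;        -- closer to 1.0 when tables fit in RAM
    SET effective_cache_size = 38400;  -- ~300MB, expressed in 8KB pages
    EXPLAIN ANALYZE SELECT ...;        -- re-test and compare the chosen plan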
Re: [PERFORM] Tuning for mid-size server
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
>
> > Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory,
> > with internal drives on RAID5 will be delivered. Postgres will be
> > from RH8.0.
>
> How many drives? RAID5 sucks for heavy read-write databases, unless
> you have 5+ drives. Or a large battery-backed cache.

You don't need a large cache, so much as a cache. The size isn't usually
an issue now that 64 to 256 meg caches are the nominal cache sizes. Back
when it was a choice of 4 or 8 megs it made a much bigger difference
than 64 versus 256 meg makes today. Also, if it's a read-only
environment, RAID5 with n drives equals the performance of RAID0 with
n-1 drives.

> Also, last I checked, you can't address 8GB of RAM without a 64-bit
> processor. Since when are the Xeons 64-bit?

Josh, you gotta get out more. IA32 has supported >4 gig ram for a long
time now, and so has the linux kernel. It uses a paging method to do it.
Individual processes are still limited to ~3 gig on Linux on 32 bit
hardware though, so the extra mem will almost certainly spend its time
as kernel cache.
Re: [PERFORM] Tuning for mid-size server
From what I know, there is a cache-row-set functionality that doesn't
exist with the newer postgres...

Concurrent users will start from 1 to a high of 5000 or more, and could
ramp up rapidly. So far, with increased users, we have gone up to
starting the JVM (resin startup) with 1024megs min and max (recommended
by Sun) - on the app side.

Thanks,
Anjan

-----Original Message-----
From: Richard Huxton [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 21, 2003 11:57 AM
To: Anjan Dave; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server

[SNIP]
Re: [PERFORM] Tuning for mid-size server
Josh,

The 6650 can have up to 32GB of RAM. There are 5 drives. In future, they
will be replaced by a fiber array - hopefully.

I read an article that suggests you 'start' with 25% of memory for
shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
link:
http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
Maybe I misinterpreted it.

I read the document on
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html and the
suggested values are much lower than what I have mentioned here. It
won't hurt to start with lower numbers and increase later on if needed.

Thanks,
Anjan

-----Original Message-----
From: Josh Berkus [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 21, 2003 12:21 PM
To: Anjan Dave; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server

[SNIP]
Re: [PERFORM] Tuning for mid-size server
Scott,

> Also, if it's a read-only environment, RAID5 with n drives equals the
> performance of RAID0 with n-1 drives.

True.

> Josh, you gotta get out more. IA32 has supported >4 gig ram for a long
> time now, and so has the linux kernel. It uses a paging method to do
> it. Individual processes are still limited to ~3 gig on Linux on 32
> bit hardware though, so the extra mem will almost certainly spend its
> time as kernel cache.

Not that you'd want a single process to grow that large anyway.

So what is the ceiling on 32-bit processors for RAM? Most of the 64-bit
vendors are pushing Athlon64 and G5 as "breaking the 4GB barrier", and
even I can do the math on 2^32. All these 64-bit vendors, then, are
talking about the limit on RAM *per application* and not per machine?

This has all been academic to me to date, as the only very-high-RAM
systems I've worked with were Sparc or micros.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Low Insert/Update Performance
Rhaoni,

> The delphi program does just one commit for all queries. I was
> wondering if there is some configuration parameter to be changed to
> improve the performance?

To help you, we'll need to trap a query and run an EXPLAIN ANALYZE on
it.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Tuning for mid-size server
Anjan,

> I read an article that suggests you 'start' with 25% of memory for
> shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
> link:
> http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
> Maybe I misinterpreted it.

No, I can see how you arrived at that conclusion, and Bruce is an
authority. I'll contact him.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Tuning for mid-size server
Anjan,

> From what I know, there is a cache-row-set functionality that doesn't
> exist with the newer postgres...

What? PostgreSQL has always used the kernel cache for queries.

> Concurrent users will start from 1 to a high of 5000 or more, and
> could ramp up rapidly. So far, with increased users, we have gone up
> to starting the JVM (resin startup) with 1024megs min and max
> (recommended by Sun) - on the app side.

Well, just keep in mind when tuning that your calculations should be
based on *available* RAM, meaning RAM not used by Apache or the JVM.

With that many concurrent requests, you'll want to be *very*
conservative with sort_mem; I might stick to the default of 1024 if I
were you, or even lower it to 512k.

--
Josh Berkus
Aglio Database Solutions
San Francisco
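Some rough worst-case arithmetic behind that advice (assuming the
5000-connection high end and one active sort apiece; real concurrency
will be lower, but a single query can also run more than one sort):

    sort_mem = 1024 KB  ->  5000 x 1 MB    = ~5 GB of potential sort memory
    sort_mem =  512 KB  ->  5000 x 0.5 MB  = ~2.5 GB

Either figure dwarfs the RAM left after the OS cache takes its share,
which is why the conservative setting matters.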
Re: [PERFORM] Tuning for mid-size server
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Scott,
>
> > Also, if it's a read-only environment, RAID5 with n drives equals
> > the performance of RAID0 with n-1 drives.
>
> True.
>
> > Josh, you gotta get out more. IA32 has supported >4 gig ram for a
> > long time now, and so has the linux kernel. It uses a paging method
> > to do it. Individual processes are still limited to ~3 gig on Linux
> > on 32 bit hardware though, so the extra mem will almost certainly
> > spend its time as kernel cache.
>
> Not that you'd want a single process to grow that large anyway.

True :-) Especially a pgsql backend.

> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athlon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32. All these 64-bit
> vendors, then, are talking about the limit on RAM *per application*
> and not per machine?

I think it's 64 gigs in the current implementation, but that could just
be a chipset thing, i.e. the theoretical limit is probably 2^63 or 2^64,
but the realistic limitation is that the current mobo chipsets are gonna
have a much lower limit, and I seem to recall that being 64 gig last I
looked.
Re: [PERFORM] Tuning for mid-size server
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
>
> > From what I know, there is a cache-row-set functionality that
> > doesn't exist with the newer postgres...
>
> What? PostgreSQL has always used the kernel cache for queries.
>
[SNIP]
>
> With that many concurrent requests, you'll want to be *very*
> conservative with sort_mem; I might stick to the default of 1024 if I
> were you, or even lower it to 512k.

Exactly. Remember, Anjan, that if you have a single sort that can't fit
in RAM, it will use the hard drive for temp space, effectively
"swapping" on its own. If the concurrent sorts run the server out of
memory, the server will start swapping processes, quite possibly the
sorts, in a sort of hideous round-robin death spiral that will bring
your machine to its knees at the worst possible time, midday under load.

sort_mem is one of the small "foot guns" in the postgresql.conf file
that people tend to pick up and go "huh, what's this do?" right before
cranking it up.
Re: [PERFORM] Tuning for mid-size server
On Tue, Oct 21, 2003 at 10:12:15AM -0700, Josh Berkus wrote:
>
> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athlon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32. All these 64-bit
> vendors, then, are talking about the limit on RAM *per application*
> and not per machine?

Or per same-time access. Remember that, back in the old days on the
pre-386s, accessing the extended or expanded memory (anyone remember
which was which?) involved some fairly serious work, and not everything
was seamless. I expect something similar is at work here. Not that I've
had a reason to play with 4G ix86 machines, anyway.

A

--
Andrew Sullivan
Afilias Canada, Toronto, Ontario
Re: [PERFORM] Tuning for mid-size server
On Tue, Oct 21, 2003 at 10:15:57AM -0700, Josh Berkus wrote:
> Anjan,
>
> > I read an article that suggests you 'start' with 25% of memory for
> > shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
> > link:
> > http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
> > Maybe I misinterpreted it.
>
> No, I can see how you arrived at that conclusion, and Bruce is an
> authority. I'll contact him.

I think the "25%" rule of thumb is slightly stale: above some threshold,
it just falls apart, and lots of people now have machines well within
that threshold. Heck, I'll bet Bruce's 2-way machine is within that
threshold.

A

--
Andrew Sullivan
Afilias Canada, Toronto, Ontario
Re: [PERFORM] Tuning for mid-size server
On Tue, 21 Oct 2003 10:12:15 -0700
Josh Berkus <[EMAIL PROTECTED]> wrote:

> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athlon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32. All these 64-bit
> vendors, then, are talking about the limit on ram *per application*
> and not per machine?

You can have > 4GB per app, but you also get a big performance boost as
you don't have to deal with all the silly paging - think of it from when
we switched from real mode to protected mode.

If you check out the linux-kernel archives you'll see that one of the
things often recommended when things go odd is to turn off HIGHMEM
support.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] Tuning for mid-size server
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> I think the "25%" rule of thumb is slightly stale: above some
> threshold, it just falls apart, and lots of people now have machines
> well within that threshold. Heck, I'll bet Bruce's 2-way machine is
> within that threshold.

IIRC, we've not seen much evidence that increasing shared_buffers above
about 10000 delivers any performance boost. That's 80Mb, so the "25%"
rule doesn't get seriously out of whack until you get to a gig or so of
RAM. Which was definitely not common at the time the rule was put
forward, but is now. Probably we should modify the rule-of-thumb to
something like "25%, but not more than 10000 buffers".

            regards, tom lane
Re: [PERFORM] Tuning for mid-size server
Andrew,

> I think the "25%" rule of thumb is slightly stale: above some
> threshold, it just falls apart, and lots of people now have machines
> well within that threshold. Heck, I'll bet Bruce's 2-way machine is
> within that threshold.

Sure. But we had a few people on this list do tests (including me) and
the anecdotal evidence was lower than 25%, substantially. The falloff
is subtle until you hit 50% of RAM, roughly like:

% of RAM   query throughput
  1        --------
  5        ----------
 10        ----------
 15        ---------
 20        --------
 25        -------
 30        -----
 35        ---
 40        --

... so it's often not immediately apparent when you've set stuff a
little too high. However, in the folks that tested, the ideal was never
anywhere near 25%, usually more in the realm of 5-10%. I've been using
6% as my starting figure for the last year for a variety of servers with
good results.

Of course, if you have anecdotal evidence to the contrary, then the only
way to work this would be to have OSDL help us sort it out.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
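For concreteness, Josh's 6% starting figure works out like this on a
hypothetical 1GB machine (a worked example, not a recommendation for any
particular box):

    1024 MB x 0.06           = ~61 MB for shared buffers
    61 MB / 8 KB per buffer  = ~7800  ->  shared_buffers = 7800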
Re: [PERFORM] PostgreSQL data on a NAS device ?
On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:
> So that brings up my question: which is better, the Perc4 or Perc3
> controllers, and what's the difference between them? I find Dell's
> tendency to hide other people's hardware behind their own model
> numbers mildly bothersome, as it makes it hard to comparison shop.

Perc4 has an LSI 1030 chip:
http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm

Perc3, depending on the model, can be a couple of things, but I think
they are all U160 controllers and not U320.

Will
Re: [PERFORM] Tuning for mid-size server
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> On Tue, Oct 21, 2003 at 10:12:15AM -0700, Josh Berkus wrote:
>> So what is the ceiling on 32-bit processors for RAM?
> ... Remember that, back in the old days on the
> pre-386s, accessing the extended or expanded memory (anyone remember
> which was which?) involved some fairly serious work, and not
> everything was seamless. I expect something similar is at work here.

Right. A 32-bit processor can only (conveniently) allow any individual
process to access 4G worth of address space. However the total RAM in
the system can be more --- the kernel can set up the hardware address
mappings to let different user processes use different up-to-4G segments
of that RAM. And the kernel can also use excess RAM for disk buffer
cache. So there's plenty of value in more-than-4G RAM, as long as you're
not expecting any single user process to need more than 4G. This is no
problem at all for Postgres, in which individual backend processes don't
usually get very large, and we'd just as soon let most of the RAM go to
kernel disk buffers anyway.

I think that some hardware configurations have problems with using RAM
above the first 4G for disk buffers, because of disk controller hardware
that can't cope with physical DMA addresses wider than 32 bits. The
solution here is to buy a better disk controller. If you google for
"bounce buffers" you can learn more about this.

What goes around comes around I guess --- I remember playing these same
kinds of games to use more than 64K RAM in 16-bit machines, 25-odd years
ago...

            regards, tom lane
Re: [PERFORM] Tuning for mid-size server
In the last exciting episode, [EMAIL PROTECTED] (Josh Berkus) wrote:
> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athlon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32. All these 64-bit
> vendors, then, are talking about the limit on ram *per application*
> and not per machine?

I have been seeing ia-32 servers with 8GB of RAM; it looks as though
there are ways of having them support ("physically, in theory, if you
could get a suitable motherboard") as much as 64GB.

But that certainly doesn't get you past 2^32 bytes per process, and
possibly not past 2^31 bytes/process.

From the Linux kernel help:

  CONFIG_NOHIGHMEM: Linux can use up to 64 Gigabytes of physical memory
  on x86 systems. However, the address space of 32-bit x86 processors is
  only 4 Gigabytes large. That means that, if you have a large amount of
  physical memory, not all of it can be "permanently mapped" by the
  kernel. The physical memory that's not permanently mapped is called
  "high memory".

And that leaves open the question of how much shared memory you can
address. That presumably has to fit into the 4GB, and if your PostgreSQL
processes had (by some fluke) 4GB of shared memory, there wouldn't be
any "local" memory for sort memory and the like.

Add to that the consideration that there are reports of Linux "falling
over" when you get to right around 2GB/4GB. I ran a torture test a while
back that _looked_ like it was running into that; I can't verify that,
unfortunately.

I don't see there being a whole lot of use in having more than about 8GB
on an ia-32 system; what with shared memory maxing out at somewhere
between 1 and 2GB, that suggests having ~8GB in total. I'd add another
PG cluster if I had 16GB...

--
let name="aa454" and tld="freenet.carleton.ca" in name ^ "@" ^ tld;;
http://www.ntlug.org/~cbbrowne/postgresql.html
Re: [PERFORM] Tuning for mid-size server
Josh,

The app servers are separate dual-CPU boxes with 2GB RAM on each. Yes,
from all the responses I have seen, I will be reducing the numbers to
what has been suggested.

Thanks to all,
anjan

-----Original Message-----
From: Josh Berkus [mailto:[EMAIL PROTECTED]]
Sent: Tue 10/21/2003 1:22 PM
To: Anjan Dave; Richard Huxton; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server

[SNIP]
Re: [PERFORM] Tuning for mid-size server
Hopefully I am not steering this into a different direction, but is
there a way to find out how much sort memory each query is taking up, so
that we can scale that up with increasing users?

Thanks,
Anjan

-----Original Message-----
From: scott.marlowe [mailto:[EMAIL PROTECTED]]
Sent: Tue 10/21/2003 1:33 PM
To: Josh Berkus
Cc: Anjan Dave; Richard Huxton; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server

[SNIP]
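I don't believe 7.x exposes a per-query sort-memory counter, but you can
watch for the symptom that matters - sorts spilling to disk - by
checking the per-database temp directory while a load test runs (a
sketch; $PGDATA and the database OID below are placeholders for your
installation):

    # temp files appear here while an oversized sort is in progress
    ls -l $PGDATA/base/<database-oid>/pgsql_tmp/

    # watch it grow during a load test
    watch -n 1 "du -s $PGDATA/base/<database-oid>/pgsql_tmp"

If that directory stays empty under realistic load, the current sort_mem
setting is big enough.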
[PERFORM] RAID controllers etc... was: PostgreSQL data on a NAS device ?
On 21 Oct 2003, Will LaShell wrote:

> On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:
>
> > So that brings up my question: which is better, the Perc4 or Perc3
> > controllers, and what's the difference between them? I find Dell's
> > tendency to hide other people's hardware behind their own model
> > numbers mildly bothersome, as it makes it hard to comparison shop.
>
> Perc4 has an LSI 1030 chip:
> http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm
>
> Perc3, depending on the model, can be a couple of things, but I think
> they are all U160 controllers and not U320.

Thanks. I googled around and found this page:

http://www.domsch.com/linux/

which says what each model is. It looks like the "RAID" controller they
wanna charge me for is about $500 or so, so I'm guessing it's the
medium-range Elite 1600 type controller, i.e. U160, which is plenty for
the machine / drive number we'll be using.

Has anyone played around with the latest ones to get a feel for them? I
want a battery-backed controller that runs well under Linux and also BSD
that isn't gonna break the bank. I'd heard bad stories about the
performance of the Adaptec RAID controllers, but it seems the newer ones
aren't bad from what I've found googling.
Re: [PERFORM] Tuning for mid-size server
On Tue, Oct 21, 2003 at 11:51:02AM -0700, Josh Berkus wrote:

> Of course, if you have anecdotal evidence to the contrary, then the
> only way to work this would be to have OSDL help us sort it out.

Nope. I too have such anecdotal evidence that 25% is way too high. It
also seems to depend pretty heavily on what you're trying to optimise
for and what platform you have. But I'm glad to hear (again) that people
seem to think the 25% too high for most cases. I don't feel so much like
I'm tilting against windmills.

A

--
Andrew Sullivan
Afilias Canada, Toronto, Ontario
Re: [PERFORM] Tuning for mid-size server
On Tue, 21 Oct 2003, Andrew Sullivan wrote:

> On Tue, Oct 21, 2003 at 11:51:02AM -0700, Josh Berkus wrote:
>
> > Of course, if you have anecdotal evidence to the contrary, then the
> > only way to work this would be to have OSDL help us sort it out.
>
> Nope. I too have such anecdotal evidence that 25% is way too high.
> It also seems to depend pretty heavily on what you're trying to
> optimise for and what platform you have. But I'm glad to hear
> (again) that people seem to think 25% is too high for most cases.
> I don't feel so much like I'm tilting at windmills.

I think where it makes sense is when you have something like a report server where the result sets may be huge but the parallel load is low, i.e. 5 or 10 users tossing around 100 MB or more at a time. If you've got 5,000 users running queries that are indexed and won't be using that much memory each, then there's usually no advantage to going over a certain number of buffers, and that certain number may be as low as 1000 for some applications.
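To put rough numbers on that (assuming the default 8 kB block size; the RAM figure is illustrative):

  # shared_buffers is counted in 8 kB pages
  #   1000 pages x 8 kB  =    ~8 MB   (the "as low as 1000" case above)
  #   25% of 4 GB = 1 GB = 131072 pages (the rule of thumb argued against)
  shared_buffers = 1000

The rest of RAM is not wasted at the low setting; it simply ends up in the kernel's disk cache instead of PostgreSQL's own buffers.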
Re: [PERFORM] RAID controllers etc... was: PostgreSQL data on a NAS device ?
On Tue, 2003-10-21 at 13:36, scott.marlowe wrote:

[snip]

> Has anyone played around with the latest ones to get a feel for them? I
> want a battery-backed controller that runs well under Linux and also BSD
> and that isn't going to break the bank. I'd heard bad stories about the
> performance of the Adaptec RAID controllers, but it seems the newer ones
> aren't bad from what I've found googling.

We own two Elite 1650s and we love them. It would have been nice to have U320-capable controllers, but the cards are completely reliable. I recommend the LSI controllers to everyone because I've never had a problem with them.
Re: [PERFORM] Tuning for mid-size server
On Tue, Oct 21, 2003 at 03:11:17PM -0600, scott.marlowe wrote:

> I think where it makes sense is when you have something like a report
> server where the result sets may be huge but the parallel load is low,
> i.e. 5 or 10 users tossing around 100 MB or more at a time.

In our case, we were noticing that truss showed an unbelievable amount of time spent by the postmaster doing open() calls to the OS (this was on Solaris 7). So we thought, "Let's try a 2 GB buffer size." 2 GB was more than enough to hold the entire data set in question. Once the buffer started to fill, even plain SELECTs started taking a long time. The buffer algorithm is just not that clever, was my conclusion.

(Standard disclaimer: not a long, controlled test. It's just a bit of gossip.)

A

--
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                          Toronto, Ontario Canada
<[EMAIL PROTECTED]>                     M2P 2A8
+1 416 646 3304 x110
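For anyone who wants to reproduce that observation, a syscall profile like the one described can be gathered on Solaris by attaching truss to a backend; the PID below is a placeholder:

  # count system calls; -c prints a summary of counts and times
  # when you detach with Ctrl-C
  truss -c -p 12345

  # or watch the open() calls themselves as they happen
  truss -t open -p 12345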
Re: [PERFORM] Tuning for mid-size server
Scott,

> I think where it makes sense is when you have something like a report
> server where the result sets may be huge but the parallel load is low,
> i.e. 5 or 10 users tossing around 100 MB or more at a time.

I've found that that kind of workload makes the difference between using 6% and 12% ... particularly for large data transformations ... but not higher than that. And I've had ample opportunity to test on two reporting servers. For one thing, with very large reports one tends to have a lot of I/O binding, which is handled by the kernel.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
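Since the kernel is doing that caching, the planner should at least be told about it via effective_cache_size. A rough sizing sketch for Linux, with made-up numbers:

  $ free -m
               total   used   free   shared   buffers   cached
  Mem:          4096   4000     96        0       150      3000

  # effective_cache_size is in 8 kB pages:
  #   3000 MB cached ~= 3000 * 128 = 384000 pages
  effective_cache_size = 384000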
[PERFORM] analyzing postgresql performance for dbt-2
I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having some trouble figuring out what I should be looking for when I'm trying to tune the database. I have results for a decent baseline, but when I try to increase the load on the database, the performance drops. Nothing in the graphs (in the links listed later) sticks out to me, so I'm wondering if there are other database statistics I should try to collect. Any suggestions would be great, and let me know if I can answer any other questions.

Here are a pair of results where I just raise the load on the database; increasing the load increases the area of the database touched in addition to increasing the transaction rate. The overall metric increases somewhat, but the response time for most of the interactions also increases significantly:

http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
        - load of 100 warehouses
        - metric 1249.65

http://developer.osdl.org/markw/dbt2-pgsql/149/
        - load of 140 warehouses
        - metric 1323.90

Both of these runs had wal_buffers set to 8, checkpoint_segments 200, and checkpoint_timeout 1800. So far I've only tried various wal_buffers and checkpoint_segments settings in the next set of results, all at a load of 140 warehouses.

http://developer.osdl.org/markw/dbt2-pgsql/148/
        - metric 1279.26
        - wal_buffers 8
        - checkpoint_segments 100
        - checkpoint_timeout 300

http://developer.osdl.org/markw/dbt2-pgsql/149/
        - metric 1323.90
        - wal_buffers 8
        - checkpoint_segments 200
        - checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/150/
        - metric 1281.13
        - wal_buffers 8
        - checkpoint_segments 300
        - checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/151/
        - metric 1311.99
        - wal_buffers 32
        - checkpoint_segments 200
        - checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/152/
        - metric 1268.37
        - wal_buffers 64
        - checkpoint_segments 200
        - checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/154/
        - metric 1314.62
        - wal_buffers 16
        - checkpoint_segments 200
        - checkpoint_timeout 1800

Thanks!

--
Mark Wong - - [EMAIL PROTECTED]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436 (fax)
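For reference, the best of those runs corresponds to a postgresql.conf fragment like this (7.3 syntax; the unit comments are my reading of the docs):

  # WAL/checkpoint settings for the 140-warehouse runs
  wal_buffers = 8            # in 8 kB pages; runs 151/152/154 tried 16-64
  checkpoint_segments = 200  # each WAL segment is 16 MB
  checkpoint_timeout = 1800  # seconds between forced checkpoints

The narrow spread of metrics (1268-1324) across those settings suggests these knobs are mostly noise at this load, which points the search elsewhere.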
Re: [PERFORM] analyzing postgresql performance for dbt-2
[EMAIL PROTECTED] wrote:

> I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having
> some trouble figuring out what I should be looking for when I'm trying
> to tune the database.

[snip]

> Here are a pair of results where I just raise the load on the
> database; increasing the load increases the area of the database
> touched in addition to increasing the transaction rate. The overall
> metric increases somewhat, but the response time for most of the
> interactions also increases significantly:
>
> http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
>         - load of 100 warehouses
>         - metric 1249.65
>
> http://developer.osdl.org/markw/dbt2-pgsql/149/
>         - load of 140 warehouses
>         - metric 1323.90

I looked at these charts and they looked normal to me. It looked like the load increased until your computer was saturated. Is there something I am missing?

--
  Bruce Momjian                       |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                   |  (610) 359-1001
  +  If your life is a hard drive,    |  13 Roberts Road
  +  Christ can be your backup.       |  Newtown Square, Pennsylvania 19073
Re: [PERFORM] analyzing postgresql performance for dbt-2
On Tue, Oct 21, 2003 at 08:35:56PM -0400, Bruce Momjian wrote:

> [EMAIL PROTECTED] wrote:
>
> [snip]
>
> I looked at these charts and they looked normal to me. It looked like
> the load increased until your computer was saturated. Is there
> something I am missing?

I've run some I/O tests, so I'm pretty sure I haven't saturated that. And it looks like I have almost 10% of processor time left. I do agree that it appears something might be saturated; I just don't know where to look...

Thanks,
Mark
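In case it helps narrow it down, this is the sort of thing I'd watch while the load ramps (standard tools; column names vary by platform and version):

  # per-second system view: run queue (r), swapping (si/so),
  # context switches (cs) and interrupts (in)
  vmstat 1

  # per-device I/O view on Linux (sysstat package): a device near
  # 100 %util is saturated even if total throughput looks modest
  iostat -x 1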