[PERFORM] plperl vs plpgsql
Hi, I am thinking about switching to plperl as it seems to me much more flexible and makes it easier to create functions. What is the recommended PL for Postgres? Or which one is most widely used / most popular? Is there a performance difference between plpgsql and plperl? Porting to other systems is not a real issue, as all my servers have Perl installed. Thanks for any advice Alex ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] plperl vs plpgsql
Is there a performance difference between the two? Which of the PLs is most widely used? One problem I have with plpgsql is that the quoting is really a pain. Christopher Browne wrote: After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Alex) belched out: i am thinking about swiching to plperl as it seems to me much more flexible and easier to create functions. what is the recommended PL for postgres? or which one is most widely used / most popular? is there a performance difference between plpgsql and plperl ? If what you're trying to do is "munge text," pl/perl will be a whole lot more suitable than pl/pgsql because it has a rich set of text mungeing tools and string functions which pl/pgsql lacks. If you intend to do a lot of work involving reading unmunged tuples from this table and that, pl/pgsql provides a much more natural syntax, and will probably be a bit faster as the query processor may even be able to expand some of the actions, rather than needing to treat Perl code as an "opaque blob." I would definitely be inclined to use the more natural language for the given task... ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
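For the quoting pain specifically, dollar quoting helps a lot: on PostgreSQL 8.0 and later a function body can be wrapped in $$ ... $$, so the body is no longer itself a single-quoted string and quotes inside it no longer have to be doubled. A minimal sketch (the function name and text are made up for illustration):

    CREATE OR REPLACE FUNCTION greet(name text) RETURNS text AS $$
    BEGIN
        -- inside dollar quotes, ordinary 'string literals' are written normally;
        -- only the apostrophe inside this literal still needs the usual doubling
        RETURN 'Hello, ' || name || '. It''s nice to meet you.';
    END;
    $$ LANGUAGE plpgsql;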
[PERFORM] Some Performance Advice Needed
Hi, I recently ran pgbench against different servers and got some results I don't quite understand.

A) EV1: Dual Xeon, 2GHz, 1GB memory, SCSI 10Krpm, RHE3
B) Dual Pentium III 1.4GHz (Blade), SCSI disk 10Krpm, 1GB memory, Red Hat 8
C) P4 3.2GHz, IDE 7.2Krpm, 1GB memory, Fedora Core 2

All ran only Postgres 7.4.6.

postgresql.conf settings:
max_connections = 100
shared_buffers = 8192
sort_mem = 8192
vacuum_mem = 32768
max_fsm_pages = 20
max_fsm_relations = 1
wal_sync_method = fsync
wal_buffers = 64
checkpoint_segments = 10
effective_cache_size = 65536
random_page_cost = 1.4
/etc/sysctl.conf: shmall and shmmax set to 768MB

Running pgbench reported: A) 220 tps B) 240 tps C) 510 tps
Running hdparm reported: A) 920MB/s (SCSI 10k) B) 270MB/s (SCSI 10k) C) 1750MB/s (IDE 7.2k)

What I don't quite understand is why a P4 3.2GHz is twice as fast as a dual Xeon with SCSI disks, why a dual Xeon 2GHz is not faster than a dual P3 1.4GHz, and why the hdparm results also don't make much sense. Does anybody have an explanation for that? Is there something I can do to get more performance out of the SCSI disks? Thanks for any advice Alex ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
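For reference, a rough sketch of how numbers like these are commonly produced (database name, scale factor, client count, and device are placeholders). Note that hdparm -T times cached reads from RAM while -t times buffered reads from the disk itself, so the two can differ by an order of magnitude:

    # initialize a pgbench database and run a short test
    createdb bench
    pgbench -i -s 10 bench          # -s: scaling factor (placeholder value)
    pgbench -c 10 -t 1000 bench     # 10 clients, 1000 transactions each

    # -T = cached (memory) reads, -t = buffered disk reads
    hdparm -Tt /dev/sda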
[PERFORM] How can I make use of both CPUs in a dual processor machine
Hi, we just got a new dual processor machine and I wonder if there is a way to utilize both processors. Our DB server is basically fully dedicated to Postgres (it's a dual AMD with 4GB mem). I have a batch job that periodically loads about 8 million records into a table. For this I drop the indices, truncate the table, use COPY to insert the data, recreate the indices (4 indices), and vacuum the table. That is all done through a Perl batch job. While I am doing this, I noticed that only one CPU is really used. So here are my questions: Is there a way to utilize both CPUs? Is it possible to split up the import file and run 2 COPY processes? Is it possible to create 2 indices at the same time? Would I actually gain anything from that, or is the bottleneck somewhere else? (Perl is a given here for the batch job.) If anyone has some experience or ideas... any hints or help on this would be appreciated. Thanks Alex ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
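Written out as SQL, the load sequence described above looks roughly like this (table, file path, and index definitions are placeholders; COPY ... FROM reads a file that must be visible to the server process, and VACUUM cannot run inside a transaction block, so the statements are run one after another rather than wrapped in BEGIN/COMMIT):

    -- drop the indexes so the bulk load does not maintain them row by row
    DROP INDEX records_col1_idx;
    DROP INDEX records_col2_idx;

    TRUNCATE TABLE records;

    -- server-side bulk load from a flat file
    COPY records FROM '/var/tmp/records.dat';

    -- rebuild the indexes once the data is in place
    CREATE INDEX records_col1_idx ON records (col1);
    CREATE INDEX records_col2_idx ON records (col2);

    -- refresh planner statistics
    VACUUM ANALYZE records;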
Re: [PERFORM] How can I make use of both CPUs in a dual processor
Thanks John. Well, as I mentioned, I have a Dual AMD Opteron 64 2.4GHz, 15k rpm SCSI disks, and 4GB of memory. Disks are pretty fast and memory should be more than enough. Currently we don't have many concurrent connections. I run PG 8.0.1 on Fedora Core 3. When I now run the batch job, one CPU runs at 80-90%, the other at 5-10% max. John A Meinel wrote: Alex wrote: Hi, we just got a new dual processor machine and I wonder if there is a way to utilize both processors. Our DB server is basically fully dedicated to postgres. (its a dual amd with 4gb mem.) I have a batch job that periodically loads about 8 million records into a table. for this I drop the indices, truncate the table, use the copy to insert the data, recreate the indices (4 indices), vacuum the table. That is all done through a perl batch job. While I am doing this, I noticed that only one CPU is really used. So here are my questions: Is there a way to utilize both CPUs For postgres, you get a max of 1 CPU per connection, so to use both, you need 2 connections. Is it possible to split up the import file and run 2 copy processes Is it possible to create 2 indices at the same time You'd want to be a little careful. Postgres uses work_mem for vacuum and index creation, so if you have 2 processes doing it, just make sure you aren't running out of RAM and going to swap. Would I actually gain anything from that, or is the bottleneck somewhere else ? More likely, the bottleneck would be disk I/O. Simply because it is almost always disk I/O. However, without knowing your configuration, how much CPU is used during the operation, etc, it's hard to say. (perl is a given here for the batch job) If anyone has some experience or ideas... any hints or help on this would be appreciated. Thanks Alex Sorry I wasn't a lot of help. You should probably post your postgres version, and more information about how much CPU load there is while your load is running. John =:-> ---(end of broadcast)--- TIP 8: explain analyze is your friend
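One detail worth adding (an aside, assuming the PostgreSQL 8.0.1 mentioned above): the memory used by CREATE INDEX and VACUUM is governed by maintenance_work_mem rather than work_mem, and it can be raised for just the loading session. A sketch with an arbitrary 256MB value; if two index builds run at once, both allocations need to fit in RAM:

    -- plain integer values are in kilobytes; 262144 kB = 256 MB
    SET maintenance_work_mem = 262144;
    CREATE INDEX records_col1_idx ON records (col1);
    RESET maintenance_work_mem;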
Re: [PERFORM] How can I make use of both CPUs in a dual processor
Thanks for all the suggestions. It seems that creating indices, or even import data using a copy is easy to implement. I also have some jobs that create reports and want to try if I gain anything if i work reports in parallel. will give it a try in the next week and let you know the resuls. Alex John A Meinel wrote: Alex wrote: Thanks John. Well as I mentioned. I have a Dual AMD Opteron 64 2.4ghz, 15k rpm SCSI Disks, 4GB of memory. Disks are pretty fast and memory should be more than enough. Currently we dont have many concurrent connections. Well, you didn't mention Opteron before (it makes a difference against Xeons). How many disks and in what configuration? Do you have pg_xlog on a separate set of disks? Are your drives in RAID 10 (0+1) or RAID 5? If you have enough disks the recommended configuration is at least a RAID1 for the OS, RAID 10 for pg_xlog (4drives), and RAID 10 (the rest of the drives) for the actual data. If your dataset is read heavy, or you have more than 6 disks, you can get away with RAID 5 for the actual data. But since you are talking about loading 8million rows at once, it certainly sounds like you are write heavy. If you only have a few disks, it's still probably better to put pg_xlog on it's own RAID1 (2-drive) mirror. pg_xlog is pretty much append only, so if you dedicate a disk set to it, you eliminate a lot of seek times. I run PG 8.0.1 on Fedora Core 3 When I now run the batch job, one CPU runs in the 80-90% the other in 5-10% max. Anyway, it doesn't completely sound like you are CPU limited, but you might be able to get a little bit more if you spawn another process. Have you tried dropping the index, doing the copy, and then recreating the 4-indexes in separate processes? The simple test for this is to open 3-4 psql connections, have one of them drop the indexes and do the copy, in the other connections you can already have typed "CREATE INDEX ..." so when the copy is done and committed to the database, you just go to the other terminals and hit enter. Unfortunately you'll have to use wall clock time to see if this is faster. Though I think you could do the same thing with a bash script. The authentication should be in "trust" mode so that you don't take the time to type your password. #!/bin/bash psql -h -c "DROP INDEX ...; COPY FROM ..." psql -h -c "CREATE INDEX ..." & psql -h -c "CREATE INDEX ..." & psql -h -c "CREATE INDEX ..." & psql -h -c "CREATE INDEX ..." Now, I don't really know how to wait for all child processes in a bash script (I could give you the python for it, but you're a perl guy). But by not spawning the last INDEX, I'm hoping it takes longer than the rest. Try to put the most difficult index there. Then you could just run time loadscript.sh I'm sure you could do the equivalent in perl. Just open multiple connections to the DB, and have them ready. I'm guessing since you are on a dual processor machine, you won't get much better performance above 2 connections. You can also try doing 2 COPYs at the same time, but it seems like you would have issues. Do you have any serial columns that you expect to be in a certain order, or is all the information in the copy? If the latter, try it, let us know what you get. I can't tell you the perl for this, since I'm not a perl guy. John =:-> ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
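A cleaned-up sketch of the script John describes, using bash's wait builtin so that timing the script covers all of the index builds (host, database, table, and index definitions are placeholders):

    #!/bin/bash
    # drop the indexes and reload the table first (runs on its own)
    psql -h dbhost -d mydb -c "DROP INDEX records_col1_idx; DROP INDEX records_col2_idx; TRUNCATE records; COPY records FROM '/var/tmp/records.dat';"

    # rebuild the indexes in parallel, one connection per index
    psql -h dbhost -d mydb -c "CREATE INDEX records_col1_idx ON records (col1);" &
    psql -h dbhost -d mydb -c "CREATE INDEX records_col2_idx ON records (col2);" &
    psql -h dbhost -d mydb -c "CREATE INDEX records_col3_idx ON records (col3);" &
    psql -h dbhost -d mydb -c "CREATE INDEX records_col4_idx ON records (col4);" &

    # wait for all background index builds to finish, so
    #   time ./loadscript.sh
    # measures the whole job
    wait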
Re: [PERFORM] Poor query performance
Forgot to add: postg...@ec2-75-101-128-4:~$ psql --version psql (PostgreSQL) 8.3.5 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Poor query performance
Below is a query that takes 16 seconds on the first run. I am having generally poor performance for queries in uncached areas of the data and often mediocre (500ms-2s+) performance generally, although sometimes it's very fast. All the queries are pretty similar and use the indexes this way. I've been trying to tune this thing with little luck. There are about 1.5M records. It's using the index properly. Settings are: work_mem=20MB, shared_buffers=128MB, effective_cache_size=1024MB. I have run ANALYZE and VACUUM FULL recently. The whole database is around 16GB. The server is an EC2 instance with 5 compute units in two cores (1 unit is one 2GHz processor) and 1.7GB of RAM. Swapping seems to be minimal. Note that the ANALYZE is from my slow query logger, so the numbers don't match the time the uncached query took. There are 118K rows in this select. It is possible the sort is the issue, but that's why I have 20MB of working memory. Do I really need more than that?

Slow query: (16.852746963501) [0]
SELECT id FROM "source_listings"
WHERE (post_time BETWEEN '2009-07-02 14:19:29.520886' AND '2009-07-11 14:19:29.520930'
  AND ((geo_lon BETWEEN 10879358 AND 10909241 AND geo_lat BETWEEN 13229080 AND 13242719))
  AND city = 'boston')
ORDER BY post_time DESC LIMIT 108 OFFSET 0

Limit (cost=30396.63..30396.90 rows=108 width=12) (actual time=1044.575..1044.764 rows=108 loops=1)
  -> Sort (cost=30396.63..30401.47 rows=1939 width=12) (actual time=1044.573..1044.630 rows=108 loops=1)
       Sort Key: post_time
       Sort Method: top-N heapsort Memory: 21kB
       -> Bitmap Heap Scan on source_listings (cost=23080.81..30321.44 rows=1939 width=12) (actual time=321.111..952.704 rows=118212 loops=1)
            Recheck Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
            -> Bitmap Index Scan on sl_city_etc (cost=0.00..23080.33 rows=1939 width=0) (actual time=309.007..309.007 rows=118212 loops=1)
                 Index Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
Total runtime: 1045.683 ms

Even without the sort, performance is poor:

cribq=# EXPLAIN ANALYZE SELECT count(id) FROM "source_listings" WHERE (post_time BETWEEN '2009-07-02 14:19:29.520886' AND '2009-07-11 14:19:29.520930' AND ((geo_lon BETWEEN 10879358 AND 10909241 AND geo_lat BETWEEN 13229080 AND 13242719)) AND city = 'boston');

QUERY PLAN
------------------------------------------------------------------------------------------
Aggregate (cost=30326.29..30326.30 rows=1 width=4) (actual time=847.967..847.968 rows=1 loops=1)
  -> Bitmap Heap Scan on source_listings (cost=23080.81..30321.44 rows=1939 width=4) (actual time=219.505..769.878 rows=118212 loops=1)
       Recheck Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
       -> Bitmap Index Scan on sl_city_etc (cost=0.00..23080.33 rows=1939 width=0) (actual time=206.981..206.981 rows=118212 loops=1)
            Index Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
Total runtime: 848.816 ms

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor query performance
> > How is the index sl_city_etc defined?
>
Index "public.sl_city_etc"
    Column    |            Type
--------------+------------------------------
 city         | text
 listing_type | text
 post_time    | timestamp without time zone
 bedrooms     | integer
 region       | text
 geo_lat      | integer
 geo_lon      | integer
btree, for table "public.source_listings"
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
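Expressed as DDL, that index amounts to the first statement below. As an aside (not advice from the thread): in a multicolumn btree, only the leading columns up to the first unconstrained or range-filtered column can narrow the scan, so with listing_type not filtered at all, the post_time and geo conditions are checked against every 'boston' index entry rather than limiting the range scanned. The commented-out alternative is only an illustration of that idea:

    -- reconstruction of the existing index from the \d output above
    CREATE INDEX sl_city_etc ON source_listings
        (city, listing_type, post_time, bedrooms, region, geo_lat, geo_lon);

    -- one thing to experiment with (an assumption, not a recommendation from
    -- the thread): lead with the equality column and the ORDER BY column so
    -- the post_time range maps to a contiguous slice of the index
    -- CREATE INDEX sl_city_post_time ON source_listings (city, post_time);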
Re: [PERFORM] Poor query performance
Thanks. That's very helpful. I'll take your suggestions and see if things improve. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Strange memory behavior with rails - caching in connection?
I am using Postgres with Rails. Each Rails application "thread" is actually a separate process (mongrel) with its own connection. Normally, the DB connection processes (?) look something like this in top:

15772 postgres  15   0  229m  13m  12m S    0  0.8   0:00.09 postgres: db db [local] idle

These quickly grow as the application is used, to 50+ MB per instance. When I restart mongrel (the Rails application processes) these go back down to their normal small size. That makes me suspect this is not normal caching and there is some sort of unhealthy leak going on. Is there something Rails could be doing to cause these to grow? Maybe the connection is not being cleaned up properly? Is there some sort of connection cache going on? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Memory usage of writer process
The writer process seems to be using inordinate amounts of memory:

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11088 postgres 13  -2 3217m 2.9g 2.9g S    0 38.7  0:10.46 postgres: writer process
20190 postgres 13  -2 3219m  71m  68m S    0  0.9  0:52.48 postgres: cribq cribq [local] idle

I am writing moderately large (~3k) records to my database a few times a second. Even when I stop doing that, the process continues to take up all of that memory. Am I reading this right? Why is it using so much memory? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
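One way to sanity-check a reading like this (a suggestion, not something from the original thread): compare the SHR column against the configured shared_buffers. top counts mapped shared memory in RES, so a process whose RES is almost entirely SHR, as above, is largely reporting the shared buffer cache it has touched rather than private allocations. The commands below are illustrative; the database name is taken from the top output above:

    # how much shared memory is the server configured to use?
    psql -d cribq -c "SHOW shared_buffers;"

    # snapshot the writer process and compare RES vs SHR
    top -b -n 1 | grep 'postgres: writer'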
Re: [PERFORM] Sustained inserts per sec ... ?
1250/sec with record size average is 26 bytes 800/sec with record size average is 48 bytes. 250/sec with record size average is 618 bytes. Data from pg_stats and our own job monitoring System has four partitions, two raid 1s, a four disk RAID 10 and a six disk RAID 10. pg_xlog is on four disk RAID 10, database is on RAID 10. Data is very spread out because database turnover time is very high, so our performance is about double this with a fresh DB. (the data half life is probably measurable in days or weeks). Alex Turner netEconomist On Apr 1, 2005 1:06 PM, Marc G. Fournier <[EMAIL PROTECTED]> wrote: > > Just curious, but does anyone have an idea of what we are capable of? I > realize that size of record would affect things, as well as hardware, but > if anyone has some ideas on max, with 'record size', that would be > appreciated ... > > Thanks ... > > > Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) > Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Sustained inserts per sec ... ?
Oh - this is with a seperate transaction per command. fsync is on. Alex Turner netEconomist On Apr 1, 2005 4:17 PM, Alex Turner <[EMAIL PROTECTED]> wrote: > 1250/sec with record size average is 26 bytes > 800/sec with record size average is 48 bytes. > 250/sec with record size average is 618 bytes. > > Data from pg_stats and our own job monitoring > > System has four partitions, two raid 1s, a four disk RAID 10 and a six > disk RAID 10. > pg_xlog is on four disk RAID 10, database is on RAID 10. > > Data is very spread out because database turnover time is very high, > so our performance is about double this with a fresh DB. (the data > half life is probably measurable in days or weeks). > > Alex Turner > netEconomist > > On Apr 1, 2005 1:06 PM, Marc G. Fournier <[EMAIL PROTECTED]> wrote: > > > > Just curious, but does anyone have an idea of what we are capable of? I > > realize that size of record would affect things, as well as hardware, but > > if anyone has some ideas on max, with 'record size', that would be > > appreciated ... > > > > Thanks ... > > > > > > Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) > > Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: > > 7615664 > > > > ---(end of broadcast)--- > > TIP 8: explain analyze is your friend > > > ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
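For anyone wanting to reproduce this kind of figure, a rough sketch (table name, row count, and payload are made up; it assumes a table created with CREATE TABLE ins_test (payload text)). psql runs each statement in its own transaction unless told otherwise, which matches the separate-transaction-per-command case above, so with fsync on every row pays for a WAL flush:

    # build a file of 10,000 single-row INSERTs
    for i in $(seq 1 10000); do
      echo "INSERT INTO ins_test (payload) VALUES ('row number $i padded out to a few dozen bytes');"
    done > inserts.sql

    # each statement autocommits; divide 10000 by the elapsed time for rows/sec
    time psql -d test -f inserts.sql > /dev/null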
Re: [PERFORM] How to improve db performance with $7K?
To be honest, I've yet to run across a SCSI configuration that can touch the 3ware SATA controllers. I have yet to see one top 80MB/sec, let alone 180MB/sec read or write, which is why we moved _away_ from SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically badly on RAID 1, RAID 5 and RAID 10. 35MB/sec for a three drive RAID 0 is not bad, it's appalling. The hardware manufacturer should be publicly embarassed for this kind of speed. A single U320 10k drive can do close to 70MB/sec sustained. If someone can offer benchmarks to the contrary (particularly in linux), I would be greatly interested. Alex Turner netEconomist On Mar 29, 2005 8:17 AM, Dave Cramer <[EMAIL PROTECTED]> wrote: > Yeah, 35Mb per sec is slow for a raid controller, the 3ware mirrored is > about 50Mb/sec, and striped is about 100 > > Dave > > PFC wrote: > > > > >> With hardware tuning, I am sure we can do better than 35Mb per sec. Also > > > > > > WTF ? > > > > My Laptop does 19 MB/s (reading <10 KB files, reiser4) ! > > > > A recent desktop 7200rpm IDE drive > > # hdparm -t /dev/hdc1 > > /dev/hdc1: > > Timing buffered disk reads: 148 MB in 3.02 seconds = 49.01 MB/sec > > > > # ll "DragonBall 001.avi" > > -r--r--r--1 peufeu users218M mar 9 20:07 DragonBall > > 001.avi > > > > # time cat "DragonBall 001.avi" >/dev/null > > real0m4.162s > > user0m0.020s > > sys 0m0.510s > > > > (the file was not in the cache) > > => about 52 MB/s (reiser3.6) > > > > So, you have a problem with your hardware... > > > > ---(end of broadcast)--- > > TIP 7: don't forget to increase your free space map settings > > > > > > -- > Dave Cramer > http://www.postgresintl.com > 519 939 0336 > ICQ#14675561 > > ---(end of broadcast)--- > TIP 6: Have you searched our list archives? > >http://archives.postgresql.org > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Sustained inserts per sec ... ?
Yup, Battery backed, cache enabled. 6 drive RAID 10, and 4 drive RAID 10, and 2xRAID 1. It's a 3ware 9500S-8MI - not bad for $450 plus BBU. Alex Turner netEconomist On Apr 1, 2005 6:03 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > Alex Turner <[EMAIL PROTECTED]> writes: > > On Apr 1, 2005 4:17 PM, Alex Turner <[EMAIL PROTECTED]> wrote: > >> 1250/sec with record size average is 26 bytes > >> 800/sec with record size average is 48 bytes. > >> 250/sec with record size average is 618 bytes. > > > Oh - this is with a seperate transaction per command. > > fsync is on. > > [ raised eyebrow... ] What kind of disk hardware is that exactly, and > does it have write cache enabled? It's hard to believe those numbers > if not. > > Write caching is fine if it's done in a battery-backed cache, which you > can get in the higher-end hardware RAID controllers. Otherwise you're > going to have problems whenever the power goes away unexpectedly. > > regards, tom lane > ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How to improve db performance with $7K?
I'm no drive expert, but it seems to me that our write performance is excellent. I think what most are concerned about is OLTP, where you are doing heavy write _and_ heavy read performance at the same time. Our system is mostly read during the day, but we do a full system update every night that is all writes, and it's very fast compared to the smaller SCSI system we moved off of. Nearly a 6x speed improvement, as fast as 900 rows/sec with a 48 byte record, one row per transaction. I don't know enough about how SATA works to really comment on its performance as a protocol compared with SCSI. If anyone has a useful link on that, it would be greatly appreciated. More drives will give more throughput/sec, but not necessarily more transactions/sec. For that you will need more RAM on the controller, and definitely a BBU to keep your data safe. Alex Turner netEconomist On Apr 4, 2005 10:39 AM, Steve Poe <[EMAIL PROTECTED]> wrote: > > > Alex Turner wrote: > > >To be honest, I've yet to run across a SCSI configuration that can > >touch the 3ware SATA controllers. I have yet to see one top 80MB/sec, > >let alone 180MB/sec read or write, which is why we moved _away_ from > >SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically > >badly on RAID 1, RAID 5 and RAID 10. > > > > > Alex, > > How does the 3ware controller do in heavy writes back to the database? > It may have been Josh, but someone said that SATA does well with reads > but not writes. Would not equal amount of SCSI drives outperform SATA? > I don't want to start a "whose better" war, I am just trying to learn > here. It would seem the more drives you could place in a RAID > configuration, the performance would increase. > > Steve Poe > > ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] How to improve db performance with $7K?
I'm doing some research on SATA vs SCSI right now, but to be honest I'm not turning up much at the protocol level. A lot of stupid benchmarks compare 10k Raptor drives against top-of-the-line 15k drives, where unsurprisingly the SCSI drives win but of course cost 4 times as much. Although in some, SATA even wins, or draws. I'm trying to find something more apples to apples: 10k to 10k. Alex Turner netEconomist On Apr 4, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote: > > On Apr 4, 2005, at 3:12 PM, Alex Turner wrote: > > > Our system is mostly read during the day, but we do a full system > > update everynight that is all writes, and it's very fast compared to > > the smaller SCSI system we moved off of. Nearly a 6x spead > > improvement, as fast as 900 rows/sec with a 48 byte record, one row > > per transaction. > > > > Well, if you're not heavily multitasking, the advantage of SCSI is lost > on you. > > Vivek Khera, Ph.D. > +1-301-869-4449 x806 > > > ---(end of broadcast)--- > TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] How to improve db performance with $7K?
It's hardly the same money, the drives are twice as much. It's all about the controller baby with any kind of dive. A bad SCSI controller will give sucky performance too, believe me. We had a Compaq Smart Array 5304, and it's performance was _very_ sub par. If someone has a simple benchmark test database to run, I would be happy to run it on our hardware here. Alex Turner On Apr 6, 2005 3:30 AM, William Yu <[EMAIL PROTECTED]> wrote: > Alex Turner wrote: > > I'm no drive expert, but it seems to me that our write performance is > > excellent. I think what most are concerned about is OLTP where you > > are doing heavy write _and_ heavy read performance at the same time. > > > > Our system is mostly read during the day, but we do a full system > > update everynight that is all writes, and it's very fast compared to > > the smaller SCSI system we moved off of. Nearly a 6x spead > > improvement, as fast as 900 rows/sec with a 48 byte record, one row > > per transaction. > > I've started with SATA in a multi-read/multi-write environment. While it > ran pretty good with 1 thread writing, the addition of a 2nd thread > (whether reading or writing) would cause exponential slowdowns. > > I suffered through this for a week and then switched to SCSI. Single > threaded performance was pretty similar but with the advanced command > queueing SCSI has, I was able to do multiple reads/writes simultaneously > with only a small performance hit for each thread. > > Perhaps having a SATA caching raid controller might help this situation. > I don't know. It's pretty hard justifying buying a $$$ 3ware controller > just to test it when you could spend the same money on SCSI and have a > guarantee it'll work good under multi-IO scenarios. > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?
I think everyone was scared off by the 5000 inserts per second number. I've never seen even Oracle do this on a top end Dell system with copious SCSI attached storage. Alex Turner netEconomist On Apr 6, 2005 3:17 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > Unfortunately. > > But we are in the the process to choose Postgresql with pgcluster. I'm > currently running some tests (performance, stability...) > Save the money on the license fees, you get it for your hardware ;-) > > I still welcome any advices or comments and I'll let you know how the > project is going on. > > Benjamin. > > > > "Mohan, Ross" <[EMAIL PROTECTED]> > > 05/04/2005 20:48 > > Pour :<[EMAIL PROTECTED]> > cc : > Objet :RE: [PERFORM] Postgresql vs SQLserver for this > application ? > > > You never got answers on this? Apologies, I don't have one, but'd be curious > to hear about any you did get > > thx > > Ross > > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf > Of [EMAIL PROTECTED] > Sent: Monday, April 04, 2005 4:02 AM > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Postgresql vs SQLserver for this application ? > > > hi all. > > We are designing a quite big application that requires a high-performance > database backend. > The rates we need to obtain are at least 5000 inserts per second and 15 > selects per second for one connection. There should only be 3 or 4 > simultaneous connections. > I think our main concern is to deal with the constant flow of data coming > from the inserts that must be available for selection as fast as possible. > (kind of real time access ...) > > As a consequence, the database should rapidly increase up to more than one > hundred gigs. We still have to determine how and when we shoud backup old > data to prevent the application from a performance drop. We intend to > develop some kind of real-time partionning on our main table keep the flows > up. > > At first, we were planning to use SQL Server as it has features that in my > opinion could help us a lot : > - replication > - clustering > > Recently we started to study Postgresql as a solution for our project : > - it also has replication > - Postgis module can handle geographic datatypes (which would > facilitate our developments) > - We do have a strong knowledge on Postgresql administration (we use > it for production processes) > - it is free (!) and we could save money for hardware purchase. > > Is SQL server clustering a real asset ? How reliable are Postgresql > replication tools ? Should I trust Postgresql performance for this kind of > needs ? > > My question is a bit fuzzy but any advices are most welcome... > hardware,tuning or design tips as well :)) > > Thanks a lot. > > Benjamin. > > > ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?
I guess I was thinking more in the range of 5000 transaction/sec, less so 5000 rows on bulk import... Alex On Apr 6, 2005 12:47 PM, Mohan, Ross <[EMAIL PROTECTED]> wrote: > > > 31Million tuples were loaded in approx 279 seconds, or approx 112k rows per > second. > > > I'd love to see PG get into this range..i am a big fan of PG (just a > > rank newbie) but I gotta think the underlying code to do this has to > > be not-too-complex. > > I'd say we're there. > > ||Yes! PG is there, assuredly! So VERY cool! I made a > newbie > error of conflating COPY with INSERT. I don't know if I could get > oracle to do much more than about 500-1500 rows/sec...PG is quite > impressive. > > Makes one wonder why corporations positively insist on giving oracle > yearly. > > -Original Message- > From: Rod Taylor [mailto:[EMAIL PROTECTED] > Sent: Wednesday, April 06, 2005 12:41 PM > To: Mohan, Ross > Cc: pgsql-performance@postgresql.org > Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for thisapplication ? > > On Wed, 2005-04-06 at 16:12 +, Mohan, Ross wrote: > > I wish I had a Dell system and run case to show you Alex, but I > > don't... however...using Oracle's "direct path" feature, it's pretty > > straightforward. > > > > We've done 110,000 rows per second into index-less tables on a big > > system (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a > > second. Sustained for almost 9 minutes. ) > > Just for kicks I did a local test on a desktop machine (single CPU, single > IDE drive) using COPY from STDIN for a set of integers in via a single > transaction, no indexes. > > 1572864 tuples were loaded in 13715.613ms, which is approx 115k rows per > second. > > Okay, no checkpoints and I didn't cross an index boundary, but I also haven't > tuned the config file beyond bumping up the buffers. > > Lets try again with more data this time. > > 31Million tuples were loaded in approx 279 seconds, or approx 112k rows per > second. > > > I'd love to see PG get into this range..i am a big fan of PG (just a > > rank newbie) but I gotta think the underlying code to do this has to > > be not-too-complex. > > I'd say we're there. > > > -Original Message- > > From: Alex Turner [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, April 06, 2005 11:38 AM > > To: [EMAIL PROTECTED] > > Cc: pgsql-performance@postgresql.org; Mohan, Ross > > Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this > > application ? > > > > > > I think everyone was scared off by the 5000 inserts per second number. > > > > I've never seen even Oracle do this on a top end Dell system with > > copious SCSI attached storage. > > > > Alex Turner > > netEconomist > > > > On Apr 6, 2005 3:17 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > > > > > Unfortunately. > > > > > > But we are in the the process to choose Postgresql with pgcluster. > > > I'm > > > currently running some tests (performance, stability...) Save the > > > money on the license fees, you get it for your hardware ;-) > > > > > > I still welcome any advices or comments and I'll let you know how > > > the > > > project is going on. > > > > > > Benjamin. > > > > > > > > > > > > "Mohan, Ross" <[EMAIL PROTECTED]> > > > > > > 05/04/2005 20:48 > > > > > > Pour :<[EMAIL PROTECTED]> > > > cc : > > > Objet :RE: [PERFORM] Postgresql vs SQLserver for this > > > application ? > > > > > > > > > You never got answers on this? 
Apologies, I don't have one, but'd be > > > curious to hear about any you did get > > > > > > thx > > > > > > Ross > > > > > > -Original Message- > > > From: [EMAIL PROTECTED] > > > [mailto:[EMAIL PROTECTED] On Behalf Of > > > [EMAIL PROTECTED] > > > Sent: Monday, April 04, 2005 4:02 AM > > > To: pgsql-performance@postgresql.org > > > Subject: [PERFORM] Postgresql vs SQLserver for this application ? > > > > > > > > > hi all. > > > > > > We are designing a quite big application that requires a > > > high-performance database backend. The rates we need to obtain are
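For reference, a sketch of the kind of test Rod describes: integers streamed through COPY FROM STDIN over a single connection into an index-less table (database and table names are placeholders; the row count mirrors the 1.5M figure quoted above, and psql forwards its standard input to the server when a COPY ... FROM STDIN is issued with -c):

    # one-column table with no indexes
    psql -d test -c "CREATE TABLE copy_test (n integer);"

    # stream ~1.5M integers through a single COPY command
    time seq 1 1572864 | psql -d test -c "COPY copy_test FROM STDIN;"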
Re: [PERFORM] Réf
I think his point was that 900 * 4 != 2400 Alex Turner netEconomist On Apr 6, 2005 2:23 PM, Rod Taylor <[EMAIL PROTECTED]> wrote: > On Wed, 2005-04-06 at 19:42 +0200, Steinar H. Gunderson wrote: > > On Wed, Apr 06, 2005 at 01:18:29PM -0400, Rod Taylor wrote: > > > Yeah, I think that can be done provided there is more than one worker. > > > My limit seems to be about 1000 transactions per second each with a > > > single insert for a single process (round trip time down the Fibre > > > Channel is large) but running 4 simultaneously only drops throughput to > > > about 900 per process (total of 2400 transactions per second) and the > > > machine still seemed to have lots of oomph to spare. > > > > Erm, have I missed something here? 900 * 4 = 2400? > > Nope. You've not missed anything. > > If I ran 10 processes and the requirement would be met. > -- > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster > ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] How to improve db performance with $7K?
Well - unfortuantely software RAID isn't appropriate for everyone, and some of us need a hardware RAID controller. The LSI Megaraid 320-2 card is almost exactly the same price as the 3ware 9500S-12 card (although I will conceed that a 320-2 card can handle at most 2x14 devices compare with the 12 on the 9500S). If someone can come up with a test, I will be happy to run it and see how it goes. I would be _very_ interested in the results having just spent $7k on a new DB server!! I have also seen really bad performance out of SATA. It was with either an on-board controller, or a cheap RAID controller from HighPoint. As soon as I put in a decent controller, things went much better. I think it's unfair to base your opinion of SATA from a test that had a poor controler. I know I'm not the only one here running SATA RAID and being very satisfied with the results. Thanks, Alex Turner netEconomist On Apr 6, 2005 4:01 PM, William Yu <[EMAIL PROTECTED]> wrote: > It's the same money if you factor in the 3ware controller. Even without > a caching controller, SCSI works good in multi-threaded IO (not > withstanding crappy shit from Dell or Compaq). You can get such cards > from LSI for $75. And of course, many server MBs come with LSI > controllers built-in. Our older 32-bit production servers all use Linux > software RAID w/ SCSI and there's no issues when multiple > users/processes hit the DB. > > *Maybe* a 3ware controller w/ onboard cache + battery backup might do > much better for multi-threaded IO than just plain-jane SATA. > Unfortunately, I have not been able to find anything online that can > confirm or deny this. Hence, the choice is spend $$$ on the 3ware > controller and hope it meets your needs -- or spend $$$ on SCSI drives > and be sure. > > Now if you want to run such tests, we'd all be delighted with to see the > results so we have another option for building servers. > > > Alex Turner wrote: > > It's hardly the same money, the drives are twice as much. > > > > It's all about the controller baby with any kind of dive. A bad SCSI > > controller will give sucky performance too, believe me. We had a > > Compaq Smart Array 5304, and it's performance was _very_ sub par. > > > > If someone has a simple benchmark test database to run, I would be > > happy to run it on our hardware here. > > > > Alex Turner > > > > On Apr 6, 2005 3:30 AM, William Yu <[EMAIL PROTECTED]> wrote: > > > >>Alex Turner wrote: > >> > >>>I'm no drive expert, but it seems to me that our write performance is > >>>excellent. I think what most are concerned about is OLTP where you > >>>are doing heavy write _and_ heavy read performance at the same time. > >>> > >>>Our system is mostly read during the day, but we do a full system > >>>update everynight that is all writes, and it's very fast compared to > >>>the smaller SCSI system we moved off of. Nearly a 6x spead > >>>improvement, as fast as 900 rows/sec with a 48 byte record, one row > >>>per transaction. > >> > >>I've started with SATA in a multi-read/multi-write environment. While it > >>ran pretty good with 1 thread writing, the addition of a 2nd thread > >>(whether reading or writing) would cause exponential slowdowns. > >> > >>I suffered through this for a week and then switched to SCSI. Single > >>threaded performance was pretty similar but with the advanced command > >>queueing SCSI has, I was able to do multiple reads/writes simultaneously > >>with only a small performance hit for each thread. 
> >> > >>Perhaps having a SATA caching raid controller might help this situation. > >>I don't know. It's pretty hard justifying buying a $$$ 3ware controller > >>just to test it when you could spend the same money on SCSI and have a > >>guarantee it'll work good under multi-IO scenarios. > >> > >>---(end of broadcast)--- > >>TIP 8: explain analyze is your friend > >> > > > > > > ---(end of broadcast)--- > > TIP 2: you can get off all lists at once with the unregister command > > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) > > > > ---(end of broadcast)--- > TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] > ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] How to improve db performance with $7K?
I guess I'm setting myself up here, and I'm really not being ignorant, but can someone explain exactly how is SCSI is supposed to better than SATA? Both systems use drives with platters. Each drive can physically only read one thing at a time. SATA gives each drive it's own channel, but you have to share in SCSI. A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but SCSI can only do 320MB/sec across the entire array. What am I missing here? Alex Turner netEconomist On Apr 6, 2005 5:41 PM, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > Sorry if I'm pointing out the obvious here, but it seems worth > mentioning. AFAIK all 3ware controllers are setup so that each SATA > drive gets it's own SATA bus. My understanding is that by and large, > SATA still suffers from a general inability to have multiple outstanding > commands on the bus at once, unlike SCSI. Therefore, to get good > performance out of SATA you need to have a seperate bus for each drive. > Theoretically, it shouldn't really matter that it's SATA over ATA, other > than I certainly wouldn't want to try and cram 8 ATA cables into a > machine... > > Incidentally, when we were investigating storage options at a previous > job we talked to someone who deals with RS/6000 storage. He had a bunch > of info about their serial controller protocol (which I can't think of > the name of) vs SCSI. SCSI had a lot more overhead, so you could end up > saturating even a 160MB SCSI bus with only 2 or 3 drives. > > People are finally realizing how important bandwidth has become in > modern machines. Memory bandwidth is why RS/6000 was (and maybe still > is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of > the water. Likewise it's why SCSI is so much better than IDE (unless you > just give each drive it's own dedicated bandwidth). > -- > Jim C. Nasby, Database Consultant [EMAIL PROTECTED] > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" > ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] How to improve db performance with $7K?
Ok - so I found this fairly good online review of various SATA cards out there, with 3ware not doing too hot on RAID 5, but ok on RAID 10. http://www.tweakers.net/reviews/557/ Very interesting stuff. Alex Turner netEconomist On Apr 6, 2005 7:32 PM, Alex Turner <[EMAIL PROTECTED]> wrote: > I guess I'm setting myself up here, and I'm really not being ignorant, > but can someone explain exactly how is SCSI is supposed to better than > SATA? > > Both systems use drives with platters. Each drive can physically only > read one thing at a time. > > SATA gives each drive it's own channel, but you have to share in SCSI. > A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but > SCSI can only do 320MB/sec across the entire array. > > What am I missing here? > > Alex Turner > netEconomist > > On Apr 6, 2005 5:41 PM, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > > Sorry if I'm pointing out the obvious here, but it seems worth > > mentioning. AFAIK all 3ware controllers are setup so that each SATA > > drive gets it's own SATA bus. My understanding is that by and large, > > SATA still suffers from a general inability to have multiple outstanding > > commands on the bus at once, unlike SCSI. Therefore, to get good > > performance out of SATA you need to have a seperate bus for each drive. > > Theoretically, it shouldn't really matter that it's SATA over ATA, other > > than I certainly wouldn't want to try and cram 8 ATA cables into a > > machine... > > > > Incidentally, when we were investigating storage options at a previous > > job we talked to someone who deals with RS/6000 storage. He had a bunch > > of info about their serial controller protocol (which I can't think of > > the name of) vs SCSI. SCSI had a lot more overhead, so you could end up > > saturating even a 160MB SCSI bus with only 2 or 3 drives. > > > > People are finally realizing how important bandwidth has become in > > modern machines. Memory bandwidth is why RS/6000 was (and maybe still > > is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of > > the water. Likewise it's why SCSI is so much better than IDE (unless you > > just give each drive it's own dedicated bandwidth). > > -- > > Jim C. Nasby, Database Consultant [EMAIL PROTECTED] > > Give your computer some brain candy! www.distributed.net Team #1828 > > > > Windows: "Where do you want to go today?" > > Linux: "Where do you want to go tomorrow?" > > FreeBSD: "Are you guys coming, or what?" > > > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] How to improve db performance with $7K?
Ok - I take it back - I'm reading through this now, and realising that the reviews are pretty clueless in several places... On Apr 6, 2005 8:12 PM, Alex Turner <[EMAIL PROTECTED]> wrote: > Ok - so I found this fairly good online review of various SATA cards > out there, with 3ware not doing too hot on RAID 5, but ok on RAID 10. > > http://www.tweakers.net/reviews/557/ > > Very interesting stuff. > > Alex Turner > netEconomist > > On Apr 6, 2005 7:32 PM, Alex Turner <[EMAIL PROTECTED]> wrote: > > I guess I'm setting myself up here, and I'm really not being ignorant, > > but can someone explain exactly how is SCSI is supposed to better than > > SATA? > > > > Both systems use drives with platters. Each drive can physically only > > read one thing at a time. > > > > SATA gives each drive it's own channel, but you have to share in SCSI. > > A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but > > SCSI can only do 320MB/sec across the entire array. > > > > What am I missing here? > > > > Alex Turner > > netEconomist > > > > On Apr 6, 2005 5:41 PM, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > > > Sorry if I'm pointing out the obvious here, but it seems worth > > > mentioning. AFAIK all 3ware controllers are setup so that each SATA > > > drive gets it's own SATA bus. My understanding is that by and large, > > > SATA still suffers from a general inability to have multiple outstanding > > > commands on the bus at once, unlike SCSI. Therefore, to get good > > > performance out of SATA you need to have a seperate bus for each drive. > > > Theoretically, it shouldn't really matter that it's SATA over ATA, other > > > than I certainly wouldn't want to try and cram 8 ATA cables into a > > > machine... > > > > > > Incidentally, when we were investigating storage options at a previous > > > job we talked to someone who deals with RS/6000 storage. He had a bunch > > > of info about their serial controller protocol (which I can't think of > > > the name of) vs SCSI. SCSI had a lot more overhead, so you could end up > > > saturating even a 160MB SCSI bus with only 2 or 3 drives. > > > > > > People are finally realizing how important bandwidth has become in > > > modern machines. Memory bandwidth is why RS/6000 was (and maybe still > > > is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of > > > the water. Likewise it's why SCSI is so much better than IDE (unless you > > > just give each drive it's own dedicated bandwidth). > > > -- > > > Jim C. Nasby, Database Consultant [EMAIL PROTECTED] > > > Give your computer some brain candy! www.distributed.net Team #1828 > > > > > > Windows: "Where do you want to go today?" > > > Linux: "Where do you want to go tomorrow?" > > > FreeBSD: "Are you guys coming, or what?" > > > > > > ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How to improve db performance with $7K?
Yeah - the more reading I'm doing, the more I'm finding out. Allegedly the Western Digital Raptor drives implement a version of ATA-4 Tagged Queuing which allows reordering of commands. Some controllers support this. The 3ware docs say that the controller supports reordering both on the controller and at the drive. *shrug* This, of course, is all supposed to go away with SATA II, which has NCQ, Native Command Queueing. Of course the 3ware controllers don't support SATA II, but a few others do, and I'm sure 3ware will come out with a controller that does. Alex Turner netEconomist On 06 Apr 2005 23:00:54 -0400, Greg Stark <[EMAIL PROTECTED]> wrote: > > Alex Turner <[EMAIL PROTECTED]> writes: > > > SATA gives each drive it's own channel, but you have to share in SCSI. > > A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but > > SCSI can only do 320MB/sec across the entire array. > > SCSI controllers often have separate channels for each device too. > > In any case the issue with the IDE protocol is that fundamentally you can only > have a single command pending. SCSI can have many commands pending. This is > especially important for a database like postgres that may be busy committing > one transaction while another is trying to read. Having several commands > queued on the drive gives it a chance to execute any that are "on the way" to > the committing transaction. > > However I'm under the impression that 3ware has largely solved this problem. > Also, if you save a few dollars and can afford one additional drive that > additional drive may improve your array speed enough to overcome that > inefficiency. > > -- > greg > > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to improve db performance with $7K?
Based on the reading I'm doing, and somebody please correct me if I'm wrong, it seems that SCSI drives contain an on disk controller that has to process the tagged queue. SATA-I doesn't have this. This additional controller, is basicaly an on board computer that figures out the best order in which to process commands. I believe you are also paying for the increased tolerance that generates a better speed. If you compare an 80Gig 7200RPM IDE drive to a WD Raptor 76G 10k RPM to a Seagate 10k.6 drive to a Seagate Cheatah 15k drive, each one represents a step up in parts and technology, thereby generating a cost increase (at least thats what the manufactures tell us). I know if you ever held a 15k drive in your hand, you can notice a considerable weight difference between it and a 7200RPM IDE drive. Alex Turner netEconomist On Apr 7, 2005 11:37 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Another simple question: Why is SCSI more expensive? After the > eleventy-millionth controller is made, it seems like SCSI and SATA are > using a controller board and a spinning disk. Is somebody still making > money by licensing SCSI technology? > > Rick > > [EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM: > > > You asked for it! ;-) > > > > If you want cheap, get SATA. If you want fast under > > *load* conditions, get SCSI. Everything else at this > > time is marketing hype, either intentional or learned. > > Ignoring dollars, expect to see SCSI beat SATA by 40%. > > > > * * * What I tell you three times is true * * * > > > > Also, compare the warranty you get with any SATA > > drive with any SCSI drive. Yes, you still have some > > change leftover to buy more SATA drives when they > > fail, but... it fundamentally comes down to some > > actual implementation and not what is printed on > > the cardboard box. Disk systems are bound by the > > rules of queueing theory. You can hit the sales rep > > over the head with your queueing theory book. > > > > Ultra320 SCSI is king of the hill for high concurrency > > databases. If you're only streaming or serving files, > > save some money and get a bunch of SATA drives. > > But if you're reading/writing all over the disk, the > > simple first-come-first-serve SATA heuristic will > > hose your performance under load conditions. > > > > Next year, they will *try* bring out some SATA cards > > that improve on first-come-first-serve, but they ain't > > here now. There are a lot of rigged performance tests > > out there... Maybe by the time they fix the queueing > > problems, serial Attached SCSI (a/k/a SAS) will be out. > > Looks like Ultra320 is the end of the line for parallel > > SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the > > water. > > > > Ultra320 SCSI. > > Ultra320 SCSI. > > Ultra320 SCSI. > > > > Serial Attached SCSI. > > Serial Attached SCSI. > > Serial Attached SCSI. > > > > For future trends, see: > > http://www.incits.org/archive/2003/in031163/in031163.htm > > > > douglas > > > > p.s. For extra credit, try comparing SATA and SCSI drives > > when they're 90% full. > > > > On Apr 6, 2005, at 8:32 PM, Alex Turner wrote: > > > > > I guess I'm setting myself up here, and I'm really not being ignorant, > > > but can someone explain exactly how is SCSI is supposed to better than > > > SATA? > > > > > > Both systems use drives with platters. Each drive can physically only > > > read one thing at a time. > > > > > > SATA gives each drive it's own channel, but you have to share in SCSI. 
> > > A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but > > > SCSI can only do 320MB/sec across the entire array. > > > > > > What am I missing here? > > > > > > Alex Turner > > > netEconomist > > > > > > ---(end of broadcast)--- > > TIP 9: the planner will ignore your desire to choose an index scan if > your > > joining column's datatypes do not match > > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] profiling postgresql queries?
Speaking of triggers... Is there any plan to speed up plpgsql triggers? Fairly simple cross-table insert triggers seem to slow my inserts to a crawl. Is the best thing just to write triggers in C? (I really don't want to put this stuff in the application logic because it really doesn't belong there.) Alex Turner netEconomist On Apr 12, 2005 10:10 AM, Tom Lane <[EMAIL PROTECTED]> wrote: > hubert lubaczewski <[EMAIL PROTECTED]> writes: > > and it made me wonder - is there a way to tell how much time of backend > > was spent on triggers, index updates and so on? > > In CVS tip, EXPLAIN ANALYZE will break out the time spent in each > trigger. This is not in any released version, but if you're desperate > you could load up a play server with your data and test. > > regards, tom lane > > ---(end of broadcast)--- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) > ---(end of broadcast)--- TIP 8: explain analyze is your friend
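For concreteness, a minimal sketch of the kind of cross-table insert trigger being described (table and column names are invented; 8.0-style plpgsql with dollar quoting). Each fired trigger adds a plpgsql function call plus an extra INSERT per row, which is roughly where the per-row overhead comes from:

    -- copy a reference to every new row into a second table
    CREATE OR REPLACE FUNCTION log_new_row() RETURNS trigger AS $$
    BEGIN
        INSERT INTO main_table_log (main_id, logged_at)
        VALUES (NEW.id, now());
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER main_table_log_trig
        AFTER INSERT ON main_table
        FOR EACH ROW EXECUTE PROCEDURE log_new_row();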
Re: [PERFORM] Intel SRCS16 SATA raid?
I have read a large chunk of this, and I would highly recommend it to anyone who has been participating in the drive discussions. It is most informative!! Alex Turner netEconomist On 4/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Greg, > > I posted this link under a different thread (the $7k server thread). It is > a very good read on why SCSI is better for servers than ATA. I didn't note > bias, though it is from a drive manufacturer. YMMV. There is an > interesting, though dated appendix on different manufacturers' drive > characteristics. > > http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf > > Enjoy, > > Rick > > [EMAIL PROTECTED] wrote on 04/14/2005 09:54:45 AM: > > > > > Our vendor is trying to sell us on an Intel SRCS16 SATA raid controller > > instead of the 3ware one. > > > > Poking around it seems this does come with Linux drivers and there is a > > battery backup option. So it doesn't seem to be completely insane. > > > > Anyone have any experience with these controllers? > > > > I'm also wondering about whether I'm better off with one of these SATA > raid > > controllers or just going with SCSI drives. > > > > -- > > greg > > > > > > ---(end of broadcast)--- > > TIP 8: explain analyze is your friend > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Intel SRCS16 SATA raid?
I have put together a little head to head performance of a 15k SCSI, 10k SCSI 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive comparison at storage review http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devID_0=232&devID_1=40&devID_2=259&devID_3=267&devID_4=261&devID_5=248&devCnt=6 It does illustrate some of the weaknesses of SATA drives, but all in all the Raptor drives put on a good show. Alex Turner netEconomist On 4/14/05, Alex Turner <[EMAIL PROTECTED]> wrote: > I have read a large chunk of this, and I would highly recommend it to > anyone who has been participating in the drive discussions. It is > most informative!! > > Alex Turner > netEconomist > > On 4/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > Greg, > > > > I posted this link under a different thread (the $7k server thread). It is > > a very good read on why SCSI is better for servers than ATA. I didn't note > > bias, though it is from a drive manufacturer. YMMV. There is an > > interesting, though dated appendix on different manufacturers' drive > > characteristics. > > > > http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf > > > > Enjoy, > > > > Rick > > > > [EMAIL PROTECTED] wrote on 04/14/2005 09:54:45 AM: > > > > > > > > Our vendor is trying to sell us on an Intel SRCS16 SATA raid controller > > > instead of the 3ware one. > > > > > > Poking around it seems this does come with Linux drivers and there is a > > > battery backup option. So it doesn't seem to be completely insane. > > > > > > Anyone have any experience with these controllers? > > > > > > I'm also wondering about whether I'm better off with one of these SATA > > raid > > > controllers or just going with SCSI drives. > > > > > > -- > > > greg > > > > > > > > > ---(end of broadcast)--- > > > TIP 8: explain analyze is your friend > > > > ---(end of broadcast)--- > > TIP 9: the planner will ignore your desire to choose an index scan if your > > joining column's datatypes do not match > > > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Intel SRCS16 SATA raid?
Just to clarify these are tests from http://www.storagereview.com, not my own. I guess they couldn't get number for those parts. I think everyone understands that a 0ms seek time impossible, and indicates a missing data point. Thanks, Alex Turner netEconomist On 4/14/05, Dave Held <[EMAIL PROTECTED]> wrote: > > -Original Message----- > > From: Alex Turner [mailto:[EMAIL PROTECTED] > > Sent: Thursday, April 14, 2005 12:14 PM > > To: [EMAIL PROTECTED] > > Cc: Greg Stark; pgsql-performance@postgresql.org; > > [EMAIL PROTECTED] > > Subject: Re: [PERFORM] Intel SRCS16 SATA raid? > > > > > > I have put together a little head to head performance of a 15k SCSI, > > 10k SCSI 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive > > comparison at storage review > > > > http://www.storagereview.com/php/benchmark/compare_rtg_2001.ph > > p?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devI > > D_0=232&devID_1=40&devID_2=259&devID_3=267&devID_4=261&devID_5 > > =248&devCnt=6 > > > > It does illustrate some of the weaknesses of SATA drives, but all in > > all the Raptor drives put on a good show. > > [...] > > I think it's a little misleading that your tests show 0ms seek times > for some of the write tests. The environmental test also selects a > missing data point as the winner. Besides that, it seems to me that > seek time is one of the most important features for a DB server, which > means that the SCSI drives are the clear winners and the non-WD SATA > drives are the embarrassing losers. Transfer rate is import, but > perhaps less so because DBs tend to read/write small blocks rather > than large files. On the server suite, which seems to me to be the > most relevant for DBs, the Atlas 15k spanks the other drives by a > fairly large margin (especially the lesser SATA drives). When you > ignore the "consumer app" benchmarks, I wouldn't be so confident in > saying that the Raptors "put on a good show". > > __ > David B. Held > Software Engineer/Array Services Group > 200 14th Ave. East, Sartell, MN 56377 > 320.534.3637 320.253.7800 800.752.8129 > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Intel SRCS16 SATA raid?
Looking at the numbers, the raptor with TCQ enabled was close or beat the Atlas III 10k drive on most benchmarks. Naturaly a 15k drive is going to be faster in many areas, but it is also much more expensive. It was only 44% better on the server tests than the raptor with TCQ, but it costs nearly 300% more ($538 cdw.com, $180 newegg.com). Note also that the 15k drive was the only drive that kept up with the raptor on raw transfer speed, which is going to matter for WAL. For those of us on a budget, a quality controller card with lots of RAM is going to be our biggest friend because it can cache writes, and improve performance. The 3ware controllers seem to be universally benchmarked as the best SATA RAID 10 controllers where database performance is concerned. Even the crappy tweakers.net review had the 3ware as the fastest controller for a MySQL data partition in RAID 10. The Raptor drives can be had for as little as $180/ea, which is quite a good price point considering they can keep up with their SCSI 10k RPM counterparts on almost all tests with NCQ enabled (Note that 3ware controllers _don't_ support NCQ, although they claim their HBA based queueing is 95% as good as NCQ on the drive). Alex Turner netEconomist On 4/14/05, Dave Held <[EMAIL PROTECTED]> wrote: > > -Original Message----- > > From: Alex Turner [mailto:[EMAIL PROTECTED] > > Sent: Thursday, April 14, 2005 12:14 PM > > To: [EMAIL PROTECTED] > > Cc: Greg Stark; pgsql-performance@postgresql.org; > > [EMAIL PROTECTED] > > Subject: Re: [PERFORM] Intel SRCS16 SATA raid? > > > > > > I have put together a little head to head performance of a 15k SCSI, > > 10k SCSI 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive > > comparison at storage review > > > > http://www.storagereview.com/php/benchmark/compare_rtg_2001.ph > > p?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devI > > D_0=232&devID_1=40&devID_2=259&devID_3=267&devID_4=261&devID_5 > > =248&devCnt=6 > > > > It does illustrate some of the weaknesses of SATA drives, but all in > > all the Raptor drives put on a good show. > > [...] > > I think it's a little misleading that your tests show 0ms seek times > for some of the write tests. The environmental test also selects a > missing data point as the winner. Besides that, it seems to me that > seek time is one of the most important features for a DB server, which > means that the SCSI drives are the clear winners and the non-WD SATA > drives are the embarrassing losers. Transfer rate is import, but > perhaps less so because DBs tend to read/write small blocks rather > than large files. On the server suite, which seems to me to be the > most relevant for DBs, the Atlas 15k spanks the other drives by a > fairly large margin (especially the lesser SATA drives). When you > ignore the "consumer app" benchmarks, I wouldn't be so confident in > saying that the Raptors "put on a good show". > > __ > David B. Held > Software Engineer/Array Services Group > 200 14th Ave. East, Sartell, MN 56377 > 320.534.3637 320.253.7800 800.752.8129 > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
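To put the price/performance argument above in concrete terms, here is a small Python sketch; the $180 and $538 prices and the 44% server-test margin are the figures quoted in the thread, treated purely as illustrative inputs rather than current pricing.

# Rough cost-per-performance comparison using the numbers quoted above.
# The performance unit is arbitrary; only the ratio between drives matters.

drives = {
    # name: (street price in USD, relative server-suite score)
    "Raptor 10K SATA (TCQ)": (180.0, 1.00),   # baseline
    "Atlas 15K SCSI":        (538.0, 1.44),   # ~44% faster on the server tests
}

for name, (price, score) in drives.items():
    print(f"{name:24s}  ${price:6.0f}  score={score:.2f}  $/score={price / score:7.2f}")

# With these (assumed) numbers the 15K drive costs roughly twice as much
# per unit of benchmark score, which is the point being argued above.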
Re: [PERFORM] How to improve db performance with $7K?
3ware claim that their 'software' implemented command queueing performs at 95% effectiveness compared to the hardware queueing on a SCSI drive, so I would say that they agree with you. I'm still learning, but as I read it, the bits are split across the platters and there is only 'one' head, but happens to be reading from multiple platters. The 'further' in linear distance the data is from the current position, the longer it's going to take to get there. This seems to be true based on a document that was circulated. A hard drive takes considerable amount of time to 'find' a track on the platter compared to the rotational speed, which would agree with the fact that you can read 70MB/sec, but it takes up to 13ms to seek. the ATA protocol is just how the HBA communicates with the drive, there is no reason why the HBA can't reschedule reads and writes just the like SCSI drive would do natively, and this is what infact 3ware claims. I get the feeling based on my own historical experience that generaly drives don't just have a bunch of bad blocks. This all leads me to believe that you can predict with pretty good accuracy how expensive it is to retrieve a given block knowing it's linear increment. Alex Turner netEconomist On 4/14/05, Kevin Brown <[EMAIL PROTECTED]> wrote: > Tom Lane wrote: > > Kevin Brown <[EMAIL PROTECTED]> writes: > > > I really don't see how this is any different between a system that has > > > tagged queueing to the disks and one that doesn't. The only > > > difference is where the queueing happens. In the case of SCSI, the > > > queueing happens on the disks (or at least on the controller). In the > > > case of SATA, the queueing happens in the kernel. > > > > That's basically what it comes down to: SCSI lets the disk drive itself > > do the low-level I/O scheduling whereas the ATA spec prevents the drive > > from doing so (unless it cheats, ie, caches writes). Also, in SCSI it's > > possible for the drive to rearrange reads as well as writes --- which > > AFAICS is just not possible in ATA. (Maybe in the newest spec...) > > > > The reason this is so much more of a win than it was when ATA was > > designed is that in modern drives the kernel has very little clue about > > the physical geometry of the disk. Variable-size tracks, bad-block > > sparing, and stuff like that make for a very hard-to-predict mapping > > from linear sector addresses to actual disk locations. > > Yeah, but it's not clear to me, at least, that this is a first-order > consideration. A second-order consideration, sure, I'll grant that. > > What I mean is that when it comes to scheduling disk activity, > knowledge of the specific physical geometry of the disk isn't really > important. What's important is whether or not the disk conforms to a > certain set of expectations. Namely, that the general organization is > such that addressing the blocks in block number order guarantees > maximum throughput. > > Now, bad block remapping destroys that guarantee, but unless you've > got a LOT of bad blocks, it shouldn't destroy your performance, right? > > > Combine that with the fact that the drive controller can be much > > smarter than it was twenty years ago, and you can see that the case > > for doing I/O scheduling in the kernel and not in the drive is > > pretty weak. > > Well, I certainly grant that allowing the controller to do the I/O > scheduling is faster than having the kernel do it, as long as it can > handle insertion of new requests into the list while it's in the > middle of executing a request. 
The most obvious case is when the head > is in motion and the new request can be satisfied by reading from the > media between where the head is at the time of the new request and > where the head is being moved to. > > My argument is that a sufficiently smart kernel scheduler *should* > yield performance results that are reasonably close to what you can > get with that feature. Perhaps not quite as good, but reasonably > close. It shouldn't be an orders-of-magnitude type difference. > > -- > Kevin Brown [EMAIL PROTECTED] > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
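The seek-versus-transfer point in the message above is easy to quantify. The following Python sketch assumes roughly 70 MB/s of sequential transfer (the figure mentioned above) and an 8 ms average positioning time (an assumed figure; the 13 ms quoted above is closer to a worst case), and shows how little of the sequential rate survives once every small read pays for a seek.

# Effective throughput of random reads vs. sequential streaming.
# Both figures are illustrative assumptions, not measurements.

SEQ_MB_PER_S = 70.0      # sequential transfer rate
AVG_POSITION_MS = 8.0    # average seek + rotational latency (assumed)

def random_read_throughput(block_kb: float) -> float:
    """MB/s achieved when every read of `block_kb` requires a fresh seek."""
    transfer_s = (block_kb / 1024.0) / SEQ_MB_PER_S
    total_s = AVG_POSITION_MS / 1000.0 + transfer_s
    return (block_kb / 1024.0) / total_s

for kb in (8, 64, 256, 1024):
    mbps = random_read_throughput(kb)
    print(f"{kb:5d} KB random reads: {mbps:6.2f} MB/s "
          f"({mbps / SEQ_MB_PER_S:5.1%} of sequential)")

With these assumptions an 8 KB random read workload delivers on the order of 1 MB/s per spindle, which is why the OLTP discussion keeps coming back to seek time rather than transfer rate.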
Re: [PERFORM] Intel SRCS16 SATA raid?
No offense to that review, but it was really wasn't that good, and drew bad conclusions from the data. I posted it originaly and immediately regretted it. See http://www.tweakers.net/reviews/557/18 Amazingly the controller with 1Gig cache manages a write throughput of 750MB/sec on a single drive. quote: "Floating high above the crowd, the ARC-1120 has a perfect view on the struggles of the other adapters. " It's because the adapter has 1Gig of RAM, nothing to do with the RAID architecture, it's clearly caching the entire dataset. The drive can't physicaly run that fast. These guys really don't know what they are doing. Curiously: http://www.tweakers.net/reviews/557/25 The 3ware does very well as a data drive for MySQL. The size of your cache is going to _directly_ affect RAID 5 performance. Put a gig of memory in a 3ware 9500S and benchmark it against the Areca then. Also - folks don't run data paritions on RAID 5 because the write speed is too low. When you look at the results for RAID 10, the 3ware leads the pack. See also: http://www20.tomshardware.com/storage/20041227/areca-raid6-06.html I trust toms hardware a little more to set up a good review to be honest. The 3ware trounces the Areca in all IO/sec test. Alex Turner netEconomist On 4/15/05, Marinos Yannikos <[EMAIL PROTECTED]> wrote: > Joshua D. Drake wrote: > > Well I have never even heard of it. 3ware is the defacto authority of > > reasonable SATA RAID. > > no! 3ware was rather early in this business, but there are plenty of > (IMHO, and some other people's opinion) better alternatives available. > 3ware has good Linux drivers, but the performance of their current > controllers isn't that good. > > Have a look at this: http://www.tweakers.net/reviews/557/1 > > especially the sequential writes with RAID-5 on this page: > > http://www.tweakers.net/reviews/557/19 > > We have been a long-time user of a 3ware 8506 controller (8 disks, > RAID-5) and have purchased 2 Areca ARC-1120 now since we weren't > satisfied with the performance and the 2TB per array limit... > > -mjy > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Intel SRCS16 SATA raid?
The original thread was how much can I get for $7k You can't fit a 15k RPM SCSI solution into $7K ;) Some of us are on a budget! 10k RPM SATA drives give acceptable performance at a good price, thats really the point here. I have never really argued that SATA is going to match SCSI performance on multidrive arrays for IO/sec. But it's all about the benjamins baby. If I told my boss we need $25k for a database machine, he'd tell me that was impossible, and I have $5k to do it. If I tell him $7k - he will swallow that. We don't _need_ the amazing performance of a 15k RPM drive config. Our biggest hit is reads, so we can buy 3xSATA machines and load balance. It's all about the application, and buying what is appropriate. I don't buy a Corvette if all I need is a malibu. Alex Turner netEconomist On 4/15/05, Dave Held <[EMAIL PROTECTED]> wrote: > > -Original Message- > > From: Alex Turner [mailto:[EMAIL PROTECTED] > > Sent: Thursday, April 14, 2005 6:15 PM > > To: Dave Held > > Cc: pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] Intel SRCS16 SATA raid? > > > > Looking at the numbers, the raptor with TCQ enabled was close or > > beat the Atlas III 10k drive on most benchmarks. > > And I would be willing to bet that the Atlas 10k is not using the > same generation of technology as the Raptors. > > > Naturaly a 15k drive is going to be faster in many areas, but it > > is also much more expensive. It was only 44% better on the server > > tests than the raptor with TCQ, but it costs nearly 300% more ($538 > > cdw.com, $180 newegg.com). > > State that in terms of cars. Would you be willing to pay 300% more > for a car that is 44% faster than your competitor's? Of course you > would, because we all recognize that the cost of speed/performance > does not scale linearly. Naturally, you buy the best speed that you > can afford, but when it comes to hard drives, the only major feature > whose price tends to scale anywhere close to linearly is capacity. > > > Note also that the 15k drive was the only drive that kept up with > > the raptor on raw transfer speed, which is going to matter for WAL. > > So get a Raptor for your WAL partition. ;) > > > [...] > > The Raptor drives can be had for as little as $180/ea, which is > > quite a good price point considering they can keep up with their > > SCSI 10k RPM counterparts on almost all tests with NCQ enabled > > (Note that 3ware controllers _don't_ support NCQ, although they > > claim their HBA based queueing is 95% as good as NCQ on the drive). > > Just keep in mind the points made by the Seagate article. You're > buying much more than just performance for that $500+. You're also > buying vibrational tolerance, high MTBF, better internal > environmental controls, and a pretty significant margin on seek time, > which is probably your most important feature for disks storing tables. > An interesting test would be to stick several drives in a cabinet and > graph how performance is affected at the different price points/ > technologies/number of drives. > > __ > David B. Held > Software Engineer/Array Services Group > 200 14th Ave. East, Sartell, MN 56377 > 320.534.3637 320.253.7800 800.752.8129 > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly > ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Intel SRCS16 SATA raid?
I stand corrected! Maybe I should re-evaluate our own config! Alex T (The Dell PERC controllers do pretty much suck on Linux) On 4/15/05, Vivek Khera <[EMAIL PROTECTED]> wrote: > > On Apr 15, 2005, at 11:01 AM, Alex Turner wrote: > > > You can't fit a 15k RPM SCSI solution into $7K ;) Some of us are on a > > budget! > > > > I just bought a pair of Dual Opteron, 4GB RAM, LSI 320-2X RAID dual > channel with 8 36GB 15kRPM seagate drives. Each one of these boxes set > me back just over $7k, including onsite warrantee. > > They totally blow away the Dell Dual XEON with external 14 disk RAID > (also 15kRPM drives, manufacturer unknown) which also has 4GB RAM and a > Dell PERC 3/DC controller, the whole of which set me back over $15k. > > Vivek Khera, Ph.D. > +1-301-869-4449 x806 > > > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] How to improve db performance with $7K?
This is fundamentaly untrue. A mirror is still a mirror. At most in a RAID 10 you can have two simultaneous seeks. You are always going to be limited by the seek time of your drives. It's a stripe, so you have to read from all members of the stripe to get data, requiring all drives to seek. There is no advantage to seek time in adding more drives. By adding more drives you can increase throughput, but the max throughput of the PCI-X bus isn't that high (I think around 400MB/sec) You can easily get this with a six or seven drive RAID 5, or a ten drive RAID 10. At that point you start having to factor in the cost of a bigger chassis to hold more drives, which can be big bucks. Alex Turner netEconomist On 18 Apr 2005 10:59:05 -0400, Greg Stark <[EMAIL PROTECTED]> wrote: > > William Yu <[EMAIL PROTECTED]> writes: > > > Using the above prices for a fixed budget for RAID-10, you could get: > > > > SATA 7200 -- 680MB per $1000 > > SATA 10K -- 200MB per $1000 > > SCSI 10K -- 125MB per $1000 > > What a lot of these analyses miss is that cheaper == faster because cheaper > means you can buy more spindles for the same price. I'm assuming you picked > equal sized drives to compare so that 200MB/$1000 for SATA is almost twice as > many spindles as the 125MB/$1000. That means it would have almost double the > bandwidth. And the 7200 RPM case would have more than 5x the bandwidth. > > While 10k RPM drives have lower seek times, and SCSI drives have a natural > seek time advantage, under load a RAID array with fewer spindles will start > hitting contention sooner which results into higher latency. If the controller > works well the larger SATA arrays above should be able to maintain their > mediocre latency much better under load than the SCSI array with fewer drives > would maintain its low latency response time despite its drives' lower average > seek time. > > -- > greg > > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
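The bus-saturation claim above can be sketched out as follows; the 400 MB/s figure is the one quoted in the post (actual PCI-X variants are rated higher), and the 70 MB/s per-spindle streaming rate is an assumption, so treat the resulting drive counts as rough illustrations rather than exact thresholds.

import math

# Back-of-the-envelope: how many drives saturate the host bus?
# Both figures below are assumptions for illustration only.

BUS_MB_PER_S = 400.0     # usable bus bandwidth quoted in the post
DRIVE_MB_PER_S = 70.0    # assumed sustained streaming rate per spindle

def drives_to_saturate(data_fraction: float) -> int:
    """Smallest drive count whose aggregate streaming bandwidth >= the bus.

    data_fraction = 1.0 for RAID 0 (or RAID 5 reads, ignoring parity),
    0.5 for RAID 10, where each mirror pair contributes roughly one
    drive's worth of unique streaming bandwidth (a simplification).
    """
    return math.ceil(BUS_MB_PER_S / (DRIVE_MB_PER_S * data_fraction))

print("RAID 0/5 reads:", drives_to_saturate(1.0), "drives")   # ~6
print("RAID 10 reads :", drives_to_saturate(0.5), "drives")   # ~12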
Re: [PERFORM] How to improve db performance with $7K?
[snip] > > Adding drives will not let you get lower response times than the average seek > time on your drives*. But it will let you reach that response time more often. > [snip] I believe your assertion is fundamentally flawed. Adding more drives will not let you reach that response time more often. All drives are required to fill every request in all RAID levels (except possibly 0+1, but that isn't used for enterprise applications). Most requests in OLTP require most of the request time to seek, not to read. Only in single large block data transfers will you get any benefit from adding more drives, which is atypical in most database applications. For most database applications, the only way to increase transactions/sec is to decrease request service time, which is generally achieved with better seek times or a better controller card, or possibly spreading your database across multiple tablespaces on separate partitions. My assertion therefore is that simply adding more drives to an already competent* configuration is about as likely to increase your database effectiveness as Swiss cheese is to make your car run faster. Alex Turner netEconomist *The assumption here is that the DBA didn't simply configure all tables and xlog on a single 7200 RPM disk, but has separate physical drives for xlog and tablespace, at least on 10k drives. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How to improve db performance with $7K?
Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at least I would never recommend 1+0 for anything). RAID 10 and RAID 0+1 are _quite_ different. One gives you very good redundancy, the other is only slightly better than RAID 5, but operates faster in degraded mode (single drive). Alex Turner netEconomist On 4/18/05, John A Meinel <[EMAIL PROTECTED]> wrote: > Alex Turner wrote: > > >[snip] > > > > > >>Adding drives will not let you get lower response times than the average > >>seek > >>time on your drives*. But it will let you reach that response time more > >>often. > >> > >> > >> > >[snip] > > > >I believe your assertion is fundamentaly flawed. Adding more drives > >will not let you reach that response time more often. All drives are > >required to fill every request in all RAID levels (except possibly > >0+1, but that isn't used for enterprise applicaitons). > > > Actually 0+1 is the recommended configuration for postgres databases > (both for xlog and for the bulk data), because the write speed of RAID5 > is quite poor. > Hence you base assumption is not correct, and adding drives *does* help. > > >Most requests > >in OLTP require most of the request time to seek, not to read. Only > >in single large block data transfers will you get any benefit from > >adding more drives, which is atypical in most database applications. > >For most database applications, the only way to increase > >transactions/sec is to decrease request service time, which is > >generaly achieved with better seek times or a better controller card, > >or possibly spreading your database accross multiple tablespaces on > >seperate paritions. > > > > > This is probably true. However, if you are doing lots of concurrent > connections, and things are properly spread across multiple spindles > (using RAID0+1, or possibly tablespaces across multiple raids). > Then each seek occurs on a separate drive, which allows them to occur at > the same time, rather than sequentially. Having 2 processes competing > for seeking on the same drive is going to be worse than having them on > separate drives. > John > =:-> > > > ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How to improve db performance with $7K?
I think the add more disks thing is really from the point of view that one disk isn't enough ever. You should really have at least four drives configured into two RAID 1s. Most DBAs will know this, but most average Joes won't. Alex Turner netEconomist On 4/18/05, Steve Poe <[EMAIL PROTECTED]> wrote: > Alex, > > In the situation of the animal hospital server I oversee, their > application is OLTP. Adding hard drives (6-8) does help performance. > Benchmarks like pgbench and OSDB agree with it, but in reality users > could not see noticeable change. However, moving the top 5/10 tables and > indexes to their own space made a greater impact. > > Someone who reads PostgreSQL 8.0 Performance Checklist is going to see > point #1 add more disks is the key. How about adding a subpoint to > explaining when more disks isn't enough or applicable? I maybe > generalizing the complexity of tuning an OLTP application, but some > clarity could help. > > Steve Poe > > ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] How to improve db performance with $7K?
OK, well, I am partially wrong... If your stripe size is 64KB and you are reading 256KB worth of data, it will be spread across four drives, so you will need to read from four devices to get your 256KB of data (RAID 0 or 5 or 10), but if you are only reading 64KB of data, I guess you would only need to read from one disk. So my assertion that adding more drives doesn't help is pretty wrong... particularly with OLTP, because it's always dealing with blocks that are smaller than the stripe size. Alex Turner netEconomist On 4/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote: > Hi, > > At 18:56 18/04/2005, Alex Turner wrote: > >All drives are required to fill every request in all RAID levels > > No, this is definitely wrong. In many cases, most drives don't actually > have the data requested, how could they handle the request? > > When reading one random sector, only *one* drive out of N is ever used to > service any given request, be it RAID 0, 1, 0+1, 1+0 or 5. > > When writing: > - in RAID 0, 1 drive > - in RAID 1, RAID 0+1 or 1+0, 2 drives > - in RAID 5, you need to read on all drives and write on 2. > > Otherwise, what would be the point of RAID 0, 0+1 or 1+0? > > Jacques. > > ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
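A minimal sketch of the stripe-to-disk mapping being described, assuming a plain RAID 0 layout in which the stripe unit rotates across members; RAID 10 and RAID 5 add mirroring and parity on top, which this deliberately ignores. The disks_touched helper and its parameters are illustrative names, not any particular controller's API.

def disks_touched(offset_kb: int, length_kb: int,
                  stripe_kb: int = 64, n_disks: int = 4) -> set:
    """Return the member-disk indices a request hits in a simple
    RAID 0 layout: stripe unit i lives on disk (i % n_disks)."""
    first_stripe = offset_kb // stripe_kb
    last_stripe = (offset_kb + length_kb - 1) // stripe_kb
    return {stripe % n_disks for stripe in range(first_stripe, last_stripe + 1)}

# A 64 KB read aligned to a stripe touches one disk...
print(disks_touched(offset_kb=128, length_kb=64))    # {2}
# ...while a 256 KB read spans all four members, as described above.
print(disks_touched(offset_kb=0, length_kb=256))     # {0, 1, 2, 3}
# Small OLTP-style reads mostly stay on a single spindle, leaving the
# other spindles free to service concurrent requests.
print(disks_touched(offset_kb=200, length_kb=8))     # {3}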
Re: [PERFORM] How to improve db performance with $7K?
So I wonder if one could take this stripe size thing further and say that a larger stripe size is more likely to result in requests getting served parallized across disks which would lead to increased performance? Again, thanks to all people on this list, I know that I have learnt a _hell_ of alot since subscribing. Alex Turner netEconomist On 4/18/05, Alex Turner <[EMAIL PROTECTED]> wrote: > Ok - well - I am partially wrong... > > If you're stripe size is 64Kb, and you are reading 256k worth of data, > it will be spread across four drives, so you will need to read from > four devices to get your 256k of data (RAID 0 or 5 or 10), but if you > are only reading 64kb of data, I guess you would only need to read > from one disk. > > So my assertion that adding more drives doesn't help is pretty > wrong... particularly with OLTP because it's always dealing with > blocks that are smaller that the stripe size. > > Alex Turner > netEconomist > > On 4/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote: > > Hi, > > > > At 18:56 18/04/2005, Alex Turner wrote: > > >All drives are required to fill every request in all RAID levels > > > > No, this is definitely wrong. In many cases, most drives don't actually > > have the data requested, how could they handle the request? > > > > When reading one random sector, only *one* drive out of N is ever used to > > service any given request, be it RAID 0, 1, 0+1, 1+0 or 5. > > > > When writing: > > - in RAID 0, 1 drive > > - in RAID 1, RAID 0+1 or 1+0, 2 drives > > - in RAID 5, you need to read on all drives and write on 2. > > > > Otherwise, what would be the point of RAID 0, 0+1 or 1+0? > > > > Jacques. > > > > > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to improve db performance with $7K?
That was a typo on my part; I meant 0+1 in the second instance :( On 4/18/05, Joshua D. Drake <[EMAIL PROTECTED]> wrote: > Alex Turner wrote: > > Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at > > least I would never recommend 1+0 for anything). > > Uhmm I was under the impression that 1+0 was RAID 10 and that 0+1 is NOT > RAID 10. > > Ref: http://www.acnc.com/raid.html > > Sincerely, > > Joshua D. Drake > > > > ---(end of broadcast)--- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to [EMAIL PROTECTED] so that your > > message can get through to the mailing list cleanly > > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] How to improve db performance with $7K?
On 4/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote: > Hi, > > At 20:21 18/04/2005, Alex Turner wrote: > >So I wonder if one could take this stripe size thing further and say > >that a larger stripe size is more likely to result in requests getting > >served parallized across disks which would lead to increased > >performance? > > Actually, it would be pretty much the opposite. The smaller the stripe > size, the more evenly distributed data is, and the more disks can be used > to serve requests. If your stripe size is too large, many random accesses > within one single file (whose size is smaller than the stripe size/number > of disks) may all end up on the same disk, rather than being split across > multiple disks (the extreme case being stripe size = total size of all > disks, which means concatenation). If all accesses had the same cost (i.e. > no seek time, only transfer time), the ideal would be to have a stripe size > equal to the number of disks. > [snip] Ahh yes - but the critical distinction is this: The smaller the stripe size, the more disks will be used to serve _a_ request - which is bad for OLTP because you want fewer disks per request so that you can have more requests per second because the cost is mostly seek. If more than one disk has to seek to serve a single request, you are preventing that disk from serving a second request at the same time. To have more throughput in MB/sec, you want a smaller stripe size so that you have more disks serving a single request allowing you to multiple by effective drives to get total bandwidth. Because OLTP is made up of small reads and writes to a small number of different files, I would guess that you want those files split up across your RAID, but not so much that a single small read or write operation would traverse more than one disk. That would infer that your optimal stripe size is somewhere on the right side of the bell curve that represents your database read and write block count distribution. If on average the dbwritter never flushes less than 1MB to disk at a time, then I guess your best stripe size would be 1MB, but that seems very large to me. So I think therefore that I may be contending the exact opposite of what you are postulating! Alex Turner netEconomist ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
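One way to explore the "stripe size versus request-size distribution" idea above is to measure how often requests cross a stripe boundary. The sketch below uses a synthetic, assumed workload (mostly 8 KB page reads with an occasional larger sequential chunk), so the exact percentages are illustrative; the trend as stripe size grows is the point.

import random

# What fraction of requests cross a stripe boundary, for various stripe
# sizes? The request-size distribution below is an assumption: mostly
# 8 KB page reads with the occasional larger sequential chunk.

random.seed(42)

def synthetic_request():
    """Return (offset_kb, length_kb) for one request."""
    length = 8 if random.random() < 0.9 else random.choice([64, 256, 1024])
    offset = random.randrange(0, 1 << 20)          # anywhere in a ~1 GB extent
    return offset, length

def spans_multiple_stripes(offset_kb, length_kb, stripe_kb):
    return offset_kb // stripe_kb != (offset_kb + length_kb - 1) // stripe_kb

requests = [synthetic_request() for _ in range(100_000)]
for stripe_kb in (16, 64, 256, 1024):
    crossing = sum(spans_multiple_stripes(o, l, stripe_kb) for o, l in requests)
    print(f"stripe {stripe_kb:5d} KB: {crossing / len(requests):6.1%} "
          f"of requests hit more than one disk")

Larger stripes keep more of the small requests on a single spindle, which is the trade-off being argued over in this exchange.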
Re: [PERFORM] How to improve db performance with $7K?
Does it really matter at which end of the cable the queueing is done (Assuming both ends know as much about drive geometry etc..)? Alex Turner netEconomist On 4/18/05, Bruce Momjian wrote: > Kevin Brown wrote: > > Greg Stark wrote: > > > > > > > I think you're being misled by analyzing the write case. > > > > > > Consider the read case. When a user process requests a block and > > > that read makes its way down to the driver level, the driver can't > > > just put it aside and wait until it's convenient. It has to go ahead > > > and issue the read right away. > > > > Well, strictly speaking it doesn't *have* to. It could delay for a > > couple of milliseconds to see if other requests come in, and then > > issue the read if none do. If there are already other requests being > > fulfilled, then it'll schedule the request in question just like the > > rest. > > The idea with SCSI or any command queuing is that you don't have to wait > for another request to come in --- you can send the request as it > arrives, then if another shows up, you send that too, and the drive > optimizes the grouping at a later time, knowing what the drive is doing, > rather queueing in the kernel. > > -- > Bruce Momjian| http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup.| Newtown Square, Pennsylvania 19073 > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
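Wherever the queueing happens, the benefit being debated is the reordering itself. The toy simulation below assumes seek cost is proportional to linear block distance (which, as the quoted discussion notes, real drives only approximate) and compares FIFO service against a greedy shortest-seek-first ordering of a 32-deep queue.

import random

# Toy model of what command queueing buys you, wherever it happens:
# reorder a queue of outstanding requests to reduce head travel.
# Seek cost is assumed proportional to linear distance, which real
# drives only approximate.

random.seed(1)
QUEUE_DEPTH = 32
requests = [random.randrange(0, 1_000_000) for _ in range(QUEUE_DEPTH)]

def total_travel(order, start=0):
    pos, travel = start, 0
    for block in order:
        travel += abs(block - pos)
        pos = block
    return travel

def nearest_first(reqs, start=0):
    """Greedy shortest-seek-first scheduling of the queued requests."""
    pending, pos, order = list(reqs), start, []
    while pending:
        nxt = min(pending, key=lambda b: abs(b - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

fifo = total_travel(requests)
sstf = total_travel(nearest_first(requests))
print(f"FIFO head travel : {fifo:>10,d} blocks")
print(f"SSTF head travel : {sstf:>10,d} blocks  ({sstf / fifo:.0%} of FIFO)")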
Re: [PERFORM] How to improve db performance with $7K?
I wonder if thats something to think about adding to Postgresql? A setting for multiblock read count like Oracle (Although having said that I believe that Oracle natively caches pages much more aggressively that postgresql, which allows the OS to do the file caching). Alex Turner netEconomist P.S. Oracle changed this with 9i, you can change the Database block size on a tablespace by tablespace bassis making it smaller for OLTP tablespaces and larger for Warehousing tablespaces (at least I think it's on a tablespace, might be on a whole DB). On 4/19/05, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > On Mon, Apr 18, 2005 at 06:41:37PM -, Mohan, Ross wrote: > > Don't you think "optimal stripe width" would be > > a good question to research the binaries for? I'd > > think that drives the answer, largely. (uh oh, pun alert) > > > > EG, oracle issues IO requests (this may have changed _just_ > > recently) in 64KB chunks, regardless of what you ask for. > > So when I did my striping (many moons ago, when the Earth > > was young...) I did it in 128KB widths, and set the oracle > > "multiblock read count" according. For oracle, any stripe size > > under 64KB=stupid, anything much over 128K/258K=wasteful. > > > > I am eager to find out how PG handles all this. > > AFAIK PostgreSQL requests data one database page at a time (normally > 8k). Of course the OS might do something different. > -- > Jim C. Nasby, Database Consultant [EMAIL PROTECTED] > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] How to improve db performance with $7K?
Whilst I admire your purist approach, I would say that if it is beneficial to performance that a kernel understand drive geometry, then it is worth investigating teaching it how to deal with that! I was less referrring to the kernel as I was to the controller. Lets say we invented a new protocol that including the drive telling the controller how it was layed out at initialization time so that the controller could make better decisions about re-ordering seeks. It would be more cost effective to have that set of electronics just once in the controller, than 8 times on each drive in an array, which would yield better performance to cost ratio. Therefore I would suggest it is something that should be investigated. After all, why implemented TCQ on each drive, if it can be handled more effeciently at the other end by the controller for less money?! Alex Turner netEconomist On 4/19/05, Dave Held <[EMAIL PROTECTED]> wrote: > > -Original Message- > > From: Alex Turner [mailto:[EMAIL PROTECTED] > > Sent: Monday, April 18, 2005 5:50 PM > > To: Bruce Momjian > > Cc: Kevin Brown; pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] How to improve db performance with $7K? > > > > Does it really matter at which end of the cable the queueing is done > > (Assuming both ends know as much about drive geometry etc..)? > > [...] > > The parenthetical is an assumption I'd rather not make. If my > performance depends on my kernel knowing how my drive is laid > out, I would always be wondering if a new drive is going to > break any of the kernel's geometry assumptions. Drive geometry > doesn't seem like a kernel's business any more than a kernel > should be able to decode the ccd signal of an optical mouse. > The kernel should queue requests at a level of abstraction that > doesn't depend on intimate knowledge of drive geometry, and the > drive should queue requests on the concrete level where geometry > matters. A drive shouldn't guess whether a process is trying to > read a file sequentially, and a kernel shouldn't guess whether > sector 30 is contiguous with sector 31 or not. > > __ > David B. Held > Software Engineer/Array Services Group > 200 14th Ave. East, Sartell, MN 56377 > 320.534.3637 320.253.7800 800.752.8129 > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings > ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index bloat problem?
Is REINDEX DATABASE blah supposed to rebuild all indices in the database, or must you specify each table individually? (I'm asking because I just tried it and it only did system tables.) Alex Turner netEconomist On 4/21/05, Josh Berkus wrote: > Bill, > > > What about if an out-of-the-ordinary number of rows > > were deleted (say 75% of rows in the table, as opposed > > to normal 5%) followed by a 'VACUUM ANALYZE'? Could > > things get out of whack because of that situation? > > Yes. You'd want to run REINDEX after an event like that. As you should know. > > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---(end of broadcast)--- > TIP 6: Have you searched our list archives? > >http://archives.postgresql.org > ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
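On the releases current at the time, REINDEX DATABASE only processed the system catalogs, which matches the behaviour reported above; rebuilding user indexes meant looping over the tables yourself. A hedged sketch of such a loop follows, using Python with psycopg2 as an assumed driver; the DSN is a placeholder and the autocommit setting simply keeps each REINDEX in its own transaction.

# Rebuild the indexes of every user table one table at a time.
# psycopg2 and the DSN below are assumptions for illustration; any
# DB-API driver or a small shell loop around psql would do the same job.
import psycopg2

conn = psycopg2.connect("dbname=blah user=postgres")   # hypothetical DSN
conn.autocommit = True          # run each REINDEX as its own transaction

cur = conn.cursor()
cur.execute("""
    SELECT schemaname, tablename
    FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY schemaname, tablename
""")
for schema, table in cur.fetchall():
    print(f"reindexing {schema}.{table}")
    cur.execute(f'REINDEX TABLE "{schema}"."{table}"')

cur.close()
conn.close()

Later releases extended REINDEX DATABASE to cover user tables as well, but a per-table loop remains a reasonable way to limit how long any single lock is held.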
Re: [PERFORM] Index bloat problem?
The same thing happens in Oracle. ALTER INDEX <index_name> REBUILD forces a rebuild; it will mark the free blocks as 'free' below the PCTFREE value for the tablespace. Basically, if you build an index with 10,000 entries and each entry is 1/4 of a block, the database will write 2,500 blocks to disk. If you delete a random 75% of the index values, you will now have 2,500 blocks that each have 75% free space. The database will reuse that free space in those blocks as you insert new values, but until then you still have 2,500 blocks worth of data on disk that are only 25% full. Rebuilding the index forces the system to physically re-allocate all that data space, and now you have just 2,500 entries that use 625 blocks. I'm not sure that 'blocks' is the correct term in Postgres (it's segments in Oracle), but the concept remains the same. Alex Turner netEconomist On 4/21/05, Bill Chandler <[EMAIL PROTECTED]> wrote: > > --- Josh Berkus wrote: > > Bill, > > > > > What about if an out-of-the-ordinary number of > > rows > > > were deleted (say 75% of rows in the table, as > > opposed > > > to normal 5%) followed by a 'VACUUM ANALYZE'? > > Could > > > things get out of whack because of that situation? > > > > Yes. You'd want to run REINDEX after an event like > > that. As you should know. > > > > -- > > Josh Berkus > > Aglio Database Solutions > > San Francisco > > > > Thank you. Though I must say, that is very > discouraging. REINDEX is a costly operation, timewise > and due to the fact that it locks out other processes > from proceeding. Updates are constantly coming in and > queries are occurring continuously. A REINDEX could > potentially bring the whole thing to a halt. > > Honestly, this seems like an inordinate amount of > babysitting for a production application. I'm not > sure if the client will be willing to accept it. > > Admittedly my knowledge of the inner workings of an > RDBMS is limited, but could somebody explain to me why > this would be so? If you delete a bunch of rows why > doesn't the index get updated at the same time? Is > this a common issue among all RDBMSs or is it > something that is PostgreSQL specific? Is there any > way around it? > > thanks, > > Bill > > __ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > > ---(end of broadcast)--- > TIP 6: Have you searched our list archives? > >http://archives.postgresql.org > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Partitioning / Clustering
What is the status of Postgres support for any sort of multi-machine scaling? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can, but your query load is still too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely (or at all) used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 15:41, John A Meinel wrote: Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. Better hardware = More Efficient != More Scalable But actually is it your SELECT load that is too high, or your INSERT load, or something inbetween. Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similiar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi- machine partitioning (as in, not like MySQLs MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix. There is also PGCluster http://pgfoundry.org/projects/pgcluster/ Which is trying to be more of a Synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them too. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. This is why I mention partitioning. It solves this issue by storing different data sets on different machines under the same schema. These seperate chunks of the table can then be replicated as well for data redundancy and so on. MySQL are working on these things, but PG just has a bunch of third party extensions, I wonder why these are not being integrated into the main trunk :/ Thanks for pointing me to PGCluster though. It looks like it should be better than Slony at least. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 16:02, Adam Haberlach wrote: I think that perhaps he was trying to avoid having to buy "Big Iron" at all. You would be right. Although we are not against paying a bit more than $300 for a server ;) With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as easy to harness as a single machine. Yes, clustering solutions can distribute the data, and can even do it on a per-table basis in some cases. This still leaves it up to the application's logic to handle reunification of the data. If your going to be programming that sort of logic into your API in the beginning, it's not too much more work to add basic replication, load balancing and partitioning into it either. But the DB should be able to do it for you, adding that stuff in later is often more difficult and less likely to get done. Ideas: 1. Create a table/storage type that consists of a select statement on another machine. While I don't think the current executor is capable of working on multiple nodes of an execution tree at the same time, it would be great if it could offload a select of tuples from a remote table to an entirely different server and merge the resulting data into the current execution. I believe MySQL has this, and Oracle may implement it in another way. MySQL sort of has this, it's not as good as Oracle's though. Apparently there is a much better version of it in 5.1 though, that should make it to stable sometime next year I imagine. 2. There is no #2 at this time, but I'm sure one can be hypothesized. I would of thought a particularly smart version of pg_pool could do it. It could partition data to different servers if it knew which columns to key by on each table. ...Google and other companies have definitely proved that one can harness huge clusters of cheap hardware. It can't be _that_ hard, can it. :) I shudder to think how much the "Big Iron" equivalent of a google data-center would cost. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel Sent: Tuesday, May 10, 2005 7:41 AM To: Alex Stapleton Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning / Clustering Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually is it your SELECT load that is too high, or your INSERT load, or something inbetween. Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. 
Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. This would require some application level support, since an INSERT goes to a different place than a SELECT. But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions, a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similiar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi- machine partitioning (as in, not like MySQLs MERGE tables) on PostgreSQL actually doing it in our ap
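The Slony-plus-pg_pool arrangement sketched in this exchange usually surfaces in the application as some form of read/write splitting. Below is a minimal, hedged illustration of that routing idea; the DSN strings are placeholders, and the force_master flag is a stand-in for the "comment hint" workaround discussed above, since the router cannot tell from the SQL text alone whether a SELECT calls a writing function.

import random

# Minimal read/write splitter for a Slony-style master/slave setup.
# DSNs are placeholders; a real application would hand the chosen DSN
# to its driver's connect call.

MASTER_DSN = "host=db-master dbname=app"
REPLICA_DSNS = ["host=db-slave1 dbname=app", "host=db-slave2 dbname=app"]

def route(query: str, force_master: bool = False) -> str:
    """Pick a DSN for this query.

    Read-only SELECTs go to a random replica; everything else, and any
    SELECT that might write via a function, goes to the master. The
    caller signals the latter with force_master=True.
    """
    is_select = query.lstrip().lower().startswith("select")
    if is_select and not force_master:
        return random.choice(REPLICA_DSNS)
    return MASTER_DSN

print(route("SELECT * FROM users WHERE id = 42"))          # a replica
print(route("UPDATE users SET name = 'x' WHERE id = 42"))  # master
print(route("SELECT log_pageview(42)", force_master=True)) # master

pg_pool can do similar routing in front of the database, but the caveat about writing SELECTs applies wherever the split is made.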
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 08:16, Simon Riggs wrote: On Tue, 2005-05-10 at 11:03 +0100, Alex Stapleton wrote: So, when/is PG meant to be getting a decent partitioning system? ISTM that your question seems to confuse where code comes from. Without meaning to pick on you, or reply rudely, I'd like to explore that question. Perhaps it should be a FAQ entry. All code is written by someone, and those people need to eat. Some people are fully or partly funded to perform their tasks on this project (coding, patching, etc). Others contribute their time for a variety of reasons where involvement has a positive benefit. You should ask these questions: - Is anyone currently working on (Feature X)? - If not, Can I do it myself? - If not, and I still want it, can I fund someone else to build it for me? Asking "when is Feature X going to happen" is almost certainly going to get the answer "never" otherwise, if the initial development is large and complex. There are many TODO items that have lain untouched for years, even though adding the feature has been discussed and agreed. Best Regards, Simon Riggs Acceptable Answers to 'So, when/is PG meant to be getting a decent partitioning system?': 1. Person X is working on it I believe. 2. It's on the list, but nobody has done anything about it yet 3. Your welcome to take a stab at it, I expect the community would support your efforts as well. 4. If you have a huge pile of money you could probably buy the Moon. Thinking along those lines, you can probably pay someone to write it for you. 5. It's a stupid idea, and it's never going to work, and heres why.. Unacceptable Answers to the same question: 1. Yours. Be more helpful, and less arrogant please. Everyone else who has contributed to this thread has been very helpful in clarifying the state of affairs and pointing out what work is and isn't being done, and alternatives to just waiting for PG do it for you. ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 08:57, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how livejournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf I have implemented similar systems in the past, it's a pretty good technique, unfortunately it's not very "Plug-and-Play" as you have to base most of your API on memcached (I imagine MySQLs NDB tables might work as well actually) for it to work well. They have 2.6 Million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use mysql on dell hardware! And found single master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to parition users into spefic database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews I don't think they are storing pre-rendered pages (or bits of) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not a hugely intensive process usually :) The only problem with memcached is that the clients clustering/ partitioning system will probably break if a node dies, and probably get confused if you add new nodes onto it as well. Easily extensible clustering (no complete redistribution of data required when you add/ remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now. It's interesting that the solution livejournal have arrived at is quite similar in ways to the way google is set up. Don't Google use indexing servers which keep track of where data is? So that you only need to update them when you add or move data, deletes don't even have to be propagated among indexes immediately really because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? David ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 09:50, Alex Stapleton wrote: On 11 May 2005, at 08:57, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how livejournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf I have implemented similar systems in the past, it's a pretty good technique, unfortunately it's not very "Plug-and-Play" as you have to base most of your API on memcached (I imagine MySQLs NDB tables might work as well actually) for it to work well. They have 2.6 Million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use mysql on dell hardware! And found single master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to parition users into spefic database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews I don't think they are storing pre-rendered pages (or bits of) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not a hugely intensive process usually :) The only problem with memcached is that the clients clustering/ partitioning system will probably break if a node dies, and probably get confused if you add new nodes onto it as well. Easily extensible clustering (no complete redistribution of data required when you add/remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now. It's interesting that the solution livejournal have arrived at is quite similar in ways to the way google is set up. Don't Google use indexing servers which keep track of where data is? So that you only need to update them when you add or move data, deletes don't even have to be propagated among indexes immediately really because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? That will teach me to RTFA first ;) Ok so LJ maintain an index of which cluster each user is on, kinda of like google do :) David ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
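A rough sketch of the "directory of which cluster each user lives on" approach mentioned above; the table layout, DSNs and cache are illustrative assumptions rather than a description of LiveJournal's actual code.

# Directory-based user partitioning: a small, authoritative "global"
# database maps user_id -> cluster, and the application caches that
# mapping. Names and DSNs below are illustrative assumptions.

CLUSTER_DSNS = {
    1: "host=cluster1 dbname=app",
    2: "host=cluster2 dbname=app",
    3: "host=cluster3 dbname=app",
}

# Stand-in for a lookup table in the global DB, e.g.
#   CREATE TABLE user_cluster (user_id int PRIMARY KEY, cluster_id int);
_directory = {101: 1, 102: 3, 103: 2}
_cache = {}

def dsn_for_user(user_id: int) -> str:
    """Resolve the cluster holding this user's rows, with a local cache.

    Unlike hash(user_id) % n_clusters, a directory lets you move one user
    or add a cluster without reshuffling everyone else, which is the
    property the discussion above is after.
    """
    if user_id not in _cache:
        _cache[user_id] = _directory[user_id]   # would be a SELECT in real life
    return CLUSTER_DSNS[_cache[user_id]]

print(dsn_for_user(102))   # host=cluster3 dbname=app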
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 23:35, PFC wrote: However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a "sticky sessions" system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request ? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. Theres no reason it couldn't be done with PHP to be fair as long as you could ensure that the client was always routed back to the same machines. Which has it's own set of issues entirely. I am not entirely sure that memcached actually does serialize data when it's comitted into memcached either, although I could be wrong, I have not looked at the source. Certainly if you can ensure that a client always goes back to the same machine you can simplify the whole thing hugely. It's generally not that easy though, you need a proxy server of some description capable of understanding the HTTP traffic and maintaining a central session lookup table to redirect with. Which isn't really solving the problem so much as moving it somewhere else. Instead of needing huge memcached pools, you need hardcore loadbalancers. Load Balancers tend to cost $ in comparison. Distributed sticky sessions are a rather nice idea, I would like to hear a way of implementing them cheaply (and on PHP) as well. I may have to give that some thought in fact. Oh yeah, and load balancers software often sucks in annoying (if not always important) ways. On Google, their distributed system spans a huge number of PCs and it has redundancy, ie. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage. Since when did Massive Data stores have nothing to do with DBs? Isn't Oracle Cluster entirely based on forming an enormous scalable disk array to store your DB on? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
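The worry above about client-side partitioning breaking when a node dies is easy to quantify for the naive scheme. Assuming the common hash-modulo-N server selection rule (which memcached clients of that era typically used; consistent hashing came later), the sketch below counts how much of the cache effectively goes cold when a four-node pool drops to three.

import hashlib

# How many keys land on a different cache node when the pool shrinks
# from 4 nodes to 3, under naive modulo hashing? Illustrative only.

def node_for(key: str, n_nodes: int) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_nodes

keys = [f"session:{i}" for i in range(100_000)]
moved = sum(node_for(k, 4) != node_for(k, 3) for k in keys)
print(f"{moved / len(keys):.1%} of cached keys move when one of four nodes dies")
# Expect roughly 75%: nearly the whole cache goes cold at once.

Consistent hashing reduces the churn to roughly the share held by the failed node, which is what the "easily extensible clustering" wish above amounts to.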
Re: [PERFORM] Partitioning / Clustering
Having local sessions is unnesesary, and here is my logic: Generaly most people have less than 100Mb of bandwidth to the internet. If you make the assertion that you are transferring equal or less session data between your session server (lets say an RDBMS) and the app server than you are between the app server and the client, an out of band 100Mb network for session information is plenty of bandwidth. This also represents OLTP style traffic, which postgresql is pretty good at. You should easily be able to get over 100Tps. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see. Why solve the complicated clustered sessions problem, when you don't really need to? Alex Turner netEconomist On 5/11/05, PFC <[EMAIL PROTECTED]> wrote: > > > > However, memcached (and for us, pg_memcached) is an excellent way to > > improve > > horizontal scalability by taking disposable data (like session > > information) > > out of the database and putting it in protected RAM. > > So, what is the advantage of such a system versus, say, a "sticky > sessions" system where each session is assigned to ONE application server > (not PHP then) which keeps it in RAM as native objects instead of > serializing and deserializing it on each request ? > I'd say the sticky sessions should perform a lot better, and if one > machine dies, only the sessions on this one are lost. > But of course you can't do it with PHP as you need an app server which > can manage sessions. Potentially the savings are huge, though. > > On Google, their distributed system spans a huge number of PCs and it > has > redundancy, ie. individual PC failure is a normal thing and is a part of > the system, it is handled gracefully. I read a paper on this matter, it's > pretty impressive. The google filesystem has nothing to do with databases > though, it's more a massive data store / streaming storage. > > ---(end of broadcast)--- > TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
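To make the bandwidth argument concrete: a centralized session store on PostgreSQL is just single-row reads and writes against one small table. This is only a sketch with invented names, not a recommendation of a particular schema.

    -- Illustrative central session table.
    CREATE TABLE sessions (
        session_id text        PRIMARY KEY,
        user_id    integer,
        data       bytea,                          -- serialized session payload
        last_seen  timestamptz NOT NULL DEFAULT now()
    );

    -- One primary-key read per page view...
    SELECT user_id, data FROM sessions WHERE session_id = 'abc123';

    -- ...and one small write to refresh it: classic OLTP traffic.
    UPDATE sessions
       SET data = 'payload'::bytea, last_seen = now()
     WHERE session_id = 'abc123';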
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 15:08, Alex Turner wrote: Having local sessions is unnesesary, and here is my logic: Generaly most people have less than 100Mb of bandwidth to the internet. If you make the assertion that you are transferring equal or less session data between your session server (lets say an RDBMS) and the app server than you are between the app server and the client, an out of band 100Mb network for session information is plenty of bandwidth. This also represents OLTP style traffic, which postgresql is pretty good at. You should easily be able to get over 100Tps. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see. Why solve the complicated clustered sessions problem, when you don't really need to? 100 hits a second = 8,640,000 hits a day. I work on a site which does > 100 million dynamic pages a day. In comparison Yahoo probably does > 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. Now considering the site I work on is not even in the top 1000 on Alexa, theres a lot of sites out there which need to solve this problem I would assume. There are also only so many hash table lookups a single machine can do, even if its a Quad Opteron behemoth. Alex Turner netEconomist On 5/11/05, PFC <[EMAIL PROTECTED]> wrote: However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a "sticky sessions" system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request ? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. On Google, their distributed system spans a huge number of PCs and it has redundancy, ie. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Partitioning / Clustering
Ok - my common sense alarm is going off here... There are only 6.446 billion people worldwide. 100 Billion page views would require every person in the world to view 18 pages of yahoo every day. Not very likely. http://www.internetworldstats.com/stats.htm suggests that there are around 1 billion people actualy on the internet. That means each and every person on the internet has to view 100 pages per day of yahoo. pretty unlikely IMHO. I for one don't even use Yahoo ;) 100 million page views per day suggests that 1 in 100 people on the internet each viewed 10 pages of a site. Thats a pretty high percentage if you ask me. If I visit 20 web sites in a day, and see an average of 10 pages per site. that means only about 2000 or so sites generate 100 million page views in a day or better. 100 million pageviews averages to 1157/sec, which we'll double for peak load to 2314. I can easily see a system doing 2314 hash lookups per second. Hell I wrote a system that could do a thousand times that four years ago on a single 1Ghz Athlon. Heck - you can get 2314 lookups/sec on a 486 ;) Given that session information doesn't _have_ to persist to storage, and can be kept in RAM. A single server could readily manage session information for even very large sites (of course over a million concurrent users could really start chewing into RAM, but if you are Yahoo, you can probably afford a box with 100GB of RAM ;). We get over 1000 tps on a dual opteron with a couple of mid size RAID arrays on 10k discs with fsync on for small transactions. I'm sure that could easily be bettered with a few more dollars. Maybe my number are off, but somehow it doesn't seem like that many people need a highly complex session solution to me. Alex Turner netEconomist On 5/12/05, Alex Stapleton <[EMAIL PROTECTED]> wrote: > > On 12 May 2005, at 15:08, Alex Turner wrote: > > > Having local sessions is unnesesary, and here is my logic: > > > > Generaly most people have less than 100Mb of bandwidth to the > > internet. > > > > If you make the assertion that you are transferring equal or less > > session data between your session server (lets say an RDBMS) and the > > app server than you are between the app server and the client, an out > > of band 100Mb network for session information is plenty of bandwidth. > > This also represents OLTP style traffic, which postgresql is pretty > > good at. You should easily be able to get over 100Tps. 100 hits per > > second is an awful lot of traffic, more than any website I've managed > > will ever see. > > > > Why solve the complicated clustered sessions problem, when you don't > > really need to? > > 100 hits a second = 8,640,000 hits a day. I work on a site which does > > 100 million dynamic pages a day. In comparison Yahoo probably does > > 100,000,000,000 (100 billion) views a day > if I am interpreting Alexa's charts correctly. Which is about > 1,150,000 a second. > > Now considering the site I work on is not even in the top 1000 on > Alexa, theres a lot of sites out there which need to solve this > problem I would assume. > > There are also only so many hash table lookups a single machine can > do, even if its a Quad Opteron behemoth. > > > > Alex Turner > > netEconomist > > > > On 5/11/05, PFC <[EMAIL PROTECTED]> wrote: > > > >> > >> > >> > >>> However, memcached (and for us, pg_memcached) is an excellent way to > >>> improve > >>> horizontal scalability by taking disposable data (like session > >>> information) > >>> out of the database and putting it in protected RAM. 
> >>> > >> > >> So, what is the advantage of such a system versus, say, a > >> "sticky > >> sessions" system where each session is assigned to ONE application > >> server > >> (not PHP then) which keeps it in RAM as native objects instead of > >> serializing and deserializing it on each request ? > >> I'd say the sticky sessions should perform a lot better, > >> and if one > >> machine dies, only the sessions on this one are lost. > >> But of course you can't do it with PHP as you need an app > >> server which > >> can manage sessions. Potentially the savings are huge, though. > >> > >> On Google, their distributed system spans a huge number of > >> PCs and it has > >> redundancy, ie. individual PC failure is a normal thing and is a > >> part of > >> the system, it is handled gracefully. I read a paper on this > >> matter, it's > >> pretty impressive. The google filesystem has nothing to do with > >> databases > >> though, it's more a massive data store / streaming storage. > >> > >> ---(end of > >> broadcast)--- > >> TIP 1: subscribe and unsubscribe commands go to > >> [EMAIL PROTECTED] > >> > >> > > > > > > ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 18:33, Josh Berkus wrote: People, In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. Also, there are bots and screen-scrapers and RSS, web e-mails, and web services and many other things which create hits but are not "people". I'm currently working on clickstream for a site which is nowhere in the top 100, and is getting 3 million real hits a day ... and we know for a fact that at least 1/4 of that is bots. I doubt bots are generally Alexa toolbar enabled. Regardless, the strategy you should be employing for a high-traffic site is that if your users hit the database for anything other than direct interaction (like filling out a web form), then you're lost. Use memcached, squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the load off the database except for the stuff that only the database can do. This is the approach I would take as well. There is no point storing stuff in a DB if you're only doing direct lookups on it and it isn't the sort of data whose integrity you care that much about. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Optimizing for writes. Data integrity not critical
Is using a ramdisk in situations like this entirely ill-advised, then? When data integrity isn't a huge issue and you really need good write performance, it seems like it wouldn't hurt too much. Unless I am missing something? On 20 May 2005, at 02:45, Christopher Kings-Lynne wrote: I'm doing the writes individually. Is there a better way? Combining them all into a transaction or something? Use COPY of course :) Or at worst bundle 1000 inserts at a time in a transaction... And if you seriously do not care about your data at all, set fsync = off in your postgresql.conf for a mega speedup. Chris ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
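For reference, the three options Chris mentions look roughly like this against a hypothetical samples(t, value) table; the table name and file path are placeholders, and COPY from a server-side file needs superuser rights.

    -- 1. Slowest: one transaction (and one fsync) per row.
    INSERT INTO samples (t, value) VALUES (now(), 1.0);

    -- 2. Much better: batch many rows per transaction.
    BEGIN;
    INSERT INTO samples (t, value) VALUES (now(), 1.0);
    INSERT INTO samples (t, value) VALUES (now(), 2.0);
    -- ... up to ~1000 rows ...
    COMMIT;

    -- 3. Fastest for bulk loads: COPY the whole batch in one statement.
    COPY samples (t, value) FROM '/tmp/samples.tsv';

Turning fsync off in postgresql.conf remains the "I really don't care about this data" switch, as Chris says.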
Re: [PERFORM] Optimizing for writes. Data integrity not critical
I am interested in optimising write performance as well; the machine I am testing on is maxing out around 450 UPDATEs a second, which is quite quick I suppose. I haven't tried turning fsync off yet. The table has... a lot of indices as well. They are mostly pretty simple partial indexes though. I would usually just shove stuff into memcached, but I need to store and sort (in realtime) tens of thousands of rows. (I am experimenting with replacing some in-house toplist-generating stuff with a PG database.) The partial indexes are basically the only thing which makes the table usable, btw. The read performance is pretty damn good, but for some reason I chose to write the benchmark script in PHP, which can totally destroy the accuracy of your results if you decide to call pg_fetch_*(); even pg_affected_rows() can skew things significantly. So, any ideas how to improve the number of writes I can do a second? The existing system sorts everything by the desired column when a request is made, and the data it sorts is updated in realtime (whilst it isn't being sorted), and it can sustain the read/write load (to memory) just fine. If PG had heap (in-memory) tables this would probably not be a problem at all, but it doesn't, so it is. Running it in a ramdisk would be acceptable; it's just annoying to create the db every time the machine goes down. And having to run the entire PG instance off the ramdisk isn't great either. On 19 May 2005, at 23:21, Steve Bergman wrote: Hi, I am using postgresql in a small (almost trivial) application in which I pull some data out of a Cobol C/ISAM file and write it into a pgsql table. My users can then use the data however they want by interfacing to the data from OpenOffice.org. The amount of data written is about 60MB and takes a few minutes on a 1200MHz Athlon with a single 60MB IDE drive running Fedora Core 3 with pgsql 7.4.7. I'd like to speed up the DB writes a bit if possible. Data integrity is not at all critical, as the database gets dropped, created, and populated immediately before each use. Filesystem is ext3, data=ordered, and I need to keep it that way as there is other data in the filesystem that I do care about. I have not done any tuning in the config file yet, and was wondering what things would likely speed up writes in this situation. I'm doing the writes individually. Is there a better way? Combining them all into a transaction or something? Thanks, Steve Bergman ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
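The partial indexes being described could look something like the sketch below; the table, columns, and predicate are invented purely to illustrate the idea of indexing only the rows a toplist query can touch.

    CREATE TABLE toplist (
        player_id integer PRIMARY KEY,
        score     integer NOT NULL,
        active    boolean NOT NULL DEFAULT true
    );

    -- Partial index: only "active" rows are indexed, which keeps the index
    -- small and cheaper to maintain under a heavy UPDATE load.
    CREATE INDEX toplist_active_score_idx ON toplist (score) WHERE active;

    -- Realtime "top N" reads can then be served from the partial index.
    SELECT player_id, score
      FROM toplist
     WHERE active
     ORDER BY score DESC
     LIMIT 100;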
Re: [PERFORM] Select performance vs. mssql
Until you start worrying about MVCC - we have had problems with the MSSQL implementation of read consistency because of this 'feature'. Alex Turner NetEconomist On 5/24/05, Bruno Wolff III <[EMAIL PROTECTED]> wrote: On Tue, May 24, 2005 at 08:36:36 -0700, mark durrant <[EMAIL PROTECTED]> wrote: --MSSQL's ability to hit the index only and not having to go to the table itself results in a _big_ performance/efficiency gain. If someone who's in development wants to pass this along, it would be a nice addition to PostgreSQL sometime in the future. I'd suspect that as well as making one query faster, it would make everything else faster/more scalable as the server load is so much less. This gets brought up a lot. The problem is that the index doesn't include information about whether the current transaction can see the referenced row. Putting this information in the index will add significant overhead to every update and the opinion of the developers is that this would be a net loss overall. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Filesystem
We have been using XFS for about 6 months now and it has even tolerated a controller card crash. So far we have mostly good things to report about XFS. I benchmarked raw throughputs at various stripe sizes, and XFS came out on top for us against reiser and ext3. I also used it because of its supposedly good support for large files, which was verified somewhat by the benchmarks. I have noticed a problem though - if you have 80 files in a directory, it seems that XFS chokes on simple operations like 'ls' or 'chmod -R ...' where ext3 doesn't; don't know about reiser. I went straight back to the default after that problem (that partition is not on a DB server though). Alex Turner netEconomist On 6/3/05, Martin Fandel <[EMAIL PROTECTED]> wrote: Hi @ all, I have only a little question. Which filesystem is preferred for postgresql? I plan to use xfs (before I used reiserfs). The reason is the xfs_freeze tool to make filesystem snapshots. Is the performance better than reiserfs, is it reliable? best regards, Martin ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] random_page_cost = 1?
Is this advisable? The disks are rather fast (15k rpm, IIRC), but somehow I don't think they are covered in whatever magic fairy dust it would take for a random read to be as cheap as a sequential one. However, I could be wrong - are there any circumstances when this is actually going to help performance? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
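One low-risk way to answer that question is to flip the setting per session and compare plans, rather than committing to it in postgresql.conf; the table and column below are placeholders.

    SET random_page_cost = 4;   -- the stock default
    EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_column = 42;

    SET random_page_cost = 1;   -- claim random reads cost the same as sequential
    EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_column = 42;

If the lower value only changes plans for queries that genuinely benefit from index scans, it may be defensible; if it merely pushes the planner into index scans that lose on large scans, it isn't.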
[PERFORM] PostgreSQL using the wrong Index
We have two indexes, like so:

l1_historical=# \d "N_intra_time_idx"
Index "N_intra_time_idx"
 Column |            Type
--------+-----------------------------
 time   | timestamp without time zone
btree

l1_historical=# \d "N_intra_pkey"
Index "N_intra_pkey"
 Column |            Type
--------+-----------------------------
 symbol | text
 time   | timestamp without time zone
unique btree (primary key)

and on queries like this: select * from "N_intra" where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) But on things where the symbol does exist in the table, it's more or less fine, and nice and fast. Whilst the option the planner has taken might be faster most of the time, the worst-case scenario is unacceptable for obvious reasons. I've googled for ways of forcing the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst-case performance? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL using the wrong Index
Oh, we are running 7.4.2 btw. And our random_page_cost = 1 On 13 Jun 2005, at 14:02, Alex Stapleton wrote: We have two index's like so l1_historical=# \d "N_intra_time_idx" Index "N_intra_time_idx" Column |Type +- time | timestamp without time zone btree l1_historical=# \d "N_intra_pkey" Index "N_intra_pkey" Column |Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this select * from "N_intra" where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) but on things where the symbol does exist in the table, it's more or less fine, and nice and fast. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL using the wrong Index
On 13 Jun 2005, at 15:47, John A Meinel wrote: Alex Stapleton wrote: Oh, we are running 7.4.2 btw. And our random_page_cost = 1 Which is only correct if your entire db fits into memory. Also, try updating to a later 7.4 version if at all possible. I am aware of this, I didn't configure this machine though unfortuantely. On 13 Jun 2005, at 14:02, Alex Stapleton wrote: We have two index's like so l1_historical=# \d "N_intra_time_idx" Index "N_intra_time_idx" Column |Type +- time | timestamp without time zone btree Just so you are aware, writing this as: "We have an index on N_intra(time) and one on N_Intra(symbol, time)" is a lot more succinct. Sorry, I happened to have them there in my clipboard at the time so I just blindly pasted them in. l1_historical=# \d "N_intra_pkey" Index "N_intra_pkey" Column |Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this select * from "N_intra" where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) but on things where the symbol does exist in the table, it's more or less fine, and nice and fast. What happens if you do: SELECT * FROM "N_intra" WHERE symbol='doesnt exist' ORDER BY symbol, time DESC LIMIT 1; Hurrah! I should of thought of this, considering i've done it in the past :) Thanks a lot, that's great. Yes, symbol is constant, but it frequently helps the planner realize it can use an index scan if you include all terms in the index in the ORDER BY clause. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance? Try the above first. You could also create a new index on symbol CREATE INDEX "N_intra_symbol_idx" ON "N_intra"(symbol); Then the WHERE clause should use the symbol index, which means it can know quickly that an entry doesn't exist. I'm not sure how many entries you have per symbol, though, so this might cause problems in the ORDER BY time portion. I'm guessing what you really want is to just do the ORDER BY symbol, time. John =:-> ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
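Pulling the two suggestions from this exchange together, against the same table:

    -- Make the ORDER BY mention every column of the (symbol, time) primary
    -- key index so the planner can walk that index instead of scanning the
    -- whole table backwards.
    SELECT *
      FROM "N_intra"
     WHERE symbol = 'SOMETHING WHICH DOESNT EXIST'
     ORDER BY symbol, time DESC
     LIMIT 1;

    -- Optionally, a plain index on symbol lets a non-existent symbol be
    -- rejected almost immediately.
    CREATE INDEX "N_intra_symbol_idx" ON "N_intra" (symbol);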
Re: [PERFORM] Multiple disks: RAID 5 or PG Cluster
Of course these numbers are not true as soon as you exceed the stripe size for a read operation, which is often only 128k. Typically a stripe of mirrors will not read from separate halves of the mirrors either, so RAID 10 is only N/2 best case in my experience; RAID 0+1 is a mirror of stripes and will read from independent halves, but gives worse redundancy. Alex Turner NetEconomist On 6/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote: Hi, At 18:00 18/06/2005, PFC wrote: I don't know what I'm talking about, but wouldn't mirroring be faster than striping for random reads like you often get on a database? (i.e. the reads can be dispatched to any disk) (of course, not for writes, but if you won't use fsync, random writes should be reduced, no?) Roughly, for random reads, the performance (in terms of operations/s) compared to a single-disk setup, with N being the number of drives, is:

RAID 0 (striping): read = N, write = N, capacity = N, redundancy = 0
RAID 1 (mirroring, N=2): read = N, write = 1, capacity = 1, redundancy = 1
RAID 5 (striping + parity, N>=3): read = N-1, write = 1/2, capacity = N-1, redundancy = 1
RAID 10 (mirroring + striping, N=2n, N>=4): read = N, write = N/2, capacity = N/2, redundancy < N/2

So depending on your app, i.e. your read/write ratio, how much data can be cached, whether the data is important or not, how much data you have, etc., one or the other option might be better. Jacques. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] autovacuum suggestions for 500,000,000+ row tables?
Hi, i'm trying to optimise our autovacuum configuration so that it vacuums / analyzes some of our larger tables better. It has been set to the default settings for quite some time. We never delete anything (well not often, and not much) from the tables, so I am not so worried about the VACUUM status, but I am wary of XID wraparound nuking us at some point if we don't sort vacuuming out so we VACUUM at least once every year ;) However not running ANALYZE for such huge periods of time is probably impacting the statistics accuracy somewhat, and I have seen some unusually slow queries at times. Anyway, does anyone think we might benefit from a more aggressive autovacuum configuration? ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?
On 20 Jun 2005, at 15:59, Jacques Caron wrote: Hi, At 16:44 20/06/2005, Alex Stapleton wrote: We never delete anything (well not often, and not much) from the tables, so I am not so worried about the VACUUM status DELETEs are not the only reason you might need to VACUUM. UPDATEs are important as well, if not more. Tables that are constantly updated (statistics, session data, queues...) really need to be VACUUMed a lot. We UPDATE it even less often. but I am wary of XID wraparound nuking us at some point if we don't sort vacuuming out so we VACUUM at least once every year ;) That would give you a maximum average of 31 transactions/sec... Don't know if that's high or low for you. It's high as far as inserts go for us. It does them all at the end of each minute. However not running ANALYZE for such huge periods of time is probably impacting the statistics accuracy somewhat, and I have seen some unusually slow queries at times. Anyway, does anyone think we might benefit from a more aggressive autovacuum configuration? ANALYZE is not a very expensive operation, however VACUUM can definitely be a big strain and take a long time on big tables, depending on your setup. I've found that partitioning tables (at the application level) can be quite helpful if you manage to keep each partition to a reasonable size (under or close to available memory), especially if the partitioning scheme is somehow time- related. YMMV. Jacques. That's not currently an option as it would require a pretty large amount of work to implement. I think we will have to keep that in mind though. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?
On 20 Jun 2005, at 18:46, Josh Berkus wrote: Alex, Hi, i'm trying to optimise our autovacuum configuration so that it vacuums / analyzes some of our larger tables better. It has been set to the default settings for quite some time. We never delete anything (well not often, and not much) from the tables, so I am not so worried about the VACUUM status, but I am wary of XID wraparound nuking us at some point if we don't sort vacuuming out so we VACUUM at least once every year ;) I personally don't use autovaccuum on very large databases. For DW, vacuuming is far better tied to ETL operations or a clock schedule of downtime. Downtime is something I'd rather avoid if possible. Do you think we will need to run VACUUM FULL occasionally? I'd rather not lock tables up unless I cant avoid it. We can probably squeeze an automated vacuum tied to our data inserters every now and then though. XID wraparound may be further away than you think. Try checking pg_controldata, which will give you the current XID, and you can calculate how long you are away from wraparound. I just tested a 200G data warehouse and figured out that we are 800 months away from wraparound, despite hourly ETL. Is this an 8.0 thing? I don't have a pg_controldata from what I can see. Thats nice to hear though. However not running ANALYZE for such huge periods of time is probably impacting the statistics accuracy somewhat, and I have seen some unusually slow queries at times. Anyway, does anyone think we might benefit from a more aggressive autovacuum configuration? Hmmm, good point, you could use autovacuum for ANALYZE only. Just set the VACUUM settings preposterously high (like 10x) so it never runs. Then it'll run ANALYZE only. I generally threshold 200, multiple 0.1x for analyze; that is, re-analyze after 200+10% of rows have changed. I will try those settings out, that sounds good to me though. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
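On installations where pg_controldata isn't available, the distance to XID wraparound can also be estimated from SQL; a query along these lines appears in the routine-maintenance documentation and works on 7.4 as well.

    -- age() reports how many transactions have passed since each database's
    -- datfrozenxid; trouble starts as this number approaches ~2 billion.
    SELECT datname, age(datfrozenxid) FROM pg_database;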
Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?
On 21 Jun 2005, at 18:13, Josh Berkus wrote: Alex, Downtime is something I'd rather avoid if possible. Do you think we will need to run VACUUM FULL occasionally? I'd rather not lock tables up unless I cant avoid it. We can probably squeeze an automated vacuum tied to our data inserters every now and then though. As long as your update/deletes are less than 10% of the table for all time, you should never have to vacuum, pending XID wraparound. Hmm, I guess as we have hundreds of millions of rows, and when we do delete things, it's only a few thousand, and rarely. VACUUMing shouldn't need to happen too often. Thats good. Thanks a lot for the advice. Is this an 8.0 thing? I don't have a pg_controldata from what I can see. Thats nice to hear though. 'fraid so, yes. Bloody Debian stable. I might have to experiment with building from source or using alien on debian to convert the rpms. Fun. Oh well. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Mount database on RAM disk?
On 8 Jul 2005, at 20:21, Merlin Moncure wrote: Stuart, I'm putting together a road map on how our systems can scale as our load increases. As part of this, I need to look into setting up some fast read only mirrors of our database. We should have more than enough RAM to fit everything into memory. I would like to find out if I could expect better performance by mounting the database from a RAM disk, or if I would be better off keeping that RAM free and increasing the effective_cache_size appropriately. If you're accessing a dedicated, read-only system with a database small enough to fit in RAM, it'll all be cached there anyway, at least on Linux and BSD. You won't be gaining anything by creating a ramdisk. ditto windows. Files cached in memory are slower than reading straight from memory but not nearly enough to justify reserving memory for your use. In other words, your O/S is a machine with years and years of engineering designed best how to dole memory out to caching and various processes. Why second guess it? Because sometimes it gets it wrong. The most brutal method is occasionally the most desirable. Even if it not the "right" way to do it. Merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Cheap RAM disk?
Also seems pretty silly to put it on a regular SATA connection, when all that can manage is 150MB/sec. If you made it connection directly to 66/64-bit PCI then it could actualy _use_ the speed of the RAM, not to mention PCI-X. Alex Turner NetEconomist On 7/26/05, John A Meinel <[EMAIL PROTECTED]> wrote: > I saw a review of a relatively inexpensive RAM disk over at > anandtech.com, the Gigabyte i-RAM > http://www.anandtech.com/storage/showdoc.aspx?i=2480 > > Basically, it is a PCI card, which takes standard DDR RAM, and has a > SATA port on it, so that to the system, it looks like a normal SATA drive. > > The card costs about $100-150, and you fill it with your own ram, so for > a 4GB (max size) disk, it costs around $500. Looking for solid state > storage devices, the cheapest I found was around $5k for 2GB. > > Gigabyte claims that the battery backup can last up to 16h, which seems > decent, if not really long (the $5k solution has a built-in harddrive so > that if the power goes out, it uses the battery power to copy the > ramdisk onto the harddrive for more permanent storage). > > Anyway, would something like this be reasonable as a drive for storing > pg_xlog? With 4GB you could have as many as 256 checkpoint segments. > > I'm a little leary as it is definitely a version 1.0 product (it is > still using an FPGA as the controller, so they were obviously pushing to > get the card into production). > > But it seems like this might be a decent way to improve insert > performance, without setting fsync=false. > > Probably it should see some serious testing (as in power spikes/pulled > plugs, etc). I know the article made some claim that if you actually > pull out the card it goes into "high consumption mode" which is somehow > greater than if you leave it in the slot with the power off. Which to me > seems like a lot of bull, and really means the 16h is only under > best-case circumstances. But even 1-2h is sufficient to handle a simple > power outage. > > And if you had a UPS with detection of power failure, you could always > sync the ramdisk to a local partition before the power goes out. Though > you could do that with a normal in-memory ramdisk (tmpfs) without having > to buy the card. Though it does give you up-to an extra 4GB of ram, for > machines which have already maxed out their slots. > > Anyway, I thought I would mention it to the list, to see if anyone else > has heard of it, or has any thoughts on the matter. I'm sure there are > some people who are using more expensive ram disks, maybe they have some > ideas about what this device is missing. (other than costing about > 1/10th the price) > > John > =:-> > > > > ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Cheap RAM disk?
Please see: http://www.newegg.com/Product/Product.asp?Item=N82E16820145309 and http://www.newegg.com/Product/Product.asp?Item=N82E16820145416 The price of Reg ECC is not significantly higher than regular ram at this point. Plus if you go with super fast 2-2-2-6 then it's actualy more than good ol 2.5 Reg ECC. Alex Turner NetEconomist On 7/26/05, PFC <[EMAIL PROTECTED]> wrote: > > > I'm a little leary as it is definitely a version 1.0 product (it is > > still using an FPGA as the controller, so they were obviously pushing to > > get the card into production). > > Not necessarily. FPGA's have become a sensible choice now. My RME > studio > soundcard uses a big FPGA. > > The performance in the test doesn't look that good, though, but don't > forget it was run under windows. For instance they get 77s to copy the > Firefox source tree on their Athlon 64/raptor ; my Duron / 7200rpm ide > drive does it in 30 seconds, but not with windows of course. > > However it doesnt' use ECC so... That's a pity, because they could > have > implemented ECC in "software" inside the chip, and have the benefits of > error correction with normal, cheap RAM. > > Well; wait and see... > > ---(end of broadcast)--- > TIP 6: explain analyze is your friend > ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Need for speed
Are you calculating aggregates, and if so, how are you doing it (I ask the question from experience of a similar application where I found that my aggregating PGPLSQL triggers were bogging the system down, and changed them so scheduled jobs instead). Alex Turner NetEconomist On 8/16/05, Ulrich Wisser <[EMAIL PROTECTED]> wrote: > Hello, > > one of our services is click counting for on line advertising. We do > this by importing Apache log files every five minutes. This results in a > lot of insert and delete statements. At the same time our customers > shall be able to do on line reporting. > > We have a box with > Linux Fedora Core 3, Postgres 7.4.2 > Intel(R) Pentium(R) 4 CPU 2.40GHz > 2 scsi 76GB disks (15.000RPM, 2ms) > > I did put pg_xlog on another file system on other discs. > > Still when several users are on line the reporting gets very slow. > Queries can take more then 2 min. > > I need some ideas how to improve performance in some orders of > magnitude. I already thought of a box with the whole database on a ram > disc. So really any idea is welcome. > > Ulrich > > > > -- > Ulrich Wisser / System Developer > > RELEVANT TRAFFIC SWEDEN AB, Riddarg 17A, SE-114 57 Sthlm, Sweden > Direct (+46)86789755 || Cell (+46)704467893 || Fax (+46)86789769 > > http://www.relevanttraffic.com > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly > ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
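For what it's worth, the "scheduled job instead of per-row triggers" pattern is just a periodic INSERT ... SELECT. The sketch below uses invented table names and assumes the raw click log keeps a timestamp column.

    BEGIN;
    -- Rebuild today's aggregate rows from the raw click log.
    DELETE FROM clicks_by_day WHERE day = current_date;
    INSERT INTO clicks_by_day (day, campaign_id, clicks)
    SELECT date_trunc('day', clicked_at)::date, campaign_id, count(*)
      FROM raw_clicks
     WHERE clicked_at >= current_date
     GROUP BY 1, 2;
    COMMIT;

Run from cron every few minutes, reporting queries then hit the small aggregate table instead of re-scanning the raw log.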
Re: [PERFORM] choosing RAID level for xlogs
Theoretically RAID 5 can perform better than RAID 10 over the same number of drives (more members form the stripe in RAID 5 than in RAID 10). All you have to do is calculate parity faster than the drives can write. Doesn't seem like a hard task really, although most RAID controllers seem incapable of doing so, it is possible that Clariion might be able to acheive it. The other factor is that for partial block writes, the array has to first read the original block in order to recalculate the parity, so small random writes are very slow. If you are writing chunks that are larger than your stripe size*(n-1), then in theory the controller doesn't have to re-read a block, and can just overwrite the parity with the new info. Consider just four drives. in RAID 10, it is a stripe of two mirrors, forming two independant units to write to. in RAID 5, it is a 3 drive stripe with parity giving three independant units to write to. Theoretically the RAID 5 should be faster, but I've yet to benchmark a controler where this holds to be true. Of course if you ever do have a drive failure, your array grinds to a halt because rebuilding a raid 5 requires reading (n-1) blocks to rebuild just one block where n is the number of drives in the array, whereas a mirror only required to read from a single spindle of the RAID. I would suggest running some benchmarks at RAID 5 and RAID 10 to see what the _real_ performance actualy is, thats the only way to really tell. Alex Turner NetEconomist On 8/16/05, Anjan Dave <[EMAIL PROTECTED]> wrote: > Yes, that's true, though, I am a bit confused because the Clariion array > document I am reading talks about how the write cache can eliminate the > RAID5 Write Penalty for sequential and large IOs...resulting in better > sequential write performance than RAID10. > > anjan > > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Tuesday, August 16, 2005 2:00 PM > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] choosing RAID level for xlogs > > Quoting Anjan Dave <[EMAIL PROTECTED]>: > > > Hi, > > > > > > > > One simple question. For 125 or more checkpoint segments > > (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or > > 170MB) on a very busy database, what is more suitable, a separate 6 > disk > > RAID5 volume, or a RAID10 volume? Databases will be on separate > > spindles. Disks are 36GB 15KRPM, 2Gb Fiber Channel. Performance is > > paramount, but I don't want to use RAID0. > > > > RAID10 -- no question. xlog activity is overwhelmingly sequential 8KB > writes. > In order for RAID5 to perform a write, the host (or controller) needs to > perform > extra calculations for parity. This turns into latency. RAID10 does > not > perform those extra calculations. > > > > > > > PG7.4.7 on RHAS 4.0 > > > > > > > > I can provide more info if needed. > > > > > > > > Appreciate some recommendations! > > > > > > > > Thanks, > > > > Anjan > > > > > > > > > > --- > > This email message and any included attachments constitute > confidential > > and privileged information intended exclusively for the listed > > addressee(s). If you are not the intended recipient, please notify > > Vantage by immediately telephoning 215-579-8390, extension 1158. In > > addition, please reply to this message confirming your receipt of the > > same in error. A copy of your email reply can also be sent to > > [EMAIL PROTECTED] Please do not disclose, copy, distribute or take > > any action in reliance on the contents of this information. 
Kindly > > destroy all copies of this message and any attachments. Any other use > of > > this email is prohibited. Thank you for your cooperation. For more > > information about Vantage, please visit our website at > > http://www.vantage.com <http://www.vantage.com/> . > > --- > > > > > > > > > > > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly > > > ---(end of broadcast)--- > TIP 4: Have you searched our list archives? > >http://archives.postgresql.org > ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] choosing RAID level for xlogs
Don't forget that often controlers don't obey fsyncs like a plain drive does. thats the point of having a BBU ;) Alex Turner NetEconomist On 8/16/05, John A Meinel <[EMAIL PROTECTED]> wrote: > Anjan Dave wrote: > > Yes, that's true, though, I am a bit confused because the Clariion array > > document I am reading talks about how the write cache can eliminate the > > RAID5 Write Penalty for sequential and large IOs...resulting in better > > sequential write performance than RAID10. > > > > anjan > > > > To give a shorter statement after my long one... > If you have enough cache that the controller can write out big chunks to > the disk at a time, you can get very good sequential RAID5 performance, > because the stripe size is large (so it can do a parallel write to all > disks). > > But for small chunk writes, you suffer the penalty of the read before > write, and possible multi-disk read (depends on what is in cache). > > RAID10 generally handles small writes better, and I would guess that > 4disks would perform almost identically to 6disks, since you aren't > usually writing enough data to span multiple stripes. > > If your battery-backed cache is big enough that you don't fill it, they > probably perform about the same (superfast) since the cache hides the > latency of the disks. > > If you start filling up your cache, RAID5 probably can do better because > of the parallelization. > > But small writes followed by an fsync do favor RAID10 over RAID5. > > John > =:-> > > > ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] choosing RAID level for xlogs
The other point that is well made is that with enough drives you will max out the PCI bus before you max out the drives. 64-bit 66Mhz can do about 400MB/sec, which can be acheived by two 3 drive stripes (6 drive in RAID 10). A true PCI-X card can do better, but can your controller? Remember, U320 is only 320MB/channel... Alex Turner NetEconomist On 8/16/05, Anjan Dave <[EMAIL PROTECTED]> wrote: > Thanks, everyone. I got some excellent replies, including some long > explanations. Appreciate the time you guys took out for the responses. > > The gist of it i take, is to use RAID10. I have 400MB+ of write cache on the > controller(s), that the RAID5 LUN(s) could benefit from by filling it up and > writing out the complete stripe, but come to think of it, it's shared among > the two Storage Processors, all the LUNs, not just the ones holding the > pg_xlog directory. The other thing (with Clariion) is the write cache > mirroring. Write isn't signalled complete to the host until the cache content > is mirrored across the other SP (and vice-versa), which is a good thing, but > this operation could potentially become a bottleneck with very high load on > the SPs. > > Also, one would have to fully trust the controller/manufacturer's claim on > signalling the write completion. And, performance is a priority over the > drive space lost in RAID10 for me. > > I can use 4 drives instead of 6. > > Thanks, > Anjan > > t-Original Message- > From: Gregory S. Williamson [mailto:[EMAIL PROTECTED] > Sent: Tue 8/16/2005 6:22 PM > To: Anjan Dave; pgsql-performance@postgresql.org > Cc: > Subject: RE: [PERFORM] choosing RAID level for xlogs > > > > I would be very cautious about ever using RAID5, despite > manufacturers' claims to the contrary. The link below is authored by a very > knowledgable fellow whose posts I know (and trust) from Informix land. > > <http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt> > > Greg Williamson > DBA > GlobeXplorer LLC > > > -Original Message- > From: [EMAIL PROTECTED] on behalf of Anjan Dave > Sent: Mon 8/15/2005 1:35 PM > To: pgsql-performance@postgresql.org > Cc: > Subject:[PERFORM] choosing RAID level for xlogs > Hi, > > > > One simple question. For 125 or more checkpoint segments > (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or > 170MB) on a very busy database, what is more suitable, a separate 6 > disk > RAID5 volume, or a RAID10 volume? Databases will be on separate > spindles. Disks are 36GB 15KRPM, 2Gb Fiber Channel. Performance is > paramount, but I don't want to use RAID0. > > > > PG7.4.7 on RHAS 4.0 > > > > I can provide more info if needed. > > > > Appreciate some recommendations! > > > > Thanks, > > Anjan > > > > > --- > This email message and any included attachments constitute > confidential > and privileged information intended exclusively for the listed > addressee(s). If you are not the intended recipient, please notify > Vantage by immediately telephoning 215-579-8390, extension 1158. In > addition, please reply to this message confirming your receipt of the > same in error. A copy of your email reply can also be sent to > [EMAIL PROTECTED] Please do not disclose, copy, distribute or take > any action in reliance on the contents of this information. Kindly > destroy all copies of this message and any attachments. Any other use > of > this email is prohibited. Thank you for your cooperation. For more > information about Vantage, please visit our website at > http://www.vantage.com <http://www.vantage.com/> . 
> --- > > > > > > !DSPAM:4300fd35105094125621296! > > > > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to >choose an index scan if your joining column's datatypes do not >match > ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] sustained update load of 1-2k/sec
I have managed tx speeds that high from postgresql going even as high as 2500/sec for small tables, but it does require a good RAID controler card (yes I'm even running with fsync on). I'm using 3ware 9500S-8MI with Raptor drives in multiple RAID 10s. The box wasn't too $$$ at just around $7k. I have two independant controlers on two independant PCI buses to give max throughput. on with a 6 drive RAID 10 and the other with two 4 drive RAID 10s. Alex Turner NetEconomist On 8/19/05, Mark Cotner <[EMAIL PROTECTED]> wrote: > Hi all, > I bet you get tired of the same ole questions over and > over. > > I'm currently working on an application that will poll > thousands of cable modems per minute and I would like > to use PostgreSQL to maintain state between polls of > each device. This requires a very heavy amount of > updates in place on a reasonably large table(100k-500k > rows, ~7 columns mostly integers/bigint). Each row > will be refreshed every 15 minutes, or at least that's > how fast I can poll via SNMP. I hope I can tune the > DB to keep up. > > The app is threaded and will likely have well over 100 > concurrent db connections. Temp tables for storage > aren't a preferred option since this is designed to be > a shared nothing approach and I will likely have > several polling processes. > > Here are some of my assumptions so far . . . > > HUGE WAL > Vacuum hourly if not more often > > I'm getting 1700tx/sec from MySQL and I would REALLY > prefer to use PG. I don't need to match the number, > just get close. > > Is there a global temp table option? In memory tables > would be very beneficial in this case. I could just > flush it to disk occasionally with an insert into blah > select from memory table. > > Any help or creative alternatives would be greatly > appreciated. :) > > 'njoy, > Mark > > > -- > Writing software requires an intelligent person, > creating functional art requires an artist. > -- Unknown > > > ---(end of broadcast)--- > TIP 3: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faq > ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] sustained update load of 1-2k/sec
Don't forget that Ultra 320 is the speed of the bus, not each drive. No matter how many honking 15k disks you put on a 320MB bus, you can only get 320MB/sec! and have so many outstanding IO/s on the bus. Not so with SATA! Each drive is on it's own bus, and you are only limited by the speed of your PCI-X Bus, which can be as high as 800MB/sec at 133Mhz/64bit. It's cheap and it's fast - all you have to do is pay for the enclosure, which can be a bit pricey, but there are some nice 24bay and even 40bay enclosures out there for SATA. Yes a 15k RPM drive will give you better seek time and better peak through put, but put them all on a single U320 bus and you won't see much return past a stripe size of 3 or 4. If it's raw transactions per second data warehouse style, it's all about the xlog baby which is sequential writes, and all about large block reads, which is sequential reads. Alex Turner NetEconomist P.S. Sorry if i'm a bit punchy, I've been up since yestarday with server upgrade nightmares that continue ;) On 8/19/05, Ron <[EMAIL PROTECTED]> wrote: > Alex mentions a nice setup, but I'm pretty sure I know how to beat > that IO subsystems HW's performance by at least 1.5x or 2x. Possibly > more. (No, I do NOT work for any vendor I'm about to discuss.) > > Start by replacing the WD Raptors with Maxtor Atlas 15K II's. > At 5.5ms average access, 97.4MB/s outer track throughput, 85.9MB/s > average, and 74.4 MB/s inner track throughput, they have the best > performance characteristics of any tested shipping HDs I know > of. (Supposedly the new SAS versions will _sustain_ ~98MB/s, but > I'll believe that only if I see it under independent testing). > In comparison, the numbers on the WD740GD are 8.1ms average access, > 71.8, 62.9, and 53.9 MB/s outer, average and inner track throughputs > respectively. > > Be prepared to use as many of them as possible (read: as many you can > afford) if you want to maximize transaction rates, particularly for > small transactions like this application seems to be mentioning. > > Next, use a better RAID card. The TOL enterprise stuff (Xyratex, > Engino, Dot-hill) is probably too expensive, but in the commodity > market benchmarks indicate that that Areca's 1GB buffer RAID cards > currently outperform all the other commodity RAID stuff. > > 9 Atlas II's per card in a RAID 5 set, or 16 per card in a RAID 10 > set, should max the RAID card's throughput and come very close to, if > not attaining, the real world peak bandwidth of the 64b 133MHz PCI-X > bus they are plugged into. Say somewhere in the 700-800MB/s range. > > Repeat the above for as many independent PCI-X buses as you have for > a very fast commodity RAID IO subsystem. > > Two such configured cards used in the dame manner as mentioned by > Alex should easily attain 1.5x - 2x the transaction numbers mentioned > by Alex unless there's a bottleneck somewhere else in the system design. > > Hope this helps, > Ron Peacetree > > At 08:40 AM 8/19/2005, Alex Turner wrote: > >I have managed tx speeds that high from postgresql going even as high > >as 2500/sec for small tables, but it does require a good RAID > >controler card (yes I'm even running with fsync on). I'm using 3ware > >9500S-8MI with Raptor drives in multiple RAID 10s. The box wasn't too > >$$$ at just around $7k. I have two independant controlers on two > >independant PCI buses to give max throughput. on with a 6 drive RAID > >10 and the other with two 4 drive RAID 10s. 
> > > >Alex Turner > >NetEconomist > > > >On 8/19/05, Mark Cotner <[EMAIL PROTECTED]> wrote: > > > Hi all, > > > I bet you get tired of the same ole questions over and > > > over. > > > > > > I'm currently working on an application that will poll > > > thousands of cable modems per minute and I would like > > > to use PostgreSQL to maintain state between polls of > > > each device. This requires a very heavy amount of > > > updates in place on a reasonably large table(100k-500k > > > rows, ~7 columns mostly integers/bigint). Each row > > > will be refreshed every 15 minutes, or at least that's > > > how fast I can poll via SNMP. I hope I can tune the > > > DB to keep up. > > > > > > The app is threaded and will likely have well over 100 > > > concurrent db connections. Temp tables for storage > > > aren't a preferred option since this is designed to be > > > a shared nothing approach and I will likely have > > > several polling processes. > > > > > > Here are some of my a
Re: [PERFORM] Avoid using swap in a cluster
On 2 Sep 2005, at 10:42, Richard Huxton wrote: Ricardo Humphreys wrote: Hi all. In a cluster, is there any way to use the main memory of the other nodes instead of the swap? If I have a query with many sub- queries and a lot of data, I can easily fill all the memory in a node. The point is: is there any way to continue using the main memory from other nodes in the same query instead of the swap? I don't know of any clustered version of PG that can spread queries over multiple machines. Can I ask what you are using? IIRC GreenPlums DeepGreen MPP (Version 2) can do it. It does cost money though, but it is a very nice product. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query take 101 minutes, help, please
On Wed, 7 Sep 2005, Meetesh Karia wrote: > PG is creating the union of January, February and March tables first and > that doesn't have an index on it. If you're going to do many queries using > the union of those three tables, you might want to place their contents into > one table and create an index on it. > > Otherwise, try something like this: > > SELECT "Rut Cliente" > FROM "Internet_Abril" > WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM > "Internet_Enero") > AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM > "Internet_Febrero") > AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM > "Internet_Marzo"); You may also wish to try: SELECT "Rut Cliente" FROM "Internet_Abril" WHERE NOT EXISTS (SELECT 1 FROM "Internet_Enero" WHERE "Internet_Enero"."Rut Cliente"="Internet_Abril"."Rut Cliente") AND NOT EXISTS (SELECT 1 FROM "Internet_Febrero" WHERE "Internet_Febrero"."Rut Cliente"="Internet_Abril"."Rut Cliente") AND NOT EXISTS (SELECT 1 FROM "Internet_Marzo" WHERE "Internet_Marzo"."Rut Cliente"="Internet_Abril"."Rut Cliente") which will probably scan the indexes on the January, February and March indexes once for each row in the April table. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
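The NOT EXISTS form above only pays off if each month's table has an index on "Rut Cliente"; if those don't exist yet, something along these lines would be needed (the index names here are arbitrary):

    CREATE INDEX "Internet_Enero_rut_idx"   ON "Internet_Enero"   ("Rut Cliente");
    CREATE INDEX "Internet_Febrero_rut_idx" ON "Internet_Febrero" ("Rut Cliente");
    CREATE INDEX "Internet_Marzo_rut_idx"   ON "Internet_Marzo"   ("Rut Cliente");
    ANALYZE;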
Re: [PERFORM] Performance considerations for very heavy INSERT traffic
Split your system into multiple partitions of RAID 10s.For max performance, ten drive RAID 10 for pg_xlog (This will max out a PCI-X bus) on Bus A, multiple 4/6Drive RAID 10s for tablespaces on Bus B. For max performance I would recommend using one RAID 10 for raw data tables, one for aggregate tables and one for indexes. More RAM will only help you with queries against your data, if you are pre-aggregating, then you may not need all that much RAM. You can easily get 100 tansacts per second with even less hardware with a little data partitioning. Choose your controller carefully as many don't co-operate with linux well. Alex Turner NetEconomist On 9/12/05, Brandon Black <[EMAIL PROTECTED]> wrote: I'm in the process of developing an application which uses PostgreSQL for data storage. Our database traffic is very atypical, and as a result it has been rather challenging to figure out how to best tune PostgreSQL on what development hardware we have, as well as to figure out exactly what we should be evaluating and eventually buying for production hardware. The vast, overwhelming majority of our database traffic is pretty much a non-stop stream of INSERTs filling up tables. It is akin to data acquisition. Several thousand clients are sending once-per-minute updates full of timestamped numerical data at our central server, which in turn performs INSERTs into several distinct tables as part of the transaction for that client. We're talking on the order of ~100 transactions per second, each containing INSERTs to multiple tables (which contain only integer and floating point columns and a timestamp column - the primary key (and only index) is on a unique integer ID for the client and the timestamp). The transaction load is spread evenly over time by having the clients send their per-minute updates at random times rather than on the exact minute mark. There will of course be users using a web-based GUI to extract data from these tables and display them in graphs and whatnot, but the SELECT query traffic will always be considerably less frequent and intensive than the incessant INSERTs, and it's not that big a deal if the large queries take a little while to run. This data also expires - rows with timestamps older than X days will be DELETEd periodically (once an hour or faster), such that the tables will reach a relatively stable size (pg_autovacuum is handling vacuuming for now, but considering our case, we're thinking of killing pg_autovacuum in favor of having the periodic DELETE process also do a vacuum of affected tables right after the DELETE, and then have it vacuum the other low traffic tables once a day while it's at it). There is an aggregation layer in place which proxies the inbound data from the clients into a small(er) number of persistent postgresql backend processes. Right now we're doing one aggregator per 128 clients (so instead of 128 seperate database connections over the course of a minute for a small transaction each, there is a single database backend that is constantly committing transactions at a rate of ~ 2/second). At a test load of ~1,000 clients, we would have 8 aggregators running and 8 postgresql backends. Testing has seemed to indicate we should aggregate even harder - the planned production load is ~5,000 clients initially, but will grow to almost double that in the not-too-distant future, and that would mean ~40 backends at 128 clients each initially. Even on 8 cpus, I'm betting 40 concurrent backends doing 2 tps is much worse off than 10 backends doing 8 tps. 
Test hardware right now is a dual Opteron with 4G of RAM, which we've barely gotten 1,000 clients running against. Current disk hardware in testing is whatever we could scrape together (4x 3ware PCI hardware RAID controllers, with 8 SATA drives in a RAID 10 array off of each - aggregated up in a 4-way stripe with the Linux md driver and then formatted as ext3 with an appropriate stride parameter and data=""). Production will hopefully be a 4-8-way Opteron, 16 or more GB of RAM, and a fibre channel hardware RAID array or two (~1TB available RAID 10 storage) with 15krpm disks and battery-backed write cache.

I know I haven't provided a whole lot of application-level detail here, but does anyone have any general advice on tweaking postgresql to deal with a very heavy load of concurrent and almost exclusively write-only transactions? Increasing shared_buffers seems to always help, even out to half of the dev box's RAM (2G). A 100ms commit_delay seemed to help, but tuning it (and _siblings) has been difficult. We're using 8.0 with the default 8k blocksize, but are strongly considering both developing against 8.1 (seems it might handle the heavy concurrency better), and re-compiling with a 32k blocksize since our storage arrays will inevitably be using fairly wide stripes. Any advice on any of this (other than drop the project while you're still a little bit sane)?

--Brandon
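For what it's worth, the table-vs-aggregate-vs-index split described above maps directly onto tablespaces on 8.0 or later (which Brandon is already running). A minimal sketch, assuming three RAID 10 volumes mounted at invented paths, with made-up table and column names standing in for one of the per-client measurement tables:

  CREATE TABLESPACE raw_data LOCATION '/vol/raid10_a/pgdata';
  CREATE TABLESPACE agg_data LOCATION '/vol/raid10_b/pgdata';
  CREATE TABLESPACE idx_data LOCATION '/vol/raid10_c/pgdata';

  -- a heavily-inserted measurement table lives on the raw-data array
  CREATE TABLE client_samples (
      client_id  integer          NOT NULL,
      sampled_at timestamptz      NOT NULL,
      reading    double precision
  ) TABLESPACE raw_data;

  -- its only index goes to the index array
  CREATE UNIQUE INDEX client_samples_pk
      ON client_samples (client_id, sampled_at) TABLESPACE idx_data;

  -- pre-aggregated rollups go to the aggregate array
  CREATE TABLE client_samples_hourly (
      client_id   integer          NOT NULL,
      hour_start  timestamptz      NOT NULL,
      avg_reading double precision
  ) TABLESPACE agg_data;

Note that pg_xlog is not covered by tablespaces; it still has to be relocated at the filesystem level (typically a symlink to the dedicated array).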
Re: [PERFORM] RAID Stripe size
I have benched different stripe sizes with different file systems, and the performance differences can be quite dramatic. Theoretically a smaller stripe is better for OLTP, as you can write more small transactions independently to more different disks more often than not, but a large stripe size is good for data warehousing, as you are often doing very large sequential reads, and a larger stripe size is going to exploit the on-drive cache as you request larger single chunks from the disk at a time. It also seems that different controllers are partial to different defaults that can affect their performance, so I would suggest that testing this on two different controller cards may be less than optimal. I would also recommend looking at the file system. For us JFS worked significantly faster than reiser for large read loads and large write loads, so we chose JFS over ext3 and reiser. I found that lower stripe sizes impacted performance badly, as did overly large stripe sizes.

Alex Turner
NetEconomist

On 16 Sep 2005 04:51:43 -0700, bmmbn <[EMAIL PROTECTED]> wrote:
Hi Everyone
The machine is an IBM x345 with ServeRAID 6i 128MB cache and 6 SCSI 15k disks.
2 disks are in RAID 1 and hold the OS, swap & pg_xlog.
4 disks are in RAID 10 and hold the cluster itself.
The DB will have two major tables, one with 10 million rows and one with 100 million rows.
All the activity against these tables will be SELECT.
Currently the stripe size is 8k. I read in many places this is a poor setting. Am I right?
Re: [PERFORM] RAID Stripe size
I have found JFS to be just fine. We have been running a medium load on this server for 9 months with no unscheduled down time. The database is about 30 gig on disk, and we get about 3-4 requests per second that generate result sets in the thousands from about 8am to about 11pm. I have found that JFS barfs if you put a million files in a directory and try to do an 'ls', but then so did reiser; only ext3 handled this test successfully. Fortunately with a database this is an atypical situation, so JFS has been fine for the DB for us so far. We have had severe problems with ext3 when file systems hit 100% usage (they get all kinds of unhappy); we haven't had the same problem with JFS.

Alex Turner
NetEconomist

On 9/20/05, Welty, Richard <[EMAIL PROTECTED]> wrote:
Alex Turner wrote:
> I would also recommend looking at the file system. For us JFS worked significantly
> faster than reiser for large read loads and large write loads, so we chose JFS
> over ext3 and reiser.
has jfs been reliable for you? there seems to be a lot of conjecture about instability, but i find jfs a potentially attractive alternative for a number of reasons.
richard
Re: [PERFORM] Need for speed 2
I have found that while the OS may flush to the controller fast with fsync=true, the controller does as it pleases (it has a BBU, so I'm not too worried), so you get great performance because your controller is determining the read/write sequence outside of what is being demanded by an fsync.

Alex Turner
NetEconomist

On 8/25/05, Kelly Burkhart <[EMAIL PROTECTED]> wrote:
On Thu, 2005-08-25 at 11:16 -0400, Ron wrote:
> ># - Settings -
> >
> >fsync = false        # turns forced synchronization on or off
> >#wal_sync_method = fsync    # the default varies across platforms:
> >                # fsync, fdatasync, open_sync, or
>
> I hope you have a battery backed write buffer!

A battery backed write buffer will do nothing here, because the OS is taking its sweet time flushing to the controller's battery backed write buffer! Isn't the reason for battery backed controller cache to make fsync()s fast?
-K
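As an aside, it's easy to double-check what the server is actually running with before blaming the controller; a quick look at pg_settings is enough. A minimal sketch, using only the standard setting names:

  SELECT name, setting
    FROM pg_settings
   WHERE name IN ('fsync', 'wal_sync_method', 'wal_buffers', 'commit_delay');

This is just a read-only sanity check; changing fsync or wal_sync_method still means editing postgresql.conf and reloading.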
Re: [PERFORM] Monitoring Postgresql performance
On 28 Sep 2005, at 15:32, Arnau wrote:
Hi all, I have been "googling" a bit, searching for info about a way to monitor postgresql (CPU & memory, number of processes, ...) and I haven't found anything relevant. I'm using munin to monitor other parameters of my servers and I'd like to include postgresql or have a similar tool. Are any of you using anything like that? All kinds of hints are welcome :-)
Cheers!

Have you looked at SNMP? It's a bit complex, but there are lots of tools for monitoring system data / sending alerts based on SNMP already.
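If munin is already in place, a small plugin that just runs a couple of statements against the statistics views covers most of what is worth graphing. A rough sketch - the database name 'mydb' is a placeholder, the block counters assume stats_block_level is on, and the query column assumes stats_command_string is on:

  -- connections, transaction counts and buffer traffic for one database
  SELECT numbackends,
         xact_commit,
         xact_rollback,
         blks_read,
         blks_hit
    FROM pg_stat_database
   WHERE datname = 'mydb';

  -- what the backends are doing right now
  SELECT procpid, current_query
    FROM pg_stat_activity;

A munin plugin then only has to run these via psql and print the numbers in munin's expected format.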
Re: [PERFORM] Indexes on ramdisk
Talk about your IO system a bit. There might be obvious ways to improve.

What system/motherboard are you using?
What controller cards are you using?
What kind of disks do you have (SATA, SCSI 7.2k 10k 15k)?
What denominations (9, 18, 36, 72, 143, 80, 160, 200, 240 gig)?
What kind of RAIDs do you have set up (how many drives, what stripe sizes, how many used for what)?
What levels of RAID are you using (0, 1, 10, 5, 50)?

With a good setup, a dual PCI-X bus motherboard can hit 2GB/sec and thousands of transactions to disk if you have controllers/disks that can keep up. That is typically enough for most people without resorting to SSD.

Alex Turner
NetEconomist

On 10/4/05, Emil Briggs <[EMAIL PROTECTED]> wrote:
I have an application that has a table that is both read and write intensive. Data from iostat indicates that the write speed of the system is the factor that is limiting performance. The table has around 20 columns and most of the columns are indexed. The data and the indices for the table are distributed over several mirrored disk partitions and pg_xlog is on another. I'm looking at ways to improve performance and besides the obvious one of getting an SSD I thought about putting the indices on a ramdisk. That means that after a power failure or shutdown I would have to recreate them but that is acceptable for this application. What I am wondering though is whether or not I would see much performance benefit and if there would be any startup problems after a power down event due to the indices not being present. Any insight would be appreciated.
Emil
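On 8.0 or later, the ramdisk experiment itself is just a tablespace; a minimal sketch, with an invented mount point and invented table/index/column names:

  -- assumes a tmpfs/ramdisk is already mounted at this (made-up) path
  CREATE TABLESPACE ram_idx LOCATION '/mnt/ramdisk/pg_indexes';

  -- rebuild the hot indexes into it
  CREATE INDEX my_table_col_idx ON my_table (col) TABLESPACE ram_idx;

The catch is exactly the one Emil mentions: after a power loss the directory is empty, and I haven't tested how gracefully the server copes with a tablespace whose contents have vanished, so rehearse the recovery path (recreate the directory, drop and recreate the indexes) before trusting it.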
Re: [PERFORM] Indexes on ramdisk
What kind of order of improvement do you need to see? What period are these numbers for? Were they collected over 1 hour, 1 day, 1 month?

How much cache do you have on the controller? You can certainly get more speed by adding more disks and possibly by adding more controller RAM / a second controller. 10 disks isn't really that many for a totally kick-ass DB server. You can achieve more block writes with RAID 10s than with RAID 1s. Whether it's cost effective depends on lots of factors like your chassis and drive enclosures etc. vs SSD. SSD will be faster, but last I heard it was expensive, and I checked a few websites but couldn't get much price info. Normally when you can't get price info, that's a bad sign ;).

If you are doing large chunks of writes to a small number of tables, then you might be better off with a single large RAID 10 for your tablespace than with separate RAID 1s. If you are writing 5 to 1 more table data than index data, you are hurting yourself by separating onto multiple RAID 1s instead of a single RAID 10, which could write at 2-3x for the 5, and 2-3x for the 1, and only suffer a single seek penalty but get data onto disk two to three times as fast (depending on how many RAID 1s you join). Try un-separating the RAID 1s and combining them into a RAID 10 for indexes and tablespaces. The controller will re-sequence your writes/reads to help with efficiency, and the dbwriter is there to make things go easier.

You can at least get some idea by doing an iostat and seeing how many IOs and how much throughput is happening. That will rapidly help determine if you are bound by IOs or by MB/sec. Worst case I'm wrong, but IMHO it's worth a try.

Alex Turner
NetEconomist

On 10/4/05, Emil Briggs <[EMAIL PROTECTED]> wrote:
> Talk about your IO system a bit. There might be obvious ways to improve.
>
> What system/motherboard are you using?
> What controller cards are you using?
> What kind of disks do you have (SATA, SCSI 7.2k 10k 15k)?
> What denominations (9, 18, 36, 72, 143, 80, 160, 200, 240 gig)?
> What kind of RAIDs do you have set up (how many drives, what stripe sizes, how many used for what)?
> What levels of RAID are you using (0, 1, 10, 5, 50)?
>
It's a quad Opteron system. The RAID controller is a 4-channel LSILogic MegaRAID 320 connected to 10 15k 36.7G SCSI disks. The disks are configured in 5 mirrored partitions. The pg_xlog is on one mirror and the data and indexes are spread over the other 4 using tablespaces. These numbers from pg_stat_user_tables are from about 2 hours earlier today, on this one table:

idx_scan       20578690
idx_tup_fetch  35866104841
n_tup_ins      1940081
n_tup_upd      1604041
n_tup_del      1880424
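For anyone trying to reproduce numbers like the ones Emil quotes, they come straight out of pg_stat_user_tables; something along these lines shows the most write-heavy tables:

  SELECT relname,
         n_tup_ins,
         n_tup_upd,
         n_tup_del,
         idx_scan,
         idx_tup_fetch
    FROM pg_stat_user_tables
   ORDER BY n_tup_ins + n_tup_upd + n_tup_del DESC
   LIMIT 10;

The counters are cumulative since the stats were last reset, so per-period figures (like "about 2 hours") come from sampling twice and subtracting.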
Re: [PERFORM] Performance on SUSE w/ reiserfs
Realise also that unless you are running the 1.5 x86-64 build, Java will not use more than 1 gig, and if the app server requests more than 1 gig, Java will die (I've been there) with an out of memory error, even though there is plenty of free mem available. This can easily be caused by a lazy GC thread if the application is running high on CPU usage.

The kernel will not report memory used for caching pages as being unavailable; if a program calls malloc, the kernel will just swap out the oldest disk page and give the memory to the application. Your free -mo shows 3 gig free even with cached disk pages. It looks to me more like either a Java problem or a kernel problem...

Alex Turner
NetEconomist

On 10/10/05, Jon Brisbin <[EMAIL PROTECTED]> wrote:
Tom Lane wrote:
> Are you sure it's not cached data pages, rather than cached inodes?
> If so, the above behavior is *good*.
>
> People often have a mistaken notion that having near-zero free RAM means
> they have a problem. In point of fact, that is the way it is supposed
> to be (at least on Unix-like systems). This is just a reflection of the
> kernel doing what it is supposed to do, which is to use all spare RAM
> for caching recently accessed disk pages. If you're not swapping then
> you do not have a problem.

Except for the fact that my Java app server crashes when all the available memory is being used by caching and not reclaimed :-) If it wasn't for the app server going down, I probably wouldn't care.

--
Jon Brisbin
Webmaster
NPC International, Inc.
Re: [PERFORM] Performance on SUSE w/ reiserfs
Perhaps this is true for 1.5 on x86-32 (I've only used it on x86-64), but I was thinking more of 1.4, which many folks are still using.

Alex

On 10/11/05, Alan Stange <[EMAIL PROTECTED]> wrote:
Alex Turner wrote:
> Realise also that unless you are running the 1.5 x86-64 build, java
> will not use more than 1Gig, and if the app server requests more than
> 1gig, Java will die (I've been there) with an out of memory error,
> even though there is plenty of free mem available. This can easily be
> caused by a lazy GC thread if the application is running high on CPU usage.

On my side of Planet Earth, the standard non-x64 1.5 JVM will happily use more than 1G of memory (on Linux and Solaris, can't speak for Windows). If you're running larger programs, it's probably a good idea to use the -server compiler in the JVM as well. I regularly run with -Xmx1800m and regularly have >1GB heap sizes.

The standard GC will not cause an OOM error if space remains for the requested object. The GC thread blocks all other threads during its activity, whatever else is happening on the machine. The newer/experimental GCs did have some potential race conditions, but I believe those have been resolved in the 1.5 JVMs.

Finally, note that the latest _05 release of the 1.5 JVM also now supports large page sizes on Linux and Windows: -XX:+UseLargePages can be quite beneficial depending on the memory patterns in your programs.

-- Alan
Re: [PERFORM] Performance on SUSE w/ reiserfs
Well - to each his own I guess - we did extensive testing on 1.4, and it refused to allocate much past 1 gig on both Linux x86/x86-64 and Windows.

Alex

On 10/11/05, Alan Stange <[EMAIL PROTECTED]> wrote:
Alex Turner wrote:
> Perhaps this is true for 1.5 on x86-32 (I've only used it on x86-64)
> but I was more thinking 1.4 which many folks are still using.

The 1.4.x JVMs will also work just fine with much more than 1GB of memory. Perhaps you'd like to try again?

-- Alan

[earlier quoted text from this thread snipped]
Re: [PERFORM] What gets cached?
Oracle uses an LRU caching algorithm also, not LFU.

Alex

On 10/21/05, Martin Nickel <[EMAIL PROTECTED]> wrote:
I was reading a comment in another posting and it started me thinking about this. Let's say I start up an Oracle server. All my queries are a little bit (sometimes a lot bit) slow until it gets its "normal" things in memory, then it's up to speed. The "normal" things would include some small lookup tables and the indexes for the most frequently used tables.

Let's say I do the same thing in Postgres. I'm likely to have my very fastest performance for the first few queries until memory gets filled up. The only time Postgres seems to take advantage of cached data is when I repeat the same (or substantially the same) query. I don't know of any way to view what is actually cached at any point in time, but it seems like "most recently used" rather than "most frequently used". Does this seem true?
s
Re: [PERFORM] Used Memory
[snip] to the second processor in my dual Xeon eServer) has got me to the point that the perpetually high memory usage doesn't affect my application server.

I'm curious - how does the high memory usage affect your application server?

Alex
Re: [PERFORM] What gets cached?
Just to play devil's advocate here for a second: if we have an algorithm that is substantially better than just plain old LRU, which is what I believe the kernel is going to use to cache pages (I'm no kernel hacker), then why don't we apply that and have a significantly larger page cache a la Oracle?

Alex

On 10/21/05, Neil Conway <[EMAIL PROTECTED]> wrote:
On Fri, 2005-21-10 at 07:34 -0500, Martin Nickel wrote:
> Let's say I do the same thing in Postgres. I'm likely to have my very
> fastest performance for the first few queries until memory gets filled up.

No, you're not: if a query doesn't hit the cache (both the OS cache and the Postgres userspace cache), it will run slower. If the caches are empty when Postgres starts up (which is true for the userspace cache and might be true of the OS cache), the first queries that are run should be slower, not faster.

> The only time Postgres seems to take advantage of cached data is when I
> repeat the same (or substantially the same) query.

Caching is done on a page-by-page basis -- the source text of the query itself is not relevant. If two different queries happen to hit a similar set of pages, they will probably both benefit from the same set of cached pages.

> I don't know of any way to view what is actually cached at any point
> in time, but it seems like "most recently used" rather than "most
> frequently used".

The cache replacement policy in 7.4 and older releases is simple LRU. The policy in 8.0 is ARC (essentially a version of LRU modified to try to retain hot pages more accurately). The policy in 8.1 is a clock-based algorithm.

-Neil
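On the "no way to view what is actually cached" point: you can't see the kernel's page cache from SQL, but the block-level statistics views at least show how often each table and index is being served from PostgreSQL's own buffers. A rough sketch, assuming stats_block_level is turned on:

  SELECT relname,
         heap_blks_read,
         heap_blks_hit,
         idx_blks_read,
         idx_blks_hit
    FROM pg_statio_user_tables
   ORDER BY heap_blks_read + idx_blks_read DESC
   LIMIT 10;

A "hit" only means the block was found in shared_buffers; a "read" may still be served from the kernel cache rather than disk, so treat this as an indicator rather than a full picture of what is cached.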
Re: [PERFORM] Is There Any Way ....
This is possible with Oracle utilizing the keep pool:

alter table t_name storage ( buffer_pool keep );

If Postgres were to implement its own caching system, this seems like it would be easy to implement (beyond the initial caching effort). (A crude keep-warm workaround is sketched at the end of this thread.)

Alex

On 10/24/05, Craig A. James <[EMAIL PROTECTED]> wrote:
> "Jim C. Nasby" wrote:
> > Stefan Weiss wrote:
> > > ... IMO it would be useful to have a way to tell
> > > PG that some tables were needed frequently, and should be cached if
> > > possible. This would allow application developers to consider joins with
> > > these tables as "cheap", even when querying on columns that are
> > > not indexed.
> >
> > Why do you think you'll know better than the database how frequently
> > something is used? At best, your guess will be correct and PostgreSQL
> > (or the kernel) will keep the table in memory. Or, your guess is wrong
> > and you end up wasting memory that could have been used for something
> > else.
> >
> > It would probably be better if you describe why you want to force this
> > table (or tables) into memory, so we can point you at more appropriate
> > solutions.
>
> Or perhaps we could explain why we NEED to force these tables into memory, so
> we can point you at a more appropriate implementation. ;-)
>
> Ok, wittiness aside, here's a concrete example. I have an application with
> one critical index that MUST remain in memory at all times. The index's
> tablespace is about 2 GB. As long as it's in memory, performance is
> excellent - a user's query takes a fraction of a second. But if it gets
> swapped out, the user's query might take up to five minutes as the index is
> re-read from disk.
>
> Now here's the rub. The only performance I care about is response to queries
> from the web application. Everything else is low priority. But there is
> other activity going on. Suppose, for example, that I'm updating tables,
> performing queries, doing administration, etc., etc., for a period of an
> hour, during which no customer visits the site. Then another customer comes
> along and performs a query.
>
> At this point, no heuristic in the world could have guessed that I DON'T CARE
> ABOUT PERFORMANCE for anything except my web application. The performance of
> all the other stuff, the administration, the updates, etc., is utterly
> irrelevant compared to the performance of the customer's query.
>
> What actually happens is that the other activities have swapped out the
> critical index, and my customer waits, and waits, and waits... and goes away
> after a minute or two. To solve this, we've been forced to purchase two
> computers, and mirror the database on both. All administration and
> modification happens on the "offline" database, and the web application only
> uses the "online" database. At some point, we swap the two servers, sync the
> two databases, and carry on. It's a very unsatisfactory solution.
>
> There is ONLY one way to convey this sort of information to Postgres, which
> is to provide the application developer a mechanism to explicitly name the
> tables that should be locked in memory.
>
> Look at tsearchd that Oleg is working on. It's a direct response to this
> problem.
>
> It's been recognized for decades that, as kernel developers (whether a Linux
> kernel or a database kernel), our ability to predict the behavior of an
> application is woefully inadequate compared with the application developer's
> knowledge of the application. Computer Science simply isn't a match for the
> human brain yet, not even close.
> To give you perspective, since I posted a question about this problem
> (regarding tsearch2/GIST indexes), half of the responses I received told me
> that they encountered this problem, and their solution was to use an external
> full-text engine. They all confirmed that Postgres can't deal with this
> problem yet, primarily for the reasons outlined above.
>
> Craig
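Short of a keep-pool equivalent, the crude workaround people resort to is keeping the critical index warm by touching it on a schedule (for example from cron via psql). A sketch of the idea with invented table and column names; whether it actually prevents eviction depends on the kernel and on shared_buffers pressure, so treat it as an experiment rather than a guarantee:

  -- force a scan over the whole index range
  SET enable_seqscan = off;
  SELECT count(*) FROM documents WHERE doc_id > 0;
  RESET enable_seqscan;

  -- note: without index-only scans this also reads the underlying heap
  -- rows, which may be more I/O than you want during busy periods

It is a blunt instrument, which is exactly why the keep-pool style of hint keeps coming up on this list.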
Re: [PERFORM] browsing table with 2 million records
You could also create your own index, so to speak, as a table that simply contains a list of primary keys and an order value field that you can use as your offset. This can be kept in sync with the master table using triggers pretty easily. 2 million is not very much if you only have an integer pkey and an integer order value; then you can join it against the main table.

create table my_index_table (
    primary_key_value int,
    order_val int,
    primary key (primary_key_value)
);
create index my_index_table_order_val_i on my_index_table (order_val);

select a.*
from main_table a, my_index_table b
where b.order_val >= 25
  and b.order_val < 50
  and a.primary_key_value = b.primary_key_value;

If the data updates a lot then this won't work as well, though, as the index table will require frequent updates to a potentially large number of records (although a small number of pages, so it still won't be horrible).

Alex Turner
NetEconomist

On 10/26/05, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> > We have a GUI that lets users browse through the records page by page at
> > about 25 records a time. (Don't ask me why but we have to have this
> > GUI). This translates to something like
> >
> > select count(*) from table <-- to give feedback about the DB size
>
> Do you have an integer field that is an ID that increments? E.g. serial?
>
> > select * from table order by date limit 25 offset 0
>
> You could use a cursor.
>
> Sincerely,
>
> Joshua D. Drake
>
> > Tables seem properly indexed, with vacuum and analyze run regularly.
> > Still these very basic SQLs take up to a minute to run.
> >
> > I read some recent messages that select count(*) would need a table
> > scan for Postgres. That's disappointing. But I can accept an
> > approximation if there is some way to do so. But how can I optimize
> > select * from table order by date limit x offset y? One minute
> > response time is not acceptable.
> >
> > Any help would be appreciated.
> >
> > Wy
>
> --
> The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> Managed Services, Shared and Dedicated Hosting
> Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
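On the "I can accept an approximation" point: the planner's row estimate is already stored in the catalog, so an approximate count is essentially free as long as the table is vacuumed/analyzed regularly. A minimal sketch (substitute the real table name):

  SELECT reltuples::bigint AS approx_row_count
    FROM pg_class
   WHERE relname = 'table';

It only changes when VACUUM or ANALYZE runs, so it lags the true count, but for "give feedback about the DB size" it is usually close enough.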