Re: [PERFORM] SELECT DISTINCT Performance Issue
Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but the performance was no better: select PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc LIMIT 1 The DISTINCT query will pull out all the rows and keep only one, so the one with LIMIT should be faster. Can you post explain analyze of the LIMIT query ? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Need help to decide Mysql vs Postgres
If you want something more "embedded" in your application, you could consider : http://firebird.sourceforge.net/ http://hsqldb.sourceforge.net/ http://sqlite.org/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance nightmare with dspam (urgent) (resolved)
On Wednesday 01 June 2005 20:19, Casey Allen Shobe wrote: > We've seen PostgreSQL performance as a dspam database be simply stellar on > some machines with absolutely no tuning to the postgres.conf, and no > statistics target altering. Wow. That took a phenomenally long time to post. I asked on IRC, and they said it is "normal" for the PG lists to be so horribly slow. What gives? I think you guys really need to stop using majordomo, but I'll avoid blaming that for the time being. Maybe it's a good time for the performance crew to look at the mailing list software instead of just PG. > We had set up about 200 domains on a SuperMicro P4 2.4GHz server, and it was > working great too (without the above tweak!), but then the motherboard > started having issues and the machine would lock up every few weeks. So we > moved everything to a brand new SuperMicro P4 3.0GHz server last week, and > now performance is simply appalling. Well, we actually added about 10 more domains right around the time of the move, not thinking anything of it. Turns out that simply set the disk usage over the threshold of what the drive could handle. At least, that's the best guess of the situation - I don't really know whether to believe that, because the old machine had a 3-disk RAID5, so it should have been half the speed of the new machine. However, analyzing the statements showed that they were all using index scans as they should, and no amount of tuning managed to reduce the I/O to an acceptable level. After lots of tuning, we moved pg_xlog onto a separate disk and switched dspam from TEFT to TOE mode (which reduces the number of inserts). By doing this, the immediate problem was alleviated. Indeed the suggestion in the link in my previous email to add an extra index was a BAD idea, since it increased the amount of work that had to be done per write, and didn't help anything.
Long-term, whenever we hit the I/O limit again, it looks like we really don't have much of a solution except to throw more hardware (mainly lots of disks in RAID0's) at the problem. :( Fortunately, with the above two changes I/O usage on the PG data disk is a quarter of what it was, so theoretically we should be able to quadruple the number of users on current hardware. Our plan forward is to increase the number of disks in the two redundant mail servers, so that each has a single ultra320 disk for O/S and pg_xlog, and a 3-disk RAID0 for the data. This should triple our current capacity. The general opinion of the way dspam uses the database among people I've talked to on #postgresql is not very good, but of course the dspam folk blame PostgreSQL and say to use MySQL if you want reasonable performance. Makes it real fun to be a DSpam+PostgreSQL user when limits are reached, since everyone denies responsibility. Fortunately, PostgreSQL people are pretty helpful even if they think the client software sucks. :) Cheers, -- Casey Allen Shobe | http://casey.shobe.info [EMAIL PROTECTED] | cell 425-443-4653 AIM & Yahoo: SomeLinuxGuy | ICQ: 1494523 SeattleServer.com, Inc. | http://www.seattleserver.com ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] How to avoid database bloat
> Looked like pg_autovacuum is operating as expected. One of the annoying > limitations of pg_autovacuum in current releases is that you can't set > thresholds on a per table basis. It looks like this table might require > an even more aggressive vacuum threshold. Couple of thoughts, are you > sure it's the table that is growing and not the indexes? (assuming this > table has indexes on it). Yes I am sure (oid2name :) ). > > And one more question - anyway why table keeps growing? It is shown that > >it occupies > ><1 pages and max_fsm_pages = 20 so vacuum should keep up with the > >changes? > >Or is it too low according to pg_class system table? What should be the > >reasonable value? > > > > > > Does the table keep growing? Or does it grow to a point and then stop > growing? It's normal for a table to operate at a steady-state size that > is bigger than its freshly "vacuum full"'d size. And with -V set at 0.5 > it should be at a minimum 50% larger than its minimum size. Your email > before said that this table went from 20M to 70M but does it keep > going? Perhaps it would start leveling off at this point, or some point > shortly thereafter. Yes it keeps growing. And the main problem is that performance starts to suffer from that. Do not forget that we are talking about 100+ insert/ update/select/delete cycles per second. > Anyway, I'm not sure if there is something else going on here, but from > the log it looks as though pg_autovacuum is working as advertised. Something is out there :). But how to fix that bloat? More aggressive autovacuum settings? Even larger FSM? Do not know if that matters, but the database has very many connections to it (400-600) and clients are doing mostly asynchronous operations. How to find out where this extra space has gone? Thanks, Mindaugas ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
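For illustration, a more aggressive contrib pg_autovacuum invocation along the lines discussed might look like the following. (Flag names are from the 7.4/8.0-era contrib module; the values here are illustrative guesses, not recommendations, and the thresholds apply globally since per-table settings aren't available in these releases.)

```
# Run pg_autovacuum as a daemon with lower vacuum thresholds:
#   -v  base vacuum threshold (rows)
#   -V  vacuum scaling factor (fraction of reltuples; thread used 0.5)
#   -s  base sleep time between loops (seconds)
pg_autovacuum -D -v 1000 -V 0.2 -s 30
```

Raising the free space map in postgresql.conf (max_fsm_pages, max_fsm_relations) so it can track all dead tuples between vacuums is the other knob mentioned above.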
Re: [PERFORM] Performance nightmare with dspam (urgent) (resolved)
Casey Allen Shobe wrote: > On Wednesday 01 June 2005 20:19, Casey Allen Shobe wrote: > ... > Long-term, whenever we hit the I/O limit again, it looks like we really don't > have much of a solution except to throw more hardware (mainly lots of disks > in RAID0's) at the problem. :( Fortunately, with the above two changes I/O > usage on the PG data disk is a quarter of what it was, so theoretically we > should be able to quadruple the number of users on current hardware. > Be very careful in this situation. If any disks in a RAID0 fails, the entire raid is lost. You *really* want a RAID10. It takes more drives, but then if anything dies you don't lose everything. If you are running RAID0 and you *really* want performance, and aren't concerned about safety (at all), you could also set fsync=false. That should also speed things up. But you are really risking corruption/data loss on your system. > Our plan forward is to increase the number of disks in the two redundant mail > servers, so that each has a single ultra320 disk for O/S and pg_xlog, and a > 3-disk RAID0 for the data. This should triple our current capacity. I don't know if you can do it, but it would be nice to see this be 1 RAID1 for OS, 1 RAID10 for pg_xlog, and another RAID10 for data. That is the recommended performance layout. It takes quite a few drives (minimum of 10). But it means your data is safe, and your performance should be very good. > > The general opinion of the way dspam uses the database among people I've > talked to on #postgresql is not very good, but of course the dspam folk blame > PostgreSQL and say to use MySQL if you want reasonable performance. Makes it > real fun to be a DSpam+PostgreSQL user when limits are reached, since > everyone denies responsibility. Fortunately, PostgreSQL people are pretty > helpful even if they think the client software sucks. :) > I can't say how dspam uses the database. 
But they certainly could make assumptions about how certain actions are done by the db, which are not quite true with postgres. (For instance, MySQL can answer some queries from an index alone; because Postgres supports transactions, it cannot, since even though a row is in the index, it may not be visible to the current transaction.) They also might be doing stuff like "select max(row)" instead of "select row ORDER BY row DESC LIMIT 1". In postgres the former will be a sequential scan, the latter will be an index scan. Though I wonder about "select max(row) ORDER BY row DESC LIMIT 1". To me, that should still return the right answer, but I'm not sure. > Cheers, Good luck, John =:->
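A minimal sketch of the rewrite being described, with a hypothetical table and column (on 7.x/8.0, before the 8.1 min/max index optimization):

```sql
-- Aggregate form: planner falls back to a sequential scan.
SELECT max(id) FROM big_table;

-- Equivalent workaround: walks the index on id backward and stops
-- after the first visible row.
SELECT id FROM big_table ORDER BY id DESC LIMIT 1;
```

Running EXPLAIN on both forms should show the difference: a Seq Scan with an Aggregate node for the first, versus a Limit over an Index Scan Backward for the second.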
Re: [PERFORM] Performance nightmare with dspam (urgent) (resolved)
PostgreSQL and say to use MySQL if you want reasonable performance. If you want MySQL performance and reliability with postgres, simply run it with fsync deactivated ;) I'd suggest a controller with battery backed up cache to get rid of the 1 commit = 1 seek boundary. Makes it real fun to be a DSpam+PostgreSQL user when limits are reached, since everyone denies responsibility. Fortunately, PostgreSQL people are pretty helpful even if they think the client software sucks. :) Cheers, ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance nightmare with dspam (urgent) (resolved)
On Mon, Jun 06, 2005 at 10:08:23AM -0500, John A Meinel wrote: I don't know if you can do it, but it would be nice to see this be 1 RAID1 for OS, 1 RAID10 for pg_xlog, That's probably overkill--it's a relatively small sequential-write partition with really small writes; I don't see how pg_xlog would benefit from raid10 as opposed to raid1. Mike Stone ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] SELECT DISTINCT Performance Issue
At 19:45 05/06/06, PFC wrote: > Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but the performance was no better: select PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc LIMIT 1 > The DISTINCT query will pull out all the rows and keep only one, so the one with LIMIT should be faster. Can you post explain analyze of the LIMIT query? Actually the problem with the LIMIT 1 query is when we use views with the LIMIT 1 construct. The direct SQL is ok:

esdt=> explain analyze select PlayerID,AtDate from Player where PlayerID='0'
       order by PlayerID desc, AtDate desc LIMIT 1;

 Limit  (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Index Scan Backward using pk_player on player  (cost=0.00..16074.23 rows=11770 width=23) (actual time=0.000..0.000 rows=1 loops=1)
         Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.000 ms

esdt=> create or replace view VCurPlayer3 as
       select * from Player a
       where AtDate = (select b.AtDate from Player b
                       where a.PlayerID = b.PlayerID
                       order by b.PlayerID desc, b.AtDate desc LIMIT 1);

esdt=> explain analyze select PlayerID,AtDate,version from VCurPlayer3 where PlayerID='0';

 Index Scan using pk_player on player a  (cost=0.00..33072.78 rows=59 width=27) (actual time=235.000..235.000 rows=1 loops=1)
   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
     ->  Limit  (cost=0.00..1.44 rows=1 width=23) (actual time=0.117..0.117 rows=1 loops=1743)
           ->  Index Scan Backward using pk_player on player b  (cost=0.00..14023.67 rows=9727 width=23) (actual time=0.108..0.108 rows=1 loops=1743)
                 Index Cond: (($0)::text = (playerid)::text)
 Total runtime: 235.000 ms

The problem appears to be in the loops=1743: the subplan is re-run for all 1743 data records for that player. Regards, KC. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Need help to decide Mysql vs Postgres
Re: your JDBC wishes: Consider IBM Cloudscape (now Apache Derby) too, which has an apache license. It's all pure java and it's easy to get going. As to MySql vs Postgres: license issues aside, if you have transactionally complex needs (multi-table updates, etc), PostgreSQL wins hands down in my experience. There are a bunch of things about MySQL that just suck for high end SQL needs. (I like my subqueries, and I absolutely demand transactional integrity). There are some pitfalls to pgsql though, especially for existing SQL code using MAX and some other things which can really be blindsided (performance-wise) by pgsql if you don't use the workarounds. MySQL is nice for what I call "raw read speed" applications. But that license is an issue for me, as it is for you apparently. Some cloudscape info: http://www-306.ibm.com/software/data/cloudscape/ Some info on pitfalls of MySQL and PostgreSQL, an interesting contrast: http://sql-info.de/postgresql/postgres-gotchas.html http://sql-info.de/mysql/gotchas.html ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Need help to decide Mysql vs Postgres
Hi all, Thanks for your replies. I ran a very preliminary test, and found the following results. I feel they are weird and I don't know what I am doing wrong!!! I made a schema with 5 tables. I have a master data table with foreign keys pointing to the other 4 tables. The master data table has around 4 million records. When I run a select joining it with the baby tables, postgres -> returns results in 2.8 seconds mysql -> takes around 16 seconds (This is with myisam ... with innodb it takes 220 seconds) I am all for postgres at this point, however just want to know why I am getting opposite results!!! Both DBs are on the same machine Thanks, Amit -Original Message- From: Jeffrey Tenny [mailto:[EMAIL PROTECTED] Sent: Monday, June 06, 2005 11:51 AM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Need help to decide Mysql vs Postgres Re: your JDBC wishes: Consider IBM Cloudscape (now Apache Derby) too, which has an apache license. It's all pure java and it's easy to get going. As to MySql vs Postgres: license issues aside, if you have transactionally complex needs (multi-table updates, etc), PostgreSQL wins hands down in my experience. There are a bunch of things about MySQL that just suck for high end SQL needs. (I like my subqueries, and I absolutely demand transactional integrity). There are some pitfalls to pgsql though, especially for existing SQL code using MAX and some other things which can really be blindsided (performance-wise) by pgsql if you don't use the workarounds. MySQL is nice for what I call "raw read speed" applications. But that license is an issue for me, as it is for you apparently. 
---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance nightmare with dspam (urgent) (resolved)
On Monday 06 June 2005 15:08, John A Meinel wrote: > Be very careful in this situation. If any disks in a RAID0 fails, the > entire raid is lost. You *really* want a RAID10. It takes more drives, > but then if anything dies you don't lose everything. We have redundancy at the machine level using DRBD, so this is not a concern. > I don't know if you can do it, but it would be nice to see this be 1 > RAID1 for OS, 1 RAID10 for pg_xlog, and another RAID10 for data. That is > the recommended performance layout. It takes quite a few drives (minimum > of 10). But it means your data is safe, and your performance should be > very good. The current servers have 4 drive bays, and we can't even afford to fill them all right now...we just invested what amounts to "quite a lot" on our budget for these 2 servers, so replacing them is not an option at all right now. I think the most cost-effective road forward is to add 2 more drives to each of the existing servers (which currently have 2 each). Cheers, -- Casey Allen Shobe | http://casey.shobe.info [EMAIL PROTECTED] | cell 425-443-4653 AIM & Yahoo: SomeLinuxGuy | ICQ: 1494523 SeattleServer.com, Inc. | http://www.seattleserver.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance nightmare with dspam (urgent) (resolved)
Michael Stone wrote: > On Mon, Jun 06, 2005 at 10:08:23AM -0500, John A Meinel wrote: > >> I don't know if you can do it, but it would be nice to see this be 1 >> RAID1 for OS, 1 RAID10 for pg_xlog, > > > That's probably overkill--it's a relatively small sequential-write > partition with really small writes; I don't see how pg_xlog would > benefit from raid10 as opposed to raid1. > Mike Stone > pg_xlog benefits from being super fast. Because it has to be fully synced before the rest of the data can be committed. Yes they are small, but if you can make it fast, you eliminate that overhead. It also benefits from having its own spindle, because you eliminate the seek time. (Since it is always appending) Anyway, my point is that pg_xlog isn't necessarily tiny. Many people seem to set it as high as 100-200 checkpoint segments, and each one is 16MB. But one other thing to consider is to put pg_xlog on a battery backed ramdisk. Because it really *can* use the extra speed. I can't say that a ramdisk is more cost effective than faster db disks. But if you aren't using many checkpoint_segments, it seems like you could get a 1GB ramdisk and probably have a pretty good performance boost. (I have not tested this personally, though). Since he is using the default settings (mostly) for dspam, he could probably get away with something like a 256MB ramdisk. The only prices I could find with a few minutes of googling were: http://www.cenatek.com/store/category.cfm?Category=15 Which is $1.6k for 2GB. But there is also a product that is being developed, which claims $60 for the PCI card, you supply the memory. It has 4 DDR slots http://www.engadget.com/entry/1234000227045399/ And you can get a 128MB SDRAM ECC module for around $22 http://www.newegg.com/Product/Product.asp?Item=N82E16820998004 So that would put the total cost of a 512MB battery backed ramdisk at $60 + 4*$22 = $148, call it $150. That certainly seems less than what you would pay for the same speed in hard-drives. 
Unfortunately the Gigabyte i-RAM seems to just be in the demo stage. But if they aren't lying in the press releases, it would certainly be something to keep an eye on. John =:->
Re: [PERFORM] Need help to decide Mysql vs Postgres
On Mon, Jun 06, 2005 at 12:00:08PM -0400, Amit V Shah wrote: > I made a schema with 5 tables. I have a master data table with foreign keys > pointing to other 4 tables. Master data table has around 4 million records. > When I run a select joining it with the baby tables, > > postgres -> returns results in 2.8 seconds > mysql -> takes around 16 seconds (This is with myisam ... with innodb > it takes 220 seconds) PostgreSQL has an excellent query optimizer, so if you get a much better execution time than MySQL in complex queries this isn't at all unexpected. I assume the MySQL guys would tell you to rewrite the queries in certain ways to make it go faster (just like the Postgres guys tell people to rewrite certain things when they hit Postgres limitations.) -- Alvaro Herrera () "I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Need help to decide Mysql vs Postgres
On Mon, 2005-06-06 at 12:00 -0400, Amit V Shah wrote: > Hi all, > > Thanks for your replies. > > I ran a very prelimnary test, and found following results. I feel they are > wierd and I dont know what I am doing wrong !!! > > I made a schema with 5 tables. I have a master data table with foreign keys > pointing to other 4 tables. Master data table has around 4 million records. > When I run a select joining it with the baby tables, > > postgres -> returns results in 2.8 seconds > mysql -> takes around 16 seconds (This is with myisam ... with innodb > it takes 220 seconds) We said MySQL was faster for simple selects and non-transaction inserts on a limited number of connections. Assuming you rebuilt statistics in MySQL (myisamchk -a), I would presume that PostgreSQL's more mature optimizer has come into play in the above 5 table join test by finding a better (faster) way of executing the query. If you post EXPLAIN ANALYZE output for the queries, we might be able to tell you what they did differently. > I am all for postgres at this point, however just want to know why I am > getting opposite results !!! Both DBs are on the same machine If possible, it would be wise to run a performance test with the expected load you will receive. If you expect to have 10 clients perform operation X at a time, then benchmark that specific scenario. Both PostgreSQL and MySQL will perform differently in a typical real load situation than with a single user, single query situation. > -Original Message- > From: Jeffrey Tenny [mailto:[EMAIL PROTECTED] > Sent: Monday, June 06, 2005 11:51 AM > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Need help to decide Mysql vs Postgres > > > Re: your JDBC wishes: Consider IBM Cloudscape (now Apache Derby) too, > which has an apache license. It's all pure java and it's easy to get going. 
-- ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Need help to decide Mysql vs Postgres
> I am all for postgres at this point, however just want to know why I am > getting opposite results !!! Both DBs are on the same machine > Why do you say "opposite results" ? Please pardon my ignorance, but from whatever I had heard, mysql was supposedly always faster than postgres. That's why I was so surprised!! I will definitely post the "analyze query" thing by end of today... Thanks for all your help!! Amit ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Need help to decide Mysql vs Postgres
postgres -> returns results in 2.8 seconds What kind of plan does it do ? seq scan on the big tables and hash join on the small tables ? mysql -> takes around 16 seconds (This is with myisam ... with innodb it takes 220 seconds) I'm not surprised at all. Try the same Join query but with a indexed where + order by / limit on the big table and you should get even worse for MySQL. I found 3 tables in a join was the maximum the MySQL planner was able to cope with before blowing up just like you experienced. I am all for postgres at this point, however just want to know why I am getting opposite results !!! Both DBs are on the same machine Why do you say "opposite results" ? ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] slow growing table
Hi! I have a table that I use for about a month. As the month progresses, COPYs performed to this table get much, much slower than they were at the beginning, for the same number of rows (about 100,000 and growing). I'm essentially doing a delete for a given day, then a COPY as a big transaction. This is done about 12 times a day. When the table is new it's very fast; towards the end of the month it's taking almost 10 times longer, yet I'm deleting and COPYing in the same amount of data. Other operations on this table that were fast before slow down, too, using the same criteria. I do a VACUUM ANALYZE after each delete / COPY process; I tried experimenting with CLUSTER but saw no real difference. This is PostgreSQL 7.4.5 on a Linux server, dedicated for this purpose. About 5 indexes, no FKs on this table. Happy to provide any other info you might need; suggestions appreciated. All my best, Jone ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] slow growing table
On Mon, Jun 06, 2005 at 09:48:26AM -0700, Jone C wrote: > When the table is new it's very fast, towards the end of the month > it's taking almost 10 times longer, yet I'm deleting and COPYing in > the same amount of data. Other operations on this table slow down, > too, that were fast before using the same criteria. You might have a problem with index bloat. Could you try REINDEXing the indexes on the table and see if that makes a difference? /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] slow growing table
On Mon, Jun 06, 2005 at 07:00:37PM +0200, Steinar H. Gunderson wrote: > You might have a problem with index bloat. Could you try REINDEXing the > indexes on the table and see if that makes a difference? On second thought... Does a VACUUM FULL help? If so, you might want to increase your FSM settings. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
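The two checks suggested in this thread can be sketched as follows (the table name is hypothetical; VACUUM FULL takes an exclusive lock, so schedule it carefully):

```sql
-- Rebuild possibly-bloated indexes on the table:
REINDEX TABLE slow_table;

-- Compact the table itself and refresh planner statistics:
VACUUM FULL ANALYZE slow_table;

-- Compare on-disk size before and after via the system catalog;
-- relpages shrinking sharply after VACUUM FULL suggests the FSM
-- settings were too low to keep up between regular vacuums.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'slow_table';
```

If VACUUM FULL recovers a lot of space, raising max_fsm_pages in postgresql.conf (so plain VACUUM can reuse dead space) is the usual follow-up.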
Re: [PERFORM] Performance nightmare with dspam (urgent) (resolved)
On Mon, Jun 06, 2005 at 10:52:09AM -0500, John A Meinel wrote: pg_xlog benefits from being super fast. Because it has to be fully synced before the rest of the data can be committed. Yes they are small, but if you can make it fast, you eliminate that overhead. It also benefits from having its own spindle, because you eliminate the seek time. (Since it is always appending) Eliminating the seeks is definitely a win. Anyway, my point is that pg_xlog isn't necessarily tiny. Many people seem to set it as high as 100-200, and each one is 16MB. It's not the size of the xlog, it's the size of the write. Unless you're writing out a stripe size of data at once you're only effectively writing to one disk pair at a time anyway. (Things change if you have a big NVRAM cache to aggregate the writes, but you'd need a *lot* of transaction activity to exceed the 50MB/s or so you could get from the single raid1 pair in that scenario.) Mike Stone ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Need help to decide Mysql vs Postgres
[Jeffrey Tenny - Mon at 11:51:22AM -0400] > There are some pitfalls to pgsql though, especially for existing SQL > code using MAX and some other things which can really be blindsided > (performance-wise) by pgsql if you don't use the workarounds. Yes, I discovered that - "select max(num_attr)" does a full table scan even if the figure can be found easily through an index. There exists a workaround: select num_attr from my_table order by num_attr desc limit 1; will find the number through the index. -- Tobias Brox, Tallinn ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Need help to decide Mysql vs Postgres
In the last exciting episode, [EMAIL PROTECTED] (Amit V Shah) wrote: >> I am all for postgres at this point, however just want to know why I am >> getting opposite results !!! Both DBs are on the same machine > >> Why do you say "opposite results" ? > > Please pardon my ignorance, but from whatever I had heard, mysql was > supposedly always faster than postgres Thats why I was so > surprised !! I will definately post the "analyze query" thing by > end of today ... There is a common "use case" where MySQL(tm) using the "MyISAM" storage manager tends to be quicker than PostgreSQL, namely where you are submitting a lot of more-or-less serial requests of the form: select * from some_table where id='some primary key value'; If your usage patterns differ from that, then "what you heard" won't necessarily apply to your usage. -- output = ("cbbrowne" "@" "acm.org") http://linuxdatabases.info/info/rdbms.html The difference between a child and a hacker is the amount he flames about his toys. -- Ed Schwalenberg ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Postgresql on an AMD64 machine
I'm not sure if this is the appropriate list to post this question to but i'm starting with this one because it is related to the performance of Postgresql server. I have a Penguin Computing dual AMD 64 bit opteron machine with 8 Gigs of memory. In my attempt to increase the number of shared_buffers from the default to 65000 i was running into a semget error when trying to start Postgresql. After reading the documentation I adjusted the semaphore settings in the kernel to allow Postgresql to start successfully. With this configuration running if I do a ipcs -u i get the following. -- Shared Memory Status segments allocated 1 pages allocated 30728 pages resident 30626 pages swapped 0 Swap performance: 0 attempts 0 successes -- Semaphore Status used arrays = 1880 allocated semaphores = 31928 -- Messages: Status allocated queues = 0 used headers = 0 used space = 0 bytes I'm questioning the number of semaphores being used. In order for postgresql to start I had to set the maximum number of semaphores system wide to 600. This seems to be an abnormal amount of semaphores. I'm curious if this is a bug in the amd64 postgresql port. Is anyone else using postgresql on an AMD64 machine without similar issues? TIA Mark ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
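For reference, on Linux the SysV semaphore limits PostgreSQL cares about live in a single sysctl, kernel.sem, whose four fields are SEMMSL (semaphores per set), SEMMNS (system-wide total), SEMOPM (ops per semop call), and SEMMNI (number of sets). A sketch of raising them, with values illustrative rather than tuned for any particular connection count:

```
# /etc/sysctl.conf (apply with `sysctl -p`):
#            SEMMSL SEMMNS SEMOPM SEMMNI
kernel.sem = 250    32000  32     128
```

PostgreSQL itself needs roughly one semaphore per allowed connection, allocated in sets of 16, so the tens of thousands reported by ipcs above would more likely come from another application or from counting against an inflated limit than from the amd64 port itself.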
Re: [PERFORM] Need help to decide Mysql vs Postgres
Please pardon my ignorance, but from whatever I had heard, mysql was supposedly always faster than postgres Thats why I was so surprised !! I heard a lot of this too, so much it seems common wisdom that postgres is slow... well, maybe some old version was, but it's getting better at every release, and the 8.0 really delivers... I get the feeling that the PG team is really working and delivering improvements every few months; compare this to MySQL 5, which has been in beta for as long as I can remember. Also, yes, definitely mysql is faster when doing simple selects like SELECT * FROM table WHERE id=constant, or on updates with few users, but once you start digging... it can get a thousand times slower on some joins just because the optimizer is dumb... and then suddenly 0.2 ms for MySQL versus 0.3 ms for postgres on a simple query doesn't seem that attractive when it's 2 ms on postgres versus 2 seconds on mysql for a not so complicated one like pulling the first N rows from a join ordered by... PG is considered slower than mysql also because many people don't use persistent connections, and connecting to postgres is a lot slower than connecting to MySQL... But well, persistent connections are easy to use and mandatory for performance on any database anyway, so I don't understand why the fuss. I will definitely post the "analyze query" thing by end of today ... Thanks for all your helps !! Amit ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
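The persistent-connection point above can be illustrated with a tiny pool sketch: pay the connection-setup cost once up front, then hand the same connections out repeatedly. (SimplePool and the connect stand-in below are hypothetical; a real application would use a driver's built-in pool or an external pooler such as pgpool.)

```python
# Minimal connection-pool sketch: N connections are created once,
# then reused for every request instead of reconnecting each time.
import queue

class SimplePool:
    def __init__(self, factory, size):
        self._free = queue.Queue()
        for _ in range(size):            # connect once, up front
            self._free.put(factory())

    def acquire(self):
        return self._free.get()          # blocks if all are checked out

    def release(self, conn):
        self._free.put(conn)

# Demo: count how many "connections" are ever created.
created = []
def connect():
    conn = object()                      # stands in for an expensive connect()
    created.append(conn)
    return conn

pool = SimplePool(connect, size=2)
for _ in range(100):                     # 100 requests served...
    conn = pool.acquire()
    pool.release(conn)

print(len(created))                      # ...by only 2 connections
```

The same reasoning applies whether the pooling lives in the application (persistent connections per worker) or in a separate pooler process.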
Re: [PERFORM] Postgresql on an AMD64 machine
On Jun 6, 2005, at 1:53 PM, Mark Rinaudo wrote:

> I'm questioning the number of semaphores being used. In order for
> postgresql to start I had to set the maximum number of semaphores
> system wide to 600. This seems to be an abnormal amount of semaphores.
> I'm curious if this is a bug in the amd64 postgresql port. Is anyone
> else using postgresql on an AMD64 machine without similar issues?

No such nonsense required for me under FreeBSD 5.4/amd64; I used the same settings I had under the i386 OS. Postgres uses very few semaphores, from what I recall. My system shows 13 active semaphores.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [PERFORM] Need help to decide Mysql vs Postgres
On Mon, Jun 06, 2005 at 08:25:08PM +0300, Tobias Brox wrote:
> [Jeffrey Tenny - Mon at 11:51:22AM -0400]
> > There are some pitfalls to pgsql though, especially for existing SQL
> > code using MAX and some other things which can really be blindsided
> > (performance-wise) by pgsql if you don't use the workarounds.
>
> Yes, I discovered that - "select max(num_attr)" does a full table scan even
> if the figure can be found easily through an index.

PostgreSQL 8.1 will be able to use indexes for MIN and MAX.

http://archives.postgresql.org/pgsql-committers/2005-04/msg00163.php
http://archives.postgresql.org/pgsql-committers/2005-04/msg00168.php

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
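The workaround alluded to above rewrites the aggregate as an ordered, index-friendly LIMIT query. A sketch, using the num_attr column from the quoted message (the table name is made up):

```sql
-- Pre-8.1: forces a full table scan
SELECT max(num_attr) FROM some_table;

-- Workaround: can walk a B-tree index on num_attr instead
SELECT num_attr
FROM   some_table
ORDER BY num_attr DESC
LIMIT  1;
```

One caveat: if the column is nullable, add WHERE num_attr IS NOT NULL to the rewritten form, since PostgreSQL sorts NULLs as larger than any value and the LIMIT query would otherwise return NULL where max() would skip it.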
Re: [PERFORM] Need help to decide Mysql vs Postgres
Christopher Browne wrote:
> There is a common "use case" where MySQL(tm) ...
> select * from some_table where id='some primary key value';
> If your usage patterns differ from that...

However, this is a quite common use case, and I wonder what the best practice for PostgreSQL is for applications like that.

I'm guessing the answer is PGMemcache? (http://people.freebsd.org/~seanc/pgmemcache/pgmemcache.pdf) ... with triggers and listen/notify to manage deletes & updates, and tweaks to the application code to look to memcached for those primary_key=constant queries?

If that is the answer, I'm curious whether anyone has benchmarked, or even has qualitative "yeah, feels very fast" results for, such an application in the common MySQL use case.
Re: [PERFORM] Need help to decide Mysql vs Postgres
On 6/6/2005 2:12 PM, PFC wrote:
>> Please pardon my ignorance, but from whatever I had heard, mysql was
>> supposedly always faster than postgres Thats why I was so surprised !!
>
> I heard a lot of this too, so much it seems common wisdom that postgres
> is slow... well maybe some old version was, but it's getting better at
> every release, and the 8.0 really delivers...

The harder it is to evaluate software, the less often people reevaluate it, and the more often people just "copy" opinions instead of doing an evaluation at all.

Today there are a gazillion people out there who "know" that MySQL is faster than PostgreSQL. They don't know under what circumstances it is, or what the word "circumstances" even means in this context. When you ask them when they last actually tested this, in about 99% of cases you get an answer anywhere between 3 years and infinity (for all those who never did). The remaining 1% can then be reduced to an insignificant minority by asking how many concurrent users their test simulated.

Jan

--
# It's easier to get forgiveness for being wrong than for being right.
# Let's break this rule - forgive me.
# [EMAIL PROTECTED]
Re: [PERFORM] Postgresql and Software RAID/LVM
> Has anyone ran Postgres with software RAID or LVM on a production box?
> What have been your experience?

Yes, we have run Pg for a couple of years with software LVM (mirroring) against two hardware RAID5 arrays. We host a production Sun box that runs 24/7.

My experience:

* Software RAID (other than mirroring) is a disaster waiting to happen. If the metadata for the RAID set gives out for any reason (CMOS scrambles, card dies, power spike, etc.), then you are hosed beyond belief. In most cases it is almost impossible to recover. With mirroring, however, you can always boot and operate on a single mirror, pretending that no LVM/RAID is underway. In other words, each mirror is a fully functional copy of the data which will operate your server.

* Hardware RAID5 is a terrific way to boost performance via write caching and spreading I/O across multiple spindles. Each of our external arrays operates 14 drives (12 data, 1 parity, and 1 hot spare). While RAID5 protects against single-spindle failure, it will not hedge against multiple failures in a short time period, SCSI controller failure, SCSI cable problems, or even wholesale failure of the RAID controller. All of these things happen in a 24/7 operation. Using software RAID1 on top of the hardware RAID5 arrays hedges against any single failure.

* Software mirroring gives you tremendous ability to change the system while it is running, by taking offline the mirror you wish to change and then synchronizing it after the change.

On a fully operational production server, we have:
* restriped the RAID5 array
* replaced all RAID5 media with higher-capacity drives
* upgraded the RAID5 controller
* moved all data from an old RAID5 array to a newer one
* replaced the host SCSI controller
* uncabled and physically moved storage to a different part of the data center

Again, all of this has taken place (over the years) while our machine was fully operational.
Re: [PERFORM] Need help to decide Mysql vs Postgres
I did my own evaluation a few months back, because Postgres was not cutting it for me. I found that Postgres 8.0 (what I was using at the time; now on 8.0.2) outperformed MySQL on an Optiplex with 2 GB of memory. I had Postgres and MySQL loaded and would run one server at a time while testing. My tests included using Aqua Data Studio connections to both databases and an .asp page using ODBC connections. There was not a huge difference, but I had significant time invested in Postgres and it was a little faster, so I just took new approaches (flattened views, eliminated outer joins, etc.) to fixing the issues.

Joel Fradkin
Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel. 941-753-7111 ext 305
[EMAIL PROTECTED]
www.wazagua.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jan Wieck
Sent: Monday, June 06, 2005 1:55 PM
To: PFC
Cc: Amit V Shah; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Need help to decide Mysql vs Postgres

On 6/6/2005 2:12 PM, PFC wrote:
>
>> Please pardon my ignorance, but from whatever I had heard, mysql was
>> supposedly always faster than postgres Thats why I was so surprised
>> !!
>
> I heard a lot of this too, so much it seems common wisdom that postgres
> is slow... well maybe some old version was, but it's getting better at
> every release, and the 8.0 really delivers...
Re: [PERFORM] Postgresql on an AMD64 machine
On 06 Jun 2005 12:53:40 -0500, Mark Rinaudo <[EMAIL PROTECTED]> wrote:
> I'm not sure if this is the appropriate list to post this question to
> but i'm starting with this one because it is related to the performance
> of Postgresql server. I have a Penguin Computing dual AMD 64 bit
> opteron machine with 8 Gigs of memory. In my attempt to increase the
> number of shared_buffers from the default to 65000 i was running into a
> semget error when trying to start Postgresql. After reading the
> documentation I adjusted the semaphore settings in the kernel to allow
> Postgresql to start successfully. With this configuration running if I
> do a ipcs -u i get the following.

On my HP-585, 4x Opteron, 16G RAM, Gentoo Linux (2.6.9):

$ ipcs -u

------ Shared Memory Status --------
segments allocated 1
pages allocated 34866
pages resident 31642
pages swapped 128
Swap performance: 0 attempts, 0 successes

------ Semaphore Status --------
used arrays = 7
allocated semaphores = 119

------ Messages: Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes

Did you perhaps disable spinlocks when compiling PG?

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org
Re: [PERFORM] Postgresql on an AMD64 machine
Mike Rylander <[EMAIL PROTECTED]> writes:
> On 06 Jun 2005 12:53:40 -0500, Mark Rinaudo <[EMAIL PROTECTED]> wrote:
>> I'm not sure if this is the appropriate list to post this question to
>> but i'm starting with this one because it is related to the performance
>> of Postgresql server. I have a Penguin Computing dual AMD 64 bit
>> opteron machine with 8 Gigs of memory. In my attempt to increase the
>> number of shared_buffers from the default to 65000 i was running into a
>> semget error when trying to start Postgresql.
>
> Did you perhaps disable spinlocks when compiling PG?

That sure looks like it must be the issue --- in a normal build the number of semaphores needed does not vary with shared_buffers, but it will if Postgres is falling back to semaphore-based spinlocks. Which is a really bad idea from a performance standpoint, so you want to fix the build.

Which PG version is this exactly, and what configure options did you use? What compiler was used?

regards, tom lane
Re: [PERFORM] Postgresql on an AMD64 machine
I'm running the Redhat version of PostgreSQL which came pre-installed with Redhat ES. Its version number is 7.3.10-1. I'm not sure what options it was compiled with. Is there a way for me to tell? Should I just compile my own PostgreSQL for this platform?

Thanks
Mark

On Mon, 2005-06-06 at 16:15, Tom Lane wrote:
> Mike Rylander <[EMAIL PROTECTED]> writes:
> > Did you perhaps disable spinlocks when compiling PG?
>
> That sure looks like it must be the issue --- in a normal build the
> number of semaphores needed does not vary with shared_buffers, but
> it will if Postgres is falling back to semaphore-based spinlocks.
> Which is a really bad idea from a performance standpoint, so you
> want to fix the build.
>
> Which PG version is this exactly, and what configure options did
> you use? What compiler was used?
>
> regards, tom lane

--
Mark Rinaudo
318-213-8780 ext 111
Bowman Systems
Re: [PERFORM] Postgresql on an AMD64 machine
Mark Rinaudo wrote:
> I'm running the Redhat Version of Postgresql which came pre-installed
> with Redhat ES. It's version number is 7.3.10-1. I'm not sure what
> options it was compiled with. Is there a way for me to tell?

"pg_config --configure" in recent releases.

> Should i just compile my own postgresql for this platform?

Yes, I would. 7.4 was the first release to include support for proper spinlocks on AMD64. (From a Red Hat POV, it would probably be a good idea to patch 7.3 to include the relatively trivial changes needed for decent AMD64 performance, assuming that shipping a more recent version of PG with ES isn't an option.)

-Neil
Re: [PERFORM] Postgresql on an AMD64 machine
Neil Conway <[EMAIL PROTECTED]> writes:
> (From a Redhat POV, it would probably be a good idea to patch 7.3 to
> include the relatively trivial changes needed for decent AMD64
> performance,

How embarrassing :-( Will see about fixing it. However, this certainly won't ship before the next RHEL3 quarterly update, so in the meantime if Mark feels like building locally, it wouldn't be a bad idea.

regards, tom lane
Re: [PERFORM] Postgresql and Software RAID/LVM
Marty Scholes wrote:
>> Has anyone ran Postgres with software RAID or LVM on a production box?
>> What have been your experience?
>
> Yes, we have run for a couple years Pg with software LVM (mirroring)
> against two hardware RAID5 arrays. We host a production Sun box that
> runs 24/7.
>
> My experience:
> * Software RAID (other than mirroring) is a disaster waiting to happen.
> If the metadata for the RAID set gives out for any reason (CMOS
> scrambles, card dies, power spike, etc.) then you are hosed beyond
> belief. In most cases it is almost impossible to recover. With
> mirroring, however, you can always boot and operate on a single mirror,
> pretending that no LVM/RAID is underway. In other words, each mirror is
> a fully functional copy of the data which will operate your server.

Isn't metadata giving out actually more of a problem in a hardware RAID situation? I mean, if the card you are using dies, you can't just get another one. With software RAID, because the metadata is on the drives, you can pull them out of that machine, put them into any machine that has a controller which can read the drives and a similar kernel, and you are back up and running.

> * Hardware RAID5 is a terrific way to boost performance via write
> caching and spreading I/O across multiple spindles. Each of our
> external arrays operates 14 drives (12 data, 1 parity and 1 hot spare).
> While RAID5 protects against single spindle failure, it will not hedge
> against multiple failures in a short time period, SCSI contoller
> failure, SCSI cable problems or even wholesale failure of the RAID
> controller. All of these things happen in a 24/7 operation. Using
> software RAID1 against the hardware RAID5 arrays hedges against any
> single failure.

No, it hedges against *more* than one failure. But you can also do a RAID1 over a RAID5 in software. And if you are honestly willing to create a full RAID1, just create a RAID1 over RAID0: the performance is much better.
And since you have a full RAID1, as long as both drives of a pairing don't give out, you can lose half of your drives. If you want the space, but you feel that RAID5 isn't redundant enough, go to RAID6, which uses 2 parity locations, each with a different method of storing parity; not only is it more redundant, you have a better chance of finding problems.

> * Software mirroring gives you tremendous ability to change the system
> while it is running, by taking offline the mirror you wish to change and
> then synchronizing it after the change.

That certainly is a nice ability. But remember that LVM also has the idea of "snapshotting" a running system. I don't know the exact details, just that there is a way to have some processes see the filesystem as it existed at an exact point in time, which is also a great way to handle backups.

> On a fully operational production server, we have:
> * restriped the RAID5 array
> * replaced all RAID5 media with higher capacity drives
> * upgraded RAID5 controller
> * moved all data from an old RAID5 array to a newer one
> * replaced host SCSI controller
> * uncabled and physically moved storage to a different part of data center
>
> Again, all of this has taken place (over the years) while our machine
> was fully operational.

So you are saying that you were able to replace the RAID controller without turning off the machine? I realize there do exist hot-swappable PCI cards, but I think you are overstating what you mean by "fully operational". For instance, it's not like you can access your data while it is being physically moved.

I do think you had some nice hardware. But I know you can do all of this in software as well; it is usually a price/performance tradeoff. You spend quite a bit to get a hardware RAID card that can keep up with a modern CPU. I know we have an FC RAID box at work which has a full 512 MB of cache on it, but it wasn't that much cheaper than buying a dedicated server.
John =:->
Re: [PERFORM] Need help to decide Mysql vs Postgres
Ron Mayer <[EMAIL PROTECTED]> writes:
> Christopher Browne wrote:
>> There is a common "use case" where MySQL(tm) ...
>> select * from some_table where id='some primary key value';
>
> However this is a quite common use-case; and I wonder what the
> best practices for postgresql is for applications like that.

Setting up a prepared statement should be a noticeable win for that sort of thing. Also of course there are the usual tuning issues: have you picked an appropriate shared_buffers setting, etc.

regards, tom lane
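At the SQL level, a prepared statement for the primary-key lookup looks like the following sketch (some_table and id are taken from Christopher's example; the statement name and parameter type are made up):

```sql
-- Parse and plan once...
PREPARE pk_lookup (text) AS
  SELECT * FROM some_table WHERE id = $1;

-- ...then each execution skips parsing and planning.
EXECUTE pk_lookup('some primary key value');
```

The prepared statement lives for the duration of the session, which is one more reason persistent connections matter: with one connection per query, the statement would have to be re-prepared every time. Most client libraries expose the same mechanism through their own prepared-statement APIs.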