Re: [PERFORM] Help me please !
Asatryan, Anahit wrote:
> I am running PostgreSQL 8.0.1 under Windows 2000. I want to use the COPY
> FROM STDIN function from a Java application, but it doesn't work; it
> throws:
> "org.postgresql.util.PSQLException: Unknown Response Type G"

I don't think that there is a "STDIN" if you are executing via JDBC. The
only workaround I know of is to create a file and copy from that, which
you already have working.

--
  Richard Huxton
  Archonet Ltd
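For reference, a minimal sketch of the file-based workaround Richard
mentions, issued as an ordinary statement over JDBC (table name and path
are hypothetical; note that COPY FROM a file requires superuser rights,
because the backend itself opens the file):

    -- The server reads the file directly, so the path must be visible to
    -- the backend process on the server, not to the Java client:
    COPY mytable FROM 'C:/data/mytable.dat';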
Re: [PERFORM] Joins, Deletes and Indexes
[EMAIL PROTECTED] wrote:
> I've got 2 tables defined as follows:
>
> CREATE TABLE "cluster"
> (
>   id int8 NOT NULL DEFAULT nextval('serial'::text),
>   clusterid varchar(255) NOT NULL,
>   ...
>   CONSTRAINT pk_cluster PRIMARY KEY (id)
> )
>
> CREATE TABLE sensorreport
> (
>   id int8 NOT NULL DEFAULT nextval('serial'::text),
>   clusterid int8 NOT NULL,
>   ...
>   CONSTRAINT pk_sensorreport PRIMARY KEY (id),
>   CONSTRAINT fk_sensorreport_clusterid FOREIGN KEY (clusterid)
>     REFERENCES "cluster" (id) ON UPDATE RESTRICT ON DELETE RESTRICT
> )
>
> I've defined an index on the clusterid field of sensorreport.

Looking further down, perhaps an index on cluster.clusterid too.

> So I've run into 2 issues, one a SELECT, the other a DELETE.
>
> SELECT issue:
> The following query:
> EXPLAIN ANALYZE select * from sensorreport where clusterid = 25000114;
>
> Yields:
> "Index Scan using idx_sensorreport_clusterid on sensorreport
>  (cost=0.00..2.01 rows=1 width=129) (actual time=0.000..0.000 rows=38
>  loops=1)"
> "  Index Cond: (clusterid = 25000114)"
> "Total runtime: 0.000 ms"
>
> However, when using a join as follows (in the cluster table id=25000114,
> clusterid='clusterid1'):
> EXPLAIN ANALYZE select * from sensorreport as a join cluster as c
>   on c.id = a.clusterid where c.clusterid = 'clusterid1';

You don't say what version you're using, but older versions of PG took a
literal JOIN as a request to plan the query in that order. Try rewriting
it without the "join" keyword and see if the plan alters.

> Yields:
> Hash Join (cost=1.18..566211.51 rows=1071429 width=287) (actual
>   time=150025.000..150025.000 rows=38 loops=1)
>   Hash Cond: ("outer".clusterid = "inner".id)
>   ->  Seq Scan on sensorreport a (cost=0.00..480496.03 rows=1503
>       width=129) (actual time=10.000..126751.000 rows=1539 loops=1)
>   ->  Hash (cost=1.18..1.18 rows=1 width=158) (actual time=0.000..0.000
>       rows=0 loops=1)
>         ->  Seq Scan on "cluster" c (cost=0.00..1.18 rows=1 width=158)
>             (actual time=0.000..0.000 rows=1 loops=1)
>               Filter: ((clusterid)::text = 'clusterid1'::text)
> Total runtime: 150025.000 ms
>
> My question is: can I get the join query to use the
> idx_sensorreport_clusterid index on the sensorreport table?

The only reason to use the index on sensorreport is if it isn't going to
match many rows. That means we want to run the restriction on
'clusterid1' first, which suggests you want that index on table cluster.

> DELETE issue:
> The statement:
> EXPLAIN ANALYZE delete from cluster where clusterid='clusterid99'
>
> Yields:
> Seq Scan on "cluster" (cost=0.00..1.18 rows=1 width=6) (actual
>   time=0.000..0.000 rows=1 loops=1)
>   Filter: ((clusterid)::text = 'clusterid99'::text)
> Total runtime: 275988.000 ms
>
> I'm assuming that the length of the delete is because of the "DELETE
> RESTRICT" on the foreign key from sensorreport.
> Again, is there any way to get the delete to use the
> idx_sensorreport_clusterid index?

No, because this is the cluster table, not sensorreport :-)

--
  Richard Huxton
  Archonet Ltd
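A sketch of the two experiments suggested above (the index name is
invented; the queries reuse the poster's tables):

    -- The same join written without the explicit JOIN keyword, leaving
    -- the planner free to pick the join order:
    EXPLAIN ANALYZE
    SELECT *
    FROM sensorreport a, "cluster" c
    WHERE c.id = a.clusterid
      AND c.clusterid = 'clusterid1';

    -- And the suggested index on cluster.clusterid:
    CREATE INDEX idx_cluster_clusterid ON "cluster" (clusterid);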
Re: [PERFORM] PostgreSQL is extremely slow on Windows
Hi,

I changed fsync to false. It took 8 minutes to restore the full database.
That is 26 times faster than before (approx. 200 tps). :-/
With the background writer it took 12 minutes. :-(

The funny thing is, I had a VMware emulation on the same Windows machine,
running Red Hat, with fsync turned on. It also took 8 minutes to finish.
Probably the Linux code is better, and/or VMware optimises (physical) disk
access(?).

It seems to me I need 2 types of operating modes:
- For bulk loading (database restore): fsync=false
- Normal operation: fsync=true

Am I right? How can I do it "elegantly"?

I think there should be a "performance tuning guide" in the documentation
(not just an explanation of the settings). Playing with the settings can
be quite annoying.

Anyway, thanks for the tips.

Bye,

Vig Sándor

-----Original Message-----
From: Magnus Hagander [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 22, 2005 7:15 PM
To: Vig, Sandor (G/FI-2); pgsql-performance@postgresql.org
Subject: RE: [PERFORM] PostgreSQL is extremely slow on Windows

> I've downloaded the latest release (PostgreSQL 8.0) for Windows.
> Installation was OK, but I have tried to restore a database.
> It had more than ~100,000 records. Usually I use PostgreSQL
> under Linux, and it used to be done in under 10 minutes.
>
> Under W2k and XP it took 3 hours(!). Why is it so slow?
>
> The commands I used:
>
> Under Linux: (duration: 1 minute)
>   pg_dump -D databasename > databasename.db
>
> Under Windows: (duration: 3 - 3.5 hours(!))
>   psql databasename < databasename.db >nul
>
> It seemed to me that only 20-30 transactions/sec were
> written to the database.

20-30 transactions is about what you'll get on a single disk on Windows
today. We have a patch in testing that will bring this up to about 80.

You can *never* get above 80 without using write cache, regardless of
your OS, if you have a single disk. You might want to look into whether
write caching is enabled on your Linux box, and disable it (unless you
are using RAID). A lot points towards write cache being enabled on your
system.

If you need the performance that equals the one with write cache on, you
can set fsync=off. But then you will lose the guarantee that your machine
will survive an unclean shutdown or crash. I would strongly advise
against it on a production system - same goes for running with write
cache!

//Magnus
Re: [PERFORM] Joins, Deletes and Indexes
> -----Original Message-----
> From: Richard Huxton [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 23, 2005 3:40 AM
> To: [EMAIL PROTECTED]
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Joins, Deletes and Indexes
>
> [EMAIL PROTECTED] wrote:
> > I've got 2 tables defined as follows:
> >
> > CREATE TABLE "cluster"
> > (
> >   id int8 NOT NULL DEFAULT nextval('serial'::text),
> >   clusterid varchar(255) NOT NULL,
> >   ...
> >   CONSTRAINT pk_cluster PRIMARY KEY (id)
> > )
> >
> > CREATE TABLE sensorreport
> > (
> >   id int8 NOT NULL DEFAULT nextval('serial'::text),
> >   clusterid int8 NOT NULL,
> >   ...
> >   CONSTRAINT pk_sensorreport PRIMARY KEY (id),
> >   CONSTRAINT fk_sensorreport_clusterid FOREIGN KEY (clusterid)
> >     REFERENCES "cluster" (id) ON UPDATE RESTRICT ON DELETE RESTRICT
> > )
> >
> > I've defined an index on the clusterid field of sensorreport.
>
> Looking further down, perhaps an index on cluster.clusterid too.
>
> > So I've run into 2 issues, one a SELECT, the other a DELETE.
> >
> > SELECT issue:
> > The following query:
> > EXPLAIN ANALYZE select * from sensorreport where clusterid = 25000114;
> >
> > Yields:
> > "Index Scan using idx_sensorreport_clusterid on sensorreport
> >  (cost=0.00..2.01 rows=1 width=129) (actual time=0.000..0.000 rows=38
> >  loops=1)"
> > "  Index Cond: (clusterid = 25000114)"
> > "Total runtime: 0.000 ms"
> >
> > However, when using a join as follows (in the cluster table
> > id=25000114, clusterid='clusterid1'):
> > EXPLAIN ANALYZE select * from sensorreport as a join cluster as c
> >   on c.id = a.clusterid where c.clusterid = 'clusterid1';
>
> You don't say what version you're using, but older versions of PG took
> a literal JOIN as a request to plan the query in that order. Try
> rewriting it without the "join" keyword and see if the plan alters.

I'm using version 8.0 on Windows.

> > Yields:
> > Hash Join (cost=1.18..566211.51 rows=1071429 width=287) (actual
> >   time=150025.000..150025.000 rows=38 loops=1)
> >   Hash Cond: ("outer".clusterid = "inner".id)
> >   ->  Seq Scan on sensorreport a (cost=0.00..480496.03 rows=1503
> >       width=129) (actual time=10.000..126751.000 rows=1539 loops=1)
> >   ->  Hash (cost=1.18..1.18 rows=1 width=158) (actual
> >       time=0.000..0.000 rows=0 loops=1)
> >         ->  Seq Scan on "cluster" c (cost=0.00..1.18 rows=1 width=158)
> >             (actual time=0.000..0.000 rows=1 loops=1)
> >               Filter: ((clusterid)::text = 'clusterid1'::text)
> > Total runtime: 150025.000 ms
> >
> > My question is: can I get the join query to use the
> > idx_sensorreport_clusterid index on the sensorreport table?
>
> The only reason to use the index on sensorreport is if it isn't going
> to match many rows. That means we want to run the restriction on
> 'clusterid1' first, which suggests you want that index on table cluster.

The cluster table only has 11 rows, so I'm not sure an index would help.
The sensorreport table has 15,000,000 rows, so that's why I've got the
index there.

> > DELETE issue:
> > The statement:
> > EXPLAIN ANALYZE delete from cluster where clusterid='clusterid99'
> >
> > Yields:
> > Seq Scan on "cluster" (cost=0.00..1.18 rows=1 width=6) (actual
> >   time=0.000..0.000 rows=1 loops=1)
> >   Filter: ((clusterid)::text = 'clusterid99'::text)
> > Total runtime: 275988.000 ms
> >
> > I'm assuming that the length of the delete is because of the "DELETE
> > RESTRICT" on the foreign key from sensorreport.
> > Again, is there any way to get the delete to use the
> > idx_sensorreport_clusterid index?
>
> No, because this is the cluster table, not sensorreport :-)

True, but the foreign key constraint on the sensorreport table forces
Postgres to check whether any sensorreports are currently using this
cluster before allowing the cluster to be deleted.

> --
>   Richard Huxton
>   Archonet Ltd

Thanks a lot for the reply.

Chuck Butkus
EMC
Re: [PERFORM] Joins, Deletes and Indexes
[EMAIL PROTECTED] wrote:
> The cluster table only has 11 rows, so I'm not sure an index would help.
> The sensorreport table has 15,000,000 rows, so that's why I've got the
> index there.

Ah - only 11?

>>> on the foreign key from sensorreport.
>>> Again, is there any way to get the delete to use the
>>> idx_sensorreport_clusterid index?
>>
>> No, because this is the cluster table, not sensorreport :-)
>
> True, but the foreign key constraint on the sensorreport table forces
> Postgres to check whether any sensorreports are currently using this
> cluster before allowing the cluster to be deleted.

If you only have 11 distinct values in the large table then it's debatable
whether it's always quicker to use the index. Since your first example
(clusterid = 25000114) returned so few rows, I'm guessing that some other
values represent a sizeable percentage of the table. That'd explain the
difference between PG's estimates and the actual number of matching rows.

You can try "SET enable_seqscan = false;" before running the query and
see whether using the index helps things.

--
  Richard Huxton
  Archonet Ltd
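A sketch of that experiment, reusing the join from earlier in the thread:

    SET enable_seqscan = false;   -- discourage (not forbid) seqscans
    EXPLAIN ANALYZE
    SELECT *
    FROM sensorreport a, "cluster" c
    WHERE c.id = a.clusterid
      AND c.clusterid = 'clusterid1';
    RESET enable_seqscan;         -- back to the default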
Re: [PERFORM] PostgreSQL is extremely slow on Windows
Hi, Magnus & all,

Magnus Hagander schrieb:
> 20-30 transactions is about what you'll get on a single disk on Windows
> today. We have a patch in testing that will bring this up to about 80.
> You can *never* get above 80 without using write cache, regardless of
> your OS, if you have a single disk. You might want to look into whether
> write caching is enabled on your Linux box, and disable it. (Unless you
> are using RAID.) A lot points towards write cache being enabled on your
> system.

Note that you can get higher rates for the server as a whole when using
concurrent transactions (that is, several independent connections
committing at the same time). The commit delay settings should be tuned
accordingly.

Markus

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com
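The settings Markus refers to can be changed per session; a hedged sketch
with purely illustrative values (whether they help depends entirely on
your concurrency):

    SET commit_delay = 100;     -- microseconds to wait so that concurrent
                                -- commits can share one WAL flush
    SET commit_siblings = 5;    -- only delay if at least this many other
                                -- transactions are currently active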
Re: [PERFORM] Help me please !
Hi, Asatryan,

Asatryan, Anahit schrieb:
> I am running PostgreSQL 8.0.1 under Windows 2000. I want to use the COPY
> FROM STDIN function from a Java application, but it doesn't work; it
> throws:
>
> "org.postgresql.util.PSQLException: Unknown Response Type G"

Currently, there is no COPY support in the PostgreSQL JDBC driver.

There were some patches enabling COPY support floating around on the
pgsql-jdbc@postgresql.org mailing list. You can search the archive and
try whether one of them fits your needs.

AFAIR, COPY support is on the TODO list, but they are waiting for some
other driver reworking to be finished. You can discuss this issue on the
pgsql-jdbc list or search the archives if you need more info.

Markus

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com
Re: [PERFORM] PostgreSQL is extremely slow on Windows
> Hi,
>
> I changed fsync to false. It took 8 minutes to restore the full
> database. That is 26 times faster than before (approx. 200 tps). :-/
> With the background writer it took 12 minutes. :-(

That seems reasonable.

> The funny thing is, I had a VMware emulation on the same Windows
> machine, running Red Hat, with fsync turned on. It also took 8 minutes
> to finish. Probably the Linux code is better, and/or VMware optimises
> (physical) disk access(?).

VMware makes fsync() into a no-op. It will always cache the disk. (This
is VMware Workstation. Their server products behave differently, of
course.)

> It seems to me I need 2 types of operating modes:
> - For bulk loading (database restore): fsync=false
> - Normal operation: fsync=true

Yes, fsync=false is very good for bulk loading *IFF* you can live with
data loss in case you get a crash during load.

> Am I right? How can I do it "elegantly"?

You'll need to edit postgresql.conf and restart the server for this.

> I think there should be a "performance tuning guide" in the
> documentation (not just an explanation of the settings). Playing with
> the settings can be quite annoying.

There is some information on techdocs.postgresql.org you might be
interested in.

//Magnus
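A sketch of the bulk-load procedure Magnus describes (paths and database
name are hypothetical):

    # In postgresql.conf, before the restore:
    fsync = false

    # Restart and run the restore:
    pg_ctl restart -D /usr/local/pgsql/data
    psql databasename < databasename.db

    # Then set fsync = true again in postgresql.conf and restart.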
Re: [PERFORM] PostgreSQL is extremely slow on Windows
> > You can *never* get above 80 without using write cache, regardless of
> > your OS, if you have a single disk.
>
> Why? Even with, say, a 15K RPM disk? Or the ability to fsync() multiple
> concurrently-committing transactions at once?

Uh. What I meant was a single *IDE* disk. Sorry. Been too deep into
helping people with IDE disks lately to remember that SCSI can be a lot
faster :-) And we're talking about a restore of a dump, so it's a single
session.

(Strictly, that should be a 7200rpm IDE disk. I don't know if any others
are common, though.)

//mha
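For context, the rotational arithmetic behind such a ceiling (a gloss on
the figures above, not part of Magnus's mail): a 7200 rpm disk completes
7200 / 60 = 120 revolutions per second, so a commit that must wait for
the platter on every fsync() tops out near 120 tps in theory; per-commit
overhead pulls the practical figure down toward the ~80 quoted above.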
Re: [PERFORM] Problem with 7.4.5 and webmin 1.8 in grant function
> Well, sure looks like you only have one running. Your data directory is
> /var/lib/pgsql/data, so let's see the files:
>
> /var/lib/pgsql/data/pg_hba.conf
> /var/lib/pgsql/data/pg_ident.conf
> /var/lib/pgsql/data/postmaster.opts
>
> Might also be useful to know any nondefault settings in postgresql.conf
> too.
>
> As I understand it, these vendor-shipped rpms have ident *enabled*.
> I will download FC3 Pg and check this out... I'm a
> compile-it-from-source guy :-)
>
> Mark

I got the answer: in the module config of postgresql-webmin there is a
checkbox, "Use DBI to connect if available? yes/no". The default is yes,
but if I choose no, everything goes fine. I also tested it on the desktop
machine and got the same error and the same solution.

Could you explain what happens with FC3 + PostgreSQL and webmin 1.8?

Thanks

Amrit, Thailand
Re: [PERFORM] is pg_autovacuum so effective ?
Christopher Browne wrote:
> Gaetano Mendola <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Gaetano Mendola <[EMAIL PROTECTED]> writes:
>>>> I'm using only pg_autovacuum. I expect that disk usage will reach a
>>>> steady state, but it is not. PG engine: 7.4.5
>>>
>>> One data point doesn't prove that you're not at a steady state.
>>
>> I made a graph of my disk usage and it's been a ramp for one week.
>> I'll continue to wait in order to see if it will decrease.
>> I was expecting the steady state at something like 4 GB
>> (after a full vacuum and reindex) + 10% = 4.4 GB.
>> I'm at 4.6 GB and increasing. I'll see how it continues.
>
> You probably want the "experiment" to last more than a week.
>
> After all, it might actually be that with your usage patterns, that
> table would stabilize at 15% "overhead," and that might take a couple
> or three weeks.
>
> Unless it's clear that it's growing perilously quickly, just leave it
> alone so that there's actually some possibility of reaching an
> equilibrium. Any time you "VACUUM FULL" it, that _destroys_ any
> experimental results or any noticeable patterns, and it guarantees that
> you'll see "seemingly perilous growth" for a while.
>
> And if the table is _TRULY_ growing "perilously quickly," then it is
> likely that you should add in some scheduled vacuums on the table. Not
> VACUUM FULLs; just plain VACUUMs.
>
> I revised cron scripts yet again today to do hourly and "4x/day"
> vacuums of certain tables in some of our systems where we know they
> need the attention. I didn't schedule any VACUUM FULLs; it's
> unnecessary, and would lead directly to system outages, which is
> totally unacceptable.

Yes, I'm moving in this direction too.

Regards
Gaetano Mendola
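Hypothetical crontab entries in the spirit of what Christopher describes
(database and table names are invented):

    # Hourly plain VACUUM on a heavily-updated table:
    0 * * * *           psql -d mydb -c 'VACUUM ANALYZE busy_table'
    # Four times a day on a less active one:
    15 0,6,12,18 * * *  psql -d mydb -c 'VACUUM ANALYZE other_table'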
[PERFORM] Performance Tuning Opterons / Hard Disk Layout
Hello All

I am setting up a hardware clustering solution. My hardware is dual
Opteron 550 with 8GB RAM. My external storage is a Kingston Fibre Channel
Infostation with 14 15,000 RPM 36GB drives. The OS we are running is Red
Hat ES 3.0, clustering using the Red Hat Cluster Suite. The Postgres
version is 7.4.7. We will be setting up about 9 databases, which range in
size from 100MB to 2.5GB, on this config. The Postgres application is
mostly read-intensive.

What would be the best way to set up the hard drives on this server?
Currently I have it partitioned with 2 separate RAID 5 arrays, with 1
failover drive per array. I have two database clusters configured, with a
separate postmaster running for each. Running two postmasters seems like
a pain, but that is the only way I knew to separate the load. I am mostly
concerned about disk IO and performance. Is my current setup a good way
to accomplish the best performance, or would it be better to use all the
drives in one huge RAID 5 with a couple of failovers? I have looked
around in the archives and found some info, but I would like to hear
about some of the configs other people are running and how they have them
set up.

Thanks

John Allgood - ESC
Systems Admin
Re: [PERFORM] Inefficient Query Plans
"Luke Chambers" <[EMAIL PROTECTED]> writes: > The following query plans both result from the very same query run on > different servers. They obviously differ drastically, but I don't why > as one db is a slonied copy of the other with identical postgresql.conf > files. There's an order-of-magnitude difference in the estimated row counts for some of the joins, so it's hardly surprising that different plans would be chosen. Assuming that these are exactly the same Postgres version, the only explanation would be considerably different ANALYZE statistics stored in the two databases. > Both databases are vacuum analyzed nightly. Maybe you should double-check that. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] subquery vs join on 7.4.5
David Haas <[EMAIL PROTECTED]> writes:
> I'm comparing the speeds of the following two queries on 7.4.5. I was
> curious why query 1 was faster than query 2:
>
> query 1:
> SELECT layer_number
> FROM batch_report_index
> WHERE device_id = (SELECT device_id FROM device_index
>                    WHERE device_name = 'CP8M')
> AND technology_id = (SELECT technology_id FROM technology_index
>                      WHERE technology_name = 'G12');
>
> query 2:
> SELECT b.layer_number
> FROM batch_report_index b, device_index d, technology_index t
> WHERE b.device_id = d.device_id
> AND b.technology_id = t.technology_id
> AND d.device_name = 'CP8M'
> AND t.technology_name = 'G12';

Why didn't you try a two-column index on batch_report_index(device_id,
technology_id)?

Whether this would actually be better than a seqscan I'm not sure, given
the large number of matching rows. But the planner would surely try it,
given that it's drastically underestimating that number :-(

			regards, tom lane
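A sketch of the two-column index Tom suggests (the index name is
invented):

    CREATE INDEX idx_batch_device_tech
        ON batch_report_index (device_id, technology_id);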
Re: [PERFORM] Problem with 7.4.5 and webmin 1.8 in grant function
Sorry, just a quick tip, since I haven't seen that you already did
pg_ctl stop && pg_ctl start... (I mean, did you reload your conf
settings?)

Regards,
Guido

> > > I used your perl script and found the error =>
> > > [EMAIL PROTECTED] tmp]# perl relacl.pl
> > > DBI connect('dbname=template1;port=5432','postgres',...) failed:
> > > FATAL: IDENT authentication failed for user "postgres" at relacl.pl
> > > line 21
> > > Error in connect to DBI:Pg:dbname=template1;port=5432:
> >
> > Excellent - we know what is going on now!
> >
> > > And my pg_hba.conf is
> > >
> > > # IPv4-style local connections:
> > > host    all    all    127.0.0.1    255.255.255.255    trust
> > > host    all    all    192.168.0.0  255.255.0.0        trust
> > >
> > > trusted for every user.
> >
> > OK, what I think has happened is that there is another Pg installation
> > (or another initdb'ed cluster) on this machine that you are
> > accidentally talking to. Try
> >
> > $ rpm -qa | grep -i postgres
> >
> > which will spot another software installation; you may just have to
> > search for files called pg_hba.conf to find another initdb'ed cluster.
> >
> > This other installation should have a pg_hba.conf that looks something
> > like:
> >
> > local   all    all                                    ident
> > host    all    all    127.0.0.1    255.255.255.255    ident
> >
> > So a bit of detective work is in order :-)
> >
> > Mark
>
> After playing detective, I found:
>
> [EMAIL PROTECTED] ~]# rpm -qa | grep -i postgres
> postgresql-7.4.5-3.1.tlc
> postgresql-python-7.4.5-3.1.tlc
> postgresql-jdbc-7.4.5-3.1.tlc
> postgresql-tcl-7.4.5-3.1.tlc
> postgresql-server-7.4.5-3.1.tlc
> postgresql-libs-7.4.5-3.1.tlc
> postgresql-docs-7.4.5-3.1.tlc
> postgresql-odbc-7.3-8.1.tlc
> postgresql-pl-7.4.5-3.1.tlc
> postgresql-test-7.4.5-3.1.tlc
> postgresql-contrib-7.4.5-3.1.tlc
>
> No other pg installation, except the pgsql for Windows in a samba
> folder, which I don't think matters, does it?
> No other pg is being run:
>
> [EMAIL PROTECTED] ~]# ps ax | grep postmaster
> 2228 ?     S  0:00 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
> 3308 pts/0 S+ 0:00 grep postmaster
>
> Is it possible that it is related to pg_ident.conf?
>
> Any comment please.
> Amrit, Thailand
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
On Wed, Feb 23, 2005 at 11:39:27AM -0500, John Allgood wrote:
> Hello All
>
> I am setting up a hardware clustering solution. My hardware is dual
> Opteron 550 with 8GB RAM. My external storage is a Kingston Fibre
> Channel Infostation with 14 15,000 RPM 36GB drives. The OS we are
> running is Red Hat ES 3.0, clustering using the Red Hat Cluster Suite.
> The Postgres version is 7.4.7. We will be setting up about 9 databases,
> which range in size from 100MB to 2.5GB, on this config. The Postgres
> application is mostly read-intensive. What would be the best way to set
> up the hard drives on this server? Currently I have it partitioned with
> 2 separate RAID 5 arrays, with 1 failover drive per array. I have two
> database clusters configured, with a separate postmaster running for
> each. Running two postmasters seems like a pain, but that is the only
> way I knew to separate the load. I am mostly concerned about disk IO
> and performance. Is my current setup a good way to accomplish the best
> performance, or would it be better to use all the drives in one huge
> RAID 5 with a couple of failovers? I have looked around in the archives
> and found some info, but I would like to hear about some of the configs
> other people are running and how they have them set up.

http://www.powerpostgresql.com/PerfList/

Consider a separate array for pg_xlog. With tablespaces in 8.0, you can
isolate much of the IO in a single cluster.

-Mike Adler
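A hedged sketch of the 8.0 tablespace approach (directory and names are
hypothetical; the directory must already exist and be owned by the
postgres user):

    CREATE TABLESPACE fastarray LOCATION '/mnt/array2/pgdata';
    ALTER TABLE some_big_table SET TABLESPACE fastarray;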
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
Is there a real limit for max_connections? Here we have an Oracle server
with up to 1200 simultaneous connections on it!

"max_connections: exactly like previous versions, this needs to be set to
the actual number of simultaneous connections you expect to need. High
settings will require more shared memory (shared_buffers). As the
per-connection overhead, both from PostgreSQL and the host OS, can be
quite high, it's important to use connection pooling if you need to
service a large number of users. For example, 150 active connections on a
medium-end 32-bit Linux server will consume significant system resources,
and 600 is about the limit."

C ya,
Bruno

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael
Adler
Sent: Wednesday, February 23, 2005 3:04 PM
To: John Allgood
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout

On Wed, Feb 23, 2005 at 11:39:27AM -0500, John Allgood wrote:
> Hello All
>
> I am setting up a hardware clustering solution. My hardware is dual
> Opteron 550 with 8GB RAM. My external storage is a Kingston Fibre
> Channel Infostation with 14 15,000 RPM 36GB drives. The OS we are
> running is Red Hat ES 3.0, clustering using the Red Hat Cluster Suite.
> The Postgres version is 7.4.7. We will be setting up about 9 databases,
> which range in size from 100MB to 2.5GB, on this config. The Postgres
> application is mostly read-intensive. What would be the best way to set
> up the hard drives on this server? Currently I have it partitioned with
> 2 separate RAID 5 arrays, with 1 failover drive per array. I have two
> database clusters configured, with a separate postmaster running for
> each. Running two postmasters seems like a pain, but that is the only
> way I knew to separate the load. I am mostly concerned about disk IO
> and performance. Is my current setup a good way to accomplish the best
> performance, or would it be better to use all the drives in one huge
> RAID 5 with a couple of failovers? I have looked around in the archives
> and found some info, but I would like to hear about some of the configs
> other people are running and how they have them set up.

http://www.powerpostgresql.com/PerfList/

Consider a separate array for pg_xlog. With tablespaces in 8.0, you can
isolate much of the IO in a single cluster.

-Mike Adler
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
"Bruno Almeida do Lago" <[EMAIL PROTECTED]> writes: > Is there a real limit for max_connections? Here we've an Oracle server with > up to 1200 simultaneous conections over it! [ shrug... ] If your machine has the beef to run 1200 simultaneous queries, you can set max_connections to 1200. The point of what you were quoting is that if you want to service 1200 concurrent users but you only expect maybe 100 simultaneously active queries from them (and you have a database box that can only service that many) then you want to put a connection pooler in front of 100 backends, not try to start a backend for every user. Oracle may handle this sort of thing differently, I dunno. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
On Wed, 2005-02-23 at 15:26 -0300, Bruno Almeida do Lago wrote:
> Is there a real limit for max_connections? Here we have an Oracle
> server with up to 1200 simultaneous connections on it!

If you can reduce them by using something like pgpool between PostgreSQL
and the client, you'll save some headache. PostgreSQL did not perform as
well with a large number of idle connections as it does otherwise (the
last time I tested was 7.4, though -- perhaps it's better now).

The kernel also starts to play a significant role with a high number of
connections. Some operating systems don't perform as well with a high
number of processes (process handling, scheduling, file handles, etc.).

I think you can do it without any technical issues, but you will probably
be happier with the result if you can hide idle connections from the
database machine.
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
Rod Taylor <[EMAIL PROTECTED]> writes:
> The kernel also starts to play a significant role with a high number of
> connections. Some operating systems don't perform as well with a high
> number of processes (process handling, scheduling, file handles, etc.).

Right; the main problem with having lots more backends than you need is
that the idle ones still eat their share of RAM and open file handles. A
connection pooler uses relatively few resources per idle connection, so
it's a much better impedance match if you want to service lots of
connections that are mostly idle.

			regards, tom lane
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
I think maybe I didn't explain myself well enough. At most we will
service 200-250 connections across all 9 databases mentioned. The
database we are building is for a trucking company. Each of the databases
represents a different division, with one master database that everything
is updated to. Most of the access to the database is simple queries. Most
of the IO intensive stuff is done when revenue reports are generated and
when we have our month/year-end processing. All the trucking loads that
are marked as delivered are transferred to our master database during
nighttime processing. All of that will be handled using custom scripts.

Maybe I have given a better explanation of the application now. My
biggest concern is how to partition the shared storage for maximum
performance. Is there a real benefit to having more than one RAID 5
partition, or am I wasting my time?

Thanks

John Allgood - ESC

Tom Lane wrote:
> "Bruno Almeida do Lago" <[EMAIL PROTECTED]> writes:
>> Is there a real limit for max_connections? Here we have an Oracle
>> server with up to 1200 simultaneous connections on it!
>
> [ shrug... ] If your machine has the beef to run 1200 simultaneous
> queries, you can set max_connections to 1200.
>
> The point of what you were quoting is that if you want to service 1200
> concurrent users, but you only expect maybe 100 simultaneously active
> queries from them (and you have a database box that can only service
> that many), then you want to put a connection pooler in front of 100
> backends, not try to start a backend for every user.
>
> Oracle may handle this sort of thing differently, I dunno.
>
> 			regards, tom lane
Re: [PERFORM] is pg_autovacuum so effective ?
Christopher Browne wrote:
> Gaetano Mendola <[EMAIL PROTECTED]> writes:
>> I made a graph of my disk usage and it's been a ramp for one week.
>> I'll continue to wait in order to see if it will decrease.
>> I was expecting the steady state at something like 4 GB
>> (after a full vacuum and reindex) + 10% = 4.4 GB.
>> I'm at 4.6 GB and increasing. I'll see how it continues.
>
> You probably want the "experiment" to last more than a week.
>
> After all, it might actually be that with your usage patterns, that
> table would stabilize at 15% "overhead," and that might take a couple
> or three weeks.
>
> Unless it's clear that it's growing perilously quickly, just leave it
> alone so that there's actually some possibility of reaching an
> equilibrium. Any time you "VACUUM FULL" it, that _destroys_ any
> experimental results or any noticeable patterns, and it guarantees that
> you'll see "seemingly perilous growth" for a while.
>
> And if the table is _TRULY_ growing "perilously quickly," then it is
> likely that you should add in some scheduled vacuums on the table. Not
> VACUUM FULLs; just plain VACUUMs.
>
> I revised cron scripts yet again today to do hourly and "4x/day"
> vacuums of certain tables in some of our systems where we know they
> need the attention. I didn't schedule any VACUUM FULLs; it's
> unnecessary, and would lead directly to system outages, which is
> totally unacceptable.

Chris, is this in addition to pg_autovacuum? Or do you not use
pg_autovacuum at all? And if so, why not?
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
John Allgood wrote:
> I think maybe I didn't explain myself well enough. At most we will
> service 200-250 connections across all 9 databases mentioned. The
> database we are building is for a trucking company. Each of the
> databases represents a different division, with one master database
> that everything is updated to. Most of the access to the database is
> simple queries. Most of the IO intensive stuff is done when revenue
> reports are generated and when we have our month/year-end processing.
> All the trucking loads that are marked as delivered are transferred to
> our master database during nighttime processing. All of that will be
> handled using custom scripts. Maybe I have given a better explanation
> of the application now. My biggest concern is how to partition the
> shared storage for maximum performance. Is there a real benefit to
> having more than one RAID 5 partition, or am I wasting my time?
>
> Thanks
> John Allgood - ESC

If you read the general advice statements, it's actually better not to
use RAID 5, but to use RAID 10 (striping and mirroring), simply because
RAID 5 write performance is quite poor.

Also, if you have the disks, the next best improvement is to move pg_xlog
onto its own set of disks. I think that gets as much as a 25% improvement
by itself. pg_xlog is append-only, and its writes must complete before
the actual data gets updated, so giving it its own set of spindles
reduces seek time and lets the log be written quickly. I think there is
even some benefit to making pg_xlog a solid-state disk, as it doesn't
have to be huge, but having high I/O rates can remove it as a bottleneck.
(I'm not positive how large pg_xlog gets, but it is probably small
compared with the total db size, and I think it can be flushed
periodically as transactions are completed.)

I'm not sure what you are considering "shared storage". Are you thinking
that all the machines will be mounting a remote drive for writing the DB?
They should all have their own local copy (multiple masters on the same
database is not supported).

I think it is possible to get better performance by having more RAID
arrays. But it is application-dependent. If you know that you have 2
tables that are being updated often and independently, then having each
one on its own array would allow better concurrency.

But it sounds like in your app you get concurrency by having a bunch of
remote databases, which then do bulk updates on the master database. I
think if you are careful to have each of the remote dbs update the master
at a slightly different time, you could probably get very good
transaction rates.

John
=:->
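A minimal sketch of relocating pg_xlog onto its own spindles, as
described above (paths are hypothetical; the postmaster must be stopped
first):

    pg_ctl -D /var/lib/pgsql/data stop
    mv /var/lib/pgsql/data/pg_xlog /mnt/waldisks/pg_xlog
    ln -s /mnt/waldisks/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl -D /var/lib/pgsql/data start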
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
On Wed, Feb 23, 2005 at 02:15:52PM -0500, John Allgood wrote:
> using custom scripts. Maybe I have given a better explanation of the
> application now. My biggest concern is how to partition the shared
> storage for maximum performance. Is there a real benefit to having more
> than one RAID 5 partition, or am I wasting my time?

I think the simplest and most generic solution would be to put the OS and
pg_xlog on a RAID 1 pair and dedicate the rest of the drives to a RAID 5
or RAID 1+0 (striped set of mirrors) array.

Depending on the nature of your work, you may get better performance by
placing individual tables/indices on dedicated spindles for parallel
access.

-Mike Adler
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
This is some good info. The type of attached storage is a Kingston 14-bay
Fibre Channel Infostation. I have 14 36GB 15,000 RPM drives. The way it
is being explained, I think I should build a mirror with two disks for
pg_xlog, stripe and mirror the rest, and put all my databases into one
cluster. Also, I might mention that I am running clustering using the Red
Hat Cluster Suite.

John Arbash Meinel wrote:
> John Allgood wrote:
>> I think maybe I didn't explain myself well enough. At most we will
>> service 200-250 connections across all 9 databases mentioned. The
>> database we are building is for a trucking company. Each of the
>> databases represents a different division, with one master database
>> that everything is updated to. Most of the access to the database is
>> simple queries. Most of the IO intensive stuff is done when revenue
>> reports are generated and when we have our month/year-end processing.
>> All the trucking loads that are marked as delivered are transferred to
>> our master database during nighttime processing. All of that will be
>> handled using custom scripts. Maybe I have given a better explanation
>> of the application now. My biggest concern is how to partition the
>> shared storage for maximum performance. Is there a real benefit to
>> having more than one RAID 5 partition, or am I wasting my time?
>>
>> Thanks
>> John Allgood - ESC
>
> If you read the general advice statements, it's actually better not to
> use RAID 5, but to use RAID 10 (striping and mirroring), simply because
> RAID 5 write performance is quite poor.
>
> Also, if you have the disks, the next best improvement is to move
> pg_xlog onto its own set of disks. I think that gets as much as a 25%
> improvement by itself. pg_xlog is append-only, and its writes must
> complete before the actual data gets updated, so giving it its own set
> of spindles reduces seek time and lets the log be written quickly. I
> think there is even some benefit to making pg_xlog a solid-state disk,
> as it doesn't have to be huge, but having high I/O rates can remove it
> as a bottleneck. (I'm not positive how large pg_xlog gets, but it is
> probably small compared with the total db size, and I think it can be
> flushed periodically as transactions are completed.)
>
> I'm not sure what you are considering "shared storage". Are you
> thinking that all the machines will be mounting a remote drive for
> writing the DB? They should all have their own local copy (multiple
> masters on the same database is not supported).
>
> I think it is possible to get better performance by having more RAID
> arrays. But it is application-dependent. If you know that you have 2
> tables that are being updated often and independently, then having each
> one on its own array would allow better concurrency.
>
> But it sounds like in your app you get concurrency by having a bunch of
> remote databases, which then do bulk updates on the master database. I
> think if you are careful to have each of the remote dbs update the
> master at a slightly different time, you could probably get very good
> transaction rates.
>
> John
> =:->
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
John Allgood wrote:
> This is some good info. The type of attached storage is a Kingston
> 14-bay Fibre Channel Infostation. I have 14 36GB 15,000 RPM drives. The
> way it is being explained, I think I should build a mirror with two
> disks for pg_xlog, stripe and mirror the rest, and put all my databases
> into one cluster. Also, I might mention that I am running clustering
> using the Red Hat Cluster Suite.

So are these 14 disks supposed to be shared across all of your 9
databases?

It seems to me that you have a few architectural issues here. First, you
can't really have 2 masters writing to the same disk array. I'm not sure
if Red Hat Clustering gets around this. But second, you can't run 2
postgres engines on the same database. Postgres doesn't support a
clustered setup; there are too many issues with concurrency and keeping
everyone in sync.

Since you seem to be okay with having a bunch of smaller localized
databases, which update a master database once a day, I would think you
would want hardware to go something like this:

- 1 master server, at least dual Opteron, with access to lots of disks
  (likely the whole 14 if you can get away with it). Put 2 as a RAID 1
  for the OS, 4 as a RAID 10 for pg_xlog, and then the other 8 as RAID 10
  for the rest of the database.

- 8-9 other servers; these don't need to be as powerful, since they are
  local domains. Probably a 4-disk RAID 10 for the OS and pg_xlog is
  plenty good, and whatever extra disks you can get for the local
  database.

The master database holds all information for all domains, but the other
databases only hold whatever is the local information. Every night your
script sequences through the domain databases one by one, updating the
master database and synchronizing whatever data is necessary back to the
local domain. I would guess that this script could actually just run
continually, going to each local db in turn, but you may want
nighttime-only updating depending on what kind of load they have.

John
=:->
Re: [PERFORM] Problem with 7.4.5 and webmin 1.8 in grant function
[EMAIL PROTECTED] wrote:
> I got the answer: in the module config of postgresql-webmin there is a
> checkbox, "Use DBI to connect if available? yes/no". The default is
> yes, but if I choose no, everything goes fine. I also tested it on the
> desktop machine and got the same error and the same solution. Could you
> explain what happens with FC3 + PostgreSQL and webmin 1.8?

Well, given that the error was coming from the postmaster, I don't
believe that DBI or webmin have anything to do with it. What I can
believe is that DBI=yes and DBI=no are using different parameters for
connecting, therefore hitting different parts of your old (see below)
pg_hba.conf settings.

I concur with the other poster, and suspect that the files *were* using
some form of ident identification, but have been subsequently edited to
use trust - but the postmaster has not been restarted to know this!

Try:

$ pg_ctl reload

to get running with 'trust'.
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
Here is a summary of the Cluster Manager feature from the Red Hat Cluster
Suite. All 9 databases will be on the primary server; the secondary
server is the failover. They don't actually share the partitions at the
same time: when you have some type of failure, the backup server takes
over. Once you set up the hardware and install the clustering software,
you set up a service (e.g. "postgres"), tell it what hard drive it will
be using (e.g. /dev/sde1), and the clustering software takes care of
starting and stopping the postgres database.

Cluster Manager

The Cluster Manager feature of Red Hat Cluster Suite provides an
application failover infrastructure that can be used by a wide range of
applications, including:

* Most custom and mainstream commercial applications
* File and print serving
* Databases and database applications
* Messaging applications
* Internet and open source applications

With Cluster Manager, these applications can be deployed in high
availability configurations so that they are always operational, bringing
"scale-out" capabilities to enterprise Linux deployments.

For high-volume open source applications, such as NFS, Samba, and Apache,
Cluster Manager provides a complete ready-to-use failover solution. For
most other applications, customers can create custom failover scripts
using provided templates. Red Hat Professional Services can provide
custom Cluster Manager deployment services where required.

Features

* Support for up to eight nodes: allows high availability to be provided
  for multiple applications simultaneously.
* NFS/CIFS failover: supports highly available file serving in Unix and
  Windows environments.
* Fully shared storage subsystem: all cluster members have access to the
  same storage.
* Comprehensive data integrity guarantees: uses the latest I/O barrier
  technology, such as programmable power switches and watchdog timers.
* SCSI and Fibre Channel support: Cluster Manager configurations can be
  deployed using the latest SCSI and Fibre Channel technology;
  multi-terabyte configurations can readily be made highly available.
* Service failover: Cluster Manager not only ensures that hardware
  shutdowns or failures are detected and recovered from automatically,
  but also monitors your applications to ensure they are running
  correctly, and restarts them automatically if they fail.

John Arbash Meinel wrote:
> John Allgood wrote:
>> This is some good info. The type of attached storage is a Kingston
>> 14-bay Fibre Channel Infostation. I have 14 36GB 15,000 RPM drives.
>> The way it is being explained, I think I should build a mirror with
>> two disks for pg_xlog, stripe and mirror the rest, and put all my
>> databases into one cluster. Also, I might mention that I am running
>> clustering using the Red Hat Cluster Suite.
>
> So are these 14 disks supposed to be shared across all of your 9
> databases?
>
> It seems to me that you have a few architectural issues here. First,
> you can't really have 2 masters writing to the same disk array. I'm not
> sure if Red Hat Clustering gets around this. But second, you can't run
> 2 postgres engines on the same database. Postgres doesn't support a
> clustered setup; there are too many issues with concurrency and keeping
> everyone in sync.
>
> Since you seem to be okay with having a bunch of smaller localized
> databases, which update a master database once a day, I would think you
> would want hardware to go something like this:
>
> - 1 master server, at least dual Opteron, with access to lots of disks
>   (likely the whole 14 if you can get away with it). Put 2 as a RAID 1
>   for the OS, 4 as a RAID 10 for pg_xlog, and then the other 8 as
>   RAID 10 for the rest of the database.
>
> - 8-9 other servers; these don't need to be as powerful, since they are
>   local domains. Probably a 4-disk RAID 10 for the OS and pg_xlog is
>   plenty good, and whatever extra disks you can get for the local
>   database.
>
> The master database holds all information for all domains, but the
> other databases only hold whatever is the local information. Every
> night your script sequences through the domain databases one by one,
> updating the master database and synchronizing whatever data is
> necessary back to the local domain. I would guess that this script
> could actually just run continually, going to each local db in turn,
> but you may want nighttime-only updating depending on what kind of load
> they have.
>
> John
> =:->
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
Bruno,

> For example, 150 active connections on a medium-end 32-bit Linux server
> will consume significant system resources, and 600 is about the limit.

That is, "600 is about the limit for a medium-end 32-bit Linux server".
Sorry if the implication didn't translate well. If you use beefier
hardware, of course, you can manage more connections; personally I've
never needed more than 1000, even on a web site that gets 100,000 d.u.v.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
John Allgood wrote:
> Here is a summary of the Cluster Manager feature from the Red Hat
> Cluster Suite. All 9 databases will be on the primary server; the
> secondary server is the failover. They don't actually share the
> partitions at the same time: when you have some type of failure, the
> backup server takes over. Once you set up the hardware and install the
> clustering software, you set up a service (e.g. "postgres"), tell it
> what hard drive it will be using (e.g. /dev/sde1), and the clustering
> software takes care of starting and stopping the postgres database.

Okay, I misunderstood your hardware. So you actually only have 1 machine,
with a second machine as a potential rollover. All transactions occur on
the same hardware, even if it is a separate "database". I was thinking
these were alternate machines.

So my first question is: why are you partitioning into separate
databases and then updating the master one at night? Since everything is
restricted to the same machine, why not just have everything performed
on the master?

However, sticking with your arrangement, it would seem that you might be
able to get some extra performance if each database is on its own array,
since you are fairly likely to have 2 transactions occurring at the same
time that don't affect each other (since you wouldn't have any foreign
keys, etc. on 2 separate databases).

But I think the basic OS RAID 1, pg_xlog RAID 10, database RAID 10 is
still a good separation of disks, and probably would help you maximize
your throughput.

I can't say too much about how the cluster failover stuff will work with
postgres. But as long as one is completely shut down before the next is
started, and they are both running binary-compatible versions of
postgres, it seems like it would be fine. Not much different from having
a second machine that is sitting turned off, which you turn on when the
first one fails.

John
=:->
Re: [PERFORM] PostgreSQL is extremely slow on Windows
Magnus Hagander wrote:
> Yes, fsync=false is very good for bulk loading *IFF* you can live with
> data loss in case you get a crash during load.

It's not merely data loss -- you could encounter potentially
unrecoverable database corruption.

There is a TODO item about allowing the delaying of WAL writes. If we
maintain the WAL invariant (that is, a WAL record describing a change
must hit disk before the change itself does) but simply don't flush the
WAL at transaction commit, we should be able to get better performance
without the risk of database corruption (so we would need to keep pages
modified by the committed transaction pinned in memory until the WAL has
been flushed, which might be done on a periodic basis).

Naturally, there is a risk of losing data in the period between
transaction commit and syncing the WAL, but no risk of database
corruption. This seems a reasonable approach to providing better
performance for people who don't need the strict guarantees provided by
fsync=true.

-Neil
Re: [PERFORM] PostgreSQL is extremely slow on Windows
Neil Conway <[EMAIL PROTECTED]> writes:
> There is a TODO item about allowing the delaying of WAL writes. If we
> maintain the WAL invariant (that is, a WAL record describing a change
> must hit disk before the change itself does) but simply don't flush the
> WAL at transaction commit, we should be able to get better performance
> without the risk of database corruption (so we would need to keep pages
> modified by the committed transaction pinned in memory until the WAL
> has been flushed, which might be done on a periodic basis).

That interlock already exists, in the form of the bufmgr LSN logic.

I think this "feature" might be as simple as

	XLogFlush(recptr);

becoming

	/* Must flush if we are deleting files... */
	if (PerCommitFlush || nrels > 0)
		XLogFlush(recptr);

in RecordTransactionCommit.

			regards, tom lane
Re: [PERFORM] Performance Tuning Opterons / Hard Disk Layout
[EMAIL PROTECTED] ("Bruno Almeida do Lago") wrote: > Is there a real limit for max_connections? Here we've an Oracle server with > up to 1200 simultaneous conections over it! > > "max_connections: exactly like previous versions, this needs to be set to > the actual number of simultaneous connections you expect to need. High > settings will require more shared memory (shared_buffers). As the > per-connection overhead, both from PostgreSQL and the host OS, can be quite > high, it's important to use connection pooling if you need to service a > large number of users. For example, 150 active connections on a medium-end > 32-bit Linux server will consume significant system resources, and 600 is > about the limit." Right now, I have an Opteron box with: a) A load average of about 0.1, possibly less ;-), and b) 570 concurrent connections. Having so connections is something of a "fool's errand," as it really is ludicrously unnecessary, but I wouldn't be too afraid of having 1000 connections on that box, as long as they're being used for relatively small transactions. You can, of course, kill performance on any not-outrageously-large system if a few of those users are doing big queries... -- wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com'). http://cbbrowne.com/info/slony.html I've had a perfectly wonderful evening. But this wasn't it. -- Groucho Marx ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match