[PERFORM] PostgreSQL vs SQL Server for this application?
Hi all. We are designing a fairly large application that requires a high-performance database backend. The rates we need to sustain are at least 5000 inserts per second and 15 selects per second, on a single connection; there should only be 3 or 4 simultaneous connections in total. I think our main concern is dealing with the constant flow of data coming from the inserts, which must be available for selection as quickly as possible (a kind of real-time access). As a consequence, the database will rapidly grow to more than a hundred gigabytes. We still have to determine how and when we should archive old data to keep the application from suffering a performance drop. We intend to develop some kind of real-time partitioning of our main table to keep up with the flow.

At first we were planning to use SQL Server, as it has features that in my opinion could help us a lot:
- replication
- clustering

Recently we started to study PostgreSQL as a solution for our project:
- it also has replication
- the PostGIS module can handle geographic datatypes (which would simplify our development)
- we have strong PostgreSQL administration skills (we already use it for production processes)
- it is free (!), so we could put the savings toward hardware.

Is SQL Server clustering a real asset? How reliable are PostgreSQL's replication tools? Should I trust PostgreSQL's performance for this kind of load? My question is a bit fuzzy, but any advice is most welcome; hardware, tuning or design tips as well :)) Thanks a lot. Benjamin.
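The "real-time partitioning" Benjamin describes is essentially what later messages in this digest do with inherited tables: close off one child table after a fixed number of rows and direct new COPY batches at a fresh one. Here is a minimal Python sketch of that rotation logic; the events table, its ts column, and the execute() callback are illustrative assumptions, not taken from any message here:

    ROWS_PER_PARTITION = 10000000   # rotate to a fresh child table after this many rows
    BATCH = 500                     # rows per COPY batch, as used later in this digest

    def new_partition(execute, n):
        """Create the n-th child of a hypothetical 'events' parent table."""
        child = 'events%03d' % n
        execute("CREATE TABLE %s () INHERITS (events) WITHOUT OIDS;" % child)
        execute("CREATE INDEX %s_ts_idx ON %s (ts);" % (child, child))
        return child

    def load(execute, load_files):
        """Route COPY batches, starting a new partition every N rows."""
        n, loaded = 1, 0
        child = new_partition(execute, n)
        for path in load_files:             # each file holds BATCH rows
            execute("COPY %s FROM '%s';" % (child, path))
            loaded += BATCH
            if loaded >= ROWS_PER_PARTITION:
                n, loaded = n + 1, 0
                child = new_partition(execute, n)

Queries against the parent table then see all children; whether this rotation actually avoids an insert slowdown is exactly what the "Sustained inserts per sec" thread below investigates.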
[PERFORM] 8.0.1 performance question.
Hi, I have just upgraded our db from 7.4.2 to 8.0.1 and we are doing some testing. For some reason, we have discovered that our application performs much slower on 8.0.1. My initial reaction was to turn on log_min_duration_statement to see what's happening. However, log_min_duration_statement does not work for JDBC clients in 8.0.1, so I set log_statement to all instead. Without my application doing anything, I see the two statements below being executed non-stop. Who is triggering these statements? Is this normal? What am I doing wrong? I am using Fedora Core 1 - Kernel: 2.4.22-1.2174.nptl. Please help. Thanks.

2005-04-04 18:05:00 CST PARSELOG: statement: SELECT attnotnull FROM pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
2005-04-04 18:05:00 CST PARSELOG: statement: SELECT def.adsrc FROM pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid) LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid=def.adrelid AND a.attnum = def.adnum) WHERE c.oid = $1 and a.attnum = $2 AND def.adsrc LIKE '%nextval(%'

[the same two statements repeat continuously in the log]
Re: [PERFORM] How to improve db performance with $7K?
To be honest, I've yet to run across a SCSI configuration that can touch the 3ware SATA controllers. I have yet to see one top 80MB/sec, let alone 180MB/sec read or write, which is why we moved _away_ from SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically badly on RAID 1, RAID 5 and RAID 10. 35MB/sec for a three-drive RAID 0 is not bad, it's appalling. The hardware manufacturer should be publicly embarrassed for this kind of speed. A single U320 10k drive can do close to 70MB/sec sustained. If someone can offer benchmarks to the contrary (particularly in Linux), I would be greatly interested.

Alex Turner
netEconomist

On Mar 29, 2005 8:17 AM, Dave Cramer <[EMAIL PROTECTED]> wrote:
> Yeah, 35MB per sec is slow for a RAID controller; the 3ware mirrored is
> about 50MB/sec, and striped is about 100.
>
> Dave
>
> PFC wrote:
> >
> >> With hardware tuning, I am sure we can do better than 35MB per sec. Also
> >
> > WTF ?
> >
> > My laptop does 19 MB/s (reading <10 KB files, reiser4) !
> >
> > A recent desktop 7200rpm IDE drive:
> > # hdparm -t /dev/hdc1
> > /dev/hdc1:
> >  Timing buffered disk reads: 148 MB in 3.02 seconds = 49.01 MB/sec
> >
> > # ll "DragonBall 001.avi"
> > -r--r--r--  1 peufeu users 218M mar  9 20:07 DragonBall 001.avi
> >
> > # time cat "DragonBall 001.avi" >/dev/null
> > real    0m4.162s
> > user    0m0.020s
> > sys     0m0.510s
> >
> > (the file was not in the cache)
> > => about 52 MB/s (reiser3.6)
> >
> > So, you have a problem with your hardware...
>
> --
> Dave Cramer
> http://www.postgresintl.com
> 519 939 0336
> ICQ#14675561
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 1, 2005 3:59 PM, Christopher Petrilli <[EMAIL PROTECTED]> wrote:
> On Apr 1, 2005 3:53 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> > > What seems to happen is it slams into a "wall" of some sort, the
> > > system goes into disk write frenzy (wait=90% CPU), and eventually
> > > recovers and starts running for a while at a more normal speed. What
> > > I need, though, is for that wall not to happen. It is easier for me
> > > to accept a constant degradation of 5% than a 99% degradation for
> > > short periods, as it can cause cascade problems in the system.
> >
> > Could this possibly be a checkpoint happening?
> > Also, how many checkpoint segments do you have?
>
> Changes to the postgresql.conf file from "default":
> maintenance_work_mem = 131072
> fsync = false
> checkpoint_segments = 32

I've now had a chance to run a couple more tests, and here are two graphs of the time required to insert (via COPY from a file) 500 records at a time:

http://www.amber.org/~petrilli/diagrams/pgsql_copy500.png
http://www.amber.org/~petrilli/diagrams/pgsql_copy500_bgwriter.png

The first is with the above changes; the second adds two further modifications to the configuration:

bgwriter_percent = 25
bgwriter_maxpages = 1000

To my, likely faulty, intuition, it would seem that there is a backup happening in the moving of data from the WAL to its final resting place, and that by increasing these settings I could pull that work forward. As you can see from the charts, that doesn't seem to have any major impact. The point, in the rough middle, is where the program begins inserting into a new (inherited) table. The X axis is the "total" number of rows inserted. The table has:

* 21 columns (nothing too strange)
* No OIDs
* 5 indexes, including the primary key on a string

They are created by creating a main table, then doing:

CREATE TABLE foo001 () INHERITS (foos);

and then recreating all the indexes. Thoughts? Any advice would be more than appreciated.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
Yup, battery-backed, cache enabled. 6-drive RAID 10, 4-drive RAID 10, and 2x RAID 1. It's a 3ware 9500S-8MI - not bad for $450 plus BBU.

Alex Turner
netEconomist

On Apr 1, 2005 6:03 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Alex Turner <[EMAIL PROTECTED]> writes:
> > On Apr 1, 2005 4:17 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> >> 1250/sec with record size average of 26 bytes
> >> 800/sec with record size average of 48 bytes
> >> 250/sec with record size average of 618 bytes
>
> > Oh - this is with a separate transaction per command.
> > fsync is on.
>
> [ raised eyebrow... ] What kind of disk hardware is that exactly, and
> does it have write cache enabled? It's hard to believe those numbers
> if not.
>
> Write caching is fine if it's done in a battery-backed cache, which you
> can get in the higher-end hardware RAID controllers. Otherwise you're
> going to have problems whenever the power goes away unexpectedly.
>
> regards, tom lane
Re: [PERFORM] 8.0.1 performance question.
<[EMAIL PROTECTED]> writes:
> As a result, I modified log_statement to all. Without my application
> doing anything, I see statements below being executed non-stop. Who
> is triggering these statements? Is this normal? What am I doing
> wrong?
>
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT attnotnull FROM
> pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT def.adsrc FROM
> pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid)
> LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid=def.adrelid AND
> a.attnum = def.adnum) WHERE c.oid = $1 and a.attnum = $2 AND def.adsrc
> LIKE '%nextval(%'

Better ask about that on pgsql-jdbc. I suppose this is the trace of the JDBC driver trying to find out column metadata ... but if it's failing to cache the information, that's a pretty serious performance hit.

regards, tom lane
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> The table has:
> * 21 columns (nothing too strange)
> * No OIDS
> * 5 indexes, including the primary key on a string

Could we see the *exact* SQL definitions of the table and indexes? Also some sample data would be interesting. I'm wondering for example about the incidence of duplicate index keys.

regards, tom lane
Re: [PERFORM] How to improve db performance with $7K?
Alex Turner wrote:
> To be honest, I've yet to run across a SCSI configuration that can
> touch the 3ware SATA controllers. I have yet to see one top 80MB/sec,
> let alone 180MB/sec read or write, which is why we moved _away_ from
> SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically
> badly on RAID 1, RAID 5 and RAID 10.

Alex,

How does the 3ware controller do on heavy writes back to the database? It may have been Josh, but someone said that SATA does well with reads but not writes. Wouldn't an equal number of SCSI drives outperform SATA? I don't want to start a "who's better" war, I am just trying to learn here. It would seem that the more drives you could place in a RAID configuration, the more the performance would increase.

Steve Poe
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 11:52 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Christopher Petrilli <[EMAIL PROTECTED]> writes:
> > The table has:
> > * 21 columns (nothing too strange)
> > * No OIDS
> > * 5 indexes, including the primary key on a string
>
> Could we see the *exact* SQL definitions of the table and indexes?
> Also some sample data would be interesting. I'm wondering for example
> about the incidence of duplicate index keys.

Of course. This is a bit cleansed, since it's an internal project, but only the column names are changed:

CREATE TABLE foos (
    foo_id        VARCHAR(32),
    s_ts          INTEGER NOT NULL,
    c_ts          INTEGER NOT NULL,
    bar_id        INTEGER NOT NULL,
    proto         INTEGER NOT NULL,
    src_ip        INT8 NOT NULL,
    dst_ip        INT8 NOT NULL,
    src_port      INTEGER,
    dst_port      INTEGER,
    nated         INTEGER NOT NULL,
    src_nat_ip    INT8,
    dst_nat_ip    INT8,
    src_nat_port  INTEGER,
    dst_nat_port  INTEGER,
    foo_class     INTEGER NOT NULL,
    foo_type      INTEGER NOT NULL,
    src_bar       INTEGER NOT NULL,
    dst_bar       INTEGER NOT NULL,
    user_name     VARCHAR(255),
    info          TEXT
) WITHOUT OIDS;

ALTER TABLE foos ADD CONSTRAINT foos_foo_id_pk UNIQUE (foo_id);
CREATE INDEX foos_c_ts_idx ON foos(c_ts);
CREATE INDEX foos_src_ip_idx ON foos(src_ip);
CREATE INDEX foos_dst_ip_idx ON foos(dst_ip);
CREATE INDEX foos_foo_class_idx ON foos(foo_class);
CREATE INDEX foos_foo_type_idx ON foos(foo_type);

CREATE TABLE foos001 () INHERITS (foos) WITHOUT OIDS;

ALTER TABLE foos001 ADD CONSTRAINT foos001_foo_id_pk UNIQUE (foo_id);
CREATE INDEX foos001_c_ts_idx ON foos001(c_ts);
CREATE INDEX foos001_src_ip_idx ON foos001(src_ip);
CREATE INDEX foos001_dst_ip_idx ON foos001(dst_ip);
CREATE INDEX foos001_foo_class_idx ON foos001(foo_class);
CREATE INDEX foos001_foo_type_idx ON foos001(foo_type);

That continues on, but you get the idea. As for data content, specifically regarding the indices, here's what the "simulator" creates:

foo_id - a 32-character UID (generated by the UUID function in mxTools), which looks like '00beef19420053c64f3f01aeb0b4a2a5' and varies in the upper components more than the lower.

*_ts - UNIX epoch timestamps, sequential. There's a long story behind not using a DATETIME format, but if that's the big issue, it can be dealt with.

*_ip - randomly selected 32-bit integers from a pre-generated list of about 500 different numbers ranging from 3232235500 to 3232236031. This is, unfortunately, not too atypical of the "real world".

*_class - randomly selected 1-100 (again, not atypical, although a normal distribution would be less random)

*_type - randomly selected 1-1 (not atypical, and more random than in the real world)

Hopefully this helps?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
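As a rough illustration of the value distributions Chris describes, here is a minimal Python sketch of a comparable row generator; the pool size, field subset and helper names are assumptions, standing in for the mxTools-based simulator rather than reproducing it:

    import random
    import time

    # Pre-generated pool of ~500 addresses in the range described above.
    IP_POOL = [random.randint(3232235500, 3232236031) for _ in range(500)]

    def make_row(epoch):
        """One record shaped like the simulator's output (illustrative only)."""
        return (
            '%032x' % random.getrandbits(128),  # foo_id: 32-char hex UID
            epoch,                              # s_ts: sequential timestamp
            epoch,                              # c_ts
            random.choice(IP_POOL),             # src_ip
            random.choice(IP_POOL),             # dst_ip
            random.randint(1, 100),             # foo_class
        )

    base = int(time.time())
    batch = [make_row(base + i) for i in range(500)]   # one 500-row COPY batch

The detail that matters to this thread is that foo_id values are effectively random, so primary-key inserts land on arbitrary leaf pages of the index rather than appending at one end.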
Re: [PERFORM] [JDBC] 8.0.1 performance question.
On Tue, 5 Apr 2005 [EMAIL PROTECTED] wrote:
> I see statements below being executed non-stop. Who is triggering these
> statements? Is this normal? What am I doing wrong?
>
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT attnotnull FROM
> pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT def.adsrc FROM
> pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid)
> LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid=def.adrelid AND
> a.attnum = def.adnum) WHERE c.oid = $1 and a.attnum = $2 AND def.adsrc
> LIKE '%nextval(%'

These are the results of ResultSetMetaData.isNullable() and isAutoIncrement(), which your code is apparently calling. The results of these calls are cached on a per-ResultSet basis. We have discussed caching them at a higher level, but couldn't find a way to know when to flush that cache.

Kris Jurka
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On Apr 4, 2005 11:52 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> Could we see the *exact* SQL definitions of the table and indexes?

> Of course, this is a bit cleansed, since it's an internal project, but
> only the column names are changed:

Thanks. No smoking gun in sight there. But out of curiosity, can you do a test run with *no* indexes on the table, just to see if it behaves any differently? Basically I was wondering if index overhead might be part of the problem.

Also, the X-axis on your graphs seems to be total number of rows inserted ... can you relate that to elapsed real time for us?

regards, tom lane
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 12:23 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Thanks. No smoking gun in sight there. But out of curiosity, can you
> do a test run with *no* indexes on the table, just to see if it behaves
> any differently? Basically I was wondering if index overhead might be
> part of the problem.

Running now, but it'll take a while, since I have a 3/4-second pause after each COPY to better reflect the "real world": the application does one COPY per second, or whenever it hits 1000 entries. This seemed a sane way to deal with it, and not burden the system with needless index balancing, etc.

> Also, the X-axis on your graphs seems to be total number of rows
> inserted ... can you relate that to elapsed real time for us?

Sure. Like I said, there's a 3/4-second sleep between each COPY, regardless of how long it took (which, well, isn't quite right, but close enough for this test). I've created a PNG with the X axis reflecting elapsed time:

http://www.amber.org/~petrilli/diagrams/pgsql_copyperf_timeline.png

In addition, I've put up the raw data I used:

http://www.amber.org/~petrilli/diagrams/results_timeline.txt

The columns are row count, elapsed time, instance time. Hopefully this helps some? This machine has nothing else running on it other than the normal stripped-down background processes (like sshd).

--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] [JDBC] 8.0.1 performance question.
Thank you for the quick response. To help me debug what's happening, can you tell me what the difference is between the 7.4 and 8.0 JDBC drivers in this regard? Is this something newly introduced in 8.0, or has it always been happening? Thanks.

On Apr 5, 2005 12:15 AM, Kris Jurka <[EMAIL PROTECTED]> wrote:
> These are the results of ResultSetMetaData.isNullable() and
> isAutoIncrement(), which your code is apparently calling. The results of
> these calls are cached on a per-ResultSet basis. We have discussed
> caching them at a higher level, but couldn't find a way to know when to
> flush that cache.
>
> Kris Jurka
Re: [PERFORM] [JDBC] 8.0.1 performance question.
On Tue, 5 Apr 2005 [EMAIL PROTECTED] wrote:
> Thank you for the quick response. To help me debug what's happening,
> can you tell me what's the difference between the 7.4 and 8.0 jdbc
> drivers in this regard? Is this something that is newly introduced in
> 8.0? Or is this something that has always been happening?

8.0 is the first driver version to take advantage of the V3 protocol's ability to return the base tables and columns of a ResultSet. Previously isNullable was hardcoded to always return columnNullableUnknown, and isAutoIncrement always returned false. I guess the question is why you are calling these methods if they didn't work previously?

Kris Jurka
Re: [PERFORM] How to improve db performance with $7K?
I'm no drive expert, but it seems to me that our write performance is excellent. I think what most people are concerned about is OLTP, where you are doing heavy write _and_ heavy read performance at the same time. Our system is mostly read during the day, but we do a full system update every night that is all writes, and it's very fast compared to the smaller SCSI system we moved off of - nearly a 6x speed improvement, as fast as 900 rows/sec with a 48-byte record, one row per transaction. I don't know enough about how SATA works to really comment on its performance as a protocol compared with SCSI. If anyone has a useful link on that, it would be greatly appreciated. More drives will give more throughput/sec, but not necessarily more transactions/sec. For that you will need more RAM on the controller, and definitely a BBU to keep your data safe.

Alex Turner
netEconomist

On Apr 4, 2005 10:39 AM, Steve Poe <[EMAIL PROTECTED]> wrote:
> How does the 3ware controller do in heavy writes back to the database?
> It may have been Josh, but someone said that SATA does well with reads
> but not writes. Wouldn't an equal number of SCSI drives outperform SATA?
> I don't want to start a "who's better" war, I am just trying to learn
> here. It would seem the more drives you could place in a RAID
> configuration, the more the performance would increase.
>
> Steve Poe
Re: [PERFORM] How to improve db performance with $7K?
On Apr 4, 2005, at 3:12 PM, Alex Turner wrote:
> Our system is mostly read during the day, but we do a full system
> update every night that is all writes, and it's very fast compared to
> the smaller SCSI system we moved off of. Nearly a 6x speed
> improvement, as fast as 900 rows/sec with a 48-byte record, one row
> per transaction.

Well, if you're not heavily multitasking, the advantage of SCSI is lost on you.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [PERFORM] How to improve db performance with $7K?
I'm doing some research on SATA vs SCSI right now, but to be honest I'm not turning up much at the protocol level. A lot of stupid benchmarks compare 10k Raptor drives against top-of-the-line 15k drives, where unsurprisingly the SCSI drives win, but of course cost four times as much; although in some, SATA even wins or draws. I'm trying to find something more apples-to-apples: 10k to 10k.

Alex Turner
netEconomist

On Apr 4, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
> Well, if you're not heavily multitasking, the advantage of SCSI is lost
> on you.
>
> Vivek Khera, Ph.D.
> +1-301-869-4449 x806
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 09:48 -0400, Christopher Petrilli wrote:
> The point, in the rough middle, is where the program begins inserting
> into a new table (inherited). The X axis is the "total" number of rows
> inserted.

and you also mention the same data plotted against elapsed time:
http://www.amber.org/~petrilli/diagrams/pgsql_copyperf_timeline.png

Your graphs look identical to others I've seen, so I think we're touching on something wider than your specific situation. The big difference is that things seem to go back to high performance when you switch to a new inherited table.

I'm very interested in the graphs of elapsed time for COPY 500 rows against rows inserted. The simplistic inference from those graphs is that if you only inserted 5 million rows into each table, rather than 10 million, then everything would be much quicker. I hope this isn't the answer, but could you try it and see? I'd like to either rule out "number of rows" as an issue or focus in on it, depending on the results.

Q: Please can you confirm that the discontinuity on the graph at around 5000 elapsed seconds matches EXACTLY with the switch from one table to another? That is an important point.

Q: How many data files are there for these relations? Wouldn't be two, by any chance, when we have 10 million rows in them?

Q: What is the average row length? About 150-160 bytes?

Thanks,

Best Regards, Simon Riggs
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 3:46 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Your graphs look identical to others I've seen, so I think we're
> touching on something wider than your specific situation. The big
> difference is that things seem to go back to high performance when you
> switch to a new inherited table.

This is correct.

> Q: Please can you confirm that the discontinuity on the graph at around
> 5000 elapsed seconds matches EXACTLY with the switch from one table to
> another? That is an important point.

Well, the changeover happens at 51593.395205 seconds :-) Here are two lines from the results with row count and time added:

10000000    51584.9818912    8.41331386566
10000500    51593.395205     0.416964054108

Note that 10M is when it swaps. I see no reason to interpret it differently, so it seems to be totally based around switching tables (and thereby indices).

> Q: How many data files are there for these relations? Wouldn't be two,
> by any chance, when we have 10 million rows in them?

I allow PostgreSQL to manage all the data files itself, so here's the default tablespace:

total 48
drwx------  2 pgsql pgsql 4096 Jan 26 20:59 1
drwx------  2 pgsql pgsql 4096 Dec 17 19:15 17229
drwx------  2 pgsql pgsql 4096 Feb 16 17:55 26385357
drwx------  2 pgsql pgsql 4096 Mar 24 23:56 26425059
drwx------  2 pgsql pgsql 8192 Mar 28 11:31 26459063
drwx------  2 pgsql pgsql 8192 Mar 31 23:54 26475755
drwx------  2 pgsql pgsql 4096 Apr  4 15:07 26488263

[EMAIL PROTECTED] base]# du
16624    ./26425059
5028     ./26385357
5660     ./26459063
4636     ./17229
6796     ./26475755
4780     ./1
1862428  ./26488263
1905952  .

> Q: What is the average row length?
> About 150-160 bytes?

Raw data is around 150 bytes; after insertion I'd need to do some other calculations.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 15:56 -0400, Christopher Petrilli wrote:
> On Apr 4, 2005 3:46 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> > I'm very interested in the graphs of elapsed time for COPY 500 rows
> > against rows inserted. The simplistic inference from those graphs is
> > that if you only inserted 5 million rows into each table, rather than
> > 10 million, then everything would be much quicker.

Any chance of running a multiple load of 4 million rows per table, leaving the test running for at least 3 tables' worth (12+ M rows)?

> > Q: Please can you confirm that the discontinuity on the graph at around
> > 5000 elapsed seconds matches EXACTLY with the switch from one table to
> > another? That is an important point.
>
> Well, the changeover happens at 51593.395205 seconds :-) Here are two
> lines from the results with row count and time added:
>
> 10000000    51584.9818912    8.41331386566
> 10000500    51593.395205     0.416964054108
>
> Note that 10M is when it swaps. I see no reason to interpret it
> differently, so it seems to be totally based around switching tables
> (and thereby indices).

OK, but do you have some other external knowledge that it is definitely happening at that time? Your argument above seems slightly circular to me. This is really important, because we need to know whether it ties in with that event or some other. Have you run this for more than 2 files, say 3 or more? You COMMIT after each 500 rows?

> I allow PostgreSQL to manage all the data files itself, so here's the
> default tablespace:
> [...]

OK. Please...

cd $PGDATA/base/26488263
ls -l

I'm looking for the number of files associated with each inherited table (heap).

> > Q: What is the average row length?
> > About 150-160 bytes?
>
> Raw data is around 150 bytes; after insertion I'd need to do some
> other calculations.

By my calculations, you should have just 2 data files per 10M rows for the main table. The performance degradation seems to coincide with the point where we move to inserting into the second of the two files. I'm not looking for explanations yet, just examining coincidences and trying to predict the behaviour based upon conjectures.

Best Regards, Simon Riggs
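Simon's "2 data files per 10M rows" figure follows from simple arithmetic; here is a sketch of it in Python, using an assumed per-tuple overhead since the thread only gives the ~150-byte raw row length:

    page, segment = 8192, 1024 ** 3     # 8 KB heap pages, 1 GB file segments

    rows = 10 * 1000 * 1000
    tuple_len = 150 + 32                # ~150 data bytes plus assumed header overhead
    rows_per_page = page // tuple_len   # ~45 tuples per page
    heap_pages = rows // rows_per_page  # ~222,000 pages
    heap_bytes = heap_pages * page      # ~1.8 GB

    print(heap_bytes / float(segment))  # ~1.7 -> spills into a second 1 GB segment

Because PostgreSQL splits every relation into 1 GB segment files on disk, a ~1.8 GB heap occupies exactly two files, which is what the ls -l output in the next message confirms (26488271 and 26488271.1).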
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 4:11 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Any chance of running a multiple load of 4 million rows per table,
> leaving the test running for at least 3 tables' worth (12+ M rows)?

As soon as I get done running a test without indexes :-)

> OK, but do you have some other external knowledge that it is definitely
> happening at that time? Your argument above seems slightly circular to
> me.

My program *SPECIFICALLY* counts to 10M and then switches the COPY statement.

> This is really important because we need to know whether it ties in with
> that event, or some other.

Unless basic integer math is failing, it's definitely happening at 10M rows.

> Have you run this for more than 2 files, say 3 or more?

You mean 3 or more tables? I'm not sure which type of files you are referring to here.

> You COMMIT after each 500 rows?

This is done using COPY syntax, not INSERT syntax, so I suppose "yes" I do. The file that is being used for COPY is kept on a ramdisk.

> OK. Please...
> cd $PGDATA/base/26488263
> ls -l

[EMAIL PROTECTED] base]# cd 26488263/
[EMAIL PROTECTED] 26488263]# ls -l
total 2003740
-rw-------  1 pgsql pgsql      49152 Apr  4 12:26 1247
-rw-------  1 pgsql pgsql     245760 Apr  4 12:27 1249
-rw-------  1 pgsql pgsql     573440 Apr  4 12:24 1255
-rw-------  1 pgsql pgsql      57344 Apr  4 14:44 1259
-rw-------  1 pgsql pgsql     212992 Apr  4 14:44 16408
[... several dozen more small catalog and index files trimmed ...]
-rw-------  1 pgsql pgsql 1073741824 Apr  4 15:07 26488271
-rw-------  1 pgsql pgsql  407527424 Apr  4 16:17 26488271.1
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 4:53 PM, PFC <[EMAIL PROTECTED]> wrote:
> > This is done using COPY syntax, not INSERT syntax. So I suppose "yes"
> > I do. The file that is being used for COPY is kept on a ramdisk.
>
> COPY or psql \copy ?
> If you wanna be sure you commit after each COPY, launch a psql in a shell
> and check if the inserted rows are visible (watching SELECT count(*) grow
> will do)

The script is Python, using pexpect (à la expect), and does this, exactly:

psql = pexpect.spawn('/usr/local/pgsql/bin/psql -d bench2')
[ ... ]
start = time.time()
psql.expect_exact('bench2=#')
psql.sendline("COPY events%03i FROM '/mnt/tmpfs/loadfile';" % (tablenum+1))
results.write('%s\n' % (time.time() - start))
results.flush()

There's other code, but it's all related to building the loadfile. Note that I'm specifically including the time it takes to get the prompt back in the timing (but it does slip 1 loop, which isn't relevant).

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
> This is done using COPY syntax, not INSERT syntax. So I suppose "yes"
> I do. The file that is being used for COPY is kept on a ramdisk.

COPY or psql \copy ?

If you wanna be sure you commit after each COPY, launch a psql in a shell and check if the inserted rows are visible (watching SELECT count(*) grow will do)
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 16:18 -0400, Christopher Petrilli wrote:
> My program *SPECIFICALLY* counts to 10M and then switches the COPY
> statement.
>
> [EMAIL PROTECTED] base]# cd 26488263/
> [EMAIL PROTECTED] 26488263]# ls -l
> total 2003740
> -rw-------  1 pgsql pgsql 1073741824 Apr  4 15:07 26488271
> -rw-------  1 pgsql pgsql  407527424 Apr  4 16:17 26488271.1

Can you do:

select relname from pg_class where relfilenode = 26488271;

and confirm that the name is the table you've been loading... Couldn't see all your indexes... are they still there?

Thanks,

Best Regards, Simon Riggs
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 4:58 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Can you do:
> select relname from pg_class where relfilenode = 26488271
> and confirm that the name is the table you've been loading...

It is.

> Couldn't see all your indexes... are they still there?

Nope, I'm running a second run without the auxiliary indices; I only have the primary key index. So far, a quick scan by eye says that it's behaving "better", but beginning to have issues again. I'll post results as soon as they are done.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 17:03 -0400, Christopher Petrilli wrote:
> Nope, I'm running a second run without the auxiliary indices. I only
> have the primary key index. So far, a quick scan by eye says that it's
> behaving "better", but beginning to have issues again. I'll post
> results as soon as they are done.

Hmmm. Before I start to tunnel-vision on a particular coincidence...

How much memory have you got on the system? How much of that have you allocated to various tasks? What else is happening on your system? Tell us more about the disk set-up and other hardware-related things: disk cache... disk speed... seek times... etc.

Best Regards, Simon Riggs
Re: [PERFORM] Sustained inserts per sec ... ?
If I'm getting the point of this thread correctly, having a huge amount of data in one table degrades INSERT/COPY performance even with just a pkey index. If that's about the size of it, read on. If not, ignore me, because I missed something.

On Apr 4, 2005 10:44 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Before I start to tunnel-vision on a particular coincidence...

Don't worry too much about tunnel vision. I see the same thing every day with multi-million-row tables. The bigger the table gets (with only a pkey index), the slower the inserts go. If I start over (truncate, drop/create table), or if I point the initial load at a new table, everything gets speedy. I've always figured it was a function of table size and learned to live with it...

> How much memory have you got on the system?

On mine, 16G

> How much of that have you allocated to various tasks?

shared_buffers: 15000

> What else is happening on your system?

Nothing on mine.

> Tell us more about disk set-up and other hardware related things.

6-disk RAID 10 on a Compaq SmartArray 6404 with 256M BB cache, WAL on a 2-disk mirror on the built-in SmartArray 5 controller.

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org
[PERFORM] Bulk COPY end of copy delimiter
Today, while trying to do a bulk COPY of data into a table, the process aborted with the following error message:

ERROR: end-of-copy marker corrupt
CONTEXT: COPY tbl_logged_event, line 178519: "606447014,1492,2005-02-24 03:16:14,2005-02-23 20:27:48,win_applog,,error,adsmclientservice,nt author..."

Googling the error, we found references to '\.' (backslash-period) being an "end-of-copy marker". Unfortunately, our data contains the backslash-period character sequence. Is there any known fix or workaround for this condition? We're using PostgreSQL 7.3.9 and are also running tests on an 8.0.1 system.

Thanks in advance,
--- Steve
___
Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to make them all yourself." -- Eleanor Roosevelt
Re: [PERFORM] Bulk COPY end of copy delimiter
Hi,

On Mon, 4 Apr 2005, Steven Rosenstein wrote:
> Googling the error, we found reference to the '\.' (backslash-period) being
> an "end-of-copy marker". Unfortunately, our data contains the
> backslash-period character sequence. Is there any known fix or workaround
> for this condition?

Any sequence \. in COPY input data should be escaped as \\. If this data was generated by pg_dump then it's a problem, but I haven't seen any other reports of this. Can I assume that you've generated the data for bulk load yourself? If so, there is a discussion of escaping characters here: http://www.postgresql.org/docs/8.0/static/sql-copy.html

Gavin
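For data produced outside pg_dump, the practical fix is to escape each field before writing the load file. A minimal Python sketch under the assumptions of this thread (text-format COPY with a comma delimiter); the function name is illustrative:

    def escape_copy_field(field, delimiter=','):
        r"""Escape one text field for PostgreSQL's COPY text format.

        Backslashes are doubled first, so a literal \. in the data can
        never be read as the end-of-copy marker; the delimiter and row
        separators are then escaped as well.
        """
        return (field.replace('\\', '\\\\')
                     .replace(delimiter, '\\' + delimiter)
                     .replace('\n', '\\n')
                     .replace('\r', '\\r')
                     .replace('\t', '\\t'))

    # The problem sequence survives the round trip:
    print(escape_copy_field('adsmclientservice\\.log'))
    # -> adsmclientservice\\.log (COPY reads it back as adsmclientservice\.log)

Doubling the backslash first matters: escaping the delimiter introduces new backslashes that must not themselves be doubled.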
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 6:44 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Before I start to tunnel-vision on a particular coincidence...
> How much memory have you got on the system?

Now 2GB, but most of it is free in this situation. Earlier, I posted some of the settings related to work mem.

> How much of that have you allocated to various tasks?

Do you mean inside PostgreSQL?

> What else is happening on your system?

sshd, that's it :-)

> Tell us more about disk set-up and other hardware related things.
> Disk cache... disk speed... seek times... etc.

Sure, here's the system configuration:

* AMD64/3000
* 2GB RAM (was 1GB; has made no difference)
* 1 x 120GB SATA drive (w/WAL), 7200RPM Seagate
* 1 x 160GB SATA drive (main), 7200RPM Seagate

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On Apr 4, 2005 12:23 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> do a test run with *no* indexes on the table, just to see if it behaves
>> any differently? Basically I was wondering if index overhead might be
>> part of the problem.

> http://www.amber.org/~petrilli/diagrams/pgsql_copy500_pkonly.png
> I apologize, I forgot to kill the PK, but as you can see, the curve
> flattened out a lot. It still begins to increase in what seems like
> the same place.

Yeah, this confirms the thought that the indexes are the source of the issue. (Which is what I'd expect, because a bare INSERT ought to be an approximately constant-time operation. But it's good to verify.)

Now some amount of slowdown is to be expected as the indexes get larger, since it ought to take roughly O(log N) time to insert a new entry in an index of size N. The weird thing about your curves is the very sudden jump in the insert times. What I think might be happening is that the "working set" of pages touched during index inserts is gradually growing, and at some point it exceeds shared_buffers, and at that point performance goes in the toilet because we are suddenly doing lots of reads to pull in index pages that fell out of the shared buffer area.

It would be interesting to watch the output of iostat or vmstat during this test run. If I'm correct about this, the I/O load should be basically all writes during the initial part of the test, and then suddenly develop a significant and increasing fraction of reads at the point where the slowdown occurs.

The indicated fix of course is to increase shared_buffers.

regards, tom lane
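Back-of-envelope arithmetic for Tom's working-set theory. Every figure below is an assumption for illustration (the thread gives none of them), but it shows why six indexes over a 10M-row table with effectively random keys can outgrow a modest buffer cache:

    page = 8192                       # PostgreSQL page size in bytes

    rows = 10 * 1000 * 1000           # rows per inherited table
    entries_per_leaf = 200            # rough guess at b-tree leaf capacity
    indexes = 6                       # pkey plus 5 secondary indexes

    leaf_pages = rows // entries_per_leaf * indexes            # ~300,000 pages
    print('hot index pages ~ %d MB' % (leaf_pages * page // 2**20))    # ~2343 MB

    shared_buffers = 16000            # the setting tried later in the thread
    print('shared_buffers  ~ %d MB' % (shared_buffers * page // 2**20))  # ~125 MB

Because the key values are random rather than monotonic, inserts touch leaf pages all over each index; once that hot set no longer fits in shared buffers plus the OS cache, every COPY starts incurring random reads, which is exactly the read traffic Tom predicts.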
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 10:36 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Yeah, this confirms the thought that the indexes are the source of
> the issue. (Which is what I'd expect, because a bare INSERT ought to be
> an approximately constant-time operation. But it's good to verify.)

This seemed to be my original idea, but I wanted to eliminate everything else as much as possible. I was also concerned that I might be hitting a bad case in the trees; I had to change some UID generation code to hash better, so...

> Now some amount of slowdown is to be expected as the indexes get larger,
> since it ought to take roughly O(log N) time to insert a new entry in an
> index of size N. The weird thing about your curves is the very sudden
> jump in the insert times.

Right, I expected O(log N) behavior myself, and it seems to behave that way if you look at the first section (although there are some interesting patterns visible if you exclude data outside the 90th percentile in the first section, which seem to coincide with some write activity).

> It would be interesting to watch the output of iostat or vmstat during
> this test run. If I'm correct about this, the I/O load should be
> basically all writes during the initial part of the test, and then
> suddenly develop a significant and increasing fraction of reads at the
> point where the slowdown occurs.

Well, I can track this on a run if it would be useful, but I think you're right, as it matches what I saw from looking at iostat at those points.

> The indicated fix of course is to increase shared_buffers.

Any idea where it should be set?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Bulk COPY end of copy delimiter
Your assumption is correct. The data was generated out of a DB2 database, and uses commas as field delimiters.

Thank you for the workaround,
--- Steve
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On Apr 4, 2005 10:36 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> The indicated fix of course is to increase shared_buffers.

> Any idea where it should be set?

Not really. An upper bound would be the total size of the finished indexes for one 10M-row table, but one would suppose that that's overkill. The leaf pages shouldn't have to stay in RAM to have reasonable behavior --- the killer case is when upper-level tree pages drop out. Or that's what I'd expect anyway.

You could probably drop the inter-insert sleep for testing purposes, if you want to experiment with several shared_buffers values quickly.

regards, tom lane
Re: [PERFORM] Sustained inserts per sec ... ?
Tom Lane <[EMAIL PROTECTED]> writes:
> What I think might be happening is that the "working set" of pages
> touched during index inserts is gradually growing, and at some point it
> exceeds shared_buffers, and at that point performance goes in the toilet
> because we are suddenly doing lots of reads to pull in index pages that
> fell out of the shared buffer area.

All this is happening within a single transaction too, right? So there hasn't been an fsync the entire time. It's entirely up to the kernel when to decide to start writing data. It's possible it's just buffering all the writes in memory until the number of free buffers drops below some threshold, then it suddenly starts writing out buffers.

> It would be interesting to watch the output of iostat or vmstat during
> this test run. If I'm correct about this, the I/O load should be
> basically all writes during the initial part of the test, and then
> suddenly develop a significant and increasing fraction of reads at the
> point where the slowdown occurs.

I think he's right: if you see a reasonable write volume before the performance drop, followed by a sudden increase in read volume (and a decrease of write volume proportionate to the drop in performance), then it's just shared buffers becoming a bottleneck. If there's hardly any write volume before, and then a sudden increase in write volume despite a drop in performance, then I might be right, in which case you might want to look into tools to tune your kernel VM system.

--
greg
Re: [PERFORM] Sustained inserts per sec ... ?
On 04 Apr 2005 23:45:47 -0400, Greg Stark <[EMAIL PROTECTED]> wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> > What I think might be happening is that the "working set" of pages
> > touched during index inserts is gradually growing, and at some point it
> > exceeds shared_buffers, and at that point performance goes in the toilet
> > because we are suddenly doing lots of reads to pull in index pages that
> > fell out of the shared buffer area.
>
> All this is happening within a single transaction too, right? So there hasn't
> been an fsync the entire time. It's entirely up to the kernel when to decide
> to start writing data.

This was my concern, and in fact moving from ext3 -> XFS has helped substantially in this regard. This is all happening inside COPY statements, so there's effectively a commit every 500 rows. I could enlarge this, but I didn't notice a huge increase in performance when doing tests on smaller bits.

Also, you are correct, I am running without fsync, although I could change that if you thought it would "smooth" the performance. The issue is less absolute performance than something more deterministic. Going from 0.05 seconds for a 500-row COPY to 26 seconds really messes with the system.

One thing that was mentioned early on, and I hope people remember, is that I am running autovacuum in the background, but the timing of it seems to have little to do with the system's problems; at least the debug output doesn't coincide with performance loss.

> It's possible it's just buffering all the writes in memory until the amount of
> free buffers drops below some threshold then it suddenly starts writing out
> buffers.

That was happening with ext3, actually, or at least to the best of my knowledge.

> > It would be interesting to watch the output of iostat or vmstat during
> > this test run. If I'm correct about this, the I/O load should be
> > basically all writes during the initial part of the test, and then
> > suddenly develop a significant and increasing fraction of reads at the
> > point where the slowdown occurs.
>
> I think he's right, if you see a reasonable write volume before the
> performance drop followed by a sudden increase in read volume (and decrease of
> write volume proportionate to the drop in performance) then it's just shared
> buffers becoming a bottleneck.

I've set shared_buffers to 16000 (from the original 1000) and am running now, without the pauses. We'll see what it looks like, but so far it seems to be running faster. How much and how it degrades will be an interesting view.

> If there's hardly any write volume before, then a sudden increase in write
> volume despite a drop in performance then I might be right. In which case you
> might want to look into tools to tune your kernel vm system.

Here's a quick snapshot of iostat:

Linux 2.6.9-1.667 (bigbird.amber.org)   04/04/2005

avg-cpu:  %user   %nice    %sys  %iowait   %idle
           1.05    0.01    0.63    13.15   85.17

Device:    tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
hda       0.00         0.00         0.00        3616           0
sda      23.15        68.09       748.89   246884021  2715312654
sdb      19.08        37.65       773.03   136515457  2802814036

The first three columns have been identical (or nearly so) the whole time, which tells me the system is pegged in its performance on IO. This is not surprising.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
Greg Stark <[EMAIL PROTECTED]> writes:
> All this is happening within a single transaction too, right? So there hasn't
> been an fsync the entire time. It's entirely up to the kernel when to decide
> to start writing data.

No ... there's a commit every 500 records. However, I think Chris said he was running with fsync off; so you're right that the kernel is at liberty to write stuff to disk when it feels like. It could be that those outlier points are transactions that occurred in the middle of periodic syncer-driven mass writes. Maybe fsync off is counterproductive for this situation?

regards, tom lane
[PERFORM] Compressing WAL
Maybe better for -hackers, but here it goes anyway...

Has anyone looked at compressing WAL before writing it to disk? On a system generating a lot of WAL, it seems there might be some gains to be had if the WAL data were compressed before going to disk, since today's machines are generally more I/O-bound than CPU-bound. And unlike the base tables, you generally don't need to read the WAL, so you don't really need to worry about not being able to quickly scan through the data without decompressing it.

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 11:57 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > All this is happening within a single transaction too, right? So there hasn't
> > been an fsync the entire time. It's entirely up to the kernel when to decide
> > to start writing data.
>
> No ... there's a commit every 500 records. However, I think Chris said
> he was running with fsync off; so you're right that the kernel is at
> liberty to write stuff to disk when it feels like. It could be that
> those outlier points are transactions that occurred in the middle of
> periodic syncer-driven mass writes. Maybe fsync off is
> counterproductive for this situation?

Looking at preliminary results from running with shared_buffers at 16000, it seems this may be correct. Performance was flatter for a BIT longer, but slammed right into the wall and started hitting the 3-30 second range per COPY. I've restarted the run with fsync turned on (fdatasync), and we'll see.

My fear is that it's some bizarre situation interacting with both issues, and one that might not be solvable. Does anyone else have much experience with this sort of sustained COPY?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
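For reference, the knobs being varied across these runs can be checked from a live session; the commented values are just the ones mentioned in this thread, not recommendations:

    SHOW shared_buffers;   -- 16000 in the current run (8 kB buffers, ~125 MB)
    SHOW fsync;            -- off in the earlier runs, on in this one
    SHOW wal_sync_method;  -- fdatasync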
Re: [PERFORM] Sustained inserts per sec ... ?
On Tue, Apr 05, 2005 at 12:16:27AM -0400, Christopher Petrilli wrote:
> My fear is that it's some bizarre situation interacting with both
> issues, and one that might not be solvable. Does anyone else have
> much experience with this sort of sustained COPY?

You might ask the guy who just posted to -admin about a database that's doing 340M inserts a day in 300M transactions...

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"