[PERFORM] PostgreSQL vs SQL Server for this application?
Hi all. We are designing a fairly large application that requires a high-performance database backend. The rates we need to sustain are at least 5000 inserts per second and 15 selects per second, on a single connection; there should only be 3 or 4 simultaneous connections in total. I think our main concern is dealing with the constant flow of data coming from the inserts, which must be available for selection as quickly as possible (a kind of real-time access). As a consequence, the database will rapidly grow to more than a hundred gigabytes. We still have to determine how and when we should archive old data to keep the application from suffering a performance drop. We intend to develop some kind of real-time partitioning of our main table to keep up with the flow.

At first we were planning to use SQL Server, as it has features that in my opinion could help us a lot:
- replication
- clustering

Recently we started to study PostgreSQL as a solution for our project:
- it also has replication
- the PostGIS module can handle geographic datatypes (which would simplify our development)
- we have strong PostgreSQL administration skills (we already use it for production processes)
- it is free (!), so we could put the savings toward hardware.

Is SQL Server clustering a real asset? How reliable are PostgreSQL's replication tools? Should I trust PostgreSQL's performance for this kind of load? My question is a bit fuzzy, but any advice is most welcome; hardware, tuning or design tips as well :)) Thanks a lot. Benjamin.
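The "real-time partitioning" Benjamin describes is essentially what later messages in this digest do with inherited tables: close off one child table after a fixed number of rows and direct new COPY batches at a fresh one. Here is a minimal Python sketch of that rotation logic; the events table, its ts column, and the execute() callback are illustrative assumptions, not taken from any message here:

    ROWS_PER_PARTITION = 10000000   # rotate to a fresh child table after this many rows
    BATCH = 500                     # rows per COPY batch, as used later in this digest

    def new_partition(execute, n):
        """Create the n-th child of a hypothetical 'events' parent table."""
        child = 'events%03d' % n
        execute("CREATE TABLE %s () INHERITS (events) WITHOUT OIDS;" % child)
        execute("CREATE INDEX %s_ts_idx ON %s (ts);" % (child, child))
        return child

    def load(execute, load_files):
        """Route COPY batches, starting a new partition every N rows."""
        n, loaded = 1, 0
        child = new_partition(execute, n)
        for path in load_files:             # each file holds BATCH rows
            execute("COPY %s FROM '%s';" % (child, path))
            loaded += BATCH
            if loaded >= ROWS_PER_PARTITION:
                n, loaded = n + 1, 0
                child = new_partition(execute, n)

Queries against the parent table then see all children; whether this rotation actually avoids an insert slowdown is exactly what the "Sustained inserts per sec" thread below investigates.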
[PERFORM] 8.0.1 performance question.
Hi, I have just upgraded our db from 7.4.2 to 8.0.1 and we are doing some testing. For some reason, we have discovered that our application performs much slower on 8.0.1. My initial reaction was to turn on log_min_duration_statement to see what's happening. However, log_min_duration_statement does not work for JDBC clients in 8.0.1, so I set log_statement to all instead. Without my application doing anything, I see the two statements below being executed non-stop. Who is triggering these statements? Is this normal? What am I doing wrong? I am using Fedora Core 1 - Kernel: 2.4.22-1.2174.nptl. Please help. Thanks.

2005-04-04 18:05:00 CST PARSELOG: statement: SELECT attnotnull FROM pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
2005-04-04 18:05:00 CST PARSELOG: statement: SELECT def.adsrc FROM pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid) LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid=def.adrelid AND a.attnum = def.adnum) WHERE c.oid = $1 and a.attnum = $2 AND def.adsrc LIKE '%nextval(%'

[the same two statements repeat continuously in the log]
Re: [PERFORM] How to improve db performance with $7K?
To be honest, I've yet to run across a SCSI configuration that can touch the 3ware SATA controllers. I have yet to see one top 80MB/sec, let alone 180MB/sec read or write, which is why we moved _away_ from SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically badly on RAID 1, RAID 5 and RAID 10. 35MB/sec for a three-drive RAID 0 is not bad, it's appalling. The hardware manufacturer should be publicly embarrassed for this kind of speed. A single U320 10k drive can do close to 70MB/sec sustained. If someone can offer benchmarks to the contrary (particularly in Linux), I would be greatly interested.

Alex Turner
netEconomist

On Mar 29, 2005 8:17 AM, Dave Cramer <[EMAIL PROTECTED]> wrote:
> Yeah, 35MB per sec is slow for a RAID controller; the 3ware mirrored is
> about 50MB/sec, and striped is about 100.
>
> Dave
>
> PFC wrote:
> >
> >> With hardware tuning, I am sure we can do better than 35MB per sec. Also
> >
> > WTF ?
> >
> > My laptop does 19 MB/s (reading <10 KB files, reiser4) !
> >
> > A recent desktop 7200rpm IDE drive:
> > # hdparm -t /dev/hdc1
> > /dev/hdc1:
> >  Timing buffered disk reads: 148 MB in 3.02 seconds = 49.01 MB/sec
> >
> > # ll "DragonBall 001.avi"
> > -r--r--r--  1 peufeu users 218M mar  9 20:07 DragonBall 001.avi
> >
> > # time cat "DragonBall 001.avi" >/dev/null
> > real    0m4.162s
> > user    0m0.020s
> > sys     0m0.510s
> >
> > (the file was not in the cache)
> > => about 52 MB/s (reiser3.6)
> >
> > So, you have a problem with your hardware...
>
> --
> Dave Cramer
> http://www.postgresintl.com
> 519 939 0336
> ICQ#14675561
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 1, 2005 3:59 PM, Christopher Petrilli <[EMAIL PROTECTED]> wrote:
> On Apr 1, 2005 3:53 PM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> > > What seems to happen is it slams into a "wall" of some sort, the
> > > system goes into disk write frenzy (wait=90% CPU), and eventually
> > > recovers and starts running for a while at a more normal speed. What
> > > I need, though, is for that wall not to happen. It is easier for me
> > > to accept a constant degradation of 5% than a 99% degradation for
> > > short periods, as it can cause cascade problems in the system.
> >
> > Could this possibly be a checkpoint happening?
> > Also, how many checkpoint segments do you have?
>
> Changes to the postgresql.conf file from "default":
> maintenance_work_mem = 131072
> fsync = false
> checkpoint_segments = 32

I've now had a chance to run a couple more tests, and here are two graphs of the time required to insert (via COPY from a file) 500 records at a time:

http://www.amber.org/~petrilli/diagrams/pgsql_copy500.png
http://www.amber.org/~petrilli/diagrams/pgsql_copy500_bgwriter.png

The first is with the above changes; the second adds two further modifications to the configuration:

bgwriter_percent = 25
bgwriter_maxpages = 1000

To my, likely faulty, intuition, it would seem that there is a backup happening in the moving of data from the WAL to its final resting place, and that by increasing these settings I could pull that work forward. As you can see from the charts, that doesn't seem to have any major impact. The point, in the rough middle, is where the program begins inserting into a new (inherited) table. The X axis is the "total" number of rows inserted. The table has:

* 21 columns (nothing too strange)
* No OIDs
* 5 indexes, including the primary key on a string

They are created by creating a main table, then doing:

CREATE TABLE foo001 () INHERITS (foos);

and then recreating all the indexes. Thoughts? Any advice would be more than appreciated.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
Yup, battery-backed, cache enabled. 6-drive RAID 10, 4-drive RAID 10, and 2x RAID 1. It's a 3ware 9500S-8MI - not bad for $450 plus BBU.

Alex Turner
netEconomist

On Apr 1, 2005 6:03 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Alex Turner <[EMAIL PROTECTED]> writes:
> > On Apr 1, 2005 4:17 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> >> 1250/sec with record size average of 26 bytes
> >> 800/sec with record size average of 48 bytes
> >> 250/sec with record size average of 618 bytes
>
> > Oh - this is with a separate transaction per command.
> > fsync is on.
>
> [ raised eyebrow... ] What kind of disk hardware is that exactly, and
> does it have write cache enabled? It's hard to believe those numbers
> if not.
>
> Write caching is fine if it's done in a battery-backed cache, which you
> can get in the higher-end hardware RAID controllers. Otherwise you're
> going to have problems whenever the power goes away unexpectedly.
>
> regards, tom lane
Re: [PERFORM] 8.0.1 performance question.
<[EMAIL PROTECTED]> writes:
> As a result, I modified log_statement to all. Without my application
> doing anything, I see statements below being executed non-stop. Who
> is triggering these statements? Is this normal? What am I doing
> wrong?
>
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT attnotnull FROM
> pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT def.adsrc FROM
> pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid)
> LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid=def.adrelid AND
> a.attnum = def.adnum) WHERE c.oid = $1 and a.attnum = $2 AND def.adsrc
> LIKE '%nextval(%'

Better ask about that on pgsql-jdbc. I suppose this is the trace of the JDBC driver trying to find out column metadata ... but if it's failing to cache the information, that's a pretty serious performance hit.

regards, tom lane
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> The table has:
> * 21 columns (nothing too strange)
> * No OIDS
> * 5 indexes, including the primary key on a string

Could we see the *exact* SQL definitions of the table and indexes? Also some sample data would be interesting. I'm wondering for example about the incidence of duplicate index keys.

regards, tom lane
Re: [PERFORM] How to improve db performance with $7K?
Alex Turner wrote:
> To be honest, I've yet to run across a SCSI configuration that can
> touch the 3ware SATA controllers. I have yet to see one top 80MB/sec,
> let alone 180MB/sec read or write, which is why we moved _away_ from
> SCSI. I've seen Compaq, Dell and LSI controllers all do pathetically
> badly on RAID 1, RAID 5 and RAID 10.

Alex,

How does the 3ware controller do on heavy writes back to the database? It may have been Josh, but someone said that SATA does well with reads but not writes. Wouldn't an equal number of SCSI drives outperform SATA? I don't want to start a "who's better" war, I am just trying to learn here. It would seem that the more drives you could place in a RAID configuration, the more the performance would increase.

Steve Poe
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 11:52 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Christopher Petrilli <[EMAIL PROTECTED]> writes:
> > The table has:
> > * 21 columns (nothing too strange)
> > * No OIDS
> > * 5 indexes, including the primary key on a string
>
> Could we see the *exact* SQL definitions of the table and indexes?
> Also some sample data would be interesting. I'm wondering for example
> about the incidence of duplicate index keys.

Of course. This is a bit cleansed, since it's an internal project, but only the column names are changed:

CREATE TABLE foos (
    foo_id        VARCHAR(32),
    s_ts          INTEGER NOT NULL,
    c_ts          INTEGER NOT NULL,
    bar_id        INTEGER NOT NULL,
    proto         INTEGER NOT NULL,
    src_ip        INT8 NOT NULL,
    dst_ip        INT8 NOT NULL,
    src_port      INTEGER,
    dst_port      INTEGER,
    nated         INTEGER NOT NULL,
    src_nat_ip    INT8,
    dst_nat_ip    INT8,
    src_nat_port  INTEGER,
    dst_nat_port  INTEGER,
    foo_class     INTEGER NOT NULL,
    foo_type      INTEGER NOT NULL,
    src_bar       INTEGER NOT NULL,
    dst_bar       INTEGER NOT NULL,
    user_name     VARCHAR(255),
    info          TEXT
) WITHOUT OIDS;

ALTER TABLE foos ADD CONSTRAINT foos_foo_id_pk UNIQUE (foo_id);
CREATE INDEX foos_c_ts_idx ON foos(c_ts);
CREATE INDEX foos_src_ip_idx ON foos(src_ip);
CREATE INDEX foos_dst_ip_idx ON foos(dst_ip);
CREATE INDEX foos_foo_class_idx ON foos(foo_class);
CREATE INDEX foos_foo_type_idx ON foos(foo_type);

CREATE TABLE foos001 () INHERITS (foos) WITHOUT OIDS;

ALTER TABLE foos001 ADD CONSTRAINT foos001_foo_id_pk UNIQUE (foo_id);
CREATE INDEX foos001_c_ts_idx ON foos001(c_ts);
CREATE INDEX foos001_src_ip_idx ON foos001(src_ip);
CREATE INDEX foos001_dst_ip_idx ON foos001(dst_ip);
CREATE INDEX foos001_foo_class_idx ON foos001(foo_class);
CREATE INDEX foos001_foo_type_idx ON foos001(foo_type);

That continues on, but you get the idea. As for data content, specifically regarding the indices, here's what the "simulator" creates:

foo_id - a 32-character UID (generated by the UUID function in mxTools), which looks like '00beef19420053c64f3f01aeb0b4a2a5' and varies in the upper components more than the lower.

*_ts - UNIX epoch timestamps, sequential. There's a long story behind not using a DATETIME format, but if that's the big issue, it can be dealt with.

*_ip - randomly selected 32-bit integers from a pre-generated list of about 500 different numbers ranging from 3232235500 to 3232236031. This is, unfortunately, not too atypical of the "real world".

*_class - randomly selected 1-100 (again, not atypical, although a normal distribution would be less random)

*_type - randomly selected 1-1 (not atypical, and more random than in the real world)

Hopefully this helps?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
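As a rough illustration of the value distributions Chris describes, here is a minimal Python sketch of a comparable row generator; the pool size, field subset and helper names are assumptions, standing in for the mxTools-based simulator rather than reproducing it:

    import random
    import time

    # Pre-generated pool of ~500 addresses in the range described above.
    IP_POOL = [random.randint(3232235500, 3232236031) for _ in range(500)]

    def make_row(epoch):
        """One record shaped like the simulator's output (illustrative only)."""
        return (
            '%032x' % random.getrandbits(128),  # foo_id: 32-char hex UID
            epoch,                              # s_ts: sequential timestamp
            epoch,                              # c_ts
            random.choice(IP_POOL),             # src_ip
            random.choice(IP_POOL),             # dst_ip
            random.randint(1, 100),             # foo_class
        )

    base = int(time.time())
    batch = [make_row(base + i) for i in range(500)]   # one 500-row COPY batch

The detail that matters to this thread is that foo_id values are effectively random, so primary-key inserts land on arbitrary leaf pages of the index rather than appending at one end.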
Re: [PERFORM] [JDBC] 8.0.1 performance question.
On Tue, 5 Apr 2005 [EMAIL PROTECTED] wrote:
> I see statements below being executed non-stop. Who is triggering these
> statements? Is this normal? What am I doing wrong?
>
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT attnotnull FROM
> pg_catalog.pg_attribute WHERE attrelid = $1 AND attnum = $2
> 2005-04-04 18:05:00 CST PARSELOG: statement: SELECT def.adsrc FROM
> pg_catalog.pg_class c JOIN pg_catalog.pg_attribute a ON (a.attrelid=c.oid)
> LEFT JOIN pg_catalog.pg_attrdef def ON (a.attrelid=def.adrelid AND
> a.attnum = def.adnum) WHERE c.oid = $1 and a.attnum = $2 AND def.adsrc
> LIKE '%nextval(%'

These are the results of ResultSetMetaData.isNullable() and isAutoIncrement(), which your code is apparently calling. The results of these calls are cached on a per-ResultSet basis. We have discussed caching them at a higher level, but couldn't find a way to know when to flush that cache.

Kris Jurka
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On Apr 4, 2005 11:52 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> Could we see the *exact* SQL definitions of the table and indexes?

> Of course, this is a bit cleansed, since it's an internal project, but
> only the column names are changed:

Thanks. No smoking gun in sight there. But out of curiosity, can you do a test run with *no* indexes on the table, just to see if it behaves any differently? Basically I was wondering if index overhead might be part of the problem.

Also, the X-axis on your graphs seems to be total number of rows inserted ... can you relate that to elapsed real time for us?

regards, tom lane
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 12:23 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Thanks. No smoking gun in sight there. But out of curiosity, can you
> do a test run with *no* indexes on the table, just to see if it behaves
> any differently? Basically I was wondering if index overhead might be
> part of the problem.

Running now, but it'll take a while, since I have a 3/4-second pause after each COPY to better reflect the "real world": the application does one COPY per second, or whenever it hits 1000 entries. This seemed a sane way to deal with it, and not burden the system with needless index balancing, etc.

> Also, the X-axis on your graphs seems to be total number of rows
> inserted ... can you relate that to elapsed real time for us?

Sure. Like I said, there's a 3/4-second sleep between each COPY, regardless of how long it took (which, well, isn't quite right, but close enough for this test). I've created a PNG with the X axis reflecting elapsed time:

http://www.amber.org/~petrilli/diagrams/pgsql_copyperf_timeline.png

In addition, I've put up the raw data I used:

http://www.amber.org/~petrilli/diagrams/results_timeline.txt

The columns are row count, elapsed time, instance time. Hopefully this helps some? This machine has nothing else running on it other than the normal stripped-down background processes (like sshd).

--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] [JDBC] 8.0.1 performance question.
Thank you for the quick response. To help me debug what's happening, can you tell me what the difference is between the 7.4 and 8.0 JDBC drivers in this regard? Is this something newly introduced in 8.0, or has it always been happening? Thanks.

On Apr 5, 2005 12:15 AM, Kris Jurka <[EMAIL PROTECTED]> wrote:
> These are the results of ResultSetMetaData.isNullable() and
> isAutoIncrement(), which your code is apparently calling. The results of
> these calls are cached on a per-ResultSet basis. We have discussed
> caching them at a higher level, but couldn't find a way to know when to
> flush that cache.
>
> Kris Jurka
Re: [PERFORM] [JDBC] 8.0.1 performance question.
On Tue, 5 Apr 2005 [EMAIL PROTECTED] wrote:
> Thank you for the quick response. To help me debug what's happening,
> can you tell me what's the difference between the 7.4 and 8.0 jdbc
> drivers in this regard? Is this something that is newly introduced in
> 8.0? Or is this something that has always been happening?

8.0 is the first driver version to take advantage of the V3 protocol's ability to return the base tables and columns of a ResultSet. Previously isNullable was hardcoded to always return columnNullableUnknown, and isAutoIncrement always returned false. I guess the question is why you are calling these methods if they didn't work previously?

Kris Jurka
Re: [PERFORM] How to improve db performance with $7K?
I'm no drive expert, but it seems to me that our write performance is excellent. I think what most people are concerned about is OLTP, where you are doing heavy write _and_ heavy read performance at the same time. Our system is mostly read during the day, but we do a full system update every night that is all writes, and it's very fast compared to the smaller SCSI system we moved off of - nearly a 6x speed improvement, as fast as 900 rows/sec with a 48-byte record, one row per transaction. I don't know enough about how SATA works to really comment on its performance as a protocol compared with SCSI. If anyone has a useful link on that, it would be greatly appreciated. More drives will give more throughput/sec, but not necessarily more transactions/sec. For that you will need more RAM on the controller, and definitely a BBU to keep your data safe.

Alex Turner
netEconomist

On Apr 4, 2005 10:39 AM, Steve Poe <[EMAIL PROTECTED]> wrote:
> How does the 3ware controller do in heavy writes back to the database?
> It may have been Josh, but someone said that SATA does well with reads
> but not writes. Wouldn't an equal number of SCSI drives outperform SATA?
> I don't want to start a "who's better" war, I am just trying to learn
> here. It would seem the more drives you could place in a RAID
> configuration, the more the performance would increase.
>
> Steve Poe
Re: [PERFORM] How to improve db performance with $7K?
On Apr 4, 2005, at 3:12 PM, Alex Turner wrote:
> Our system is mostly read during the day, but we do a full system
> update every night that is all writes, and it's very fast compared to
> the smaller SCSI system we moved off of. Nearly a 6x speed
> improvement, as fast as 900 rows/sec with a 48-byte record, one row
> per transaction.

Well, if you're not heavily multitasking, the advantage of SCSI is lost on you.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [PERFORM] How to improve db performance with $7K?
I'm doing some research on SATA vs SCSI right now, but to be honest I'm not turning up much at the protocol level. A lot of stupid benchmarks compare 10k Raptor drives against top-of-the-line 15k drives, where unsurprisingly the SCSI drives win, but of course cost four times as much; although in some, SATA even wins or draws. I'm trying to find something more apples-to-apples: 10k to 10k.

Alex Turner
netEconomist

On Apr 4, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
> Well, if you're not heavily multitasking, the advantage of SCSI is lost
> on you.
>
> Vivek Khera, Ph.D.
> +1-301-869-4449 x806
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 09:48 -0400, Christopher Petrilli wrote:
> The point, in the rough middle, is where the program begins inserting
> into a new table (inherited). The X axis is the "total" number of rows
> inserted.

and you also mention the same data plotted against elapsed time:
http://www.amber.org/~petrilli/diagrams/pgsql_copyperf_timeline.png

Your graphs look identical to others I've seen, so I think we're touching on something wider than your specific situation. The big difference is that things seem to go back to high performance when you switch to a new inherited table.

I'm very interested in the graphs of elapsed time for COPY 500 rows against rows inserted. The simplistic inference from those graphs is that if you only inserted 5 million rows into each table, rather than 10 million, then everything would be much quicker. I hope this isn't the answer, but could you try it and see? I'd like to either rule out "number of rows" as an issue or focus in on it, depending on the results.

Q: Please can you confirm that the discontinuity on the graph at around 5000 elapsed seconds matches EXACTLY with the switch from one table to another? That is an important point.

Q: How many data files are there for these relations? Wouldn't be two, by any chance, when we have 10 million rows in them?

Q: What is the average row length? About 150-160 bytes?

Thanks,

Best Regards, Simon Riggs
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 3:46 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Your graphs look identical to others I've seen, so I think we're
> touching on something wider than your specific situation. The big
> difference is that things seem to go back to high performance when you
> switch to a new inherited table.

This is correct.

> Q: Please can you confirm that the discontinuity on the graph at around
> 5000 elapsed seconds matches EXACTLY with the switch from one table to
> another? That is an important point.

Well, the changeover happens at 51593.395205 seconds :-) Here are two lines from the results with row count and time added:

10000000    51584.9818912    8.41331386566
10000500    51593.395205     0.416964054108

Note that 10M is when it swaps. I see no reason to interpret it differently, so it seems to be totally based around switching tables (and thereby indices).

> Q: How many data files are there for these relations? Wouldn't be two,
> by any chance, when we have 10 million rows in them?

I allow PostgreSQL to manage all the data files itself, so here's the default tablespace:

total 48
drwx------  2 pgsql pgsql 4096 Jan 26 20:59 1
drwx------  2 pgsql pgsql 4096 Dec 17 19:15 17229
drwx------  2 pgsql pgsql 4096 Feb 16 17:55 26385357
drwx------  2 pgsql pgsql 4096 Mar 24 23:56 26425059
drwx------  2 pgsql pgsql 8192 Mar 28 11:31 26459063
drwx------  2 pgsql pgsql 8192 Mar 31 23:54 26475755
drwx------  2 pgsql pgsql 4096 Apr  4 15:07 26488263

[EMAIL PROTECTED] base]# du
16624    ./26425059
5028     ./26385357
5660     ./26459063
4636     ./17229
6796     ./26475755
4780     ./1
1862428  ./26488263
1905952  .

> Q: What is the average row length?
> About 150-160 bytes?

Raw data is around 150 bytes; after insertion I'd need to do some other calculations.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 15:56 -0400, Christopher Petrilli wrote:
> On Apr 4, 2005 3:46 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> > I'm very interested in the graphs of elapsed time for COPY 500 rows
> > against rows inserted. The simplistic inference from those graphs is
> > that if you only inserted 5 million rows into each table, rather than
> > 10 million, then everything would be much quicker.

Any chance of running a multiple load of 4 million rows per table, leaving the test running for at least 3 tables' worth (12+ M rows)?

> > Q: Please can you confirm that the discontinuity on the graph at around
> > 5000 elapsed seconds matches EXACTLY with the switch from one table to
> > another? That is an important point.
>
> Well, the changeover happens at 51593.395205 seconds :-) Here are two
> lines from the results with row count and time added:
>
> 10000000    51584.9818912    8.41331386566
> 10000500    51593.395205     0.416964054108
>
> Note that 10M is when it swaps. I see no reason to interpret it
> differently, so it seems to be totally based around switching tables
> (and thereby indices).

OK, but do you have some other external knowledge that it is definitely happening at that time? Your argument above seems slightly circular to me. This is really important, because we need to know whether it ties in with that event or some other. Have you run this for more than 2 files, say 3 or more? You COMMIT after each 500 rows?

> I allow PostgreSQL to manage all the data files itself, so here's the
> default tablespace:
> [...]

OK. Please...

cd $PGDATA/base/26488263
ls -l

I'm looking for the number of files associated with each inherited table (heap).

> > Q: What is the average row length?
> > About 150-160 bytes?
>
> Raw data is around 150 bytes; after insertion I'd need to do some
> other calculations.

By my calculations, you should have just 2 data files per 10M rows for the main table. The performance degradation seems to coincide with the point where we move to inserting into the second of the two files. I'm not looking for explanations yet, just examining coincidences and trying to predict the behaviour based upon conjectures.

Best Regards, Simon Riggs
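Simon's "2 data files per 10M rows" figure follows from simple arithmetic; here is a sketch of it in Python, using an assumed per-tuple overhead since the thread only gives the ~150-byte raw row length:

    page, segment = 8192, 1024 ** 3     # 8 KB heap pages, 1 GB file segments

    rows = 10 * 1000 * 1000
    tuple_len = 150 + 32                # ~150 data bytes plus assumed header overhead
    rows_per_page = page // tuple_len   # ~45 tuples per page
    heap_pages = rows // rows_per_page  # ~222,000 pages
    heap_bytes = heap_pages * page      # ~1.8 GB

    print(heap_bytes / float(segment))  # ~1.7 -> spills into a second 1 GB segment

Because PostgreSQL splits every relation into 1 GB segment files on disk, a ~1.8 GB heap occupies exactly two files, which is what the ls -l output in the next message confirms (26488271 and 26488271.1).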
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 4:11 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Any chance of running a multiple load of 4 million rows per table,
> leaving the test running for at least 3 tables' worth (12+ M rows)?

As soon as I get done running a test without indexes :-)

> OK, but do you have some other external knowledge that it is definitely
> happening at that time? Your argument above seems slightly circular to
> me.

My program *SPECIFICALLY* counts to 10M and then switches the COPY statement.

> This is really important because we need to know whether it ties in with
> that event, or some other.

Unless basic integer math is failing, it's definitely happening at 10M rows.

> Have you run this for more than 2 files, say 3 or more?

You mean 3 or more tables? I'm not sure which type of files you are referring to here.

> You COMMIT after each 500 rows?

This is done using COPY syntax, not INSERT syntax, so I suppose "yes" I do. The file that is being used for COPY is kept on a ramdisk.

> OK. Please...
> cd $PGDATA/base/26488263
> ls -l

[EMAIL PROTECTED] base]# cd 26488263/
[EMAIL PROTECTED] 26488263]# ls -l
total 2003740
-rw-------  1 pgsql pgsql      49152 Apr  4 12:26 1247
-rw-------  1 pgsql pgsql     245760 Apr  4 12:27 1249
-rw-------  1 pgsql pgsql     573440 Apr  4 12:24 1255
-rw-------  1 pgsql pgsql      57344 Apr  4 14:44 1259
-rw-------  1 pgsql pgsql     212992 Apr  4 14:44 16408
[... several dozen more small catalog and index files trimmed ...]
-rw-------  1 pgsql pgsql 1073741824 Apr  4 15:07 26488271
-rw-------  1 pgsql pgsql  407527424 Apr  4 16:17 26488271.1
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 4:53 PM, PFC <[EMAIL PROTECTED]> wrote:
> > This is done using COPY syntax, not INSERT syntax. So I suppose "yes"
> > I do. The file that is being used for COPY is kept on a ramdisk.
>
> COPY or psql \copy ?
> If you wanna be sure you commit after each COPY, launch a psql in a shell
> and check if the inserted rows are visible (watching SELECT count(*) grow
> will do)

The script is Python, using pexpect (à la expect), and does this, exactly:

psql = pexpect.spawn('/usr/local/pgsql/bin/psql -d bench2')
[ ... ]
start = time.time()
psql.expect_exact('bench2=#')
psql.sendline("COPY events%03i FROM '/mnt/tmpfs/loadfile';" % (tablenum+1))
results.write('%s\n' % (time.time() - start))
results.flush()

There's other code, but it's all related to building the loadfile. Note that I'm specifically including the time it takes to get the prompt back in the timing (but it does slip 1 loop, which isn't relevant).

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
> This is done using COPY syntax, not INSERT syntax. So I suppose "yes"
> I do. The file that is being used for COPY is kept on a ramdisk.

COPY or psql \copy ?

If you wanna be sure you commit after each COPY, launch a psql in a shell and check if the inserted rows are visible (watching SELECT count(*) grow will do)
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 16:18 -0400, Christopher Petrilli wrote:
> My program *SPECIFICALLY* counts to 10M and then switches the COPY
> statement.
>
> [EMAIL PROTECTED] base]# cd 26488263/
> [EMAIL PROTECTED] 26488263]# ls -l
> total 2003740
> -rw-------  1 pgsql pgsql 1073741824 Apr  4 15:07 26488271
> -rw-------  1 pgsql pgsql  407527424 Apr  4 16:17 26488271.1

Can you do:

select relname from pg_class where relfilenode = 26488271;

and confirm that the name is the table you've been loading... Couldn't see all your indexes... are they still there?

Thanks,

Best Regards, Simon Riggs
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 4:58 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Can you do:
> select relname from pg_class where relfilenode = 26488271
> and confirm that the name is the table you've been loading...

It is.

> Couldn't see all your indexes... are they still there?

Nope, I'm running a second run without the auxiliary indices; I only have the primary key index. So far, a quick scan by eye says that it's behaving "better", but beginning to have issues again. I'll post results as soon as they are done.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
On Mon, 2005-04-04 at 17:03 -0400, Christopher Petrilli wrote:
> Nope, I'm running a second run without the auxiliary indices. I only
> have the primary key index. So far, a quick scan by eye says that it's
> behaving "better", but beginning to have issues again. I'll post
> results as soon as they are done.

Hmmm. Before I start to tunnel-vision on a particular coincidence...

How much memory have you got on the system? How much of that have you allocated to various tasks? What else is happening on your system? Tell us more about the disk set-up and other hardware-related things: disk cache... disk speed... seek times... etc.

Best Regards, Simon Riggs
Re: [PERFORM] Sustained inserts per sec ... ?
If I'm getting the point of this thread correctly, having a huge amount of data in one table degrades INSERT/COPY performance even with just a pkey index. If that's about the size of it, read on. If not, ignore me, because I missed something.

On Apr 4, 2005 10:44 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Before I start to tunnel-vision on a particular coincidence...

Don't worry too much about tunnel vision. I see the same thing every day with multi-million-row tables. The bigger the table gets (with only a pkey index), the slower the inserts go. If I start over (truncate, drop/create table), or if I point the initial load at a new table, everything gets speedy. I've always figured it was a function of table size and learned to live with it...

> How much memory have you got on the system?

On mine, 16G

> How much of that have you allocated to various tasks?

shared_buffers: 15000

> What else is happening on your system?

Nothing on mine.

> Tell us more about disk set-up and other hardware related things.

6-disk RAID 10 on a Compaq SmartArray 6404 with 256M BB cache, WAL on a 2-disk mirror on the built-in SmartArray 5 controller.

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org
[PERFORM] Bulk COPY end of copy delimiter
Today, while trying to do a bulk COPY of data into a table, the process aborted with the following error message:

ERROR: end-of-copy marker corrupt
CONTEXT: COPY tbl_logged_event, line 178519: "606447014,1492,2005-02-24 03:16:14,2005-02-23 20:27:48,win_applog,,error,adsmclientservice,nt author..."

Googling the error, we found references to '\.' (backslash-period) being an "end-of-copy marker". Unfortunately, our data contains the backslash-period character sequence. Is there any known fix or workaround for this condition? We're using PostgreSQL 7.3.9 and are also running tests on an 8.0.1 system.

Thanks in advance,
--- Steve
___
Steven Rosenstein
IT Architect/Developer | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to make them all yourself." -- Eleanor Roosevelt
Re: [PERFORM] Bulk COPY end of copy delimiter
Hi,

On Mon, 4 Apr 2005, Steven Rosenstein wrote:
> Googling the error, we found reference to the '\.' (backslash-period) being
> an "end-of-copy marker". Unfortunately, our data contains the
> backslash-period character sequence. Is there any known fix or workaround
> for this condition?

Any sequence \. in COPY input data should be escaped as \\. If this data was generated by pg_dump then it's a problem, but I haven't seen any other reports of this. Can I assume that you've generated the data for bulk load yourself? If so, there is a discussion of escaping characters here: http://www.postgresql.org/docs/8.0/static/sql-copy.html

Gavin
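For data produced outside pg_dump, the practical fix is to escape each field before writing the load file. A minimal Python sketch under the assumptions of this thread (text-format COPY with a comma delimiter); the function name is illustrative:

    def escape_copy_field(field, delimiter=','):
        r"""Escape one text field for PostgreSQL's COPY text format.

        Backslashes are doubled first, so a literal \. in the data can
        never be read as the end-of-copy marker; the delimiter and row
        separators are then escaped as well.
        """
        return (field.replace('\\', '\\\\')
                     .replace(delimiter, '\\' + delimiter)
                     .replace('\n', '\\n')
                     .replace('\r', '\\r')
                     .replace('\t', '\\t'))

    # The problem sequence survives the round trip:
    print(escape_copy_field('adsmclientservice\\.log'))
    # -> adsmclientservice\\.log (COPY reads it back as adsmclientservice\.log)

Doubling the backslash first matters: escaping the delimiter introduces new backslashes that must not themselves be doubled.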
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 6:44 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Before I start to tunnel-vision on a particular coincidence...
> How much memory have you got on the system?

Now 2GB, but most of it is free in this situation. Earlier, I posted some of the settings related to work mem.

> How much of that have you allocated to various tasks?

Do you mean inside PostgreSQL?

> What else is happening on your system?

sshd, that's it :-)

> Tell us more about disk set-up and other hardware related things.
> Disk cache... disk speed... seek times... etc.

Sure, here's the system configuration:

* AMD64/3000
* 2GB RAM (was 1GB; has made no difference)
* 1 x 120GB SATA drive (w/WAL), 7200RPM Seagate
* 1 x 160GB SATA drive (main), 7200RPM Seagate

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On Apr 4, 2005 12:23 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> do a test run with *no* indexes on the table, just to see if it behaves
>> any differently? Basically I was wondering if index overhead might be
>> part of the problem.

> http://www.amber.org/~petrilli/diagrams/pgsql_copy500_pkonly.png
> I apologize, I forgot to kill the PK, but as you can see, the curve
> flattened out a lot. It still begins to increase in what seems like
> the same place.

Yeah, this confirms the thought that the indexes are the source of the issue. (Which is what I'd expect, because a bare INSERT ought to be an approximately constant-time operation. But it's good to verify.)

Now some amount of slowdown is to be expected as the indexes get larger, since it ought to take roughly O(log N) time to insert a new entry in an index of size N. The weird thing about your curves is the very sudden jump in the insert times. What I think might be happening is that the "working set" of pages touched during index inserts is gradually growing, and at some point it exceeds shared_buffers, and at that point performance goes in the toilet because we are suddenly doing lots of reads to pull in index pages that fell out of the shared buffer area.

It would be interesting to watch the output of iostat or vmstat during this test run. If I'm correct about this, the I/O load should be basically all writes during the initial part of the test, and then suddenly develop a significant and increasing fraction of reads at the point where the slowdown occurs.

The indicated fix of course is to increase shared_buffers.

regards, tom lane
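Back-of-envelope arithmetic for Tom's working-set theory. Every figure below is an assumption for illustration (the thread gives none of them), but it shows why six indexes over a 10M-row table with effectively random keys can outgrow a modest buffer cache:

    page = 8192                       # PostgreSQL page size in bytes

    rows = 10 * 1000 * 1000           # rows per inherited table
    entries_per_leaf = 200            # rough guess at b-tree leaf capacity
    indexes = 6                       # pkey plus 5 secondary indexes

    leaf_pages = rows // entries_per_leaf * indexes            # ~300,000 pages
    print('hot index pages ~ %d MB' % (leaf_pages * page // 2**20))    # ~2343 MB

    shared_buffers = 16000            # the setting tried later in the thread
    print('shared_buffers  ~ %d MB' % (shared_buffers * page // 2**20))  # ~125 MB

Because the key values are random rather than monotonic, inserts touch leaf pages all over each index; once that hot set no longer fits in shared buffers plus the OS cache, every COPY starts incurring random reads, which is exactly the read traffic Tom predicts.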
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 10:36 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Yeah, this confirms the thought that the indexes are the source of
> the issue. (Which is what I'd expect, because a bare INSERT ought to be
> an approximately constant-time operation. But it's good to verify.)

This seemed to be my original idea, but I wanted to eliminate everything else as much as possible. I was also concerned that I might be hitting a bad case in the trees; I had to change some UID generation code to hash better, so...

> Now some amount of slowdown is to be expected as the indexes get larger,
> since it ought to take roughly O(log N) time to insert a new entry in an
> index of size N. The weird thing about your curves is the very sudden
> jump in the insert times.

Right, I expected O(log N) behavior myself, and it seems to behave that way if you look at the first section (although there are some interesting patterns visible if you exclude data outside the 90th percentile in the first section, which seem to coincide with some write activity).

> It would be interesting to watch the output of iostat or vmstat during
> this test run. If I'm correct about this, the I/O load should be
> basically all writes during the initial part of the test, and then
> suddenly develop a significant and increasing fraction of reads at the
> point where the slowdown occurs.

Well, I can track this on a run if it would be useful, but I think you're right, as it matches what I saw from looking at iostat at those points.

> The indicated fix of course is to increase shared_buffers.

Any idea where it should be set?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Bulk COPY end of copy delimiter
Your assumption is correct. The data was generated out of a DB2 database, and uses commas as field delimiters.

Thank you for the workaround,
--- Steve
Re: [PERFORM] Sustained inserts per sec ... ?
Christopher Petrilli <[EMAIL PROTECTED]> writes:
> On Apr 4, 2005 10:36 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> The indicated fix of course is to increase shared_buffers.

> Any idea where it should be set?

Not really. An upper bound would be the total size of the finished indexes for one 10M-row table, but one would suppose that that's overkill. The leaf pages shouldn't have to stay in RAM to have reasonable behavior --- the killer case is when upper-level tree pages drop out. Or that's what I'd expect anyway.

You could probably drop the inter-insert sleep for testing purposes, if you want to experiment with several shared_buffers values quickly.

regards, tom lane
Re: [PERFORM] Sustained inserts per sec ... ?
Tom Lane <[EMAIL PROTECTED]> writes:
> What I think might be happening is that the "working set" of pages
> touched during index inserts is gradually growing, and at some point it
> exceeds shared_buffers, and at that point performance goes in the toilet
> because we are suddenly doing lots of reads to pull in index pages that
> fell out of the shared buffer area.

All this is happening within a single transaction too, right? So there hasn't been an fsync the entire time. It's entirely up to the kernel when to decide to start writing data. It's possible it's just buffering all the writes in memory until the number of free buffers drops below some threshold, then it suddenly starts writing out buffers.

> It would be interesting to watch the output of iostat or vmstat during
> this test run. If I'm correct about this, the I/O load should be
> basically all writes during the initial part of the test, and then
> suddenly develop a significant and increasing fraction of reads at the
> point where the slowdown occurs.

I think he's right: if you see a reasonable write volume before the performance drop, followed by a sudden increase in read volume (and a decrease of write volume proportionate to the drop in performance), then it's just shared buffers becoming a bottleneck. If there's hardly any write volume before, and then a sudden increase in write volume despite a drop in performance, then I might be right, in which case you might want to look into tools to tune your kernel VM system.

--
greg
Re: [PERFORM] Sustained inserts per sec ... ?
On 04 Apr 2005 23:45:47 -0400, Greg Stark <[EMAIL PROTECTED]> wrote:
> Tom Lane <[EMAIL PROTECTED]> writes:
> > What I think might be happening is that the "working set" of pages
> > touched during index inserts is gradually growing, and at some point it
> > exceeds shared_buffers, and at that point performance goes in the toilet
> > because we are suddenly doing lots of reads to pull in index pages that
> > fell out of the shared buffer area.
>
> All this is happening within a single transaction too, right? So there hasn't
> been an fsync the entire time. It's entirely up to the kernel when to decide
> to start writing data.

This was my concern, and in fact moving from ext3 -> XFS has helped substantially in this regard. This is all happening inside COPY statements, so there's effectively a commit every 500 rows. I could enlarge this, but I didn't notice a huge increase in performance when doing tests on smaller bits.

Also, you are correct, I am running without fsync, although I could change that if you thought it would "smooth" the performance. The issue is less absolute performance than something more deterministic. Going from 0.05 seconds for a 500-row COPY to 26 seconds really messes with the system.

One thing that was mentioned early on, and I hope people remember, is that I am running autovacuum in the background, but the timing of it seems to have little to do with the system's problems; at least the debug output doesn't coincide with performance loss.

> It's possible it's just buffering all the writes in memory until the amount of
> free buffers drops below some threshold then it suddenly starts writing out
> buffers.

That was happening with ext3, actually, or at least to the best of my knowledge.

> > It would be interesting to watch the output of iostat or vmstat during
> > this test run. If I'm correct about this, the I/O load should be
> > basically all writes during the initial part of the test, and then
> > suddenly develop a significant and increasing fraction of reads at the
> > point where the slowdown occurs.
>
> I think he's right, if you see a reasonable write volume before the
> performance drop followed by a sudden increase in read volume (and decrease of
> write volume proportionate to the drop in performance) then it's just shared
> buffers becoming a bottleneck.

I've set shared_buffers to 16000 (from the original 1000) and am running now, without the pauses. We'll see what it looks like, but so far it seems to be running faster. How much and how it degrades will be an interesting view.

> If there's hardly any write volume before, then a sudden increase in write
> volume despite a drop in performance then I might be right. In which case you
> might want to look into tools to tune your kernel vm system.

Here's a quick snapshot of iostat:

Linux 2.6.9-1.667 (bigbird.amber.org)   04/04/2005

avg-cpu:  %user   %nice    %sys  %iowait   %idle
           1.05    0.01    0.63    13.15   85.17

Device:    tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
hda       0.00         0.00         0.00        3616           0
sda      23.15        68.09       748.89   246884021  2715312654
sdb      19.08        37.65       773.03   136515457  2802814036

The first three columns have been identical (or nearly so) the whole time, which tells me the system is pegged in its performance on IO. This is not surprising.

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
Re: [PERFORM] Sustained inserts per sec ... ?
Greg Stark <[EMAIL PROTECTED]> writes:
> All this is happening within a single transaction too, right? So there hasn't
> been an fsync the entire time. It's entirely up to the kernel when to decide
> to start writing data.

No ... there's a commit every 500 records. However, I think Chris said he was running with fsync off; so you're right that the kernel is at liberty to write stuff to disk when it feels like. It could be that those outlier points are transactions that occurred in the middle of periodic syncer-driven mass writes. Maybe fsync off is counterproductive for this situation?

regards, tom lane
[PERFORM] Compressing WAL
Maybe better for -hackers, but here it goes anyway...

Has anyone looked at compressing WAL before writing it to disk? On a system generating a lot of WAL, it seems there might be some gains to be had if the WAL data were compressed before going to disk, since today's machines are generally more I/O-bound than CPU-bound. And unlike the base tables, you generally don't need to read the WAL, so you don't really need to worry about not being able to quickly scan through the data without decompressing it.

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Re: [PERFORM] Sustained inserts per sec ... ?
On Apr 4, 2005 11:57 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > All this is happening within a single transaction too, right? So there hasn't
> > been an fsync the entire time. It's entirely up to the kernel when to decide
> > to start writing data.
>
> No ... there's a commit every 500 records. However, I think Chris said
> he was running with fsync off; so you're right that the kernel is at
> liberty to write stuff to disk when it feels like. It could be that
> those outlier points are transactions that occurred in the middle of
> periodic syncer-driven mass writes. Maybe fsync off is
> counterproductive for this situation?

Looking at preliminary results from running with shared_buffers at 16000, it seems this may be correct. Performance was flatter for a BIT longer, but slammed right into the wall and started hitting the 3-30 second range per COPY. I've restarted the run with fsync turned on (fdatasync), and we'll see.

My fear is that it's some bizarre situation interacting with both issues, and one that might not be solvable. Does anyone else have much experience with this sort of sustained COPY?

Chris
--
| Christopher Petrilli
| [EMAIL PROTECTED]
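For reference, the knobs being varied across these runs can be checked from a live session; the commented values are just the ones mentioned in this thread, not recommendations:

    SHOW shared_buffers;   -- 16000 in the current run (8 kB buffers, ~125 MB)
    SHOW fsync;            -- off in the earlier runs, on in this one
    SHOW wal_sync_method;  -- fdatasync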
Re: [PERFORM] Sustained inserts per sec ... ?
On Tue, Apr 05, 2005 at 12:16:27AM -0400, Christopher Petrilli wrote:
> My fear is that it's some bizarre situation interacting with both
> issues, and one that might not be solvable. Does anyone else have
> much experience with this sort of sustained COPY?

You might ask the guy who just posted to -admin about a database that's doing 340M inserts a day in 300M transactions...

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"