Re: [PERFORM] optimization ideas for frequent, large(ish) updates in frequently accessed DB?

2004-02-13 Thread Manfred Koizar
On Fri, 13 Feb 2004 01:58:34 +0100, "Marinos J. Yannikos"
<[EMAIL PROTECTED]> wrote:
>I'm looking for ideas that might improve the interactive performance of 
>the system, without slowing down the updates too much.

IOW, you could accept slower updates.  Did you actually try and throttle
down the insert rate?

> Here are the 
>characteristics of the table and its use:
>
>- approx. 2 million rows

Doesn't sound worrying.  What's the min/max/average size of these rows?
How large is this table?
SELECT relpages FROM pg_class WHERE relname='...';
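
relpages counts disk blocks, normally 8 KB each, and is only updated
by VACUUM/ANALYSE.  For a current figure in MB, assuming the default
block size, something like:

ANALYSE t;
SELECT relpages * 8 / 1024 AS size_mb
  FROM pg_class
 WHERE relname = 't';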

What else is in this database, how many tables, how large is the
database (du $PGDATA)?

>- approx. 4-5 million rows per day are replaced in short bursts of 
>1-200k rows (average ~3000 rows per update)

How often do you VACUUM [ANALYSE]?

>- the table needs 6 indexes (not all indexes are used all the time, but 
>keeping them all the time slows the system down less than re-creating 
>some of them just before they're needed and dropping them afterwards)

I agree.

>- an "update" means that 1-200k rows with a common value in a particular 
>field are replaced with an arbitrary number of new rows (with the same 
>value in that field), i.e.:
>
>begin transaction;
>   delete from t where id=5;
>   insert into t (id,...) values (5,...);
>   ... [1-200k rows]
>end;

This is a wide variation in the number of rows.  You told us the average
batch size is 3000.  Is this also a *typical* batch size?  And what is
the number of rows where you start to get the feeling that it slows down
other sessions?

Where do the new values come from?  I don't think they are typed in :-)
Do they come from external sources or from the same database?  If the
latter, INSERT INTO ... SELECT ... might help.
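
For example (made-up staging table and column names):

INSERT INTO t (id, col1, col2)
SELECT 5, s.col1, s.col2
  FROM staging s
 WHERE s.batch_id = 42;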

>The problem is, that a large update of this kind can delay SELECT 
>queries running in parallel for several seconds, so the web interface 
>used by several people will be unusable for a short while.

Silly question:  By SELECT you mean pure SELECT transactions and not
some transaction that *mostly* reads from the database?  I mean, you are
sure your SELECT queries are slowed down and not blocked by the
"updates".

Show us the EXPLAIN ANALYSE output for the same SELECT, once when it is
fast and once when it is slow.  BTW, what is fast and what is slow?
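
That is, with a made-up query, run

EXPLAIN ANALYSE SELECT * FROM t WHERE id = 5;

once during normal operation and once while a bulk update is running,
and post both outputs.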

>Currently, I'm using temporary tables:
> [...]
>This is slightly faster than inserting directly into t (and probably 
>faster than using COPY, even though using that might reduce the overall 
>load on the database).

You might try using a prepared INSERT statement or COPY.
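
A sketch, with made-up column names:

PREPARE ins (int, text) AS
INSERT INTO t (id, val) VALUES ($1, $2);
EXECUTE ins(5, 'abc');  -- parse/plan happens once, one EXECUTE per row
DEALLOCATE ins;

COPY avoids per-statement overhead entirely:

COPY t FROM stdin;

with tab-separated rows following, terminated by a line containing
only \.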

>shared_buffers=10
>(I tried many values, this seems to work well for us - 12GB RAM)
>wal_buffers=500
>sort_mem=80
>checkpoint_segments=16
>effective_cache_size=100

See Josh's comments.

>Any help/suggestions would be greatly appreciated... Even if it's 
>something like "you need a faster db box, there's no other way" ;-)

We have to find out what the bottleneck is.  Tell us about your
environment (hardware, OS, ...).  Run top and/or vmstat and look for
significant differences between times of normal processing and slow
phases.  Post top/vmstat output here if you need help.

Servus
 Manfred


Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server

2004-02-13 Thread Andrew Sullivan
On Thu, Feb 12, 2004 at 05:19:27PM -0500, Chris Ruprecht wrote:

> what he's doing, this might be a bottleneck. I don't like the virtual memory 
> strategy of Linux too much and would rather increase this to 1 - 2 GB for the 
> Postgres DB - Specially since he's not running anything else on the machine 
> and he has 4 GB to play with.

Have you ever had luck with 2G of shared memory?

When I have tried that, the system is very fast initially, and
gradually slows to molasses-like speed.  My hypothesis is that the
cache-lookup logic isn't that smart, and so is inefficient either
when using the cache or when doing cache maintenance.

A

-- 
Andrew Sullivan 


Re: [PERFORM] optimization ideas for frequent, large(ish) updates in frequently accessed DB?

2004-02-13 Thread Manfred Koizar
Marinos, while you are busy answering my first set of questions :-),
here is an idea that might help even out resource consumption.

On Fri, 13 Feb 2004 01:58:34 +0100, "Marinos J. Yannikos"
<[EMAIL PROTECTED]> wrote:
>begin transaction;
>   delete from t where id=5;
>   insert into t (id,...) values (5,...);
>   ... [1-200k rows]
>end;
>
>The problem is, that a large update of this kind can delay SELECT 
>queries running in parallel for several seconds, so the web interface 
>used by several people will be unusable for a short while.

CREATE TABLE idmap (
    internalid int  NOT NULL PRIMARY KEY,
    visibleid  int  NOT NULL,
    active     bool NOT NULL
);
CREATE INDEX idmap_visible ON idmap(visibleid);

Populate this table with
INSERT INTO idmap
SELECT DISTINCT id, id, true
  FROM t;

Change
SELECT ...
  FROM t
 WHERE t.id = 5;

to
SELECT ...
  FROM t INNER JOIN idmap ON (idmap.internalid = t.id AND
  idmap.active)
 WHERE idmap.visibleid = 5;

When you have to replace the rows in t for id=5, start by

INSERT INTO idmap VALUES (12345, 5, false);

Then repeatedly
INSERT INTO t (id, ...) VALUES (12345, ...);
at a rate as slow as you can accept.  You don't have to wrap all INSERTs
into a single transaction, but batching together a few hundred to a few
thousand INSERTs will improve performance.

When all the new values are in the database, you switch to the new id in
one short transaction:
BEGIN;
UPDATE idmap SET active = false WHERE visibleid = 5 AND active;
UPDATE idmap SET active = true WHERE internalid = 12345;
COMMIT;

Do the cleanup in off-peak hours (pseudocode):

FOR delid IN (SELECT internalid FROM idmap WHERE NOT active)
BEGIN
DELETE FROM t WHERE id = delid;
DELETE FROM idmap WHERE internalid = delid;
END;
VACUUM ANALYSE t;
VACUUM ANALYSE idmap;

To prevent this cleanup from interfering with INSERTs in progress, you
might want to add a "beinginserted" flag to idmap.
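
If you want that loop runnable as is, here is a PL/pgSQL sketch (it
assumes the plpgsql language is installed; note that the function body
runs in a single transaction, and the VACUUMs have to be issued
separately because VACUUM cannot run inside one):

CREATE OR REPLACE FUNCTION cleanup_idmap() RETURNS integer AS '
DECLARE
    r RECORD;
    n integer := 0;
BEGIN
    FOR r IN SELECT internalid FROM idmap WHERE NOT active LOOP
        DELETE FROM t     WHERE id         = r.internalid;
        DELETE FROM idmap WHERE internalid = r.internalid;
        n := n + 1;
    END LOOP;
    RETURN n;
END;
' LANGUAGE plpgsql;

SELECT cleanup_idmap();
VACUUM ANALYSE t;
VACUUM ANALYSE idmap;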

HTH.
Servus
 Manfred


Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server

2004-02-13 Thread Leon Out
All, thanks for your suggestions. I've tweaked my configuration, and I 
think I've squeezed a little more performance out of the setup. I also 
tried running several tests simultaneously against postgres and SQL 
Server, and postgres did much better with the heavy load.

My new settings are:
max_connections = 50
shared_buffers = 12 # min 16, at least max_connections*2, 8KB each - default is 1000
sort_mem = 8000 # min 64, size in KB - default is 1024 (commented out)
effective_cache_size = 375000   # typically 8KB each - default is 1000 (commented out)
random_page_cost = 1    # units are one sequential page fetch cost - default is 4 (commented out)
geqo = true

Josh, the disks in the new system should be substantially faster than 
the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has 
15k RPM disks, as opposed to the 10k RPM disks in the old system.

On Feb 12, 2004, at 3:26 PM, Josh Berkus wrote:

Leon,

> Hello all. I am in the midst of porting a large web application from a
> MS SQL Server backend to PostgreSQL. The migration work is basically
> complete, and we're at the testing and optimization phase of the
> project. The results so far have been disappointing, with Postgres
> performing queries in about the same time as SQL Server even though
> Postgres is running on a dedicated box with about 4 times the clock
> speed of the SQL Server box. For a chart of my results, please see
> http://leonout.com/pggraph.pdf.

Your settings look ok to start, but we'll probably want to tune them 
further.
Can you post some details of the tests?   Include:

1) the query
2) the EXPLAIN ANALYZE results of the query
3) Whether you ran the test as the only connection, or whether you tested
multi-user load.

The last is fairly important for a SQL Server vs. PostgreSQL test; SQL Server
is basically a single-user database, so like MySQL it appears very fast until
you get a bunch o' users on it.

Finally, for most queries the disk I/O and the RAM are more important than the
CPU clock speed.  From the looks of it, you upgraded the CPU + RAM, but
downgraded the disk array as far as database writes are concerned; not a
terribly effective way to gain performance on your hardware.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco



[PERFORM] Lengthy pg_restore and high iowait?

2004-02-13 Thread Leon Out
Hello again. I'm setting up a backup routine for my new db server. As part of my testing, I'm attempting to pg_restore a pg_dump'ed backup of my database. The database is about 4.3 GB, and the dump file is about 100 MB.

I first did a schema-only restore, then started a data-only restore with --disable-triggers to get around the referential integrity issues of reloading the data. The data-only restore has been running for a couple of hours now, and I'm seeing high iowait numbers in top. 
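
For reference, the two passes boil down to something like this (file
and database names made up):

pg_restore -s -d mydb db.dump   # schema-only restore
pg_restore -a --disable-triggers -d mydb db.dump   # data-only restore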

15:57:58  up 23:55,  2 users,  load average: 2.04, 2.07, 2.01
60 processes: 57 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:    cpu    user  nice  system  irq  softirq  iowait   idle
              total   4.0%  0.0%    0.7%  0.0%    0.0%   43.5%  51.6%
              cpu00   0.0%  0.0%    0.3%  0.0%    0.0%   84.8%  14.7%
              cpu01  15.7%  0.0%    1.7%  0.0%    0.0%    2.7%  79.6%
              cpu02   0.1%  0.0%    0.7%  0.0%    0.0%   84.2%  14.7%
              cpu03   0.2%  0.0%    0.0%  0.0%    0.0%    2.4%  97.4%
Mem:  3869544k av, 3849280k used,   20264k free,       0k shrd,  110544k buff
                   1297452k actv, 2298928k in_d,   57732k in_c
Swap: 2040244k av,       0k used, 2040244k free            3576684k cached

  PID USER      PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 8366 postgres   16   0  954M 954M  952M R     4.3 25.2  21:58   1 postmaster
 9282 postgres   15   0     0    0     0 RW    0.2  0.0   0:00   2 postmaster
    1 root       15   0   496  496   444 S     0.0  0.0   0:05   3 init

Questions:
* Do these iowait numbers indicate a problem with my setup?
* Does anyone have a good method for disabling indexes before a restore and restoring them afterwards? I've spent some time writing scripts to do this, but I have yet to come up with a drop/recreate solution that returns my database to the same pre-drop state (a sketch of the capture step is below).
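
The capture half of such a script can lean on pg_get_indexdef(), which
rebuilds the CREATE INDEX statement (hypothetical table name; indexes
backing PRIMARY KEY or UNIQUE constraints can't be dropped
independently of their constraints, which is part of the trouble):

SELECT pg_get_indexdef(indexrelid)
  FROM pg_index
 WHERE indrelid = 'mytable'::regclass
   AND NOT indisprimary;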

Thanks in advance!

Leon

Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server

2004-02-13 Thread Vivek Khera
> "LO" == Leon Out <[EMAIL PROTECTED]> writes:

LO> Josh, the disks in the new system should be substantially faster than
LO> the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has
LO> 15k RPM disks, as opposed to the 10k RPM disks in the old system.

If you've got the time, try making your 5 disk array into a RAID10
plus one spare.  I found that with that few disks, RAID10 was a better
performer for an even mix of read/write to the DB.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/


Re: [PERFORM] Disappointing performance in db migrated from MS SQL

2004-02-13 Thread matt
> Josh, the disks in the new system should be substantially faster than
> the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has
> 15k RPM disks, as opposed to the 10k RPM disks in the old system.

Spindle speed does not correlate with 'throughput' in any easy way.  What
controllers are you using for these disks?


Re: [PERFORM] Disappointing performance in db migrated from MS SQL Server

2004-02-13 Thread Vivek Khera
> "LO" == Leon Out <[EMAIL PROTECTED]> writes:

LO> project. The results so far have been disappointing, with Postgres
LO> performing queries in about the same time as SQL Server even though
LO> Postgres is running on a dedicated box with about 4 times the clock
LO> speed of the SQL Server box. For a chart of my results, please see
LO> http://leonout.com/pggraph.pdf.

Are you using transactions liberally?   If you have large groups of
inserts/updates, putting them inside transactions buys you a lot of
improvement by batching the writes to the WAL.
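
I.e. roughly (hypothetical table):

BEGIN;
INSERT INTO t VALUES (1, 'a');
INSERT INTO t VALUES (2, 'b');
-- ...a few thousand more inserts...
COMMIT;

so the WAL is fsync'ed once per COMMIT instead of once per statement.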

Also, increase your checkpoint_segments if you do a lot of writes.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/


[PERFORM] slow GIST index creation

2004-02-13 Thread Rajesh Kumar Mallah
Greetings!

Why does creation of GiST indexes take significantly more time
than a normal btree index? Can any configuration changes lead to faster
index creation?
query:
CREATE INDEX co_name_index_idx ON profiles USING gist (co_name_index 
public.gist_txtidx_ops);

regds
mallah.