[PERFORM] Update table performance

2007-08-07 Thread Mark Makarowsky
I have a table with 4,889,820 records in it.  The
table also has 47 fields.  I'm having problems with
update performance.  Just as a test, I issued the
following update:

update valley set test='this is a test'

This took 905641 ms.  Isn't that kind of slow?  There
aren't any indexes, triggers, constraints or anything
on this table.  The version of Postgres is "PostgreSQL
8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe
(GCC) 3.4.2 (mingw-special)".  The operating
environment is Windows 2003 Standard Edition w/service
pack 2.  The machine is 2.20 GHz with 1.0 GB of RAM.  Here are
the results from EXPLAIN:

"Seq Scan on valley  (cost=0.00..1034083.57
rows=4897257 width=601)"

Here are the settings in the postgresql.conf.  Any
ideas or is this the expected speed?

#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -

listen_addresses = '*' # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
# (change requires restart)
port = 5432 # (change requires restart)
max_connections = 20 # (change requires restart)
# Note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 3 # (change requires restart)
#unix_socket_directory = '' # (change requires restart)
#unix_socket_group = '' # (change requires restart)
#unix_socket_permissions = 0777 # octal
# (change requires restart)
#bonjour_name = ''  # defaults to the computer name
# (change requires restart)

# - Security & Authentication -

#authentication_timeout = 1min  # 1s-600s
#ssl = off  # (change requires restart)
#password_encryption = on
#db_user_namespace = off

# Kerberos
#krb_server_keyfile = '' # (change requires restart)
#krb_srvname = 'postgres' # (change requires restart)
#krb_server_hostname = '' # empty string matches any keytab entry
# (change requires restart)
#krb_caseins_users = off # (change requires restart)

# - TCP Keepalives -
# see 'man 7 tcp' for details

#tcp_keepalives_idle = 0 # TCP_KEEPIDLE, in seconds;
# 0 selects the system default
#tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds;
# 0 selects the system default
#tcp_keepalives_count = 0 # TCP_KEEPCNT;
# 0 selects the system default


#---
# RESOURCE USAGE (except WAL)
#---

# - Memory -

shared_buffers = 512MB # min 128kB or max_connections*16kB
# (change requires restart)
temp_buffers = 8MB # min 800kB
max_prepared_transactions = 5 # can be 0 or more
# (change requires restart)
# Note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 8MB  # min 64kB
maintenance_work_mem = 16MB # min 1MB
#max_stack_depth = 4MB  # min 100kB

# - Free Space Map -

max_fsm_pages = 70 # min max_fsm_relations*16, 6 bytes each
# (change requires restart)
max_fsm_relations = 1000 # min 100, ~70 bytes each
# (change requires restart)

# - Kernel Resource Usage -

max_files_per_process = 1000 # min 25
# (change requires restart)
#shared_preload_libraries = '' # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0 # 0-1000 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 0-10000 credits

# - Background writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5 # 0-1000 buffers max written/round

Re: [PERFORM] Update table performance

2007-08-07 Thread Richard Huxton

Mark Makarowsky wrote:

I have a table with 4,889,820 records in it.  The
table also has 47 fields.  I'm having problems with
update performance.  Just as a test, I issued the
following update:

update valley set test='this is a test'

This took 905641 ms.  Isn't that kind of slow?


The limiting factor here will be how fast you can write to your disk.
Let's see: 5 million rows in ~900 seconds is about 5,500 rows/second.
You don't say how large your rows are, but assuming each row is, say,
1kB, that'd be about 5.5MB/sec - not brilliant. The simplest way to
find out the total write activity is to check how much disk space PG
is using before and after the update.
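
Something along these lines (an untested sketch; the size functions are
built in as of 8.1, so your 8.2.4 has them), run before and after the
update, would show the difference:

select pg_size_pretty(pg_relation_size('valley'))       as heap_size,
       pg_size_pretty(pg_total_relation_size('valley')) as with_toast_and_indexes;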


What you'll need to do is monitor disk activity, in particular how many 
writes and how much time the processor spends waiting on writes to complete.


If your standard usage pattern is to update a single field in all rows 
of a large table, then performance isn't going to be sub-second I'm afraid.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Performance problems with large telemetric datasets on 7.4.2

2007-08-07 Thread Scott Marlowe
On 8/6/07, Sven Clement <[EMAIL PROTECTED]> wrote:
> Ok, thanks everybody for the clarification; now I've already learned
> something new... ;)
>
> My employer is currently thinking about migration to 8.2.x because of your
> feedback, so I think that the problem could be resolved... ;)

Note that whether or not the 8.2 migration happens, you should
definitely update to the latest patch version of 7.4.  The update is
pretty much painless, although I remember there being a change around
7.4.13 that might have changed database behaviour for security reasons.
Release notes here:

http://www.postgresql.org/docs/8.2/static/release-7-4-13.html

Note that there are settings to work around the change in behaviour
should it create a problem.



Re: [PERFORM] Update table performance

2007-08-07 Thread Scott Marlowe
On 8/7/07, Mark Makarowsky <[EMAIL PROTECTED]> wrote:
> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
>
> update valley set test='this is a test'
>
> This took 905641 ms.  Isn't that kind of slow?  There
> aren't any indexes, triggers, constraints or anything
> on this table.  The version of Postgres is "PostgreSQL
> 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)".  The operating
> environment is Windows 2003 Standard Edition w/service
> pack 2.  It is 2.20 Ghz with 1.0 GB of RAM.  Here is
> the results from Explain:
>
> "Seq Scan on valley  (cost=0.00..1034083.57
> rows=4897257 width=601)"

Have you done this a few times?  You could easily have a very large
and bloated table if you do this several times in a row.  That would
explain the slow performance.  If you're going to do a lot of updates
without WHERE clauses on large tables, you'll need to run a vacuum
right afterwards to clean things up.
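
Something like this right after the big update (just a sketch):

vacuum verbose valley;   -- reclaims the dead row versions for re-use
-- or, to shrink the table back down (takes an exclusive lock):
vacuum full valley;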

I see that you included a lot about your machine, but you didn't
include any specs on your disk subsystem.  When it comes to update
speed, the disk subsystem is probably the most important part.

Note also that Windows is still not the preferred platform for
PostgreSQL from a performance perspective (really, the only major
database for which Windows is the preferred platform is MS SQL Server).

Have you run any benchmarks on your disk subsystem to see how fast it is?



Re: [PERFORM] Update table performance

2007-08-07 Thread Alan Hodgson
On Tuesday 07 August 2007 05:58, Mark Makarowsky 
<[EMAIL PROTECTED]> wrote:
> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
>
> update valley set test='this is a test'
>
> This took 905641 ms.  Isn't that kind of slow?  

PostgreSQL has to write a full new version of every row that gets updated. 
Updates are, therefore, relatively slow. 

I'm guessing you're doing this on a single SATA drive, too, which probably 
doesn't help.

-- 
"If a nation values anything more than freedom, it will lose its freedom;
and the irony of it is that if it is comfort or money that it values more,
it will lose that too." -- Somerset Maugham, Author




Re: [PERFORM] Update table performance

2007-08-07 Thread Decibel!
On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> Mark Makarowsky wrote:
> >I have a table with 4,889,820 records in it.  The
> >table also has 47 fields.  I'm having problems with
> >update performance.  Just as a test, I issued the
> >following update:
> >
> >update valley set test='this is a test'
> >
> >This took 905641 ms.  Isn't that kind of slow?
> 
> The limiting factor here will be how fast you can write to your disk. 

Well, very possibly how fast you can read, too. Using your assumption of
1k per row, 5M rows means 5G of data, which might well not fit in
memory. And if the entire table's been updated just once before, even
with vacuuming you're now at 10G of data.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-07 Thread Chris Browne
[EMAIL PROTECTED] (Mark Makarowsky) writes:
> I have a table with 4,889,820 records in it.  The
> table also has 47 fields.  I'm having problems with
> update performance.  Just as a test, I issued the
> following update:
>
> update valley set test='this is a test'
>
> This took 905641 ms.  Isn't that kind of slow?  There
> aren't any indexes, triggers, constraints or anything
> on this table.  The version of Postgres is "PostgreSQL
> 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)".  The operating
> environment is Windows 2003 Standard Edition w/service
> pack 2.  It is 2.20 Ghz with 1.0 GB of RAM.  Here is
> the results from Explain:
>
> "Seq Scan on valley  (cost=0.00..1034083.57
> rows=4897257 width=601)"
>
> Here are the settings in the postgresql.conf.  Any
> ideas or is this the expected speed?

Hmm.  

- You asked to update 4,889,820 records.

- It's a table consisting of 8.5GB of data (based on the cost info)

For this to take 15 minutes doesn't seem particularly outrageous.
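
The seq scan cost is roughly relpages * seq_page_cost + rows *
cpu_tuple_cost, so with the default settings a cost of ~1,034,083 works
out to something on the order of a million 8kB pages, i.e. roughly
7.5-8.5GB.  As a quick sanity check (just a sketch), the planner's own
statistics are visible in pg_class:

select relpages,                                  -- 8kB pages
       reltuples,                                 -- estimated row count
       pg_size_pretty(relpages::bigint * 8192) as approx_size
from pg_class
where relname = 'valley';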
-- 
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/oses.html
Rules of the Evil Overlord #65.  "If I must have computer systems with
publically available  terminals, the maps  they display of  my complex
will have  a room clearly marked  as the Main Control  Room. That room
will be  the Execution Chamber. The  actual main control  room will be
marked as Sewage Overflow Containment." 



Re: [PERFORM] Update table performance

2007-08-07 Thread Scott Marlowe
On 8/7/07, Decibel! <[EMAIL PROTECTED]> wrote:
> On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> > Mark Makarowsky wrote:
> > >I have a table with 4,889,820 records in it.  The
> > >table also has 47 fields.  I'm having problems with
> > >update performance.  Just as a test, I issued the
> > >following update:
> > >
> > >update valley set test='this is a test'
> > >
> > >This took 905641 ms.  Isn't that kind of slow?
> >
> > The limiting factor here will be how fast you can write to your disk.
>
> Well, very possibly how fast you can read, too. Using your assumption of
> 1k per row, 5M rows means 5G of data, which might well not fit in
> memory. And if the entire table's been updated just once before, even
> with vacuuming you're now at 10G of data.

If you often have to update just one column of a wide table, it's
usually a good idea to move that column into its own dependent table.

Or just don't update one column of every row in the table...



Re: [PERFORM] Update table performance

2007-08-07 Thread Decibel!
On Tue, Aug 07, 2007 at 02:36:18PM -0500, Scott Marlowe wrote:
> On 8/7/07, Decibel! <[EMAIL PROTECTED]> wrote:
> > On Tue, Aug 07, 2007 at 02:33:19PM +0100, Richard Huxton wrote:
> > > Mark Makarowsky wrote:
> > > >I have a table with 4,889,820 records in it.  The
> > > >table also has 47 fields.  I'm having problems with
> > > >update performance.  Just as a test, I issued the
> > > >following update:
> > > >
> > > >update valley set test='this is a test'
> > > >
> > > >This took 905641 ms.  Isn't that kind of slow?
> > >
> > > The limiting factor here will be how fast you can write to your disk.
> >
> > Well, very possibly how fast you can read, too. Using your assumption of
> > 1k per row, 5M rows means 5G of data, which might well not fit in
> > memory. And if the entire table's been updated just once before, even
> > with vacuuming you're now at 10G of data.
> 
> Where one might have to update just one column of a wide table often,
> it's often a good idea to move that column into its own dependent
> table.

Yeah, I've used "vertical partitioning" very successfully in the past,
though I've never done it for just a single field. I'll typically leave
the few most common fields in the "main" table and pull everything else
into a second table.

> Or just don't update one column of every row in  table...

Yeah, that too. :) Though sometimes you can't avoid it.

I should mention that if you can handle splitting the update into
multiple transactions, that will help a lot since it means you won't be
doubling the size of the table.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Update table performance

2007-08-07 Thread Mark Makarowsky
Can you provide more detail on what you mean by your
two suggestions below:

Yeah, I've used "vertical partitioning" very
successfully in the past, though I've never done it
for just a single field. I'll typically leave the few
most common fields in the "main" table and pull
everything else into a second table.

I should mention that if you can handle splitting the
update into multiple transactions, that will help a
lot since it means you won't be doubling the size of
the table.

I guess I was just surprised by how long it takes to update the field
in Postgres, since on an almost identical table in FoxPro (400,000
fewer records), the exact same update statement runs in about 4
minutes.
--- Decibel! <[EMAIL PROTECTED]> wrote:

> On Tue, Aug 07, 2007 at 02:36:18PM -0500, Scott
> Marlowe wrote:
> > On 8/7/07, Decibel! <[EMAIL PROTECTED]> wrote:
> > > On Tue, Aug 07, 2007 at 02:33:19PM +0100,
> Richard Huxton wrote:
> > > > Mark Makarowsky wrote:
> > > > >I have a table with 4,889,820 records in it. 
> The
> > > > >table also has 47 fields.  I'm having
> problems with
> > > > >update performance.  Just as a test, I issued
> the
> > > > >following update:
> > > > >
> > > > >update valley set test='this is a test'
> > > > >
> > > > >This took 905641 ms.  Isn't that kind of
> slow?
> > > >
> > > > The limiting factor here will be how fast you
> can write to your disk.
> > >
> > > Well, very possibly how fast you can read, too.
> Using your assumption of
> > > 1k per row, 5M rows means 5G of data, which
> might well not fit in
> > > memory. And if the entire table's been updated
> just once before, even
> > > with vacuuming you're now at 10G of data.
> > 
> > Where one might have to update just one column of
> a wide table often,
> > it's often a good idea to move that column into
> its own dependent
> > table.
> 
> Yeah, I've used "vertical partitioning" very
> successfully in the past,
> though I've never done it for just a single field.
> I'll typically leave
> the few most common fields in the "main" table and
> pull everything else
> into a second table.
> 
> > Or just don't update one column of every row in 
> table...
> 
> Yeah, that too. :) Though sometimes you can't avoid
> it.
> 
> I should mention that if you can handle splitting
> the update into
> multiple transactions, that will help a lot since it
> means you won't be
> doubling the size of the table.
> -- 
> Decibel!, aka Jim Nasby   
> [EMAIL PROTECTED]
> EnterpriseDB  http://enterprisedb.com 
> 512.569.9461 (cell)
> 



  




Re: [PERFORM] Update table performance

2007-08-07 Thread Erik Jones

On Aug 7, 2007, at 6:13 PM, Mark Makarowsky wrote:


Can you provide more detail on what you mean by your
two suggestions below:

Yeah, I've used "vertical partitioning" very
successfully in the past, though I've never done it
for just a single field. I'll typically leave the few
most common fields in the "main" table and pull
everything else into a second table.


Vertical partitioning is where you split up your table on disk by
columns, i.e. along the vertical lines.  He put it in quotes because
Postgres doesn't support it transparently, but you can always fake it
by splitting up your table.  For example, given the following table
wherein column bar gets updated a lot but the others don't:


create table foo (
id  int not null,
bar int,
baz int,

primary key (id)
);

You could split it up like so:

create table foo_a (
id  int,
baz int,

primary key (id)
);

create table foo_b (
foo_id  int,
bar int,

foreign key (foo_id) references foo_a (id)
);

The reason you'd ever want to do this is that when Postgres goes to
update a row, what it actually does is insert a new row with the new
value(s) you changed and mark the old one as deleted.  So, if you have
a wide table and frequently update only certain columns, you take a
performance hit because you're re-writing a lot of static values.
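
With the split above, the frequent update only has to rewrite the
narrow foo_b rows, and you can reassemble the original row shape with a
join (or a view) when you need all the columns.  A sketch:

update foo_b set bar = bar + 1;   -- rewrites only the narrow rows

create view foo_v as              -- foo_v is just a name for this sketch
select a.id, b.bar, a.baz
from foo_a a
left join foo_b b on b.foo_id = a.id;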




I should mention that if you can handle splitting the
update into multiple transactions, that will help a
lot since it means you won't be doubling the size of
the table.


As I mentioned above, when you do an update you're actually inserting
a new row and deleting the old one.  That deleted row is still
considered part of the table (for reasons of concurrency; read up on
the concurrency chapter in the manual for the details), and once it is
no longer visible to any live transactions its space can be re-used by
future inserts, provided a vacuum has run to record the free space.
So, if you update one column on every row of a one-million-row table
all at once, you have to allocate and write out one million new rows.
But if you do the update a quarter million rows at a time, vacuuming
between batches, the last three updates can re-use much of the space
freed by the earlier ones.
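
As a sketch, assuming valley has an integer key column (call it id;
it isn't shown in the original post), run with autocommit on so each
statement is its own transaction:

update valley set test = 'this is a test' where id <= 1250000;
vacuum valley;
update valley set test = 'this is a test' where id > 1250000 and id <= 2500000;
vacuum valley;
update valley set test = 'this is a test' where id > 2500000 and id <= 3750000;
vacuum valley;
-- ...and so on; the vacuums let later batches re-use the space the
-- earlier batches freed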


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com


