[PERFORM] Altering a column type - Most efficient way

2008-07-10 Thread Ow Mun Heng
Are there any quick hacks to do this quickly? There are around
20-30 million rows of data.

I want to change a column type from varchar(4) to varchar()

table size is ~10-15GB (and another 10-15G for indexes)

What would be the preferred way of doing it? Should I be dropping the
indexes first to make things faster? Would it matter?

The punch line is that since the databases are connected via Slony, this
makes it even harder to pull off. My last try got the DDL change
completed in something like 3 hours (on the smallest table of the bunch)
and it hung everything.






-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Altering a column type - Most efficient way

2008-07-10 Thread Mario Weilguni

Ow Mun Heng schrieb:

Are there any quick hacks to do this quickly? There are around
20-30 million rows of data.


I want to change a column type from varchar(4) to varchar()

table size is ~10-15GB (and another 10-15G for indexes)

What would be the preferred way of doing it? Should I be dropping the
indexes first to make things faster? Would it matter?

The punch line is that since the databases are connected via Slony, this
makes it even harder to pull off. My last try got the DDL change
completed in something like 3 hours (on the smallest table of the bunch)
and it hung everything.
  
Before PostgreSQL supported "alter table ... type ..." conversions, I
did this a few times when I discovered too late that my varchar() fields
were too short, and it worked perfectly.


Example:
{OLDLEN} = 4
{NEWLEN} = 60

update pg_attribute
   set atttypmod = {NEWLEN} + 4
 where attname = 'the-name-of-the-column'
   and attrelid = (select oid from pg_class
                   where relname = 'the-name-of-the-table')
   and atttypmod = {OLDLEN} + 4;


This works very well when you want to increase the maximum length;
don't try to reduce the maximum length this way!


Disclaimer: I do not know whether Slony might have a problem with this.
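[For comparison, here is a sketch of the supported route, using hypothetical
table and column names. Note the caveat: on the 8.x-era releases discussed in
this thread, ALTER TYPE rewrites the whole table, which is why the catalog
hack above was attractive; later releases skip the rewrite when a varchar is
merely widened.]

```sql
-- Supported route: let PostgreSQL change the type. On 8.x-era servers this
-- rewrites all 20-30 million rows (slow, and index rebuilds on top of that);
-- newer servers treat a pure widening of varchar as a metadata-only change.
-- "my_table" and "my_col" are placeholders.
BEGIN;
ALTER TABLE my_table
    ALTER COLUMN my_col TYPE varchar(60);
COMMIT;
```

Under Slony, DDL like this is normally pushed through slonik's EXECUTE SCRIPT
rather than run directly on one node, which adds its own coordination cost.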






Re: [PERFORM] Altering a column type - Most efficient way

2008-07-10 Thread Ow Mun Heng
On Thu, 2008-07-10 at 10:36 +0200, Mario Weilguni wrote:
> Ow Mun Heng schrieb:
> >
> > I want to change a column type from varchar(4) to varchar()
> >
> >   
> Example:
> {OLDLEN} = 4
> {NEWLEN} = 60
> 
> update pg_attribute
>    set atttypmod = {NEWLEN} + 4
>  where attname = 'the-name-of-the-column'
>    and attrelid = (select oid from pg_class
>                    where relname = 'the-name-of-the-table')
>    and atttypmod = {OLDLEN} + 4;
> 

This is what I see on the table

NEW atttypmod = -1
OLD atttypmod =  8

I'm not very confident about making this change, since this is not a
development server. It would help, though, if someone else had any other
suggestions. Thanks for the information; it's enlightening.



[PERFORM] how big shmmax is good for Postgres...

2008-07-10 Thread Jessica Richard
On a Linux system, if the total memory is 4G and shmmax is set to 4G, I
know it is bad, but how bad can it be? Just trying to understand the impact
the "shmmax" parameter can have on Postgres and the entire system after
Postgres comes up with this setting.

What is a reasonable setting for shmmax on a machine with 4G total?

Thanks a lot,
Jessica



  

Re: [PERFORM] how big shmmax is good for Postgres...

2008-07-10 Thread Bill Moran
In response to Jessica Richard <[EMAIL PROTECTED]>:

> On a Linux system,  if the total memory is 4G and the shmmax is set to 4G, I 
> know it is bad, but how bad can it be? Just trying to understand the impact 
> the "shmmax" parameter can have  on Postgres  and the entire system after 
> Postgres comes up on this number.

It's not bad by definition.  shmmax is a cap on the maximum amount of
shared memory that can be allocated.  Just because you set it to 4G
doesn't mean any application is going to use all of that.  With
PostgreSQL, the maximum amount of shared memory it will allocate is
governed by the shared_buffers setting in postgresql.conf.

It _is_ a good idea to set shmmax to a reasonable size to prevent
a misbehaving application from eating up all the memory on a system,
but I've yet to see PostgreSQL misbehave in this manner.  Perhaps I'm
too trusting.

> What is the reasonable setting for shmmax on a 4G total machine?

If you mean what's a reasonable setting for shared_buffers, conventional
wisdom says to start with 25% of the available RAM and increase or
decrease it if you discover your workload benefits from more or less.
By "available RAM" I mean the RAM free after all other applications
are running, which will be 4G if this machine only runs PostgreSQL, but
could be less if it runs other things like a web server.
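[As a concrete illustration of the advice above, here is what the two knobs
might look like on a dedicated 4G Linux box. The exact numbers are
illustrative, not a recommendation.]

```
# /etc/sysctl.conf -- kernel-side cap on a single shared memory segment.
# Roomy enough for shared_buffers plus PostgreSQL's other shared structures,
# but well short of total RAM so a misbehaving app can't eat everything.
kernel.shmmax = 1610612736    # ~1.5G cap, in bytes
kernel.shmall = 393216        # same ~1.5G, in 4k pages

# postgresql.conf -- what PostgreSQL actually asks for.
shared_buffers = 1GB          # ~25% of 4G, per the rule of thumb above
```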

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] how big shmmax is good for Postgres...

2008-07-10 Thread Scott Marlowe
On Thu, Jul 10, 2008 at 4:53 AM, Jessica Richard <[EMAIL PROTECTED]> wrote:
> On a Linux system,  if the total memory is 4G and the shmmax is set to 4G, I
> know it is bad, but how bad can it be? Just trying to understand the impact
> the "shmmax" parameter can have  on Postgres  and the entire system after
> Postgres comes up on this number.

There are two settings, shmmax for the OS, and shared_buffers for
pgsql.  If the OS has a max setting of 4G but pgsql is set to use
512Meg then you'd be safe.

Now, assuming that both the OS and pgsql are set to 4G, and you're
approaching that level of usage, the OS will start swapping out to
make room for the shared memory.  The machine will likely be starved
of memory, and will go into what's often called a swap storm where it
spends all its time swapping stuff in and out and doing little else.

> What is the reasonable setting for shmmax on a 4G total machine?

Keep in mind that postgresql's shared buffers are just one level of
caching / buffering that's going on.  The OS caches file access, and
sometimes the RAID controller and / or hard drive.

Generally the nominal setting is 25% of memory, but that varies.
If the working set of your data will fit in 10%, then there's no need
to set shared_buffers to 25%...



Re: [PERFORM] how big shmmax is good for Postgres...

2008-07-10 Thread Scott Marlowe
I just wanted to add to my previous post that shared_buffers generally
has a performance envelope: performance increases quickly as you first
raise shared_buffers, then each further increase yields a smaller step.
Once the entire working set of your data fits, the returns fall off
quickly.  Assuming you were on a machine with infinite memory and there
were no penalties for a large shared_buffers, you would go until you
were comfortably into the level area.  If your working set were 1G on a
dedicated machine with 64G, you could treat memory as functionally
unlimited; somewhere past about 1.1G of shared_buffers you would see no
further performance increase.

In real life, maintaining a cache has costs too.  The kernels of most
OSes now cache data very well, and they do it especially well for large
chunks of data.  So on a machine that big, the OS would be caching the
whole dataset as well.

What we're interested in is the working set size.  If you typically
call up a couple K of data, work on it a bit, and commit, then never
hit that section again for hours, and all your accesses do that, and
the access area is evenly spread across the database, then your working
set is the whole thing.  This is a common access pattern when dealing
with transactional systems.

If you commonly have 100 transactions doing that at once, then you
multiply how much memory they use by 100 to get the total buffers in
use, and the rest is likely NEVER going to get used.

In these systems, what seems like a bad idea, lowering shared_buffers,
might be exactly the right call.

For session servers and large transactional systems, it's often best
to let the OS do the caching of most of the data, and have enough
shared buffers to handle 2-10 times the in-memory data set size.  This
will result in a buffer size of a few hundred megabytes.

The advantage here is that the OS doesn't have to spend a lot of time
maintaining a large buffer pool and checkpoints are cheaper.  With
spare CPU the background writer can use spare I/O cycles to write out
the smaller number of dirty pages in shared_memory and the system runs
faster.

The same is true of session servers.  If a DB is just used for tracking
logged-in users, it only needs a tiny amount of shared_buffers.

Conversely, you need large numbers of shared_buffers when you have
something like a large social networking site.  A LOT of people
updating a large data set at the same time likely need far more
shared_buffers to run well.  A user might be inputting data for several
minutes or even hours, and the same pages are getting hit over and over
too.  For this kind of app, you need as much memory as you can afford
to throw at the problem, and a semi-fast large RAID array.  A large
cache means your RAID controller / array only has to write, on
average, as fast as the database commits.

When you reach the point where shared_buffers is larger than half your
memory, you're now using PostgreSQL for caching more so than the
kernel.  As your shared_buffers goes past 75% of available memory, you
have three times as much cache under PostgreSQL's control as under the
OS's control.  Doing this should mean you've already tested and found
that PostgreSQL's caching works better for you than the OS's caching
does.  I haven't seen that happen a lot.  Maybe some large
transactional systems would work well that way.

My point here, and I have one, is that a larger shared_buffers only
makes sense if you can use it.  It can work against you in a few
different ways that aren't obvious up front: checkpointing, self-DoS
due to a swap storm, using up memory that the kernel might be better at
using as cache, etc.

So the answer really is: do some realistic testing, with an eye
towards anomalous behavior.



Re: [PERFORM] Altering a column type - Most efficient way

2008-07-10 Thread Alvaro Herrera
Ow Mun Heng wrote:

> This is what I see on the table
> 
> NEW atttypmod = -1
> OLD atttypmod =  8

8 means varchar(4), which is what you said you had (4+4).
-1 means unlimited size.
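[For reference, the value Alvaro is decoding can be inspected directly; for
varchar columns, atttypmod is the declared length plus a 4-byte header, or -1
for an unconstrained varchar/text. Table and column names are placeholders.]

```sql
-- Look up atttypmod for one column: expect declared-length + 4 for
-- varchar(n), or -1 for plain varchar/text.
SELECT attname, atttypmod
  FROM pg_attribute
 WHERE attrelid = 'my_table'::regclass
   AND attname  = 'my_col';
```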


-- 
Alvaro Herrera                            http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] how big shmmax is good for Postgres...

2008-07-10 Thread Scott Marlowe
Some corrections:

On Thu, Jul 10, 2008 at 6:11 AM, Scott Marlowe <[EMAIL PROTECTED]> wrote:

SNIP

> If you commonly have 100 transactions doing that at once, then you
> multiply how much memory they use times 100 to get total buffer
> >> SPACE << in use,
> and the rest is likely NEVER going to get used.
>
> In these systems, what seems like a bad idea, lowering the
> buffer_size, might be the exact right call.
>
> For session servers and large transactional systems, it's often best
> to let the OS do the best caching of the most of the data, and have
> enough shared buffers to handle 2-10 times the in memory data set
> size.  This will result in a buffer size of a few hundred megabytes.
>
> The advantage here is that the >> DATABASE (not the OS) << doesn't have
> to spend a lot of time
> maintaining a large buffer pool and checkpoints are cheaper.
> The background writer can use spare >> CPU << and I/O cycles to write out
> the now smaller number of dirty pages in shared_memory and the system runs
> faster.

>
> Conversely, when you need large numbers of shared_buffers is when you
> have something like a large social networking site.  A LOT of people
> updating a large data set at the same time likely need way more
> shared_buffers to run well.  A user might be inputing data for several
> minutes or even hours.  The same pages are getting hit over and over
> too.  For this kind of app, you need as much memory as you can afford
> to throw at the problem, and a semi fast large RAID array.  A large
>  >> RAID << cache means your RAID controller / array only have to write, on
> average, as fast as the database commits.

Just minor edits.  If there's anything obviously wrong someone please
let me know.



Re: [PERFORM] Altering a column type - Most efficient way

2008-07-10 Thread Ow Mun Heng
On Thu, 2008-07-10 at 09:57 -0400, Alvaro Herrera wrote:
> Ow Mun Heng wrote:
> 
> > This is what I see on the table
> > 
> > NEW atttypmod = -1
> > OLD atttypmod =  8
> 
> 8 means varchar(4) which is what you said you had (4+4)
> -1 means unlimited size.
> 

This is cool.

If it were this simple a change, I'm not certain why (I believe) PG
checks each and every row to see whether it will fit into the new column
definition/type.

Thus, I'm still a bit hesitant to make the change, although it is
definitely a very enticing thing to do. (I presume also that this
change would be instantaneous and would not need to check each and
every row of the table?)

Thanks.
