[PERFORM] Altering a column type - Most efficient way
Is there any quick hacks to do this quickly? There's around 20-30million rows of data. I want to change a column type from varchar(4) to varchar() table size is ~10-15GB (and another 10-15G for indexes) What would be the preferrred way of doing it? SHould I be dropping the indexes 1st to make things faster? Would it matter? The punch line is that since the databases are connected via slony, this makes it even harder to pull it off. My last try got the DDL change completed in like 3 hours (smallest table of the bunch) and it hung everything. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Altering a column type - Most efficient way
Ow Mun Heng schrieb: Is there any quick hacks to do this quickly? There's around 20-30million rows of data. I want to change a column type from varchar(4) to varchar() table size is ~10-15GB (and another 10-15G for indexes) What would be the preferrred way of doing it? SHould I be dropping the indexes 1st to make things faster? Would it matter? The punch line is that since the databases are connected via slony, this makes it even harder to pull it off. My last try got the DDL change completed in like 3 hours (smallest table of the bunch) and it hung everything Before Postgresql supported "alter table ... type ... " conversions, I did it a few times when I detected later that my varchar() fields were too short, and it worked perfectly. Example: {OLDLEN} = 4 {NEWLEN} = 60 update pg_attribute set atttypmod={NEWLEN}+4 where attname='the-name-of-the-column' and attrelid=(select oid from pg_class where relname='the-name-of-the-table') and atttypmod={OLDLEN}+4; This worked very well when you want to increase the maximum length, don't try to reduce the maximum length this way! Disclaimer: I do not know if slony might be have a problem with this. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Altering a column type - Most efficient way
On Thu, 2008-07-10 at 10:36 +0200, Mario Weilguni wrote: > Ow Mun Heng schrieb: > > > > I want to change a column type from varchar(4) to varchar() > > > > > Example: > {OLDLEN} = 4 > {NEWLEN} = 60 > > update pg_attribute >set atttypmod={NEWLEN}+4 > where attname='the-name-of-the-column' >and attrelid=(select oid from pg_class where > relname='the-name-of-the-table') >and atttypmod={OLDLEN}+4; > This is what I see on the table NEW attypmod = -1 OLD attypmod = 8 I'm not very confident in doint this change since this is not a development server. If would help though if someone else has any other suggestion or something.. Thanks for the information though, it's enlightening knowledge. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] how big shmmax is good for Postgres...
On a Linux system, if the total memory is 4G and the shmmax is set to 4G, I know it is bad, but how bad can it be? Just trying to understand the impact the "shmmax" parameter can have on Postgres and the entire system after Postgres comes up on this number. What is the reasonable setting for shmmax on a 4G total machine? Thanks a lot, Jessica
Re: [PERFORM] how big shmmax is good for Postgres...
In response to Jessica Richard <[EMAIL PROTECTED]>: > On a Linux system, if the total memory is 4G and the shmmax is set to 4G, I > know it is bad, but how bad can it be? Just trying to understand the impact > the "shmmax" parameter can have on Postgres and the entire system after > Postgres comes up on this number. It's not bad by definition. shmmax is a cap on the max that can be used. Just because you set it to 4G doesn't mean any application is going to use all of that. With PostgreSQL, the maximum amount of shared memory it will allocate is governed by the shared_buffers setting in the postgresql.conf. It _is_ a good idea to set shmmax to a reasonable size to prevent a misbehaving application from eating up all the memory on a system, but I've yet to see PostgreSQL misbehave in this manner. Perhaps I'm too trusting. > What is the reasonable setting for shmmax on a 4G total machine? If you mean what's a reasonable setting for shared_buffers, conventional wisdom says to start with 25% of the available RAM and increase it or decrease it if you discover your workload benefits from more or less. By "available RAM" is meant the free RAM after all other applications are running, which will be 4G if this machine only runs PostgreSQL, but could be less if it runs other things like a web server. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how big shmmax is good for Postgres...
On Thu, Jul 10, 2008 at 4:53 AM, Jessica Richard <[EMAIL PROTECTED]> wrote: > On a Linux system, if the total memory is 4G and the shmmax is set to 4G, I > know it is bad, but how bad can it be? Just trying to understand the impact > the "shmmax" parameter can have on Postgres and the entire system after > Postgres comes up on this number. There are two settings, shmmax for the OS, and shared_buffers for pgsql. If the OS has a max setting of 4G but pgsql is set to use 512Meg then you'd be safe. Now, assuming that both the OS and pgsql are set to 4G, and you're approaching that level of usage, the OS will start swapping out to make room for the shared memory. The machine will likely be starved of memory, and will go into what's often called a swap storm where it spends all its time swapping stuff in and out and doing little else. > What is the reasonable setting for shmmax on a 4G total machine? Keep in mind that postgresql's shared buffers are just one level of caching / buffering that's going on. The OS caches file access, and sometimes the RAID controller and / or hard drive. Generally the nominal setting is 25% of memory, but that's variable. If the working set of your data will fit in 10% then there's no need for setting shared_memory to 25%... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how big shmmax is good for Postgres...
I just wanted to add to my previous post that shared_memory generally has a performance envelope of quickly increasing performance as you first increase share_memory, then a smaller performance step with each increase in shared_memory. Once all of the working set of your data fits, the return starts to fall off quickly. Assuming you were on a machine with infinite memory and there were no penalties for a large shared_buffers, then you would go until you were comfortably in the level area. If your working set were 1G on a dedicated machine with 64G, you could assume that memory was functionally unlimited. Somewhere around 1.1 Gigs of memory and you get no performance increase. In real life, maintaining a cache has costs too. The kernel of most OSes now caches data very well. and it does it very well for large chunks of data. So on a machine that big, the OS would be caching the whole dataset as well. What we're interested in is the working set size. If you typically call up a couple k of data, band it around some and commit, then never hit that section again for hours, and all your calls do that, and the access area is evenly spread across the database, then your working set is the whole thing. This is a common access pattern when dealing with transactional systems. If you commonly have 100 transactions doing that at once, then you multiply much memory they use times 100 to get total buffers in use, and the rest is likely NEVER going to get used. In these systems, what seems like a bad idea, lowering the buffer_size, might be the exact right call. For session servers and large transactional systems, it's often best to let the OS do the best caching of the most of the data, and have enough shared buffers to handle 2-10 times the in memory data set size. This will result in a buffer size of a few hundred megabytes. The advantage here is that the OS doesn't have to spend a lot of time maintaining a large buffer pool and checkpoints are cheaper. With spare CPU the background writer can use spare I/O cycles to write out the smaller number of dirty pages in shared_memory and the system runs faster. Same is true of session servers. If a DB is just used for tracking logged in users, it only needs a tiny amount of shared_buffers. Conversely, when you need large numbers of shared_buffers is when you have something like a large social networking site. A LOT of people updating a large data set at the same time likely need way more shared_buffers to run well. A user might be inputing data for several minutes or even hours. The same pages are getting hit over and over too. For this kind of app, you need as much memory as you can afford to throw at the problem, and a semi fast large RAID array. A large cache means your RAID controller / array only have to write, on average, as fast as the database commits it. When you can reach the point where shared_buffers is larger than 1/2 your memory you're now using postgresql for caching more so than the kernel. As your shared_buffers goes past 75% of available memory you now have three times as much cache under postgesql'l control than under the OS's control. This would mean you've already testing this and found that postgresql's caching works better for you than the OS's caching does. Haven't seen that happen a lot. Maybe some large transactional systems would work well that way. My point here, and I have one, is that larger shared_buffers only makes sense if you can use them. They can work against you in a few different ways that aren't obvious up front. Checkpointing, Self DOS due to swap storm, using up memory that the kernel might be better at using as cache, etc. So the answer really is, do some realistic testing, with an eye towards anamalous behavior. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Altering a column type - Most efficient way
Ow Mun Heng wrote: > This is what I see on the table > > NEW attypmod = -1 > OLD attypmod = 8 8 means varchar(4) which is what you said you had (4+4) -1 means unlimited size. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how big shmmax is good for Postgres...
Some corrections: On Thu, Jul 10, 2008 at 6:11 AM, Scott Marlowe <[EMAIL PROTECTED]> wrote: SNIP > If you commonly have 100 transactions doing that at once, then you > multiply much memory they use times 100 to get total buffer >> SPACE << in > use, > and the rest is likely NEVER going to get used. > > In these systems, what seems like a bad idea, lowering the > buffer_size, might be the exact right call. > > For session servers and large transactional systems, it's often best > to let the OS do the best caching of the most of the data, and have > enough shared buffers to handle 2-10 times the in memory data set > size. This will result in a buffer size of a few hundred megabytes. > > The advantage here is that the (NOT OS) DATABASE doesn't have to spend a lot > of time > maintaining a large buffer pool and checkpoints are cheaper. > The background writer can use spare >> CPU << and I/O cycles to write out > the now smaller number of dirty pages in shared_memory and the system runs > faster. > > Conversely, when you need large numbers of shared_buffers is when you > have something like a large social networking site. A LOT of people > updating a large data set at the same time likely need way more > shared_buffers to run well. A user might be inputing data for several > minutes or even hours. The same pages are getting hit over and over > too. For this kind of app, you need as much memory as you can afford > to throw at the problem, and a semi fast large RAID array. A large > >> RAID << cache means your RAID controller / array only have to write, on > average, as fast as the database commits. Just minor edits. If there's anything obviously wrong someone please let me know. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Altering a column type - Most efficient way
On Thu, 2008-07-10 at 09:57 -0400, Alvaro Herrera wrote: > Ow Mun Heng wrote: > > > This is what I see on the table > > > > NEW attypmod = -1 > > OLD attypmod = 8 > > 8 means varchar(4) which is what you said you had (4+4) > -1 means unlimited size. > This is cool. If it were this simple a change, I'm not certain why (I believe) PG is checking each and every row to see if it will fit into the new column definition/type. Thus, I'm still a bit hesitant to do the change, although it is definitely a very enticing thing to do. ( I presume also that this change will be instantaneous and does not need to check on each and every row of the table?) Thanks./ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance