Re: [PERFORM] CPU utilization vs. IO wait, shared buffers?
Quoting Oliver Johnson <[EMAIL PROTECTED]>:

Hi everybody,

I am running a bake/load test and I am seeing sudden, daily shifts from CPU utilization to IO wait. The load harness has been running for 3 weeks and should be putting a uniform load on the application. The application processes data on a daily basis, and a sawtooth CPU pattern on the database is expected as more values are added throughout the day and processing resets with the next day.

Each day, I see the CPU utilization climb as expected until a shift occurs, after which it spends the rest of the day primarily in IO wait.

Looking at pg_statio_user_tables, I can see that during the CPU-intense timeframe most of the results come from the buffer cache (hits), while during the IO wait most of the results are being read in (misses). Examples from each timeframe (CPU/IO) are included below. For each sample, I issued pg_stat_reset(), waited briefly, and then queried pg_statio_user_tables.

*during CPU intense timeframe*

db=# select * from pg_statio_user_tables;
 relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
 16612 | public     | tablea  |              1 |       1346782 |             1 |        55956 |               0 |              0 |              0 |             0
 16619 | public     | tableb  |              0 |           579 |             0 |         1158 |                 |                |                |

*during IO wait timeframe*

db=# select * from pg_statio_user_tables;
 relid | schemaname | relname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit
-------+------------+---------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------
 16612 | public     | tablea  |         244146 |           594 |          4885 |         3703 |               0 |              0 |              0 |             0
 16619 | public     | tableb  |            418 |           589 |           432 |         1613 |                 |                |                |

Another thing to note: we have VACUUM ANALYZE running on an hourly interval, and the switch from CPU to IO wait appears to always coincide with a vacuum. What might cause this shift? I have tried adjusting buffer_cache from 512 MB to 1024 MB, but this did not appear to have an impact.
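As a quick sanity check on the two samples above, the heap hit ratio can be computed directly from heap_blks_hit and heap_blks_read (this helper and its name are mine, not from the thread; the numbers are copied from the tablea rows):

```python
def hit_ratio(heap_blks_read, heap_blks_hit):
    """Fraction of heap block requests served from shared buffers."""
    total = heap_blks_read + heap_blks_hit
    return heap_blks_hit / total if total else 0.0

# tablea during the CPU-intense window: almost every request is a buffer hit
cpu_window = hit_ratio(heap_blks_read=1, heap_blks_hit=1346782)

# tablea during the IO-wait window: almost everything is read from disk
io_window = hit_ratio(heap_blks_read=244146, heap_blks_hit=594)

print(f"CPU window hit ratio: {cpu_window:.4f}")
print(f"IO window hit ratio:  {io_window:.4f}")
```

The two windows sit at opposite extremes, which is exactly the pattern the original post describes.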
I also tried upping work_mem from 1MB to 10MB, and this did not appear to have an impact either. Any ideas? Thanks for your help!

Oliver

We're running PostgreSQL 8.2.9.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance

First of all, stop me if you're not running Linux -- that's the only OS I know. :)

Second, if you're not running a fairly recent 2.6 kernel (2.6.18/RHEL 5 or later), you should probably upgrade, because the performance stats are better. 2.6.25 is better still.

Next, if you haven't already, install the "sysstat" package. My recollection is that it does not install by default on most distros. It should -- go beat up on the distributors. :)

Now you have "iostat" installed. That will give you detailed information on both processor and I/O activity. Use the command

$ iostat -cdmtx 10 99 | tee iostat.log

This will sample the processor(s), all the devices, and, on 2.6.25 or later kernels, all the *partitions*. This last is important if you have things in different filesystems.

What you will probably see is samples where high I/O wait is correlated with high levels of read activity (reads per second and read megabytes per second) and high device utilization. That means you are reading data from disk and the processors are waiting for it. What can you do about it?

1. Add RAM. This will let Linux put more stuff in the page cache, making it have to read less.

2. Experiment with the four I/O schedulers. You can change them at run time (as "root").

I've put a little bit of this on line -- it's fairly detailed, and it's not PostgreSQL-specific, but you can get an indication of the concepts. By the way, I am working on some scripts that will actually integrate this type of monitoring and analysis with PostgreSQL. What they will do is load the raw Linux data into a PostgreSQL database and provide analysis queries and other tools.
But for now, see if this makes any sense to you: http://cougar.rubyforge.org/svn/trunk/procmodel/IO-Counters/beamer/handout.pdf
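Not from the original thread, but to illustrate picking the high-iowait samples out of an iostat log as suggested above: a small parser for the numeric line that follows iostat's "avg-cpu:" header. The sample line and the 25% threshold are made-up examples, and the exact column set varies a little between sysstat versions, so check your own output's header first.

```python
def parse_avg_cpu(line):
    """Parse the numbers following iostat's 'avg-cpu:' header line.

    Assumed field order (recent sysstat): %user %nice %system %iowait %steal %idle
    """
    user, nice, system, iowait, steal, idle = map(float, line.split())
    return {"user": user, "nice": nice, "system": system,
            "iowait": iowait, "steal": steal, "idle": idle}

# Hypothetical sample from an 'iostat -cdmtx 10 99' run
sample = "           5.02    0.00    1.25   40.12    0.00   53.61"
cpu = parse_avg_cpu(sample)

# Flag samples where the CPUs are mostly waiting on disk
IOWAIT_THRESHOLD = 25.0  # percent; arbitrary cutoff for this example
if cpu["iowait"] > IOWAIT_THRESHOLD:
    print(f"high iowait: {cpu['iowait']:.1f}%")
```

Correlating these flagged intervals with the per-device read columns from the same log is what shows whether the stalls are read-driven.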
Re: [PERFORM] Index usage problem on 8.3.3
Tom Lane <[EMAIL PROTECTED]> writes:
> Jeff Frost <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Huh. That does sound like it's a version-to-version difference.
>>> There's nothing in the CVS log that seems related though. Are you
>>> willing to post your test case?
>>
>> It's a customer DB, so I'll contact them and see if we can boil it down
>> to a test case with no sensitive data.
>
> Well, if there was a change it seems to have been in the right direction
> ;-) so this is mostly just idle curiosity. Don't jump through hoops to
> get a test case.

Assuming it's not a bug...

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
Re: [PERFORM] Configuring for maximum memory usage
On Thu, Oct 30, 2008 at 2:06 PM, Christiaan Willemsen <[EMAIL PROTECTED]> wrote:

> Thanks guys,
>
> Lots of info here that I didn't know about! Since I have one of the latest
> OpenSolaris builds, I guess the write throttle feature is already in there.
> Sadly, the blog doesn't say what build has it included.

If I recall correctly, it went in at about build 89 or so (I think the bottom of the link I provided has a comment to that effect). So it's in there now, but not in OpenSolaris 2008.05.

> For writes, I do everything synchronized because we really need a
> consistent database on disk. We can see that during large inserts, the
> intent log is used a lot.

The DB synchronizes the WAL log automatically, and the table and index data are written non-synchronously until the commit at the end of a checkpoint, at which point sync is called on them. This keeps things consistent on disk.

With ZFS, each block written is always consistent, with a checksum kept in the parent block. There are no partial page writes, ever. In theory, you can disable full page writes on the WAL log if there is a bottleneck there, since ZFS guarantees a fully transactional, consistent state of the file system even if you have a RAID controller or hardware failure that causes a partial write. But WAL log activity is probably not your bottleneck, so turning off full page writes on the WAL log is not necessary.

> What I'm going to be testing is a smaller shared_buffers value and a
> large ARC cache, and exactly the other way around.
>
> Another question: since we have huge tables with hundreds of millions of
> rows, we partitioned the database (it actually is creating the partitions
> dynamically now on inserts with very good performance :D ), but the
> question is: is the size of the partitions important for the memory
> parameters in the config file? How can we determine the optimal size of
> the partitions?
> Obviously, when doing selects, you want those preferably only needing a
> single partition for speed. At the moment, that is the case for the
> majority of situations. But there might be some other things to think
> about...
>
> Kind regards,
>
> Christiaan
>
> On Oct 30, 2008, at 7:27 PM, Scott Carey wrote:
>
> On Thu, Oct 30, 2008 at 9:55 AM, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>
>> On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:
>>
>>> Remember that PostgreSQL doesn't cache anything on its own so if you
>>> do want to hit disk it has to be in file cache.
>>>
>>> By my understanding, this is absolutely false. Postgres caches pages
>>> from tables/indexes in shared_buffers. You can make this very large
>>> if you wish.
>>
>> You can make it very large with a potentially serious performance hit.
>> It is very expensive to manage large amounts of shared buffers. It can
>> also nail your IO on checkpoint if you are not careful (even with
>> checkpoint smoothing). You are correct that I did not explain what I
>> meant very well because shared buffers are exactly that, shared
>> buffers.
>
> You can slam your I/O by having too large of either the OS file cache or
> shared_buffers, and you have to tune both.
>
> In the case of large shared_buffers you have to tune Postgres, and
> especially the background writer and checkpoints.
>
> In the case of a large OS cache, you have to tune parameters to limit
> the amount of dirty pages there and force writes out smoothly.
>
> Both layers attempt to delay writes for their own, often similar,
> reasons, and suffer when a large sync comes along with lots of dirty
> data.
>
> Recent ZFS changes have been made to limit this
> (http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle);
> in earlier ZFS versions, this is what usually killed databases -- ZFS in
> some situations would delay writes too long (even if "long" is 5
> seconds) and get in trouble.
> This still has to be tuned well, combined with good checkpoint tuning in
> Postgres, as you mention. For Linux, there are similar issues that have
> to be tuned on many kernels, or up to 40% of RAM can fill with dirty
> pages not written to disk.
>
> Letting the OS do it doesn't get rid of the problem; both levels of
> cache share very similar issues with large sizes and dirty pages
> followed by a sync.
>
> The buffer cache in shared_buffers is a lot more efficient for large
> scanning queries -- a select count(*) test will be CPU bound whether it
> comes from shared_buffers or the OS page cache, and in the former case I
> have seen it execute up to 50% faster than the latter, by avoiding
> calling out to the OS to get pages, purely as a result of less CPU used.

>> However, that isn't the exact same thing as a "cache", at least as I
>> was trying to describe it. Shared buffers are used to keep track of
>> pages (as well as some other stuff) and their current status. That is
>> not the same as caching a relation.
>>
>> It is not possible to pin a relation to memory using PostgreSQL.
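To put the "40% of RAM" figure above in perspective: the Linux dirty-page watermarks are set as percentages of RAM via the vm.dirty_background_ratio and vm.dirty_ratio sysctls, and older kernels shipped with high defaults. The helper below just does that percentage arithmetic; the 32 GB machine and the 10/40 values are my example numbers, not from the thread.

```python
def dirty_thresholds_gb(ram_gb, dirty_background_ratio, dirty_ratio):
    """Dirty-page watermarks implied by the vm.dirty_* sysctls.

    Background writeback starts at dirty_background_ratio percent of RAM;
    writers are throttled into synchronous writeback at dirty_ratio percent.
    """
    background = ram_gb * dirty_background_ratio / 100
    hard_limit = ram_gb * dirty_ratio / 100
    return background, hard_limit

# Example: a 32 GB box with old-kernel-style defaults of 10/40
bg, hard = dirty_thresholds_gb(32, 10, 40)
print(f"background writeback starts at {bg:.1f} GB of dirty pages")
print(f"writers throttled at {hard:.1f} GB of dirty pages")
```

Gigabytes of unwritten dirty data flushed at once is exactly the kind of sync storm the message describes, which is why lowering those ratios is part of the tuning it recommends.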
Re: [PERFORM] Index usage problem on 8.3.3
On Fri, 31 Oct 2008, Gregory Stark wrote:

> Tom Lane <[EMAIL PROTECTED]> writes:
>> Jeff Frost <[EMAIL PROTECTED]> writes:
>>> Tom Lane wrote:
>>>> Huh. That does sound like it's a version-to-version difference.
>>>> There's nothing in the CVS log that seems related though. Are you
>>>> willing to post your test case?
>>>
>>> It's a customer DB, so I'll contact them and see if we can boil it down
>>> to a test case with no sensitive data.
>>
>> Well, if there was a change it seems to have been in the right direction
>> ;-) so this is mostly just idle curiosity. Don't jump through hoops to
>> get a test case.
>
> Assuming it's not a bug...

Well, after boiling down my test case to the bare essentials, I was unable to reproduce the different behavior between 8.3.3 and 8.3.4. Now I've gone back to the original script and can't reproduce the behavior I previously saw on 8.3.4, and my screen session doesn't have enough scrollback to look at what happened previously.

I was thinking perhaps I had inadvertently committed the transaction, but then the act would have been dropped, as it's a temp table created with ON COMMIT DROP. But I've tested 3 times in a row, and every time 8.3.4 uses the seq scan just like 8.3.3, so I must've done something differently to get that result, as Tom had originally suggested. I just can't think what it might have been. Perhaps it's time to buy some glasses. :-/

--
Jeff Frost, Owner <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411     FAX: 916-405-4032
Re: [PERFORM] Occasional Slow Commit
(Resending this, the first one got bounced by mail.postgresql.org)

On Wed, Oct 29, 2008 at 3:30 PM, David Rees <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 29, 2008 at 6:26 AM, Greg Smith <[EMAIL PROTECTED]> wrote:
>> What you should do first is confirm whether or not the slow commits
>> line up with the end of the checkpoint, which is easy to see if you
>> turn on log_checkpoints. That gives you timings for the write and fsync
>> phases of the checkpoint which can also be informative.
>
> OK, log_checkpoints is turned on to see if any delays correspond to
> checkpoint activity...

Well, I'm pretty sure the delays are not checkpoint related. None of the slow commits line up at all with the end of checkpoints. The period of high delays occurs during the same period of time each week, but it's not during a particularly high-load period on the systems. It really seems like there must be something running in the background that is not showing up in the system activity logs, like a background RAID scrub or something.

Here are a couple of representative checkpoint-complete messages from the logs:

2008-10-31 20:12:03 UTC: : : LOG: checkpoint complete: wrote 285 buffers (0.3%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=57.933 s, sync=0.053 s, total=57.990 s
2008-10-31 20:17:33 UTC: : : LOG: checkpoint complete: wrote 437 buffers (0.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=87.891 s, sync=0.528 s, total=88.444 s
2008-10-31 20:22:05 UTC: : : LOG: checkpoint complete: wrote 301 buffers (0.3%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=60.774 s, sync=0.033 s, total=60.827 s
2008-10-31 20:27:46 UTC: : : LOG: checkpoint complete: wrote 504 buffers (0.5%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=101.037 s, sync=0.049 s, total=101.122 s

During this period of time there were probably 100 different queries writing/committing data that took longer than a second (3-4 seconds seems to be the worst).
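Not part of the original exchange, but the "not checkpoint related" conclusion can be checked mechanically by pulling the write/sync/total timings out of those log lines. A sketch (the regex targets the 8.x-era "checkpoint complete" format shown above; the two sample lines are copied from the post):

```python
import re

# Matches the timing summary at the end of a 'checkpoint complete' log line
TIMINGS = re.compile(
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, total=(?P<total>[\d.]+) s"
)

log_lines = [
    "2008-10-31 20:12:03 UTC: : : LOG: checkpoint complete: wrote 285 buffers (0.3%); "
    "0 transaction log file(s) added, 0 removed, 0 recycled; "
    "write=57.933 s, sync=0.053 s, total=57.990 s",
    "2008-10-31 20:17:33 UTC: : : LOG: checkpoint complete: wrote 437 buffers (0.4%); "
    "0 transaction log file(s) added, 0 removed, 0 recycled; "
    "write=87.891 s, sync=0.528 s, total=88.444 s",
]

for line in log_lines:
    m = TIMINGS.search(line)
    if m:
        sync = float(m.group("sync"))
        # A multi-second commit stall caused by a checkpoint would show up
        # as a long sync phase; sub-second syncs point elsewhere.
        print(f"sync phase: {sync:.3f} s")
```

All four logged sync phases are well under a second, consistent with the 3-4 second commit stalls coming from somewhere other than checkpoint fsyncs.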
The RAID controller on this machine is some sort of MegaRAID controller. I'll have to see if there is some sort of scheduled scan running during this period of time.

One of the replica nodes is an identical machine, which I just noticed has the same slow Slony commits logged even though it's only receiving data from Slony from the primary node. There are two other nodes listening in on the same subscription that don't show the slow Slony commits, but those machines use a slightly different RAID controller and are about 9 months newer than the other two.

I'm still hoping that the checkpoint tuning has reduced commit latency during busy periods; I should have more data after the weekend.

-Dave