Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-08 Thread Robert Haas
On Sun, Apr 8, 2012 at 12:53 PM, Tom Lane wrote: > However, I do have a couple of quibbles with the comments. Good points. I made some adjustments; see what you think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-08 Thread Tom Lane
Robert Haas writes: > On reflection, it seems to me that the right fix here is to make > SlruSelectLRUPage() to avoid selecting a page on which an I/O is > already in progress. This patch seems reasonably sane to me. It's not intuitively obvious that we should ignore I/O-busy pages, but your tes…
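
The fix under discussion is easy to sketch. The following is a simplified, hypothetical model of the idea, not PostgreSQL's actual slru.c code (the names SlruPool, select_victim_slot, and the status flags are invented for illustration): when choosing a victim buffer, skip any slot whose I/O is still in flight so that callers never queue up behind a slow read or write already underway.

```c
#include <assert.h>

#define NUM_SLOTS 8

typedef enum
{
    SLOT_EMPTY,
    SLOT_VALID,
    SLOT_READ_IN_PROGRESS,
    SLOT_WRITE_IN_PROGRESS
} SlotStatus;

typedef struct
{
    SlotStatus status[NUM_SLOTS];
    int lru_count[NUM_SLOTS];   /* higher = less recently used */
} SlruPool;

/* Pick a victim slot: use an empty slot immediately if one exists;
 * otherwise take the least-recently-used slot that has no I/O in
 * progress.  Returns -1 only if every slot is busy. */
int select_victim_slot(const SlruPool *pool)
{
    int best = -1;

    for (int i = 0; i < NUM_SLOTS; i++)
    {
        if (pool->status[i] == SLOT_EMPTY)
            return i;                   /* free slot: no eviction needed */
        if (pool->status[i] == SLOT_READ_IN_PROGRESS ||
            pool->status[i] == SLOT_WRITE_IN_PROGRESS)
            continue;                   /* skip I/O-busy pages */
        if (best < 0 || pool->lru_count[i] > pool->lru_count[best])
            best = i;
    }
    return best;
}
```

With this policy, only the backend that started a long write waits on it; everyone else evicts some other page instead of piling up behind the same buffer lock.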

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-06 Thread Jignesh Shah
On Wed, Apr 4, 2012 at 7:06 PM, Josh Berkus wrote: > On 4/4/12 4:02 PM, Tom Lane wrote: >> Greg Stark writes: >>> On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote: Why is this pgbench run accessing so much unhinted data that is > 1 million transactions old? Do you believe those number…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Robert Haas
On Thu, Apr 5, 2012 at 12:44 PM, Greg Stark wrote: > On Thu, Apr 5, 2012 at 3:05 PM, Robert Haas wrote: >> I'm not sure I find those numbers all that helpful, but there they >> are.  There are a couple of outliers beyond 12 s on the patched run, >> but I wouldn't read anything into that; the abso…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Robert Haas
On Thu, Apr 5, 2012 at 12:30 PM, Jeff Janes wrote: >> I'm not sure I find those numbers all that helpful, but there they >> are.  There are a couple of outliers beyond 12 s on the patched run, >> but I wouldn't read anything into that; the absolute worst values >> bounce around a lot from test to…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Greg Stark
On Thu, Apr 5, 2012 at 3:05 PM, Robert Haas wrote: > I'm not sure I find those numbers all that helpful, but there they > are.  There are a couple of outliers beyond 12 s on the patched run, > but I wouldn't read anything into that; the absolute worst values > bounce around a lot from test to test…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Jeff Janes
On Thu, Apr 5, 2012 at 7:05 AM, Robert Haas wrote: > On Thu, Apr 5, 2012 at 9:29 AM, Greg Stark wrote: >> On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas wrote: >>> Sorry, I don't understand specifically what you're looking for.  I >>> provided latency percentiles in the last email; what else do you…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Robert Haas
On Thu, Apr 5, 2012 at 9:29 AM, Greg Stark wrote: > On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas wrote: >> Sorry, I don't understand specifically what you're looking for.  I >> provided latency percentiles in the last email; what else do you want? > > I think he wants how many waits were there tha…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Greg Stark
On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas wrote: > Sorry, I don't understand specifically what you're looking for.  I > provided latency percentiles in the last email; what else do you want? I think he wants how many waits were there that were between 0 and 1s how many between 1s and 2s, etc. M…
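
The bucketed breakdown Greg is asking for is straightforward to compute from a pgbench per-transaction latency log. A minimal sketch, assuming latencies in milliseconds (the function name and bin count here are made up for illustration, not part of pgbench):

```c
#include <assert.h>
#include <stddef.h>

#define NBINS 16

/* Bucket latencies (milliseconds) into whole-second bins:
 * counts[0] holds waits in [0s,1s), counts[1] in [1s,2s), and so on,
 * with everything past the last bin clamped into it. */
void histogram_ms(const double *latency_ms, size_t n, int counts[NBINS])
{
    for (int b = 0; b < NBINS; b++)
        counts[b] = 0;

    for (size_t i = 0; i < n; i++)
    {
        int bin = (int) (latency_ms[i] / 1000.0);

        if (bin >= NBINS)
            bin = NBINS - 1;    /* clamp outliers, e.g. >15 s stalls */
        counts[bin]++;
    }
}
```

Printing the nonzero bins then answers "how many waits were between 1 s and 2 s" directly, which surfaces multi-second stalls that percentiles can smear out.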

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Robert Haas
On Thu, Apr 5, 2012 at 8:30 AM, Simon Riggs wrote: > On Thu, Apr 5, 2012 at 12:56 PM, Robert Haas wrote: > >> Overall tps, first without and then with patch: >> >> tps = 14546.644712 (including connections establishing) >> tps = 14550.515173 (including connections establishing) >> >> TPS graphs b…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Simon Riggs
On Thu, Apr 5, 2012 at 12:56 PM, Robert Haas wrote: > Overall tps, first without and then with patch: > > tps = 14546.644712 (including connections establishing) > tps = 14550.515173 (including connections establishing) > > TPS graphs by second attached. Again, I'm not that fussed about throughp…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Robert Haas
On Thu, Apr 5, 2012 at 5:41 AM, Simon Riggs wrote: > I'm also loathe to back patch. But its not very often we find a > problem that causes all backends to wait behind a single I/O. You have a point. Meanwhile, here are the benchmark results you requested. I did half hour runs with -l. Here are…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Simon Riggs
On Thu, Apr 5, 2012 at 12:25 AM, Robert Haas wrote: >> That seems much smarter. I'm thinking this should be back patched >> because it appears to be fairly major, so I'm asking for some more >> certainty that every thing you say here is valid. No doubt much of it >> is valid, but that's not enoug…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-05 Thread Simon Riggs
On Thu, Apr 5, 2012 at 1:23 AM, Robert Haas wrote: > I don't think we're micro-optimizing, either.  I don't consider > avoiding a 10-second cessation of all database activity to be a > micro-optimization even on a somewhat artificial benchmark. Robert is not skewing the SLRU mechanism towards th…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 7:02 PM, Tom Lane wrote: > Greg Stark writes: >> On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote: >>> Why is this pgbench run accessing so much unhinted data that is > 1 >>> million transactions old? Do you believe those numbers? Looks weird. > >> I think this is in the…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 4:34 PM, Simon Riggs wrote: > Interesting. You've spoken at length how this hardly ever happens and > so this can't have any performance effect. That was the reason for > kicking out my patch addressing clog history, wasn't it? Uh, no, the reason for kicking out your clog h…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 4:23 PM, Simon Riggs wrote: > Measurement? > > Sounds believable, I just want to make sure we have measured things. Yes, I measured things. I didn't post the results because they're almost identical to the previous set of results which I already posted. That is, I wrote t…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Josh Berkus
On 4/4/12 4:02 PM, Tom Lane wrote: > Greg Stark writes: >> On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote: >>> Why is this pgbench run accessing so much unhinted data that is > 1 >>> million transactions old? Do you believe those numbers? Looks weird. > >> I think this is in the nature of the…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Tom Lane
Greg Stark writes: > On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote: >> Why is this pgbench run accessing so much unhinted data that is > 1 >> million transactions old? Do you believe those numbers? Looks weird. > I think this is in the nature of the workload pgbench does. Because > the updat…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 9:05 PM, Robert Haas wrote: > Here's a sample of how often that's firing, by second, on > this test (pgbench with 32 clients): > >   4191 19:54:21 >   4540 19:54:22 Hm, so if that's evenly spread out that's 1/4ms between slru flushes and if each flush takes 5-10ms that's go…
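
Greg's back-of-envelope arithmetic can be made explicit. The inputs (4191 replacements in one second, 5-10 ms per flush) come from the thread; the function names below are mine, purely to show the calculation:

```c
#include <assert.h>

/* Average gap between buffer replacements, in milliseconds:
 * 4191 per second works out to roughly one every quarter millisecond. */
double gap_between_flushes_ms(int flushes_per_second)
{
    return 1000.0 / flushes_per_second;
}

/* How many flushes would have to be in flight simultaneously if each
 * one really cost flush_cost_ms of I/O.  A value far above 1 means the
 * flushes must be overlapping or largely absorbed by the OS cache. */
double implied_io_concurrency(int flushes_per_second, double flush_cost_ms)
{
    return flushes_per_second * flush_cost_ms / 1000.0;
}
```

At 4191 flushes/s and a 5 ms flush cost, the implied concurrency is about 21 flushes in flight at once, which is why a single genuinely slow write (one that misses the OS cache) can back up so many waiters.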

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote: > Why is this pgbench run accessing so much unhinted data that is > 1 > million transactions old? Do you believe those numbers? Looks weird. I think this is in the nature of the workload pgbench does. Because the updates are uniformly distributed…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Simon Riggs
On Wed, Apr 4, 2012 at 9:05 PM, Robert Haas wrote: > Yes, the SLRU is thrashing heavily.  In this configuration, there are > 32 CLOG buffers.  I just added an elog() every time we replace a > buffer.  Here's a sample of how often that's firing, by second, on > this test (pgbench with 32 clients):…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Simon Riggs
On Wed, Apr 4, 2012 at 6:25 PM, Alvaro Herrera wrote: > > Excerpts from Greg Stark's message of Wed Apr 04 14:11:29 -0300 2012: >> On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote: >> > , everybody's next few CLOG requests hit some other >> > buffer but eventually the long-I/O-in-progress buffer…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Simon Riggs
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote: >> I'll do some testing to try to confirm whether this theory is correct >> and whether the above fix helps. Very interesting work. > Having performed this investigation, I've discovered a couple of > interesting things.  First, SlruRecentlyU…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 1:11 PM, Greg Stark wrote: > On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote: >> , everybody's next few CLOG requests hit some other >> buffer but eventually the long-I/O-in-progress buffer again becomes >> least recently used and the next CLOG eviction causes a second ba…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Alvaro Herrera
Excerpts from Greg Stark's message of Wed Apr 04 14:11:29 -0300 2012: > On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote: > > , everybody's next few CLOG requests hit some other > > buffer but eventually the long-I/O-in-progress buffer again becomes > > least recently used and the next CLOG evic…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote: > 3. I noticed that the blocking described by "slru.c:311 blocked by > slru.c:405" seemed to be clumpy - I would get a bunch of messages > about that all at once.  This makes me wonder if the SLRU machinery is > occasionally making a real bad deci…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Greg Stark
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote: > , everybody's next few CLOG requests hit some other > buffer but eventually the long-I/O-in-progress buffer again becomes > least recently used and the next CLOG eviction causes a second backend > to begin waiting for that buffer. This still so…

Re: [HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Wed, Apr 4, 2012 at 8:00 AM, Robert Haas wrote: > There's some apparent regression on the single-client test, but I'm > inclined to think that's a testing artifact of some kind and also > probably not very important.  It would be worth paying a small price > in throughput to avoid many-second e…

[HACKERS] patch: improve SLRU replacement algorithm

2012-04-04 Thread Robert Haas
On Mon, Apr 2, 2012 at 12:33 PM, Robert Haas wrote: > This particular example shows the above chunk of code taking >13s to > execute.  Within 3s, every other backend piles up behind that, leading > to the database getting no work at all done for a good ten seconds. > > My guess is that what's happ…
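
The pile-up Robert describes can be reproduced in miniature. In this toy model (all names and the 4-page pool are invented for illustration, not PostgreSQL code), a strict-LRU victim choice sends every evicting backend to the one page whose write is still in progress, while skipping busy pages avoids the queue entirely:

```c
#include <assert.h>

#define NPAGES 4

typedef struct
{
    int lru_age[NPAGES];   /* higher = less recently used */
    int io_busy[NPAGES];   /* nonzero while a long write is in flight */
} Pool;

/* Run `backends` eviction attempts and count how many block behind a
 * busy page.  With skip_busy = 0 (strict LRU) the stuck page is always
 * the oldest, so every backend queues up behind its I/O; with
 * skip_busy = 1 the pool keeps cycling through the other pages.
 * The Pool is passed by value so each simulation starts fresh. */
int blocked_evictions(Pool p, int backends, int skip_busy)
{
    int blocked = 0, clock = 0;

    for (int b = 0; b < backends; b++)
    {
        int victim = -1;

        for (int i = 0; i < NPAGES; i++)
        {
            if (skip_busy && p.io_busy[i])
                continue;               /* ignore pages mid-I/O */
            if (victim < 0 || p.lru_age[i] > p.lru_age[victim])
                victim = i;
        }
        if (p.io_busy[victim])
            blocked++;                  /* waits behind the slow write */
        else
            p.lru_age[victim] = -(++clock); /* just used: now newest */
    }
    return blocked;
}
```

Confining the wait to the one backend that started the I/O, instead of serializing all CLOG replacement behind it, is exactly the behavior change the patch aims for.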