On Sun, Apr 8, 2012 at 12:53 PM, Tom Lane wrote:
> However, I do have a couple of quibbles with the comments.
Good points. I made some adjustments; see what you think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas writes:
> On reflection, it seems to me that the right fix here is to make
> SlruSelectLRUPage() avoid selecting a page on which an I/O is
> already in progress.
This patch seems reasonably sane to me. It's not intuitively obvious
that we should ignore I/O-busy pages, but your …
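To make the proposed rule concrete, here is a toy, self-contained model of
the victim selection (not the actual slru.c code; the slot layout and all
names are invented for illustration): take the least-recently-used slot
that is not busy with I/O, and only fall back to a busy slot when every
slot is busy.

#include <stdio.h>

#define NUM_SLOTS 8

typedef enum { SLOT_VALID, SLOT_IO_IN_PROGRESS } SlotStatus;

typedef struct
{
    SlotStatus  status;
    int         lru_count;      /* lower = less recently used */
} Slot;

static int
select_victim(const Slot *slots)
{
    int     best_any = -1;      /* fallback: oldest slot, busy or not */
    int     best_idle = -1;     /* preferred: oldest non-busy slot */
    int     i;

    for (i = 0; i < NUM_SLOTS; i++)
    {
        if (best_any < 0 || slots[i].lru_count < slots[best_any].lru_count)
            best_any = i;
        if (slots[i].status != SLOT_IO_IN_PROGRESS &&
            (best_idle < 0 || slots[i].lru_count < slots[best_idle].lru_count))
            best_idle = i;
    }
    /* Only pick (and thus wait on) a busy buffer if every buffer is busy. */
    return (best_idle >= 0) ? best_idle : best_any;
}

int
main(void)
{
    Slot    slots[NUM_SLOTS];
    int     i;

    for (i = 0; i < NUM_SLOTS; i++)
    {
        slots[i].status = SLOT_VALID;
        slots[i].lru_count = i;
    }
    slots[0].status = SLOT_IO_IN_PROGRESS;  /* the oldest slot is mid-write */

    /* prints 1: the oldest *idle* slot, not the busy slot 0 */
    printf("victim slot = %d\n", select_victim(slots));
    return 0;
}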
On Wed, Apr 4, 2012 at 7:06 PM, Josh Berkus wrote:
> On 4/4/12 4:02 PM, Tom Lane wrote:
>> Greg Stark writes:
>>> On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote:
Why is this pgbench run accessing so much unhinted data that is > 1
million transactions old? Do you believe those numbers? Looks weird. …
On Thu, Apr 5, 2012 at 12:44 PM, Greg Stark wrote:
> On Thu, Apr 5, 2012 at 3:05 PM, Robert Haas wrote:
>> I'm not sure I find those numbers all that helpful, but there they
>> are. There are a couple of outliers beyond 12 s on the patched run,
>> but I wouldn't read anything into that; the absolute worst values
>> bounce around a lot from test to test …
On Thu, Apr 5, 2012 at 12:30 PM, Jeff Janes wrote:
>> I'm not sure I find those numbers all that helpful, but there they
>> are. There are a couple of outliers beyond 12 s on the patched run,
>> but I wouldn't read anything into that; the absolute worst values
>> bounce around a lot from test to test …
On Thu, Apr 5, 2012 at 3:05 PM, Robert Haas wrote:
> I'm not sure I find those numbers all that helpful, but there they
> are. There are a couple of outliers beyond 12 s on the patched run,
> but I wouldn't read anything into that; the absolute worst values
> bounce around a lot from test to test …
On Thu, Apr 5, 2012 at 7:05 AM, Robert Haas wrote:
> On Thu, Apr 5, 2012 at 9:29 AM, Greg Stark wrote:
>> On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas wrote:
>>> Sorry, I don't understand specifically what you're looking for. I
>>> provided latency percentiles in the last email; what else do you want? …
On Thu, Apr 5, 2012 at 9:29 AM, Greg Stark wrote:
> On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas wrote:
>> Sorry, I don't understand specifically what you're looking for. I
>> provided latency percentiles in the last email; what else do you want?
>
> I think he wants how many waits were there that were between 0 and 1s,
> how many between 1s and 2s, etc. …
On Thu, Apr 5, 2012 at 2:24 PM, Robert Haas wrote:
> Sorry, I don't understand specifically what you're looking for. I
> provided latency percentiles in the last email; what else do you want?
I think he wants how many waits were there that were between 0 and 1s,
how many between 1s and 2s, etc. …
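For what it's worth, bucketing a pgbench -l per-transaction log that way is
a few lines of C.  This assumes the third column of each log line is the
transaction latency in microseconds (true of common pgbench versions of
that era, but the log format has varied, so check yours):

#include <stdio.h>

#define MAX_BUCKET 32           /* anything >= 32 s lands in the last bucket */

int
main(void)
{
    long    buckets[MAX_BUCKET + 1] = {0};
    double  usec;
    int     i;

    /* read "client xact latency_us ..." lines on stdin */
    while (scanf("%*d %*d %lf%*[^\n]", &usec) == 1)
    {
        int     bucket = (int) (usec / 1000000.0);

        if (bucket > MAX_BUCKET)
            bucket = MAX_BUCKET;
        buckets[bucket]++;
    }

    for (i = 0; i <= MAX_BUCKET; i++)
    {
        if (buckets[i] > 0)
            printf("%2d-%2ds: %ld\n", i, i + 1, buckets[i]);
    }
    return 0;
}

Something like "cat pgbench_log.* | ./buckets" then gives the 0-1s, 1-2s,
... counts directly.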
On Thu, Apr 5, 2012 at 8:30 AM, Simon Riggs wrote:
> On Thu, Apr 5, 2012 at 12:56 PM, Robert Haas wrote:
>
>> Overall tps, first without and then with patch:
>>
>> tps = 14546.644712 (including connections establishing)
>> tps = 14550.515173 (including connections establishing)
>>
>> TPS graphs by second attached. …
On Thu, Apr 5, 2012 at 12:56 PM, Robert Haas wrote:
> Overall tps, first without and then with patch:
>
> tps = 14546.644712 (including connections establishing)
> tps = 14550.515173 (including connections establishing)
>
> TPS graphs by second attached.
Again, I'm not that fussed about throughput …
On Thu, Apr 5, 2012 at 5:41 AM, Simon Riggs wrote:
> I'm also loath to back-patch. But it's not very often we find a
> problem that causes all backends to wait behind a single I/O.
You have a point.
Meanwhile, here are the benchmark results you requested. I did
half-hour runs with -l. Here are …
On Thu, Apr 5, 2012 at 12:25 AM, Robert Haas wrote:
>> That seems much smarter. I'm thinking this should be back patched
>> because it appears to be fairly major, so I'm asking for some more
>> certainty that everything you say here is valid. No doubt much of it
>> is valid, but that's not enough …
On Thu, Apr 5, 2012 at 1:23 AM, Robert Haas wrote:
> I don't think we're micro-optimizing, either. I don't consider
> avoiding a 10-second cessation of all database activity to be a
> micro-optimization even on a somewhat artificial benchmark.
Robert is not skewing the SLRU mechanism towards …
On Wed, Apr 4, 2012 at 7:02 PM, Tom Lane wrote:
> Greg Stark writes:
>> On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote:
>>> Why is this pgbench run accessing so much unhinted data that is > 1
>>> million transactions old? Do you believe those numbers? Looks weird.
>
>> I think this is in the nature of the workload pgbench does. …
On Wed, Apr 4, 2012 at 4:34 PM, Simon Riggs wrote:
> Interesting. You've spoken at length about how this hardly ever happens and
> so this can't have any performance effect. That was the reason for
> kicking out my patch addressing clog history, wasn't it?
Uh, no, the reason for kicking out your clog history patch …
On Wed, Apr 4, 2012 at 4:23 PM, Simon Riggs wrote:
> Measurement?
>
> Sounds believable, I just want to make sure we have measured things.
Yes, I measured things. I didn't post the results because they're
almost identical to the previous set of results which I already
posted. That is, I wrote …
On 4/4/12 4:02 PM, Tom Lane wrote:
> Greg Stark writes:
>> On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote:
>>> Why is this pgbench run accessing so much unhinted data that is > 1
>>> million transactions old? Do you believe those numbers? Looks weird.
>
>> I think this is in the nature of the workload pgbench does. …
Greg Stark writes:
> On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote:
>> Why is this pgbench run accessing so much unhinted data that is > 1
>> million transactions old? Do you believe those numbers? Looks weird.
> I think this is in the nature of the workload pgbench does. Because
> the updates are uniformly distributed …
On Wed, Apr 4, 2012 at 9:05 PM, Robert Haas wrote:
> Here's a sample of how often that's firing, by second, on
> this test (pgbench with 32 clients):
>
> 4191 19:54:21
> 4540 19:54:22
Hm, so if that's evenly spread out, that's 1/4 ms between slru flushes,
and if each flush takes 5-10 ms that's …
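Spelling that arithmetic out with the 4191-per-second figure quoted above
(the 5-10 ms per flush is the assumption in the message, not a new
measurement):

#include <stdio.h>

int
main(void)
{
    double  replacements_per_sec = 4191.0;  /* from the elog sample upthread */
    double  ms_between = 1000.0 / replacements_per_sec;

    printf("~%.2f ms between CLOG buffer replacements\n", ms_between);
    printf("a 5 ms flush overlaps ~%.0f later replacements, a 10 ms one ~%.0f\n",
           5.0 / ms_between, 10.0 / ms_between);
    return 0;
}

That comes out to roughly 0.24 ms between replacements, so a single 5-10 ms
flush spans on the order of 20-40 subsequent replacement requests.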
On Wed, Apr 4, 2012 at 9:34 PM, Simon Riggs wrote:
> Why is this pgbench run accessing so much unhinted data that is > 1
> million transactions old? Do you believe those numbers? Looks weird.
I think this is in the nature of the workload pgbench does. Because
the updates are uniformly distributed …
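A back-of-the-envelope way to see why those ages are plausible (the scale
factor of 100 below is only an assumption for illustration; the thread
doesn't say which scale was used, and this ignores hint bits already set by
earlier readers of the same rows):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  rows = 100 * 100000.0;      /* pgbench_accounts rows at scale 100 */
    double  threshold = 1000000.0;      /* "1 million transactions old" */

    /*
     * With one uniformly random account row updated per transaction, the
     * time since a given row's last update is roughly geometric with mean
     * equal to the number of rows.
     */
    printf("mean age of a row's last update: ~%.0f transactions\n", rows);
    printf("fraction older than %.0f transactions: ~%.0f%%\n",
           threshold, 100.0 * exp(-threshold / rows));
    return 0;
}

At scale 100 that is a mean age of ten million transactions, with roughly
90% of the rows touched being older than the one-million mark.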
On Wed, Apr 4, 2012 at 9:05 PM, Robert Haas wrote:
> Yes, the SLRU is thrashing heavily. In this configuration, there are
> 32 CLOG buffers. I just added an elog() every time we replace a
> buffer. Here's a sample of how often that's firing, by second, on
> this test (pgbench with 32 clients):
On Wed, Apr 4, 2012 at 6:25 PM, Alvaro Herrera wrote:
>
> Excerpts from Greg Stark's message of Wed Apr 04 14:11:29 -0300 2012:
>> On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote:
>> > , everybody's next few CLOG requests hit some other
>> > buffer but eventually the long-I/O-in-progress buffer …
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote:
>> I'll do some testing to try to confirm whether this theory is correct
>> and whether the above fix helps.
Very interesting work.
> Having performed this investigation, I've discovered a couple of
> interesting things. First, SlruRecentlyUsed() …
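For readers who don't have slru.c open: SlruRecentlyUsed() is essentially a
shared clock counter that gets stamped onto a slot whenever the slot is
touched, and the victim search prefers the smallest stamp.  A minimal
standalone model of that idea (simplified; not the real macro, and the
names here are made up):

#include <stdio.h>

#define NUM_SLOTS 4

static int  cur_lru_count = 0;
static int  page_lru_count[NUM_SLOTS];

static void
recently_used(int slotno)
{
    /* advance the shared counter only if this slot isn't already the newest */
    if (page_lru_count[slotno] != cur_lru_count)
        page_lru_count[slotno] = ++cur_lru_count;
}

int
main(void)
{
    int     i;

    for (i = 0; i < NUM_SLOTS; i++)
        page_lru_count[i] = -1;     /* never used yet */

    recently_used(2);
    recently_used(0);
    recently_used(0);               /* no-op: slot 0 is already the newest */

    for (i = 0; i < NUM_SLOTS; i++)
        printf("slot %d: lru_count %d\n", i, page_lru_count[i]);
    return 0;
}

The slot with the smallest (oldest) count is the eviction candidate.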
On Wed, Apr 4, 2012 at 1:11 PM, Greg Stark wrote:
> On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote:
>> , everybody's next few CLOG requests hit some other
>> buffer but eventually the long-I/O-in-progress buffer again becomes
>> least recently used and the next CLOG eviction causes a second backend
>> to begin waiting for that buffer. …
Excerpts from Greg Stark's message of Wed Apr 04 14:11:29 -0300 2012:
> On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote:
> > , everybody's next few CLOG requests hit some other
> > buffer but eventually the long-I/O-in-progress buffer again becomes
> > least recently used and the next CLOG eviction causes a second
> > backend to begin waiting for that buffer. …
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote:
> 3. I noticed that the blocking described by "slru.c:311 blocked by
> slru.c:405" seemed to be clumpy - I would get a bunch of messages
> about that all at once. This makes me wonder if the SLRU machinery is
> occasionally making a real bad decision …
On Wed, Apr 4, 2012 at 1:00 PM, Robert Haas wrote:
> , everybody's next few CLOG requests hit some other
> buffer but eventually the long-I/O-in-progress buffer again becomes
> least recently used and the next CLOG eviction causes a second backend
> to begin waiting for that buffer.
This still …
On Wed, Apr 4, 2012 at 8:00 AM, Robert Haas wrote:
> There's some apparent regression on the single-client test, but I'm
> inclined to think that's a testing artifact of some kind and also
> probably not very important. It would be worth paying a small price
> in throughput to avoid many-second …
On Mon, Apr 2, 2012 at 12:33 PM, Robert Haas wrote:
> This particular example shows the above chunk of code taking >13s to
> execute. Within 3s, every other backend piles up behind that, leading
> to the database getting no work at all done for a good ten seconds.
>
> My guess is that what's happening …