Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-04-02 Thread Bruce Momjian
"test" version, but I am putting in the queue so we can track it there. Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. ---

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-26 Thread Bruce Momjian
Simon, is this patch ready to be added to the patch queue? I assume not. --- Simon Riggs wrote: > On Mon, 2007-03-12 at 09:14 +, Simon Riggs wrote: > > On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote: > > > >

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-13 Thread Luke Lonergan
Simon, On 3/13/07 2:37 AM, "Simon Riggs" <[EMAIL PROTECTED]> wrote: >> We're planning a modification that I think you should consider: when there >> is a sequential scan of a table larger than the size of shared_buffers, we >> are allowing the scan to write through the shared_buffers cache. > >

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-13 Thread Simon Riggs
On Tue, 2007-03-13 at 13:40 +0900, ITAGAKI Takahiro wrote: > "Simon Riggs" <[EMAIL PROTECTED]> wrote: > > > > > With the default > > > > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers > > > > in pool, > > > > just like existing sequential scans. Is this intended? > > > > >

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-13 Thread Simon Riggs
On Mon, 2007-03-12 at 22:16 -0700, Luke Lonergan wrote: > You may know we've built something similar and have seen similar gains. Cool > We're planning a modification that I think you should consider: when there > is a sequential scan of a table larger than the size of shared_buffers, we > are a

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-12 Thread Luke Lonergan
Simon, You may know we've built something similar and have seen similar gains. We're planning a modification that I think you should consider: when there is a sequential scan of a table larger than the size of shared_buffers, we are allowing the scan to write through the shared_buffers cache. The

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-12 Thread ITAGAKI Takahiro
"Simon Riggs" <[EMAIL PROTECTED]> wrote: > > > With the default > > > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in > > > pool, > > > just like existing sequential scans. Is this intended? > > > New test version enclosed, where scan_recycle_buffers = 0 doesn't change

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-12 Thread Simon Riggs
On Mon, 2007-03-12 at 10:30 -0400, Tom Lane wrote: > ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > > I tested your patch with VACUUM FREEZE. The performance was improved when > > I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL > > traffic, > > but this patch should be useful

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-12 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > I tested your patch with VACUUM FREEZE. The performance was improved when > I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic, > but this patch should be useful for normal VACUUMs with background jobs! Proving that you can s

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-12 Thread Simon Riggs
On Mon, 2007-03-12 at 09:14 +, Simon Riggs wrote: > On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote: > > With the default > > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in > > pool, > > just like existing sequential scans. Is this intended? > > Yes, but i

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-12 Thread Simon Riggs
On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote: > "Simon Riggs" <[EMAIL PROTECTED]> wrote: > > > I've implemented buffer recycling, as previously described, patch being > > posted now to -patches as "scan_recycle_buffers". > > > > - for VACUUMs of any size, with the objective of reduci

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-11 Thread ITAGAKI Takahiro
"Simon Riggs" <[EMAIL PROTECTED]> wrote: > I've implemented buffer recycling, as previously described, patch being > posted now to -patches as "scan_recycle_buffers". > > - for VACUUMs of any size, with the objective of reducing WAL thrashing > whilst keeping VACUUM's behaviour of not spoiling t

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-09 Thread Luke Lonergan
; PGSQL Hackers; Doug Rady Subject:Re: [HACKERS] Bug: Buffer cache is not scan resistant On Tue, 2007-03-06 at 22:32 -0500, Luke Lonergan wrote: > Incidentally, we tried triggering NTA (L2 cache bypass) > unconditionally and in various patterns and did not see the > substantial gai

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-09 Thread Simon Riggs
On Tue, 2007-03-06 at 22:32 -0500, Luke Lonergan wrote: > Incidentally, we tried triggering NTA (L2 cache bypass) > unconditionally and in various patterns and did not see the > substantial gain as with reducing the working set size. > > My conclusion: Fixing the OS is not sufficient to alleviate

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-08 Thread Sherry Moore
Hi Simon, > and what you haven't said > > - all of this is orthogonal to the issue of buffer cache spoiling in > PostgreSQL itself. That issue does still exist as a non-OS issue, but > we've been discussing in detail the specific case of L2 cache effects > with specific kernel calls. All of the t

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-07 Thread Marko Kreen
On 3/7/07, Hannu Krosing <[EMAIL PROTECTED]> wrote: Do any of you know about a way to READ PAGE ONLY IF IN CACHE in *nix systems ? Supposedly you could mmap() a file and then do mincore() on the area to see which pages are cached. But you were talking about postgres cache before, there it shou

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Hannu Krosing
On one fine day, Tue, 2007-03-06 at 18:28, Jeff Davis wrote: > On Tue, 2007-03-06 at 18:29 +, Heikki Linnakangas wrote: > > Jeff Davis wrote: > > > On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote: > > >> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote: > > >>> Another approach I pr

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Luke Lonergan
; Pavan Deolasee; Gavin Sherry; PGSQL Hackers; Doug Rady Subject:Re: [HACKERS] Bug: Buffer cache is not scan resistant Hi Simon, > and what you haven't said > > - all of this is orthogonal to the issue of buffer cache spoiling in > PostgreSQL itself. That issue does sti

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Jeff Davis
On Tue, 2007-03-06 at 18:29 +, Heikki Linnakangas wrote: > Jeff Davis wrote: > > On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote: > >> On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote: > >>> Another approach I proposed back in December is to not have a > >>> variable like that at all,

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Jeff Davis
On Tue, 2007-03-06 at 17:43 -0700, Jim Nasby wrote: > On Mar 6, 2007, at 10:56 AM, Jeff Davis wrote: > >> We also don't need an exact count, either. Perhaps there's some way > >> we could keep a counter or something... > > > > Exact count of what? The pages already in cache? > > Yes. The idea bein

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Jim Nasby
On Mar 6, 2007, at 10:56 AM, Jeff Davis wrote: We also don't need an exact count, either. Perhaps there's some way we could keep a counter or something... Exact count of what? The pages already in cache? Yes. The idea being if you see there's 10k pages in cache, you can likely start 9k page

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Jim Nasby
On Mar 6, 2007, at 12:17 AM, Tom Lane wrote: Jim Nasby <[EMAIL PROTECTED]> writes: An idea I've been thinking about would be to have the bgwriter or some other background process actually try and keep the free list populated, The bgwriter already tries to keep pages "just in front" of the cloc

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Jeff Davis
On Tue, 2007-03-06 at 18:47 +, Heikki Linnakangas wrote: > Tom Lane wrote: > > Jeff Davis <[EMAIL PROTECTED]> writes: > >> If I were to implement this idea, I think Heikki's bitmap of pages > >> already read is the way to go. > > > > I think that's a good way to guarantee that you'll not finis

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Simon Riggs
On Mon, 2007-03-05 at 21:34 -0800, Sherry Moore wrote: > - Based on a lot of the benchmarks and workloads I traced, the > target buffer of read operations are typically accessed again > shortly after the read, while writes are usually not. Therefore, > the default operation

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Heikki Linnakangas
Tom Lane wrote: Jeff Davis <[EMAIL PROTECTED]> writes: If I were to implement this idea, I think Heikki's bitmap of pages already read is the way to go. I think that's a good way to guarantee that you'll not finish in time for 8.3. Heikki's idea is just at the handwaving stage at this point,

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Heikki Linnakangas
Jeff Davis wrote: On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote: On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote: Another approach I proposed back in December is to not have a variable like that at all, but scan the buffer cache for pages belonging to the table you're scanning to i

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Jeff Davis
On Tue, 2007-03-06 at 12:59 -0500, Tom Lane wrote: > Jeff Davis <[EMAIL PROTECTED]> writes: > > If I were to implement this idea, I think Heikki's bitmap of pages > > already read is the way to go. > > I think that's a good way to guarantee that you'll not finish in time > for 8.3. Heikki's idea

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes: > If I were to implement this idea, I think Heikki's bitmap of pages > already read is the way to go. I think that's a good way to guarantee that you'll not finish in time for 8.3. Heikki's idea is just at the handwaving stage at this point, and I'm not even

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Jeff Davis
On Mon, 2007-03-05 at 21:02 -0700, Jim Nasby wrote: > On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote: > > Another approach I proposed back in December is to not have a > > variable like that at all, but scan the buffer cache for pages > > belonging to the table you're scanning to initiali

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Sherry Moore
Hi Tom, Sorry about the delay. I have been away from computers all day. In the current Solaris release in development (Code name Nevada, available for download at http://opensolaris.org), I have implemented non-temporal access (NTA) which bypasses L2 for most writes, and reads larger than copyou

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-06 Thread Simon Riggs
On Tue, 2007-03-06 at 00:54 +0100, Florian G. Pflug wrote: > Simon Riggs wrote: > But it would break the idea of letting a second seqscan follow in the > first's hot cache trail, no? No, but it would make it somewhat harder to achieve without direct synchronization between scans. It could still w

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Jim Nasby <[EMAIL PROTECTED]> writes: > An idea I've been thinking about would be to have the bgwriter or > some other background process actually try and keep the free list > populated, The bgwriter already tries to keep pages "just in front" of the clock sweep pointer clean.

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Jim Nasby
On Mar 5, 2007, at 2:03 PM, Heikki Linnakangas wrote: Another approach I proposed back in December is to not have a variable like that at all, but scan the buffer cache for pages belonging to the table you're scanning to initialize the scan. Scanning all the BufferDescs is a fairly CPU and l

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Jim Nasby
On Mar 5, 2007, at 11:46 AM, Josh Berkus wrote: Tom, I seem to recall that we've previously discussed the idea of letting the clock sweep decrement the usage_count before testing for 0, so that a buffer could be reused on the first sweep after it was initially used, but that we rejected it

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > Here's the x86 assembler routine for Solaris: > http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/intel/ia32 > /ml/copy.s > The actual uiomove routine is a simple wrapper that calls the assembler > kcopy or xcopyout routines. There are

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Tom, On 3/5/07 7:58 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > I looked a bit at the Linux code that's being used here, but it's all > x86_64 assembler which is something I've never studied :-(. Here's the C wrapper routine in Solaris: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > Good info - it's the same in Solaris, the routine is uiomove (Sherry > wrote it). Cool. Maybe Sherry can comment on the question whether it's possible for a large-scale-memcpy to not take a hit on filling a cache line that wasn't previously in cache?

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > What happens if VACUUM comes across buffers that *are* already in the buffer > cache. Does it throw those on the freelist too? Not unless they have usage_count 0, in which case they'd be subject to recycling by the next clock sweep anyway.

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Pavan Deolasee; Gavin Sherry; Luke Lonergan; PGSQL Hackers; Doug Rady; Sherry Moore Subject:Re: [HACKERS] Bug: Buffer cache is not scan resistant Mark Kirkwood <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> But what I wanted to see was the curve of >> elap

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > I don't see any good reason why overwriting a whole cache line oughtn't be > the same speed either way. I can think of a couple theories, but I don't know if they're reasonable. The one that comes to mind is the inter-processor cache coherency protocol. Wh

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Mark Kirkwood <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> But what I wanted to see was the curve of >> elapsed time vs shared_buffers? > ... > Looks *very* similar. Yup, thanks for checking. I've been poking into this myself. I find that I can reproduce the behavior to some extent even with

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Mark Kirkwood
Tom Lane wrote:
> But what I wanted to see was the curve of elapsed time vs shared_buffers?

Of course! (let's just write that off to me being pre coffee...). With the patch applied:

Shared Buffers  Elapsed  vmstat IO rate
--------------  -------  --------------
400MB           101 s    122 MB/

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Florian G. Pflug
Simon Riggs wrote: On Mon, 2007-03-05 at 14:41 -0500, Tom Lane wrote: "Simon Riggs" <[EMAIL PROTECTED]> writes: Itagaki-san and I were discussing in January the idea of cache-looping, whereby a process begins to reuse its own buffers in a ring of ~32 buffers. When we cycle back round, if usage

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Jeff Davis
On Mon, 2007-03-05 at 21:03 +, Heikki Linnakangas wrote: > Another approach I proposed back in December is to not have a variable > like that at all, but scan the buffer cache for pages belonging to the > table you're scanning to initialize the scan. Scanning all the > BufferDescs is a fairl

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Mark Kirkwood <[EMAIL PROTECTED]> writes: > Elapsed time is exactly the same (101 s). Is it expected that HEAD would > behave differently? Offhand I don't think so. But what I wanted to see was the curve of elapsed time vs shared_buffers? regards, tom lane -

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Mark Kirkwood
Tom Lane wrote: Hm, not really a smoking gun there. But just for grins, would you try this patch and see if the numbers change? Applied to 8.2.3 (don't have lineitem loaded in HEAD yet) - no change that I can see: procs ---memory-- ---swap-- -io --system-- cp

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Mark Kirkwood <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Mark, can you detect "hiccups" in the read rate using >> your setup? > I think so, here's the vmstat output for 400MB of shared_buffers during > the scan: Hm, not really a smoking gun there. But just for grins, would you try this pat

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Mark Kirkwood
Tom Lane wrote: So the problem is not so much the clock sweep overhead as that it's paid in a very nonuniform fashion: with N buffers you pay O(N) once every N reads and O(1) the rest of the time. This is no doubt slowing things down enough to delay that one read, instead of leaving it nicely I
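A toy model may make the cost pattern quoted above concrete. This is only a sketch, not PostgreSQL's bufmgr code: `simulate_seqscan` and its parameters are hypothetical names, pins and the freelist are ignored, and the scan is assumed to be the only user of the pool. The behaviour it borrows from the thread is that a sequential scan leaves each buffer at usage_count = 1, so the clock sweep has to pass over a buffer once before it can be reused.

```python
def simulate_seqscan(n_buffers, n_reads):
    """Return the number of clock-sweep steps needed for each page read.

    Toy model (assumption): one sequential scan is the only user of the
    pool, and every buffer it fills is left with usage_count == 1.
    """
    usage_count = [0] * n_buffers
    hand = 0  # clock sweep pointer
    steps_per_read = []
    for _ in range(n_reads):
        steps = 0
        while True:
            steps += 1
            victim = hand
            hand = (hand + 1) % n_buffers
            if usage_count[victim] == 0:
                break  # reusable buffer found
            usage_count[victim] -= 1  # passed over once; try the next buffer
        usage_count[victim] = 1  # the scan leaves the buffer at usage_count 1
        steps_per_read.append(steps)
    return steps_per_read
```

With 4 buffers and 12 reads this returns 1, 1, 1, 1, 5, 1, 1, 1, 5, 1, 1, 1: once the pool is full, one read in every N pays for a full sweep of N + 1 steps and the rest pay a single step.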

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Heikki Linnakangas
Jeff Davis wrote: On Mon, 2007-03-05 at 15:30 -0500, Tom Lane wrote: Jeff Davis <[EMAIL PROTECTED]> writes: Absolutely. I've got a parameter in my patch "sync_scan_offset" that starts a seq scan N pages before the position of the last seq scan running on that table (or a current seq scan if the

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes: > On Mon, 2007-03-05 at 15:30 -0500, Tom Lane wrote: >> Strikes me that expressing that parameter as a percentage of >> shared_buffers might make it less in need of manual tuning ... > The original patch was a percentage of effective_cache_size, because in >

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Jeff Davis
On Mon, 2007-03-05 at 15:30 -0500, Tom Lane wrote: > Jeff Davis <[EMAIL PROTECTED]> writes: > > Absolutely. I've got a parameter in my patch "sync_scan_offset" that > > starts a seq scan N pages before the position of the last seq scan > > running on that table (or a current seq scan if there's sti

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes: > Absolutely. I've got a parameter in my patch "sync_scan_offset" that > starts a seq scan N pages before the position of the last seq scan > running on that table (or a current seq scan if there's still a scan > going). Strikes me that expressing that param
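The start-position rule described above can be sketched roughly as follows. Only the name `sync_scan_offset` comes from the thread; `scan_order`, `last_pos` and `nblocks` are hypothetical, and the wrap-around at the start of the table and the circular scan order are assumptions for illustration, not a description of the actual patch.

```python
def scan_order(nblocks, last_pos, sync_scan_offset):
    """Return the block visit order for a new sequential scan.

    Assumption: the new scan starts sync_scan_offset pages before the
    position reported by the previous (or still running) scan, wrapping
    around the start of the table, and then visits every block once.
    """
    start = (last_pos - sync_scan_offset) % nblocks
    return [(start + i) % nblocks for i in range(nblocks)]
```

For a 10-block table with the other scan at block 7 and an offset of 3, the new scan would begin at block 4, so the blocks most likely still cached are read first.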

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Jeff Davis
On Mon, 2007-03-05 at 09:09 +, Heikki Linnakangas wrote: > In fact, the pages that are left in the cache after the seqscan finishes > would be useful for the next seqscan of the same table if we were smart > enough to read those pages first. That'd make a big difference for > seqscanning a t

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes: > Best way is to prove it though. Seems like not too much work to have a > private ring data structure when the hint is enabled. The extra > bookkeeping is easily going to be outweighed by the reduction in mem->L2 > cache fetches. I'll do it tomorrow, if no

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Jeff Davis
On Mon, 2007-03-05 at 11:10 +0200, Hannu Krosing wrote: > > My proposal for a fix: ensure that when relations larger (much larger?) > > than buffer cache are scanned, they are mapped to a single page in the > > shared buffer cache. > > How will this approach play together with synchronized scan pa

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Jeff Davis
On Mon, 2007-03-05 at 03:51 -0500, Luke Lonergan wrote: > The Postgres shared buffer cache algorithm appears to have a bug. When > there is a sequential scan the blocks are filling the entire shared > buffer cache. This should be "fixed". > > My proposal for a fix: ensure that when relations lar

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Simon Riggs
On Mon, 2007-03-05 at 14:41 -0500, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > Itagaki-san and I were discussing in January the idea of cache-looping, > > whereby a process begins to reuse its own buffers in a ring of ~32 > > buffers. When we cycle back round, if usage_count==1

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
; PGSQL Hackers; Doug Rady; Sherry Moore Cc: pgsql-hackers@postgresql.org Subject:Re: [HACKERS] Bug: Buffer cache is not scan resistant On Mon, 2007-03-05 at 10:46 -0800, Josh Berkus wrote: > Tom, > > > I seem to recall that we've previously discussed the idea of lett

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes: > Itagaki-san and I were discussing in January the idea of cache-looping, > whereby a process begins to reuse its own buffers in a ring of ~32 > buffers. When we cycle back round, if usage_count==1 then we assume that > we can reuse that buffer. This avoid
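The cache-looping idea quoted above might look something like the sketch below. It is an illustration under stated assumptions, not the posted patch: `ScanRing`, `get_buffer` and the `allocate` callback are hypothetical names, buffers are plain integers, and the fallback of fetching a fresh buffer when usage_count is not 1 is a guess at the behaviour, since the quoted text is cut off.

```python
import itertools


class ScanRing:
    """Private ring of buffers that one scan keeps reusing (toy model)."""

    def __init__(self, size, allocate):
        self.slots = [None] * size
        self.next = 0
        # Hypothetical callback standing in for "get a buffer from the shared pool".
        self.allocate = allocate

    def get_buffer(self, usage_count):
        """Return the buffer to use for the next page of the scan."""
        slot = self.next
        self.next = (self.next + 1) % len(self.slots)
        buf = self.slots[slot]
        # usage_count == 1 is taken to mean that only this scan touched the buffer.
        if buf is not None and usage_count.get(buf, 0) == 1:
            return buf
        # Assumption: otherwise leave that buffer alone and take a fresh one.
        buf = self.allocate()
        self.slots[slot] = buf
        return buf


# Usage: a scan of 10 pages through a ring of 4 buffers.
counter = itertools.count()
usage = {}
ring = ScanRing(4, lambda: next(counter))
touched = []
for _ in range(10):
    buf = ring.get_buffer(usage)
    usage[buf] = 1  # the scan leaves the buffer at usage_count 1
    touched.append(buf)
```

In this run the scan only ever touches buffers 0 to 3, however long it is, which is the point of the ring: the rest of the shared pool is left alone.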

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Simon Riggs
On Mon, 2007-03-05 at 10:46 -0800, Josh Berkus wrote: > Tom, > > > I seem to recall that we've previously discussed the idea of letting the > > clock sweep decrement the usage_count before testing for 0, so that a > > buffer could be reused on the first sweep after it was initially used, > > but t

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > I seem to recall that we've previously discussed the idea of letting the > clock sweep decrement the usage_count before testing for 0, so that a > buffer could be reused on the first sweep after it was initially used, > but that we rejected it as being a b

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes: > I am wondering whether seqscan would set the usage_count to 1 or to a higher > value. usage_count is incremented while unpinning the buffer. Even if > we use > page-at-a-time mode, won't the buffer itself get pinned/unpinned > every time seqsca

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Pavan Deolasee
Tom Lane wrote: Nope, Pavan's nailed it: the problem is that after using a buffer, the seqscan leaves it with usage_count = 1, which means it has to be passed over once by the clock sweep before it can be re-used. I was misled in the 32-buffer case because catalog accesses during startup had le

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Josh Berkus
Tom, > I seem to recall that we've previously discussed the idea of letting the > clock sweep decrement the usage_count before testing for 0, so that a > buffer could be reused on the first sweep after it was initially used, > but that we rejected it as being a bad idea.  But at least with large >

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Here's four more points on the curve - I'd use a "dirac delta function" for your curve fit ;-)

Shared_buffers  Select Count  Vacuum
(KB)            (s)           (s)
==============  ============  ======
248             5.52          2.46
368             4.77          2.40
552

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
I wrote: > "Pavan Deolasee" <[EMAIL PROTECTED]> writes: >> Isn't the size of the shared buffer pool itself acting as a performance >> penalty in this case? Maybe StrategyGetBuffer() needs to make multiple >> passes over the buffers before the usage_count of any buffer is reduced >> to zero and th

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Tom, On 3/5/07 8:53 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Hm, that seems to blow the "it's an L2 cache effect" theory out of the > water. If it were a cache effect then there should be a performance > cliff at the point where the cache size is exceeded. I see no such > cliff, in fact the

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Josh Berkus
Tom, > Yes, autovacuum is off, and bgwriter shouldn't have anything useful to > do either, so I'm a bit at a loss what's going on --- but in any case, > it doesn't look like we are cycling through the entire buffer space > for each fetch. I'd be happy to DTrace it, but I'm a little lost as to whe

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes: > Isn't the size of the shared buffer pool itself acting as a performance > penalty in this case? Maybe StrategyGetBuffer() needs to make multiple > passes over the buffers before the usage_count of any buffer is reduced > to zero and the buffer is cho

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Hi Tom, On 3/5/07 8:53 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Hm, that seems to blow the "it's an L2 cache effect" theory out of the > water. If it were a cache effect then there should be a performance > cliff at the point where the cache size is exceeded. I see no such > cliff, in fact t

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Pavan Deolasee
Tom Lane wrote:
Mark Kirkwood <[EMAIL PROTECTED]> writes:

Shared Buffers  Elapsed  IO rate (from vmstat)
--------------  -------  ---------------------
400MB           101 s    122 MB/s
2MB             100 s
1MB              97 s
768KB            93 s
512KB            86 s
256KB            77

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Mark Kirkwood <[EMAIL PROTECTED]> writes:
> Shared Buffers  Elapsed  IO rate (from vmstat)
> --------------  -------  ---------------------
> 400MB           101 s    122 MB/s
> 2MB             100 s
> 1MB              97 s
> 768KB            93 s
> 512KB            86 s
> 256KB            77 s
> 1

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Gregory Stark
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > The evidence seems to clearly indicate reduced memory writing due to an > L2 related effect. You might try using valgrind's cachegrind tool which I understand can actually emulate various processors' cache to show how efficiently code uses it. I hav

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Hi Mark,
> lineitem has 1535724 pages (11997 MB)
>
> Shared Buffers  Elapsed  IO rate (from vmstat)
> --------------  -------  ---------------------
> 400MB           101 s    122 MB/s
>
> 2MB             100 s
> 1MB              97 s
> 768KB            93 s
> 512KB            86 s
> 256KB

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Mark Kirkwood
Gavin Sherry wrote: On Mon, 5 Mar 2007, Mark Kirkwood wrote: To add a little to this - forgetting the scan resistant point for the moment... cranking down shared_buffers to be smaller than the L2 cache seems to help *any* sequential scan immensely, even on quite modest HW: (snipped) When I'v

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
> > The Postgres shared buffer cache algorithm appears to have a bug. > > When there is a sequential scan the blocks are filling the entire > > shared buffer cache. This should be "fixed". > > No, this is not a bug; it is operating as designed. The > point of the current bufmgr algorithm

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Hannu Krosing
On one fine day, Mon, 2007-03-05 at 04:15, Tom Lane wrote: > "Luke Lonergan" <[EMAIL PROTECTED]> writes: > > I think you're missing my/our point: > > > The Postgres shared buffer cache algorithm appears to have a bug. When > > there is a sequential scan the blocks are filling the entire shar

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Florian Weimer
* Tom Lane: > That makes absolutely zero sense. The data coming from the disk was > certainly not in processor cache to start with, and I hope you're not > suggesting that it matters whether the *target* page of a memcpy was > already in processor cache. If the latter, it is not our bug to fix.

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > I think you're missing my/our point: > The Postgres shared buffer cache algorithm appears to have a bug. When > there is a sequential scan the blocks are filling the entire shared > buffer cache. This should be "fixed". No, this is not a bug; it is

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Hannu Krosing
On one fine day, Mon, 2007-03-05 at 03:51, Luke Lonergan wrote: > Hi Tom, > > > Even granting that your conclusions are accurate, we are not > > in the business of optimizing Postgres for a single CPU architecture. > > I think you're missing my/our point: > > The Postgres shared buffer ca

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Heikki Linnakangas
Luke Lonergan wrote: The Postgres shared buffer cache algorithm appears to have a bug. When there is a sequential scan the blocks are filling the entire shared buffer cache. This should be "fixed". My proposal for a fix: ensure that when relations larger (much larger?) than buffer cache are sc

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Hi Tom, > Even granting that your conclusions are accurate, we are not > in the business of optimizing Postgres for a single CPU architecture. I think you're missing my/our point: The Postgres shared buffer cache algorithm appears to have a bug. When there is a sequential scan the blocks are

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: >> So either way, it isn't in processor cache after the read. >> So how can there be any performance benefit? > It's the copy from kernel IO cache to the buffer cache that is L2 > sensitive. When the shared buffer cache is polluted, it thrashes the L2

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Grzegorz Jaskiewicz
On Mar 5, 2007, at 2:36 AM, Tom Lane wrote: n into account. I'm also less than convinced that it'd be helpful for a big seqscan: won't reading a new disk page into memory via DMA cause that memory to get flushed from the processor cache anyway? Nope. DMA is writing directly into main memory.

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
Hi Tom, > Now this may only prove that the disk subsystem on this > machine is too cheap to let the system show any CPU-related > issues. Try it with a warm IO cache. As I posted before, we see double the performance of a VACUUM from a table in IO cache when the shared buffer cache isn't bein

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Luke Lonergan
> So either way, it isn't in processor cache after the read. > So how can there be any performance benefit? It's the copy from kernel IO cache to the buffer cache that is L2 sensitive. When the shared buffer cache is polluted, it thrashes the L2 cache. When the number of pages being written to

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-05 Thread Tom Lane
Grzegorz Jaskiewicz <[EMAIL PROTECTED]> writes: > On Mar 5, 2007, at 2:36 AM, Tom Lane wrote: >> I'm also less than convinced that it'd be helpful for a big seqscan: >> won't reading a new disk page into memory via DMA cause that memory to >> get flushed from the processor cache anyway? > Nope. DM

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Tom Lane
Gavin Sherry <[EMAIL PROTECTED]> writes: > Could you demonstrate that point by showing us timings for shared_buffers > sizes from 512K up to, say, 2 MB? The two numbers you give there might > just have to do with managing a large buffer. Using PG CVS HEAD on 64-bit Intel Xeon (1MB L2 cache), Fedor

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
Gavin, Mark, > Could you demonstrate that point by showing us timings for > shared_buffers sizes from 512K up to, say, 2 MB? The two > numbers you give there might just have to do with managing a > large buffer. I suggest two experiments that we've already done: 1) increase shared buffers to d

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Gavin Sherry
On Mon, 5 Mar 2007, Mark Kirkwood wrote: > To add a little to this - forgetting the scan resistant point for the > moment... cranking down shared_buffers to be smaller than the L2 cache > seems to help *any* sequential scan immensely, even on quite modest HW: > > e.g: PIII 1.26Ghz 512Kb L2 cache,

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Mark Kirkwood
Tom Lane wrote: "Luke Lonergan" <[EMAIL PROTECTED]> writes: The issue is summarized like this: the buffer cache in PGSQL is not "scan resistant" as advertised. Sure it is. As near as I can tell, your real complaint is that the bufmgr doesn't attempt to limit its usage footprint to fit in L2 c

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
Tom Lane [mailto:[EMAIL PROTECTED] Sent: Sunday, March 04, 2007 08:36 PM Eastern Standard Time To: Luke Lonergan Cc: PGSQL Hackers; Doug Rady; Sherry Moore Subject: Re: [HACKERS] Bug: Buffer cache is not scan resistant "Luke Lonergan" <[EMAIL PROTECTED]> writes:

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
g is shrt cuz m on ma treo -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Sunday, March 04, 2007 08:36 PM Eastern Standard Time To: Luke Lonergan Cc: PGSQL Hackers; Doug Rady; Sherry Moore Subject: Re: [HACKERS] Bug: Buffer cache is not scan resistant

Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > The issue is summarized like this: the buffer cache in PGSQL is not "scan > resistant" as advertised. Sure it is. As near as I can tell, your real complaint is that the bufmgr doesn't attempt to limit its usage footprint to fit in L2 cache; which is h

[HACKERS] Bug: Buffer cache is not scan resistant

2007-03-04 Thread Luke Lonergan
I'm putting this out there before we publish a fix so that we can discuss how best to fix it. Doug and Sherry recently found the source of an important performance issue with the Postgres shared buffer cache. The issue is summarized like this: the buffer cache in PGSQL is not "scan resistant" as