Re: [HACKERS] Seq scans roadmap

2007-05-17 Thread Luke Lonergan
Hi Jeff, On 5/16/07 4:56 PM, "Jeff Davis" <[EMAIL PROTECTED]> wrote: >> The main benefit of a sync scan will be the ability to start the scan where >> other scans have already filled the I/O cache with useful blocks. This will >> require some knowledge of the size of the I/O cache by the syncscan

Re: [HACKERS] Seq scans roadmap

2007-05-16 Thread Jeff Davis
On Wed, 2007-05-16 at 10:31 -0700, Luke Lonergan wrote: > I think the analysis on syncscan needs to take the external I/O cache into > account. I believe it is not necessary or desirable to keep the scans in > lock step within the PG bufcache. I partially agree. I don't think we need any huge amount

Re: [HACKERS] Seq scans roadmap

2007-05-16 Thread Luke Lonergan
I think the analysis on syncscan needs to take the external I/O cache into account. I believe it is not necessary or desirable to keep the scans in lock step within the PG bufcache. The main benefit of a sync scan will be the ability to start the scan where other scans have already filled the I/O cache with useful blocks.
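The mechanism Luke describes could look roughly like the following minimal C sketch: each scan publishes its current block in a shared table so a new scan on the same relation can start where the cache is already warm. All identifiers here are hypothetical, not taken from the actual syncscan patch.

    #include <stdint.h>

    #define SYNCSCAN_SLOTS 64

    typedef struct {
        uint32_t relid;      /* relation being scanned */
        uint32_t cur_block;  /* block the scan last read */
    } ScanHint;

    /* In a real implementation this would live in shared memory. */
    static ScanHint hint_table[SYNCSCAN_SLOTS];

    /* Report progress so new scans of relid can join in nearby. */
    static void report_scan_position(uint32_t relid, uint32_t block)
    {
        hint_table[relid % SYNCSCAN_SLOTS] = (ScanHint){ relid, block };
    }

    /* New scans start from the reported block instead of block 0. */
    static uint32_t get_start_block(uint32_t relid)
    {
        ScanHint h = hint_table[relid % SYNCSCAN_SLOTS];
        return (h.relid == relid) ? h.cur_block : 0;
    }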

Re: [HACKERS] Seq scans roadmap

2007-05-16 Thread Zeugswetter Andreas ADI SD
> > > > 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. I'd say in a scenario where 32k pages are indicated you will also want larger than average L2 caches. > > > > How about using 256/blocksize? The read-ahead uses 1/4 of the ring size. To the best of
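For concreteness, the arithmetic being discussed (a fixed 256 KB ring with a 1/4-ring read-ahead; values illustrative, not from any committed patch):

    #include <stdio.h>

    int main(void)
    {
        /* 256/blocksize-in-KB buffers => a fixed 256 KB ring:
         * 8 KB pages -> 32 buffers, 32 KB pages -> 8 buffers. */
        int page_sizes_kb[] = { 8, 32 };
        for (int i = 0; i < 2; i++) {
            int blcksz_kb = page_sizes_kb[i];
            int ring = 256 / blcksz_kb;   /* buffers in the ring */
            int readahead = ring / 4;     /* 1/4 of the ring is read ahead */
            printf("%2d KB pages: ring = %2d buffers (%d KB), read-ahead = %d pages\n",
                   blcksz_kb, ring, ring * blcksz_kb, readahead);
        }
        return 0;
    }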

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Heikki Linnakangas
Jeff Davis wrote: On Tue, 2007-05-15 at 10:42 +0100, Heikki Linnakangas wrote: Luke Lonergan wrote: 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. How about using 256/blocksize? Sounds reasonable. We need to check the effect on the synchronized scans, though.

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 10:25:35AM -0700, Jeff Davis wrote: > On Tue, 2007-05-15 at 10:42 +0100, Heikki Linnakangas wrote: > > Luke Lonergan wrote: > > > 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache > > > effect. > > > > > > How about using 256/blocksize? > > > > Sounds reasonable

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Jeff Davis
On Tue, 2007-05-15 at 10:42 +0100, Heikki Linnakangas wrote: > Luke Lonergan wrote: > > 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache > > effect. > > > > How about using 256/blocksize? > > Sounds reasonable. We need to check the effect on the synchronized > scans, though.

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Heikki Linnakangas
Luke Lonergan wrote: 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. How about using 256/blocksize? Sounds reasonable. We need to check the effect on the synchronized scans, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Luke Lonergan
To: PostgreSQL-development > Cc: Simon Riggs; Zeugswetter Andreas ADI SD; CK.Tan; Luke > Lonergan; Jeff Davis > Subject: Re: [HACKERS] Seq scans roadmap > > Just to keep you guys informed, I've been busy testing and > pondering over different buffer ring strategies for vacuum, seqscans and copy.

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Heikki Linnakangas
Just to keep you guys informed, I've been busy testing and pondering over different buffer ring strategies for vacuum, seqscans and copy. Here's what I'm going to do: Use a fixed size ring. Fixed as in doesn't change after the ring is initialized, however different kinds of scans use different ring sizes.
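A rough C sketch of what per-scan-type ring sizing might look like; the sizes and identifiers are illustrative assumptions, not the values from Heikki's patch:

    /* Pick a fixed ring size at initialization, varying by scan type. */
    typedef enum { SCAN_SEQ, SCAN_VACUUM, SCAN_COPY } ScanKind;

    static int ring_size_for(ScanKind kind, int blcksz)
    {
        switch (kind) {
            case SCAN_SEQ:    return 256 * 1024 / blcksz;  /* ~256 KB of buffers */
            case SCAN_VACUUM: return 32;                   /* small private ring */
            case SCAN_COPY:   return 2048 * 1024 / blcksz; /* larger: fewer WAL flushes */
        }
        return 0;  /* not reached */
    }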

Re: [HACKERS] Seq scans roadmap

2007-05-14 Thread Heikki Linnakangas
Simon Riggs wrote: On Fri, 2007-05-11 at 22:59 +0100, Heikki Linnakangas wrote: For comparison, here's the test results with vanilla CVS HEAD: copy-head | 00:06:21.533137 copy-head | 00:05:54.141285 I'm slightly worried that the results for COPY aren't anywhere near as good as the SELECT and VACUUM results.

Re: [HACKERS] Seq scans roadmap

2007-05-13 Thread CK Tan
Sorry, I should have been clearer. I meant because we need to check for trigger firing pre/post insertion, and the trigger definitions expect tuples to be inserted one by one, therefore we cannot insert N tuples at a time into the heap. Checking for triggers itself is not taking up much CPU

Re: [HACKERS] Seq scans roadmap

2007-05-13 Thread Tom Lane
"CK Tan" <[EMAIL PROTECTED]> writes: > COPY/INSERT are also bottlenecked on record at a time insertion into > heap, and in checking for pre-insert trigger, post-insert trigger and > constraints. > To speed things up, we really need to special case insertions without > triggers and constraint

Re: [HACKERS] Seq scans roadmap

2007-05-13 Thread CK Tan
Hi All, COPY/INSERT are also bottlenecked on record at a time insertion into heap, and in checking for pre-insert trigger, post-insert trigger and constraints. To speed things up, we really need to special case insertions without triggers and constraints, [probably allow for unique constraints
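A hedged sketch of the fast-path test CK Tan is proposing: batch N tuples per heap insertion only when nothing must observe each row individually. The struct and helper names are hypothetical:

    #include <stdbool.h>

    typedef struct {
        int  ntriggers;     /* row-level insert triggers on the relation */
        int  nconstraints;  /* CHECK constraints evaluated per row */
        bool has_indexes;   /* unique indexes force per-row checking */
    } RelInsertInfo;

    /* True when tuples can be inserted N at a time into the heap. */
    static bool can_batch_inserts(const RelInsertInfo *ri)
    {
        return ri->ntriggers == 0 &&
               ri->nconstraints == 0 &&
               !ri->has_indexes;
    }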

Re: [HACKERS] Seq scans roadmap

2007-05-12 Thread Luke Lonergan
Hi Simon, On 5/12/07 12:35 AM, "Simon Riggs" <[EMAIL PROTECTED]> wrote: > I'm slightly worried that the results for COPY aren't anywhere near as > good as the SELECT and VACUUM results. It isn't clear from those numbers > that the benefit really is significant. COPY is bottlenecked on datum form

Re: [HACKERS] Seq scans roadmap

2007-05-12 Thread Simon Riggs
On Fri, 2007-05-11 at 22:59 +0100, Heikki Linnakangas wrote: > For comparison, here's the test results with vanilla CVS HEAD: > > copy-head | 00:06:21.533137 > copy-head | 00:05:54.141285 I'm slightly worried that the results for COPY aren't anywhere near as good as the SELECT and VACUUM results. It isn't clear from those numbers that the benefit really is significant.

Re: [HACKERS] Seq scans roadmap

2007-05-11 Thread Heikki Linnakangas
I wrote: I'll review my test methodology and keep testing... I ran a set of tests on a 100 warehouse TPC-C stock table that is ~3.2 GB in size and the server has 4 GB of memory. IOW the table fits in OS cache, but not in shared_buffers (set at 1 GB). copy - COPY from a file select - SELECT

Re: [HACKERS] Seq scans roadmap

2007-05-11 Thread Zeugswetter Andreas ADI SD
> Sorry, 16x8K page ring is too small indeed. The reason we > selected 16 is because greenplum db runs on 32K page size, so > we are indeed reading 128K at a time. The #pages in the ring > should be made relative to the page size, so you achieve 128K > per read. Ah, ok. New disks here also ha

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread CK Tan
Sorry, 16x8K page ring is too small indeed. The reason we selected 16 is because greenplum db runs on 32K page size, so we are indeed reading 128K at a time. The #pages in the ring should be made relative to the page size, so you achieve 128K per read. Also agree that KillAndReadBuffer could
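Worked out as a small C program (interpreting the 1/4-ring read-ahead rule from earlier in the thread; this is not patch code):

    #include <stdio.h>

    int main(void)
    {
        /* Size the ring so a 1/4-ring read-ahead issues 128 KB reads,
         * whatever the page size. */
        const int target_read = 128 * 1024;
        int page_sizes[] = { 8192, 32768 };
        for (int i = 0; i < 2; i++) {
            int blcksz = page_sizes[i];
            int pages_per_read = target_read / blcksz;  /* 16 or 4 pages */
            int ring = 4 * pages_per_read;              /* 64 or 16 buffers */
            printf("BLCKSZ=%5d: %2d pages per read, ring of %2d buffers (%d KB)\n",
                   blcksz, pages_per_read, ring, ring * blcksz / 1024);
        }
        return 0;
    }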

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread CK Tan
The patch has no effect on scans that do updates. The KillAndReadBuffer routine does not force out a buffer if the dirty bit is set, so updated pages revert to the current performance characteristics. -cktan GreenPlum, Inc. On May 10, 2007, at 5:22 AM, Heikki Linnakangas wrote: Zeugswetter Andreas ADI SD wrote:
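A minimal sketch of the control flow described above, with assumed helper functions; the real KillAndReadBuffer internals may differ:

    #include <stdbool.h>

    typedef struct Buffer Buffer;
    extern bool BufferIsDirty(Buffer *buf);              /* assumed helpers */
    extern void EvictBuffer(Buffer *buf);
    extern Buffer *AllocateFreshBuffer(void);
    extern void ReadBlockInto(Buffer *buf, unsigned blkno);

    /* Recycle a ring buffer only if it is clean; dirty buffers are left
     * to the regular buffer manager, so updating scans keep today's
     * behavior. */
    static Buffer *kill_and_read(Buffer *ring_buf, unsigned blkno)
    {
        if (!BufferIsDirty(ring_buf)) {
            EvictBuffer(ring_buf);        /* clean: safe to reuse in place */
            ReadBlockInto(ring_buf, blkno);
            return ring_buf;
        }
        /* dirty: take a normal buffer, leaving the dirty page alone */
        Buffer *fresh = AllocateFreshBuffer();
        ReadBlockInto(fresh, blkno);
        return fresh;
    }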

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Heikki Linnakangas wrote: However, it caught me by total surprise that the performance with 1 buffer is so horrible. Using 2 buffers is enough to avoid whatever the issue is with just 1 buffer. I have no idea what's causing that. There must be some interaction that I don't understand. Ok, I f

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Heikki Linnakangas wrote: But all these assumptions need to be validated. I'm setting up tests with different ring sizes and queries to get a clear picture of this: - VACUUM on a clean table - VACUUM on a table with 1 dead tuple per page - read-only scan, large table - read-only scan, table fits

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Zeugswetter Andreas ADI SD wrote: Also, that patch doesn't address the VACUUM issue at all. And using a small fixed size ring with scans that do updates can be devastating. I'm experimenting with different ring sizes for COPY at the moment. Too small a ring leads to a lot of WAL flushes; it's basically the same problem

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Zeugswetter Andreas ADI SD
> Also, that patch doesn't address the VACUUM issue at all. And > using a small fixed size ring with scans that do updates can > be devastating. I'm experimenting with different ring sizes > for COPY at the moment. Too small a ring leads to a lot of WAL > flushes; it's basically the same problem
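The WAL-before-data rule behind that observation, as a hedged C sketch (globals and helpers are assumed): a dirty page cannot be written out and recycled until WAL is flushed past the page's LSN, so a small ring keeps recycling pages that were dirtied moments ago and forces a flush almost every time.

    typedef unsigned long long XLogRecPtr;

    extern XLogRecPtr flushed_wal_upto;       /* assumed global: WAL on disk */
    extern void XLogFlushTo(XLogRecPtr lsn);  /* assumed helper */
    extern void WritePageToDisk(void *page);

    static void write_dirty_page(void *page, XLogRecPtr page_lsn)
    {
        /* WAL must reach disk before the data page it describes. */
        if (page_lsn > flushed_wal_upto)
            XLogFlushTo(page_lsn);            /* the expensive part */
        WritePageToDisk(page);
    }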

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Zeugswetter Andreas ADI SD wrote: In reference to the seq scans roadmap, I have just submitted a patch that addresses some of the concerns. The patch does this: 1. for small relation (smaller than 60% of bufferpool), use the current logic 2. for big relation: - use a ring buffer in heap scan

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Zeugswetter Andreas ADI SD
> In reference to the seq scans roadmap, I have just submitted > a patch that addresses some of the concerns. > > The patch does this: > > 1. for small relation (smaller than 60% of bufferpool), use > the current logic 2. for big relation: > - use a ring buffer in heap scan > - pin first

Re: [HACKERS] Seq scans roadmap

2007-05-09 Thread CK Tan
Hi, In reference to the seq scans roadmap, I have just submitted a patch that addresses some of the concerns. The patch does this: 1. for small relation (smaller than 60% of bufferpool), use the current logic 2. for big relation: - use a ring buffer in heap scan - pin first
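The 60% threshold test from point 1, as a one-function sketch (the helper name is hypothetical; integer arithmetic avoids floating point):

    #include <stdbool.h>

    /* Relations smaller than 60% of the buffer pool keep the current
     * logic; bigger relations go through the ring buffer. */
    static bool use_ring_buffer(unsigned long rel_pages, unsigned long nbuffers)
    {
        return rel_pages * 10 >= nbuffers * 6;   /* rel_pages >= 0.6 * nbuffers */
    }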

Re: [HACKERS] Seq scans roadmap

2007-05-09 Thread Simon Riggs
On Tue, 2007-05-08 at 11:40 +0100, Heikki Linnakangas wrote: > Here's my roadmap for the "scan-resistant buffer cache" and > "synchronized scans" patches. > > 1. Fix the current vacuum behavior of throwing dirty buffers to the > freelist, forcing a lot of WAL flushes. Instead, use a backend-private ring of shared buffers that are recycled.

Re: [HACKERS] Seq scans roadmap

2007-05-09 Thread Zeugswetter Andreas ADI SD
> >> Are you filling multiple buffers in the buffer cache with a single > >> read-call? > > > > yes, needs vector or ScatterGather IO. > > I would expect that to get only moderate improvement. The vast improvement comes from 256k blocksize. > To get > the full benefit I would think you would

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Jeff Davis
On Tue, 2007-05-08 at 07:47 -0400, Luke Lonergan wrote: > Heikki, > > On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc) > implement dynamic I/O caching. The experiments have shown that benefit > of re-using PG buffer cache on large sequential scans is vanishingly > small when the buffer cache size is small compared to the system memory

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Jeff Davis
On Tue, 2007-05-08 at 11:40 +0100, Heikki Linnakangas wrote: > I'm going to do this incrementally, and we'll see how far we get for > 8.3. We might push 3A and/or 3B to 8.4. First, I'm going to finish up > Simon's patch (step 1), run some performance tests with vacuum, and > submit a patch for t

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Gregory Stark
"Zeugswetter Andreas ADI SD" <[EMAIL PROTECTED]> writes: >> Are you filling multiple buffers in the buffer cache with a >> single read-call? > > yes, needs vector or ScatterGather IO. I would expect that to get only moderate improvement. To get the full benefit I would think you would want to e

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Zeugswetter Andreas ADI SD
> >> What do you mean with using readahead inside the heapscan? > >> Starting an async read request? > > > > Nope - just reading N buffers ahead for seqscans. Subsequent calls > > use previously read pages. The objective is to issue > > contiguous reads > > to the OS in sizes greater than the PG page size (which is much smaller than what is needed for fast sequential I/O).

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Zeugswetter Andreas ADI SD
> Nope - just reading N buffers ahead for seqscans. Subsequent > calls use previously read pages. The objective is to issue > contiguous reads to the OS in sizes greater than the PG page > size (which is much smaller than what is needed for fast > sequential I/O). Problem here is that eight

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas
Luke Lonergan wrote: What do you mean with using readahead inside the heapscan? Starting an async read request? Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page size (which is much smaller than what is needed for fast sequential I/O).
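An alternative sketch of the same goal using posix_fadvise: rather than copying N pages ahead ourselves, hint the kernel so it issues the large contiguous reads. This is an assumption about how one could implement it, not what Luke's code does.

    #include <fcntl.h>

    #define BLCKSZ 8192

    /* Ask the kernel to prefetch the next nblocks pages of the scan,
     * so the physical reads are large and contiguous. */
    static void hint_readahead(int fd, unsigned next_block, unsigned nblocks)
    {
        (void) posix_fadvise(fd,
                             (off_t) next_block * BLCKSZ,
                             (off_t) nblocks * BLCKSZ,
                             POSIX_FADV_WILLNEED);
    }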

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Luke Lonergan
Heikki, > That's interesting. Care to share the results of the > experiments you ran? I was thinking of running tests of my > own with varying table sizes. Yah - it may take a while - you might get there faster. There are some interesting effects to look at between I/O cache performance and PG

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas
Luke Lonergan wrote: On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc) implement dynamic I/O caching. The experiments have shown that benefit of re-using PG buffer cache on large sequential scans is vanishingly small when the buffer cache size is small compared to the system memory

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Luke Lonergan
To: PostgreSQL-development > Cc: Jeff Davis; Simon Riggs > Subject: [HACKERS] Seq scans roadmap > > Here's my roadmap for the "scan-resistant buffer cache" and > "synchronized scans" patches. > > 1. Fix the current vacuum behavior of throwing dirty buffers to the freelist, forcing a lot of WAL flushes.

[HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas
Here's my roadmap for the "scan-resistant buffer cache" and "synchronized scans" patches. 1. Fix the current vacuum behavior of throwing dirty buffers to the freelist, forcing a lot of WAL flushes. Instead, use a backend-private ring of shared buffers that are recycled. This is what Simon's "
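A minimal sketch of such a backend-private recycling ring (identifiers are not from Simon's patch): once the ring is full, each new page reuses the oldest slot instead of taking another buffer from the shared pool.

    #define RING_SIZE 32

    typedef struct {
        int slots[RING_SIZE];  /* buffer ids owned by this backend's ring */
        int next;              /* index of the next slot to recycle */
        int filled;            /* how many slots hold a buffer so far */
    } BufferRing;

    /* Return the buffer id to use for the next page. During warm-up the
     * caller supplies a fresh buffer; in steady state fresh_buffer_id is
     * ignored and the oldest ring member is recycled. */
    static int ring_next_victim(BufferRing *ring, int fresh_buffer_id)
    {
        if (ring->filled < RING_SIZE) {
            ring->slots[ring->filled++] = fresh_buffer_id;
            return fresh_buffer_id;    /* still warming up */
        }
        int victim = ring->slots[ring->next];
        ring->next = (ring->next + 1) % RING_SIZE;
        return victim;                 /* steady state: recycle oldest */
    }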