[HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-04 Thread Mitsuru IWASAKI
Hi,

I am working on a new feature, `Buffer Cache Hibernation', which lets
postgres keep a high cache hit ratio even right after startup.

Postgres usually starts with ZERO buffer cache.  By saving the buffer
cache data structures into hibernation files just before shutdown, and
loading them at startup, postgres can start operations with the buffer
cache in the same condition as just before the last shutdown.

Here is the patch for 9.0.3 (also tested on 8.4.7)
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-9.0.3.patch

The patch includes the following.
- At shutdown, the buffer cache data structures (such as BufferDescriptors,
  BufferBlocks and StrategyControl) are saved into hibernation files.
- At startup, the buffer cache data structures are loaded from the hibernation
  files, and the buffer lookup hashtable is set up based on the buffer descriptors.
- The above functions are enabled by specifying `enable_buffer_cache_hibernation=on'
  in postgresql.conf.
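
For illustration only (this is not code from the patch): a minimal sketch of
the save-at-shutdown / load-at-startup idea for one memory region.  The
RegionHeader layout and the names save_region() and load_region() are
assumptions made for this example; the patch's real file format may differ.
The load side refuses the file when the element size or count does not match
the running configuration, which mirrors the shared_buffers mismatch case
discussed later in the thread.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical on-disk header; the real patch's file layout may differ. */
typedef struct {
    uint64_t elem_size;   /* size of one element, e.g. sizeof(BufferDesc) */
    uint64_t nelems;      /* number of elements, e.g. NBuffers */
} RegionHeader;

/* Write a header followed by the raw memory image.  Returns 0 on success. */
int save_region(const char *path, const void *base,
                size_t elem_size, size_t nelems)
{
    RegionHeader hdr = { elem_size, nelems };
    FILE *fp = fopen(path, "wb");
    int ok;

    if (fp == NULL)
        return -1;
    ok = fwrite(&hdr, sizeof(hdr), 1, fp) == 1 &&
         (nelems == 0 || fwrite(base, elem_size, nelems, fp) == nelems);
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the image back only if it matches the current geometry. */
int load_region(const char *path, void *base,
                size_t elem_size, size_t nelems)
{
    RegionHeader hdr;
    FILE *fp = fopen(path, "rb");
    int ok;

    if (fp == NULL)
        return -1;
    ok = fread(&hdr, sizeof(hdr), 1, fp) == 1 &&
         hdr.elem_size == elem_size && hdr.nelems == nelems &&
         (nelems == 0 || fread(base, elem_size, nelems, fp) == nelems);
    fclose(fp);
    return ok ? 0 : -1;
}
```

A short read can leave the destination partially overwritten, so a caller
would need to reinitialize the region whenever load_region() fails.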

Any comments are welcome, and I would very much appreciate it if the
patch were merged into the source tree.

Have fun and thanks!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-05 Thread Mitsuru IWASAKI
Hi, thanks for the good suggestions.

> > Postgres usually starts with ZERO buffer cache.  By saving the buffer
> > cache data structures into hibernation files just before shutdown, and
> > loading them at startup, postgres can start operations with the buffer
> > cache in the same condition as just before the last shutdown.
> 
> This seems like a lot of complication for rather dubious gain.  What
> happens when the DBA changes the shared_buffers setting, for instance?

It was my first concern, actually.  The current implementation stops
reading the hibernation files when it detects a size mismatch between
shared_buffers and the hibernation files.  I think this is the safe way.
As Alvaro Herrera mentioned, it would be possible to adjust the copying
of buffer blocks, but I think the shared_buffers setting is not changed
very often.

> How do you protect against the cached buffers getting out-of-sync with
> the actual disk files (especially during recovery scenarios)?  What

Saving the DB buffer cache is called at shutdown after bgwriter's
final checkpoint has finished, so I believe no dirty buffers should
exist.
For recovery scenarios, I need to research it though...
Could you describe what needs to be considered?

> about crash-induced corruption in the cache file itself (consider the
> not-unlikely possibility that init will kill the database before it's
> had time to dump all the buffers during a system shutdown)?  Do you have

I think this is an important point.  I'll implement a validation function for
the hibernation files.
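
For illustration only: one common way to detect a truncated or corrupted dump
is to store a checksum next to it and recompute it at startup.  The sketch
below uses a generic bitwise CRC-32; it is not PostgreSQL's own CRC code, and
the names crc32_update() and image_is_intact() are assumptions made for this
example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic bitwise CRC-32 (reflected polynomial 0xEDB88320).
 * Pass 0 as the initial value; the result can be fed back in to
 * checksum a file in several chunks.
 */
uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Startup-side check: does the image still match the stored checksum? */
int image_is_intact(const void *data, size_t len, uint32_t stored_crc)
{
    return crc32_update(0, data, len) == stored_crc;
}
```

If the server is killed halfway through the dump, the stored checksum is
either missing or does not match, and the file can simply be ignored.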

> any proof that writing out a few GB of buffers and then reading them
> back in is actually much cheaper than letting the database re-read the
> data from the disk files?

I think this means sequential read vs. scattered read.
The largest hibernation file is the one for buffer blocks, and a
sequential read of it would be much faster than scattered reads from
the database files via smgrread() block by block.
As Greg Stark suggested, re-reading from the database files based on the
buffer descriptors was one of the implementation candidates (it can reduce
storage consumption for hibernation), but I chose to create a raw image
file of the buffer blocks and read it back, for performance.


Thanks



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-05 Thread Mitsuru IWASAKI
Hi,

> I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
> provides similar functionality.  Are you familiar with that?  If so,
> could you contrast your approach with that one?

I'm not familiar with PgFincore at all, sorry, but I got the source code
and documents and read through them just now.
# and I'm a novice on postgres actually...
Both aim to reduce physical I/O, but their approaches and
gains are different.
My understanding is like this:

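+----------------------+   +----------------+
| Postgres (backend)   |   | Postgres       |
| +------------------+ |   |                |
| | DB Buffer Cache  | |   |                |
| | (shared buffers) | |   |                |
| | *my target       | |   |                |
| +------------------+ |   |                |
|    ^       ^         |   |                |
|    |       |         |   |                |
|    v       v         |   |                |
| +------------------+ |   | +-----------+  |
| |  buffer manager  | |   | | pgfincore |  |
| +------------------+ |   | +-----------+  |
+----^-------^---------+   +-------^--------+
     |       | smgrread()          | posix_fadvise()
     | read()|                     |                  userland
==================================================================
     |       |                     |                  kernel
     |       +----------+----------+
     |                  |
     |                  v
     |    +----------------------------+
     |    | File System                |
     |    |   +--------------------+   |
     +--->|   | FS Buffer Cache    |   |
          |   | *PgFincore target  |   |
          |   +--------------------+   |
          |     ^        ^             |
          +-----|--------|-------------+
                |        |
==================================================================
                |        |                            hardware
     +----------|--------|-------------------+
     |          |        v    Physical Disk  |
     |          |   +------------------+     |
     |          |   | base/16384/24598 |     |
     |          v   +------------------+     |
     | +--------------------------------+    |
     | | Buffer Cache Hibernation Files |    |
     | +--------------------------------+    |
     +---------------------------------------+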

In summary, PgFincore's target is the File System Buffer Cache, while
Buffer Cache Hibernation's target is the DB Buffer Cache (shared buffers).

PgFincore tries to preload database files with posix_fadvise() into
the File System Buffer Cache, not into the DB Buffer Cache (shared buffers).
On query execution, the buffer manager gets DB buffer blocks by
smgrread() from the file system unless the necessary blocks exist in the
DB Buffer Cache.  At this point, physical reads may not happen because part
of (or the entire) database file is already loaded into the FS Buffer Cache.

The gain depends on the file system, especially the size of the File System
Buffer Cache.
In short, preloading a database file is equivalent to the following command.
$ cat base/16384/24598 > /dev/null

I think PgFincore is a good fit for data warehouse applications.


Buffer Cache Hibernation, my approach, is simpler and more straightforward.
It tries to save/load the contents of the DB Buffer Cache (shared buffers)
using regular files (called Buffer Cache Hibernation Files).
At startup, the buffer manager loads DB buffer blocks into the DB Buffer
Cache from the Buffer Cache Hibernation Files which were saved at the last
shutdown.  Note that the database files are not read, so they are not
cached in the File System Buffer Cache at all.  Only the contents of the
DB Buffer Cache are filled.  Therefore, the DB buffer cache miss penalty
would be larger than with PgFincore.

The gain depends on the size of shared buffers, and on how often
similar queries are executed before and after restarting.

Buffer Cache Hibernation is a good fit for OLTP applications.


I think that PgFincore and Buffer Cache Hibernation are not mutually
exclusive; they can work together at different caching levels.



Sorry for my poor English skill, but I'm doing my best :)

Thanks



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-06 Thread Mitsuru IWASAKI
Hi,

I revised the patch against HEAD; it's available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110506.patch

Implemented hibernation file validations:
- comparison with pg_control
  At shutdown:
    pg_control state should be DB_SHUTDOWNED.
  At startup:
    pg_control state should be DB_SHUTDOWNED.
    hibernation files should be newer than pg_control.

- CRC check
  At shutdown:
    compute CRC values for the hibernation files and store them into a file.
  At startup:
    CRC values for the hibernation files should match those read from the
    file created at shutdown.

- file size
  At startup:
    The size of each hibernation file should match the file size
    calculated from shared_buffers.

- buffer descriptors validation
  At startup:
    The descriptor flags should not include BM_DIRTY, BM_IO_IN_PROGRESS,
    BM_IO_ERROR, BM_JUST_DIRTIED or BM_PIN_COUNT_WAITER.
    Sanity checks for usage_count and usage_count should be done.
    (wait_backend_pid is zero-cleared because the process has already terminated)

- system call error checking
  At shutdown and startup:
    The return values of system calls (e.g. open(), read(), write(),
    etc.) should be checked.
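
For illustration only: the buffer descriptor validation in the list above
could look roughly like this.  SavedDesc is a simplified, hypothetical
stand-in for the real BufferDesc, the flag bit values are placeholders for
the definitions in PostgreSQL's buf_internals.h, and MAX_USAGE_COUNT is an
assumed upper bound.

```c
#include <stdint.h>

/* Stand-in flag bits; the real ones live in PostgreSQL's buf_internals.h. */
#define BM_DIRTY             (1 << 0)
#define BM_IO_IN_PROGRESS    (1 << 3)
#define BM_IO_ERROR          (1 << 4)
#define BM_JUST_DIRTIED      (1 << 5)
#define BM_PIN_COUNT_WAITER  (1 << 6)
#define MAX_USAGE_COUNT      5      /* assumed upper bound */

typedef struct {
    uint16_t flags;
    uint16_t usage_count;
    int      wait_backend_pid;
} SavedDesc;                        /* hypothetical, simplified descriptor */

/* Returns 1 if a descriptor read from a hibernation file looks sane. */
int validate_saved_desc(SavedDesc *d)
{
    const uint16_t forbidden = BM_DIRTY | BM_IO_IN_PROGRESS | BM_IO_ERROR |
                               BM_JUST_DIRTIED | BM_PIN_COUNT_WAITER;

    if (d->flags & forbidden)
        return 0;                   /* state that cannot survive a clean shutdown */
    if (d->usage_count > MAX_USAGE_COUNT)
        return 0;                   /* out-of-range counter: treat as corrupt */
    d->wait_backend_pid = 0;        /* the waiting process no longer exists */
    return 1;
}
```

A single failing descriptor is a reasonable trigger for discarding the whole
set of hibernation files and starting with an empty cache.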

> > How do you protect against the cached buffers getting out-of-sync with
> > the actual disk files (especially during recovery scenarios)?  What
> 
> Saving the DB buffer cache is called at shutdown after bgwriter's
> final checkpoint has finished, so I believe no dirty buffers should
> exist.
> For recovery scenarios, I need to research it though...
> Could you describe what needs to be considered?

I think hibernation should be allowed only when the system was shut down
normally, which is checked via the pg_control state.
And once an abnormal shutdown is detected, the hibernation files
should be ignored.
The latest patch includes this.
# modifications to xlog.c:ReadControlFile() were required though...
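
For illustration only: the startup decision described above (clean shutdown
state, hibernation file not older than pg_control) can be sketched as below.
DbState is a hypothetical stand-in for the state stored in pg_control, and
hibernation_usable() is a name invented for this example.

```c
#include <sys/stat.h>

/* Hypothetical stand-in for the database state recorded in pg_control. */
typedef enum {
    STATE_SHUTDOWNED,
    STATE_IN_PRODUCTION,
    STATE_IN_CRASH_RECOVERY
} DbState;

/* Returns 1 if the hibernation file may be used at startup. */
int hibernation_usable(DbState state, const char *control_path,
                       const char *hibernation_path)
{
    struct stat ctl, hib;

    if (state != STATE_SHUTDOWNED)
        return 0;                   /* abnormal shutdown: ignore the files */
    if (stat(control_path, &ctl) != 0 || stat(hibernation_path, &hib) != 0)
        return 0;                   /* a missing file means nothing to resume */
    /* st_mtime has one-second resolution, so equal timestamps are accepted */
    return hib.st_mtime >= ctl.st_mtime;
}
```

Accepting equal timestamps is a choice made for this sketch; a stricter
comparison would reject files written in the same second as pg_control.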

> > about crash-induced corruption in the cache file itself (consider the
> > not-unlikely possibility that init will kill the database before it's
> > had time to dump all the buffers during a system shutdown)?  Do you have
> 
> I think this is an important point.  I'll implement a validation function for
> the hibernation files.

The added validations seem sufficient to me.
# though my understanding of postgres is not enough ;)
If any other considerations are required, please point them out.

Thanks



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-06 Thread Mitsuru IWASAKI
Hi, thanks for your comments!
I'm glad to discuss this topic.

>  * pgfadv_WILLNEED
>  * pgfadv_WILLNEED_snapshot
> 
> The former ask to load each segment of a relation *but* the kernel can
> decide to not do that or load only part of each segment. (so it is not
> as brutal as cat file > /dev/null )
> The later read *exactly* each blocks required in each segment, not all
> blocks except if all were in cache while doing the snapshot. (this one
> is the part of the snapshot/restore combo)

Sorry about that, I'm not so familiar with posix_fadvise().
I'll check posix_fadvise() later.
Actually I used to run a 'cat database_file > /dev/null' script on
another DBMS before starting it.
# or 'select /*+ INDEX(emp emp_pk) */ count(*) from emp;' to load
# index blocks

> I may prefer the per relation approach (so you can snapshot and
> restore only the interesting tables/index). Given what I read in your
> patch it looks easy to do, isn't it ?

I would like to keep my patch as simple as possible, because
it is just a hibernation function, not complicated buffer management.
But I want to try improving buffer management on my next vacation.
# currently I'm on an 11-day vacation until Sunday.

My rough idea for improving buffer management is like this:
SQL> alter table table_name buffer pin priority 7;
SQL> alter index index_name buffer pin priority 10;

This DDL sets a 'buffer pin priority' property on the table/index and
also on the buffer descriptors related to the table/index.
Optionally, preloading database files into the FS cache and relation
blocks into the DB cache would be possible.

When a new buffer is required, the buffer manager refers to the priority
of each buffer and selects a victim buffer.

I think it helps batch jobs run in a better buffer cache condition
by giving hints to the buffer manager.
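
For illustration only: the priority-based victim selection described above
could be sketched as below.  The PrioBuf structure, the tie-break on
usage_count and the name select_victim() are assumptions made for this
example; nothing like this exists in the patch.

```c
#include <stddef.h>

typedef struct {
    int priority;      /* hypothetical per-buffer hint: higher = keep longer */
    int usage_count;   /* how recently or often the buffer was used */
    int refcount;      /* > 0 means pinned, so it cannot be evicted */
} PrioBuf;

/*
 * Pick the unpinned buffer with the lowest priority, breaking ties by
 * the lowest usage count.  Returns its index, or -1 if every buffer
 * is pinned.
 */
int select_victim(const PrioBuf *bufs, size_t n)
{
    int victim = -1;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].refcount > 0)
            continue;
        if (victim < 0 ||
            bufs[i].priority < bufs[victim].priority ||
            (bufs[i].priority == bufs[victim].priority &&
             bufs[i].usage_count < bufs[victim].usage_count))
            victim = (int) i;
    }
    return victim;
}
```

A linear scan is used only to keep the sketch short; a real buffer manager
would need something cheaper than visiting every buffer on each allocation.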
For example, job-A reads table_A, index_A and writes only table_B;
SQL> alter table table_A buffer pin priority 7;
SQL> alter index index_A buffer pin priority 10;
SQL> alter table table_B buffer pin priority 1;
This keeps the buffers of index_A and table_A (those of table_B will
become victims soon).

Buffer pin priority can be reset like this;
SQL> alter system buffer pin priority 5;

Next job-B reads and writes table_C, reads index_C with preloading;
SQL> alter table table_C buffer pin priority 5;
SQL> alter index index_C buffer pin priority 10 with preloading 50%;
something like this.

> I also prefer the idea to keep a map of the Buffer Cache (yes, like
> what I do with pgfincore) than storing the data directly and reading
> it directly. This later part semmes a bit dangerous to me, even if it
> looks sane from a normal postgresql stop/start process.

Never mind :)
I added enough validations and will add more.

> better than me, and anyway your patch remain very easy to read in all case.

Thanks a lot!  My policy for experimental implementations is to keep
them easy to read so that people understand my idea quickly.
That's why my first patch doesn't have enough error checking ;)

Thanks





Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-07 Thread Mitsuru IWASAKI
Hi, folks!

> I'll do more testing tomorrow, and hopefully finalize my patch.

Done!  The patch is available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110508.patch

I hope this would be committable and the final version.
Major changes from the experimental implementation are the following.

- add many validations against hibernation file corruption, etc.
- restore buffer blocks based on buffer descriptors, not from the saved file.
- support restoring cache state even if shared_buffers has changed.

My vacation ends today and I have to go back to work tomorrow,
but I will try to find spare time for this.

Thanks a lot for happy hacking days with you!



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-08 Thread Mitsuru IWASAKI
Hi,
Sorry, I missed these messages because I wasn't subscribed to this list.
# I've just subscribed temporarily

> > I think that all the complexity with CRCs etc. is unlikely to lead anywhere
> > too, and those two issues are not completely unrelated.  The simplest,
> > safest thing here is the right way to approach this, not the most
> > complicated one, and a simpler format might add some flexibility here to
> > reload more cache state too.  The bottleneck on reloading the cache state is
> > reading everything from disk.  Trying to micro-optimize any other part of
> > that is moving in the wrong direction to me.  I doubt you'll ever measure a
> > useful benefit that overcomes the expense of maintaining the code.  And you
> > seem to be moving to where someone can't restore cache state when they
> > change shared_buffers.  A simpler implementation might still work in that
> > situation; reload until you run out of buffers if shared_buffers shrinks,
> > reload until you're done with the original size.
> 
> Yeah, I'm pretty well convinced this whole approach is a dead end.
> Priming the OS buffer cache seems way more useful.  I also think
> saving the blocks to be read rather than the actual blocks makes a lot
> more sense.

OK, there are two suggestions of yours here, IIUC.
# if not, please correct me.
1. restore buffer blocks based on buffer descriptors, not from the saved file.
2. support restoring cache state even if shared_buffers has changed.

For 1, I've just finished my work.  The latest patch is available at:
http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch

On my box, shared_buffers can only be set up to 200MB.
The elapsed time for startup is almost the same, about 3 sec (without
hibernation it takes about 1 sec).
For shutdown, writing buffer blocks takes about 10 sec, otherwise
about 1 sec.

Well, it seems you were right :)
By restoring buffer blocks based on buffer descriptors, the OS buffer
cache will be filled too.  I believe this can help buffer update
performance.

I think saving buffer blocks is still useful for debugging or portability,
so I would like to keep the support code in my patch.
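
For illustration only: when blocks are restored from saved descriptors rather
than from a raw image, sorting the saved tags first keeps the reads on each
file in ascending block order.  The sorting step, the simplified SavedTag
structure and the function names are assumptions made for this example; the
patch may simply walk the descriptors in buffer order.  Real relations are
also split into segment files, which this sketch ignores.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 8192             /* assumed; PostgreSQL's default BLCKSZ */

typedef struct {
    uint32_t rel_file;              /* which relation file (simplified tag) */
    uint32_t block_num;             /* block number within that file */
} SavedTag;

/* Order by file first, then by block number within the file. */
static int tag_cmp(const void *a, const void *b)
{
    const SavedTag *x = a;
    const SavedTag *y = b;

    if (x->rel_file != y->rel_file)
        return x->rel_file < y->rel_file ? -1 : 1;
    if (x->block_num != y->block_num)
        return x->block_num < y->block_num ? -1 : 1;
    return 0;
}

/* Sort the saved tags so that each file is read front to back. */
void order_tags_for_restore(SavedTag *tags, size_t n)
{
    qsort(tags, n, sizeof(SavedTag), tag_cmp);
}

/* Byte offset of a block inside its file. */
int64_t block_offset(const SavedTag *t)
{
    return (int64_t) t->block_num * BLOCK_SIZE;
}
```

Each block read this way passes through the file system, which is why the OS
buffer cache is warmed as a side effect.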


For 2, I'm not sure how to implement this.
The problem is that freelist.c:StrategyControl is also restored at
startup, but I currently have no idea how to adjust StrategyControl
when shared_buffers has changed.
StrategyControl holds important data on buffer allocation, so it should
match shared_buffers, I believe.

Changing shared_buffers is not so frequent in production environments.
The current implementation is like this:
If shared_buffers has changed, restoring is aborted only that time;
saving is executed with the new shared_buffers at shutdown, and restoring
is executed at the next startup.

I have one more day to work on this, but I may give up...

Thanks



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-14 Thread Mitsuru IWASAKI
Hi,

> We can't accept patches just based on a pointer to a web site.  Please 
> e-mail this to the mailing list so that it can be considered a 
> submission under the project's licensing terms.
> 
> > I hope this would be committable and the final version.
> >   
> 
> PostgreSQL has high standards for code submissions.  Extremely few 
> submissions are committed without significant revisions to them based on 
> code review.  So far you've gotten a first round of high-level design 
> review, there's several additional steps before something is considered 
> for a commit.  The whole process is outlined at 
> http://wiki.postgresql.org/wiki/Submitting_a_Patch

OK, I will do so for my next patch.

>  From a couple of minutes of reading the patch, the first things that 
> pop out as problems are:
> 
> -All of the ControlFile -> controlFile renaming has add a larger 
> difference to ReadControlFile than I would consider ideal.

I think so too; I will consider this again.

> -Touching StrategyControl is not something this patch should be doing.

Sorry, I could not get this.  Could you explain?
I think StrategyControl needs to be adjusted if the shared_buffers setting
was changed.

> -I don't think your justification ("debugging or portability") for 
> keeping around your original code in here is going to be sufficient to 
> do so.
> -This should not be named enable_buffer_cache_hibernation.  That very 
> large diff you ended up with in the regression tests is because all of 
> the settings named enable_* are optimizer control settings.  Using the 
> name "buffer_cache_hibernation" instead would make a better starting point.

OK, how about `buffer_cache_hibernation_level'?
The value is 0 to disable (default), 1 for saving buffer descriptors only,
and 2 for saving buffer descriptors and buffer blocks.

>  From a bigger picture perspective, this really hasn't addressed any of 
> my comments about shared_buffers only being the beginning of the useful 
> cache state to worry about here.  I'd at least like the solution to the 
> buffer cache save/restore to have a plan for how it might address that 
> too one day.  This project is also picky about only committing code that 
> fits into the long-term picture for desired features.

My simple motivation for this is: `We don't want to restart our DB
server because the DB buffer cache will be lost and the DB server
will need to start its operations with zero cache.  Does any DBMS product
support keeping the contents of the DB cache as they are across a restart,
just like the hibernation feature of a PC?'.
It's very simple, and I think many DB admins will be happy with this
feature.

Thanks



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-14 Thread Mitsuru IWASAKI
Hi,

> I'd suggest doing this as an extension module. All the changes to 
> existing server code seem superficial.

That sounds interesting.  I'll try it later.
Are there any good examples of extension modules?

Thanks



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-06-05 Thread Mitsuru IWASAKI
Hi,

> On 05/07/2011 03:32 AM, Mitsuru IWASAKI wrote:
> > For 1, I've just finished my work.  The latest patch is available at:
> > http://people.freebsd.org/~iwasaki/postgres/buffer-cache-hibernation-postgresql-20110507.patch
> >
> 
> Reminder here--we can't accept code based on it being published to a web 
> page.  You'll need to e-mail it to the pgsql-hackers mailing list to be 
> considered for the next PostgreSQL CommitFest, which is starting in a 
> few weeks.  Code submitted to the mailing list is considered a release 
> of it to the project under the PostgreSQL license, which we can't just 
> assume for things when given only a URL to them.

Sorry about that, but I had enough time to revise my patches this weekend.
I have attached the patches to this mail, and will update the CommitFest page soon.

> Also, you suggested you were out of time to work on this.  If that's the 
> case, we'd like to know that so we don't keep cc'ing you about things in 
> expectation of an answer.  Someone else may pick this up as a project to 
> continue working on.  But it's going to need a fair amount of revision 
> before it matches what people want here, and I'm not sure how much of 
> what you've written is going to end up in any commit that may happen 
> from this idea.

It seems that I don't have enough time to complete this work.
You don't need to keep cc'ing me, and I would be very happy if postgres
became the first DBMS to support a buffer cache hibernation feature.

Thanks!


diff --git src/backend/access/transam/xlog.c src/backend/access/transam/xlog.c
index b0e4c41..7a3a207 100644
--- src/backend/access/transam/xlog.c
+++ src/backend/access/transam/xlog.c
@@ -4834,6 +4834,19 @@ ReadControlFile(void)
 #endif
 }
 
+bool
+GetControlFile(ControlFileData *controlFile)
+{
+   if (ControlFile == NULL)
+   {
+   return false;
+   }
+
+   memcpy(controlFile, ControlFile, sizeof(ControlFileData));
+
+   return true;
+}
+
 void
 UpdateControlFile(void)
 {
diff --git src/backend/bootstrap/bootstrap.c src/backend/bootstrap/bootstrap.c
index fc093cc..7ecf6bb 100644
--- src/backend/bootstrap/bootstrap.c
+++ src/backend/bootstrap/bootstrap.c
@@ -360,6 +360,15 @@ AuxiliaryProcessMain(int argc, char *argv[])
BaseInit();
 
/*
+* Only StartupProcess can call ResumeBufferCacheHibernation() after
+* InitFileAccess() and smgrinit().
+*/
+   if (auxType == StartupProcess && BufferCacheHibernationLevel > 0)
+   {
+   ResumeBufferCacheHibernation();
+   }
+
+   /*
 * When we are an auxiliary process, we aren't going to do the full
 * InitPostgres pushups, but there are a couple of things that need to get
 * lit up even in an auxiliary process.
diff --git src/backend/storage/buffer/buf_init.c src/backend/storage/buffer/buf_init.c
index dadb49d..52eb51a 100644
--- src/backend/storage/buffer/buf_init.c
+++ src/backend/storage/buffer/buf_init.c
@@ -127,6 +127,14 @@ InitBufferPool(void)
 
/* Init other shared buffer-management stuff */
StrategyInitialize(!foundDescs);
+
+   if (BufferCacheHibernationLevel > 0)
+   {
+       ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_DESCRIPTORS,
+           (char *)BufferDescriptors, sizeof(BufferDesc), NBuffers);
+       ResisterBufferCacheHibernation(BUFFER_CACHE_HIBERNATION_TYPE_BLOCKS,
+           (char *)BufferBlocks, BLCKSZ, NBuffers);
+   }
 }
 
 /*
diff --git src/backend/storage/buffer/bufmgr.c src/backend/storage/buffer/bufmgr.c
index f96685d..dba8ebf 100644
--- src/backend/storage/buffer/bufmgr.c
+++ src/backend/storage/buffer/bufmgr.c
@@ -31,6 +31,7 @@
 #include "postgres.h"
 
 #include 
+#include 
 #include 
 
 #include "catalog/catalog.h"
@@ -61,6 +62,13 @@
 #define BUF_WRITTEN0x01
 #define BUF_REUSABLE   0x02
 
+/*
+ * Buffer Cache Hibernation stuff.
+ */
+/* enable this to debug buffer cache hibernation. */
+#if 0
+#define DEBUG_BUFFER_CACHE_HIBERNATION
+#endif
 
 /* GUC variables */
 bool   zero_damaged_pages = false;
@@ -765,6 +773,16 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
}
}
 
+#ifdef DEBUG_BUFFER_CACHE_HIBERNATION
+   elog(DEBUG5,
+       "alloc  [%d]\t%03x,%d,%d,%d,%d\t%08x,%d,%d,%d,%d,%d",
+       buf->buf_id, buf->flags, buf->usage_count, buf->refcount,
+   buf->wait_backend_pid, buf->freeNext,
+   newHash, newTag.rnode.spcNode,
+   newTag.rnod