On 29/03/16 18:25, Alvaro Herrera wrote:
>+ /*-------------------------------------------------------------------------
>+ * API for construction of generic xlog records
>+ *
>+ * This API allows user to construct generic xlog records which describe
>+ * difference between pages in a generic way. This is useful for
>+ * extensions which provide custom access methods because they can't
>+ * register their own WAL redo routines.
>+ *
>+ * Each record must be constructed by following these steps:
>+ * 1) GenericXLogStart(relation) - start construction of a generic xlog
>+ *    record for the given relation.
>+ * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
>+ *    for the record. This function returns a copy of the page
>+ *    image where modifications can be performed. The second argument
>+ *    indicates if the block is new (i.e. a full page image should be taken).
>+ * 3) Apply modification of page images obtained in the previous step.
>+ * 4) GenericXLogFinish() - finish construction of generic xlog record.
>+ *
>+ * The xlog record construction can be canceled at any step by calling
>+ * GenericXLogAbort(). All changes made to page images copies will be
>+ * discarded.
>+ *
>+ * Please, note the following points when constructing generic xlog records.
>+ * - No direct modifications of page images are allowed! All modifications
>+ *   must be done in the copies returned by GenericXLogRegister(). In other
>+ *   words the code which makes generic xlog records must never call
>+ *   BufferGetPage().
>+ * - Registrations of buffers (step 2) and modifications of page images
>+ *   (step 3) can be mixed in any sequence. The only restriction is that
>+ *   you can only modify page image after registration of corresponding
>+ *   buffer.
>+ * - After registration, the buffer also can be unregistered by calling
>+ *   GenericXLogUnregister(buffer). In this case the changes made in
>+ *   that particular page image copy will be discarded.
>+ * - Generic xlog assumes that pages are using standard layout, i.e., all
>+ *   data between pd_lower and pd_upper will be discarded.
>+ * - Maximum number of buffers simultaneously registered for a generic xlog
>+ *   record is MAX_GENERIC_XLOG_PAGES. An error will be thrown if this limit
>+ *   is exceeded.
>+ * - Since you modify copies of page images, GenericXLogStart() doesn't
>+ *   start a critical section. Thus, you can do memory allocation, error
>+ *   throwing etc between GenericXLogStart() and GenericXLogFinish().
>+ *   The actual critical section is present inside GenericXLogFinish().
>+ * - GenericXLogFinish() takes care of marking buffers dirty and setting their
>+ *   LSNs. You don't need to do this explicitly.
>+ * - For unlogged relations, everything works the same except there is no
>+ *   WAL record produced. Thus, you typically don't need to do any explicit
>+ *   checks for unlogged relations.
>+ * - If registered buffer isn't new, generic xlog record contains delta
>+ *   between old and new page images. This delta is produced by per byte
>+ *   comparison. This current delta mechanism is not effective for data shifts
>+ *   inside the page and may be improved in the future.
>+ * - Generic xlog redo function will acquire exclusive locks on buffers
>+ *   in the same order they were registered. After redo of all changes,
>+ *   the locks will be released in the same order.
>+ *
>+ *
>+ * Internally, delta between pages consists of set of fragments. Each
>+ * fragment represents changes made in given region of page. A fragment is
>+ * described as follows:
>+ *
>+ * - offset of page region (OffsetNumber)
>+ * - length of page region (OffsetNumber)
>+ * - data - the data to place into described region ('length' number of bytes)
>+ *
>+ * Unchanged regions of page are not represented in the delta. As a result,
>+ * the delta can be more compact than full page image. But if the unchanged region
>+ * of the page is less than fragment header (offset and length) the delta
>+ * would be bigger than the full page image. For this reason we break into fragments
>+ * only if the unchanged region is bigger than MATCH_THRESHOLD.
>+ *
>+ * The worst case for delta size is when we didn't find any unchanged region
>+ * in the page. Then size of delta would be size of page plus size of fragment
>+ * header.
>+ */
>+ #define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber))
>+ #define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE
>+ #define MAX_DELTA_SIZE BLCKSZ + FRAGMENT_HEADER_SIZE
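
To make the fragment scheme in the quoted comment concrete, here is a minimal standalone sketch of how such a delta could be built and replayed. It is not the code from the patch: every `SKETCH_*` / `sketch_*` name, the 64-byte page size and the use of `uint16_t` in place of `OffsetNumber` are assumptions made so that it compiles without PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real code uses BLCKSZ and OffsetNumber. */
#define SKETCH_PAGE_SIZE 64
typedef uint16_t SketchOffset;
#define SKETCH_HEADER_SIZE (2 * sizeof(SketchOffset))
#define SKETCH_MATCH_THRESHOLD SKETCH_HEADER_SIZE
/* Parenthesized so the macro stays safe inside larger expressions. */
#define SKETCH_MAX_DELTA (SKETCH_PAGE_SIZE + SKETCH_HEADER_SIZE)

/* Append one fragment (offset, length, data); returns the new delta size. */
static size_t
sketch_put_fragment(uint8_t *delta, size_t used, const uint8_t *newpage,
                    SketchOffset offset, SketchOffset length)
{
    memcpy(delta + used, &offset, sizeof(offset));
    used += sizeof(offset);
    memcpy(delta + used, &length, sizeof(length));
    used += sizeof(length);
    memcpy(delta + used, newpage + offset, length);
    return used + length;
}

/*
 * Per-byte comparison of two page images. A fragment is closed only when it
 * is followed by more than SKETCH_MATCH_THRESHOLD unchanged bytes; shorter
 * unchanged runs are absorbed into the fragment, because starting a new
 * fragment would cost a header larger than the bytes saved.
 * 'delta' must have room for SKETCH_MAX_DELTA bytes.
 */
static size_t
sketch_write_delta(const uint8_t *oldpage, const uint8_t *newpage,
                   uint8_t *delta)
{
    size_t used = 0;
    size_t i = 0;

    while (i < SKETCH_PAGE_SIZE)
    {
        size_t start, end, j;

        if (oldpage[i] == newpage[i])
        {
            i++;
            continue;
        }

        start = i;
        end = i + 1;            /* one past the last differing byte */
        for (j = end; j < SKETCH_PAGE_SIZE; j++)
        {
            if (oldpage[j] != newpage[j])
                end = j + 1;
            else if (j + 1 - end > SKETCH_MATCH_THRESHOLD)
                break;          /* unchanged run is long enough: split */
        }
        used = sketch_put_fragment(delta, used, newpage,
                                   (SketchOffset) start,
                                   (SketchOffset) (end - start));
        i = j;
    }
    return used;
}

/* Redo side: copy each fragment's data over the old page image. */
static void
sketch_apply_delta(uint8_t *page, const uint8_t *delta, size_t size)
{
    size_t pos = 0;

    while (pos < size)
    {
        SketchOffset offset, length;

        memcpy(&offset, delta + pos, sizeof(offset));
        pos += sizeof(offset);
        memcpy(&length, delta + pos, sizeof(length));
        pos += sizeof(length);
        assert((size_t) offset + length <= SKETCH_PAGE_SIZE);
        memcpy(page + offset, delta + pos, length);
        pos += length;
    }
}
```

In this sketch, two changed bytes separated by a short unchanged gap end up in one fragment, while a change far away gets a fragment of its own. When every byte differs, the result is a single fragment covering the whole page, which is the worst case the comment describes (page size plus one fragment header).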
I incorporated your changes and made some additional refinements on top of them.
Attached is a delta against v12, which should cause fewer issues for Teodor when merging.
--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 7ca03bf..eab40a2 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -19,78 +19,77 @@
 #include "utils/memutils.h"
 
 /*-------------------------------------------------------------------------
  * API for construction of generic xlog records
  *
- * This API allows user to construct generic xlog records which are
- * describing difference between pages in general way. Thus it's useful
- * for extension which provides custom access methods because they couldn't
- * register their own WAL redo routines.
+ * This API allows user to construct generic xlog records which describe
+ * difference between pages in a generic way. This is useful for extensions
+ * which provide custom access methods because they can't register their own
+ * WAL redo routines.
  *
- * Generic xlog record should be constructed in following steps.
- * 1) GenericXLogStart(relation) - start construction of generic xlog
- *    record for given relation.
+ * Each record must be constructed by following these steps:
+ * 1) GenericXLogStart(relation) - start construction of a generic xlog
+ *    record for the given relation.
  * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
- *    for generic xlog record. This function return a copy of page image
- *    where modifications should be performed. The second argument
- *    indicates that block is new and full image should be taken.
- * 3) Do modification of page images obtained in previous step.
+ *    for generic xlog record. This function returns a copy of the page image
+ *    where modifications can be performed. The second argument indicates
+ *    if block is new (i.e. a full page image should be taken).
+ * 3) Apply modification of page images obtained in the previous step.
  * 4) GenericXLogFinish() - finish construction of generic xlog record.
  *
- * Please, note following points while constructing generic xlog records.
+ * The xlog record construction can be canceled at any step by calling
+ * GenericXLogAbort(). All changes made to page images copies will be
+ * discarded.
+ *
+ * Please note following points when constructing generic xlog records.
  * - No direct modifications of page images are allowed! All modifications
- *   should be done in copies returned by GenericXLogRegister(). Literally
- *   code which makes generic xlog records should never call
- *   BufferGetPage() function.
- * - On any step generic xlog record construction could be canceled by
- *   calling GenericXLogAbort(). All changes made in page images copies
- *   would be discarded.
+ *   must be done in copies returned by GenericXLogRegister(). In other words
+ *   code which makes generic xlog records must never call BufferGetPage().
  * - Registrations of buffers (step 2) and modifications of page images
- *   (step 3) could be mixed in any sequence. The only restriction is that
- *   you can modify page image only after registration of corresponding
+ *   (step 3) can be mixed in any sequence. The only restriction is that
+ *   you can only modify page image after registration of corresponding
  *   buffer.
- * - After registration buffer also can be unregistered by calling
- *   GenericXLogUnregister(buffer). In this case changes made in particular
- *   page image copy will be discarded.
+ * - After registration, the buffer can also be unregistered by calling
+ *   GenericXLogUnregister(buffer). In this case the changes made in
+ *   that particular page image copy will be discarded.
  * - Generic xlog assumes that pages are using standard layout. I.e. all
  *   information between pd_lower and pd_upper will be discarded.
- * - Maximum number of buffers simultaneously registered for generic xlog
- *   is MAX_GENERIC_XLOG_PAGES. Error would be thrown if this limit
+ * - Maximum number of buffers simultaneously registered for a generic xlog
+ *   is MAX_GENERIC_XLOG_PAGES. Error will be thrown if this limit is
  *   exceeded.
  * - Since you modify copies of page images, GenericXLogStart() doesn't
  *   start a critical section. Thus, you can do memory allocation, error
  *   throwing etc between GenericXLogStart() and GenericXLogFinish().
- *   Actual critical section present inside GenericXLogFinish().
- * - GenericXLogFinish() takes care about marking buffers dirty and setting
+ *   The actual critical section is present inside GenericXLogFinish().
+ * - GenericXLogFinish() takes care of marking buffers dirty and setting
  *   their LSNs. You don't need to do this explicitly.
- * - For unlogged relations, everything work the same expect there is no
+ * - For unlogged relations, everything works the same except there is no
  *   WAL record produced. Thus, you typically don't need to do any explicit
  *   checks for unlogged relations.
  * - If registered buffer isn't new, generic xlog record contains delta
- *   between old and new page images. This delta is produced by per byte
- *   comparison. Current delta mechanist is not effective for data shift
- *   inside the page. However, it could be improved in further versions.
+ *   between old and new page images. This delta is produced using per byte
+ *   comparison. The current delta mechanism is not effective for data shifts
+ *   inside the page and may be improved in the future.
  * - Generic xlog redo function will acquire exclusive locks to buffers
- *   in the same order they were registered. After redo of all changes
- *   locks would be released in the same order. That could makes sense for
- *   concurrency.
+ *   in the same order as they were registered. After redo of all changes,
+ *   locks will be released in the same order.
  *
- * Internally delta between pages consists of set of fragments. Each fragment
- * represents changes made in given region of page. Fragment is described
- * as following.
+ * Internally, delta between pages consists of set of fragments. Each
+ * fragment represents changes made in a given region of a page. A fragment
+ * is described as following.
  *
  * - offset of page region (OffsetNumber)
  * - length of page region (OffsetNumber)
  * - data - the data to place into described region ('length' number of bytes)
  *
- * Unchanged regions of page are uncovered by these fragments. This is why
- * delta could be more compact than full page image. But if unchanged region
- * of page is less than fragment header (offset and length) then it would
- * increase size of delta instead of decreasing. Thus, we break fragment only
- * for unchanged regions greater than MATCH_THRESHOLD.
+ * Unchanged regions of page are not represented in the delta. As a result
+ * delta can be more compact than the full page image. But if the unchanged
+ * region of the page is smaller than the fragment header (offset and length)
+ * the delta would be bigger than the full page image. For this reason we
+ * break fragment only if the unchanged region is bigger than MATCH_THRESHOLD.
  *
  * The worst case for delta size is when we didn't find any unchanged region
- * in the page. Then size of delta would be size of page plus size of fragment
- * header.
+ * in the page. The size of delta will be size of page plus size of fragment
+ * header in that case.
  */
 #define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber))
 #define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE
@@ -168,8 +167,8 @@ writeDelta(PageData *pageData)
 	bool		match;
 
 	/*
-	 * Check if bytes in old and new page images matches. We don't rely
-	 * data in unallocated area between pd_lower and pd_upper. Thus we
+	 * Check if bytes in old and new page images matches. We don't care
+	 * about data in unallocated area between pd_lower and pd_upper. We
 	 * assume unallocated area to expand with unmatched bytes. Bytes
 	 * inside unallocated area are assumed to always match.
 	 */
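
The last hunk's comment about the unallocated area can be illustrated with a small standalone sketch. This is only one simplified reading of that comment, not the body of writeDelta(): it assumes a single hole `[lower, upper)` shared by both images and does not model the case where the hole boundaries differ between the old and new page. All `hole_sketch_*` names and the 32-byte page size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical page size for the sketch; PostgreSQL uses BLCKSZ. */
#define HOLE_SKETCH_PAGE_SIZE 32

/*
 * Bytes inside the unallocated area [lower, upper) are treated as always
 * matching, so garbage in the hole never produces delta fragments.
 * Everywhere else the two images are compared byte by byte.
 */
static bool
hole_sketch_byte_matches(const uint8_t *oldpage, const uint8_t *newpage,
                         size_t i, size_t lower, size_t upper)
{
    if (i >= lower && i < upper)
        return true;
    return oldpage[i] == newpage[i];
}

/* Count the bytes a delta would have to carry under the rule above. */
static size_t
hole_sketch_count_unmatched(const uint8_t *oldpage, const uint8_t *newpage,
                            size_t lower, size_t upper)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < HOLE_SKETCH_PAGE_SIZE; i++)
    {
        if (!hole_sketch_byte_matches(oldpage, newpage, i, lower, upper))
            count++;
    }
    return count;
}
```

With a hole covering bytes 8 to 23, a difference at byte 10 is ignored, while differences before `lower` or at or after `upper` are counted as usual.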
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers