On 04/12/2015 02:56 AM, Petr Jelinek wrote:
On 10/04/15 18:03, Andres Freund wrote:
On 2015-04-07 17:08:16 +0200, Andres Freund wrote:
I'm starting benchmarks now.

What I'm benchmarking here is the WAL overhead, since that's what we're
debating.

The test setup I used was a pgbench scale 10 instance. I've run with
full_page_write=off to have more reproducible results. This of course
over-emphasizes the overhead a bit. But for a long checkpoint interval
and a memory resident working set it's not that unrealistic.

I ran 50k transactions in a signle b
baseline:
- 20445024
- 20437128
- 20436864
- avg: 20439672
extern 2byte identifiers:
- 23318368
- 23148648
- 23128016
- avg: 23198344
- avg overhead: 13.5%
padding 2byte identifiers:
- 21160408
- 21319720
- 21164280
- avg: 21214802
- avg overhead: 3.8%
extern 4byte identifiers:
- 23514216
- 23540128
- 23523080
- avg: 23525808
- avg overhead: 15.1%

To me that shows pretty clearly that a) reusing the padding is
worthwhile b) even without that using 2byte instead of 4 byte
identifiers is beneficial.


My opinion is that 10% of WAL size difference is quite high price to pay
so that we can keep the padding for some other, yet unknown feature that
hasn't come up in several years, which would need those 2 bytes.

But if we are willing to pay it then we can really go all the way and
just use Oids...

This needs to be weighed against removing the padding bytes altogether. See attached. That would reduce the WAL size further when you don't need replication IDs. It's very straightforward, but need to do some performance/scalability testing to make sure that using memcpy instead of a straight 32-bit assignment doesn't hurt performance, since it happens in very performance critical paths.

I'm surprised there's such a big difference between the "extern" and "padding" versions above. At a quick approximation, storing the ID as a separate "fragment", along with XLogRecordDataHeaderShort and XLogRecordDataHeaderLong, should add one byte of overhead plus the ID itself. So that would be 3 extra bytes for 2-byte identifiers, or 5 bytes for 4-byte identifiers. Does that mean that the average record length is only about 30 bytes? That's what it seems like, if adding the "extern 2 byte identifiers" added about 10% of overhead compared to the "padding 2 byte identifiers" version. That doesn't sound right, 30 bytes is very little. Perhaps the size of the records created by pgbench happen to cross a 8-byte alignment boundary at that point, making a big difference. In another workload, there might be no difference at all, due to alignment.

Also, you don't need to tag every record type with the replication ID. All indexam records can skip it, for starters, since logical decoding doesn't care about them. That should remove a fair amount of bloat.

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24cf520..09934f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -963,10 +963,10 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
 		 * Now that xl_prev has been filled in, calculate CRC of the record
 		 * header.
 		 */
-		rdata_crc = rechdr->xl_crc;
+		memcpy(&rdata_crc, rechdr->xl_crc, sizeof(pg_crc32));
 		COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
 		FIN_CRC32C(rdata_crc);
-		rechdr->xl_crc = rdata_crc;
+		memcpy(rechdr->xl_crc, &rdata_crc, sizeof(pg_crc32));
 
 		/*
 		 * All the record data, including the header, is now ready to be
@@ -4685,7 +4685,7 @@ BootStrapXLOG(void)
 	COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
 	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
 	FIN_CRC32C(crc);
-	record->xl_crc = crc;
+	memcpy(record->xl_crc, &crc, sizeof(pg_crc32));
 
 	/* Create first XLOG segment file */
 	use_existent = false;
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 88209c3..fbe97b1 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -724,7 +724,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	rechdr->xl_info = info;
 	rechdr->xl_rmid = rmid;
 	rechdr->xl_prev = InvalidXLogRecPtr;
-	rechdr->xl_crc = rdata_crc;
+	memcpy(rechdr->xl_crc, &rdata_crc, sizeof(pg_crc32));
 
 	return &hdr_rdt;
 }
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index a4124d9..80a48ba 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -666,6 +666,9 @@ static bool
 ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
 {
 	pg_crc32	crc;
+	pg_crc32	rec_crc;
+
+	memcpy(&rec_crc, record->xl_crc, sizeof(pg_crc32));
 
 	/* Calculate the CRC */
 	INIT_CRC32C(crc);
@@ -674,7 +677,7 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
 	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
 	FIN_CRC32C(crc);
 
-	if (!EQ_CRC32C(record->xl_crc, crc))
+	if (!EQ_CRC32C(rec_crc, crc))
 	{
 		report_invalid_record(state,
 			   "incorrect resource manager data checksum in record at %X/%X",
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
index 3361111..6af27f2 100644
--- a/src/bin/pg_resetxlog/pg_resetxlog.c
+++ b/src/bin/pg_resetxlog/pg_resetxlog.c
@@ -1101,7 +1101,7 @@ WriteEmptyXLOG(void)
 	COMP_CRC32C(crc, ((char *) record) + SizeOfXLogRecord, record->xl_tot_len - SizeOfXLogRecord);
 	COMP_CRC32C(crc, (char *) record, offsetof(XLogRecord, xl_crc));
 	FIN_CRC32C(crc);
-	record->xl_crc = crc;
+	memcpy(record->xl_crc, &crc, sizeof(pg_crc32));
 
 	/* Write the first page */
 	XLogFilePath(path, ControlFile.checkPointCopy.ThisTimeLineID, newXlogSegNo);
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index 09bbcb1..e26f354 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -45,14 +45,14 @@ typedef struct XLogRecord
 	XLogRecPtr	xl_prev;		/* ptr to previous record in log */
 	uint8		xl_info;		/* flag bits, see below */
 	RmgrId		xl_rmid;		/* resource manager for this record */
-	/* 2 bytes of padding here, initialize to zero */
-	pg_crc32	xl_crc;			/* CRC for this record */
+	uint8		xl_crc[4];		/* CRC for this record. (as a byte array rather
+								 * than pg_crc32 to avoid padding) */
 
 	/* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */
 
 } XLogRecord;
 
-#define SizeOfXLogRecord	(offsetof(XLogRecord, xl_crc) + sizeof(pg_crc32))
+#define SizeOfXLogRecord	(offsetof(XLogRecord, xl_crc) + 4 * sizeof(uint8))
 
 /*
  * The high 4 bits in xl_info may be used freely by rmgr. The
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to