On Mon, Nov 02, 2020 at 04:44:22PM +0300, Anastasia Lubennikova wrote:
On 30.10.2020 03:42, Tomas Vondra wrote:
Hi,
I might be somewhat late to the party, but I've done a bit of
benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
current master and with the patch.
So I don't really observe any measurable slowdowns in the COPY part (in
fact there seems to be a tiny speedup, but it might be just noise). In
the VACUUM part, there's clear speedup when the data was already frozen
by COPY (Yes, those are zeroes, because it took less than 1 second.)
So that looks pretty awesome, I guess.
For the record, these tests were run on a server with NVMe SSD, so
hopefully reliable and representative numbers.
Thank you for the review.
A couple minor comments about the code:
2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
of personal preference, but I'd just use a single level of nesting, i.e.
something like this:
/* if everything frozen, the whole page has to be visible */
Assert(!(all_frozen_set && !PageIsAllVisible(page)));
/*
* If we've frozen everything on the page, and if we're already
* holding pin on the vmbuffer, record that in the visibilitymap.
* If we're not holding the pin, it's OK to skip this - it's just
* an optimization.
*
* It's fine to use InvalidTransactionId here - this is only used
* when HEAP_INSERT_FROZEN is specified, which intentionally
* violates visibility rules.
*/
if (all_frozen_set &&
visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
visibilitymap_set(...);
IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
to say what we're doing etc. And I've moved the comment from inside the
if block into the main comment - that was making it harder to read too.
I agree that it's a matter of taste. I've updated comments and left
nesting unchanged to keep assertions simple.
3) I see RelationGetBufferForTuple does this:
/*
* This is for COPY FREEZE needs. If page is empty,
* pin vmbuffer to set all_frozen bit
*/
if ((options & HEAP_INSERT_FROZEN) &&
(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
visibilitymap_pin(relation, BufferGetBlockNumber(buffer),
vmbuffer);
so is it actually possible to get into the (all_frozen_set) without
holding a pin on the visibilitymap? I haven't investigated this so
maybe there are other ways to get into that code block. But the new
additions to hio.c get the pin too.
I was thinking that GetVisibilityMapPins() can somehow unset the pin.
I gave it a second look. And now I don't think it's possible to get
into this code block without a pin. So, I converted this check into
an assertion.
4) In heap_xlog_multi_insert we now have this:
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
IIUC it makes no sense to have both flags at the same time, right? So
what about adding
Assert(!(XLH_INSERT_ALL_FROZEN_SET &&
XLH_INSERT_ALL_VISIBLE_CLEARED));
to check that?
Agree.
I placed this assertion to the very beginning of the function. It also
helped to simplify the code a bit.
I also noticed, that we were not updating visibility map for
all_frozen from heap_xlog_multi_insert. Fixed.
I wonder what to do about the heap_insert - I know there are concerns it
would negatively impact regular insert, but is it really true? I suppose
this optimization would be valuable even for cases where multi-insert is
not possible.
Do we have something like INSERT .. FREEZE? I only see
TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations.
Can you explain, what use-case are we trying to optimize by extending
this patch to heap_insert()?
I might be mistaken, but isn't copy forced to use heap_insert for a
bunch of reasons? For example in the presence of before/after triggers,
statement triggers on partitioned tables, or with volatile functions.
The new version is attached.
I've also fixed a typo in the comment by Tatsuo Ishii suggestion.
Also, I tested this patch with replication and found no issues.
Thanks. I'll take a look.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services