Re: Pluggable Storage - Andres's take

Alexander Korotkov Mon, 15 Oct 2018 12:07:42 -0700

Hi!

On Wed, Oct 3, 2018 at 8:16 AM Andres Freund <[email protected]> wrote:
> I've pushed an updated version, with a fair amount of pending changes,
> and I hope all your pending (and not redundant, by our concurrent
> development), patches merged.


I'd also like to share some patches.  I've used the current state of
the pluggable-zheap branch as the base for my patches.

 * 0001-remove-extra-snapshot-functions.patch – removes the
snapshot_satisfiesUpdate() and snapshot_satisfiesVacuum() functions
from the tableam API.  snapshot_satisfiesUpdate() was completely
unused.  snapshot_satisfiesVacuum() was used only in
heap_copy_for_cluster(), so I've replaced it with a direct call to
heapam_satisfies_vacuum().

 * 0002-add-costing-function-to-API.patch – adds functions for costing
sequential and table sample scans to the tableam API.  The zheap
costing functions are currently copies of the heap costing functions.
This should be adjusted in the future.  Estimation of heap lookup cost
during index scans should also be pluggable, but that is not yet
implemented (TODO).
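
As a rough illustration of what a pluggable costing callback could look
like: the following is a toy sketch, not the code from the patch.
ToyTableAm, ScanCostInput and all function names here are made up for
illustration, and the cost formula is a deliberate simplification of
what the planner really does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified planner inputs; not PostgreSQL's real structs. */
typedef struct ScanCostInput
{
    double pages;           /* number of pages in the relation */
    double tuples;          /* number of tuples in the relation */
    double seq_page_cost;   /* cost of reading one page sequentially */
    double cpu_tuple_cost;  /* CPU cost of processing one tuple */
} ScanCostInput;

/* Costing callback that each table AM would provide. */
typedef double (*seqscan_cost_fn) (const ScanCostInput *in);

/* Hypothetical stand-in for a table AM routine table. */
typedef struct ToyTableAm
{
    const char     *name;
    seqscan_cost_fn cost_seqscan;
} ToyTableAm;

/* Simplified heap-style estimate: I/O cost plus per-tuple CPU cost. */
double
heap_like_cost_seqscan(const ScanCostInput *in)
{
    return in->pages * in->seq_page_cost +
           in->tuples * in->cpu_tuple_cost;
}

/* Both AMs share one formula here, mirroring "zheap copies heap for now". */
const ToyTableAm toy_heap_am = {"heap", heap_like_cost_seqscan};
const ToyTableAm toy_zheap_am = {"zheap", heap_like_cost_seqscan};

/* The planner side asks the AM for a cost instead of hard-coding heap. */
double
toy_cost_seqscan(const ToyTableAm *am, const ScanCostInput *in)
{
    assert(am != NULL && am->cost_seqscan != NULL);
    return am->cost_seqscan(in);
}
```

In this sketch, giving zheap its own estimate later would only mean
pointing toy_zheap_am.cost_seqscan at a different function; the caller
stays unchanged.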

I've examined the code in the pluggable-zheap branch and the EDB
GitHub repository [1], and I didn't find anything related to
"delete-marking" indexes as described on slide #25 of the presentation
[2].  So, basically, the contract between heap and indexes remains
unchanged: once you update one indexed field, you have to update all
the other indexes too.  Did I understand correctly that this is
postponed?

And a couple more notes from me:
* Right now table_fetch_row_version() is called in most places with
SnapshotAny.  That probably works in the majority of cases, because in
heap there can't be multiple tuples residing at the same TID, while
zheap always returns the most recent tuple residing at this TID.  But
I think it would be better to provide some meaningful snapshot instead
of SnapshotAny.  Even if the best thing we can do is to ask for the
most recent tuple at some TID, we need a more consistent way of asking
the table AM for this.  I'm going to elaborate more on this.
* I'm not really sure we need the ability to iterate over multiple
tuples referenced by an index.  It seems that the only place that
really needs this is heap_copy_for_cluster(), which is itself table AM
specific.  Also, zheap doesn't seem to be able to return more than one
tuple from zheapam_fetch_follow().  So, I'm going to investigate this
further.  And if this iteration is really unneeded, I'll propose a
patch to remove it.

1. https://github.com/EnterpriseDB/zheap
2. http://www.pgcon.org/2018/schedule/attachments/501_zheap-a-new-storage-format-postgresql-5.pdf

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

0001-remove-extra-snapshot-functions.patch
Description: Binary data

0002-add-costing-function-to-API.patch
Description: Binary data
