On Fri, Jun 29, 2018 at 11:04 AM, Andrey V. Lepikhov
<a.lepik...@postgrespro.ru> wrote:
> On 29.06.2018 10:00, Kuntal Ghosh wrote:
>>
>> On Wed, Jun 27, 2018 at 12:10 PM, Andrey V. Lepikhov
>> <a.lepik...@postgrespro.ru> wrote:
>>>
>>> I prepared the third version of the patches. Summary:
>>> 1. Mask DEAD tuples on a page during consistency checking (see the
>>> comments for the mask_dead_tuples() function).
>>
>>
>> -       ItemIdSetDead(lp);
>> +       if (target_index_deletion_factor > 0)
>> +               ItemIdMarkDead(lp);
>> +       else
>> +               ItemIdSetDead(lp);
>> IIUC, you want to retain the storage of DEAD tuples so that you can
>> form the ScanKey required for the index scan in the second phase of
>> the quick vacuum strategy. To achieve that, you've only marked the
>> tuple as DEAD during pruning. Hence, PageRepairFragmentation cannot
>> reclaim the space for DEAD tuples, since they still have storage
>> associated with them. But you've WAL-logged them as DEAD tuples having
>> no storage. So, when the WAL record is replayed on a standby (or
>> during crash recovery), the tuples will be marked as DEAD with no
>> storage, and their space may be reclaimed by PageRepairFragmentation.
>> Now, if you do a byte-by-byte comparison with the wal_consistency
>> tool, it may fail even for a normal tuple. Please let me know if you
>> feel the same way.
>>
> Thanks for your analysis.
> In this development version of the patch I expect the same prune()
> strategy on the master and the standby (i.e. target_index_deletion_factor
> is equal for both). In this case the storage of a DEAD tuple is held
> during replay or recovery in the same way.

Okay. That's acceptable for now.
> On some future step of development I plan to use a more flexible
> prune() strategy. This will require appending an 'isDeadStorageHold'
> field to the WAL record.

That sounds interesting. I'll be waiting for your next patches.

A few minor comments:

+       qsort((void *) vacrelstats->dead_tuples, vacrelstats->num_dead_tuples,
+             sizeof(ItemPointerData), tid_comparator);
+       ivinfo.isSorted = true;
My understanding is that vacrelstats->dead_tuples is already sorted by
tid. I'm referring to the following part of lazy_scan_heap():
for (blk = 0 to nblocks)
{
    for (offset = 1 to max offset)
    {
        if certain conditions are met
            store the tuple in vacrelstats->dead_tuples using
            lazy_record_dead_tuple();
    }
}
So, you don't have to sort them again, right?

+       slot = MakeSingleTupleTableSlot(RelationGetDescr(hrel));
+       econtext->ecxt_scantuple = slot;
+       ExecDropSingleTupleTableSlot(slot);
You don't have to do this for every tuple. Before storing the tuple, you
can just call MemoryContextReset(econtext->ecxt_per_tuple_memory).
Perhaps you can check IndexBuildHeapRangeScan() for the details.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com