On Wed, Sep 5, 2018 at 2:04 PM Haribabu Kommi <kommi.harib...@gmail.com> wrote:
> On Tue, Sep 4, 2018 at 10:33 AM Andres Freund <and...@anarazel.de> wrote:
>> Hi,
>>
>> Thanks for the patches!
>>
>> On 2018-09-03 19:06:27 +1000, Haribabu Kommi wrote:
>> > I found a couple of places where zheap uses some extra logic to verify
>> > whether the AM is zheap or not, and takes some extra decisions based on
>> > that. I am analyzing whether any callbacks can handle all of that extra
>> > code, and how. I will come back with more details later.
>>
>> Yea, I think some of them will need to stay (particularly around
>> integrating undo) and some other ones we'll need to abstract.

OK. I will list all the areas that I found, with my observations on how to
abstract each one (or leave it as is), and then implement around them.

The following are the places where the code specifically checks whether a
relation is a zheap relation. Overall, I found that new APIs are needed at
the following locations:

1. RelationSetNewRelfilenode
2. heap_create_init_fork
3. estimate_rel_size
4. A facility for the handler to provide options (skip WAL, etc.)

_hash_vacuum_one_page:

    xlrec.flags = RelationStorageIsZHeap(heapRel) ?
                  XLOG_HASH_VACUUM_RELATION_STORAGE_ZHEAP : 0;

_bt_delitems_delete:

    xlrec_delete.flags = RelationStorageIsZHeap(heapRel) ?
                         XLOG_BTREE_DELETE_RELATION_STORAGE_ZHEAP : 0;

Storing the type of the handler, and adding a new API for special handling
when checking for these new types, can remove the need for the above code.

RelationAddExtraBlocks:

    if (RelationStorageIsZHeap(relation))
    {
        ZheapInitPage(page, BufferGetPageSize(buffer));
        freespace = PageGetZHeapFreeSpace(page);
    }

Adding new APIs for PageInit and PageGetHeapFreeSpace that redirect the
calls to the specific table AM handler would remove this check.

visibilitymap_set:

    if (RelationStorageIsZHeap(rel))
    {
        recptr = log_zheap_visible(rel->rd_node, heapBuf, vmBuf,
                                   cutoff_xid, flags);

        /*
         * We do not have a page-wise visibility flag in zheap,
         * so there is no need to set the LSN on the zheap page.
         */
    }

Handler options may remove the need for the above code.

validate_index_heapscan:

    /* Set up for predicate or expression evaluation */
    /* For zheap relations, the tuple is locally allocated, so free it. */
    ExecStoreHeapTuple(heapTuple, slot, RelationStorageIsZHeap(heapRelation));

This will be solved after changing the validate_index_heapscan function to
use slots.

RelationTruncate:

    /* Create the meta page for zheap */
    if (RelationStorageIsZHeap(rel))
        RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
                                  InvalidTransactionId, InvalidMultiXactId);

    if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
        rel->rd_rel->relkind != 'p')
    {
        heap_create_init_fork(rel);

        if (RelationStorageIsZHeap(rel))
            ZheapInitMetaPage(rel, INIT_FORKNUM);
    }

New APIs in RelationSetNewRelfilenode and heap_create_init_fork can solve it.

cluster:

    if (RelationStorageIsZHeap(rel))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot cluster a zheap table")));

No change required.

CopyFrom:

    /*
     * In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
     * See zheap_prepare_insert for details.
     * PBORKED / ZBORKED: abstract
     */
    if (!RelationStorageIsZHeap(cstate->rel) && !XLogIsNeeded())
        hi_options |= HEAP_INSERT_SKIP_WAL;

How about requesting the options from the table AM handler and using them
here?

ExecuteTruncateGuts:

    // PBORKED: Need to abstract this
    minmulti = GetOldestMultiXactId();

    /*
     * Need the full transaction-safe pushups.
     *
     * Create a new empty storage file for the relation, and assign it
     * as the relfilenode value. The old storage file is scheduled for
     * deletion at commit.
     *
     * PBORKED: needs to be a callback
     */
    if (RelationStorageIsZHeap(rel))
        RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
                                  InvalidTransactionId, InvalidMultiXactId);
    else
        RelationSetNewRelfilenode(rel, rel->rd_rel->relpersistence,
                                  RecentXmin, minmulti);

    if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
    {
        heap_create_init_fork(rel);

        if (RelationStorageIsZHeap(rel))
            ZheapInitMetaPage(rel, INIT_FORKNUM);
    }

A new API inside RelationSetNewRelfilenode can handle it.

ATRewriteCatalogs:

    /* Inherit the storage_engine reloption from the parent table. */
    if (RelationStorageIsZHeap(rel))
    {
        static char *validnsps[] = HEAP_RELOPT_NAMESPACES;
        DefElem    *storage_engine;

        storage_engine = makeDefElemExtended("toast", "storage_engine",
                                             (Node *) makeString("zheap"),
                                             DEFELEM_UNSPEC, -1);
        reloptions = transformRelOptions((Datum) 0,
                                         list_make1(storage_engine),
                                         "toast", validnsps, true, false);
    }

I don't think anything can be done with an API here.

ATRewriteTable:

    /*
     * In zheap, we don't support the optimization for HEAP_INSERT_SKIP_WAL.
     * See zheap_prepare_insert for details.
     *
     * ZFIXME / PFIXME: We probably need a different abstraction for this.
     */
    if (!RelationStorageIsZHeap(newrel) && !XLogIsNeeded())
        hi_options |= HEAP_INSERT_SKIP_WAL;

Options can solve this as well.
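To make the "options" idea above concrete, here is a minimal sketch of how
it could look. All names here (supported_insert_options, compute_hi_options,
the simplified TableAmRoutine) are illustrative assumptions, not the actual
patch API: the AM advertises which insert options it supports, and generic
call sites like CopyFrom mask their requested options against that instead
of checking RelationStorageIsZHeap() explicitly.

```c
#include <assert.h>
#include <stdbool.h>

#define HEAP_INSERT_SKIP_WAL  0x0001
#define HEAP_INSERT_SKIP_FSM  0x0002

/* Simplified stand-in for the table AM routine struct */
typedef struct TableAmRoutine
{
    int     (*supported_insert_options) (void);
} TableAmRoutine;

static int
heap_supported_insert_options(void)
{
    return HEAP_INSERT_SKIP_WAL | HEAP_INSERT_SKIP_FSM;
}

/* zheap cannot skip WAL (see zheap_prepare_insert in the zheap branch) */
static int
zheap_supported_insert_options(void)
{
    return HEAP_INSERT_SKIP_FSM;
}

/*
 * What a call site like CopyFrom could do instead of an explicit
 * RelationStorageIsZHeap() check: build the desired options, then
 * mask out whatever this AM does not support.
 */
static int
compute_hi_options(const TableAmRoutine *am, bool wal_needed)
{
    int hi_options = 0;

    if (!wal_needed)
        hi_options |= HEAP_INSERT_SKIP_WAL;

    return hi_options & am->supported_insert_options();
}
```

With this shape, adding a new AM requires no changes to the call sites; the
AM-specific knowledge stays inside the handler.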
estimate_rel_size:

    if (curpages < 10 &&
        (rel->rd_rel->relpages == 0 ||
         (RelationStorageIsZHeap(rel) &&
          rel->rd_rel->relpages == ZHEAP_METAPAGE + 1)) &&
        !rel->rd_rel->relhassubclass &&
        rel->rd_rel->relkind != RELKIND_INDEX)
        curpages = 10;

    /* report estimated # pages */
    *pages = curpages;
    /* quick exit if rel is clearly empty */
    if (curpages == 0 ||
        (RelationStorageIsZHeap(rel) && curpages == ZHEAP_METAPAGE + 1))
    {
        *tuples = 0;
        *allvisfrac = 0;
        break;
    }
    /* coerce values in pg_class to more desirable types */
    relpages = (BlockNumber) rel->rd_rel->relpages;
    reltuples = (double) rel->rd_rel->reltuples;
    relallvisible = (BlockNumber) rel->rd_rel->relallvisible;

    /*
     * If it's a zheap relation, then subtract the pages
     * to account for the metapage.
     */
    if (relpages > 0 && RelationStorageIsZHeap(rel))
    {
        curpages--;
        relpages--;
    }

An API may be needed to find the estimated size based on the handler type.

pg_stat_get_tuples_hot_updated and others:

    /*
     * Counter tuples_hot_updated stores the number of hot updates for a heap
     * table and the number of in-place updates for a zheap table.
     */
    if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL ||
        RelationStorageIsZHeap(rel))
        result = 0;
    else
        result = (int64) (tabentry->tuples_hot_updated);

Is the special condition needed at all? The value should already be 0 for
zheap, right?

RelationSetNewRelfilenode:

    /* Initialize the metapage for zheap relation. */
    if (RelationStorageIsZHeap(relation))
        ZheapInitMetaPage(relation, MAIN_FORKNUM);

A new API in RelationSetNewRelfilenode can solve this problem.

>> >> And then:
>> >> - lotsa cleanups
>> >> - rebasing onto a newer version of the abstract slot patchset
>> >> - splitting out smaller patches
>> >>
>> >> You'd moved the bulk insert into tableam callbacks - I don't quite get
>> >> why? There's not really anything AM specific in that code?
>> >
>> > The main reason for adding them to the AM is just to give the specific
>> > AM control to decide whether it can support bulk insert or not.
>> >
>> > The current framework doesn't support passing an AM-specific bulk
>> > insert state from one function to another, and its structure is fixed.
>> > This needs to be enhanced to add AM-specific private members as well.
>> >
>> > Do you want me to work on making it generic, so that AM methods can
>> > extend the structure?
>>
>> I think the best thing here would be to *remove* all AM abstraction for
>> bulk insert, until it's actually needed. The likelihood of us getting
>> the interface right and useful without an actual user seems low. Also,
>> this already is a huge patch...

OK. Will remove them and share the patch.

Bulk insert API changes are removed.

>> > @@ -308,7 +308,7 @@ static void CopyFromInsertBatch(CopyState cstate, EState *estate,
>> > 			CommandId mycid, int hi_options,
>> > 			ResultRelInfo *resultRelInfo,
>> > 			BulkInsertState bistate,
>> > -			int nBufferedTuples, TupleTableSlot **bufferedSlots,
>> > +			int nBufferedSlots, TupleTableSlot **bufferedSlots,
>> > 			uint64 firstBufferedLineNo);
>> >  static bool CopyReadLine(CopyState cstate);
>> >  static bool CopyReadLineText(CopyState cstate);
>> > @@ -2309,11 +2309,12 @@ CopyFrom(CopyState cstate)
>> > 	void	   *bistate;
>> > 	uint64		processed = 0;
>> > 	bool		useHeapMultiInsert;
>> > -	int			nBufferedTuples = 0;
>> > +	int			nBufferedSlots = 0;
>> > 	int			prev_leaf_part_index = -1;
>> >
>> > -#define MAX_BUFFERED_TUPLES 1000
>> > +#define MAX_BUFFERED_SLOTS 1000
>>
>> What's the point of these renames? We're still dealing in tuples. Just
>> seems to make the patch larger.

OK. I will correct it.
>> > 	if (useHeapMultiInsert)
>> > 	{
>> > +		int		tup_size;
>> > +
>> > 		/* Add this tuple to the tuple buffer */
>> > -		if (nBufferedTuples == 0)
>> > +		if (nBufferedSlots == 0)
>> > +		{
>> > 			firstBufferedLineNo = cstate->cur_lineno;
>> > -		Assert(bufferedSlots[nBufferedTuples] == myslot);
>> > -		nBufferedTuples++;
>> > +
>> > +		/*
>> > +		 * Find out the tuple size of the first tuple in a batch and
>> > +		 * use it for the rest of the tuples in the batch. There may be
>> > +		 * scenarios where the first tuple is very small and the rest
>> > +		 * are large, but that's rare and this should work for the
>> > +		 * majority of scenarios.
>> > +		 */
>> > +		tup_size = heap_compute_data_size(myslot->tts_tupleDescriptor,
>> > +										  myslot->tts_values,
>> > +										  myslot->tts_isnull);
>> > +		}
>>
>> This seems too expensive to me. I think it'd be better if we instead
>> used the amount of input data consumed for the tuple as a proxy. Does
>> that sound reasonable?

Yes, the cstate structure contains the line_buf member, which holds the line
length of the row. This can be used as the tuple length to limit the memory
usage. Comments?

The attached patches fix the COPY FROM batch insert memory usage limit, and
also provide grammar support for CREATE TABLE AS ... USING method.

Regards,
Haribabu Kommi
Fujitsu Australia
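[Editor's note: a minimal sketch of the line_buf-as-proxy idea discussed
above. The names BatchState, buffer_tuple, and the byte cap are illustrative
assumptions, not code from the attached 0001 patch: each row's input line
length (standing in for cstate->line_buf.len) is accumulated as a cheap
proxy for tuple size, and the batch is flushed when either the tuple count
or the accumulated bytes cross a cap.]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUFFERED_TUPLES     1000
#define MAX_BUFFERED_BYTES      (64 * 1024)

typedef struct BatchState
{
    int     nBufferedTuples;
    long    bufferedBytes;      /* sum of input line lengths, a size proxy */
} BatchState;

/*
 * Called once per input row; line_len stands in for cstate->line_buf.len,
 * which COPY has already computed while reading the line, so no extra
 * heap_compute_data_size() call is needed.  Returns true when the caller
 * should flush the buffered batch.
 */
static bool
buffer_tuple(BatchState *bs, int line_len)
{
    bs->nBufferedTuples++;
    bs->bufferedBytes += line_len;

    return bs->nBufferedTuples >= MAX_BUFFERED_TUPLES ||
           bs->bufferedBytes >= MAX_BUFFERED_BYTES;
}
```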
0001-copy-memory-limit-fix.patch
Description: Binary data
0003-CREATE-AS-USING-method-grammer-support.patch
Description: Binary data
0002-Remove-of-Bulk-insert-state-API.patch
Description: Binary data