Hi Rahila On 2019-Mar-11, Rahila Syed wrote:
> On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera <alvhe...@2ndquadrant.com> > wrote: > > +extern char *btbuildphasename(int64 phasenum); > > 1. I think int64 is too large a datatype for phasenum. > Also int32 is used for phasenum in pg_indexam_progress_phasename(). > Can we have it as int8? It does look strange, I agree, and the first code I wrote had it using a smaller type. However, I later realized that since the value comes directly from pg_stat_get_progress_info(), which returns int8 values, it was pointless to only accept a small fraction of the possible values for no good reason, so I widened it to int64 as you see now. > 2. > . In validate_index_heapscan, we dont calculate blocks_done similar to > IndexBuildHeapScan i.e > block_done += scan->rs_cblock - previous_blkno which IMO is more accurate. > Reporting scan->rs_cblock as blocks_done might give slightly inaccurate > results as we are > still processing that block. Thanks for pointing out that there's an off-by-one bug there (should be cblock-1). However, IndexBuildHeapScan uses more complicated code because it needs to cover for two additional things that validate_index_heapscan doesn't: parallel heapscans and synchronized seqscans. We could do that, I just saw no point in it. > 3. There is no if(progress) check in validate_index()/ > validate_index_heapscan() code. Wont it be a problem if it is called from > other index methods which dont support reporting progress at the > moment? Good question. I'll have a look. Most likely, I'll end up having things so that building an index using an unsupported index AM reports progress based on IndexBuildHeapScan / validate_index / validate_index_heapscan ... which might mean I should remove the 'progress' parameter from there and have them report unconditionally. > 4. Just to clarify my understanding can you please see below comment > > Quoting your following comment in cluster command progress monitor thread > while referring to progress reporting from IndexBuildHeapScan, > > "One, err, small issue with that idea is that we need the param numbers > not to conflict for any "progress update providers" that are to be used > simultaneously by any command." > > Does that mean that we can't have any other INDEX progress monitoring, use > PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE > parameter numbers to report anything but the metrics they report now. What I mean is that the literal parameter numbers defined as PROGRESS_SCAN_BLOCKS_DONE/TOTAL may not be used for other parameters by commands that call IndexBuildHeapScan, if those other parameters are used by the same commands simultaneously with IndexBuildHeapScan. So those parameter numbers become "reserved". > 5. > > > 15:56:44.682 | building index (3 of 8): initializing (1/5) | > > 442478 | 442399 | 0 | 0 | 0 | > > I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when > the blocks_done count is increasing. Shouldn't it have > changed to reflect PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building > index(3 of 8): table scan(2/5) ? > Although I think it has been rectified in the latest patch as I now get > 'table scan' phase in output as I do CREATE INDEX on table with 1000000 > records Yeah, this was a bug that I fixed in v5. (It was a misunderstanding about how parallel scanning is set up, IIRC). For v5, I tested both parallel and non-parallel builds, with and without sync seqscans, and everything seemed to behave correctly. Thanks for looking! I intend to post a new version later this week. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services