Re: display offset along with block number in vacuum errors
Thanks Justin.

On Sat, 1 Aug 2020 at 11:47, Justin Pryzby wrote:
>
> On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > Bcc:
> > Subject: Re: display offset along with block number in vacuum errors
> > Reply-To:
> > In-Reply-To:
> >
> > whoops
>
> > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > Here:
> > > >
> > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > >                 BlockNumber tblk;
> > > >                 OffsetNumber toff;
> > > >                 ItemId          itemid;
> > > > +               LVSavedErrInfo loc_saved_err_info;
> > > >
> > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > >                 if (tblk != blkno)
> > > >                         break;  /* past end of tuples for this block */
> > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > +
> > > > +               /* Update error traceback information */
> > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > +                                        blkno, toff);
> > > >                 itemid = PageGetItemId(page, toff);
> > > >                 ItemIdSetUnused(itemid);
> > > >                 unused[uncnt++] = toff;
> > > > +
> > > > +               /* Revert to the previous phase information for error traceback */
> > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > >         }
> > > >
> > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > I don't think you should use it in a loop, nor introduce another LVSavedErrInfo.
> > > >
> > > > Since phase and blkno are already set, I think you should just set
> > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > Similar to what's done in lazy_vacuum_heap():
> > > >
> > > >         tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > >         vacrelstats->blkno = tblk;
> > >
> > > Fixed.
> >
> > I reread this thread and I think the earlier suggestion from Masahiko was
> > right.  The loop around dead_tuples only does ItemIdSetUnused(), which updates
> > the page, which has already been read from disk.  On my suggestion, your v2
> > patch sets offnum directly, but now I think it's not useful to set it at all,
> > since the whole page is manipulated by PageRepairFragmentation() and
> > log_heap_clean().  An error there would misleadingly say "..at offset number
> > MM", but would always show the page's last offset, and not the offset where an
> > error occurred.
>
> This makes me question whether offset numbers are ever useful during
> VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> internal functions that don't update vacrelstats->offno.  Note that my initial
> problem report that led to the errcontext implementation was an ERROR in heap
> *scan* (not vacuum), so an offset number at that point would've been
> sufficient.
> https://www.postgresql.org/message-id/20190808012436.gg11...@telsasoft.com
>
> I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> so an error in heap_page_prune() (for example) doesn't get the wrong offset
> associated with it.
>
> So see the attached changes on top of your v2 patch.

Actually, I was waiting for review comments from the committer and other
people, and was planning to send a patch after that.  I had already fixed your
comments in my offline patch and was waiting for more comments.  Anyway,
thanks for the delta patch.

Here, attaching v3 patch for review.

> postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT
> INTO tt SELECT generate_series(1,9); VACUUM tt; UPDATE tt SET a=1 WHERE
> ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int +
> (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt;
> ERROR: found xmin 1961 from before relfrozenxid 1073743785
> CONTEXT: while scanning block 345 of relation "public.tt", item offset 205
>
> Hmm.. is it confusing that the block number is 0-indexed but the offset is
> 1-indexed ?
>
> --
> Justin

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.c

v03_0001-Added-offset-with-block-number-in-vacuum-errcontext.patch
Description: Binary data
Re: psql - improve test coverage from 41% to 88%
Hello,

>>> This patch no longer applies: http://cfbot.cputube.org/patch_27_2262.log
>>> CF entry has been updated to Waiting on Author.

>> This patch hasn't been updated and still doesn't apply, do you intend to rebase
>> it during this commitfest or should we move it to returned with feedback?  It
>> can always be re-opened at a later date.

> As the thread has stalled, I've marked this Returned with Feedback.

Hmmm.

AFAICR the feedback is that the Expect perl module is not welcome, which seems
to suggest that it would have to be re-implemented somehow.  This is not my dev
philosophy, I won't do that, so I'm sorry to say that psql coverage will remain
pretty abysmal.

Sigh.

--
Fabien.
Re: Getting rid of some more lseek() calls
On Fri, Feb 14, 2020 at 4:47 AM Alvaro Herrera wrote:
> So, you said lseek() is faster than fstat, and we would only use the
> latter because we want to avoid the file position jumping ahead, even
> though it's slower.  But if the next read/write is not going to care
> about the file position because pread/pwrite, then why not just do one
> lseek() and not worry about the file position jumping ahead?  Maybe the
> API could offer this as an option; caller can say "I do care about file
> position" (a boolean flag) and then we use fstat; or they can say "I
> don't care" and then we just do a single lseek(SEEK_END).  Of course, in
> Windows we ignore the flag since we can do it in the fast way.
>
> pg_file_size(int fd, bool careful)
>
> Let's have the function comment indicate that -1 is returned in case of
> failure.

Reviving an old thread... yeah, we should probably figure out if we still want
a central fd size function and how it should look, but in the meantime, we
might as well have slru.c using pread/pwrite as already agreed, so I went
ahead and pushed that part.
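For reference, a minimal sketch of the helper as floated above (POSIX only;
the Windows fast path and PostgreSQL error conventions are left out, and the
name and signature are simply the ones proposed in this thread, not committed
code):

#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch of the proposed helper.  "careful" chooses fstat() so the caller's
 * file position is left untouched; otherwise a single lseek(SEEK_END) is
 * used.  Returns -1 on failure, per the proposal.
 */
static off_t
pg_file_size(int fd, bool careful)
{
	if (careful)
	{
		struct stat st;

		if (fstat(fd, &st) < 0)
			return -1;
		return st.st_size;
	}

	/* Caller doesn't mind the position moving (it uses pread/pwrite). */
	return lseek(fd, 0, SEEK_END);
}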
Re: display offset along with block number in vacuum errors
On Sat, Aug 01, 2020 at 12:31:53PM +0530, Mahendra Singh Thalor wrote:
> Actually, I was waiting for review comments from the committer and other
> people, and was planning to send a patch after that.  I had already fixed
> your comments in my offline patch and was waiting for more comments.
> Anyway, thanks for the delta patch.
>
> Here, attaching v3 patch for review.

I wasn't being impatient, but I spent enough time thinking about this that it
made sense to put it in patch form.

Your patch has a couple of extraneous changes:

        case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
            if (BlockNumberIsValid(errinfo->blkno))
+           {
                errcontext("while vacuuming block %u of relation \"%s.%s\"",
                           errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+           }
            break;

        case VACUUM_ERRCB_PHASE_VACUUM_INDEX:

@@ -3589,6 +3618,7 @@ vacuum_error_callback(void *arg)
                           errinfo->indname, errinfo->relnamespace, errinfo->relname);
            break;
+
        case VACUUM_ERRCB_PHASE_INDEX_CLEANUP:
            errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",

I would get rid of these by doing something like: git reset -p HEAD~1 (say "n"
to most hunks and "y" to reset just the two above), then git commit --amend
(without -a and without pathnames).  After that, git diff will show the local
changes (including those no-longer-committed hunks), which you can git
checkout -p (or similar).  I'd be interested to hear if there's a better way.

--
Justin
Re: [HACKERS] [PATCH] Generic type subscripting
> On Fri, Jul 31, 2020 at 03:35:22PM -0400, Tom Lane wrote:
>
> I started to look through this again, and really found myself wondering
> why we're going to all this work to invent what are fundamentally pretty
> bogus "features".  The thing that particularly sticks in my craw is the
> 0005 patch, which tries to interpret a subscript of a JSON value as either
> integer or text depending on, seemingly, the phase of the moon.  I don't
> think that will work.  For example, with existing arrays you can do
> something like arraycol['42'] and the unknown-type literal is properly
> cast to an integer.  The corresponding situation with a JSON subscript
> would have no principled resolution.
>
> It doesn't help any that both coercion alternatives are attempted at
> COERCION_ASSIGNMENT level, which makes it noticeably more likely that
> they'll both succeed.  But using ASSIGNMENT level is quite inappropriate
> in any context where it's not 100% certain what the intended type is.
>
> The proposed commit message for 0005 claims that this is somehow improving
> our standards compliance, but I see nothing in the SQL spec suggesting
> that you can subscript a JSON value at all within the SQL language, so
> I think that claim is just false.

It's due to my lack of writing skills.  As far as I can remember, the
discussion was about the JSON path part of the standard, where one is allowed
to use float indexes with implementation-defined rounding or truncation.  In
this patch series I'm trying to think of a JSON subscript as an equivalent for
a JSON path, hence this misleading description.

Having said that, I guess the main motivation behind 0005 is performance
improvements.  Hopefully Nikita can provide more insights.  Overall, back when
the 0005 patch was suggested its implementation looked reasonable to me, but
I'll review it again.

> Maybe this could be salvaged by flushing 0005 in its current form and
> having the jsonb subscript executor do something like "if the current
> value-to-be-subscripted is a JSON array, then try to convert the textual
> subscript value to an integer".  Not sure about what the error handling
> rules ought to be like, though.

I'm fine with the idea of separating the 0005 patch and potentially pursuing
it as an independent item.  Just need to rebase 0006, since Pavel mentioned
that it's a reasonable change he would like to see in the final result.
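To illustrate the fallback Tom describes, here is a rough sketch of the
runtime decision, written as a backend-style helper (the function name is
hypothetical and nothing here is taken from the actual patch; it assumes
postgres.h and utils/jsonb.h, and only uses facilities such as
JB_ROOT_IS_ARRAY(), ereport() and strtol()):

/*
 * Sketch: given the jsonb value being subscripted and the textual subscript,
 * decide whether the subscript should be used as an integer array index.
 * Returns true and sets *idx for arrays; returns false when the subscript
 * should instead be treated as an object key.  Error handling rules are an
 * open question in the thread; erroring out is just one possibility.
 */
static bool
jsonb_subscript_as_array_index(Jsonb *val, const char *subscript, int *idx)
{
	char	   *endptr;
	long		l;

	if (!JB_ROOT_IS_ARRAY(val))
		return false;			/* treat the subscript as an object key */

	errno = 0;
	l = strtol(subscript, &endptr, 10);
	if (subscript[0] == '\0' || *endptr != '\0' || errno == ERANGE)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
				 errmsg("subscript of a jsonb array must be an integer: \"%s\"",
						subscript)));

	*idx = (int) l;
	return true;
}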
Re: Comment simplehash/dynahash trade-offs
On Fri, Jul 31, 2020 at 8:17 PM Thomas Munro wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where is
> returned because they seemed to be missing, and I pushed this.
> Thanks!

Thanks for reviewing and committing!

James
Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM
On Mon, Mar 16, 2020 at 7:08 AM Michail Nikolaev wrote:
> While working on support for index hint bits on standby [1] I have
> started getting
>   "ERROR: could not find left sibling of block in index "
> during stress tests.

I reproduced the bug using your steps (including the pg_usleep() hack) today.
It was fairly easy to confirm the problem.

Attached is a revised version of your patch.  It renames the buffer variables,
and changes the precise order in which the locks are released (for consistency
with _bt_unlink_halfdead_page()).  It also changes the comments, and adds a
new paragraph to the README.  The existing paragraph was about cross-level
differences; this new one is about same-level differences (plus a second new
paragraph to talk about backwards scans + page deletion).

This revised version is essentially the same as your original patch -- I have
only made superficial adjustments.  I think that I will be able to commit this
next week, barring objections.

--
Peter Geoghegan

v3-0001-Avoid-backwards-scan-page-deletion-standby-race.patch
Description: Binary data
Re: psql - improve test coverage from 41% to 88%
> On 1 Aug 2020, at 09:06, Fabien COELHO wrote:
>
> Hello,
>
>>>> This patch no longer applies: http://cfbot.cputube.org/patch_27_2262.log
>>>> CF entry has been updated to Waiting on Author.
>>>
>>> This patch hasn't been updated and still doesn't apply, do you intend to rebase
>>> it during this commitfest or should we move it to returned with feedback?  It
>>> can always be re-opened at a later date.
>>
>> As the thread has stalled, I've marked this Returned with Feedback.
>
> Hmmm.
>
> AFAICR the feedback is that the Expect perl module is not welcome, which
> seems to suggest that it would have to be re-implemented somehow.  This is not
> my dev philosophy, I won't do that, so I'm sorry to say that psql coverage
> will remain pretty abysmal.

Re-reading this thread, I see no complaints about introducing a dependency on
Expect.  The feedback returned in this case is that the patch hasn't applied
since March, and that the patch is more than welcome to be re-entered in the
next CF once it does.

cheers ./daniel
Re: Rethinking opclass member checks and dependency strength
Anastasia Lubennikova writes:
> On 31.03.2020 23:45, Tom Lane wrote:
>> Still haven't got a better naming idea, but in the meantime here's
>> a rebase to fix a conflict with 612a1ab76.

> Maybe "amadjustmembers" will work?

Not having any better idea, I adopted that one.

> I've looked through the patch and noticed this comment:
> +     /* Probably we should throw error here */
> I suggest adding an ERROR or maybe Assert, so that future developers wouldn't
> forget about setting dependencies.  Other than that, the patch looks good
> to me.

I'd figured that adding error checks could be left for a second pass, but
there's no strong reason not to insert these particular checks now ... and
indeed, doing so showed me that the patch hadn't been updated to cover the
recent addition of opclass options procs :-(.  So I fixed that and pushed it.

regards, tom lane
Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM
> On 1 Aug 2020, at 20:30, Peter Geoghegan wrote:
> This revised version is essentially the same as your original patch --
> I have only made superficial adjustments.  I think that I will be able to
> commit this next week, barring objections.

As we're out of time for the July CF where this is registered, I've moved this
to 2020-09.  Based on the above comment, I've marked it Ready for Committer.

cheers ./daniel
Re: psql - improve test coverage from 41% to 88%
Daniel Gustafsson writes:
> On 1 Aug 2020, at 09:06, Fabien COELHO wrote:
>> AFAICR the feedback is that the Expect perl module is not welcome, which
>> seems to suggest that it would have to be re-implemented somehow.  This is
>> not my dev philosophy, I won't do that, so I'm sorry to say that psql
>> coverage will remain pretty abysmal.

> Re-reading this thread, I see no complaints about introducing a dependency on
> Expect.  The feedback returned in this case is that the patch hasn't applied
> since March, and that the patch is more than welcome to be re-entered in the
> next CF once it does.

Personally, I'd object to introducing a hard dependency on Expect, as there
are no doubt a lot of developers and buildfarm members who don't have that
installed.  But I see no reason we couldn't skip some tests if it's lacking,
as we're already doing with IO::Pty in 010_tab_completion.pl.

That does raise the question of whether Expect makes things enough easier than
raw IO::Pty that it's worth adding that dependency (and hence foregoing the
tests on some machines).  But I'm prepared to be convinced on that point.

regards, tom lane
Re: Autovacuum on partitioned table (autoanalyze)
> On 6 Jul 2020, at 12:35, yuzuko wrote:
>
>> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson wrote:
>>
>>> On 21 Apr 2020, at 18:21, yuzuko wrote:
>>> I'll update the patch soon.
>>
>> Do you have an updated version to submit?  The previous patch no longer
>> applies to HEAD, so I'm marking this entry Waiting on Author in the meantime.
>
> Thank you for letting me know.
> I attach the latest patch, which applies to HEAD.

This version seems to fail under -Werror, which is used in the Travis builds:

autovacuum.c: In function ‘relation_needs_vacanalyze’:
autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
                                                             ^
autovacuum.c:2972:9: note: ‘reltuples’ was declared here
   float4  reltuples;  /* pg_class.reltuples */
           ^

I've moved this patch to the next commitfest, but kept the status as Waiting
on Author.  Please submit a new version of the patch.

cheers ./daniel
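In case it is useful, a minimal sketch of one way to silence the warning is
simply to give the variable a defined value at its declaration (an assumption
on my part; the actual patch may instead restructure the branches so that
every path assigns it before use):

	/* Sketch only: initialize at declaration so no path reads garbage. */
	float4		reltuples = 0;		/* pg_class.reltuples; 0 until fetched */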
Re: Default gucs for EXPLAIN
> On 10 Jul 2020, at 15:30, Peter Eisentraut wrote:
>
> On 2020-05-23 11:14, Vik Fearing wrote:
>> Here is a patch to provide default gucs for EXPLAIN options.
>> I have two goals with this patch.  The first is that I personally
>> *always* want BUFFERS turned on, so this would allow me to do it without
>> typing it every time.
>
> There was a lot of opposition to the approach taken by this patch, but there
> was a lot of support for turning BUFFERS on by default.  Would you like to
> submit a patch for that?

My reading of this thread and the above is that the patch, and CF entry, as it
stands should be rejected - but that a separate patch for turning BUFFERS on by
default would be highly appreciated.  Unless there are objections, I'll go do
that in the CF app for 2020-07.

cheers ./daniel
Re: Default gucs for EXPLAIN
On Sun, Aug 02, 2020 at 12:00:56AM +0200, Daniel Gustafsson wrote:
> > On 10 Jul 2020, at 15:30, Peter Eisentraut wrote:
> >
> > On 2020-05-23 11:14, Vik Fearing wrote:
> >> Here is a patch to provide default gucs for EXPLAIN options.
> >> I have two goals with this patch.  The first is that I personally
> >> *always* want BUFFERS turned on, so this would allow me to do it without
> >> typing it every time.
> >
> > There was a lot of opposition to the approach taken by this patch, but
> > there was a lot of support for turning BUFFERS on by default.  Would you
> > like to submit a patch for that?
>
> My reading of this thread and the above is that the patch, and CF entry, as
> it stands should be rejected - but that a separate patch for turning BUFFERS
> on by default would be highly appreciated.  Unless there are objections, I'll
> go do that in the CF app for 2020-07.

Sounds right.

I have a patch to enable buffers by default, but that raises the issue of how
to avoid machine-dependent output when we run EXPLAIN ANALYZE.  One option is
to add "buffers off" to the existing incantation of (COSTS OFF, TIMING OFF,
SUMMARY OFF).  But I think we should just add a new "REGRESS" option, which
does that and any future similar thing (like WAL OFF).  I have a patch to do
that, too, which I included in an early version of this series (which I have
since changed to run EXPLAIN without ANALYZE).  That handles most but not all
machine-dependent output.

https://www.postgresql.org/message-id/20200306213310.GM684%40telsasoft.com

--
Justin
Re: Comment simplehash/dynahash trade-offs
On Sat, 1 Aug 2020 at 12:17, Thomas Munro wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where is
> returned because they seemed to be missing, and I pushed this.
> Thanks!

I was just reading over this and wondered about the following:

+ * The element type is required to contain a "uint32 status" member.

I see that PagetableEntry does not follow this, and I also didn't follow it
when writing the Result Cache patch in [1].  I managed to shrink the struct I
was using for the hash table by 4 bytes by using a char instead of an int.
That sounds like a small amount of memory, but it did result in much better
cache hit ratios in the patch.

Maybe it would be better just to get rid of the enum and just #define the
values.  It seems unlikely that we're ever going to need many more states than
are there already, let alone more than, say, 127 of them.  It does look like
manifest_file could be shrunk down a bit too by making the status field a
char.

[1] https://www.postgresql.org/message-id/flat/caaphdvrpcqyqdwergywx8j+2dlungxu+fosbq1uscxrunyx...@mail.gmail.com
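To make the idea concrete, here is a hypothetical element struct of the kind
described above, with the status stored in a char and the states as #defines
rather than an enum.  The member names are illustrative only (this is not the
actual Result Cache struct), and simplehash.h as committed still expects a
"uint32 status" member:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t uint32;		/* PostgreSQL's c.h provides this typedef */

/* Hypothetical states; simplehash only needs "empty" and "in use". */
#define MYENTRY_EMPTY	0
#define MYENTRY_IN_USE	1

/*
 * Illustrative hash-table element.  With "status" as a char the struct fits
 * in 24 bytes on a typical 64-bit machine; with a uint32 status it grows to
 * 32 bytes once alignment padding is added.
 */
typedef struct MyEntry
{
	void	   *key;			/* pointer to the cached key */
	void	   *first_tuple;	/* pointer to the first cached tuple */
	uint32		hash;			/* hash value of the key */
	char		status;			/* MYENTRY_EMPTY or MYENTRY_IN_USE */
	bool		complete;		/* are all tuples for the key stored? */
} MyEntry;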
Re: Re: fixing pg_ctl with relative paths
On 07/01/13 20:10, Josh Kupershmidt wrote:
> I am eager to see the relative paths issue fixed, but maybe we need to
> bite the bullet and sort out the escaping of command-line options in
> the rest of pg_ctl first, so that a DataDir like "/tmp/here's a \" quote"
> can consistently be used by pg_ctl {start|stop|restart} before
> we can fix this wart.

It was timely to see this thread recently revived, as I had only just recently
needed to contend with the same escaping issue while writing a
PostgresNode-like test harness for PL/Java, where I discovered I have to pass
-o values pre-transformed to pg_ctl, and even have to do that
platform-sensitively, pre-transforming them according to the rules for the
Bourne shell or those for cmd.exe.

I looked at the history of that code in pg_ctl and saw that it went all the
way back, so I assumed that any proposal to fix it would probably be met with
"it has always been that way, and anybody calling it with arbitrary arguments
must be pre-transforming them anyway, and it would be bad to break that."
(And anyway, my test harness thing is now yet one more thing that depends on
the current behavior.)

But would it be worthwhile to perhaps make a start: add an option
(non-default at first) that switches to an implementation that passes values
through transparently and isn't injection-prone?

(I use "injection-prone" not because I'd be especially concerned about command
injection by anybody who can already run pg_ctl, just because it's an
economical way to describe what pg_ctl currently does.)

Regards,
-Chap
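To illustrate what "injection-prone" means here, the stand-alone sketch below
composes a start command the way a pg_ctl-like wrapper might, as a single
string handed to the shell.  None of this is pg_ctl's actual code, and all
paths and option values are made up; it only shows how a quote in the data
directory silently changes the command the shell would see:

#include <stdio.h>

/*
 * Illustration only: building one shell command string means the shell
 * re-parses everything, so metacharacters in the data directory or in -o
 * options alter the command's meaning.
 */
int
main(void)
{
	const char *postgres_path = "/usr/local/pgsql/bin/postgres";
	const char *post_opts = "-c log_min_messages=debug1";	/* from -o */
	const char *pg_data = "/tmp/here's a \" quote";			/* problem datadir */
	char		cmd[1024];

	snprintf(cmd, sizeof(cmd), "\"%s\" %s -D \"%s\"",
			 postgres_path, post_opts, pg_data);

	/*
	 * The embedded double quote in pg_data closes the -D argument early,
	 * leaving stray words for the shell.  A real wrapper would pass cmd to
	 * system(); here we just print it.
	 */
	printf("%s\n", cmd);
	return 0;
}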
Re: Comment simplehash/dynahash trade-offs
On Sun, 2 Aug 2020 at 11:16, David Rowley wrote:
> Maybe it would be better just to get rid of the enum and just #define
> the values.  It seems unlikely that we're ever going to need many more
> states than are there already, let alone more than, say, 127 of them.
> It does look like manifest_file could be shrunk down a bit too
> by making the status field a char.

This didn't turn out quite as pretty as I had imagined.  I needed to leave the
two statuses defined in simplehash.h so that callers could make use of them.
(Result Cache will do this.)  The change here would be that callers would need
to use SH_STATUS_IN_USE rather than _SH_IN_USE.

I'm not really that sold on doing things this way.  I really just don't want
to have to make my status field a uint32 in Result Cache, per what the new
comment states we must do.  If there's a nicer way, then perhaps that's worth
considering.

David

simplehash_fix.patch
Description: Binary data
Re: Use of "long" in incremental sort code
On Sat, 1 Aug 2020 at 02:02, James Coleman wrote:
> I'd previously attached a patch [1], and there seemed to be agreement
> it was reasonable (lightly so, but I also didn't see any
> disagreement); would someone be able to either commit the change or
> provide some additional feedback?

It looks fine to me.  Pushed.

David

> [1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com
Re: display offset along with block number in vacuum errors
On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor wrote:
>
> Thanks Justin.
>
> On Sat, 1 Aug 2020 at 11:47, Justin Pryzby wrote:
> >
> > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > > Bcc:
> > > Subject: Re: display offset along with block number in vacuum errors
> > > Reply-To:
> > > In-Reply-To:
> > >
> > > whoops
> >
> > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > > Here:
> > > > >
> > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > >                 BlockNumber tblk;
> > > > >                 OffsetNumber toff;
> > > > >                 ItemId          itemid;
> > > > > +               LVSavedErrInfo loc_saved_err_info;
> > > > >
> > > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > > >                 if (tblk != blkno)
> > > > >                         break;  /* past end of tuples for this block */
> > > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > +
> > > > > +               /* Update error traceback information */
> > > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > > +                                        blkno, toff);
> > > > >                 itemid = PageGetItemId(page, toff);
> > > > >                 ItemIdSetUnused(itemid);
> > > > >                 unused[uncnt++] = toff;
> > > > > +
> > > > > +               /* Revert to the previous phase information for error traceback */
> > > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > > >         }
> > > > >
> > > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > > I don't think you should use it in a loop, nor introduce another LVSavedErrInfo.
> > > > >
> > > > > Since phase and blkno are already set, I think you should just set
> > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > > Similar to what's done in lazy_vacuum_heap():
> > > > >
> > > > >         tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > > >         vacrelstats->blkno = tblk;
> > > >
> > > > Fixed.
> > >
> > > I reread this thread and I think the earlier suggestion from Masahiko was
> > > right.  The loop around dead_tuples only does ItemIdSetUnused(), which updates
> > > the page, which has already been read from disk.  On my suggestion, your v2
> > > patch sets offnum directly, but now I think it's not useful to set it at all,
> > > since the whole page is manipulated by PageRepairFragmentation() and
> > > log_heap_clean().  An error there would misleadingly say "..at offset number
> > > MM", but would always show the page's last offset, and not the offset where an
> > > error occurred.
> >
> > This makes me question whether offset numbers are ever useful during
> > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> > internal functions that don't update vacrelstats->offno.  Note that my initial
> > problem report that led to the errcontext implementation was an ERROR in heap
> > *scan* (not vacuum).  So an offset number at that point would've been
> > sufficient.
> > https://www.postgresql.org/message-id/20190808012436.gg11...@telsasoft.com
> >
> > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> > so an error in heap_page_prune() (for example) doesn't get the wrong offset
> > associated with it.
> >
> > So see the attached changes on top of your v2 patch.
>
> Actually, I was waiting for review comments from the committer and other
> people, and was planning to send a patch after that.  I had already fixed
> your comments in my offline patch and was waiting for more comments.
> Anyway, thanks for the delta patch.
>
> Here, attaching v3 patch for review.

Thank you for updating the patch!

Here are my comments on the v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
     if (PageIsNew(page) || PageIsEmpty(page))
         return false;

+    /* Update error traceback information */
+    update_vacuum_error_info(vacrelstats, &saved_err_info,
+                             VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+                             InvalidOffsetNumber);
+
     maxoff = PageGetMaxOffsetNumber(page);
     for (offnum = FirstOffsetNumber;
          offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP but I
think we're already in that phase.
Re: display offset along with block number in vacuum errors
On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch!
>
> Here are my comments on the v3 patch:
>
> @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>      if (PageIsNew(page) || PageIsEmpty(page))
>          return false;
>
> +    /* Update error traceback information */
> +    update_vacuum_error_info(vacrelstats, &saved_err_info,
> +                             VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> +                             InvalidOffsetNumber);
> +
>      maxoff = PageGetMaxOffsetNumber(page);
>      for (offnum = FirstOffsetNumber;
>           offnum <= maxoff;
>
> You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> but I think we're already in that phase.  I'm okay with explicitly
> setting it but on the other hand, we don't set the phase in
> heap_page_is_all_visible().  Is there any reason for that?

That part was my suggestion, so I can answer that.  I added
update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later
call restore_vacuum_error_info().

> Also, since we don't reset vacrelstats->offnum at the end of
> heap_page_is_all_visible(), if an error occurs by the end of
> lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we
> report the error context with the last offset number in the page,
> making the users confused.

So it looks like heap_page_is_all_visible() should also call the update and
restore functions.

Do you agree with my suggestion that the VACUUM phase should never try to
report an offset?  How do you think we can phrase the message to avoid
confusion due to the 0-based block number and 1-based offset?

Thanks,
--
Justin
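To make the save/restore pattern under discussion concrete, here is a rough
sketch of how a helper such as heap_page_is_all_visible() could bracket its
per-tuple loop with the patch's functions.  This is illustrative only: the
arguments and body are invented or elided, and only the function/macro names
(update_vacuum_error_info, restore_vacuum_error_info, LVSavedErrInfo, etc.)
are taken from the patch hunks quoted above.

/* Sketch only: the real function takes more arguments and does real work. */
static bool
heap_page_is_all_visible_sketch(LVRelStats *vacrelstats, Page page)
{
	LVSavedErrInfo saved_err_info;
	OffsetNumber offnum,
				maxoff;
	bool		all_visible = true;

	/* Report offsets against the current block while inspecting its tuples. */
	update_vacuum_error_info(vacrelstats, &saved_err_info,
							 VACUUM_ERRCB_PHASE_SCAN_HEAP,
							 vacrelstats->blkno, InvalidOffsetNumber);

	maxoff = PageGetMaxOffsetNumber(page);
	for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
	{
		/* Cheap per-tuple update; an ERROR here shows the right offset. */
		vacrelstats->offnum = offnum;

		/* ... visibility checks on the tuple at offnum would go here ... */
	}

	/* Put back whatever phase/block/offset the caller had set. */
	restore_vacuum_error_info(vacrelstats, &saved_err_info);

	return all_visible;
}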
LDAP check flapping on crake due to race
Hi,

There are one or two failures per month on crake.  It looks like when
authentication is rejected, as expected in the tests, the psql process is
exiting, but there is a race where the Perl script still wants to write a
dummy query to its stdin (?), so you get:

psql: FATAL: LDAP authentication failed for user "test1"
ack Broken pipe: write( 13, 'SELECT 1' ) at /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-11-10%2023%3A36%3A04

tmunro=> select animal, snapshot, branch from run where fail_stage = 'ldapCheck' order by snapshot desc;
 animal |      snapshot       |    branch
--------+---------------------+---------------
 crake  | 2020-08-02 02:32:30 | REL_13_STABLE
 crake  | 2020-07-22 23:36:04 | REL_12_STABLE
 crake  | 2020-07-14 00:52:04 | REL_13_STABLE
 crake  | 2020-05-15 17:35:05 | REL_11_STABLE
 crake  | 2020-04-07 20:51:03 | REL_12_STABLE
 mantid | 2020-03-04 18:17:58 | REL_12_STABLE
 mantid | 2020-03-04 17:59:58 | REL_11_STABLE
 crake  | 2020-01-17 14:33:21 | REL_12_STABLE
 crake  | 2019-11-10 23:36:04 | REL_11_STABLE
 crake  | 2019-09-09 08:48:25 | HEAD
 crake  | 2019-08-05 21:18:23 | REL_12_STABLE
 crake  | 2019-07-19 01:33:31 | HEAD
 crake  | 2019-07-16 01:06:02 | REL_11_STABLE
(13 rows)

(Ignore mantid, it had a temporary setup problem that was resolved.)
Re: LDAP check flapping on crake due to race
On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote:
> There are one or two failures per month on crake.  It looks like when
> authentication is rejected, as expected in the tests, the psql process
> is exiting, but there is a race where the Perl script still wants to
> write a dummy query to its stdin (?), so you get:
>
> psql: FATAL: LDAP authentication failed for user "test1"
> ack Broken pipe: write( 13, 'SELECT 1' ) at
> /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

Do you suppose a fix like e12a472 would cover this?  ("psql <&-" fails with
status 1 after successful authentication, and authentication failure gives
status 2.)
Re: SimpleLruTruncate() mutual exclusion
On Fri, Jan 31, 2020 at 12:42:13AM -0800, Noah Misch wrote:
> On Thu, Jan 30, 2020 at 04:34:33PM +0100, Dmitry Dolgov wrote:
> > Just to clarify, since the CF item for this patch was withdrawn
> > recently.  Does it mean that eventually the thread [1] covers this one
> > too?
> >
> > [1]: https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com
>
> I withdrew $SUBJECT because, if someone reviews one of my patches, I want it
> to be the one you cite in [1].  I plan not to commit [1] without a Ready for
> Committer, and I plan not to commit $SUBJECT before committing [1].  I would
> be willing to commit $SUBJECT without getting a review, however.

After further reflection, I plan to push $SUBJECT shortly after 2020-08-13,
not waiting for [1] to change status.  Reasons:

- While I put significant weight on the fact that I couldn't reproduce
  $SUBJECT problems without first fixing [1], I'm no longer confident of that
  representing real-world experiences.  Reproducing [1] absolutely requires a
  close approach to a wrap limit, but $SUBJECT problems might not.

- Adding locks won't change correct functional behavior to incorrect
  functional behavior.

- By pushing at that particular time, the fix ordinarily will appear in v13.0
  before appearing in a back branch release.  If problematic contention arises
  quickly in the field, that's a more-comfortable way to discover it.
Re: OpenSSL randomness seeding
On Thu, Jul 30, 2020 at 11:42:16PM +0200, Daniel Gustafsson wrote:
> Somewhat on topic though, 1.1.1 adds a RAND_priv_bytes function for random
> numbers that are supposed to be private and extra protected via its own DRBG.
> Maybe we should use that for SCRAM salts etc in case we detect 1.1.1?

Maybe.  Would you have a separate pg_private_random() function, or just use
RAND_priv_bytes() for pg_strong_random()?  No pg_strong_random() caller is
clearly disinterested in privacy; gen_random_uuid() may come closest.
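For the second of those options, a minimal sketch might look like the
following.  It assumes the OpenSSL build path of pg_strong_random() and simply
switches to RAND_priv_bytes() when the headers advertise 1.1.1 or newer; this
is a sketch only, not a proposal-quality patch.

#include <stdbool.h>
#include <stddef.h>
#include <openssl/rand.h>

/*
 * Sketch: route pg_strong_random() through RAND_priv_bytes() on OpenSSL
 * 1.1.1+, which draws from a DRBG reserved for "private" values such as
 * SCRAM salts; fall back to RAND_bytes() on older releases.  Both functions
 * return 1 on success.
 */
bool
pg_strong_random(void *buf, size_t len)
{
#if defined(OPENSSL_VERSION_NUMBER) && OPENSSL_VERSION_NUMBER >= 0x10101000L
	return RAND_priv_bytes((unsigned char *) buf, (int) len) == 1;
#else
	return RAND_bytes((unsigned char *) buf, (int) len) == 1;
#endif
}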