Re: display offset along with block number in vacuum errors
Thanks Justin.

On Sat, 1 Aug 2020 at 11:47, Justin Pryzby wrote:
>
> On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > Bcc:
> > Subject: Re: display offset along with block number in vacuum errors
> > Reply-To:
> > In-Reply-To:
> >
> > whoops
>
> > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > Here:
> > > >
> > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > >                 BlockNumber tblk;
> > > >                 OffsetNumber toff;
> > > >                 ItemId          itemid;
> > > > +               LVSavedErrInfo loc_saved_err_info;
> > > >
> > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > >                 if (tblk != blkno)
> > > >                         break;  /* past end of tuples for this block */
> > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > +
> > > > +               /* Update error traceback information */
> > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > +                                        blkno, toff);
> > > >                 itemid = PageGetItemId(page, toff);
> > > >                 ItemIdSetUnused(itemid);
> > > >                 unused[uncnt++] = toff;
> > > > +
> > > > +               /* Revert to the previous phase information for error traceback */
> > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > >         }
> > > >
> > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > I don't think you should use it in a loop, nor introduce another LVSavedErrInfo.
> > > >
> > > > Since phase and blkno are already set, I think you should just set
> > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > Similar to what's done in lazy_vacuum_heap():
> > > >
> > > >         tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > >         vacrelstats->blkno = tblk;
> > >
> > > Fixed.
> >
> > I reread this thread and I think the earlier suggestion from Masahiko was
> > right.  The loop around dead_tuples only does ItemIdSetUnused(), which updates
> > the page, which has already been read from disk.  On my suggestion, your v2
> > patch sets offnum directly, but now I think it's not useful to set it at all,
> > since the whole page is manipulated by PageRepairFragmentation() and
> > log_heap_clean().  An error there would misleadingly say "..at offset number
> > MM", but would always show the page's last offset, and not the offset where an
> > error occurred.
>
> This makes me question whether offset numbers are ever useful during
> VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> internal functions that don't update vacrelstats->offno.  Note that my initial
> problem report that led to the errcontext implementation was an ERROR in heap
> *scan* (not vacuum), so an offset number at that point would've been
> sufficient.
> https://www.postgresql.org/message-id/20190808012436.gg11...@telsasoft.com
>
> I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> so an error in heap_page_prune() (for example) doesn't get the wrong offset
> associated with it.
>
> So see the attached changes on top of your v2 patch.

Actually, I was waiting for review comments from the committer and other
people, and was planning to send a patch after that.  I had already fixed your
comments in my offline patch and was waiting for more comments.  Anyway,
thanks for the delta patch.

Here, attaching v3 patch for review.

> postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT
> INTO tt SELECT generate_series(1,9); VACUUM tt; UPDATE tt SET a=1 WHERE
> ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int +
> (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt;
> ERROR: found xmin 1961 from before relfrozenxid 1073743785
> CONTEXT: while scanning block 345 of relation "public.tt", item offset 205
>
> Hmm.. is it confusing that the block number is 0-indexed but the offset is
> 1-indexed ?
>
> --
> Justin

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.c

v03_0001-Added-offset-with-block-number-in-vacuum-errcontext.patch
Description: Binary data
Re: psql - improve test coverage from 41% to 88%
Hello,

>>> This patch no longer applies: http://cfbot.cputube.org/patch_27_2262.log
>>> CF entry has been updated to Waiting on Author.

>> This patch hasn't been updated and still doesn't apply, do you intend to rebase
>> it during this commitfest or should we move it to returned with feedback?  It
>> can always be re-opened at a later date.

> As the thread has stalled, I've marked this Returned with Feedback.

Hmmm.

AFAICR the feedback is that the Expect perl module is not welcome, which seems
to suggest that it would have to be re-implemented somehow.  This is not my dev
philosophy, I won't do that, so I'm sorry to say that psql coverage will remain
pretty abysmal.

Sigh.

--
Fabien.
Re: Getting rid of some more lseek() calls
On Fri, Feb 14, 2020 at 4:47 AM Alvaro Herrera wrote:
> So, you said lseek() is faster than fstat, and we would only use the
> latter because we want to avoid the file position jumping ahead, even
> though it's slower.  But if the next read/write is not going to care
> about the file position because pread/pwrite, then why not just do one
> lseek() and not worry about the file position jumping ahead?  Maybe the
> API could offer this as an option; caller can say "I do care about file
> position" (a boolean flag) and then we use fstat; or they can say "I
> don't care" and then we just do a single lseek(SEEK_END).  Of course, in
> Windows we ignore the flag since we can do it in the fast way.
>
> pg_file_size(int fd, bool careful)
>
> Let's have the function comment indicate that -1 is returned in case of
> failure.

Reviving an old thread... yeah, we should probably figure out if we still want
a central fd size function and how it should look, but in the meantime, we
might as well have slru.c using pread/pwrite as already agreed, so I went
ahead and pushed that part.
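For reference, a minimal sketch of the helper as floated above (POSIX only;
the Windows fast path and PostgreSQL error conventions are left out, and the
name and signature are simply the ones proposed in this thread, not committed
code):

#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch of the proposed helper.  "careful" chooses fstat() so the caller's
 * file position is left untouched; otherwise a single lseek(SEEK_END) is
 * used.  Returns -1 on failure, per the proposal.
 */
static off_t
pg_file_size(int fd, bool careful)
{
	if (careful)
	{
		struct stat st;

		if (fstat(fd, &st) < 0)
			return -1;
		return st.st_size;
	}

	/* Caller doesn't mind the position moving (it uses pread/pwrite). */
	return lseek(fd, 0, SEEK_END);
}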
Re: display offset along with block number in vacuum errors
On Sat, Aug 01, 2020 at 12:31:53PM +0530, Mahendra Singh Thalor wrote:
> Actually, I was waiting for review comments from the committer and other
> people, and was planning to send a patch after that.  I had already fixed
> your comments in my offline patch and was waiting for more comments.
> Anyway, thanks for the delta patch.
>
> Here, attaching v3 patch for review.

I wasn't being impatient, but I spent enough time thinking about this that it
made sense to put it in patch form.

Your patch has a couple of extraneous changes:

        case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
            if (BlockNumberIsValid(errinfo->blkno))
+           {
                errcontext("while vacuuming block %u of relation \"%s.%s\"",
                           errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+           }
            break;

        case VACUUM_ERRCB_PHASE_VACUUM_INDEX:

@@ -3589,6 +3618,7 @@ vacuum_error_callback(void *arg)
                           errinfo->indname, errinfo->relnamespace, errinfo->relname);
            break;
+
        case VACUUM_ERRCB_PHASE_INDEX_CLEANUP:
            errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",

I would get rid of these by doing something like: git reset -p HEAD~1 (say "n"
to most hunks and "y" to reset just the two above), then git commit --amend
(without -a and without pathnames).  After that, git diff will show the local
changes (including those no-longer-committed hunks), which you can git
checkout -p (or similar).  I'd be interested to hear if there's a better way.

--
Justin
Re: [HACKERS] [PATCH] Generic type subscripting
> On Fri, Jul 31, 2020 at 03:35:22PM -0400, Tom Lane wrote:
>
> I started to look through this again, and really found myself wondering
> why we're going to all this work to invent what are fundamentally pretty
> bogus "features".  The thing that particularly sticks in my craw is the
> 0005 patch, which tries to interpret a subscript of a JSON value as either
> integer or text depending on, seemingly, the phase of the moon.  I don't
> think that will work.  For example, with existing arrays you can do
> something like arraycol['42'] and the unknown-type literal is properly
> cast to an integer.  The corresponding situation with a JSON subscript
> would have no principled resolution.
>
> It doesn't help any that both coercion alternatives are attempted at
> COERCION_ASSIGNMENT level, which makes it noticeably more likely that
> they'll both succeed.  But using ASSIGNMENT level is quite inappropriate
> in any context where it's not 100% certain what the intended type is.
>
> The proposed commit message for 0005 claims that this is somehow improving
> our standards compliance, but I see nothing in the SQL spec suggesting
> that you can subscript a JSON value at all within the SQL language, so
> I think that claim is just false.

It's due to my lack of writing skills.  As far as I can remember, the
discussion was about the JSON path part of the standard, where one is allowed
to use float indexes with implementation-defined rounding or truncation.  In
this patch series I'm trying to think of a JSON subscript as an equivalent for
a JSON path, hence this misleading description.

Having said that, I guess the main motivation behind 0005 is performance
improvements.  Hopefully Nikita can provide more insights.  Overall, back when
the 0005 patch was suggested its implementation looked reasonable to me, but
I'll review it again.

> Maybe this could be salvaged by flushing 0005 in its current form and
> having the jsonb subscript executor do something like "if the current
> value-to-be-subscripted is a JSON array, then try to convert the textual
> subscript value to an integer".  Not sure about what the error handling
> rules ought to be like, though.

I'm fine with the idea of separating the 0005 patch and potentially pursuing
it as an independent item.  Just need to rebase 0006, since Pavel mentioned
that it's a reasonable change he would like to see in the final result.
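To illustrate the fallback Tom describes, here is a rough sketch of the
runtime decision, written as a backend-style helper (the function name is
hypothetical and nothing here is taken from the actual patch; it assumes
postgres.h and utils/jsonb.h, and only uses facilities such as
JB_ROOT_IS_ARRAY(), ereport() and strtol()):

/*
 * Sketch: given the jsonb value being subscripted and the textual subscript,
 * decide whether the subscript should be used as an integer array index.
 * Returns true and sets *idx for arrays; returns false when the subscript
 * should instead be treated as an object key.  Error handling rules are an
 * open question in the thread; erroring out is just one possibility.
 */
static bool
jsonb_subscript_as_array_index(Jsonb *val, const char *subscript, int *idx)
{
	char	   *endptr;
	long		l;

	if (!JB_ROOT_IS_ARRAY(val))
		return false;			/* treat the subscript as an object key */

	errno = 0;
	l = strtol(subscript, &endptr, 10);
	if (subscript[0] == '\0' || *endptr != '\0' || errno == ERANGE)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
				 errmsg("subscript of a jsonb array must be an integer: \"%s\"",
						subscript)));

	*idx = (int) l;
	return true;
}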
Re: Comment simplehash/dynahash trade-offs
On Fri, Jul 31, 2020 at 8:17 PM Thomas Munro wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where is
> returned because they seemed to be missing, and I pushed this.
> Thanks!

Thanks for reviewing and committing!

James
Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM
On Mon, Mar 16, 2020 at 7:08 AM Michail Nikolaev wrote:
> While working on support for index hint bits on standby [1] I have
> started getting
>   "ERROR: could not find left sibling of block in index "
> during stress tests.

I reproduced the bug using your steps (including the pg_usleep() hack) today.
It was fairly easy to confirm the problem.

Attached is a revised version of your patch.  It renames the buffer variables,
and changes the precise order in which the locks are released (for consistency
with _bt_unlink_halfdead_page()).  It also changes the comments, and adds a
new paragraph to the README.  The existing paragraph was about cross-level
differences; this new one is about same-level differences (plus a second new
paragraph to talk about backwards scans + page deletion).

This revised version is essentially the same as your original patch -- I have
only made superficial adjustments.  I think that I will be able to commit this
next week, barring objections.

--
Peter Geoghegan

v3-0001-Avoid-backwards-scan-page-deletion-standby-race.patch
Description: Binary data
Re: psql - improve test coverage from 41% to 88%
> On 1 Aug 2020, at 09:06, Fabien COELHO wrote:
>
> Hello,
>
>>>> This patch no longer applies: http://cfbot.cputube.org/patch_27_2262.log
>>>> CF entry has been updated to Waiting on Author.
>>>
>>> This patch hasn't been updated and still doesn't apply, do you intend to rebase
>>> it during this commitfest or should we move it to returned with feedback?  It
>>> can always be re-opened at a later date.
>>
>> As the thread has stalled, I've marked this Returned with Feedback.
>
> Hmmm.
>
> AFAICR the feedback is that the Expect perl module is not welcome, which
> seems to suggest that it would have to be re-implemented somehow.  This is not
> my dev philosophy, I won't do that, so I'm sorry to say that psql coverage
> will remain pretty abysmal.

Re-reading this thread, I see no complaints about introducing a dependency on
Expect.  The feedback returned in this case is that the patch hasn't applied
since March, and that the patch is more than welcome to be re-entered in the
next CF once it does.

cheers ./daniel
Re: Rethinking opclass member checks and dependency strength
Anastasia Lubennikova writes:
> On 31.03.2020 23:45, Tom Lane wrote:
>> Still haven't got a better naming idea, but in the meantime here's
>> a rebase to fix a conflict with 612a1ab76.

> Maybe "amadjustmembers" will work?

Not having any better idea, I adopted that one.

> I've looked through the patch and noticed this comment:
> +     /* Probably we should throw error here */
> I suggest adding an ERROR or maybe Assert, so that future developers wouldn't
> forget about setting dependencies.  Other than that, the patch looks good
> to me.

I'd figured that adding error checks could be left for a second pass, but
there's no strong reason not to insert these particular checks now ... and
indeed, doing so showed me that the patch hadn't been updated to cover the
recent addition of opclass options procs :-(.  So I fixed that and pushed it.

regards, tom lane
Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM
> On 1 Aug 2020, at 20:30, Peter Geoghegan wrote:
> This revised version is essentially the same as your original patch --
> I have only made superficial adjustments.  I think that I will be able to
> commit this next week, barring objections.

As we're out of time for the July CF where this is registered, I've moved this
to 2020-09.  Based on the above comment, I've marked it Ready for Committer.

cheers ./daniel
Re: psql - improve test coverage from 41% to 88%
Daniel Gustafsson writes:
> On 1 Aug 2020, at 09:06, Fabien COELHO wrote:
>> AFAICR the feedback is that the Expect perl module is not welcome, which
>> seems to suggest that it would have to be re-implemented somehow.  This is
>> not my dev philosophy, I won't do that, so I'm sorry to say that psql
>> coverage will remain pretty abysmal.

> Re-reading this thread, I see no complaints about introducing a dependency on
> Expect.  The feedback returned in this case is that the patch hasn't applied
> since March, and that the patch is more than welcome to be re-entered in the
> next CF once it does.

Personally, I'd object to introducing a hard dependency on Expect, as there
are no doubt a lot of developers and buildfarm members who don't have that
installed.  But I see no reason we couldn't skip some tests if it's lacking,
as we're already doing with IO::Pty in 010_tab_completion.pl.

That does raise the question of whether Expect makes things enough easier than
raw IO::Pty that it's worth adding that dependency (and hence foregoing the
tests on some machines).  But I'm prepared to be convinced on that point.

regards, tom lane
Re: Autovacuum on partitioned table (autoanalyze)
> On 6 Jul 2020, at 12:35, yuzuko wrote:
>
>> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson wrote:
>>
>>> On 21 Apr 2020, at 18:21, yuzuko wrote:
>>> I'll update the patch soon.
>>
>> Do you have an updated version to submit?  The previous patch no longer
>> applies to HEAD, so I'm marking this entry Waiting on Author in the meantime.
>
> Thank you for letting me know.
> I attach the latest patch, which applies to HEAD.

This version seems to fail under -Werror, which is used in the Travis builds:

autovacuum.c: In function ‘relation_needs_vacanalyze’:
autovacuum.c:3117:59: error: ‘reltuples’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
                                                             ^
autovacuum.c:2972:9: note: ‘reltuples’ was declared here
   float4  reltuples;  /* pg_class.reltuples */
           ^

I've moved this patch to the next commitfest, but kept the status as Waiting
on Author.  Please submit a new version of the patch.

cheers ./daniel
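In case it is useful, a minimal sketch of one way to silence the warning is
simply to give the variable a defined value at its declaration (an assumption
on my part; the actual patch may instead restructure the branches so that
every path assigns it before use):

	/* Sketch only: initialize at declaration so no path reads garbage. */
	float4		reltuples = 0;		/* pg_class.reltuples; 0 until fetched */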
Re: Default gucs for EXPLAIN
> On 10 Jul 2020, at 15:30, Peter Eisentraut wrote:
>
> On 2020-05-23 11:14, Vik Fearing wrote:
>> Here is a patch to provide default gucs for EXPLAIN options.
>> I have two goals with this patch.  The first is that I personally
>> *always* want BUFFERS turned on, so this would allow me to do it without
>> typing it every time.
>
> There was a lot of opposition to the approach taken by this patch, but there
> was a lot of support for turning BUFFERS on by default.  Would you like to
> submit a patch for that?

My reading of this thread and the above is that the patch, and CF entry, as it
stands should be rejected - but that a separate patch for turning BUFFERS on by
default would be highly appreciated.  Unless there are objections, I'll go do
that in the CF app for 2020-07.

cheers ./daniel
Re: Default gucs for EXPLAIN
On Sun, Aug 02, 2020 at 12:00:56AM +0200, Daniel Gustafsson wrote:
> > On 10 Jul 2020, at 15:30, Peter Eisentraut wrote:
> >
> > On 2020-05-23 11:14, Vik Fearing wrote:
> >> Here is a patch to provide default gucs for EXPLAIN options.
> >> I have two goals with this patch.  The first is that I personally
> >> *always* want BUFFERS turned on, so this would allow me to do it without
> >> typing it every time.
> >
> > There was a lot of opposition to the approach taken by this patch, but
> > there was a lot of support for turning BUFFERS on by default.  Would you
> > like to submit a patch for that?
>
> My reading of this thread and the above is that the patch, and CF entry, as
> it stands should be rejected - but that a separate patch for turning BUFFERS
> on by default would be highly appreciated.  Unless there are objections, I'll
> go do that in the CF app for 2020-07.

Sounds right.

I have a patch to enable buffers by default, but that raises the issue of how
to avoid machine-dependent output when we run EXPLAIN ANALYZE.  One option is
to add "buffers off" to the existing incantation of (COSTS OFF, TIMING OFF,
SUMMARY OFF).  But I think we should just add a new "REGRESS" option, which
does that and any future similar thing (like WAL OFF).  I have a patch to do
that, too, which I included in an early version of this series (which I have
since changed to run EXPLAIN without ANALYZE).  That handles most but not all
machine-dependent output.

https://www.postgresql.org/message-id/20200306213310.GM684%40telsasoft.com

--
Justin
Re: Comment simplehash/dynahash trade-offs
On Sat, 1 Aug 2020 at 12:17, Thomas Munro wrote:
>
> On Sat, Aug 1, 2020 at 7:22 AM James Coleman wrote:
> > [v2 patch set]
>
> I ran it through pgindent which insisted on adding some newlines, I
> manually replaced some spaces with tabs to match nearby lines, I added
> some asterisks in your example function prototypes where is
> returned because they seemed to be missing, and I pushed this.
> Thanks!

I was just reading over this and wondered about the following:

+ * The element type is required to contain a "uint32 status" member.

I see that PagetableEntry does not follow this, and I also didn't follow it
when writing the Result Cache patch in [1].  I managed to shrink the struct I
was using for the hash table by 4 bytes by using a char instead of an int.
That sounds like a small amount of memory, but it did result in much better
cache hit ratios in the patch.

Maybe it would be better just to get rid of the enum and just #define the
values.  It seems unlikely that we're ever going to need many more states than
are there already, let alone more than, say, 127 of them.  It does look like
manifest_file could be shrunk down a bit too by making the status field a
char.

[1] https://www.postgresql.org/message-id/flat/caaphdvrpcqyqdwergywx8j+2dlungxu+fosbq1uscxrunyx...@mail.gmail.com
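To make the idea concrete, here is a hypothetical element struct of the kind
described above, with the status stored in a char and the states as #defines
rather than an enum.  The member names are illustrative only (this is not the
actual Result Cache struct), and simplehash.h as committed still expects a
"uint32 status" member:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t uint32;		/* PostgreSQL's c.h provides this typedef */

/* Hypothetical states; simplehash only needs "empty" and "in use". */
#define MYENTRY_EMPTY	0
#define MYENTRY_IN_USE	1

/*
 * Illustrative hash-table element.  With "status" as a char the struct fits
 * in 24 bytes on a typical 64-bit machine; with a uint32 status it grows to
 * 32 bytes once alignment padding is added.
 */
typedef struct MyEntry
{
	void	   *key;			/* pointer to the cached key */
	void	   *first_tuple;	/* pointer to the first cached tuple */
	uint32		hash;			/* hash value of the key */
	char		status;			/* MYENTRY_EMPTY or MYENTRY_IN_USE */
	bool		complete;		/* are all tuples for the key stored? */
} MyEntry;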
Re: Re: fixing pg_ctl with relative paths
On 07/01/13 20:10, Josh Kupershmidt wrote:
> I am eager to see the relative paths issue fixed, but maybe we need to
> bite the bullet and sort out the escaping of command-line options in
> the rest of pg_ctl first, so that a DataDir like "/tmp/here's a \" quote"
> can consistently be used by pg_ctl {start|stop|restart} before
> we can fix this wart.

It was timely to see this thread recently revived, as I had only just recently
needed to contend with the same escaping issue while writing a
PostgresNode-like test harness for PL/Java, where I discovered I have to pass
-o values pre-transformed to pg_ctl, and even have to do that
platform-sensitively, pre-transforming them according to the rules for the
Bourne shell or those for cmd.exe.

I looked at the history of that code in pg_ctl and saw that it went all the
way back, so I assumed that any proposal to fix it would probably be met with
"it has always been that way, and anybody calling it with arbitrary arguments
must be pre-transforming them anyway, and it would be bad to break that."
(And anyway, my test harness thing is now yet one more thing that depends on
the current behavior.)

But would it be worthwhile to perhaps make a start: add an option
(non-default at first) that switches to an implementation that passes values
through transparently and isn't injection-prone?

(I use "injection-prone" not because I'd be especially concerned about command
injection by anybody who can already run pg_ctl, just because it's an
economical way to describe what pg_ctl currently does.)

Regards,
-Chap
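To illustrate what "injection-prone" means here, the stand-alone sketch below
composes a start command the way a pg_ctl-like wrapper might, as a single
string handed to the shell.  None of this is pg_ctl's actual code, and all
paths and option values are made up; it only shows how a quote in the data
directory silently changes the command the shell would see:

#include <stdio.h>

/*
 * Illustration only: building one shell command string means the shell
 * re-parses everything, so metacharacters in the data directory or in -o
 * options alter the command's meaning.
 */
int
main(void)
{
	const char *postgres_path = "/usr/local/pgsql/bin/postgres";
	const char *post_opts = "-c log_min_messages=debug1";	/* from -o */
	const char *pg_data = "/tmp/here's a \" quote";			/* problem datadir */
	char		cmd[1024];

	snprintf(cmd, sizeof(cmd), "\"%s\" %s -D \"%s\"",
			 postgres_path, post_opts, pg_data);

	/*
	 * The embedded double quote in pg_data closes the -D argument early,
	 * leaving stray words for the shell.  A real wrapper would pass cmd to
	 * system(); here we just print it.
	 */
	printf("%s\n", cmd);
	return 0;
}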
Re: Comment simplehash/dynahash trade-offs
On Sun, 2 Aug 2020 at 11:16, David Rowley wrote:
> Maybe it would be better just to get rid of the enum and just #define
> the values.  It seems unlikely that we're ever going to need many more
> states than are there already, let alone more than, say, 127 of them.
> It does look like manifest_file could be shrunk down a bit too
> by making the status field a char.

This didn't turn out quite as pretty as I had imagined.  I needed to leave the
two statuses defined in simplehash.h so that callers could make use of them.
(Result Cache will do this.)  The change here would be that callers would need
to use SH_STATUS_IN_USE rather than _SH_IN_USE.

I'm not really that sold on doing things this way.  I really just don't want
to have to make my status field a uint32 in Result Cache, per what the new
comment states we must do.  If there's a nicer way, then perhaps that's worth
considering.

David

simplehash_fix.patch
Description: Binary data
Re: Use of "long" in incremental sort code
On Sat, 1 Aug 2020 at 02:02, James Coleman wrote:
> I'd previously attached a patch [1], and there seemed to be agreement
> it was reasonable (lightly so, but I also didn't see any
> disagreement); would someone be able to either commit the change or
> provide some additional feedback?

It looks fine to me.  Pushed.

David

> [1]: https://www.postgresql.org/message-id/CAAaqYe_Y5zwCTFCJeso7p34yJgf4khR8EaKeJtGd%3DQPudOad6A%40mail.gmail.com
Re: display offset along with block number in vacuum errors
On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor wrote:
>
> Thanks Justin.
>
> On Sat, 1 Aug 2020 at 11:47, Justin Pryzby wrote:
> >
> > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > > Bcc:
> > > Subject: Re: display offset along with block number in vacuum errors
> > > Reply-To:
> > > In-Reply-To:
> > >
> > > whoops
> >
> > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > > Here:
> > > > >
> > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > >                 BlockNumber tblk;
> > > > >                 OffsetNumber toff;
> > > > >                 ItemId          itemid;
> > > > > +               LVSavedErrInfo loc_saved_err_info;
> > > > >
> > > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > > >                 if (tblk != blkno)
> > > > >                         break;  /* past end of tuples for this block */
> > > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > +
> > > > > +               /* Update error traceback information */
> > > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > > +                                        blkno, toff);
> > > > >                 itemid = PageGetItemId(page, toff);
> > > > >                 ItemIdSetUnused(itemid);
> > > > >                 unused[uncnt++] = toff;
> > > > > +
> > > > > +               /* Revert to the previous phase information for error traceback */
> > > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > > >         }
> > > > >
> > > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > > I don't think you should use it in a loop, nor introduce another LVSavedErrInfo.
> > > > >
> > > > > Since phase and blkno are already set, I think you should just set
> > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > > Similar to what's done in lazy_vacuum_heap():
> > > > >
> > > > >         tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > > >         vacrelstats->blkno = tblk;
> > > >
> > > > Fixed.
> > >
> > > I reread this thread and I think the earlier suggestion from Masahiko was
> > > right.  The loop around dead_tuples only does ItemIdSetUnused(), which updates
> > > the page, which has already been read from disk.  On my suggestion, your v2
> > > patch sets offnum directly, but now I think it's not useful to set it at all,
> > > since the whole page is manipulated by PageRepairFragmentation() and
> > > log_heap_clean().  An error there would misleadingly say "..at offset number
> > > MM", but would always show the page's last offset, and not the offset where an
> > > error occurred.
> >
> > This makes me question whether offset numbers are ever useful during
> > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> > internal functions that don't update vacrelstats->offno.  Note that my initial
> > problem report that led to the errcontext implementation was an ERROR in heap
> > *scan* (not vacuum).  So an offset number at that point would've been
> > sufficient.
> > https://www.postgresql.org/message-id/20190808012436.gg11...@telsasoft.com
> >
> > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> > so an error in heap_page_prune() (for example) doesn't get the wrong offset
> > associated with it.
> >
> > So see the attached changes on top of your v2 patch.
>
> Actually, I was waiting for review comments from the committer and other
> people, and was planning to send a patch after that.  I had already fixed
> your comments in my offline patch and was waiting for more comments.
> Anyway, thanks for the delta patch.
>
> Here, attaching v3 patch for review.

Thank you for updating the patch!

Here are my comments on the v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
     if (PageIsNew(page) || PageIsEmpty(page))
         return false;

+    /* Update error traceback information */
+    update_vacuum_error_info(vacrelstats, &saved_err_info,
+                             VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+                             InvalidOffsetNumber);
+
     maxoff = PageGetMaxOffsetNumber(page);
     for (offnum = FirstOffsetNumber;
          offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP but I
think we're already in that phase.
Re: display offset along with block number in vacuum errors
On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote:
> Thank you for updating the patch!
>
> Here are my comments on the v3 patch:
>
> @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>      if (PageIsNew(page) || PageIsEmpty(page))
>          return false;
>
> +    /* Update error traceback information */
> +    update_vacuum_error_info(vacrelstats, &saved_err_info,
> +                             VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> +                             InvalidOffsetNumber);
> +
>      maxoff = PageGetMaxOffsetNumber(page);
>      for (offnum = FirstOffsetNumber;
>           offnum <= maxoff;
>
> You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> but I think we're already in that phase.  I'm okay with explicitly
> setting it but on the other hand, we don't set the phase in
> heap_page_is_all_visible().  Is there any reason for that?

That part was my suggestion, so I can answer that.  I added
update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later
call restore_vacuum_error_info().

> Also, since we don't reset vacrelstats->offnum at the end of
> heap_page_is_all_visible(), if an error occurs by the end of
> lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we
> report the error context with the last offset number in the page,
> making the users confused.

So it looks like heap_page_is_all_visible() should also call the update and
restore functions.

Do you agree with my suggestion that the VACUUM phase should never try to
report an offset?  How do you think we can phrase the message to avoid
confusion due to the 0-based block number and 1-based offset?

Thanks,
--
Justin
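To make the save/restore pattern under discussion concrete, here is a rough
sketch of how a helper such as heap_page_is_all_visible() could bracket its
per-tuple loop with the patch's functions.  This is illustrative only: the
arguments and body are invented or elided, and only the function/macro names
(update_vacuum_error_info, restore_vacuum_error_info, LVSavedErrInfo, etc.)
are taken from the patch hunks quoted above.

/* Sketch only: the real function takes more arguments and does real work. */
static bool
heap_page_is_all_visible_sketch(LVRelStats *vacrelstats, Page page)
{
	LVSavedErrInfo saved_err_info;
	OffsetNumber offnum,
				maxoff;
	bool		all_visible = true;

	/* Report offsets against the current block while inspecting its tuples. */
	update_vacuum_error_info(vacrelstats, &saved_err_info,
							 VACUUM_ERRCB_PHASE_SCAN_HEAP,
							 vacrelstats->blkno, InvalidOffsetNumber);

	maxoff = PageGetMaxOffsetNumber(page);
	for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
	{
		/* Cheap per-tuple update; an ERROR here shows the right offset. */
		vacrelstats->offnum = offnum;

		/* ... visibility checks on the tuple at offnum would go here ... */
	}

	/* Put back whatever phase/block/offset the caller had set. */
	restore_vacuum_error_info(vacrelstats, &saved_err_info);

	return all_visible;
}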
LDAP check flapping on crake due to race
Hi,

There are one or two failures per month on crake.  It looks like when
authentication is rejected, as expected in the tests, the psql process is
exiting, but there is a race where the Perl script still wants to write a
dummy query to its stdin (?), so you get:

psql: FATAL: LDAP authentication failed for user "test1"
ack Broken pipe: write( 13, 'SELECT 1' ) at /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-11-10%2023%3A36%3A04

tmunro=> select animal, snapshot, branch from run where fail_stage = 'ldapCheck' order by snapshot desc;
 animal |      snapshot       |    branch
--------+---------------------+---------------
 crake  | 2020-08-02 02:32:30 | REL_13_STABLE
 crake  | 2020-07-22 23:36:04 | REL_12_STABLE
 crake  | 2020-07-14 00:52:04 | REL_13_STABLE
 crake  | 2020-05-15 17:35:05 | REL_11_STABLE
 crake  | 2020-04-07 20:51:03 | REL_12_STABLE
 mantid | 2020-03-04 18:17:58 | REL_12_STABLE
 mantid | 2020-03-04 17:59:58 | REL_11_STABLE
 crake  | 2020-01-17 14:33:21 | REL_12_STABLE
 crake  | 2019-11-10 23:36:04 | REL_11_STABLE
 crake  | 2019-09-09 08:48:25 | HEAD
 crake  | 2019-08-05 21:18:23 | REL_12_STABLE
 crake  | 2019-07-19 01:33:31 | HEAD
 crake  | 2019-07-16 01:06:02 | REL_11_STABLE
(13 rows)

(Ignore mantid, it had a temporary setup problem that was resolved.)
Re: LDAP check flapping on crake due to race
On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote:
> There are one or two failures per month on crake.  It looks like when
> authentication is rejected, as expected in the tests, the psql process
> is exiting, but there is a race where the Perl script still wants to
> write a dummy query to its stdin (?), so you get:
>
> psql: FATAL: LDAP authentication failed for user "test1"
> ack Broken pipe: write( 13, 'SELECT 1' ) at
> /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549.

Do you suppose a fix like e12a472 would cover this?  ("psql <&-" fails with
status 1 after successful authentication, and authentication failure gives
status 2.)
Re: SimpleLruTruncate() mutual exclusion
On Fri, Jan 31, 2020 at 12:42:13AM -0800, Noah Misch wrote:
> On Thu, Jan 30, 2020 at 04:34:33PM +0100, Dmitry Dolgov wrote:
> > Just to clarify, since the CF item for this patch was withdrawn
> > recently.  Does it mean that eventually the thread [1] covers this one
> > too?
> >
> > [1]: https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com
>
> I withdrew $SUBJECT because, if someone reviews one of my patches, I want it
> to be the one you cite in [1].  I plan not to commit [1] without a Ready for
> Committer, and I plan not to commit $SUBJECT before committing [1].  I would
> be willing to commit $SUBJECT without getting a review, however.

After further reflection, I plan to push $SUBJECT shortly after 2020-08-13,
not waiting for [1] to change status.  Reasons:

- While I put significant weight on the fact that I couldn't reproduce
  $SUBJECT problems without first fixing [1], I'm no longer confident of that
  representing real-world experiences.  Reproducing [1] absolutely requires a
  close approach to a wrap limit, but $SUBJECT problems might not.

- Adding locks won't change correct functional behavior to incorrect
  functional behavior.

- By pushing at that particular time, the fix ordinarily will appear in v13.0
  before appearing in a back branch release.  If problematic contention arises
  quickly in the field, that's a more-comfortable way to discover it.
Re: OpenSSL randomness seeding
On Thu, Jul 30, 2020 at 11:42:16PM +0200, Daniel Gustafsson wrote:
> Somewhat on topic though, 1.1.1 adds a RAND_priv_bytes function for random
> numbers that are supposed to be private and extra protected via its own DRBG.
> Maybe we should use that for SCRAM salts etc in case we detect 1.1.1?

Maybe.  Would you have a separate pg_private_random() function, or just use
RAND_priv_bytes() for pg_strong_random()?  No pg_strong_random() caller is
clearly disinterested in privacy; gen_random_uuid() may come closest.
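For the second of those options, a minimal sketch might look like the
following.  It assumes the OpenSSL build path of pg_strong_random() and simply
switches to RAND_priv_bytes() when the headers advertise 1.1.1 or newer; this
is a sketch only, not a proposal-quality patch.

#include <stdbool.h>
#include <stddef.h>
#include <openssl/rand.h>

/*
 * Sketch: route pg_strong_random() through RAND_priv_bytes() on OpenSSL
 * 1.1.1+, which draws from a DRBG reserved for "private" values such as
 * SCRAM salts; fall back to RAND_bytes() on older releases.  Both functions
 * return 1 on success.
 */
bool
pg_strong_random(void *buf, size_t len)
{
#if defined(OPENSSL_VERSION_NUMBER) && OPENSSL_VERSION_NUMBER >= 0x10101000L
	return RAND_priv_bytes((unsigned char *) buf, (int) len) == 1;
#else
	return RAND_bytes((unsigned char *) buf, (int) len) == 1;
#endif
}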