Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-08 Thread Fabien COELHO



Hello Justin,

The patch series applies cleanly. The latest version compiles and passes "make 
check". A few more comments:


* v8.[123] ok.

* v8.4

Avoid using the type name as a field name? "enum dir_action dir_action;" 
-> "enum dir_action action", or maybe rename the "dir_action" enum to 
"dir_action_t".


About pg_ls_dir:

"if (!fctx->dirdesc)" I do not think that is true even if AllocateDir 
failed, the list exists anyway. ISTM it should be linitial which is NULL 
in that case.


Given the overlap between pg_ls_dir and pg_ls_dir_files, ISTM that the 
former should call the slightly extended latter with appropriate flags.
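
Something like this sketch (the flag name is mine, not from the patch):

    /* sketch: pg_ls_dir as a thin wrapper; the LS_DIR_* flag is hypothetical */
    Datum
    pg_ls_dir(PG_FUNCTION_ARGS)
    {
        char   *dir = convert_and_check_filename(PG_GETARG_TEXT_PP(0));

        return pg_ls_dir_files(fcinfo, dir, LS_DIR_SKIP_DOT_DIRS);
    }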


About populate_paths:

The function is a little bit strange to me; ISTM it deserves more 
comments.


I'm not sure the name reflects what it does. For instance, ISTM that it 
does one thing, but the name is plural. Maybe "move_to_next_path" or 
"update_current_path" or something?


It returns an int which can only be 0 or 1, which smells like a bool. 
What this int/bool means is not explained in the function's head comment; 
I guess it is whether the path was updated. When it returns false, the 
list length is down to one.
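
Something like this (a sketch; I'm guessing the struct name follows the 
existing directory_fctx in genfile.c):

    /*
     * Advance to the next path to visit.  Returns true if the current
     * path was updated, false when only one path remains.
     */
    static bool
    update_current_path(directory_fctx *fctx);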


Shouldn't AllocateDir be tested for a bad result? Maybe it is a dir but you 
do not have permission to open it? Or add a comment about why that cannot 
happen?


Later, good: at least the function is called, even if only for an error 
case. Maybe some non-empty coverage tests could be added, for instance with 
a "count(*) > 0" when not is_dir, or maybe a "count(*) = 0" when is_dir?


  (SELECT oid FROM pg_tablespace b WHERE b.spcname='regress_tblspace'
  UNION SELECT 0 ORDER BY 1 DESC LIMIT 1)b

The 'b' glued to the ')' looks pretty strange; I'd suggest ") AS b". 
Reusing the same alias twice could also be avoided, for clarity.


* v8.[56]

I'd say that a behavior change which adds a column and possibly a few 
rows is ok, especially as the tmpdir contains subdirs now.


--
Fabien.




Re: Psql patch to show access methods info

2020-03-08 Thread Alexander Korotkov
On Fri, Mar 6, 2020 at 11:46 AM Alexander Korotkov
 wrote:
> On Fri, Mar 6, 2020 at 7:10 AM vignesh C  wrote:
> > I feel your explanation sounds fair to me.
>
> Thanks.
>
> I've also revised tab-completion code.  I'm going to push this if no 
> objections.

So, pushed!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread Juan José Santamaría Flecha
On Sun, Mar 8, 2020 at 3:48 AM Tom Lane  wrote:

> James Coleman  writes:
> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane  wrote:
> >> Looks like you may not have Turkish locale installed?  Try
> >> locale -a | grep tr_TR
>
> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I assume
> the
> > utf8 version is acceptable? Or is there a non-utf8 variant?
>
> Hmm ... I'm far from an expert on the packaging of locale data, but
> the simplest explanation I can think of is that the tr_TR locale exists
> to some extent on your machine but the LC_TIME component of that is
> missing.
>

 AFAICS, the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5), which is 
not the same as 'tr_TR.utf8'.


> BTW, what platform are you using anyway?
>

I have just checked on a Debian Stretch box.

Regards,

Juan José Santamaría Flecha


RE: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-03-08 Thread Floris Van Nee
> Attached is v5, which inlines in a targeted fashion, pretty much in the same
> way as the earliest version. This is the same as v4 in every other way.
> Perhaps you can test this.
> 

Thank you for the new patch. With the new one I am indeed able to reproduce a 
performance increase. It is very difficult to reliably reproduce it when 
running on a large number of cores though, due to the NUMA architecture.
For tests with a small number of cores, I pin all of them to the same node. 
With that, I see a significant performance increase for v5 compared to master. 
However, if I pin pgbench to a different node than the node that Postgres is 
pinned to, this leads to a 20% performance degradation compared to having all 
of them on the same node, as well as the stddev increasing by a factor of 2 
(regardless of patch). With that, it becomes very difficult to see any kind of 
performance increase due to the patch. For a large number of pgbench workers, I 
cannot specifically pin the pgbench worker on the same node as the Postgres 
backend connection it's handling. Leaving it to the OS gives very unreliable 
results - when I run the 30 workers / 30 connections test, I sometimes see 
periods of up to 30 minutes where it's doing it 'correctly', but it could also 
randomly run at the -20% performance for a long time. So far my best bet at 
explaining this is the NUMA performance hit. I'd like to be able to 
specifically schedule some Postgres backends to run on one node, while other 
Postgres backends run on a different node, but this isn't straightforward.

For now, I see no issues with the patch though. However, in real life 
situations there may be other, more important, optimizations for people that 
use big multi-node machines.

Thoughts?

-Floris



Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread James Coleman
On Sat, Mar 7, 2020 at 9:48 PM Tom Lane  wrote:

> James Coleman  writes:
> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane  wrote:
> >> Looks like you may not have Turkish locale installed?  Try
> >> locale -a | grep tr_TR
>
> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I assume
> the
> > utf8 version is acceptable? Or is there a non-utf8 variant?
>
> Hmm ... I'm far from an expert on the packaging of locale data, but
> the simplest explanation I can think of is that the tr_TR locale exists
> to some extent on your machine but the LC_TIME component of that is
> missing.
>
> Do you get different results from "date" depending on the locale?
> I get
>
> $ LANG=C date
> Sat Mar  7 21:44:24 EST 2020
> $ LANG=tr_TR.utf8 date
> Cts Mar  7 21:44:26 EST 2020
>

$ LANG=C date
Sun Mar  8 09:27:50 EDT 2020
$ LANG=tr_TR.utf8 date
Paz Mar  8 09:27:54 EDT 2020
$ LANG=tr_TR date
Sun Mar  8 09:27:57 EDT 2020

I'm curious if you get something different for that last one (no utf8
qualifier).

> on my Fedora 30 box.
>
> Another possibility perhaps is that you have partial locale settings
> in your environment that are bollixing the test.  Try
>
> $ env | grep ^LANG
>

This gives me:
LANG=en_US.UTF-8
LANGUAGE=en_US


> $ env | grep ^LC_
>

Nothing for this.


> If there's more than one relevant environment setting, and they
> don't all agree, I'm not sure what would happen with our
> regression tests.
>

There's LANG and LANGUAGE but they look like they effectively agree to me?


> BTW, what platform are you using anyway?
>

ElementaryOS 5.1 Hera, which is built on top of Ubuntu 18.04.3 LTS (and
Linux 4.15.0-72-generic).

This suggests I have the standard Ubuntu Turkish language packages
installed:

$ dpkg -l | grep language-pack-tr
ii  language-pack-tr  1:18.04+20180712
   all  translation updates for language Turkish
ii  language-pack-tr-base 1:18.04+20180712
   all  translations for language Turkish

James


Re: Index Skip Scan

2020-03-08 Thread Dmitry Dolgov
> On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
>
> I've been looking over v32 of the patch and have a few comments
> regarding the planner changes.

Thanks for the comments!

> I think the changes in create_distinct_paths() need more work.  The
> way I think this should work is that create_distinct_paths() gets to
> know exactly nothing about what path types support the elimination of
> duplicate values.  The Path should carry the UniqueKeys so that can be
> determined. In create_distinct_paths() you should just be able to make
> use of those paths, which should already have been created when
> creating index paths for the rel due to PlannerInfo's query_uniquekeys
> having been set.

Just so I'm clear: the idea is to "move" information about which path
types support skipping into UniqueKeys (derived from PlannerInfo's
query_uniquekeys), but other checks (e.g. whether the index AM supports
that) are still performed in create_distinct_paths?

> Also, should create_grouping_paths() be getting the same code?
> Jesper's UniqueKey patch seems to set query_uniquekeys when there's a
> GROUP BY with no aggregates. So it looks like he has intended that
> something like:
>
> SELECT x FROM t GROUP BY x;
>
> should work the same way as
>
> SELECT DISTINCT x FROM t;
>
> but the 0002 patch does not make this work.  Has that just been overlooked?

I believe it wasn't overlooked in the 0002 patch, but rather added just in
case in 0001. I guess there are no theoretical problems in implementing
it, but since we wanted to keep the scope of the patch under control and
concentrate on the existing functionality, it probably makes sense to
plan it as one of the next steps?

> There's also some weird looking assumptions that an EquivalenceMember
> can only be a Var in create_distinct_paths(). I think you're only
> saved from crashing there because a ProjectionPath will be created
> atop of the IndexPath to evaluate expressions, in which case you're
> not seeing the IndexPath.  This results in the optimisation not
> working in cases like:
>
> postgres=# create table t (a int); create index on t ((a+1)); explain
> select distinct a+1 from t;
> CREATE TABLE
> CREATE INDEX
> QUERY PLAN
> ---
>  HashAggregate  (cost=48.25..50.75 rows=200 width=4)
>Group Key: (a + 1)
>->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)

Yes, I need to fix it.

> Using unique paths as I mentioned above should see that fixed.

I'm a bit confused by this statement; how exactly would unique paths
fix this?




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-03-08 Thread Julien Rouhaud
On Fri, Jun 28, 2019 at 11:49:53AM -0700, Peter Geoghegan wrote:
> On Tue, Mar 19, 2019 at 12:38 PM legrand legrand
>  wrote:
> > Would it make sense to add it in auto explain ?
> > I don't know for explain itself, but maybe ...
>
> I think that it should appear in EXPLAIN. pg_stat_statements already
> cannot have a query hash of zero, so it might be okay to display it
> only when its value is non-zero.

I had forgotten about this.  After looking at it, I can see a few issues.

For now post_parse_analyze_hook isn't called for the underlying statement, so
we don't have the queryid.  And we can't compute the queryid for the underlying
query in the initial post_parse_analyze_hook call, as we don't want the executor
to have a queryid set in that case, to avoid accumulating counters for both the
explain and the query.

We could add an extra call in ExplainQuery, but this will be ignored by
pg_stat_statements unless you set pg_stat_statements.track to all.  Also,
pgss_post_parse_analyze will try to record an entry with the normalized query
text if none exists yet and if any constants were removed.  The problem is
that, as I already mentioned in [1], the underlying query doesn't have
query_location or query_len set, so the recorded query text will at least
contain the explain part of the input query.

[1] 
https://www.postgresql.org/message-id/CAOBaU_Y-y%2BVOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A%40mail.gmail.com




Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread James Coleman
On Sun, Mar 8, 2020 at 7:17 AM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
>
> On Sun, Mar 8, 2020 at 3:48 AM Tom Lane  wrote:
>
>> James Coleman  writes:
>> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane  wrote:
>> >> Looks like you may not have Turkish locale installed?  Try
>> >> locale -a | grep tr_TR
>>
>> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I assume
>> the
>> > utf8 version is acceptable? Or is there a non-utf8 variant?
>>
>> Hmm ... I'm far from an expert on the packaging of locale data, but
>> the simplest explanation I can think of is that the tr_TR locale exists
>> to some extent on your machine but the LC_TIME component of that is
>> missing.
>>
>
>  AFAICS, the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5), which is
> not the same as 'tr_TR.utf8'.
>

The test name implies it's about utf8, though, which makes me wonder if the
test should be testing utf8 instead?

That being said, a bit more googling based on your note about the proper
ISO encoding turned up this page: https://unix.stackexchange.com/a/446762

And I confirmed that the locale you mentioned is available:
$ grep "tr_TR" /usr/share/i18n/SUPPORTED
tr_TR.UTF-8 UTF-8
tr_TR ISO-8859-9

So I tried:
echo tr_TR.ISO-8859-9 >> /var/lib/locales/supported.d/local  # in a root session
sudo dpkg-reconfigure locales

That didn't seem to fix it, though; `locale -a` still only lists tr_TR.utf8,
so I'm still at a loss, and also unclear why a test named utf8 is actually
relying on an ISO encoding.

James


Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-08 Thread James Coleman
On Saturday, March 7, 2020, Dilip Kumar  wrote:

> On Sat, Mar 7, 2020 at 9:59 AM Dilip Kumar  wrote:
> >
> > On Sat, Mar 7, 2020 at 12:30 AM Andres Freund 
> wrote:
> > >
> > > Hi,
> > >
> > > On 2020-01-08 18:06:52 +0530, Dilip Kumar wrote:
> > > > On Wed, 8 Jan 2020 at 5:28 PM, Heikki Linnakangas 
> wrote:
> > > >
> > > > > On 25/11/2019 05:52, Dilip Kumar wrote:
> > > > > > In logical decoding, while sending the changes to the output
> plugin we
> > > > > > need to arrange them in the LSN order.  But, if there is only one
> > > > > > transaction which is a very common case then we can avoid
> building the
> > > > > > binary heap.  A small patch is attached for the same.
> > > > >
> > > > > Does this make any measurable performance difference? Building a
> > > > > one-element binary heap seems pretty cheap.
> > > >
> > > >
> > > > I haven’t really measured the performance for this.  I will try to
> do that
> > > > next week.  Thanks for looking into this.
> > >
> > > Did you do that?
> >
> > I tried once in my local machine but could not produce consistent
> > results.  I will try this once again in the performance machine and
> > report back.
>
> I have tried to decode changes for the 100,000 small transactions and
> measured the time with head vs patch.  I did not observe any
> significant gain.
>
> Head
> ---
> 519ms
> 500ms
> 487ms
> 501ms
>
> patch
> --
> 501ms
> 492ms
> 486ms
> 489ms
>
> IMHO, if we conclude that because there is no performance gain so we
> don't want to add one extra path in the code then we might want to
> remove that TODO from the code so that we don't spend time for
> optimizing this in the future.
>

Would you be able to share your test setup? It seems like it'd be helpful to
get a larger sample size to better determine whether it's measurable or not.
Visually those 4 runs look to me like it’s possible, but objectively I’m
not sure we can yet conclude one way or the other.

James


Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread James Coleman
On Sun, Mar 8, 2020 at 10:42 AM James Coleman  wrote:

> On Sun, Mar 8, 2020 at 7:17 AM Juan José Santamaría Flecha <
> juanjo.santama...@gmail.com> wrote:
>
>>
>>
>> On Sun, Mar 8, 2020 at 3:48 AM Tom Lane  wrote:
>>
>>> James Coleman  writes:
>>> > On Sat, Mar 7, 2020 at 9:31 PM Tom Lane  wrote:
>>> >> Looks like you may not have Turkish locale installed?  Try
>>> >> locale -a | grep tr_TR
>>>
>>> > Hmm, when I grep the locales I see `tr_TR.utf8` in the output. I
>>> assume the
>>> > utf8 version is acceptable? Or is there a non-utf8 variant?
>>>
>>> Hmm ... I'm far from an expert on the packaging of locale data, but
>>> the simplest explanation I can think of is that the tr_TR locale exists
>>> to some extent on your machine but the LC_TIME component of that is
>>> missing.
>>>
>>
>>  AFAICS, the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5), which is
>> not the same as 'tr_TR.utf8'.
>>
>
> The test name implies it's about utf8, though, which makes me wonder if
> the test should be testing utf8 instead?
>
> That being said, a bit more googling based on your note about the proper
> ISO encoding turned up this page: https://unix.stackexchange.com/a/446762
>
> And I confirmed that the locale you mentioned is available:
> $ grep "tr_TR" /usr/share/i18n/SUPPORTED
> tr_TR.UTF-8 UTF-8
> tr_TR ISO-8859-9
>
> So I tried:
> echo tr_TR.ISO-8859-9 >> /var/lib/locales/supported.d/local  # in a root session
> sudo dpkg-reconfigure locales
>
> That didn't seem to fix it, though; `locale -a` still only lists
> tr_TR.utf8, so I'm still at a loss, and also unclear why a test named utf8
> is actually relying on an ISO encoding.
>

Another update:

Since sudo dpkg-reconfigure locales opens up an ncurses gui on my machine,
I tried selecting the tr_TR.ISO-8859-9 option there and removed the
/var/lib/locales/supported.d/local file. Now I get:

$ locale -a | grep tr_TR
tr_TR
tr_TR.iso88599
tr_TR.utf8

And now `make check` passes.

I'm still interested in understanding why we're using the ISO locale
instead of the utf8 one in a utf8-labeled test though.

Thanks,
James


Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread Tom Lane
James Coleman  writes:
> Since sudo dpkg-reconfigure locales opens up an ncurses gui on my machine,
> I tried selecting the tr_TR.ISO-8859-9 option there and removed the
> /var/lib/locales/supported.d/local file. Now I get:

> $ locale -a | grep tr_TR
> tr_TR
> tr_TR.iso88599
> tr_TR.utf8

> And now `make check` passes.

Hm.

> I'm still interested in understanding why we're using the ISO locale
> instead of the utf8 one in a utf8-labeled test though.

We are not.  My understanding of the rules about this is that the
active LC_CTYPE setting determines the encoding that libc uses,
period.  The encoding suffix on the locale name only makes a
difference when LC_CTYPE is being specified (or derived from LANG or
LC_ALL), not any other LC_XXX setting --- although for consistency
they'll let you include it in any LC_XXX value.

We could of course choose to spell LC_TIME as 'tr_TR.utf8' in this
test, but that would open up the question of whether that causes
problems on platforms where the canonical spelling of the encoding
suffix is different (eg "UTF-8").  I'm disinclined to take that risk
without positive proof that it helps on some platform.

My guess about what was actually happening on your machine is that
the underlying locale data only exists in iso-8859-9 form, and that
glibc normally translates that to utf8 on-the-fly when it's demanded
... but if the data isn't installed at all, nothing happens.

On my Red Hat platforms, this installation choice seems pretty
all-or-nothing, but it sounds like Debian lets you chop it up
more finely (and maybe slice off your finger while at it :-().

regards, tom lane




Re: range_agg

2020-03-08 Thread Tom Lane
Isaac Morland  writes:
>> On Sat, 7. 3. 2020 at 22:20, Tom Lane  wrote:
>>> Actually ... have you given any thought to just deciding that ranges and
>>> multiranges are the same type?  That is, any range can now potentially
>>> contain multiple segments?

> Definitely agreed that range and multirange (or whatever it's called)
> should be different. In the work I do I have a number of uses for ranges,
> but not (yet) for multiranges. I want to be able to declare a column as
> range and be sure that it is just a single range, and then call lower() and
> upper() on it and be sure to get just one value in each case; and if I
> accidentally try to take the union of ranges where the union isn’t another
> range, I want to get an error rather than calculate some weird (in my
> context) multirange.

I do not find that argument convincing at all.  Surely you could put
that constraint on your column using "CHECK (numranges(VALUE) <= 1)"
or some such notation.

Also, you're attacking a straw man with respect to lower() and upper();
I did not suggest changing them to return arrays, but rather interpreting
them as returning the lowest or highest endpoint, which I think would be
transparent in most cases.  (There would obviously need to be some other
functions that could dissect a multirange more completely.)

The real problem with the proposal as it stands, I think, is exactly
that range union has failure conditions and you have to use some other
operator if you want to get a successful result always.  That's an
enormously ugly kluge, and if we'd done it right the first time nobody
would have objected.

Bottom line is that I don't think that we should add a pile of new moving
parts to the type system just because people are afraid of change;
arguably, that's *more* change (and more risk of bugs), not less.
Unifying the types would, for example, get rid of the pesky question
of what promoting a range to multirange should look like exactly,
because it'd be a no-op.

regards, tom lane




Re: range_agg

2020-03-08 Thread David G. Johnston
On Fri, Dec 20, 2019 at 10:43 AM Alvaro Herrera 
wrote:

> I took the liberty of rebasing this series on top of recent branch
> master.
>

In the tests there is:

+select '{[a,a],[b,b]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,a],[b,b]}
+(1 row)
+
+-- without canonicalization, we can't join these:
+select '{[a,a], [b,b]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,a],[b,b]}
+(1 row)
+

Aside from the comment they are identical, so I'm confused as to why both
tests exist - though I suspect it has to do with the fact that the expected
result would be {[a,b]} since text is discrete.

Also, the current patch set seems a bit undecided on whether it wants to be
truly a multi-range or a range that can report non-contiguous components.
Specifically,

+select '{[a,d), [b,f]}'::textmultirange;
+ textmultirange
+----------------
+ {[a,f]}
+(1 row)

There is an argument that a multi-range should output {[a,d),[b,f]}.  IMO
it's arguable that a multi-range container should not try to reduce the
number of contained ranges at all.  If that is indeed a desire, which it
seems to be, that feature alone goes a long way toward supporting just
merging the desired functionality into the existing range type, where the
final output has the minimum number of contiguous ranges possible, rather
than having a separate multirange type.

David J.


pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-08 Thread Justin Pryzby
While working on a patch, I noticed this pre-existing behavior, which seems to
be new since v11, maybe due to changes to SRF.

|postgres=# SELECT pg_ls_dir('.') LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|pg_ls_dir | pg_dynshmem

|postgres=# SELECT pg_ls_waldir() LIMIT 1;
|WARNING:  1 temporary files and directories not closed at end-of-transaction
|-[ RECORD 1 ]+-
|pg_ls_waldir | (00013192007B,16777216,"2020-03-08 03:50:34-07")


Note, that doesn't happen with "SELECT * FROM".

I'm not sure what the solution is to that, but my patch was going to make it
worse rather than better for pg_ls_tmpdir.

-- 
Justin




Re: proposal: schema variables

2020-03-08 Thread Pavel Stehule
On Sat, 7. 3. 2020 at 22:15, Pavel Stehule 
wrote:

> Hi
>
> I fixed some of the ugly parts of this patch - now LET x = DEFAULT and LET x
> = NULL are processed in a more usual way.
> The LET statement doesn't switch between STMT_UTILITY and STMT_PLAN_UTILITY
> like before; it is only a STMT_PLAN_UTILITY statement.
>

I did some cleaning - I mainly replaced CMD_PLAN_UTILITY with CMD_LET, because
there is no other similar statement in the queue.

Regards

Pavel


> Regards
>
> Pavel
>


schema-variables-20200308.patch.gz
Description: application/gzip


Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread Tom Lane
I wrote:
> James Coleman  writes:
>> I'm still interested in understanding why we're using the ISO locale
>> instead of the utf8 one in a utf8-labeled test though.

> We are not.  My understanding of the rules about this is that the
> active LC_CTYPE setting determines the encoding that libc uses,
> period.  The encoding suffix on the locale name only makes a
> difference when LC_CTYPE is being specified (or derived from LANG or
> LC_ALL), not any other LC_XXX setting --- although for consistency
> they'll let you include it in any LC_XXX value.

Oh wait --- I'm wrong about that.  Looking at the code in pg_locale.c,
what actually happens is that we get data in the codeset implied by
the LC_TIME setting and then translate it to the database encoding
(cf commit 7ad1cd31b).  So if bare "tr_TR" is taken as implying
iso-8859-9, which seems likely (it appears to work that way here,
anyway) then this test is exercising the codeset translation path.
We could change the test to say 'tr_TR.utf8' but that would give us
less test coverage.

regards, tom lane




Re: Remove utils/acl.h from catalog/objectaddress.h

2020-03-08 Thread Tom Lane
Peter Eisentraut  writes:
> I noticed that catalog/objectaddress.h includes utils/acl.h for no 
> apparent reason.  It turns out this used to be needed but not anymore. 
> So removed it and cleaned up the fallout.  Patch attached.

Seems reasonable.  One thing I noticed is that if you are including
nodes/parsenodes.h explicitly in objectaddress.h, there seems little
point in the #include "nodes/pg_list.h" right beside it.

Sometime we really ought to make an effort to make our header inclusions
less of a mass of spaghetti.  But this patch needn't take on that load.

regards, tom lane




Re: Use compiler intrinsics for bit ops in hash

2020-03-08 Thread David Fetter
On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote:
> Hi David,
> 
> On Wed, Feb 26, 2020 at 9:56 PM David Fetter  wrote:
> >
> > On Wed, Feb 26, 2020 at 09:12:24AM +0100, David Fetter wrote:
> > > On Fri, Jan 31, 2020 at 04:59:18PM +0100, David Fetter wrote:
> > > > On Wed, Jan 15, 2020 at 03:45:12PM -0800, Jesse Zhang wrote:
> > > > > On Tue, Jan 14, 2020 at 2:09 PM David Fetter  wrote:
> > > > > > > The changes in hash AM and SIMPLEHASH do look like a net positive
> > > > > > > improvement. My biggest cringe might be in pg_bitutils:
> > > > > > >
> > > > > > > 1. Is ceil_log2_64 dead code?
> > > > > >
> > > > > > Let's call it nascent code. I suspect there are places it could go, 
> > > > > > if
> > > > > > I look for them.  Also, it seemed silly to have one without the 
> > > > > > other.
> > > > > >
> > > > >
> > > > > While not absolutely required, I'd like us to find at least one
> > > > > place and start using it. (Clang also nags at me when we have
> > > > > unused functions).
> > > >
> > > > Done in the expanded patches attached.
> I see that you've found use of it in dynahash, thanks!
> 
> The math in the new (from v4 to v6) patch is wrong: it yields
> ceil_log2(1) = 1 or next_power_of_2(1) = 2. I can see that you lifted
> the restriction of "num greater than one" for ceil_log2() in this patch
> set, but it's now _more_ problematic to base those functions on
> pg_leftmost_one_pos().
> 
> I'm not comfortable with your changes to pg_leftmost_one_pos() to remove
> the restriction on word being non-zero. Specifically
> pg_leftmost_one_pos() is made to return 0 on 0 input. While none of its
> current callers (in HEAD) is harmed, this introduces muddy semantics:
> 
> 1. pg_leftmost_one_pos is semantically undefined on 0 input: scanning
> for a set bit in a zero word won't find it anywhere.
> 
> 2. we can _try_ generalizing it to accommodate ceil_log2 by
> extrapolating based on the invariant that BSR + LZCNT = 31 (or 63). In
> that case, the extrapolation yields -1 for pg_leftmost_one_pos(0).
> 
> I'm not convinced that others on the list will be comfortable with the
> generalization suggested in 2 above.
> 
> I've quickly put together a PoC patch on top of yours, which
> re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> Thoughts?

Per discussion on IRC with Andrew (RhodiumToad) Gierth:

The runtime detection means there's always an indirect call overhead
and no way to inline.  This is counter to what using compiler
intrinsics is supposed to do.

It's better to rely on the compiler, because:
(a) The compiler often knows whether the value can or can't be 0 and
can therefore skip a conditional jump.
(b) If you're targeting a recent microarchitecture, the compiler can
just use the right instruction.
(c) Even if the conditional branch is left in, it's not a big overhead.
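
To illustrate, here's a sketch of the static-inline flavor argued for
above (mine, not the patch's actual code):

    /*
     * With this visible to the compiler: a call site that can prove
     * num > 1 loses the branch entirely, and building with -march for
     * a recent microarchitecture gets LZCNT emitted directly, with no
     * indirect call in either case.
     */
    static inline int
    ceil_log2_64(uint64_t num)
    {
        if (num <= 1)
            return 0;
        return 64 - __builtin_clzll(num - 1);
    }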

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-08 Thread Tom Lane
Justin Pryzby  writes:
> While working on a patch, I noticed this pre-existing behavior, which seems to
> be new since v11, maybe due to changes to SRF.

> |postgres=# SELECT pg_ls_dir('.') LIMIT 1;
> |WARNING:  1 temporary files and directories not closed at end-of-transaction

Hmm, actually it looks to me like pg_ls_dir has been broken forever.
The reason the warning didn't show up before v11 is that CleanupTempFiles
didn't bleat about leaked "allocated" directories before that
(cf 9cb7db3f0).

I guess we ought to change that function to use returns-a-tuplestore
protocol instead of thinking it can hold a directory open across calls.
It's not hard to think of use-cases where the existing behavior would
cause issues worse than a nanny-ish WARNING, especially on platforms
with tight "ulimit -n" limits.

regards, tom lane




Re: More tests to stress directly checksum_impl.h

2020-03-08 Thread Tom Lane
Michael Paquier  writes:
> Thanks for the computations with big-endian!  I would have just gone
> down to the 8kB page for the expected results by seeing three other
> tests blowing up, but no objection to what you have here either.  I
> have checked the computations with little-endian from your patch and
> these are correct.

After thinking more I concluded that the extra expected files would
just be a waste of tarball space, at least till such time as we make
a push to fix all the regression tests to be blocksize-independent.

Pushed it with just the 8K files.

regards, tom lane




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-08 Thread Justin Pryzby
On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > While working on a patch, I noticed this pre-existing behavior, which seems 
> > to
> > be new since v11, maybe due to changes to SRF.
> 
> > |postgres=# SELECT pg_ls_dir('.') LIMIT 1;
> > |WARNING:  1 temporary files and directories not closed at 
> > end-of-transaction
> 
> Hmm, actually it looks to me like pg_ls_dir has been broken forever.
> The reason the warning didn't show up before v11 is that CleanupTempFiles
> didn't bleat about leaked "allocated" directories before that
> (cf 9cb7db3f0).
> 
> I guess we ought to change that function to use returns-a-tuplestore
> protocol instead of thinking it can hold a directory open across calls.
> It's not hard to think of use-cases where the existing behavior would
> cause issues worse than a nanny-ish WARNING, especially on platforms
> with tight "ulimit -n" limits.

Thanks for the analysis.

Do you mean it should enumerate all files during the initial SRF call, or use
something other than the SRF_* macros?

-- 
Justin




Re: Additional improvements to extended statistics

2020-03-08 Thread Dean Rasheed
On Fri, 6 Mar 2020 at 12:58, Tomas Vondra  wrote:
>
> Here is a rebased version of this patch series. I've polished the first
> two parts a bit - estimation of OR clauses and (Var op Var) clauses.
>

Hi,

I've been looking over the first patch (OR list support). It mostly
looks reasonable to me, except there's a problem with the way
statext_mcv_clauselist_selectivity() combines multiple stat_sel values
into the final result -- in the OR case, it needs to start with sel =
0, and then apply the OR formula to factor in each new estimate. I.e.,
this isn't right for an OR list:

/* Factor the estimate from this MCV to the oveall estimate. */
sel *= stat_sel;

(Oh and there's a typo in that comment: s/oveall/overall/).
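
For reference, the usual independence-based combination looks like this
sketch (variable names mine):

    Selectivity sel = 0.0;     /* an OR list starts from 0, not 1 */

    /* ... then, for each stat_sel from an applicable statistics object: */
    sel = sel + stat_sel - sel * stat_sel;  /* P(A or B) = P(A)+P(B)-P(A)P(B) */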

For example, with the regression test data, this isn't estimated well:

  SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;

Similarly, if no extended stats can be applied it needs to return 0
not 1, for example this query on the test data:

  SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;

It might also be worth adding a couple more regression test cases like these.

Regards,
Dean




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-08 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Mar 08, 2020 at 02:37:49PM -0400, Tom Lane wrote:
>> I guess we ought to change that function to use returns-a-tuplestore
>> protocol instead of thinking it can hold a directory open across calls.
>> It's not hard to think of use-cases where the existing behavior would
>> cause issues worse than a nanny-ish WARNING, especially on platforms
>> with tight "ulimit -n" limits.

> Thanks for the analysis.

> Do you mean it should enumerate all files during the initial SRF call, or use
> something other than the SRF_* macros?

It has to enumerate all the files during the first call.  I suppose it
could do that and then still hand back the results one-at-a-time, but
there seems little point compared to filling a tuplestore immediately.
So probably the SRF_ macros are useless here.
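
As a sketch (glossing over the allowedModes check, and with illustrative
names), the materialized version would look about like:

    Datum
    pg_ls_dir(PG_FUNCTION_ARGS)
    {
        ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
        char       *location = convert_and_check_filename(PG_GETARG_TEXT_PP(0));
        Tuplestorestate *tupstore;
        TupleDesc   tupdesc;
        MemoryContext oldcontext;
        DIR        *dirdesc;
        struct dirent *de;

        /* build the tuplestore in the per-query context */
        oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
        tupdesc = CreateTemplateTupleDesc(1);
        TupleDescInitEntry(tupdesc, (AttrNumber) 1, "pg_ls_dir",
                           TEXTOID, -1, 0);
        tupstore = tuplestore_begin_heap(true, false, work_mem);
        rsinfo->returnMode = SFRM_Materialize;
        rsinfo->setResult = tupstore;
        rsinfo->setDesc = tupdesc;
        MemoryContextSwitchTo(oldcontext);

        /* read the whole directory in this one call ... */
        dirdesc = AllocateDir(location);
        while ((de = ReadDir(dirdesc, location)) != NULL)
        {
            Datum   values[1];
            bool    nulls[1] = {false};

            if (strcmp(de->d_name, ".") == 0 ||
                strcmp(de->d_name, "..") == 0)
                continue;
            values[0] = CStringGetTextDatum(de->d_name);
            tuplestore_putvalues(tupstore, tupdesc, values, nulls);
        }
        FreeDir(dirdesc);       /* ... so nothing stays open across calls */

        return (Datum) 0;
    }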

Another possible solution is to register an exprstate-shutdown hook to
ensure the resource is cleaned up, but I'm not very happy with that
because it does nothing to prevent the hazard of overrunning the
available resources if you have several of these active at once.

I've just finished scanning the source code and concluding that all
of these functions are similarly broken:

pg_ls_dir
pg_ls_dir_files
pg_tablespace_databases
pg_logdir_ls_internal
pg_timezone_names
pgrowlocks

The first five risk leaking an open directory, the last risks leaking
an active tablescan and open relation.

I don't see anything in the documentation (either funcapi.h or
xfunc.sgml) warning that the function might not get run to completion,
either ...

regards, tom lane




Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction

2020-03-08 Thread Tom Lane
I wrote:
> I've just finished scanning the source code and concluding that all
> of these functions are similarly broken:
> pg_ls_dir
> pg_ls_dir_files
> pg_tablespace_databases
> pg_logdir_ls_internal
> pg_timezone_names
> pgrowlocks

BTW, another thing I noticed while looking around is that some of
the functions using SRF_RETURN_DONE() think they should clean up
memory beforehand.  This is a waste of code/cycles, as long as the
memory was properly allocated in funcctx->multi_call_memory_ctx,
because funcapi.c takes care of deleting that context.

We should probably document that *any* manual cleanup before
SRF_RETURN_DONE() is an antipattern.  If you have to have cleanup,
it needs to be done via RegisterExprContextCallback instead.
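
For the archives, the callback approach looks about like this (sketch,
names illustrative):

    /* runs at ExprContext shutdown, even if the SRF never runs to completion */
    static void
    ls_dir_cleanup(Datum arg)
    {
        FreeDir((DIR *) DatumGetPointer(arg));
    }

    /* ... in the SRF's first call, right after AllocateDir(): */
    RegisterExprContextCallback(((ReturnSetInfo *) fcinfo->resultinfo)->econtext,
                                ls_dir_cleanup,
                                PointerGetDatum(fctx->dirdesc));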

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2020-03-08 Thread Thomas Munro
On Mon, Mar 9, 2020 at 10:00 AM Marina Polyakova
 wrote:
> On 2018-11-16 22:59, Alvaro Herrera wrote:
> > On 2018-Sep-05, Marina Polyakova wrote:
> >
> >> v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> >> - a patch for the RandomState structure (this is used to reset a
> >> client's
> >> random seed during the repeating of transactions after
> >> serialization/deadlock failures).
> >
> > Pushed this one with minor stylistic changes (the most notable of which
> > is the move of initRandomState to where the rest of the random
> > generator
> > infrastructure is, instead of in a totally random place).  Thanks,
>
> Thank you very much! I'm going to send a new patch set by the end of
> this week (I'm sorry, I was very busy with the release of Postgres Pro
> 11...).

Is anyone interested in rebasing this, and summarising what needs to
be done to get it in?  It's arguably a bug or at least quite
unfortunate that pgbench doesn't work with SERIALIZABLE, and I heard
that a couple of forks already ship Marina's patch set.




Re: pg_rewind docs correction

2020-03-08 Thread James Coleman
On Tue, Sep 17, 2019 at 9:41 PM Michael Paquier  wrote:

> On Tue, Sep 17, 2019 at 08:38:18AM -0400, James Coleman wrote:
> > I don't agree that that's a valid equivalency. I myself spent a lot of
> > time trying to understand how this could possibly be true a while
> > back, and even looked at source code to be certain. I've asked other
> > people and found the same confusion.
> >
> > As I read it the 2nd sentence doesn't actually tell you the
> > differences; it makes a quick attempt at summarizing *how* the first
> > sentence is true, but if the first sentence isn't accurate, then it's
> > hard to read the 2nd one as helping.
>
> Well, then it comes back to the part where I am used to the existing
> docs :)
>
> > If you'd prefer something less detailed at that point in
> > the docs, then something along the lines of "results in a data
> > directory state which can then be safely replayed from the source" or
> > some such.
>
> Actually this is a good suggestion, and could replace the first
> sentence of this paragraph.
>
> > The docs shouldn't be correct just for someone who already understands
> > the intricacies. And the end user shouldn't have to read the "how it
> > works" (which incidentally is kinda hidden at the bottom underneath
> > the CLI args -- perhaps we could move that?) to extrapolate things in
> > the primary documentation.
>
> Perhaps.  This doc page is not that long either.
>

I'd set this aside for quite a while, but I was looking at it again this
afternoon, and I've come to see your concern about the opening paragraphs
remaining relatively simple. To that end I believe I've come up with a
patch that's a good compromise: retaining that simplicity and being more
clear and accurate at the same time.

In the first paragraph I've updated it to refer to both "successful rewind
and subsequent WAL replay", and I describe the result as equivalent to the
result of a base backup, since that's more technically correct anyway
(the current text could be read as implying a full copy of the data
directory, but that's not really true, just as it isn't with pg_basebackup).

I've added the information about how the backup label control file is
written, and updated the How It Works steps to refer to that separately
from restart.

Additionally, the How It Works section is updated to include WAL segments and
new relation files in the list of files copied wholesale, since that was
previously stated but somewhat contradicted there.

I realized I didn't previously add this to the CF; since it's not a new
patch I've added it to the current CF, but if this is incorrect please let
me know.

Thanks,
James
From 592ba15c35bb16e55b0bb0a7e7bdbb6dd4e08a0b Mon Sep 17 00:00:00 2001
From: James Coleman 
Date: Sun, 8 Mar 2020 16:39:45 -0400
Subject: [PATCH v3] Improve pg_rewind explanation and warnings

The pg_rewind docs currently assert that the state of the target's
data directory after rewind is equivalent to the source's data
directory. But that isn't quite true both because the base state is
further back in time and because the target's data directory will
include the current state on the source of any copied blocks.
Additionally the state isn't equal to a copy of the source data
directory; it's equivalent to a base backup of the source.

The How It Works section now:
- Includes details about how the backup label file is created.
- Is updated to include WAL segments and new relation files in the
  list of files copied wholesale from the source.

Finally, document clearly the state of the cluster after the operation
and also the operation sequencing dangers caused by copying
configuration files from the source.
---
 doc/src/sgml/ref/pg_rewind.sgml | 87 -
 1 file changed, 54 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 42d29edd4e..bc6f0009cc 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -48,14 +48,16 @@ PostgreSQL documentation
   
 
   
-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of pg_rewind over taking a new base backup, or
-   tools like rsync, is that pg_rewind does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind and subsequent WAL replay, the target data
+   directory is equivalent to a base backup of the source data directory. While
+   only changed blocks from existing relation files are copied, all other files
+   are copied in full, including new relation files, configuration files, and WAL
+   segments. The advantage of pg_rewind over taking a
+   new base backup, or tools like rsync, is t

Re: pg_rewind docs correction

2020-03-08 Thread James Coleman
On Sun, Mar 8, 2020 at 5:13 PM James Coleman  wrote:

>
> I realized I didn't previously add this to the CF; since it's not a new
> patch I've added it to the current CF, but if this is incorrect please let
> me know.
>

 Hmm, looks like I can't add it to the current one. I added it to the next
one. I think it could probably go now, since the patch is really 6 months
old, but either way is fine -- it's just a docs patch.

James


Re: Index Skip Scan

2020-03-08 Thread David Rowley
On Mon, 9 Mar 2020 at 03:21, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> >
> > I've been looking over v32 of the patch and have a few comments
> > regarding the planner changes.
>
> Thanks for the comments!
>
> > I think the changes in create_distinct_paths() need more work.  The
> > way I think this should work is that create_distinct_paths() gets to
> > know exactly nothing about what path types support the elimination of
> > duplicate values.  The Path should carry the UniqueKeys so that can be
> > determined. In create_distinct_paths() you should just be able to make
> > use of those paths, which should already have been created when
> > creating index paths for the rel due to PlannerInfo's query_uniquekeys
> > having been set.
>
> Just so I'm clear: the idea is to "move" information about which path
> types support skipping into UniqueKeys (derived from PlannerInfo's
> query_uniquekeys), but other checks (e.g. whether the index AM supports
> that) are still performed in create_distinct_paths?

create_distinct_paths() shouldn't know any details specific to the
pathtype that it's using or considering using.  All the details should
just be in Path. e.g. uniquekeys, pathkeys, costs etc. There should be
no IsA(path, ...).  Please have a look over the details in my reply to
Tomas. I hope that reply has enough information in it, but please
reply there if I've missed something.

> > On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
> > There's also some weird looking assumptions that an EquivalenceMember
> > can only be a Var in create_distinct_paths(). I think you're only
> > saved from crashing there because a ProjectionPath will be created
> > atop of the IndexPath to evaluate expressions, in which case you're
> > not seeing the IndexPath.  This results in the optimisation not
> > working in cases like:
> >
> > postgres=# create table t (a int); create index on t ((a+1)); explain
> > select distinct a+1 from t;
> > CREATE TABLE
> > CREATE INDEX
> > QUERY PLAN
> > ---
> >  HashAggregate  (cost=48.25..50.75 rows=200 width=4)
> >Group Key: (a + 1)
> >->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)
>
> Yes, I need to fix it.
>
> > Using unique paths as I mentioned above should see that fixed.
>
> I'm a bit confused by this statement; how exactly would unique paths
> fix this?

The path's uniquekeys would mention that it's unique on  (a+1). You'd
compare the uniquekeys of the path to the DISTINCT clause and see that
the uniquekeys are a subset of the DISTINCT clause therefore the
DISTINCT is a no-op. If that uniquekey path is cheaper than the
cheapest_total_path plus the cost of uniquifying it, then you should
pick the unique path; otherwise use the cheapest_total_path and
uniquify that.
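
In rough pseudocode (all of these names are made up, not from the patch):

    /* DISTINCT is a no-op if the path is already suitably unique */
    if (unique_path != NULL &&
        uniquekeys_contained_in(unique_path->uniquekeys, distinct_exprs) &&
        unique_path->total_cost < cheapest_total_path->total_cost + uniquify_cost)
        add_path(distinct_rel, unique_path);
    else
        add_path(distinct_rel, uniquify(cheapest_total_path));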

I think the UniqueKeys may need to be changed from using
EquivalenceClasses to use Exprs instead.




Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread James Coleman
On Sun, Mar 8, 2020 at 2:19 PM Tom Lane  wrote:

> I wrote:
> > James Coleman  writes:
> >> I'm still interested in understanding why we're using the ISO locale
> >> instead of the utf8 one in a utf8-labeled test though.
>
> > We are not.  My understanding of the rules about this is that the
> > active LC_CTYPE setting determines the encoding that libc uses,
> > period.  The encoding suffix on the locale name only makes a
> > difference when LC_CTYPE is being specified (or derived from LANG or
> > LC_ALL), not any other LC_XXX setting --- although for consistency
> > they'll let you include it in any LC_XXX value.
>
> Oh wait --- I'm wrong about that.  Looking at the code in pg_locale.c,
> what actually happens is that we get data in the codeset implied by
> the LC_TIME setting and then translate it to the database encoding
> (cf commit 7ad1cd31b).  So if bare "tr_TR" is taken as implying
> iso-8859-9, which seems likely (it appears to work that way here,
> anyway) then this test is exercising the codeset translation path.
> We could change the test to say 'tr_TR.utf8' but that would give us
> less test coverage.
>

So just to confirm I understand, that implies that the issue is solely that
only the utf8 tr_TR set is installed by default on this machine, and the
iso-8859-9 set is a hard requirement (that is, the test is explicitly
testing a codepath that generates utf8 results from a non-utf8 source)?

If so, I'm going to try a bare Ubuntu install on a VM and see what locales
are installed by default for Turkish.

If in fact Ubuntu doesn't install this locale by default, then is this a
caveat we should add to developer docs somewhere? It seems odd to me that
I'd be the only one encountering it, but OTOH I would have thought this a
fairly vanilla install too...

James


Re: Improve search for missing parent downlinks in amcheck

2020-03-08 Thread Alexander Korotkov
Hi, Peter!

On Tue, Mar 3, 2020 at 3:04 AM Peter Geoghegan  wrote:
> Apologies for the delayed response. I was a little tired from the
> deduplication project.

No problem.  Apologies for the delayed revision as well.

> I taught pageinspect to display a "htid" field for pivot tuples
> recently, making it easier to visualize this example.

Great!

> I think that you should say more about how "lowkey" is used here:
>
> > /*
> > -* Record if page that is about to become target is the right half 
> > of
> > -* an incomplete page split.  This can go stale immediately in
> > -* !readonly case.
> > +* Copy current target low key as the high key of right sibling.
> > +* Allocate memory in upper level context, so it would be cleared
> > +* after reset of target context.
> > +*
> > +* We only need low key for parent check.
> >  */
> > -   state->rightsplit = P_INCOMPLETE_SPLIT(opaque);
> > +   if (state->readonly && !P_RIGHTMOST(opaque))
> > +   {
>
> Say something about concurrent page splits, since they're the only
> case where we actually use lowkey. Maybe say something like: "We
> probably won't end up doing anything with lowkey, but it's simpler for
> readonly verification to always have it available".

I've revised this comment.  Hopefully it's better now.

> * I think that these comments could still be clearer:
>
> > +   /*
> > +* We're going to try match child high key to "negative
> > +* infinity key".  This normally happens when the last child
> > +* we visited for target's left sibling was an incomplete
> > +* split. So, we must be still on the child of target's left
> > +* sibling. Thus, we should match to target's left sibling
> > +* high key. Thankfully we saved it, it's called a "low 
> > key".
> > +*/
>
> Maybe start with "We cannot try to match child's high key to a
> negative infinity key in target, since there is nothing to compare.
> However...". Perhaps use terms like "cousin page" and "subtree", which
> can be useful. Alternatively, mention this case in the diagram example
> at the top of bt_child_highkey_check(). It's tough to write comments
> like this, but I think it's worth it.

I've updated this comment using the terms "cousin page" and "subtree".  I
didn't refer to the diagram example, because it doesn't contain an
appropriate case.  And I wouldn't like the diagram to contain such a case,
because that would probably make it too complex.  I've also invented the
term "uncle page".  BTW, should it be "aunt page"?  I don't know.

> Note that a high key is also a pivot tuple, so I wouldn't mention high
> keys here:
>
> > +/*
> > + * Check if two tuples are binary identical except the block number.  So,
> > + * this function is capable to compare high keys with pivot keys.
> > + */
> > +static bool
> > +bt_pivot_tuple_identical(IndexTuple itup1, IndexTuple itup2)
> > +{

Sure, this comment is revised.

> v7 looks pretty close to being commitable, though I'll probably want
> to update some comments that you haven't touched when you commit this.
> I should probably wait until you've committed the patch to go do that.
> I'm thinking of things like old comments in bt_downlink_check().
>
> I will test the patch properly one more time when you produce a new
> revision. I haven't really tested it since the last time.

The attached patch also has a revised commit message.  I'll wait for your
response before committing.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread Tom Lane
James Coleman  writes:
> So just to confirm I understand, that implies that the issue is solely that
> only the utf8 tr_TR set is installed by default on this machine, and the
> iso-8859-9 set is a hard requirement (that is, the test is explicitly
> testing a codepath that generates utf8 results from a non-utf8 source)?

It's not "explicitly" testing that; the fact that "tr_TR" is treated
as "tr_TR.iso88599" is surely an implementation artifact.  (On macOS,
some experimentation shows that "tr_TR" is treated as "tr_TR.UTF-8".)
But yeah, I think it's intentional that we want the codeset translation
path to be exercised here on at least some platforms.

> If in fact Ubuntu doesn't install this locale by default, then is this a
> caveat we should add to developer docs somewhere? It seems odd to me that
> I'd be the only one encountering it, but OTOH I would have thought this a
> fairly vanilla install too...

Not sure.  The lack of prior complaints points to this not being a
common situation.  It does seem weird that they'd set things up so
that "tr_TR.utf8" exists but not "tr_TR"; even if that's not an
outright packaging mistake, it seems like a POLA violation from here.

regards, tom lane




Re: Allow to_date() and to_timestamp() to accept localized names

2020-03-08 Thread James Coleman
On Sun, Mar 8, 2020 at 6:03 PM Tom Lane  wrote:

> James Coleman  writes:
> > So just to confirm I understand, that implies that the issue is solely
> that
> > only the utf8 tr_TR set is installed by default on this machine, and the
> > iso-8859-9 set is a hard requirement (that is, the test is explicitly
> > testing a codepath that generates utf8 results from a non-utf8 source)?
>
> It's not "explicitly" testing that; the fact that "tr_TR" is treated
> as "tr_TR.iso88599" is surely an implementation artifact.  (On macOS,
> some experimentation shows that "tr_TR" is treated as "tr_TR.UTF-8".)
> But yeah, I think it's intentional that we want the codeset translation
> path to be exercised here on at least some platforms.
>

I idly wonder if there could/should be some tests for the implicit case,
some explicitly testing the codeset translation (if possible), and some
testing the explicit utf8 case...but I don't know enough about this area to
push for anything.


> > If in fact Ubuntu doesn't install this locale by default, then is this a
> > caveat we should add to developer docs somewhere? It seems odd to me that
> > I'd be the only one encountering it, but OTOH I would have thought this a
> > fairly vanilla install too...
>
> Not sure.  The lack of prior complaints points to this not being a
> common situation.  It does seem weird that they'd set things up so
> that "tr_TR.utf8" exists but not "tr_TR"; even if that's not an
> outright packaging mistake, it seems like a POLA violation from here.
>
> regards, tom lane
>

All right. I downloaded an Ubuntu 18.04.3 server VM from OSBoxes, and it
had very few locales installed by default...so that wasn't all that helpful.

I think at this point I'll just leave this as a mystery, much as I hate
that.

James


Re: Add absolute value to dict_int

2020-03-08 Thread Tom Lane
Jeff Janes  writes:
> I've seen a few requests on how to make FTS search on the absolute value of
> integers.  This question is usually driven by the fact that the text search
> parser interprets a separating hyphen ("partnumber-987") as a minus sign.

> There is currently no good answer for this that doesn't involve C coding.
> I think this feature has a natural and trivial home in the dict_int
> extension, so attached is a patch that does that.

Seems reasonable, so pushed with minor cleanup (I noticed it was
off-by-one for the maxlen check, which was harmless unless you had
rejectlong enabled as well).  I debated whether I liked the "absval"
parameter name, which seemed a bit too abbreviative; but it's more
or less in line with the existing parameter names, so I left it alone.

> There are no changes to the extension creation scripts, so there is no need
> to bump the version.  And I think the convention is that we don't bump the
> version just to signify a change which implements a new feature when that
> doesn't change the creation scripts.

Right, there's no need to update the creation script.


I noticed one odd thing which is not the fault of this patch, but
seems to need cleaning up:

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = true);
ALTER TEXT SEARCH DICTIONARY
regression=# select ts_lexize('intdict', '-123456');
 ts_lexize 
---
 {123456}
(1 row)

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = 1);
ALTER TEXT SEARCH DICTIONARY
regression=# select ts_lexize('intdict', '-123456');
ERROR:  absval requires a Boolean value

Why did ALTER accept that, if it wasn't valid?  It's not like
there's no error checking at all:

regression=# ALTER TEXT SEARCH DICTIONARY intdict (absval = foo);
ERROR:  absval requires a Boolean value

Upon digging into that, the reason is that defGetBoolean accepts
a T_Integer Value with value 1, but it doesn't accept a T_String
with value "1".  And apparently we're satisfied to smash dictionary
parameters to strings before storing them.
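
To spell that out, the relevant logic is shaped about like this
(paraphrasing commands/define.c, not quoting it):

    switch (nodeTag(def->arg))
    {
        case T_Integer:
            switch (intVal(def->arg))
            {
                case 0: return false;
                case 1: return true;
            }
            break;
        default:
            {
                char   *sval = defGetString(def);

                /* only the word forms are accepted here, so "1" falls out */
                if (pg_strcasecmp(sval, "true") == 0)  return true;
                if (pg_strcasecmp(sval, "false") == 0) return false;
                if (pg_strcasecmp(sval, "on") == 0)    return true;
                if (pg_strcasecmp(sval, "off") == 0)   return false;
            }
            break;
    }
    ereport(ERROR, ...);        /* "%s requires a Boolean value" */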

There are at least three things we could do here:

1. Insist that defGetBoolean and its siblings should accept the
string equivalent of any non-string value they accept.  This
would change the behavior of a whole lot of utility commands,
not only the text search commands, and I'm not exactly convinced
it's a good idea.  Seems like it's losing error detection
capability.

2. Improve the catalog representation of text search parameters
so that the original Value node can be faithfully reconstructed.
I'd be for this, except it seems like a lot of work for a rather
minor benefit.

3. Rearrange text search parameter validation so we smash parameters
to strings *before* we validate them, ensuring we won't take any
settings that will be rejected later.

I'm leaning to #3 as being the most practical fix.  Thoughts?

regards, tom lane




Re: Use compiler intrinsics for bit ops in hash

2020-03-08 Thread Jesse Zhang
Hi John,
Oops this email has been sitting in my outbox for 3 days...

On Wed, Mar 4, 2020 at 1:46 AM John Naylor  wrote:
> On Tue, Mar 3, 2020 at 4:46 AM Jesse Zhang  wrote:
> > I've quickly put together a PoC patch on top of yours, which
> > re-implements ceil_log2 using LZCNT coupled with a CPUID check.
> > Thoughts?
>
> This patch seems to be making an assumption that an indirect function
> call is faster than taking a branch (in inlined code) that the CPU
> will almost always predict correctly. It would be nice to have some
> numbers to compare. (against pg_count_leading_zeros_* using the "slow"
> versions but statically inlined).
>

Ah, how could I forget that... I ran a quick benchmark on my laptop, and
indeed, even though the GCC-generated code takes a hit on zero input
(Clang generates slightly different code that gives indistinguishable
runtime for zero and non-zero inputs), the inlined code (the function
input in my benchmark is never a constant literal so the branch does get
exercised at runtime) is still more than twice as fast as the function
call.

------------------------------------------------------------
Benchmark             Time             CPU       Iterations
------------------------------------------------------------
BM_pfunc/0         1.57 ns         1.56 ns        447127265
BM_pfunc/1         1.56 ns         1.56 ns        449618696
BM_pfunc/8         1.57 ns         1.57 ns        443013856
BM_pfunc/64        1.57 ns         1.57 ns        448784369
BM_slow/0         0.602 ns        0.600 ns       1000000000
BM_slow/1         0.391 ns        0.390 ns       1000000000
BM_slow/8         0.392 ns        0.391 ns       1000000000
BM_slow/64        0.391 ns        0.390 ns       1000000000
BM_fast/0          1.47 ns         1.46 ns        477513921
BM_fast/1          1.47 ns         1.46 ns        473992040
BM_fast/8          1.46 ns         1.46 ns        474895755
BM_fast/64         1.47 ns         1.46 ns        477215268


For your amusement, I've attached the meat of the benchmark. To build
the code you can grab the repository at
https://github.com/d/glowing-chainsaw/tree/pfunc

> Stylistically, "8 * sizeof(num)" is a bit overly formal, since the
> hard-coded number we want is in the name of the function.

Oh yeah, overly generic code is indicative of the remnants of my C++
brain, will fix.

Cheers,
Jesse
#include "lzcnt.h"

static bool go = lzcnt_available();

__attribute__((target("lzcnt"))) inline int lzcnt_fast(uint64_t word) {
  return __builtin_clzll(word);
}

static int lzcnt_choose(uint64_t word) {
  if (lzcnt_available())
lzcnt_pfunc = lzcnt_fast;
  else
lzcnt_pfunc = lzcnt_slow;
  return lzcnt_pfunc(word);
}

int (*lzcnt_pfunc)(uint64_t word) = lzcnt_choose;
#ifndef LZCNT_H
#define LZCNT_H

#include <cpuid.h>   /* __get_cpuid(), bit_ABM */
#include <stdint.h>  /* uint64_t */

static inline bool lzcnt_available() {
  uint32_t exx[4] = {0, 0, 0, 0};

  __get_cpuid(0x80000001, exx + 0, exx + 1, exx + 2, exx + 3);

  return (exx[2] & bit_ABM) != 0;
}

extern int (*lzcnt_pfunc)(uint64_t word);

__attribute__((target("no-lzcnt"))) inline int lzcnt_slow(uint64_t word) {
  if (word == 0)
return 8 * sizeof(word);
  return __builtin_clzll(word);
}

int lzcnt_fast(uint64_t word);

#endif
#include "benchmark/benchmark.h"

#include "lzcnt.h"

static void BM_pfunc(benchmark::State& state) {
  auto word = state.range(0);
  for (auto _ : state) {
lzcnt_pfunc(word);
  }
}

static void BM_slow(benchmark::State& state) {
  auto word = state.range(0);
  for (auto _ : state) {
benchmark::DoNotOptimize(lzcnt_slow(word));
  }
}

static void BM_fast(benchmark::State& state) {
  auto word = state.range(0);
  for (auto _ : state) {
lzcnt_fast(word);
  }
}

BENCHMARK(BM_pfunc)->Arg(0)->Range(1, 64);
BENCHMARK(BM_slow)->Arg(0)->Range(1, 64);
BENCHMARK(BM_fast)->Arg(0)->Range(1, 64);

BENCHMARK_MAIN();


Re: Additional improvements to extended statistics

2020-03-08 Thread Tomas Vondra

On Sun, Mar 08, 2020 at 07:17:10PM +, Dean Rasheed wrote:

On Fri, 6 Mar 2020 at 12:58, Tomas Vondra  wrote:


Here is a rebased version of this patch series. I've polished the first
two parts a bit - estimation of OR clauses and (Var op Var) clauses.



Hi,

I've been looking over the first patch (OR list support). It mostly
looks reasonable to me, except there's a problem with the way
statext_mcv_clauselist_selectivity() combines multiple stat_sel values
into the final result -- in the OR case, it needs to start with sel =
0, and then apply the OR formula to factor in each new estimate. I.e.,
this isn't right for an OR list:

   /* Factor the estimate from this MCV to the oveall estimate. */
   sel *= stat_sel;

(Oh and there's a typo in that comment: s/oveall/overall/).

For example, with the regression test data, this isn't estimated well:

 SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;

Similarly, if no extended stats can be applied it needs to return 0
not 1, for example this query on the test data:

 SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;



Ah, right. Thanks for noticing this. Attached is an updated patch series
with parts 0002 and 0003 adding tests demonstrating the issue and then
fixing it (both shall be merged to 0001).
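
For the record, a minimal sketch of the corrected combination step
(variable names are illustrative, not the actual patch code):

Selectivity sel = 0.0;

for (int i = 0; i < nestimates; i++)
{
    /* OR formula: P(A or B) = P(A) + P(B) - P(A) * P(B) */
    sel = sel + stat_sel[i] - sel * stat_sel[i];
}

/* if no extended stats could be applied, nestimates is 0 and sel stays 0 */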


It might also be worth adding a couple more regression test cases like these.


Agreed, 0002 adds a couple of relevant tests.

Incidentally, I've been working on improving test coverage for extended
stats over the past few days (it has ~80% of lines covered, which is
neither bad nor great). I haven't submitted that to hackers yet, because
it's mostly mechanical and it would interfere with the two existing
threads about extended stats ...

Speaking of which, would you take a look at [1]? I think supporting SAOP
is fine, but I wonder if you agree with my conclusion we can't really
support inclusion @> as explained in [2].

[1] https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy@pierred-pdoc
[2] 
https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Additional improvements to extended statistics

2020-03-08 Thread Tomas Vondra

On Mon, Mar 09, 2020 at 01:01:57AM +0100, Tomas Vondra wrote:

On Sun, Mar 08, 2020 at 07:17:10PM +, Dean Rasheed wrote:

On Fri, 6 Mar 2020 at 12:58, Tomas Vondra  wrote:


Here is a rebased version of this patch series. I've polished the first
two parts a bit - estimation of OR clauses and (Var op Var) clauses.



Hi,

I've been looking over the first patch (OR list support). It mostly
looks reasonable to me, except there's a problem with the way
statext_mcv_clauselist_selectivity() combines multiple stat_sel values
into the final result -- in the OR case, it needs to start with sel =
0, and then apply the OR formula to factor in each new estimate. I.e.,
this isn't right for an OR list:

  /* Factor the estimate from this MCV to the oveall estimate. */
  sel *= stat_sel;

(Oh and there's a typo in that comment: s/oveall/overall/).

For example, with the regression test data, this isn't estimated well:

SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;

Similarly, if no extended stats can be applied it needs to return 0
not 1, for example this query on the test data:

SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;



Ah, right. Thanks for noticing this. Attached is an updated patch series
with parts 0002 and 0003 adding tests demonstrating the issue and then
fixing it (both shall be merged to 0001).



One day I won't forget to actually attach the files ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 4a0807b7e8ec511d72361598c390c158dc76e1a0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sun, 8 Mar 2020 23:26:50 +0100
Subject: [PATCH 1/4] Support using extended stats for parts of OR clauses

---
 src/backend/optimizer/path/clausesel.c| 109 +++---
 src/backend/statistics/extended_stats.c   |  45 +++-
 src/backend/statistics/mcv.c  |   5 +-
 .../statistics/extended_stats_internal.h  |   3 +-
 src/include/statistics/statistics.h   |   3 +-
 src/test/regress/expected/stats_ext.out   |   3 +-
 src/test/regress/sql/stats_ext.sql|   1 -
 7 files changed, 138 insertions(+), 31 deletions(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index a3ebe10592..8c1a404ce2 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -92,7 +92,7 @@ clauselist_selectivity(PlannerInfo *root,
 */
 		s1 *= statext_clauselist_selectivity(root, clauses, varRelid,
 											 jointype, sjinfo, rel,
-											 &estimatedclauses);
+											 &estimatedclauses, false);
}
 
/*
@@ -104,6 +104,89 @@ clauselist_selectivity(PlannerInfo *root,

  estimatedclauses);
 }
 
+/*
+ * clauselist_selectivity_or -
+ *   Compute the selectivity of an implicitly-ORed list of boolean
+ *   expression clauses.  The list can be empty, in which case 0.0
+ *   must be returned.  List elements may be either RestrictInfos
+ *   or bare expression clauses --- the former is preferred since
+ *   it allows caching of results.
+ *
+ * See clause_selectivity() for the meaning of the additional parameters.
+ *
+ * The basic approach is to apply extended statistics first, on as many
+ * clauses as possible, in order to capture cross-column dependencies etc.
+ * The remaining clauses are then estimated using regular statistics tracked
+ * for individual columns.  This is done by simply passing the clauses to
+ * clauselist_selectivity and then combining the selectivities using the
+ * regular formula (s1+s2 - s1*s2).
+ */
+static Selectivity
+clauselist_selectivity_or(PlannerInfo *root,
+ List *clauses,
+ int varRelid,
+ JoinType jointype,
+ SpecialJoinInfo *sjinfo)
+{
+   ListCell   *lc;
+   Selectivity s1 = 0.0;
+   RelOptInfo *rel;
+   Bitmapset  *estimatedclauses = NULL;
+   int listidx;
+
+	/*
+	 * Determine if these clauses reference a single relation.  If so, and if
+	 * it has extended statistics, try to apply those.
+	 */
+   rel = find_single_rel_for_clauses(root, clauses);
+   if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
+   {
+		/*
+		 * Estimate as many clauses as possible using extended statistics.
+		 *
+		 * 'estimatedc

Nicer error when connecting to standby with hot_standby=off

2020-03-08 Thread James Coleman
I recently noticed while setting up a test environment that attempting to
connect to a standby running without hot_standby=on results in a fairly
generic error (I believe "the database system is starting up"). I don't
have my test setup running right now, so can't confirm with a repro case at
the moment, but with a little bit of spelunking I noticed that error text
only shows up in src/backend/postmaster/postmaster.c when
port->canAcceptConnections has the value CAC_STARTUP.
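
For context, this is roughly what that branch looks like (a simplified
sketch, not the verbatim postmaster.c source):

switch (port->canAcceptConnections)
{
    case CAC_STARTUP:
        ereport(FATAL,
                (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                 errmsg("the database system is starting up")));
        break;
    /* ... other CAC_* cases ... */
}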

Ideally the error message would include something along the lines of "The
server is running as a standby but cannot accept connections with
hot_standby=off".

I wanted to get some initial feedback on the idea before writing a patch:
does that seem like a reasonable change? Is it actually plausible to
distinguish between this state and "still recovering" (i.e., when starting
up a hot standby but initial recovery hasn't completed so it legitimately
can't accept connections yet)? If so, should we include the possibility if
hot_standby isn't on, just in case?

The enum value CAC_STARTUP is defined in src/include/libpq/libpq-be.h,
which makes me wonder if changing this value would result in a wire
protocol change/something the client wants to know about? If so, I assume
it's not reasonable to change the value, but would it still be reasonable
to change the error text?

Thanks,
James Coleman


Re: Use compiler intrinsics for bit ops in hash

2020-03-08 Thread Jesse Zhang
Hi David,
On Sun, Mar 8, 2020 at 11:34 AM David Fetter  wrote:
>
> On Mon, Mar 02, 2020 at 12:45:21PM -0800, Jesse Zhang wrote:
> > Hi David,
>
> Per discussion on IRC with Andrew (RhodiumToad) Gierth:
>
> The runtime detection means there's always an indirect call overhead
> and no way to inline.  This is counter to what using compiler
> intrinsics is supposed to do.
>
> It's better to rely on the compiler, because:
> (a) The compiler often knows whether the value can or can't be 0 and
> can therefore skip a conditional jump.

Yes, the compiler would know to eliminate the branch if the inlined
function is called with a literal argument, or it infers an invariant
from the context (like nesting inside a conditional block, or a previous
conditional "noreturn" path).
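
A minimal illustration of that point (the function name is made up for
the example):

static inline int
my_clz64(uint64_t word)
{
    /*
     * If the caller passes a literal, or this call is dominated by an
     * "if (word != 0)" test, the compiler proves the guard dead and
     * emits the count instruction alone.
     */
    if (word == 0)
        return 64;
    return __builtin_clzll(word);
}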

> (b) If you're targeting a recent microarchitecture, the compiler can
> just use the right instruction.

I might be more conservative than you are on (b). The thought of
building a binary that cannot run "somewhere" that the compiler's
default target would otherwise support still mortifies me.

> (c) Even if the conditional branch is left in, it's not a big overhead.
>

I 100% agree with (c), see benchmarking results upthread.

Cheers,
Jesse




Re: More tests to stress directly checksum_impl.h

2020-03-08 Thread Michael Paquier
On Sun, Mar 08, 2020 at 03:12:11PM -0400, Tom Lane wrote:
> After thinking more I concluded that the extra expected files would
> just be a waste of tarball space, at least till such time as we make
> a push to fix all the regression tests to be blocksize-independent.

Makes sense.

> Pushed it with just the 8K files.

Thanks!
--
Michael




Re: PG_COLOR not mentioned in docs of vacuumlo, oid2name and pgbench

2020-03-08 Thread Michael Paquier
On Sat, Mar 07, 2020 at 10:09:23AM +0900, Michael Paquier wrote:
> Thanks to both of you for the reviews.  Please note that I will
> mention the business with pg_ctl and logging in a new  thread and
> remove the diff of pg_ctl.c from the previous patch, and that the doc
> changes could be backpatched down to 12 for the relevant parts.  The
> documentation for PG_COLORS is still missing, but that's not new and I
> think that we had better handle that case separately by creating a new
> section in the docs.  For now, let's wait a couple of days and see if
> others have more thoughts to share about the doc patch of this thread.

Hearing nothing, done.  The part about pgbench with PGHOST, PGUSER and
PGPORT could go further down, but it has been like that for years so I
did not bother.
--
Michael




Re: Exposure related to GUC value of ssl_passphrase_command

2020-03-08 Thread Fujii Masao




On 2020/02/14 10:31, Moon, Insung wrote:

Dear Hackers.

Thank you for your response.
I registered this entry in the commitfest of 2020-03.
# I registered it in the security part; if that is wrong, I sincerely
apologize.

And I'd like to review which roles may view the ssl_* parameters later
and discuss it in a separate thread.


So, you are planning to start new discussion about this?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Exposure related to GUC value of ssl_passphrase_command

2020-03-08 Thread Fujii Masao




On 2020/03/06 16:20, keisuke kuroda wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

I tested the patch on the master branch (addd034) and it works fine.

I think that a test case verifying that a non-superuser can't see this
parameter is unnecessary.
There is a similar test for pg_read_all_settings role.

The new status of this patch is: Ready for Committer


Pushed! Thanks!

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: Asynchronous Append on postgres_fdw nodes.

2020-03-08 Thread movead li
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

I have tested the feature, and it shows great performance on queries
that return a small result set relative to the amount of data scanned.

Re: Fastpath while arranging the changes in LSN order in logical decoding

2020-03-08 Thread Dilip Kumar
On Sun, Mar 8, 2020 at 9:24 PM James Coleman  wrote:
>
> On Saturday, March 7, 2020, Dilip Kumar  wrote:
>>
>> On Sat, Mar 7, 2020 at 9:59 AM Dilip Kumar  wrote:
>> >
>> > On Sat, Mar 7, 2020 at 12:30 AM Andres Freund  wrote:
>> > >
>> > > Hi,
>> > >
>> > > On 2020-01-08 18:06:52 +0530, Dilip Kumar wrote:
>> > > > On Wed, 8 Jan 2020 at 5:28 PM, Heikki Linnakangas  
>> > > > wrote:
>> > > >
>> > > > > On 25/11/2019 05:52, Dilip Kumar wrote:
>> > > > > > In logical decoding, while sending the changes to the output 
>> > > > > > plugin we
>> > > > > > need to arrange them in the LSN order.  But, if there is only one
>> > > > > > transaction which is a very common case then we can avoid building 
>> > > > > > the
>> > > > > > binary heap.  A small patch is attached for the same.
>> > > > >
>> > > > > Does this make any measurable performance difference? Building a
>> > > > > one-element binary heap seems pretty cheap.
>> > > >
>> > > >
>> > > > I haven’t really measured the performance for this.  I will try to do 
>> > > > that
>> > > > next week.  Thanks for looking into this.
>> > >
>> > > Did you do that?
>> >
>> > I tried once in my local machine but could not produce consistent
>> > results.  I will try this once again in the performance machine and
>> > report back.
>>
>> I have tried to decode changes for the 100,000 small transactions and
>> measured the time with head vs patch.  I did not observe any
>> significant gain.
>>
>> Head
>> ---
>> 519ms
>> 500ms
>> 487ms
>> 501ms
>>
>> patch
>> --
>> 501ms
>> 492ms
>> 486ms
>> 489ms
>>
>> IMHO, if we conclude that because there is no performance gain so we
>> don't want to add one extra path in the code then we might want to
>> remove that TODO from the code so that we don't spend time for
>> optimizing this in the future.
>
>
> Would you be able to share your test setup? It seems like it'd be helpful to get 
> a larger sample size to better determine if it’s measurable or not. Visually 
> those 4 runs look to me like it’s possible, but objectively I’m not sure we 
> can yet conclude one way or the other.

Yeah, my test is very simple

CREATE TABLE t1 (a int, b int);
SELECT * FROM pg_create_logical_replication_slot('regression_slot',
'test_decoding');

--run 100,000 small transactions with pgbench
./pgbench -f test.sql -c 1 -j 1 -t 100000 -P 1 postgres;

--measure time to decode the changes
time ./psql -d postgres -c "select count(*) from
pg_logical_slot_get_changes('regression_slot', NULL, NULL);"

*test.sql is just one insert query like below
insert into t1 values(1,1);

I guess this should be the best case to test this patch because we are
decoding a lot of small transactions, but it seems the time taken for
creating the binary heap is quite small.
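
For reference, the shape of the fastpath being tested is roughly this
(field and callback names are assumptions, not the actual
reorderbuffer.c code):

if (state->nr_txns == 1)
{
    /*
     * Fastpath: a single transaction's changes are already in LSN
     * order, so no merge heap is needed.
     */
}
else
{
    state->heap = binaryheap_allocate(state->nr_txns,
                                      compare_changes_by_lsn,
                                      state);
    /* add one entry per transaction, then binaryheap_build() */
}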

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Some problems of recovery conflict wait events

2020-03-08 Thread Fujii Masao



On 2020/03/08 13:52, Masahiko Sawada wrote:

On Thu, 5 Mar 2020 at 20:16, Fujii Masao  wrote:




On 2020/03/05 16:58, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 15:21, Fujii Masao  wrote:




On 2020/03/04 14:31, Masahiko Sawada wrote:

On Wed, 4 Mar 2020 at 13:48, Fujii Masao  wrote:




On 2020/03/04 13:27, Michael Paquier wrote:

On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:

Yeah, so 0001 patch sets existing wait events to recovery conflict
resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
to the recovery conflict on a snapshot. 0003 patch improves these wait
events by adding the new type of wait event such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
is the fix for existing versions and 0003 patch is an improvement for
only PG13. Did you mean even 0001 patch doesn't fit for back-patching?


Yes, it looks like a improvement rather than bug fix.



Okay, understand.


I got my eyes on this patch set.  The full patch set is in my opinion
just a set of improvements, and not bug fixes, so I would refrain from
back-backpatching.


I think that the issue (i.e., "waiting" is reported twice needlessly
in PS display) that 0002 patch tries to fix is a bug. So it should be
fixed even in the back branches.


So we need only two patches: one fixes process title issue and another
improve wait event. I've attached updated patches.


Thanks for updating the patches! I started reading 0001 patch.


Thank you for reviewing this patch.



-   /*
-* Report via ps if we have been waiting for more than 
500 msec
-* (should that be configurable?)
-*/
-   if (update_process_title && new_status == NULL &&
-   TimestampDifferenceExceeds(waitStart, 
GetCurrentTimestamp(),
-   
   500))

The patch changes ResolveRecoveryConflictWithSnapshot() and
ResolveRecoveryConflictWithTablespace() so that they always add
"waiting" into the PS display, whether wait is really necessary or not.
But isn't it better to display "waiting" in PS basically when wait is
necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
does as the above?


You're right. Will fix it.



ResolveRecoveryConflictWithDatabase(Oid dbid)
{
+   char*new_status = NULL;
+
+   /* Report via ps we are waiting */
+   new_status = set_process_title_waiting();

In  ResolveRecoveryConflictWithDatabase(), there seems no need to
display "waiting" in PS because no wait occurs when recovery conflict
with database happens.


Isn't the startup process waiting for other backend to terminate?


Yeah, you're right. I agree that "waiting" should be reported in this case.

Currently ResolveRecoveryConflictWithLock() and
ResolveRecoveryConflictWithBufferPin() don't call
ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting"
in PS display. You changed them so that they report "waiting". I agree
to have this change. But this change is an improvement rather than
a bug fix, i.e., we should apply this change only in v13?



Did you mean ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin?


Yes! Sorry for my typo.


In the current code as far as I
researched there are two cases where we don't add "waiting" and one
case where we doubly add "waiting".

ResolveRecoveryConflictWithDatabase and
ResolveRecoveryConflictWithBufferPin don't update the ps title.
Although the path where GetCurrentTimestamp() >= ltime is false in
ResolveRecoveryConflictWithLock also doesn't update the ps title, it's
already updated in WaitOnLock. On the other hand, the path where
GetCurrentTimestamp() >= ltime is true in
ResolveRecoveryConflictWithLock updates the ps title but it's wrong as
I reported.

I've split the patch into two patches: 0001 patch fixes the issue
about doubly updating ps title, 0002 patch makes the recovery conflict
resolution on database and buffer pin update the ps title.


Thanks for splitting the patches. I think that 0001 patch can be back-patched.

-   /*
-* Report via ps if we have been waiting for more than 
500 msec
-* (should that be configurable?)
-*/
-   if (update_process_title && new_status == NULL &&
-   TimestampDifferenceExceeds(waitStart, 
GetCurrentTimestamp(),
-   
   500))

Originally, "waiting" is reported in PS if we've been waiting for more than
500 msec, as the above does. But you got rid of those codes in the patch.
Did you confirm that it's safe to do that? If not, isn't it better to apply
the attached patch? The attached patch makes
ResolveRecoveryConflictWithVirtualXIDs() report "waiting" as it d

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-08 Thread Amit Kapila
On Sat, Mar 7, 2020 at 9:19 PM Dilip Kumar  wrote:

> On Sat, Mar 7, 2020 at 8:53 PM Tom Lane  wrote:
> >
> > Dilip Kumar  writes:
> > > I think instead of the flag we need to keep the counter because we can
> > > acquire the same relation extension lock multiple times.
> >
> > Uh ... what?  How would that not be broken usage on its face?
>
> Basically, if we can ensure that while holding the relation extension
> lock we will not wait for any other lock, then we can ignore it in the
> deadlock detection path so that we don't detect a false deadlock due
> to the group locking mechanism.  So if we are already holding the
> relation extension lock and trying to acquire the same lock in the same
> mode, then it can never wait, so this is safe.
>
> > I continue to think that we'd be better off getting all of this
> > out of the heavyweight lock manager.  There is no reason why we
> > should need deadlock detection, or multiple holds of the same
> > lock, or pretty much anything that LWLocks don't give you.
>
> Right, we never need deadlock detection for this lock.  But, I think
> there are quite a few cases where we have multiple holds at the same
> time, e.g., during RelationAddExtraBlocks: while holding the relation
> extension lock we try to update the block in the FSM, and the FSM might
> need to add an extra FSM block, which will again try to acquire the same
> lock.
>
> But, I think the main reason for not converting it to an LWLocks is
> because Andres has a concern about inventing new lock mechanism as
> discuss upthread[1]
>
>
Right, that is one point and another is that if we go via the route of
converting it to LWLocks, then we also need to think of some solution for
page locks that are used in ginInsertCleanup.  However, if we go with the
approach being pursued [1] then the page locks will be handled in a similar
way as relation extension locks.

[1] -
https://www.postgresql.org/message-id/CAA4eK1%2BNjo%2BpnqSNi2ScKf0BcVBWWf37BrW-pykVSG0B0C5Qig%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Crash by targetted recovery

2020-03-08 Thread Kyotaro Horiguchi
At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao  
wrote in 
> > (It seems retroverting to the first patch when I started this...)
> > The second place covers wider cases so I reverted the first place.
> 
> Thanks for updating the patch that way.
> Not sure which patch you're mentioning, though.

That meant 0003.

> Regarding 0003 patch, I added a bit more detail comments into
> the patch so that we can understand the code more easily.
> Updated version of 0003 patch attached. Barring any objection,
> at first, I plan to commit this patch.

Looks good to me. Thanks for writing the detailed comments.

> >> + lastSourceFailed = false; /* We haven't failed on the new source */
> >>
> >> Is this really necessary? Since ReadRecord() always reset
> >> lastSourceFailed to false, it seems not necessary.
> > It's just to make sure.  Actually lastSourceFailed is always false
> > when we get there.  But when the source is switched, lastSourceFailed
> > should be changed to false as a matter of design.  I'd like to do that
> > unless that harms.
> 
> OK.

Thanks.

> >> -  else if (currentSource == 0)
> >> +  else if (currentSource == 0 ||
> >>
> >> Though this is a *separate topic*, 0 should be XLOG_FROM_ANY?
> >> There are some places where 0 is used as the value of currentSource.
> >> IMO they should be updated so that XLOG_FROM_ANY is used instead of 0.
> > Yeah, I've thought that many times but have neglected since it is not
> > critical and trivial as a separate patch.  I'd take the chance to do
> > that now.  Another minor glitch is "int oldSource = currentSource;" it
> > is not debugger-friendly so I changed it to XLogSource.  It is added
> > as a new patch file before the main patch.
> 
> There seems to be more other places where XLogSource and
> XLOG_FROM_XXX are not used yet. For example, the initial values
> of readSource and XLogReceiptSource, the type of argument
> "source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
> These also should be updated?

Right. I checked through the file and AFAICS that's all.  The attached
v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
master.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 358f37899a02a8ae6f803fe32b97c2cbe302786f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Mar 2020 10:11:59 +0900
Subject: [PATCH v5] Tidy up XLogSource usage.

We used integer 0 instead of XLOG_FROM_ANY and defined variables that
store the type as int. Tidy them up for readability and
debugger-friendliness.
---
 src/backend/access/transam/xlog.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 21c0acb740..9ebfbf31c5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -795,7 +795,7 @@ static int	readFile = -1;
 static XLogSegNo readSegNo = 0;
 static uint32 readOff = 0;
 static uint32 readLen = 0;
-static XLogSource readSource = 0;	/* XLOG_FROM_* code */
+static XLogSource readSource = XLOG_FROM_ANY;
 
 /*
  * Keeps track of which source we're currently reading from. This is
@@ -804,7 +804,7 @@ static XLogSource readSource = 0;	/* XLOG_FROM_* code */
  * attempt to read from currentSource failed, and we should try another source
  * next.
  */
-static XLogSource currentSource = 0;	/* XLOG_FROM_* code */
+static XLogSource currentSource = XLOG_FROM_ANY;
 static bool lastSourceFailed = false;
 
 typedef struct XLogPageReadPrivate
@@ -823,7 +823,7 @@ typedef struct XLogPageReadPrivate
  * XLogReceiptSource tracks where we last successfully read some WAL.)
  */
 static TimestampTz XLogReceiptTime = 0;
-static XLogSource XLogReceiptSource = 0;	/* XLOG_FROM_* code */
+static XLogSource XLogReceiptSource = XLOG_FROM_ANY;
 
 /* State information for XLOG reading */
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
@@ -886,8 +886,8 @@ static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
    bool find_free, XLogSegNo max_segno,
    bool use_lock);
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-		 int source, bool notfoundOk);
-static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
+		 XLogSource source, bool notfoundOk);
+static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
 		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
@@ -3633,7 +3633,7 @@ XLogFileOpen(XLogSegNo segno)
  */
 static int
 XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
-			 int source, bool notfoundOk)
+			 XLogSource source, bool notfoundOk)
 {
 	char		xlogfname[MAXFNAMELEN];
 	char		activitymsg[MAXFNAMELEN + 16];
@@ -3715,7 +3715,7 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
  * This version searches fo

Re: Some problems of recovery conflict wait events

2020-03-08 Thread Masahiko Sawada
On Mon, 9 Mar 2020 at 13:24, Fujii Masao  wrote:
>
>
>
> On 2020/03/08 13:52, Masahiko Sawada wrote:
> > On Thu, 5 Mar 2020 at 20:16, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/05 16:58, Masahiko Sawada wrote:
> >>> On Wed, 4 Mar 2020 at 15:21, Fujii Masao  
> >>> wrote:
> 
> 
> 
>  On 2020/03/04 14:31, Masahiko Sawada wrote:
> > On Wed, 4 Mar 2020 at 13:48, Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/03/04 13:27, Michael Paquier wrote:
> >>> On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
>  Yeah, so 0001 patch sets existing wait events to recovery conflict
>  resolution. For instance, it sets (PG_WAIT_LOCK | 
>  LOCKTAG_TRANSACTION)
>  to the recovery conflict on a snapshot. 0003 patch improves these 
>  wait
>  events by adding the new type of wait event such as
>  WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) 
>  patch
>  is the fix for existing versions and 0003 patch is an improvement for
>  only PG13. Did you mean even 0001 patch doesn't fit for 
>  back-patching?
> >>
> >> Yes, it looks like a improvement rather than bug fix.
> >>
> >
> > Okay, understand.
> >
> >>> I got my eyes on this patch set.  The full patch set is in my opinion
> >>> just a set of improvements, and not bug fixes, so I would refrain from
> >>> back-backpatching.
> >>
> >> I think that the issue (i.e., "waiting" is reported twice needlessly
> >> in PS display) that 0002 patch tries to fix is a bug. So it should be
> >> fixed even in the back branches.
> >
> > So we need only two patches: one fixes process title issue and another
> > improve wait event. I've attached updated patches.
> 
>  Thanks for updating the patches! I started reading 0001 patch.
> >>>
> >>> Thank you for reviewing this patch.
> >>>
> 
>  -   /*
>  -* Report via ps if we have been waiting for 
>  more than 500 msec
>  -* (should that be configurable?)
>  -*/
>  -   if (update_process_title && new_status == NULL &&
>  -   TimestampDifferenceExceeds(waitStart, 
>  GetCurrentTimestamp(),
>  -
>    500))
> 
>  The patch changes ResolveRecoveryConflictWithSnapshot() and
>  ResolveRecoveryConflictWithTablespace() so that they always add
>  "waiting" into the PS display, whether wait is really necessary or not.
>  But isn't it better to display "waiting" in PS basically when wait is
>  necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
>  does as the above?
> >>>
> >>> You're right. Will fix it.
> >>>
> 
>  ResolveRecoveryConflictWithDatabase(Oid dbid)
>  {
>  +   char*new_status = NULL;
>  +
>  +   /* Report via ps we are waiting */
>  +   new_status = set_process_title_waiting();
> 
>  In  ResolveRecoveryConflictWithDatabase(), there seems no need to
>  display "waiting" in PS because no wait occurs when recovery conflict
>  with database happens.
> >>>
> >>> Isn't the startup process waiting for other backend to terminate?
> >>
> >> Yeah, you're right. I agree that "waiting" should be reported in this case.
> >>
> >> Currently ResolveRecoveryConflictWithLock() and
> >> ResolveRecoveryConflictWithBufferPin() don't call
> >> ResolveRecoveryConflictWithVirtualXIDs and don't report "waiting"
> >> in PS display. You changed them so that they report "waiting". I agree
> >> to have this change. But this change is an improvement rather than
> >> a bug fix, i.e., we should apply this change only in v13?
> >>
> >
> > Did you mean ResolveRecoveryConflictWithDatabase and
> > ResolveRecoveryConflictWithBufferPin?
>
> Yes! Sorry for my typo.
>
> > In the current code as far as I
> > researched there are two cases where we don't add "waiting" and one
> > case where we doubly add "waiting".
> >
> > ResolveRecoveryConflictWithDatabase and
> > ResolveRecoveryConflictWithBufferPin don't update the ps title.
> > Although the path where GetCurrentTimestamp() >= ltime is false in
> > ResolveRecoveryConflictWithLock also doesn't update the ps title, it's
> > already updated in WaitOnLock. On the other hand, the path where
> > GetCurrentTimestamp() >= ltime is true in
> > ResolveRecoveryConflictWithLock updates the ps title but it's wrong as
> > I reported.
> >
> > I've split the patch into two patches: 0001 patch fixes the issue
> > about doubly updating ps title, 0002 patch makes the recovery conflict
> > resolution on database and buffer pin update the ps title.
>
> Thanks for splitting the patches. I think that 0001 patch can be back-patched.
>

Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-08 Thread Kyotaro Horiguchi
At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander  wrote 
in 
> On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao  
> wrote:
> > I believe that the time required to estimate the backup size is not so large
> > in most cases, so in the above idea, most users don't need to specify more
> > option for the estimation. This is good for UI perspective.
> >
> > OTOH, users who are worried about the estimation time can use
> > --no-estimate-backup-size option and skip the time-consuming estimation.
> 
> Personally, I think this is the best idea. it brings a "reasonable
> default", since most people are not going to have this problem, and
> yet a good way to get out from the issue for those that potentially
> have it. Especially since we are now already showing the state that
> "walsender is estimating the size", it should be easy enugh for people
> to determine if they need to use this flag or not.
> 
> In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
> pretty clear things are about backups when you call pg_basebackup, and
> it keeps the option a bit more reasonable in length.

I agree with the negative option and the shortened name.  What if both
--no-estimate-size and -P are specified?  Reject them as conflicting
options, or let -P supersede?  I would choose the former because we don't
know which of them should have priority.

$ pg_basebackup --no-estimate-size -P
pg_basebackup: -P requires size estimate.
$ 
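
A sketch of how the rejection could look in pg_basebackup's option
handling (the variable names are assumptions):

if (showprogress && !estimatesize)
{
    pg_log_error("--progress and --no-estimate-size are incompatible options");
    exit(1);
}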

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-08 Thread Amit Kapila
On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Mon, 24 Feb 2020 at 19:08, Amit Kapila  wrote:
> >
> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund 
> wrote:
> > >
> > > Hi,
> > >
> > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote:
> > > > I think till we know the real need for changing group locking, going
> > > > in the direction of what Tom suggested to use an array of LWLocks [1]
> > > > to address the problems in hand is a good idea.
> > >
> > > -many
> > >
> > > I think that building yet another locking subsystem is the entirely
> > > wrong idea - especially when there's imo no convincing architectural
> > > reasons to do so.
> > >
> >
> > Hmm, AFAIU, it will be done by having an array of LWLocks which we do
> > at other places as well (like BufferIO locks).  I am not sure if we
> > can call it as new locking subsystem, but if we decide to continue
> > using lock.c and change group locking then I think we can do that as
> > well, see my comments below regarding that.
> >
> > >
> > > > It is not very clear to me that are we thinking to give up on Tom's
> > > > idea [1] and change group locking even though it is not clear or at
> > > > least nobody has proposed an idea/patch which requires that?  Or are
> > > > we thinking that we can do what Tom suggested for relation extension
> > > > lock and also plan to change group locking for future parallel
> > > > operations that might require it?
> > >
> > > What I'm advocating is that extension locks should continue to go
> > > through lock.c. And yes, that requires some changes to group locking,
> > > but I still don't see why they'd be complicated.
> > >
> >
> > Fair position, as per initial analysis, I think if we do below three
> > things, it should work out without changing to a new way of locking
> > for relation extension or page type locks.
> > a. As per the discussion above, ensure in code we will never try to
> > acquire another heavy-weight lock after acquiring relation extension
> > or page type locks (probably by having Asserts in code or maybe some
> > other way).
>
> The current patch
> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
> doesn't cover acquiring a heavy-weight lock after a page type lock,
> is that right?


No, it should do that.


> There is the path doing that: ginInsertCleanup() holds
> a page lock and inserts the pending list items, which might take a
> relation extension lock.
>

Right, I could also see that, but do you see any problem with that?  I
agree that Assert should cover this case, but I don't see any fundamental
problem with that.
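
To illustrate the quoted point (a), a hypothetical sketch of such an
assertion (flag and function names are assumptions, not the actual
patch):

static bool IsRelationExtensionLockHeld = false;
static bool IsPageLockHeld = false;

static void
AssertNoRelextPageLockHeld(const LOCKTAG *locktag)
{
    /* re-acquiring the same extension/page lock is fine */
    if (locktag->locktag_type == LOCKTAG_RELATION_EXTEND ||
        locktag->locktag_type == LOCKTAG_PAGE)
        return;

    /* no other heavyweight lock while holding either of these */
    Assert(!IsRelationExtensionLockHeld && !IsPageLockHeld);
}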

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: pg_stat_progress_basebackup - progress reporting for pg_basebackup, in the server side

2020-03-08 Thread Magnus Hagander
On Sun, Mar 8, 2020 at 10:13 PM Kyotaro Horiguchi
 wrote:
>
> At Fri, 6 Mar 2020 09:54:09 -0800, Magnus Hagander  
> wrote in
> > On Fri, Mar 6, 2020 at 1:51 AM Fujii Masao  
> > wrote:
> > > I believe that the time required to estimate the backup size is not so 
> > > large
> > > in most cases, so in the above idea, most users don't need to specify more
> > > option for the estimation. This is good for UI perspective.
> > >
> > > OTOH, users who are worried about the estimation time can use
> > > --no-estimate-backup-size option and skip the time-consuming estimation.
> >
> > Personally, I think this is the best idea. it brings a "reasonable
> > default", since most people are not going to have this problem, and
> > yet a good way to get out from the issue for those that potentially
> > have it. Especially since we are now already showing the state that
> > "walsender is estimating the size", it should be easy enugh for people
> > to determine if they need to use this flag or not.
> >
> > In nitpicking mode, I'd just call the flag --no-estimate-size -- it's
> > pretty clear things are about backups when you call pg_basebackup, and
> > it keeps the option a bit more reasonable in length.
>
> I agree with the negative option and the shortened name.  What if both
> --no-estimate-size and -P are specified?  Reject them as conflicting
> options, or let -P supersede?  I would choose the former because we don't
> know which of them should have priority.

I would definitely prefer rejecting an invalid combination of options.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Exposure related to GUC value of ssl_passphrase_command

2020-03-08 Thread Moon, Insung
Dear Kuroda-san, Fujii-san
Thank you for review and commit!
# Oops, sorry. This mail thread got caught in Gmail's spam filter.

I'll submit a new discussion once I find out which cases could leak
the GUC parameters related to ssl_*.
Please wait a bit.

Best regards.
Moon.

On Mon, Mar 9, 2020 at 11:43 AM Fujii Masao  wrote:
>
>
>
> On 2020/02/14 10:31, Moon, Insung wrote:
> > Dear Hackers.
> >
> > Thank you for your response.
> > I registered this entry in the commitfest of 2020-03.
> > # I registered it in the security part; if that is wrong, I sincerely
> > apologize.
> >
> > And I'd like to review which roles may view the ssl_* parameters later
> > and discuss it in a separate thread.
>
> So, you are planning to start new discussion about this?
>
> Regards,
>
> --
> Fujii Masao
> NTT DATA CORPORATION
> Advanced Platform Technology Group
> Research and Development Headquarters




Re: logical copy_replication_slot issues

2020-03-08 Thread Masahiko Sawada
On Fri, 6 Mar 2020 at 20:02, Arseny Sher  wrote:
>
> I wrote:
>
> > It looks good to me now.
>
> After this sat in my head for some time, I remembered that
> CreateInitDecodingContext not only pegs the LSN but also xmin, so the
> attached makes a minor comment correction.
>
> While taking a look at the nearby code it seemed weird to me that
> GetOldestSafeDecodingTransactionId checks PGXACT->xid, not xmin. Don't
> want to investigate this at the moment though, and not for this thread.
>
> Also not for this thread, but I've noticed
> pg_copy_logical_replication_slot doesn't allow changing the plugin name,
> which is an omission in my view. It would be useful and trivial to do.
>

Thank you for updating the patch. The patch looks basically good to me
but I have a few questions:

/*
-* Create logical decoding context, to build the initial snapshot.
+* Create logical decoding context to find start point or, if we don't
+* need it, to 1) bump slot's restart_lsn and xmin 2) check plugin sanity.
 */

Do we need the numbering here, given that the comment doesn't refer to the numbers?

ctx = CreateInitDecodingContext(plugin, NIL,
-   false,  /* do not build snapshot */
+   false,  /* do not build data snapshot */
restart_lsn,
logical_read_local_xlog_page, NULL, NULL,
NULL);

I'm not sure this change makes the comment better. Could you elaborate
on the motivation of this change?

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: reindex concurrently and two toast indexes

2020-03-08 Thread Michael Paquier
On Fri, Mar 06, 2020 at 01:36:48PM +0100, Julien Rouhaud wrote:
> Ah I see, thanks for the clarification.  I guess there's room for improvement
> in the comments about that, since the ERRCODE_FEATURE_NOT_SUPPORTED usage is
> quite misleading there.
> 
> v4 attached, which doesn't prevent a REINDEX INDEX CONCURRENTLY on any invalid
> non-TOAST index anymore.

Thanks.  The position of the error check in reindex_relation() is
correct, but as it opens a relation for the cache lookup let's invent
a new routine in lsyscache.c to grab pg_index.indisvalid.  It is
possible to make use of this routine with all the other checks:
- WARNING for REINDEX TABLE (non-concurrent)
- ERROR for REINDEX INDEX (non-concurrent)
- ERROR for REINDEX INDEX CONCURRENTLY
(There is already a WARNING for REINDEX TABLE CONCURRENTLY.)
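
The new lsyscache.c routine would presumably follow the pattern of the
existing get_* helpers, roughly:

bool
get_index_isvalid(Oid index_oid)
{
    bool        isvalid;
    HeapTuple   tuple;
    Form_pg_index rd_index;

    tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(index_oid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for index %u", index_oid);

    rd_index = (Form_pg_index) GETSTRUCT(tuple);
    isvalid = rd_index->indisvalid;
    ReleaseSysCache(tuple);

    return isvalid;
}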

I did not find the addition of an error check in ReindexIndex()
consistent with the existing practice to check the state of the
relation reindexed in reindex_index() (for the non-concurrent case)
and ReindexRelationConcurrently() (for the concurrent case).  Okay,
this leads to the introduction of two new ERROR messages related to
invalid toast indexes for the concurrent and the non-concurrent cases
when using REINDEX INDEX instead of one, but having two messages leads
to something much more consistent with the rest, and all checks remain
centralized in the same routines.

For the index-level operation, issuing a WARNING is not consistent
with the existing practice of using an ERROR, which is more appropriate
as the operation is done on a single index at a time.

For the check in reindex_relation, it is more consistent to check the
namespace of the index instead of the parent relation IMO (the
previous patch used "rel", which refers to the parent table).  This
has in practice no consequence though.

It would have been nice to test this stuff.  However, this requires an
invalid toast index and we cannot create that except by canceling a
concurrent reindex, leading us back to the upthread discussion about
isolation tests, timeouts and fault injection :/

Any opinions?
--
Michael
From 2e808a2971fcaf160f6a1e6c9c80366ff053bccf Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 9 Mar 2020 14:43:33 +0900
Subject: [PATCH v2] Forbid reindex of invalid indexes on TOAST tables

Such indexes can only be duplicated leftovers of a previously failed
REINDEX CONCURRENTLY command, and a valid equivalent is guaranteed to
exist already.  As we only allow the drop of invalid indexes on TOAST
tables, reindexing these would lead to useless duplicated indexes that
can't be dropped anymore.

Thanks to Justin Pryzby for reminding that this problem was reported
long ago, but it has never been addressed.

Reported-by: Sergei Kornilov, Justin Pryzby
Author: Julien Rouhaud
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
Discussion: https://postgr.es/m/20200216190835.ga21...@telsasoft.com
Backpatch-through: 12
---
 src/include/utils/lsyscache.h   |  1 +
 src/backend/catalog/index.c | 28 
 src/backend/commands/indexcmds.c| 12 
 src/backend/utils/cache/lsyscache.c | 23 +++
 4 files changed, 64 insertions(+)

diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index f132d39458..131d10eab0 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -181,6 +181,7 @@ extern char *get_namespace_name_or_temp(Oid nspid);
 extern Oid	get_range_subtype(Oid rangeOid);
 extern Oid	get_range_collation(Oid rangeOid);
 extern Oid	get_index_column_opclass(Oid index_oid, int attno);
+extern bool get_index_isvalid(Oid index_oid);
 
 #define type_is_array(typid)  (get_element_type(typid) != InvalidOid)
 /* type_is_array_domain accepts both plain arrays and domains over arrays */
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7223679033..ec2d7dc9cb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -46,6 +46,7 @@
 #include "catalog/pg_depend.h"
 #include "catalog/pg_description.h"
 #include "catalog/pg_inherits.h"
+#include "catalog/pg_namespace_d.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
 #include "catalog/pg_tablespace.h"
@@ -3474,6 +3475,17 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot reindex temporary tables of other sessions")));
 
+	/*
+	 * Don't allow reindex on an invalid toast index.  This is a leftover from
+	 * a failed REINDEX CONCURRENTLY, and if rebuilt it would not be possible
+	 * to drop it anymore.
+	 */
+	if (RelationGetNamespace(iRel) == PG_TOAST_NAMESPACE &&
+		!get_index_isvalid(indexId))
+		ereport(ERROR,
+(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex invalid toast index")));
+
 	/*
 	 * Also check for active uses of the index in the curre

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-08 Thread Masahiko Sawada
On Mon, 9 Mar 2020 at 14:16, Amit Kapila  wrote:
>
> On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada 
>  wrote:
>>
>> On Mon, 24 Feb 2020 at 19:08, Amit Kapila  wrote:
>> >
>> > On Thu, Feb 20, 2020 at 8:06 AM Andres Freund  wrote:
>> > >
>> > > Hi,
>> > >
>> > > On 2020-02-19 11:12:18 +0530, Amit Kapila wrote:
>> > > > I think till we know the real need for changing group locking, going
>> > > > in the direction of what Tom suggested to use an array of LWLocks [1]
>> > > > to address the problems in hand is a good idea.
>> > >
>> > > -many
>> > >
>> > > I think that building yet another locking subsystem is the entirely
>> > > wrong idea - especially when there's imo no convincing architectural
>> > > reasons to do so.
>> > >
>> >
>> > Hmm, AFAIU, it will be done by having an array of LWLocks which we do
>> > at other places as well (like BufferIO locks).  I am not sure if we
>> > can call it as new locking subsystem, but if we decide to continue
>> > using lock.c and change group locking then I think we can do that as
>> > well, see my comments below regarding that.
>> >
>> > >
>> > > > It is not very clear to me that are we thinking to give up on Tom's
>> > > > idea [1] and change group locking even though it is not clear or at
>> > > > least nobody has proposed an idea/patch which requires that?  Or are
>> > > > we thinking that we can do what Tom suggested for relation extension
>> > > > lock and also plan to change group locking for future parallel
>> > > > operations that might require it?
>> > >
>> > > What I'm advocating is that extension locks should continue to go
>> > > through lock.c. And yes, that requires some changes to group locking,
>> > > but I still don't see why they'd be complicated.
>> > >
>> >
>> > Fair position, as per initial analysis, I think if we do below three
>> > things, it should work out without changing to a new way of locking
>> > for relation extension or page type locks.
>> > a. As per the discussion above, ensure in code we will never try to
>> > acquire another heavy-weight lock after acquiring relation extension
>> > or page type locks (probably by having Asserts in code or maybe some
>> > other way).
>>
>> The current patch
>> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
>> doesn't cover acquiring a heavy-weight lock after a page type lock,
>> is that right?
>
>
> No, it should do that.
>
>>
>> There is the path doing that: ginInsertCleanup() holds
>> a page lock and inserts the pending list items, which might take a
>> relation extension lock.
>
>
> Right, I could also see that, but do you see any problem with that?  I agree 
> that Assert should cover this case, but I don't see any fundamental problem 
> with that.

I think that could be a problem if we change the group locking so that
it doesn't consider page lock type.

Regards,

--
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Crash by targetted recovery

2020-03-08 Thread Fujii Masao




On 2020/03/09 13:49, Kyotaro Horiguchi wrote:

At Sat, 7 Mar 2020 01:46:16 +0900, Fujii Masao  
wrote in

(It seems retroverting to the first patch when I started this...)
The second place covers wider cases so I reverted the first place.


Thanks for updating the patch that way.
Not sure which patch you're mentioning, though.


That meant 0003.


Regarding 0003 patch, I added a bit more detail comments into
the patch so that we can understand the code more easily.
Updated version of 0003 patch attached. Barring any objection,
at first, I plan to commit this patch.


Looks good to me. Thanks for writing the detailed comments.


Thanks for the review! Pushed.

I will review other two patches later.


There seems to be more other places where XLogSource and
XLOG_FROM_XXX are not used yet. For example, the initial values
of readSource and XLogReceiptSource, the type of argument
"source" in XLogFileReadAnyTLI() and XLogFileRead(), etc.
These also should be updated?


Right. I checked through the file and AFAICS that's all.  The attachec
v5-0001-Tidy...patch is the fix on top of the v4-0003 on the current
master.


Thanks for the patch!

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-08 Thread Amit Kapila
On Mon, Mar 9, 2020 at 11:38 AM Masahiko Sawada <
masahiko.saw...@2ndquadrant.com> wrote:

> On Mon, 9 Mar 2020 at 14:16, Amit Kapila  wrote:
> >
> > On Sun, Mar 8, 2020 at 7:58 AM Masahiko Sawada <
> masahiko.saw...@2ndquadrant.com> wrote:
> >> >
> >> > Fair position, as per initial analysis, I think if we do below three
> >> > things, it should work out without changing to a new way of locking
> >> > for relation extension or page type locks.
> >> > a. As per the discussion above, ensure in code we will never try to
> >> > acquire another heavy-weight lock after acquiring relation extension
> >> > or page type locks (probably by having Asserts in code or maybe some
> >> > other way).
> >>
> >> The current patch
> >> (v02_0001-Added-assert-to-verify-that-we-never-try-to-take-any.patch)
> >> doesn't cover acquiring a heavy-weight lock after a page type lock,
> >> is that right?
> >
> >
> > No, it should do that.
> >
> >>
> >> There is the path doing that: ginInsertCleanup() holds
> >> a page lock and inserts the pending list items, which might take a
> >> relation extension lock.
> >
> >
> > Right, I could also see that, but do you see any problem with that?  I
> agree that Assert should cover this case, but I don't see any fundamental
> problem with that.
>
> I think that could be a problem if we change the group locking so that
> it doesn't consider page lock type.
>

I might be missing something, but won't that be a problem only if
there is a case where we acquire a page lock after acquiring a relation
extension lock?  Can you please explain the scenario you have in mind which
can create a problem?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com