Pedagogical example for KNN usage in GiST indexes?

2020-09-27 Thread Esteban Zimanyi
Dear all

Is there a pedagogical example showing KNN processing in GiST indexes? I
am looking for something equivalent to the example in file geo_spgist.c.

Thanks for your help

Esteban


Re: Get rid of runtime handling of AlternativeSubPlan?

2020-09-27 Thread David Rowley
On Sun, 27 Sep 2020 at 10:03, Tom Lane  wrote:
>
> Thanks for reviewing!
>
> David Rowley  writes:
> > 1. I think we should be moving away from using linitial() and lsecond()
> > when we know there are two items in the list. Using list_nth() has
> > less overhead.
>
> Uh, really?

Yeah. Using linitial() and lsecond() will check that the list is not
NIL, and lsecond() does an additional check to ensure the list has at
least two elements. Neither check is required, since we already know
the list has two elements.
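
For reference, the relevant definitions look roughly like this (paraphrased
from pg_list.h, not copied verbatim):

static inline ListCell *
list_second_cell(const List *l)
{
	if (l && l->length >= 2)
		return &l->elements[1];
	else
		return NULL;
}

#define lfirst(lc)	((lc)->ptr_value)
#define lsecond(l)	lfirst(list_second_cell(l))

So if the list were too short, list_second_cell() would return NULL and
lfirst() would dereference a NULL pointer anyway; the checks add branches
without adding safety.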

>  And if it's true, why would we change all the call sites
> rather than improving the pg_list.h macros?

Maybe we should. Despite the non-NIL check and length check in the
list_head(), list_second_cell(), and list_third_cell() functions, the
corresponding macro will crash anyway if those functions were to
return NULL.  We might as well just use list_nth_cell() to get the
ListCell without any checks to see if the cell exists.  I can go off
and fix those separately. I attached a 0004 patch to help explain what
I'm talking about.

> > 2. I did have slight concerns that fix_alternative_subplan() always
> > assumes the list of subplans will have exactly two elements, though on
> > looking at the definition of AlternativeSubPlan, I see always having two
> > in the list is mentioned. It feels like fix_alternative_subplan() wouldn't
> > become much more complex to allow any non-zero number of subplans, but
> > maybe making that happen should wait until there is some need for more
> > than two. It just feels a bit icky to have to document the special
> > case when not having the special case is not that hard to implement.
>
> It seemed to me that dealing with the general case would make
> fix_alternative_subplan() noticeably more complex and less obviously
> correct.  I might be wrong though; what specific coding did you have in
> mind?

I had thought of something like 0003 (attached). It's a net reduction of
3 entire lines, including the removal of the comment that explained
that there are always two in the list.

> > On a side note, I was playing around with the following case:
> > ...
> > both master and patched seem to not choose to use the hashed subplan
> > which results in a pretty slow execution time. This seems to be down
> > to cost_subplan() doing:
> >   /* we only need to fetch 1 tuple; clamp to avoid zero divide */
> >   sp_cost.per_tuple += plan_run_cost / clamp_row_est(plan->plan_rows);
> > I imagine / 2 might be more realistic to account for the early abort,
> > which is pretty much what the ALL_SUBLINK and ANY_SUBLINK do just
> > below:
>
> Hm, actually isn't it the other way around?  *If* there are any matching
> rows, then what's being done here is an accurate estimate.  But if there
> are not, we're going to have to scan the entire subquery output to verify
> that.  I wonder if we should just be taking the subquery cost at face
> value, ie be pessimistic not optimistic.  If the user is bothering to
> test EXISTS, we should expect that the no-match case does happen.
>
> However, I think that's a distinct concern from this patch; this patch
> is only meant to improve the processing of alternative subplans, not
> to change the costing rules around them.  If we fool with it I'd rather
> do so as a separate patch.

Yeah, agreed. I'll open another thread.

David


0004-Improve_pg_list_macros.patch
Description: Binary data


0003-Allow-any-number-of-alternative-subplans.patch
Description: Binary data


Re: Fwd: Extending range type operators to cope with elements

2020-09-27 Thread Esteban Zimanyi
Dear all

After a long time (as you can imagine, this year everything has been upside
down ...), you will find enclosed the patch for extending the range
operators so they can cope with range <op> element and element <op> range
in addition to the existing range <op> range.

Best regards

Esteban


Prof. Esteban Zimanyi
Department of Computer & Decision Engineering  (CoDE) CP 165/15
Universite Libre de Bruxelles
Avenue F. D. Roosevelt 50
B-1050 Brussels, Belgium
fax: + 32.2.650.47.13
tel: + 32.2.650.31.85
e-mail: ezima...@ulb.ac.be
Internet: http://cs.ulb.ac.be/members/esteban/


On Tue, Sep 17, 2019 at 5:18 AM David Fetter  wrote:

> On Sun, Sep 15, 2019 at 04:30:52PM +0200, Esteban Zimanyi wrote:
> > > So yes, I've had a need for those operators in the past. What I don't
> > know is whether adding these functions will be worth the catalog clutter.
> >
> > The operators are tested and running within MobilityDB. It concerns lines
> > 231-657 for the C code in file
> >
> https://github.com/MobilityDB/MobilityDB/blob/master/src/rangetypes_ext.c
> 
> >
> > and lines 32-248 for the SQL code in file
> >
> https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/sql/07_rangetypes_ext.in.sql
> >
> > Since you don't really use PRs, please let me know whether I can be of
> > any help.
>
> It's not done by pull request at this time. Instead, it is done by sending
> patches to this mailing list.
>
> http://wiki.postgresql.org/wiki/Development_information
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
> https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
> http://www.interdb.jp/pg/
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


range-ext.patch
Description: Binary data


Re: Get rid of runtime handling of AlternativeSubPlan?

2020-09-27 Thread Tom Lane
David Rowley  writes:
> On Sun, 27 Sep 2020 at 10:03, Tom Lane  wrote:
>> And if it's true, why would we change all the call sites
>> rather than improving the pg_list.h macros?

> Maybe we should. Despite the non-NIL check and length check in
> list_head(), list_second_cell(), list_third_cell() functions, the
> corresponding macro will crash anyway if those functions were to
> return NULL.

Hm, good point.

> We might as well just use list_nth_cell() to get the
> ListCell without any checks to see if the cell exists.  I  can go off
> and fix those separately. I attached a 0004 patch to help explain what
> I'm talking about.

Yeah, that should be dealt with separately.

>> It seemed to me that dealing with the general case would make
>> fix_alternative_subplan() noticeably more complex and less obviously
>> correct.  I might be wrong though; what specific coding did you have in
>> mind?

> I had thought something like 0003 (attached). It's a net reduction of
> 3 entire lines, including the removal of the comment that explained
> that there's always two in the list.

Meh.  This seems to prove my point, as it's in fact wrong; you are only
nulling out the discarded subplans-list entry in one of the two cases.
Once you fix that it's not really shorter anymore, nor clearer.  Still,
I suppose there's some value in removing the assumption about exactly
two subplans.

I'll fix up fix_alternative_subplan and push this.  I think the other
topics should be raised in separate threads.

Thanks for reviewing!

regards, tom lane




Re: Fwd: Extending range type operators to cope with elements

2020-09-27 Thread Tom Lane
Esteban Zimanyi  writes:
> After a long time (as you can imagine, this year everything has been upside
> down ...), you will find enclosed the patch for extending the range
> operators so they can cope with range <op> element and element <op> range
> in addition to the existing range <op> range.

Cool.  Please add this to the open commitfest list [1] to ensure we don't
lose track of it.

regards, tom lane

[1] https://commitfest.postgresql.org/30/




Re: AppendStringInfoChar instead of appendStringInfoString

2020-09-27 Thread Justin Pryzby
On Fri, Sep 25, 2020 at 08:49:57AM +0000, Hou, Zhijie wrote:
> In (/src/backend/replication/backup_manifest.c)
> 
> I found the following code:
> 
>   appendStringInfoString(&buf, "\n");
>   appendStringInfoString(&buf, "\"");

Good point.  There's another one:

$ git grep -E 'appendStringInfoString.*".{,1}");'
src/backend/utils/adt/ruleutils.c:  appendStringInfoString(buf, "(");
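
In both spots the single-character string could presumably just become the
char variant, along the lines of:

	appendStringInfoChar(&buf, '\n');
	appendStringInfoChar(&buf, '"');

and, for the ruleutils.c one:

	appendStringInfoChar(buf, '(');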

I noticed you added a similar thread here.
https://commitfest.postgresql.org/30/

I think this one could be combined as a single patchset with the existing CF
entry for the other thread.

https://www.postgresql.org/message-id/flat/cb172cf4361e4c7ba7167429070979d4%40G08CNEXMBPEKD05.g08.fujitsu.local
https://www.postgresql.org/message-id/ce9a8564ccf1435eacf14bb7364ae9de%40G08CNEXMBPEKD05.g08.fujitsu.local

-- 
Justin




Partition prune with stable Expr

2020-09-27 Thread Andy Fan
Hi:

I find we can't prune partitions in the planner if the qual is a stable
function.

CREATE TABLE measurement (
    city_id     int not null,
    logdate     date not null,
    peaktemp    int,
    unitsales   int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');

postgres=# explain (costs off) select * from measurement
postgres-# where logdate = to_date('2006-03-02', 'yyyy-mm-dd');
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Append
   Subplans Removed: 1  <-- Here
   ->  Seq Scan on measurement_y2006m03 measurement_1
         Filter: (logdate = to_date('2006-03-02'::text, 'yyyy-mm-dd'::text))
(4 rows)

IMO, we should do it. Why not?  The attached patch is meant to show what
I have in mind.

Btw, why is the to_date function declared as stable rather than immutable,
since it always delivers the same result for the same inputs?

-- 
Best Regards
Andy Fan


v1-0001-Allow-planner-prune-partitionn-with-stable-Expr.patch
Description: Binary data


Re: Partition prune with stable Expr

2020-09-27 Thread David Rowley
On Mon, 28 Sep 2020 at 08:59, Andy Fan  wrote:
> I find we can't prune partitions in the planner if the qual is a stable 
> function.

> IMO, we should do it. Why not?

Thanks for showing an interest in partition pruning. Unfortunately,
it's not possible to use stable functions to prune partitions during
planning.

NOW() is one example of a function that's stable, but the return value
will change over time. If we used the return value of that to perform
partition pruning then we'd end up with a plan that's wrong over time.

Here's an example:

create table rp (t timestamp) partition by range(t);
create table rp1 partition of rp for values from ('now'::timestamp) to
('now'::timestamp + '1 min'::interval);
create table rp2 partition of rp for values from ('now'::timestamp +
'1 min'::interval) to ('now'::timestamp + '2 min'::interval);
insert into rp select t from generate_Series('now'::timestamp,
'now'::timestamp + '1 min 59 sec'::interval, '1 sec'::interval) t;

prepare q1 as select count(*) from rp where t > now() and t < now() +
'10 sec'::interval;

Now, if you run the following command with your patch, it'll prune the
rp2 partition as it's not required for the WHERE clause (at the time
we planned). However, just wait 1 minute and execute the plan again.
Oops, my rows vanished!

execute q1; select pg_sleep(60); execute q1;

The 2nd execute should have returned 10 rows, the same as the first
(assuming you executed it directly after creating the tables).

Run-time partition pruning was invented just for this purpose.

David




Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

2020-09-27 Thread David Rowley
Over in [1] I complained to Tom that I thought he should use
list_nth() instead of linitial() and lsecond(). My reasoning was that
we already knew that the list contained 2 elements, and linitial() and
lsecond() do some additional checks and return NULL if there are fewer
elements than that in the list, or so I thought.  As it turns out,
because lfirst() dereferences the returned pointer, we'd just segfault
if the list were too short and we got NULL.

Since there appears to be no actual safety reason to use those macros
in case your list is too small, it seems better just to use
lfirst(list_nth_cell(..., ...)).  This saves doing the checks to see
if the list is NIL or too short.  It seems nice to get rid of that
additional branching.

We can't just make the macros call list_nth(), as some code still does things like:

linitial(list) = something;

and we obviously can't assign "something" to the return value of a
function call.

I've attached a patch which improves the macros.
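
For illustration, the new definitions are along these lines (a sketch only;
the attached patch is the authoritative version):

#define linitial(l)	lfirst(list_nth_cell(l, 0))
#define lsecond(l)	lfirst(list_nth_cell(l, 1))
#define lthird(l)	lfirst(list_nth_cell(l, 2))

Since lfirst() of a ListCell is an lvalue, assignments like the
linitial(list) = something case above keep working, and list_nth_cell()
only Asserts that the index is in range instead of branching at runtime.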

I see on gcc9.3 this reduces the size of the postgres binary by about
16KB: 9027296 to 9011320.

I'm a bit unsure about llast()'s new double evaluation of the list.
Perhaps I can add a new inline function named list_last_cell() to get
around that... Or maybe it doesn't matter. I'm not quite sure what's
best there.

David


[1] 
https://www.postgresql.org/message-id/caaphdvqeh8jeqmjpcftghd_zu2s03novh2srejd+snlza8m...@mail.gmail.com


0001-Improve_pg_list_macros.patch
Description: Binary data


Re: Partition prune with stable Expr

2020-09-27 Thread Andy Fan
Thank you David for coming:)

On Mon, Sep 28, 2020 at 4:46 AM David Rowley  wrote:

> On Mon, 28 Sep 2020 at 08:59, Andy Fan  wrote:
> > I find we can't prune partitions in the planner if the qual is a stable
> function.
>
> > IMO, we should do it. Why not?
>
> Thanks for showing an interest in partition pruning. Unfortunately,
> it's not possible to use stable functions to prune partitions during
> planning.
>
> NOW() is one example of a function that's stable, but the return value
> will change over time. If we used the return value of that to perform
> partition pruning then we'd end up with a plan that's wrong over time.
>
>
Sigh... I understand you now; I had ignored that the plan can be cached
for later use.  Without that, we should be able to prune with a stable
function.  I know run-time partition pruning can help with this, but it
can't help with planning time.
I have run into cases of SELECT * FROM p WHERE pkey = to_date(..); where
p has 1500+ partitions and planning takes a lot of time, and users are
not willing to remove the to_date('2018-11-11', 'yyyy-mm-dd') style code
since there is too much of it and they can't find all of it at once.
Actually, I think to_date should be marked as immutable rather than
stable.


-- 
Best Regards
Andy Fan


Re: Partition prune with stable Expr

2020-09-27 Thread Tom Lane
Andy Fan  writes:
> On Mon, Sep 28, 2020 at 4:46 AM David Rowley  wrote:
>> Thanks for showing an interest in partition pruning. Unfortunately,
>> it's not possible to use stable functions to prune partitions during
>> planning.

> Sigh.. I understand you now, I ignored the plan can be cached for later use.
> Without that,  we should be able to prune with stable function.

No, that's still wrong.  The contract for a stable function is that
its result won't change over execution of a single query; but that
says *execution*, not *planning and execution*.

In particular, the canonical example of a stable function is one
whose result depends on a database query.  The reason it can be
considered stable is that within a single outer query, the MVCC
snapshot it's used with won't change.  But we take a new snapshot
(later than the planner's snapshot) when beginning execution.

Somebody (Robert Haas, if memory serves, which it might not)
tried to change that a few years ago.  It blew up pretty well,
and was eventually reverted, because of undesirable side-effects
on user-visible query semantics.  You'd have to check the archives
for details.

It's possible that we could make that work differently in serializable
mode, thanks to the longer persistence of snapshots.  Not sure that
it'd be desirable for planning to work differently in serializable
mode, though.

regards, tom lane




Re: Parallel INSERT (INTO ... SELECT ...)

2020-09-27 Thread Greg Nancarrow
On Sun, Sep 27, 2020 at 2:03 AM vignesh C  wrote:
>
> I noticed that we are not having any check for skipping temporary
> table insertion.
>

> You should also include temporary tables check here, as parallel
> workers might not have access to temporary tables.
>

Thanks Vignesh, you are right, I need to test this and add it to the
list of further exclusions that the patch needs to check for.
Hopefully I can provide an updated patch soon that caters for these
additional identified cases.

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

2020-09-27 Thread Tom Lane
David Rowley  writes:
> I'm a bit unsure about llast()'s new double evaluation of the list.
> Perhaps I can add a new inline function named list_last_cell() to get
> around that... Or maybe it doesn't matter. I'm not quite sure what's
> best there.

Double evaluation bad, especially in a macro that has not had such a
hazard for the last twenty-plus years.

It might not be worth mucking with llast, as it's not used very heavily
I believe.  But if it is, then let's add another inline function.

regards, tom lane




Re: "cert" + clientcert=verify-ca in pg_hba.conf?

2020-09-27 Thread Kyotaro Horiguchi
Hello.

At Fri, 25 Sep 2020 13:30:06 -0400, Bruce Momjian  wrote in 
> On Thu, Sep 24, 2020 at 09:59:50PM -0400, Tom Lane wrote:
> > Kyotaro Horiguchi  writes:
> > > Thank you Bruce, Michael. This is a rebased version.
> > 
> > I really strongly object to all the encoded data in this patch.
> > One cannot read it, one cannot even easily figure out how long
> > it is until the tests break by virtue of the certificates expiring.

I thought the same, but the current source tree contains generated
certificates, perhaps for developers' convenience. This patch follows
that policy (if it is correct..). If certificate expiry matters, don't
we need to remove the certificates in the current tree?

(Anyway, we have experienced replacement of existing certificates due to
the obsolescence of a cipher algorithm, and we will face the same when
the current cipher algorithm becomes obsolete.)

> > One can, however, be entirely certain that they *will* break at
> > some point.  I don't like the idea of time bombs in our test suite.
> > That being the case, it'd likely be better to drop all the pre-made
> > certificates and have the test scripts create them on the fly.
> > That'd remove both the documentation problem (i.e., having readable
> > info as to how the certificates were made) and the expiration problem.
> 
> I am not planning to apply the test parts of this patch.  I think
> having the committer test it is sufficient.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Planner making bad choice in alternative subplan decision

2020-09-27 Thread David Rowley
This follows on from what I mentioned in [1]. I'll reiterate below:

Really the topic is much broader than what I wrote above, but the
example case I have is around subplan costing, which results in the
alternative subplan code making a bad choice.

So the example case is:

create table t (a int, b int);
insert into t select x,1 from generate_Series(1,10000)x;
create index t_b_idx on t (b);
analyze t;

The query is:

select * from t where exists (select 1 from t t2 where t.a=t2.b) or a < 0;

Here the planner has 3 choices:

1. Use a non-hashed subplan and use the t_b_idx to find if there's a
match for the EXISTS, or;
2. Use a hashed subplan and just scan all of t and put that in a hash
table to probe when scanning the outer query, or;
3. Use a seq-scan on the inner table and fetch 1 row matching t.a=t2.b
for each outer row.

#3 is not going to work out well here since t.a has the values from 1
to 10000 and t2.b just has one distinct value, which is 1. For 9999 of
the scans we'd have to seqscan all rows in the entire t2 to find out
that no row exists. This would be a terrible plan if the planner went
with that.

Unfortunately, that's the exact plan the planner goes with :-(

Let me explain:

The subquery planning that's going on is pretty much planning the
following. I'll use PREPARE and a generic plan to simulate how the
planner deals with unknown parameter values in the subquery. I'll add
a LIMIT 1 since the subquery_planner will be passed a 1.0
tuple_fraction when planning an EXISTS_SUBLINK.

set plan_cache_mode = force_generic_plan;
prepare q1(int) as select b from t where b = $1 limit 1;
postgres=# explain (analyze, costs off, timing off) execute q1(1);
                  QUERY PLAN
----------------------------------------------
 Limit (actual rows=1 loops=1)
   ->  Seq Scan on t (actual rows=1 loops=1)
         Filter: (b = $1)
 Planning Time: 0.035 ms
 Execution Time: 0.089 ms
(5 rows)

So here the planner has to plan with an unknown parameter value.
var_eq_non_const() comes along and sees that the stats say there's only
1 distinct value in the "b" column, so the selectivity is 1.0 / 1.0,
aka 1.0.  That works out well when I pass the value of 1 as the
parameter to execute, but if I pass in anything else, then:

postgres=# explain (analyze, costs off, timing off) execute q1(0);
                  QUERY PLAN
----------------------------------------------
 Limit (actual rows=0 loops=1)
   ->  Seq Scan on t (actual rows=0 loops=1)
         Filter: (b = $1)
         Rows Removed by Filter: 10000
 Planning Time: 0.017 ms
 Execution Time: 1.850 ms
(6 rows)

We run through the entire table to find nothing.

I removed the costs here to trim down the output, but the row estimate
for the scan is 10000 in both cases.

The reason that the alternative subplan code picks the non-hashed plan
is that the estimated number of rows for the subplan is 10000, and we
apply the scan cost in cost_subplan() as:

/* we only need to fetch 1 tuple; clamp to avoid zero divide */
sp_cost.per_tuple += plan_run_cost / clamp_row_est(plan->plan_rows);

On my machine the seqscan for the 10000 rows costs 180, so
cost_subplan() costs the subplan with: 180 / 10000. That's slightly
more favourable than the hashed subplan, since that one is planning the
seqscan plus the cost of hashing. The qual-less scan in this case costs
just 155, but the cost of hashing, coincidentally, takes the startup
cost to 180.  In the very recently committed fix_alternative_subplan()
function, by the time we account for the hash probe cost, the hashed
subplan will cost 205.  That code ends up choosing the non-hashed
plan, aka the dreaded #3 case from above.

Fixing it:

I'm not really quite sure which part of the planner to blame for this
poor choice.  Is it the fact that the planner picked such a risky
seqscan path to fetch 1 row from the table?  Or is it cost_subplan(),
for dividing the total cost for the EXISTS_SUBLINK by the row estimate?
Certainly, if we change that to do the same as ALL_SUBLINK, then that
would fix this particular problem, but it might penalise other cases
incorrectly.

I'm leaning more towards the fact that the planner picked the seqscan
path in the first place. Unfortunately, I don't have any great ideas on
how to fix that without fudging the costs a bit. Maybe we can make
var_eq_non_const() divide down the selectivity by:

ndistinct = Max(get_variable_numdistinct(vardata, &isdefault),
DEFAULT_NUM_DISTINCT);

instead of :

ndistinct = get_variable_numdistinct(vardata, &isdefault);

But that's going to make the row estimate in this case 50 rows (10000
/ 200), with a subplan that the stats say can only return either 1 or 0
rows. Is that too weird?

FWIW, the execution time for #3 is 7.4 seconds on my machine. Using
the hashed subplan (#2) takes 3.7 milliseconds, or 2000 times faster,
if you'd prefer that.

Anyone else have any thoughts on this?

David

(copied in Tom as he was the one I mentioned this to in [1] and to whom
I promised another thread for this)

[1] 
https://www.postgresql.org/message-id/ca

RE: Use appendStringInfoString and appendPQExpBufferStr where possible

2020-09-27 Thread Hou, Zhijie
> Good point.  There's another one:

> $ git grep -E 'appendStringInfoString.*".{,1}");'
> src/backend/utils/adt/ruleutils.c:  appendStringInfoString(buf, "(");

> I noticed you added a similar thread here.
> https://commitfest.postgresql.org/30/

> I think this one could be combined as a single patchset with the existing CF 
> entry for the other thread.

Thanks for your response; I've combined them into a single patchset now.

Best regards,





0001-Latest-appendStringInfoString-instead-of-appendStringInfo.patch
Description: 0001-Latest-appendStringInfoString-instead-of-appendStringInfo.patch


0001-appendStringInfoChar-instead-of-appendStringInfoString.patch
Description: 0001-appendStringInfoChar-instead-of-appendStringInfoString.patch


0001-Latest-appendPQExpBufferStr-instead-of-appendPQExpBuffer.patch
Description: 0001-Latest-appendPQExpBufferStr-instead-of-appendPQExpBuffer.patch


Re: New statistics for tuning WAL buffer size

2020-09-27 Thread Kyotaro Horiguchi
At Sat, 26 Sep 2020 15:48:49 +0530, Amit Kapila  wrote 
in 
> On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
>  wrote:
> >
> > On 2020/09/25 12:06, Masahiro Ikeda wrote:
> > > On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
> > >> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
> > >>  wrote in
> > >>> Thanks. I confirmed that it causes HOT pruning or killing of
> > >>> dead index tuple if DecodeCommit() is called.
> > >>>
> > >>> As you said, DecodeCommit() may access the system table.
> > >> ...
> > >>> The wals are generated only when logical replication is performed.
> > >>> So, I added pgstat_send_wal() in XLogSendLogical().
> > >>>
> > >>> But, I concerned that it causes poor performance
> > >>> since pgstat_send_wal() is called per wal record,
> > >>
> > >> I think that's too frequent.  If we want to send any stats to the
> > >> collector, it is usually done at commit time using
> > >> pgstat_report_stat(), and the function avoids sending stats too
> > >> frequently. For logrep-worker, apply_handle_commit() is calling it. It
> > >> seems to be the place if we want to send the wal stats.  Or it may be
> > >> better to call pgstat_send_wal() via pgstat_report_stat(), like
> > >> pg_stat_slru().
> > >
> > > Thanks for your comments.
> > > Since I changed to use pgstat_report_stat() and DecodeCommit() is
> > > calling it, the frequency to send statistics is not so high.
> >
> > On second thought, it's strange to include this change in pg_stat_wal patch.
> > Because pgstat_report_stat() sends various stats and that change would
> > affect not only pg_stat_wal but also other stats views. That is, if we
> > really want to make some processes call pgstat_report_stat() newly, which
> > should be implemented as a separate patch. But I'm not sure how useful
> > should be implemented as a separate patch. But I'm not sure how useful
> > this change is because probably the stats are almost negligibly small
> > in those processes.
> >
> > This thought seems valid for pgstat_send_wal(). I changed the thought
> > and am inclined to be ok not to call pgstat_send_wal() in some background
> > processes that are very unlikely to generate WAL.
> >
> 
> This makes sense to me. I think even if such background processes have

+1

> This makes sense to me. I think even if such background processes have
> to write WAL due to wal_buffers, it will be accounted next time the
> backend sends the stats.

Where do they send the stats? (I think it's OK to omit sending stats
entirely for such low WAL/heap activity processes.)

> One minor point, don't we need to reset the counter
> WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
> stats will be accounted multiple times.

Isn't this doing that?

+   /*
+    * Clear out the statistics buffer, so it can be re-used.
+    */
+   MemSet(&WalStats, 0, sizeof(WalStats));

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Partition prune with stable Expr

2020-09-27 Thread Andy Fan
On Mon, Sep 28, 2020 at 7:15 AM Tom Lane  wrote:

> Andy Fan  writes:
> > On Mon, Sep 28, 2020 at 4:46 AM David Rowley 
> wrote:
> >> Thanks for showing an interest in partition pruning. Unfortunately,
> >> it's not possible to use stable functions to prune partitions during
> >> planning.
>
> > Sigh.. I understand you now, I ignored the plan can be cached for later
> use.
> > Without that,  we should be able to prune with stable function.
>
> No, that's still wrong.  The contract for a stable function is that
> its result won't change over execution of a single query; but that
> says *execution*, not *planning and execution*.
>
> In particular, the canonical example of a stable function is one
> whose result depends on a database query.  The reason it can be
> considered stable is that within a single outer query, the MVCC
> snapshot it's used with won't change.  But we take a new snapshot
> (later than the planner's snapshot) when beginning execution.
>
> Somebody (Robert Haas, if memory serves, which it might not)
> tried to change that a few years ago.  It blew up pretty well,
> and was eventually reverted, because of undesirable side-effects
> on user-visible query semantics.  You'd have to check the archives
> for details.
>
> It's possible that we could make that work differently in serializable
> mode, thanks to the longer persistence of snapshots.  Not sure that
> it'd be desirable for planning to work differently in serializable
> mode, though.
>
> regards, tom lane
>

Well, that's very interesting.  Specific to my use case,
SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd');
p has 1500+ partitions and planning takes a lot of time, even though the
query means the same as SELECT * FROM p WHERE pkey = '2018-12-13'.  The
planning time difference is huge, which doesn't make sense from a human
point of view.  Can we do something about that?  to_date(text, text)
should be an "immutable" function IMO.  Does that have a semantic issue
or other issues?


-- 
Best Regards
Andy Fan


Re: calling procedures is slow and consumes extra much memory against calling function

2020-09-27 Thread Tom Lane
Pavel Stehule  writes:
> I am sending another patch that tries to allow CachedPlans for CALL
> statements. I think this patch is very accurate, but it is not nice,
> because it is smudging very precious reference counting for CachedPlans.

I spent some time testing this.  Although the #1 patch gets rid of
the major memory leak of cached plans, the original test case still
shows a pretty substantial leak across repeated executions of a CALL.
The reason is that the stanza for rebuilding stmt->target also gets
executed each time through, and that leaks not only the relatively
small PLpgSQL_row datum but also a bunch of catalog lookup cruft
created on the way to building the datum.  Basically this code forgot
that plpgsql's outer execution layer can't assume that it's running
in a short-lived context.

I attach a revised #1 that takes care of that problem, and also
cleans up what seems to me to be pretty sloppy thinking in both
the original code and Pavel's #1 patch: we should be restoring
the previous value of expr->plan, not cavalierly assuming that
it was necessarily NULL.  I didn't care for looking at the plan's
"saved" field to decide what was happening, either.  We really
should have a local flag variable clearly defining which behavior
it is that we're implementing.

With this patch, I see zero memory bloat on Pavel's original example,
even with a much larger repeat count.

I don't like much of anything about plpgsql-stmt_call-fix-2.patch.
It feels confused and poorly documented, possibly because "fragile"
is not a very clear term for whatever property it is you're trying to
attribute to plans.  But in any case, I think it's fixing the problem
in the wrong place.  I think the right way to fix it probably is to
manage a CALL's saved plan the same as every other plpgsql plan,
but arrange for the transient refcount on that plan to be held by a
ResourceOwner that is not a child of any transaction resowner, but
rather belongs to the procedure's execution and will be released on
the way out of the procedure.

In any case, I doubt we'd risk back-patching either the #2 patch
or any other approach to avoiding the repeat planning.  We need a
back-patchable fix that at least tamps down the memory bloat,
and this seems like it'd do.

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..7a2f2dfe91 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2145,40 +2145,60 @@ exec_stmt_perform(PLpgSQL_execstate *estate, PLpgSQL_stmt_perform *stmt)
 
 /*
  * exec_stmt_call
+ *
+ * NOTE: this is used for both CALL and DO statements.
  */
 static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
 	PLpgSQL_expr *expr = stmt->expr;
+	SPIPlanPtr	orig_plan = expr->plan;
+	bool		local_plan;
+	PLpgSQL_variable *volatile cur_target = stmt->target;
 	volatile LocalTransactionId before_lxid;
 	LocalTransactionId after_lxid;
 	volatile bool pushed_active_snap = false;
 	volatile int rc;
 
+	/*
+	 * If not in atomic context, we make a local plan that we'll just use for
+	 * this invocation, and will free at the end.  Otherwise, transaction ends
+	 * would cause errors about plancache leaks.
+	 *
+	 * XXX This would be fixable with some plancache/resowner surgery
+	 * elsewhere, but for now we'll just work around this here.
+	 */
+	local_plan = !estate->atomic;
+
 	/* PG_TRY to ensure we clear the plan link, if needed, on failure */
 	PG_TRY();
 	{
 		SPIPlanPtr	plan = expr->plan;
 		ParamListInfo paramLI;
 
-		if (plan == NULL)
+		/*
+		 * Make a plan if we don't have one, or if we need a local one.  Note
+		 * that we'll overwrite expr->plan either way; the PG_TRY block will
+		 * ensure we undo that on the way out, if the plan is local.
+		 */
+		if (plan == NULL || local_plan)
 		{
+			/* Don't let SPI save the plan if it's going to be local */
+			exec_prepare_plan(estate, expr, 0, !local_plan);
+			plan = expr->plan;
 
 			/*
-			 * Don't save the plan if not in atomic context.  Otherwise,
-			 * transaction ends would cause errors about plancache leaks.
-			 *
-			 * XXX This would be fixable with some plancache/resowner surgery
-			 * elsewhere, but for now we'll just work around this here.
+			 * A CALL should never be a simple expression.  (If it could be,
+			 * we'd have to worry about saving/restoring the previous values
+			 * of the related expr fields, not just expr->plan.)
 			 */
-			exec_prepare_plan(estate, expr, 0, estate->atomic);
+			Assert(!expr->expr_simple_expr);
 
 			/*
 			 * The procedure call could end transactions, which would upset
 			 * the snapshot management in SPI_execute*, so don't let it do it.
 			 * Instead, we set the snapshots ourselves below.
 			 */
-			plan = expr->plan;
 			plan->no_snapshots = true;
 
 			/*
@@ -2186,14 +2206,20 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			 * case the procedure's argument list has changed.
 			 *

Re: Partition prune with stable Expr

2020-09-27 Thread Tom Lane
Andy Fan  writes:
> Well, that's very interesting.  Specific to my use case,
> SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd');
> p has 1500+ partitions and planning takes a lot of time, even though the
> query means the same as SELECT * FROM p WHERE pkey = '2018-12-13'.  The
> planning time difference is huge, which doesn't make sense from a human
> point of view.  Can we do something about that?  to_date(text, text)
> should be an "immutable" function IMO.  Does that have a semantic issue
> or other issues?

Yeah.  It depends on the lc_time setting, and possibly also the timezone
GUC.  (Admittedly, common values of the format string would not have
any lc_time dependency, but the immutability property is not fine-grained
enough to recognize that.)

regards, tom lane




Re: Range checks of pg_test_fsync --secs-per-test and pg_test_timing --duration

2020-09-27 Thread Michael Paquier
On Fri, Sep 25, 2020 at 07:52:10AM +0200, Peter Eisentraut wrote:
> looks good to me

Thanks, applied.
--
Michael


signature.asc
Description: PGP signature


Re: New statistics for tuning WAL buffer size

2020-09-27 Thread Masahiro Ikeda

On 2020-09-26 19:18, Amit Kapila wrote:

On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
 wrote:


On 2020/09/25 12:06, Masahiro Ikeda wrote:
> On 2020-09-18 11:11, Kyotaro Horiguchi wrote:
>> At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
>>  wrote in
>>> Thanks. I confirmed that it causes HOT pruning or killing of
>>> dead index tuple if DecodeCommit() is called.
>>>
>>> As you said, DecodeCommit() may access the system table.
>> ...
>>> The wals are generated only when logical replication is performed.
>>> So, I added pgstat_send_wal() in XLogSendLogical().
>>>
>>> But, I concerned that it causes poor performance
>>> since pgstat_send_wal() is called per wal record,
>>
>> I think that's too frequent.  If we want to send any stats to the
>> collector, it is usually done at commit time using
>> pgstat_report_stat(), and the function avoids sending stats too
>> frequently. For logrep-worker, apply_handle_commit() is calling it. It
>> seems to be the place if we want to send the wal stats.  Or it may be
>> better to call pgstat_send_wal() via pgstat_report_stat(), like
>> pg_stat_slru().
>
> Thanks for your comments.
> Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
> the frequency to send statistics is not so high.

On second thought, it's strange to include this change in the pg_stat_wal
patch. Because pgstat_report_stat() sends various stats, that change would
affect not only pg_stat_wal but also other stats views. That is, if we
really want to make some processes call pgstat_report_stat() newly, that
should be implemented as a separate patch. But I'm not sure how useful
this change is because probably the stats are almost negligibly small
in those processes.

This thought seems valid for pgstat_send_wal(). I changed my thinking
and am inclined to be OK with not calling pgstat_send_wal() in some
background processes that are very unlikely to generate WAL.



OK, I removed the pgstat_report_stat() calls for the autovacuum launcher,
logrep-worker and logrep-launcher.




This makes sense to me. I think even if such background processes have
to write WAL due to wal_buffers, it will be accounted next time the
backend sends the stats.


Thanks for your comments.

IIUC, since each process counts WalStats.m_wal_buffers_full for itself,
the backend can't send the counts for WAL writes that other background
processes had to make due to wal_buffers.
Although we can't track all WAL activity this way, the impact on the
statistics is minimal, so we can ignore it.



One minor point, don't we need to reset the counter
WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
stats will be accounted multiple times.


Now, the counter is reset in pgstat_send_wal.
Isn't it enough?



The checkpointer doesn't seem to call pgstat_report_stat() currently,
but since there is a possibility of sending WAL statistics, I added
pgstat_report_stat().


IMO it's better to call pgstat_send_wal() in the checkpointer, instead,
because of the above reason.


Ok, I changed.



Thanks for updating the patch! I'd like to share my review comments.

+for details.

Like the description for pg_stat_bgwriter,  tag should be used
instead of .


Thanks, fixed.


+  
+   Number of WAL writes when the wal_buffers are full
+  

I prefer the following description. Thoughts?

"Number of times WAL data was written to the disk because wal_buffers got full"


Ok, I changed.


+the pg_stat_archiver view ,or wal

A comma should be just after "view" (not just before "or").


Sorry, anyway I think a comma is not necessary.
I removed it.


+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;

What about merging the comments for BgWriterStats and WalStats into
one because they are almost the same? For example,

---
/*
 * BgWriter and WAL global statistics counters.
 * Stored directly in a stats message structure so they can be sent
 * without needing to copy things around.  We assume these init to 
zeroes.

 */
PgStat_MsgBgWriter BgWriterStats;
PgStat_MsgWal WalStats;
---

BTW, originally there was the comment "(unused in other processes)"
for BgWriterStats. But that doesn't seem to be true, so I removed it
from the above example.


Thanks, I changed.


+   rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+   (void) rc;  /* we'll check for error with ferror */

Since the patch changes the pgstat file format,
PGSTAT_FILE_FORMAT_ID should also be changed?


Sorry about that.
I incremented PGSTAT_FILE_FORMAT_ID by +1.

-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in


This is not an issue of this patch, but isn't it better to also mention
SLRU stats here? That is, what about "Clear out global, archiver, WAL and

Re: Asynchronous Append on postgres_fdw nodes.

2020-09-27 Thread Kyotaro Horiguchi
Thanks for reviewing.

At Sat, 26 Sep 2020 19:45:39 +0900, Etsuro Fujita  
wrote in 
> > Come to think of "complex", ExecAsync stuff in this patch might be
> > too-much for a short-term solution until executor overhaul, if it
> > comes shortly. (the patch of mine here as a whole is like that,
> > though..). The queueing stuff in postgres_fdw is, too.
> 
> Here are some review comments on “ExecAsync stuff” (the 0002 patch):
> 
> @@ -192,10 +196,20 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
> 
> i = -1;
> while ((i = bms_next_member(validsubplans, i)) >= 0)
> {
> Plan   *initNode = (Plan *) list_nth(node->appendplans, i);
> +   int sub_eflags = eflags;
> +
> +   /* Let async-capable subplans run asynchronously */
> +   if (i < node->nasyncplans)
> +   {
> +   sub_eflags |= EXEC_FLAG_ASYNC;
> +   nasyncplans++;
> +   }
> 
> This would be more ambitious than Thomas’ patch: his patch only allows
> foreign scan nodes beneath an Append node to be executed
> asynchronously, but your patch allows any plan nodes beneath it (e.g.,
> local child joins between foreign tables).  Right?  I think that would

Right. It is intended to work anywhere, but all upper nodes up to the
common node must be "async-aware and capable" for the machinery to work.
So it doesn't currently work elsewhere, since Append is the only
async-aware node.

> be great, but I’m not sure how we execute such plan nodes
> asynchronously as other parts of your patch seem to assume that only
> foreign scan nodes beneath an Append are considered as async-capable.
> Maybe I’m missing something, though.  Could you elaborate on that?

That's right for this patch.  As a trial at hand, in my faint memory, some
join methods and some aggregation can be made async-aware, but they are
not included in this patch so as not to bloat it with more complex stuff.

> Your patch (and the original patch by Robert [1]) modified
> ExecAppend() so that it can process local plan nodes while waiting for
> the results from remote queries, which would be also a feature that’s
> not supported by Thomas’ patch, but I’d like to know performance
> results.  Did you do performance testing on that?  I couldn’t find
> that from the archive.

At least, even if only in theory, I think it's obvious that doing
something is better than just sitting waiting for the next tuple to come
from the remote side.  (It's not so obvious for a slow-local vs.
hyperspeed-remotes configuration, but...)

> +bool
> +is_async_capable_path(Path *path)
> +{
> +   switch (nodeTag(path))
> +   {
> +   case T_ForeignPath:
> +   {
> +   FdwRoutine *fdwroutine = path->parent->fdwroutine;
> +
> +   Assert(fdwroutine != NULL);
> +   if (fdwroutine->IsForeignPathAsyncCapable != NULL &&
> +   fdwroutine->IsForeignPathAsyncCapable((ForeignPath *) 
> path))
> +   return true;
> +   }
> 
> Do we really need to introduce the FDW API
> IsForeignPathAsyncCapable()?  I think we could determine whether a
> foreign path is async-capable, by checking whether the FDW has the
> postgresForeignAsyncConfigureWait() API.

Note that the API routine takes a path, but it's just that a child
path in a certain form theoretically can obstruct async behavior.

> In relation to the first comment, I noticed this change in the
> postgres_fdw regression tests:
> 
> HEAD:
> 
> EXPLAIN (VERBOSE, COSTS OFF)
> SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
>QUERY PLAN
> ------------------------------------------------------------
>  Sort
>Output: t1.a, (count(((t1.*)::pagg_tab)))
>Sort Key: t1.a
>->  Append
>  ->  HashAggregate
>Output: t1.a, count(((t1.*)::pagg_tab))
>Group Key: t1.a
>Filter: (avg(t1.b) < '22'::numeric)
>->  Foreign Scan on public.fpagg_tab_p1 t1
>  Output: t1.a, t1.*, t1.b
>  Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1
>  ->  HashAggregate
>Output: t1_1.a, count(((t1_1.*)::pagg_tab))
>Group Key: t1_1.a
>Filter: (avg(t1_1.b) < '22'::numeric)
>->  Foreign Scan on public.fpagg_tab_p2 t1_1
>  Output: t1_1.a, t1_1.*, t1_1.b
>  Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2
>  ->  HashAggregate
>Output: t1_2.a, count(((t1_2.*)::pagg_tab))
>Group Key: t1_2.a
>Filter: (avg(t1_2.b) < '22'::numeric)
>->  Foreign Scan on public.fpagg_tab_p3 t1_2
>  Output: t1_2.a, t1_2.*, t1_2.b
>  Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3
> (25 rows)
> 
> Patched:
> 
> EXPLAIN (VERBOSE, COSTS OFF)
> SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
>

RE: Global snapshots

2020-09-27 Thread tsunakawa.ta...@fujitsu.com
Hi Andrey san, all,

From: tsunakawa.ta...@fujitsu.com 
> And please wait.  As below, the patent holder just says that Clock-SI is not
> based on the patent and an independent development.  He doesn't say
> Clock-SI does not overlap with the patent or implementing Clock-SI does not
> infringe on the patent.  Rather, he suggests that Clock-SI has many
> similarities and thus those may match the claims of the patent
> (unintentionally?)  I felt this is a sign of risking infringement.
> 
> "The answer to your question is: No, Clock-SI is not based on the patent - it
> was an entirely independent development. The two approaches are similar in
> the sense that there is no global clock, the commit time of a distributed
> transaction is the same in every partition where it modified data, and a
> transaction gets it snapshot timestamp from a local clock. The difference is
> whether a distributed transaction gets its commit timestamp before or after 
> the
> prepare phase in 2PC."
> 
> The timeline of events also worries me.  It seems unnatural to consider that
> Clock-SI and the patent are independent.
> 
> 2010/6 - 2010/9  One Clock-SI author worked for Microsoft Research as
> an research intern
> 2010/10  Microsoft filed the patent
> 2011/9 - 2011/12  The same Clock-SI author worked for Microsoft
> Research as an research intern
> 2013  The same author moved to EPFL and published the Clock-SI paper
> with another author who has worked for Microsoft Research since then.
> 
> So, could you give your opinion whether we can use Clock-SI without
> overlapping with the patent claims?  I also will try to check and see, so 
> that I
> can understand your technical analysis.
> 
> And I've just noticed that I got in touch with another author of Clock-SI via 
> SNS,
> and sent an inquiry to him.  I'll report again when I have a reply.

I got a reply from the main author of the Clock-SI paper:

[Reply from the Clock-SI author Jiaqing Du]
--
Thanks for reaching out.

I actually did not know that Microsoft wrote a patent which is  similar to the 
ideas in my paper. I worked there as an intern. My Clock-SI paper was done at 
my school (EPFL) after my internships at Microsoft. The paper was very loosely 
related to my internship project at Microsoft. In a sense, the internship 
project at Microsoft inspired me to work on Clock-SI after I finished the 
internship. As you see in the paper, my coauthor, who is my internship host, is 
also from Microsoft, but interestingly he is not on the patent :)

Cheers,
Jiaqing
--


Unfortunately, he also did not assert that Clock-SI does not infringe on the
patent.  Rather, worrying words are mixed in: "similar to my ideas", "loosely
related", "inspired".

Also, his internship host is the co-author of the Clock-SI paper.  That person 
should be Sameh Elnikety, who has been working for Microsoft Research.  I also 
asked him about the same question, but he has been silent for about 10 days.

When I had a quick look, the patent appeared to be broader than Clock-SI,
with Clock-SI being a concrete application of the patent.  This is just my
guess, but Sameh Elnikety may have known the patent and set the internship
theme at Microsoft, or the research subject at EPFL, based on it, whether
consciously or not.

As of now, it seems that Clock-SI needs to be evaluated against the patent
claims by two or more persons -- one being someone who knows Clock-SI well
and implemented it for Postgres (Andrey-san?), and the other someone who
shares little benefit with the former person and can see it objectively.


Regards
Takayuki Tsunakawa



Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

2020-09-27 Thread David Rowley
On Mon, 28 Sep 2020 at 12:58, Tom Lane  wrote:
>
> David Rowley  writes:
> > I'm a bit unsure about llast()'s new double evaluation of the list.
> > Perhaps I can add a new inline function named list_last_cell() to get
> > around that... Or maybe it doesn't matter. I'm not quite sure what's
> > best there.
>
> Double evaluation bad, especially in a macro that has not had such a
> hazard for the last twenty-plus years.
>
> It might not be worth mucking with llast, as it's not used very heavily
> I believe.  But if it is, then let's add another inline function.

Thanks for having a look at this.

I changed things around to make llast() and the int and oid variant
use a new inline function to get the last cell.
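
Roughly, the shape of it is (a sketch, not the exact committed code):

static inline ListCell *
list_last_cell(const List *list)
{
	Assert(list != NIL);
	return &list->elements[list->length - 1];
}

#define llast(l)	lfirst(list_last_cell(l))

so the list expression is only evaluated once.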

I also pushed the resulting code to master.

David




Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

2020-09-27 Thread Tom Lane
David Rowley  writes:
> I changed things around to make llast() and the int and oid variant
> use a new inline function to get the last cell.
> I also pushed the resulting code to master.

LGTM.

regards, tom lane




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-27 Thread Keisuke Kuroda
Hi Amit,

Thank you for the reply!

> However, after commit c55040ccd0 (When wal_level=logical,
> write invalidations at command end into WAL so that decoding can use
> this information.) we actually know exactly when we need to execute
> each invalidation.

I see, thank you for your explanation.
I'll try to think about the solution by using XLOG_INVALIDATIONS
and referring to the thread
"PATCH: logical_work_mem and logical streaming of large in-progress
transactions".

Best Regards,
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




Re: Feature improvement of tab completion for DEALLOCATE

2020-09-27 Thread Fujii Masao




On 2020/09/25 15:18, btnakamichin wrote:

On 2020-09-25 14:30, Fujii Masao wrote:

On 2020/09/25 13:45, btnakamichin wrote:

Hello!

I’d like to improve the deallocate tab completion.

The deallocate tab completion can’t complete "ALL". Therefore, this patch
fixes the problem.


Thanks for the patch! It looks basically good, but I think it's better
to add newline just before UNION to avoid long line, as follows.

- COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements);
+ COMPLETE_WITH_QUERY(Query_for_list_of_prepared_statements
+   " UNION SELECT 'ALL'");

Regards,


Thank you, I appreciate your comment.

I have attached pattch with newline.


Thanks for updating the patch! But it contains only the diff from the previous 
patch.
Anyway, I pushed the change that you proposed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

2020-09-27 Thread Tom Lane
Poking around to count remaining uses of those inline functions,
I found a few places that should be using the macros instead,
and fixed them.  After that, I notice that list_tail(),
list_third_cell(), and list_fourth_cell() are entirely unreferenced.
I'm hesitant to get rid of list_tail(), because it seems like it
could well be used by extensions.  But I'd bet quite a bit that
list_third_cell() and list_fourth_cell() are not used anywhere
anymore.  Should we get rid of them to shave a few microseconds
from compile times?

regards, tom lane




Re: New statistics for tuning WAL buffer size

2020-09-27 Thread Amit Kapila
On Mon, Sep 28, 2020 at 7:00 AM Masahiro Ikeda  wrote:
>
> On 2020-09-26 19:18, Amit Kapila wrote
>
> > This makes sense to me. I think even if such background processes have
> > to write WAL due to wal_buffers, it will be accounted next time the
> > backend sends the stats.
>
> Thanks for your comments.
>
> IIUC, since each process counts WalStats.m_wal_buffers_full,
> backend can't send the counter which other background processes have to
> write WAL due to wal_buffers.
>

Right, I misunderstood it.

> Although we can't track all WAL activity, the impact on the statistics
> is minimal so we can ignore it.
>

Yeah, that is probably true.

> > One minor point, don't we need to reset the counter
> > WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
> > stats will be accounted multiple times.
>
> Now, the counter is reset in pgstat_send_wal.
> Isn't it enough?
>

That should be enough.

One other thing that occurred to me today: can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me these are cluster-wide stats and somewhat
similar to some of the other stats we maintain there.


-- 
With Regards,
Amit Kapila.




Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-27 Thread Amit Kapila
On Mon, Sep 28, 2020 at 7:50 AM Keisuke Kuroda
 wrote:
>
> Hi Amit,
>
> Thank you for the reply!
>
> > However, after commit c55040ccd0 (When wal_level=logical,
> > write invalidations at command end into WAL so that decoding can use
> > this information.) we actually know exactly when we need to execute
> > each invalidation.
>
> I see, thank you for your explaination.
> I'll try to think about the solution by using XLOG_INVALIDATIONS
>

You need to refer to XLOG_XACT_INVALIDATIONS, not XLOG_INVALIDATIONS.

> and referring to the thread
> "PATCH: logical_work_mem and logical streaming of large in-progress
> transactions".
>

Okay. Feel free to ask for clarification if you have any questions. Are you
interested in writing a patch for the same?


-- 
With Regards,
Amit Kapila.




Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros

2020-09-27 Thread David Rowley
Thanks for 9d299a492.

On Mon, 28 Sep 2020 at 15:35, Tom Lane  wrote:
>
> Poking around to count remaining uses of those inline functions,
> I found a few places that should be using the macros instead,
> and fixed them.  After that, I notice that list_tail(),
> list_third_cell(), and list_fourth_cell() are entirely unreferenced.
> I'm hesitant to get rid of list_tail(), because it seems like it
> could well be used by extensions.  But I'd bet quite a bit that
> list_third_cell() and list_fourth_cell() are not used anywhere
> anymore.  Should we get rid of them to shave a few microseconds
> from compile times?

I wouldn't object to the removal of list_third_cell() and list_fourth_cell().

I agree with your reasoning on list_tail(). It does seem more likely
that someone would use it. Although, if you'd proposed to remove it
too, I wouldn't have objected.  It's not like it's hard to reimplement
within an extension for any extensions that use it. Though, perhaps it
would be a shame if that was the sole thing we broke for them
when they try compiling their extension in a year's time on the newly
released PG14.

David




RE: [Patch] Optimize dropping of relation buffers using dlist

2020-09-27 Thread tsunakawa.ta...@fujitsu.com
From: Amit Kapila 
> I agree with the above two points.

Thank you.  I'm relieved to know I didn't misunderstand.


> > * Then, add a new function, say, smgrnblocks_cached() that simply returns
> the cached block count, and DropRelFileNodeBuffers() uses it instead of
> smgrnblocks().
> >
> 
> I am not sure if it's worth adding a new function for this. Why not simply add a
> boolean variable in smgrnblocks for this?


One reason is that adding an argument requires modifying the existing call
sites (10 + a few).  Another is that, although this may be a matter of each
person's taste, it's sometimes not easy to understand a function call when a
bare true/false appears in it.  One such example is find_XXX(some_args,
true/false), where the true/false represents missing_ok.  Another example is
as follows.  I often wonder "what's the meaning of this false, and that
true?"
    if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, false))
        elog(ERROR, "InstallXLogFileSegment should not have failed");

Fortunately, the new function is very short and doesn't duplicate much code.
The function is a simple getter, and the function name can convey the meaning
directly (if the name is good).
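
To make that concrete, a minimal sketch of such a getter could look like the
below (hypothetical: it assumes the cached size lives in an
smgr_cached_nblocks array on SMgrRelation, which may not match the actual
patch):

BlockNumber
smgrnblocks_cached(SMgrRelation reln, ForkNumber forknum)
{
    /* Return the cached value if we have one; never consult the kernel. */
    if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
        return reln->smgr_cached_nblocks[forknum];

    return InvalidBlockNumber;  /* caller falls back to smgrnblocks() */
}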


> BTW, AFAICS, the latest patch
> doesn't have code to address this point.

Kirk-san, can you address this?  I don't mind much whether you add an argument
or a new function.



Regards
Takayuki Tsunakawa



Re: Partition prune with stable Expr

2020-09-27 Thread Andy Fan
On Mon, Sep 28, 2020 at 9:15 AM Tom Lane  wrote:

> Andy Fan  writes:
> > Well, that's very interesting.  Specific to my user case,
> > SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd');
> > p has 1500+ partitions and planning takes lots of time, even though it is
> > essentially the same query as SELECT * FROM p WHERE pkey = '2018-12-13';
> > the huge planning time difference doesn't make sense from a human point of
> > view.  Can we do something for that?  to_date(text, text) should be an
> > "immutable" function IMO.  Does that have a semantic issue or other issues?
>
> Yeah.  It depends on the lc_time setting, and possibly also the timezone
> GUC.  (Admittedly, common values of the format string would not have
> any lc_time dependency, but the immutability property is not fine-grained
> enough to recognize that.)
>
> regards, tom lane
>

Thanks for your reply. Even if it depends on a GUC or the lc_time setting,
that should be decided at planning time.  Do we have concerns about changes
between planning and execution?

The attached patch marks some common formatting functions as immutable;
only one partition prune test case needed fixing because of this. I only
changed to_char/to_date/to_timestamp, but the whole list is below. I can
change all of them if needed.

 proname | count
-+---
 to_ascii| 3
 to_char | 8
 to_date | 1
 to_hex  | 2
 to_json | 1
 to_jsonb| 1
 to_number   | 1
 to_regclass | 1
 to_regcollation | 1
 to_regnamespace | 1
 to_regoper  | 1
 to_regoperator  | 1
 to_regproc  | 1
 to_regprocedure | 1
 to_regrole  | 1
 to_regtype  | 1
 to_timestamp| 2
 to_tsquery  | 2
 to_tsvector | 6
(19 rows)

With this change, the exact issue from the beginning of this thread is
fixed as well.

-- 
Best Regards
Andy Fan


v1-0001-Mark-some-formating-builtin-function-as-immutable.patch
Description: Binary data


Re: Load TIME fields - proposed performance improvement

2020-09-27 Thread Peter Smith
On Sat, Sep 26, 2020 at 2:17 AM Tom Lane  wrote:
> That led me to refactor the patch as attached.  (I'd first thought
...

Thanks for refactoring the patch.

Everything looks good to me.

As expected, observations for the v02 patch gave pretty much the same
results as v01 (previous mail).

v02 perf results
2.07% GetSQLCurrentTime
0.50% GetSQLCurrentDate
--.-% GetSQLLocalTime
0.69% GetCurrentDateTime
-.--% GetCurrentTimeUsec

v02 elapsed time
Run1 1m38s
Run2 1m35s
Run3 1m33s

(gives the same 19% improvement as v01)

~

I only have a couple of questions, more for curiosity than anything else.

1. Why is an extra struct pg_tm *tm = &tt; variable sometimes introduced
(e.g. GetSQLCurrentTime, GetSQLLocalTime)? Why not just declare struct
pg_tm tm; and pass &tm, the same as GetSQLCurrentDate does?

2. Shouldn't the comment "/* This is just a convenience wrapper for
GetCurrentTimeUsec */" be in the function comment for
GetCurrentDateTime, instead of in the function body?

~

I added a new commitfest entry for this proposal.
https://commitfest.postgresql.org/30/2746/

Is there anything else I should be doing to help get this committed?
IIUC it seems ready as-is.

Kind Regards,
Peter Smith
Fujitsu Australia




Re: New statistics for tuning WAL buffer size

2020-09-27 Thread Kyotaro Horiguchi
At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila  wrote 
in 
> One other thing that occurred to me today is can't we keep this as
> part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
> reset it. It seems to me this is a cluster-wide stats and somewhat
> similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleared by a RESET_BGWRITER message.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Parallel INSERT (INTO ... SELECT ...)

2020-09-27 Thread Greg Nancarrow
On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
 wrote:

> I further checked on full txn id and command id. Yes, these are
> getting passed to workers  via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
> GetCurrentFullTransactionId()
> GetCurrentCommandId(true)
> EnterParallelMode();
> InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>
> In the workers:
> ParallelWorkerMain() -->  calls StartParallelWorkerTransaction() (both
> full txn id and command id are restored into workers'
> CurrentTransactionState->fullTransactionId and currentCommandId)
> If the parallel workers are meant for insertions, then we need to set
> currentCommandIdUsed = true; Maybe we can lift the assert in
> GetCurrentCommandId(), if we don't want to touch that function, then
> we can have a new function GetCurrentCommandidInWorker() whose
> functionality will be same as GetCurrentCommandId() without the
> Assert(!IsParallelWorker());.
>
> Am I missing something?
>
> If the above points are true, we might have to update the parallel
> copy patch set, test the use cases and post separately in the parallel
> copy thread in coming days.
>

Hi Bharath,

I pretty much agree with your above points.

I've attached an updated Parallel INSERT...SELECT patch that:
- Only uses existing transaction state serialization support for
transfer of xid and cid.
- Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
currentCommandIdUsed=true at the start of a parallel operation (used
for Parallel INSERT case, where we know the currentCommandId has been
assigned to the worker at the start of the parallel operation).
- Tweaks the Assert condition within the "used=true" parameter case in
GetCurrentCommandId(), so that it only fires if in a parallel worker
and currentCommandIdUsed is false - refer to the updated comment in that
function.
- Does not modify any existing GetCurrentCommandId() calls.
- Does not remove any existing parallel-related asserts/checks, except
for the "cannot insert tuples in a parallel worker" error in
heap_prepare_insert(). I am still considering what to do with the
original error-check here.
[- Does not yet cater for other exclusion cases that you and Vignesh
have pointed out]

This patch is mostly a lot cleaner, but does contain a possible ugly
hack: where it needs to call GetCurrentFullTransactionId(), it must
temporarily escape parallel-mode (recalling that parallel-mode is set
true right at the top of ExecutePlan() in the case of Parallel
INSERT/SELECT).
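
For illustration, the new function might look something like the below (a
sketch only, assuming xact.c's existing currentCommandId/currentCommandIdUsed
static variables; the patch text may differ):

void
SetCurrentCommandIdUsedForWorker(void)
{
    /* Only sensible in a parallel worker holding a valid command ID. */
    Assert(IsParallelWorker() && !currentCommandIdUsed &&
           currentCommandId != InvalidCommandId);

    currentCommandIdUsed = true;
}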

Regards,
Greg Nancarrow
Fujitsu Australia


0003-ParallelInsertSelect.patch
Description: Binary data


[PATCH] SET search_path += octopus

2020-09-27 Thread Abhijit Menon-Sen
Hi.

The first attached patch
(0001-Accept-SET-xyz-pqr-to-add-pqr-to-the-current-setting.patch) adds
support for commands like the following, applicable to any configuration
settings that are represented as a comma-separated list of strings
(i.e., GUC_LIST_INPUT):

postgres=# SET search_path += octopus;
SET
postgres=# SET search_path += "giant squid", kraken, narwhal; -- [1]
SET
postgres=# SET search_path -= public, narwhal;
SET
postgres=# SHOW search_path;
┌─────────────────────────────────────────┐
│               search_path               │
├─────────────────────────────────────────┤
│ "$user", octopus, "giant squid", kraken │
└─────────────────────────────────────────┘
(1 row)

The implementation extends to ALTER SYSTEM SET with next to no effort,
so you can also add entries to shared_preload_libraries without having
to know its current value:

ALTER SYSTEM SET shared_preload_libraries += auto_explain;

The second patch
(0002-Support-SET-syntax-for-numeric-configuration-setting.patch) adds
support to modify numeric configuration settings:

postgres=# SET cpu_tuple_cost += 0.02;
SET
postgres=# SET effective_cache_size += '2GB';
SET
postgres=# SHOW effective_cache_size;
┌──────────────────────┐
│ effective_cache_size │
├──────────────────────┤
│ 6GB                  │
└──────────────────────┘
(1 row)
postgres=# ALTER SYSTEM SET max_worker_processes += 4;
ALTER SYSTEM

Being able to safely modify shared_preload_libraries (in particular) and
max_worker_processes during automated extension deployments is a problem
I've struggled with more than once in the past.

These patches do not affect configuration file parsing in any way: its
use is limited to "SET" and "ALTER xxx SET". (After I started working on
this, I came to know that this idea has been proposed in different forms
in the past, and objections to those proposals centred around various
difficulties involved in adding this syntax to configuration files. I'm
not particularly fond of that idea, and it's not what I've done here.)

(Another feature that could be implemented using this framework is to
ensure the current setting is at least as large as a given value:

ALTER SYSTEM SET shared_buffers >= '8GB';

This would not change shared_buffers if it were already larger than 8GB.
I have not implemented this, pending feedback on what's already there,
but it would be simple to do.)

Comments welcome.

-- Abhijit

1. This feature supports a wide variety of marine creatures, with no
   implied judgement about their status, real or mythical; however,
   adding them to shared_preload_libraries is not advisable.
From b7f262cf98be76215a9b9968c8800831874cf1d7 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen 
Date: Sun, 27 Sep 2020 06:52:30 +0530
Subject: Accept "SET xyz += pqr" to add pqr to the current setting of xyz

A new function called by ExtractSetVariableArgs() modifies the current
value of a configuration setting represented as a comma-separated list
of strings (e.g., search_path) by adding or removing each of the given
arguments, based on new stmt->kind values of VAR_{ADD,SUBTRACT}_VALUE.

Using += x will add x if it is not already present and do nothing
otherwise, and -= x will remove x if it is present and do nothing
otherwise.

The implementation extends to ALTER SYSTEM SET and similar commands, so
this can be used by extension creation scripts to add individual entries
to shared_preload_libraries.

Examples:

SET search_path += my_schema, other_schema;
SET search_path -= public;
ALTER SYSTEM SET shared_preload_libraries += auto_explain;
---
 doc/src/sgml/ref/set.sgml  |  18 
 src/backend/parser/gram.y  |  22 +
 src/backend/parser/scan.l  |  23 -
 src/backend/tcop/utility.c |   2 +
 src/backend/utils/misc/guc.c   | 168 +
 src/include/nodes/parsenodes.h |   4 +-
 src/include/parser/scanner.h   |   1 +
 7 files changed, 233 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/set.sgml b/doc/src/sgml/ref/set.sgml
index 63f312e812..e30e9b42f0 100644
--- a/doc/src/sgml/ref/set.sgml
+++ b/doc/src/sgml/ref/set.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
  
 
 SET [ SESSION | LOCAL ] configuration_parameter { TO | = } { value | 'value' | DEFAULT }
+SET [ SESSION | LOCAL ] configuration_parameter { += | -= } { value | 'value' }
 SET [ SESSION | LOCAL ] TIME ZONE { timezone | LOCAL | DEFAULT }
 
  
@@ -40,6 +41,14 @@ SET [ SESSION | LOCAL ] TIME ZONE { timezone
 
+  
+   For configuration parameters that accept a list of values, such as
+   search_path, you can modify the existing setting by adding
+   or removing individual elements with the += and
+   -= syntax. The former will add a value if it is not
+   already present, while the latter will remove an existing value.
+  
+
   
If SET (or equivalently SET SESSION)
is issued within a transaction that is lat

The return value of SPI_connect

2020-09-27 Thread Hou, Zhijie
Hi

I found that the function SPI_connect() has only one return value (SPI_OK_CONNECT).
But I also found that some places check the return value of SPI_connect() like 
the following:

    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

Is it necessary to check the return value of SPI_connect() ?

And in doc https://www.postgresql.org/docs/13/spi-spi-connect.html
It also says SPI_connect() can return SPI_ERROR_CONNECT on error.

Maybe we can make "int SPI_connect()" => "void SPI_connect()" and fix the doc?

Best regards,
houzj






Re: New statistics for tuning WAL buffer size

2020-09-27 Thread Amit Kapila
On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila  
> wrote in
> > One other thing that occurred to me today is can't we keep this as
> > part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
> > reset it. It seems to me this is a cluster-wide stats and somewhat
> > similar to some of the other stats we maintain there.
>
> I like that direction, but PgStat_GlobalStats is actually
> PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.
>

Yeah, if we want to pursue this direction then we probably need a
separate message to set/reset the WAL-related stuff, and probably a
separate reset timestamp for WAL as well. The difference would be that
we could have one structure to refer to for global_stats instead of
referring to multiple structures, and we wouldn't need to issue
separate read/write calls; but OTOH I don't see many disadvantages of
the current approach either.
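
A minimal sketch of the message-side change being suggested (hypothetical;
only RESET_WAL would be new, the other targets exist today):

typedef enum PgStat_Shared_Reset_Target
{
    RESET_ARCHIVER,
    RESET_BGWRITER,
    RESET_WAL       /* new target for pg_stat_reset_shared('wal') */
} PgStat_Shared_Reset_Target;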

-- 
With Regards,
Amit Kapila.




Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2

2020-09-27 Thread Michael Paquier
On Fri, Sep 25, 2020 at 12:27:03AM -0400, Tom Lane wrote:
> Given the tiny number of complaints to date, it seems sufficient to me
> to deal with this in HEAD.

Thanks.  I have done more tests with the range of OpenSSL versions we
support on HEAD, and applied this one.  I have noticed that the
previous patch forgot two fail-and-abort code paths, for
EVP_DigestInit_ex() and EVP_DigestUpdate().
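
For reference, the pattern involved is simply treating a non-1 return from
the EVP functions as failure, along these lines (a generic OpenSSL sketch,
not the committed code):

    if (EVP_DigestInit_ex(ctx, EVP_sha256(), NULL) <= 0)
        goto error;             /* fail-and-abort path */
    if (EVP_DigestUpdate(ctx, data, len) <= 0)
        goto error;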
--
Michael


signature.asc
Description: PGP signature


Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

2020-09-27 Thread Keisuke Kuroda
Hi Amit,

> You need to refer to XLOG_XACT_INVALIDATIONS, not XLOG_INVALIDATIONS.

Thank you, that's right. XLOG_INVALIDATIONS is added at COMMIT,
so I need to refer to XLOG_XACT_INVALIDATIONS.

> Okay. Feel free to clarify your questions if you have any? Are you
> interested in writing a patch for the same?

Thank you! Yes, I would like to write a patch.
If there is already a discussion thread or patch for this idea,
please let me know.

Best Regards,
-- 
Keisuke Kuroda
NTT Software Innovation Center
keisuke.kuroda.3...@gmail.com




RE: Transactions involving multiple postgres foreign servers, take 2

2020-09-27 Thread tsunakawa.ta...@fujitsu.com
From: Masahiko Sawada 
> On Fri, 25 Sep 2020 at 18:21, tsunakawa.ta...@fujitsu.com
>  wrote:
> > Why does the client backend have to create a new connection cache entry
> during PREPARE or COMMIT PREPARE?  Doesn't the client backend naturally
> continue to use connections that it has used in its current transaction?
> 
> I think there are two cases: a process executes PREPARE TRANSACTION
> and another process executes COMMIT PREPARED later, and if the
> coordinator has cascaded foreign servers (i.e., a foreign server has
> its own foreign server) and a temporary connection problem happens in the
> intermediate node after PREPARE, then another process on the
> intermediate node will execute COMMIT PREPARED on its foreign server.

Aren't both the cases failure cases, and thus handled by the resolver?


> > > In terms of performance you're concerned, I wonder if we can somewhat
> > > eliminate the bottleneck if multiple resolvers are able to run on one
> > > database in the future. For example, if we could launch resolver
> > > processes as many as connections on the database, individual backend
> > > processes could have one resolver process. Since there would be
> > > contention and inter-process communication it still brings some
> > > overhead but it might be negligible comparing to network round trip.
> >
> > Do you mean that if concurrent 200 clients each update data on two foreign
> servers, there are 400 resolvers?  ...That's overuse of resources.
> 
> I think we have 200 resolvers in this case since one resolver process
> per backend process.

That does not parallelize prepare or commit for a single client, as each 
resolver can process only one prepare or commit synchronously at a time.  Not 
to mention the resource usage is high.


> Or another idea is that all processes queue
> foreign transactions to resolve into the shared memory queue and
> resolver processes fetch and resolve them instead of assigning one
> distributed transaction to one resolver process. Using asynchronous
> execution, the resolver process can process a bunch of foreign
> transactions across distributed transactions and grouped by the
> foreign server at once. It might be more complex than the current
> approach but having multiple resolver processes on one database would
> increase through-put well especially by combining with asynchronous
> execution.

Yeah, that sounds complex.  It's simpler and more natural for each client
backend to use the connections it has used in its current transaction and issue
prepare and commit to the foreign servers, with the resolver just taking care of
failed commits and aborts behind the scenes.  That's like how the walwriter
takes care of writing WAL on behalf of client backends that commit
asynchronously.


Regards
Takayuki Tsunakawa



a problem about XLogReader callback system

2020-09-27 Thread Wang, Shenhao
Hi, hackers

I have a problem about XLogReader callback system

In xlog.c, function StartupXLOG

xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
                                XL_ROUTINE(.page_read = &XLogPageRead,
                                           .segment_open = NULL,
                                           .segment_close = wal_segment_close),
                                &private);

XLogPageRead uses readFile to store the fd, and I can't find any location that
sets the value of seg.ws_file.
Is it necessary to set a segment_close callback?
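
(For context: as far as I can tell, the main consumer of that callback is
XLogReaderFree(), roughly as below, so if seg.ws_file is never set the
callback would never fire anyway. Paraphrased from memory; treat as a
sketch.)

    if (state->seg.ws_file != -1)
        state->routine.segment_close(state);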

Best regards
Wang






remove useless returns

2020-09-27 Thread Tang, Haiying
Hello

Found two more useless "return;" lines in the following files:
 - src/backend/regex/regcomp.c
 - src/interfaces/libpq/fe-secure.c

Maybe it's better to remove them together?

Previous discussion:
https://www.postgresql.org/message-id/20191128144653.GA27883@alvherre.pgsql

Best Regards,
Tang

唐海英 (とう かいえい)
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
ADDR.: No.6 Wenzhu Road, Software Avenue,
    Nanjing, 210012, China
TEL  : +86+25-86630566-8336
COINS: 7998-8313
FAX  : +86+25-83317685
Mail:tanghy.f...@cn.fujitsu.com







0001-Remove-useless-return-lines(2).patch
Description: 0001-Remove-useless-return-lines(2).patch


Re: Partition prune with stable Expr

2020-09-27 Thread Jesse Zhang
On Sun, Sep 27, 2020 at 7:52 PM Andy Fan wrote:
>
>
> On Mon, Sep 28, 2020 at 9:15 AM Tom Lane wrote:
>>
>> Andy Fan writes:
>> > Well, that's very interesting.  Specific to my user case,
>> > SELECT * FROM p WHERE pkey = to_date('2018-12-13', 'yyyy-mm-dd');
>> > p has 1500+ partitions and planning takes lots of time, even though it is
>> > essentially the same query as SELECT * FROM p WHERE pkey = '2018-12-13';
>> > the huge planning time difference doesn't make sense from a human point of
>> > view.  Can we do something for that?  to_date(text, text) should be an
>> > "immutable" function IMO.  Does that have a semantic issue or other issues?
>>
>> Yeah.  It depends on the lc_time setting, and possibly also the timezone
>> GUC.  (Admittedly, common values of the format string would not have
>> any lc_time dependency, but the immutability property is not fine-grained
>> enough to recognize that.)
>>
>> regards, tom lane
>
>
> Thanks for your reply. Even if it depends on a GUC or the lc_time setting,
> that should be decided at planning time.  Do we have concerns about changes
> between planning and execution?

Planner can be called at prepared statement creation time, like

PREPARE yolo() AS SELECT * FROM foo WHERE pk = to_date(...);

Here, there's an arbitrary gap between planning time and execution.

>
> The attached patch marks some common formatting functions as immutable;
> only one partition prune test case needed fixing because of this. I only
> changed to_char/to_date/to_timestamp, but the whole list is below. I can
> change all of them if needed.
>
>  proname | count
> -+---
>  to_ascii| 3
>  to_char | 8
>  to_date | 1
>  to_hex  | 2
>  to_json | 1
>  to_jsonb| 1
>  to_number   | 1
>  to_regclass | 1
>  to_regcollation | 1
>  to_regnamespace | 1
>  to_regoper  | 1
>  to_regoperator  | 1
>  to_regproc  | 1
>  to_regprocedure | 1
>  to_regrole  | 1
>  to_regtype  | 1
>  to_timestamp| 2
>  to_tsquery  | 2
>  to_tsvector | 6
> (19 rows)
>
This patch is ridiculous.

Immutable functions need to produce the same output for the same
argument values. None of the functions changed in the patch is
immutable: they are all stable because they all depend on GUC settings
(e.g. to_tsvector depends on default_text_search_config).




Re: Parallel copy

2020-09-27 Thread Amit Kapila
On Wed, Jul 22, 2020 at 7:48 PM vignesh C  wrote:
>
> On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila  wrote:
> >
>
> > Review comments:
> > ===
> >
> > 0001-Copy-code-readjustment-to-support-parallel-copy
> > 1.
> > @@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
> >   else
> >   nbytes = 0; /* no data need be saved */
> >
> > + if (cstate->copy_dest == COPY_NEW_FE)
> > + minread = RAW_BUF_SIZE - nbytes;
> > +
> >   inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
> > -   1, RAW_BUF_SIZE - nbytes);
> > +   minread, RAW_BUF_SIZE - nbytes);
> >
> > No comment to explain why this change is done?
> >
> > 0002-Framework-for-leader-worker-in-parallel-copy
>
> Currently CopyGetData copies a lesser amount of data to the buffer even though
> space is available in the buffer, because minread was passed as 1 to CopyGetData.
> Because of this there are frequent calls to CopyGetData for fetching the data.
> In this case it will load only some data due to the below check:
> while (maxread > 0 && bytesread < minread && !cstate->reached_eof)
> After reading some data, bytesread will be greater than minread (which is
> passed as 1) and CopyGetData returns with a lesser amount of data, even though
> there is still space.
> This change is required for the parallel copy feature, as each time we get a new
> DSM data block which is of 64K size and copy the data. If we copy less data
> into the DSM data blocks we might end up consuming all the DSM data blocks.
>

Why can't we reuse the DSM block which has unfilled space?

>  I felt this issue can be fixed as part of HEAD. Have posted a separate 
> thread [1] for this. I'm planning to remove that change once it gets 
> committed. Can that go as a separate
> patch or should we include it here?
> [1] - 
> https://www.postgresql.org/message-id/CALDaNm0v4CjmvSnftYnx_9pOS_dKRG%3DO3NnBgJsQmi0KipvLog%40mail.gmail.com
>

I am convinced by the reasoning given by Kyotaro-San in that other
thread [1] and by the performance data shown by Peter that this can't
be an independent improvement, and in some cases it can even do harm.
Now, if you need it for the parallel-copy path then we can change it
specifically in the parallel-copy code path, but I don't completely
understand your reason.

> > 2.
..
> > + */
> > +typedef struct ParallelCopyLineBoundary
> >
> > Are we doing all this state management to avoid using locks while
> > processing lines?  If so, I think we can use either spinlock or LWLock
> > to keep the main patch simple and then provide a later patch to make
> > it lock-less.  This will allow us to first focus on the main design of
> > the patch rather than trying to make this datastructure processing
> > lock-less in the best possible way.
> >
>
> The steps will be more or less the same if we use a spinlock too: steps 1, 3 &
> 4 will be common, and we would have to use lock & unlock instead of steps 2 &
> 5. I feel we can retain the current implementation.
>

I'll study this in detail and let you know my opinion on the same but
in the meantime, I don't follow one part of this comment: "If they
don't follow this order the worker might process wrong line_size and
leader might populate the information which worker has not yet
processed or in the process of processing."

Do you want to say that leader might overwrite some information which
worker hasn't read yet? If so, it is not clear from the comment.
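
(For reference, lock-free publish/consume schemes of this kind usually pair
a write barrier on the producer side with a read barrier on the consumer
side, roughly as below. This is a generic sketch with made-up field names,
not the patch's actual code.)

/* Leader publishes a line. */
boundary->first_block = block_idx;      /* fill in the line data first... */
boundary->start_offset = offset;
pg_write_barrier();                     /* ...then publish the size */
pg_atomic_write_u32(&boundary->line_size, size);

/* Worker consumes it. */
while (pg_atomic_read_u32(&boundary->line_size) == LINE_NOT_YET_READY)
    CHECK_FOR_INTERRUPTS();             /* LINE_NOT_YET_READY is made up */
pg_read_barrier();                      /* pairs with the write barrier */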
Another minor point about this comment:

+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * Leader process will be populating data block, data block offset &
the size of

I think there should be a full-stop after worker instead of a comma.

>
> > 6.
> > In function BeginParallelCopy(), you need to keep a provision to
> > collect wal_usage and buf_usage stats.  See _bt_begin_parallel for
> > reference.  Those will be required for pg_stat_statements.
> >
>
> Fixed
>

How did you ensure that this is fixed? Have you tested it? If so,
please share the test. I see a basic problem with your fix.

+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+   &walusage[ParallelWorkerNumber]);

You need to call InstrStartParallelQuery() before the actual operation
starts; without that, the stats won't be accurate. Also, after calling
WaitForParallelWorkersToFinish(), you need to accumulate the stats
collected from the workers, which you have neither done nor is possible
with the current code in your patch, because you haven't made any
provision to capture them in BeginParallelCopy.

I suggest you look into lazy_parallel_vacuum_indexes() and
begin_parallel_vacuum() to understand how the buffer/wal usage stats
are accumulated. Also, please test this functionality using
pg_stat_statements.
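
In outline, the pattern in those functions is as follows (paraphrased; the
bufferusage/walusage arrays are the DSM areas your patch already looks up):

/* Worker, bracketing the actual work: */
InstrStartParallelQuery();
/* ... do the parallel copy ... */
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
                      &walusage[ParallelWorkerNumber]);

/* Leader, once the workers are done: */
WaitForParallelWorkersToFinish(pcxt);
for (int i = 0; i < pcxt->nworkers_launched; i++)
    InstrAccumParallelQuery(&bufferusage[i], &walusage[i]);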

>
> > 0003-Allow-copy-from-command-to-process-data-from-file-ST
> > 10.
> > In the commit message, you have writt

Re: Report error position in partition bound check

2020-09-27 Thread Amit Langote
On Fri, Sep 25, 2020 at 12:02 AM Tom Lane  wrote:
> [ cc'ing Peter, since his opinion seems to have got us here in the first 
> place ]
>
> Amit Langote  writes:
> > On Thu, Sep 24, 2020 at 7:19 AM Tom Lane  wrote:
> >> However, while I was looking at it I couldn't help noticing that
> >> transformPartitionBoundValue's handling of collation concerns seems
> >> less than sane.  There are two things bugging me:
> >>
> >> 1. Why does it care about the expression's collation only when there's
> >> a top-level CollateExpr?  For example, that means we get an error for
> >>
> >> regression=# create table p (f1 text collate "C") partition by list(f1);
> >> CREATE TABLE
> >> regression=# create table c1 partition of p for values in ('a' collate 
> >> "POSIX");
> >> ERROR:  collation of partition bound value for column "f1" does not match 
> >> partition key collation "C"
> >>
> >> but not this:
> >>
> >> regression=# create table c2 partition of p for values in ('a' || 'b' 
> >> collate "POSIX");
> >> CREATE TABLE
> >>
> >> Given that we will override the expression's collation with the partition
> >> column's collation anyway, I don't see why we have this check at all,
> >> so my preference is to just rip out the entire stanza beginning with
> >> "if (IsA(value, CollateExpr))".  If we keep it, though, I think it needs
> >> to do something else that is more general.
>
> > I dug up the discussion which resulted in this test being added and
> > found that Peter E had opined that this failure should not occur [1].
>
> Well, I agree with Peter to that extent, but my opinion is that *none*
> of these cases ought to be errors.  What we're doing here is performing
> an implicit assignment-level coercion of the expression to the type of
> the column, and changing the collation is allowed as part of that:
>
> regression=# create table foo (f1 text collate "C");
> CREATE TABLE
> regression=# insert into foo values ('a' COLLATE "POSIX");
> INSERT 0 1
> regression=# update foo set f1 = 'b' COLLATE "POSIX";
> UPDATE 1
>
> So I find it completely inconsistent that the partitioning logic
> complains about equivalent cases.

My perhaps wrong impression was that the bound expression that is
specified when creating a partition is not as such being *assigned* to
the key column, but now that I think about it some more, that doesn't
matter.

>  I think we should just rip the
> whole thing out, as per the attached draft.  This causes several
> regression test results to change, but AFAICS those are only there
> to exercise the error tests that I think we should get rid of.

Yeah, I can see no other misbehavior resulting from this.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: Yet another fast GiST build

2020-09-27 Thread Heikki Linnakangas



On 21/09/2020 16:29, Andrey M. Borodin wrote:
> But a thing that bothers me now: when we vacuum a leaf page, we bump its
> NSN. But we do not bump the internal page LSN. Does this mean we will
> follow rightlinks after vacuum? It seems superfluous.

Sorry, I did not understand what you said above. Vacuum doesn't update
any NSNs, only LSNs. Can you elaborate?

> And btw we did not adjust internal page tuples after vacuum...

What do you mean by that?

- Heikki