Re: support for MERGE

2022-02-27 Thread Zhihong Yu
On Sun, Feb 27, 2022 at 9:25 AM Alvaro Herrera 
wrote:

> I attach v12 of MERGE.  Significant effort went into splitting
> ExecUpdate and ExecDelete into parts that can be reused from the MERGE
> routines in a way that doesn't involve jumping out in the middle of
> TM_Result processing to merge-specific EvalPlanQual handling.  So in
> terms of code flow, this is much cleaner.  It does mean that MERGE now
> has to call three routines instead of just one, but that doesn't seem a
> big deal.
>
> So now the main ExecUpdate first has to call ExecUpdatePrologue, then
> ExecUpdateAct, and complete with ExecUpdateEpilogue.  In the middle of
> all this, it does its own thing to deal with foreign tables, and result
> processing for regular tables including EvalPlanQual.  ExecDelete has
> been split similarly.
>
> So MERGE now does these three things, and then its own result
> processing.
>
> Now, naively patching in that way results in absurd argument lists for
> these routines, so I introduced a new ModifyTableContext struct to carry
> the context for each operation.  I think this is good, but perhaps it
> could stand some more improvement in terms of what goes in there and
> what doesn't.  It took me a while to find an arrangement that really
> worked.  (Initially I had resultRelInfo and the tuple slot and some
> flags for DELETE, but it turned out to be better to keep them out.)
>
> Regarding code arrangement: I decided that exporting all those
> ModifyTable internal functions was a bad idea, so I made it all static
> again.  I think the MERGE routines are merely additional ModifyTable
> methods; trying to make it a separate executor node doesn't seem like
> it'd be an improvement.  For now, I just made nodeModifyTable.c #include
> execMerge.c, though I'm not sure if moving all the code into
> nodeModifyTable.c would be a good idea, or whether we should create
> separate files for each of those methods.  Generally speaking I'm not
> enthusiastic about creating an exported API of these methods.  If we
> think nodeModifyTable.c is too large, maybe we can split it in smaller
> files and smash them together with #include "foo.c".  (If we do expose
> it in some .h then the ModifyTableContext would have to be exported as
> well, which doesn't sound too exciting.)
>
>
> Sadly, because I've been spending a lot of time preparing for an
> international move, I didn't have time to produce a split patch that
> would first restructure nodeModifyTable.c making this more reviewable,
> but I'll do that as soon as I'm able.  There is also a bit of a bug in a
> corner case of an UPDATE that involves a cross-partition tuple migration
> with another concurrent update.  I was unable to figure out how to fix
> that, so I commented out the affected line from merge-update.spec.
> Again, I'll get that fixed as soon as the dust has settled here.
>
> There are some review points that are still unaddressed, such as
> Andres' question of what happens in case of a concurrent INSERT.
>
> Thanks
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W  —
> https://www.EnterpriseDB.com/
> "Strength lies not in physical means
> but resides in an indomitable will" (Gandhi)
>

Hi,

+   else if (node->operation == CMD_MERGE)
+   {
+   /* EXPLAIN ANALYZE display of actual outcome for each tuple
proposed */

I know the comment was copied. But it seems `proposed` should be
`processed`.

For ExecMergeNotMatched():

+   foreach(l, actionStates)
+   {
...
+   break;
+   }

If there is only one state in the List, maybe add an assertion for the
length of the actionStates List.

+   ModifyTableContext sc;

It seems the variable name could be more intuitive. How about naming it mtc
(or context, as the later code does)?

Cheers


casting operand to proper type in BlockIdGetBlockNumber

2022-03-03 Thread Zhihong Yu
Hi,
In test output, I saw:

src/backend/utils/adt/tid.c:112:16: runtime error: left shift of 65535 by
16 places cannot be represented in type 'int'

I think this was due to the left shift in BlockIdGetBlockNumber not
properly casting its operand.

Please see the proposed change in patch.

Thanks


get-block-number.patch
Description: Binary data


Re: casting operand to proper type in BlockIdGetBlockNumber

2022-03-03 Thread Zhihong Yu
On Thu, Mar 3, 2022 at 7:44 AM Tom Lane  wrote:

> Zhihong Yu  writes:
> > In test output, I saw:
> > src/backend/utils/adt/tid.c:112:16: runtime error: left shift of 65535 by
> > 16 places cannot be represented in type 'int'
>
> What compiler is that?
>
> regards, tom lane
>
Hi,
Jenkins build is alma8-clang12-asan

So it is clang12 on alma.

Cheers


Re: casting operand to proper type in BlockIdGetBlockNumber

2022-03-03 Thread Zhihong Yu
On Thu, Mar 3, 2022 at 8:24 AM Tom Lane  wrote:

> Zhihong Yu  writes:
> > On Thu, Mar 3, 2022 at 7:44 AM Tom Lane  wrote:
> >> Zhihong Yu  writes:
> >>> In test output, I saw:
> >>> src/backend/utils/adt/tid.c:112:16: runtime error: left shift of 65535
> by
> >>> 16 places cannot be represented in type 'int'
>
> > Jenkins build is alma8-clang12-asan
>
> Oh, I misread this as a compile-time warning, but it must be from ASAN.
> Was the test case one of your own, or just our normal regression tests?
>
> (I think the code is indeed incorrect, but I'm wondering why this hasn't
> been reported before.  It's been like that for a long time.)
>
> regards, tom lane
>
Hi,
The Jenkins test is ported from contrib/postgres_fdw/sql/postgres_fdw.sql -
so theoretically PG would see the same error for clang12 on Alma.

Here were a few lines prior to the sanitizer complaint:

ts1|pid123867|:30045 2022-03-02 01:47:57.098 UTC [124161] STATEMENT:
 CREATE TRIGGER trig_row_before
ts1|pid123867|:30045BEFORE INSERT OR UPDATE OR DELETE ON rem1
ts1|pid123867|:30045FOR EACH ROW EXECUTE PROCEDURE
trigger_data(23,'skidoo');
ts1|pid123867|:30045 2022-03-02 01:47:57.106 UTC [124161] ERROR:  function
trigger_data() does not exist
ts1|pid123867|:30045 2022-03-02 01:47:57.106 UTC [124161] STATEMENT:
 CREATE TRIGGER trig_row_after
ts1|pid123867|:30045AFTER INSERT OR UPDATE OR DELETE ON rem1
ts1|pid123867|:30045FOR EACH ROW EXECUTE PROCEDURE
trigger_data(23,'skidoo');

I think the ASAN build on Alma is able to detect errors such as this.

Cheers


Re: casting operand to proper type in BlockIdGetBlockNumber

2022-03-03 Thread Zhihong Yu
On Thu, Mar 3, 2022 at 9:13 AM Tom Lane  wrote:

> Zhihong Yu  writes:
> > On Thu, Mar 3, 2022 at 8:24 AM Tom Lane  wrote:
> >> Oh, I misread this as a compile-time warning, but it must be from ASAN.
> >> Was the test case one of your own, or just our normal regression tests?
>
> > The Jenkins test is ported from
> contrib/postgres_fdw/sql/postgres_fdw.sql -
> > so theoretically PG would see the same error for clang12 on Alma.
>
> Hmph.  I tried enabling -fsanitize=undefined here, and I get some
> complaints about passing null pointers to memcmp and the like, but
> nothing about this shift (tested with clang 12.0.1 on RHEL8 as well
> as clang 13.0.0 on Fedora 35).  What compiler switches are being
> used exactly?
>
> regards, tom lane
>
Hi,
This is from (internal Jenkins) build log:

CMAKE_C_FLAGS  -Werror -fno-strict-aliasing -Wall -msse4.2 -Winvalid-pch
-pthread -DBOOST_BIND_NO_PLACEHOLDERS
-DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DROCKSDB_PLATFORM_POSIX
-DBOOST_ERROR_CODE_HEADER_ONLY -march=ivybridge -mcx16
-DYB_COMPILER_TYPE=clang12 -DYB_COMPILER_VERSION=12.0.1
-DROCKSDB_LIB_IO_POSIX -DSNAPPY -DLZ4 -DZLIB -mno-avx -mno-bmi -mno-bmi2
-mno-fma -D__STDC_FORMAT_MACROS -Wno-deprecated-declarations
-DGFLAGS=gflags  -Werror=enum-compare  -Werror=switch -Werror=return-type
 -Werror=string-plus-int -Werror=return-stack-address
-Werror=implicit-fallthrough -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS
-Wthread-safety-analysis -Wshorten-64-to-32 -ggdb -O1
-fno-omit-frame-pointer -DFASTDEBUG -Wno-ambiguous-member-template
-Wimplicit-fallthrough -Qunused-arguments -stdlib=libc++
-D_GLIBCXX_EXTERN_TEMPLATE=0 -nostdinc++ -stdlib=libc++
-D_GLIBCXX_EXTERN_TEMPLATE=0 -nostdinc++ -shared-libasan -fsanitize=address
-DADDRESS_SANITIZER -fsanitize=undefined -fno-sanitize-recover=all
-fno-sanitize=alignment,vptr -fsanitize-recover=float-cast-overflow
-fsanitize-blacklist=... -fPIC

I would suggest trying out the build on Alma Linux.

FYI


Re: casting operand to proper type in BlockIdGetBlockNumber

2022-03-03 Thread Zhihong Yu
On Thu, Mar 3, 2022 at 1:11 PM Andres Freund  wrote:

> Hi,
>
> On 2022-03-03 15:31:51 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2022-03-03 14:00:14 -0500, Tom Lane wrote:
> > > For the buildfarm, I could enable it on flaviventris? That runs an
> > > experimental gcc, without optimization (whereas serinus runs with
> > > optimization). Which seems reasonable to combine with sanitizers?
> >
> > Dunno.  I already found out that my Mac laptop (w/ clang 13) detects
> > the numeric.c problem but not any of the other ones.  The messages
> > on RHEL8 cite where the system headers declare memcmp and friends
> > with "attribute nonnull", so I'm betting that Apple's headers lack
> > that annotation.
>
> The sanitizers are documented to work best on linux... As flaviventris runs
> linux, so I'm not sure what your concern is?
>
> I think basically newer glibc versions have more annotations, so ubsan will
> have more things to fail against. So it'd be good to have a fairly
> regularly
> updated OS.
>
>
> > I also tried adding the various -m switches shown in Zhihong's
> > CFLAGS setting, but that still didn't repro the Alma warning
> > for me.
>
> The compilation flags make it look like it's from a run of yugabyte's fork,
> rather than plain postgres.
>
Hi,
I should mention that the PG subtree in yugabyte is currently aligned with
PG 11.
There have been backports from PG 12, but the code related to tid.c
and block.h, etc. is the same as in upstream PG.

The fdw tests are backported from PG as well.


>
> The message says:
> src/backend/utils/adt/tid.c:112:16: runtime error: left shift of 65535 by
> 16 places cannot be represented in type 'int'
>
> Afaics that means bi_hi is 65535. So either we're dealing with a very large
> relation or BlockIdGetBlockNumber() is getting passed InvalidBlockNumber?
>
> It might be enough to do something like
> SELECT * FROM pg_class WHERE ctid = '(65535, 17)';
> to trigger the problem?
>

The above syntax is not currently supported in yugabyte.

FYI


timestamp for query in pg_stat_statements

2022-03-05 Thread Zhihong Yu
Hi,
Looking at pg_stat_statements, there doesn't seem to be a timestamp column
for when the underlying query was performed.
Since the same query can be run multiple times, the absence of a timestamp
column makes finding the most recent invocation of the query difficult.

Does it make sense to add such a column ?

Thanks


Re: timestamp for query in pg_stat_statements

2022-03-06 Thread Zhihong Yu
On Sat, Mar 5, 2022 at 8:17 PM Julien Rouhaud  wrote:

> On Sat, Mar 05, 2022 at 06:10:44PM -0800, Zhihong Yu wrote:
> >
> > Looking at pg_stat_statements, there doesn't seem to be timestamp column
> > for when the underlying query is performed.
> > Since the same query can be run multiple times, the absence of timestamp
> > column makes finding the most recent invocation of the query difficult.
> >
> > Does it make sense to add such a column ?
>
> I don't think it would be that helpful.  Why do you need to only know when
> the
> last execution was, but no other details among every other cumulated
> counters?
>
> You should consider using some other tools on top of pg_stat_statements
> (and
> possibly other extensions) that performs snapshot regularly and can show
> you
> all the details at the given frequency.
>
Hi,
The current design of pg_stat_statements doesn't have the concept of
observation.

By observation I mean scenarios where pg_stat_statements is read by people
doing performance tuning.

Here is one example (same query, q, is concerned).
At t1, q is performed, leaving one row in pg_stat_statements with mean_time
of 10.
At t2, operator examines pg_stat_statements and provides some suggestion
for tuning q (which is carried out).
At t3, q is run again leaving the row with mean_time of 9.
Now with two rows for q, how do we know whether the row written at t3 is
prior to or after implementing the suggestion made at t2 ?

Using other tools, a lot of the information in pg_stat_statements would be
duplicated to distinguish the counters recorded w.r.t. tuning operation.

I think pg_stat_statements can do better in this regard.

Cheers


Re: timestamp for query in pg_stat_statements

2022-03-06 Thread Zhihong Yu
On Sun, Mar 6, 2022 at 6:23 PM Julien Rouhaud  wrote:

> On Sun, Mar 06, 2022 at 12:37:00PM -0800, Zhihong Yu wrote:
> > The current design of pg_stat_statements doesn't have the concept of
> > observation.
> >
> > By observation I mean scenarios where pg_stat_statements is read by
> people
> > doing performance tuning.
> >
> > Here is one example (same query, q, is concerned).
> > At t1, q is performed, leaving one row in pg_stat_statements with
> mean_time
> > of 10.
> > At t2, operator examines pg_stat_statements and provides some suggestion
> > for tuning q (which is carried out).
> > At t3, q is run again leaving the row with mean_time of 9.
> > Now with two rows for q, how do we know whether the row written at t3 is
> > prior to or after implementing the suggestion made at t2 ?
>
> Well, if pg_stat_statements is read by people doing performance tuning
> shouldn't they be able to distinguish which query text is the one they just
> rewrote?
>
Did I mention rewriting?
As you said below, adding an index is one way of tuning that doesn't
involve rewriting.

Please also note that the person tuning the query may be different from the
person writing the query.
So some information in pg_stat_statements (or related table) is needed to
disambiguate.


> > Using other tools, a lot of the information in pg_stat_statements would
> be
> > duplicated to distinguish the counters recorded w.r.t. tuning operation.
>
> Yes, which is good.  Your example was about rewriting a query, but what
> about
> other possibilities like creating an index, changing
> hash_mem_multiplier...?
> You won't get a new record and the mean_time will mostly be useless.
>
> If you take regular snapshot, then you will be able to compute the
> mean_time
> for each interval, and that will answer both this scenario and the one in
> your
> example (since the 2nd row won't exist in the earlier snapshots).
>


refreshing query id for pg_stat_statements based on comment in sql

2022-03-07 Thread Zhihong Yu
Hi,
Currently the query id for pg_stat_statements gets calculated based on
parse-node specifics.
This means that the user cannot add a comment to a SQL query to test
something (though some other RDBMSes allow this practice).

Consider this use case: for query q, an admin looks at stats and performs
some optimization (without changing the query). The admin adds / modifies the
comment for q - now the query becomes q'. If the query id doesn't change,
there would still be one row in pg_stat_statements, which makes it difficult
to gauge the effectiveness of the tuning.

I want to get the community's opinion on whether adding / changing a comment
in a SQL query should result in a new query id for pg_stat_statements.

Cheers


Re: support for MERGE

2022-03-07 Thread Zhihong Yu
On Mon, Mar 7, 2022 at 12:04 PM Álvaro Herrera 
wrote:

> On Mon, Mar 7, 2022, at 4:59 PM, Álvaro Herrera wrote:
>
> I attach v13 here.  This version includes a 0002 that's does the split of
> nodeModifyTable.c routines, then 0003 implements MERGE on top of that.
> 0001 is as before.
>
>
> In 0002, I've opted to have two separate structs.  One is the
> ModifyTableContext, as before, but I've removed 'tupleid' and 'oldtuple'
> (the specification of the tuple to delete/update) because it makes
> ExecCrossPartitionUpdate cleaner if we pass them separately.  The second
> struct is UpdateContext, which is used inside ExecUpdate as output data
> from its subroutines.  This is also for the benefit of cross-partition
> updates.
>
> I'm pretty happy with how this turned out; even without considering MERGE,
> it seems to me that ExecUpdate benefits from being shorter.
>
Hi,
For v13-0003-MERGE-SQL-Command-following-SQL-2016.patch :

+* Reset per-tuple memory context to free any expression evaluation
+* storage allocated in the previous cycle.
+*/
+   ResetExprContext(econtext);

Why is the memory cleanup done in the next cycle ? Can the cleanup be done
at the end of the current cycle ?

+* XXX Should this explain why MERGE has the same logic as
UPDATE?

I think explanation should be given.

Cheers


minor change for create_list_bounds()

2022-03-08 Thread Zhihong Yu
Hi,
I was looking at commit db632fbca and noticed that,
in create_list_bounds(), if index is added to boundinfo->interleaved_parts
in the first if statement, there is no need to perform the second check
involving call to partition_bound_accepts_nulls().

Here is a short patch.

Cheers


list-bound-accepts-null.patch
Description: Binary data


Re: support for MERGE

2022-03-09 Thread Zhihong Yu
On Wed, Mar 9, 2022 at 9:38 AM Alvaro Herrera 
wrote:

> I attach MERGE v14.  This includes a fix from Amit Langote for the
> problem I described previously, with EvalPlanQual not working correctly.
> (I had failed to short-circuit the cross-partition update correctly.)
> Now the test case is enabled again, and it passes with the correct
> output.
>
> 0001 is as before; the only change is that I've added a commit message.
> Since this just moves some code around, I intend to get it pushed very
> soon.
>
> 0002 is the ExecUpdate/ExecDelete split.  I cleaned up the struct
> definitions a bit more, but it's pretty much as in the previous version.
>
> 0003 is the actual MERGE implementation.  Many previous review comments
> have been handled, and several other things are cleaner than before.
> I haven't made any changes for work_mem handling in ModifyTable's
> transition tables, though, and haven't attempted to address concurrent
> INSERT.
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W  —
> https://www.EnterpriseDB.com/


Hi,
For v14-0002-Split-ExecUpdate-and-ExecDelete-in-reusable-piec.patch :

+   TupleTableSlot *insertProjectedTuple;

With `insert` at the beginning of the variable name, someone may think it
is a verb. How about naming the variable projectedTupleFromInsert (or
something similar) ?

+   if (!ExecBRDeleteTriggers(context->estate, context->epqstate,
+ resultRelInfo, tupleid, oldtuple,
+ epqreturnslot))
+   /* some trigger made the delete a no-op; let caller know */
+   return false;

It seems you can directly return the value returned
from ExecBRDeleteTriggers().

+   if (!ExecBRUpdateTriggers(context->estate, context->epqstate,
+ resultRelInfo, tupleid, oldtuple, slot))
+   /* some trigger made the update a no-op; let caller know */
+   return false;

Similar comment as above (the comment is good and should be kept).

Cheers


Re: generic plans and "initial" pruning

2022-03-11 Thread Zhihong Yu
Hi,
w.r.t. v5-0003-Teach-AcquireExecutorLocks-to-skip-locking-pruned.patch :

(pruning steps containing expressions that can be computed before
before the executor proper has started)

the word 'before' was repeated.

For ExecInitParallelPlan():

+   char   *execlockrelsinfo_data;
+   char   *execlockrelsinfo_space;

the content of execlockrelsinfo_data is copied into execlockrelsinfo_space.
I wonder if having just one of execlockrelsinfo_data and
execlockrelsinfo_space would suffice.

Cheers


Re: BufferAlloc: don't take two simultaneous locks

2022-03-13 Thread Zhihong Yu
On Sun, Mar 13, 2022 at 3:25 AM Yura Sokolov 
wrote:

> On Fri, 11/03/2022 at 17:21 +0900, Kyotaro Horiguchi wrote:
> > At Fri, 11 Mar 2022 15:49:49 +0900 (JST), Kyotaro Horiguchi <
> horikyota@gmail.com> wrote in
> > > At Fri, 11 Mar 2022 15:30:30 +0900 (JST), Kyotaro Horiguchi <
> horikyota@gmail.com> wrote in
> > > > Thanks!  I looked into dynahash part.
> > > >
> > > >  struct HASHHDR
> > > >  {
> > > > -   /*
> > > > -* The freelist can become a point of contention in
> high-concurrency hash
> > > >
> > > > Why did you move around the freeList?
>
> This way it is possible to allocate just first partition, not all 32
> partitions.
>
> >
> > Then I looked into bufmgr part.  It looks fine to me but I have some
> > comments on code comments.
> >
> > >* To change the association of a valid buffer, we'll
> need to have
> > >* exclusive lock on both the old and new mapping
> partitions.
> > >   if (oldFlags & BM_TAG_VALID)
> >
> > We don't take lock on the new mapping partition here.
>
> Thx, fixed.
>
> > +* Clear out the buffer's tag and flags.  We must do this to
> ensure that
> > +* linear scans of the buffer array don't think the buffer is
> valid. We
> > +* also reset the usage_count since any recency of use of the
> old content
> > +* is no longer relevant.
> > +*
> > +* We are single pinner, we hold buffer header lock and exclusive
> > +* partition lock (if tag is valid). Given these statements it
> is safe to
> > +* clear tag since no other process can inspect it to the moment.
> >
> > This comment is a merger of the comments from InvalidateBuffer and
> > BufferAlloc.  But I think what we need to explain here is why we
> > invalidate the buffer here despite of we are going to reuse it soon.
> > And I think we need to state that the old buffer is now safe to use
> > for the new tag here.  I'm not sure the statement is really correct
> > but clearing-out actually looks like safer.
>
> I've tried to reformulate the comment block.
>
> >
> > > Now it is safe to use victim buffer for new tag.  Invalidate the
> > > buffer before releasing header lock to ensure that linear scans of
> > > the buffer array don't think the buffer is valid.  It is safe
> > > because it is guaranteed that we're the single pinner of the buffer.
> > > That pin also prevents the buffer from being stolen by others until
> > > we reuse it or return it to freelist.
> >
> > So I want to revise the following comment.
> >
> > -* Now it is safe to use victim buffer for new tag.
> > +* Now reuse victim buffer for new tag.
> > >* Make sure BM_PERMANENT is set for buffers that must be
> written at every
> > >* checkpoint.  Unlogged buffers only need to be written at
> shutdown
> > >* checkpoints, except for their "init" forks, which need to be
> treated
> > >* just like permanent relations.
> > >*
> > >* The usage_count starts out at 1 so that the buffer can
> survive one
> > >* clock-sweep pass.
> >
> > But if you think the current comment is fine, I don't insist on the
> > comment changes.
>
> Used suggestion.
>
> Fr, 11/03/22 Yura Sokolov wrote:
> > On Fri, 11/03/2022 at 15:49 +0900, Kyotaro Horiguchi wrote:
> > > BufTableDelete considers both reuse and !reuse cases but
> > > BufTableInsert doesn't and always does HASH_ASSIGN.  That looks
> > > odd. We should use HASH_ENTER here.  Thus I think it is more
> > > reasonable that HASH_ENTRY uses the stashed entry if exists and
> > > needed, or returns it to freelist if exists but not needed.
> > >
> > > What do you think about this?
> >
> > Well... I don't like it but I don't mind either.
> >
> > Code in HASH_ENTER and HASH_ASSIGN cases differs much.
> > On the other hand, probably it is possible to merge it carefuly.
> > I'll try.
>
> I've merged HASH_ASSIGN into HASH_ENTER.
>
> As in previous letter, three commits are concatted to one file
> and could be applied with `git am`.
>
> ---
>
> regards
>
> Yura Sokolov
> Postgres Professional
> y.soko...@postgrespro.ru
> funny.fal...@gmail.com


Hi,
In the description:

There is no need to hold both lock simultaneously.

both lock -> both locks

+* We also reset the usage_count since any recency of use of the old

recency of use -> recent use

+BufTableDelete(BufferTag *tagPtr, uint32 hashcode, bool reuse)

Later on, there is code:

+   reuse ? HASH_REUSE : HASH_REMOVE,

Can a flag (such as HASH_REUSE) be passed to BufTableDelete() instead of a
bool? That way, the flag can be used directly in the above place.

+   longnalloced;   /* number of entries initially allocated for

nallocated isn't very long. I think it would be better to name the
field 'nallocated'.

+   sum += hashp->hctl->freeList[i].nalloced;
+   sum -= hashp->hctl->freeList[i].nfree;

I think it would be better to calculate the difference between nalloced
and nfree first, then add the result to sum (to avoid overflow).

Re: BufferAlloc: don't take two simultaneous locks

2022-03-13 Thread Zhihong Yu
On Sun, Mar 13, 2022 at 3:27 PM Yura Sokolov 
wrote:

> On Sun, 13/03/2022 at 07:05 -0700, Zhihong Yu wrote:
> >
> > Hi,
> > In the description:
> >
> > There is no need to hold both lock simultaneously.
> >
> > both lock -> both locks
>
> Thanks.
>
> > +* We also reset the usage_count since any recency of use of the old
> >
> > recency of use -> recent use
>
> Thanks.
>
> > +BufTableDelete(BufferTag *tagPtr, uint32 hashcode, bool reuse)
> >
> > Later on, there is code:
> >
> > +   reuse ? HASH_REUSE : HASH_REMOVE,
> >
> > Can flag (such as HASH_REUSE) be passed to BufTableDelete() instead of
> bool ? That way, flag can be used directly in the above place.
>
> No.
> BufTable* functions are created to abstract Buffer Table from dynahash.
> Pass of HASH_REUSE directly will break abstraction.
>
> > +   longnalloced;   /* number of entries initially allocated
> for
> >
> > nallocated isn't very long. I think it would be better to name the field
> > 'nallocated'.
>
> It is debatable.
> Why not num_allocated? allocated_count? number_of_allocations?
> Same points for nfree.
> `nalloced` is recognizable and unambiguous. And there are a lot
> of `*alloced` in the postgresql's source, so this one will not
> be unusual.
>
> I don't see the need to make it longer.
>
> But if someone supports your point, I will not mind changing
> the name.
>
> > +   sum += hashp->hctl->freeList[i].nalloced;
> > +   sum -= hashp->hctl->freeList[i].nfree;
> >
> > I think it would be better to calculate the difference between nalloced
> and nfree first, then add the result to sum (to avoid overflow).
>
> Doesn't really matter much, because calculation must be valid
> even if all nfree==0.
>
> I'd rather debate use of 'long' in dynahash at all: 'long' is
> 32bit on 64bit Windows. It is better to use 'Size' here.
>
> But 'nelements' were 'long', so I didn't change things. I think
> it is place for another patch.
>
> (On the other hand, dynahash with 2**31 elements is at least
> 512GB RAM... we doubtfully trigger problem before OOM killer
> came. Does Windows have an OOM killer?)
>
> > Subject: [PATCH 3/3] reduce memory allocation for non-partitioned
> dynahash
> >
> > memory allocation -> memory allocations
>
> For each dynahash instance a single allocation was reduced.
> I think, 'memory allocation' is correct.
>
> Plural will be
> reduce memory allocations for non-partitioned dynahashes
> ie both 'allocations' and 'dynahashes'.
> Am I wrong?
>
Hi,
bq. reduce memory allocation for non-partitioned dynahash

It seems the following is clearer:

reduce one memory allocation for every non-partitioned dynahash

Cheers


Re: simplifying foreign key/RI checks

2022-03-14 Thread Zhihong Yu
On Mon, Mar 14, 2022 at 1:33 AM Amit Langote 
wrote:

> On Tue, Jan 18, 2022 at 3:30 PM Amit Langote 
> wrote:
> > v13 is attached.
>
> I noticed that the recent 641f3dffcdf's changes to
> get_constraint_index() made it basically unusable for this patch's
> purposes.
>
> Reading in the thread that led to 641f3dffcdf why
> get_constraint_index() was changed the way it was, I invented in the
> attached updated patch a get_fkey_constraint_index() that is local to
> ri_triggers.c for use by the new ri_ReferencedKeyExists(), replacing
> get_constraint_index() that no longer gives it the index it's looking
> for.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Hi,
+   partkey_isnull[j] = (key_nulls[k] == 'n' ? true :
false);

The above can be shortened as:

  partkey_isnull[j] = key_nulls[k] == 'n';

+* May neeed to cast each of the individual values of the foreign
key

neeed -> need

Cheers


Re: Concurrent deadlock scenario with DROP INDEX on partitioned index

2022-03-16 Thread Zhihong Yu
On Wed, Mar 16, 2022 at 11:20 AM Jimmy Yih  wrote:

> Tom Lane  wrote:
> > 1. RangeVarCallbackForDropRelation can get called more than once
> > during a lookup (in case of concurrent rename and suchlike).
> > If so it needs to be prepared to drop the lock(s) it got last time.
> > You have not implemented any such logic.  This doesn't seem hard
> > to fix, just store the OID list into struct DropRelationCallbackState.
>
> Fixed in attached patch. We added the OID list to the
> DropRelationCallbackState as you suggested.
>
> > 2. I'm not sure you have fully thought through the interaction
> > with the subsequent "is_partition" stanza.   If the target is
> > an intermediate partitioned index, don't we need to acquire lock
> > on its parent index before starting to lock children?  (It may
> > be that that stanza is already in the wrong place relative to
> > the table-locking stanza it currently follows, but not sure.)
>
> It's not required to acquire lock on a possible parent index because
> of the restriction where we can only run DROP INDEX on the top-most
> partitioned index.
>
> > 3. Calling find_all_inheritors with lockmode NoLock, and then
> > locking those relation OIDs later, is both unsafe and code-wasteful.
> > Just tell find_all_inheritors to get the locks you want.
>
> Fixed in attached patch.
>
> > 4. This code will invoke find_all_inheritors on InvalidOid if
> > IndexGetRelation fails.  It needs to be within the if-test
> > just above.
>
> Fixed in attached patch.
>
> > 5. Reading classform again at this point in the function is
> > not merely inefficient, but outright wrong, because we
> > already released the syscache entry.  Use the local variable.
>
> Fixed in attached patch. Added another local variable
> is_partitioned_index to store the classform value. The main reason we
> need the classform is because the existing relkind and
> expected_relkind local variables would only show RELKIND_INDEX whereas
> we needed exactly RELKIND_PARTITIONED_INDEX.
>
> > 6. You've not paid enough attention to updating existing comments,
> > particularly the header comment for RangeVarCallbackForDropRelation.
>
> Fixed in attached patch. Updated the header comment and aggregated our
> introduced comment to another relative comment slightly above the
> proposed locking section.
>
> > Actually though, maybe you *don't* want to do this in
> > RangeVarCallbackForDropRelation.  Because of point 2, it might be
> > better to run find_all_inheritors after we've successfully
> > identified and locked the direct target relation, ie do it back
> > in RemoveRelations.  I've not thought hard about that, but it's
> > attractive if only because it'd mean you don't have to fix point 1.
>
> We think that RangeVarCallbackForDropRelation is probably the most
> correct function to place the fix. It would look a bit out-of-place
> being in RemoveRelations seeing how there's already relative DROP
> INDEX code in RangeVarCallbackForDropRelation. With point 2 explained
> and point 1 addressed, the fix seems to look okay in
> RangeVarCallbackForDropRelation.
>
> Thanks for the feedback.  Attached new patch with feedback addressed.
>
> --
> Jimmy Yih (VMware)
> Gaurab Dey (VMware)


Hi,
For RangeVarCallbackForDropRelation():

+   LockRelationOid(indexRelationOid, heap_lockmode);

Since the above is called for both if and else blocks, it can be lifted
outside.

Cheers


Re: a misbehavior of partition row movement (?)

2022-03-18 Thread Zhihong Yu
On Fri, Mar 18, 2022 at 9:38 AM Alvaro Herrera 
wrote:

> I rebased this patch; v15 attached.  Other than fixing the (very large)
> conflicts due to nodeModifyTable.c rework, the most important change is
> moving GetAncestorResultRels into execMain.c and renaming it to have the
> "Exec-" prefix.  The reason is that what this code is doing is affect
> struct ResultRelInfo, which is owned by execMain.c, so it seemed bogus
> to do that in nodeModifyTable.c and then let execMain.c's
> ExecCloseResultRelations() do the cleanup.  I added a little commentary
> in the latter routine too.
>
> --
> Álvaro Herrera  Valdivia, Chile  —
> https://www.EnterpriseDB.com/
> "World domination is proceeding according to plan"(Andrew Morton)
>

Hi,

+#define AFTER_TRIGGER_OFFSET   0x07FF  /* must be low-order bits */
+#define AFTER_TRIGGER_DONE 0x8000
+#define AFTER_TRIGGER_IN_PROGRESS  0x4000

Is it better if the order of AFTER_TRIGGER_DONE and
AFTER_TRIGGER_IN_PROGRESS is swapped, so that the values appear in
ascending order ?

+#define AFTER_TRIGGER_CP_UPDATE0x0800

It would be better to add a comment for this constant, explaining what CP
stands for (cross-partition).
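Combining the two suggestions, one possible arrangement looks like the following. This is only a sketch: the 16-bit values are taken from the quoted hunk as shown and may not match the committed patch, and other flag bits in the real header are omitted. The functional requirement, which can be checked mechanically, is that the flag bits stay disjoint from the low-order offset mask and from each other:

```c
#include <assert.h>

/* Low-order bits hold the event offset; the high bits are status flags,
 * listed here in ascending bit order with "CP" spelled out. */
#define AFTER_TRIGGER_OFFSET       0x07FF  /* must be low-order bits */
#define AFTER_TRIGGER_CP_UPDATE    0x0800  /* CP = cross-partition UPDATE */
#define AFTER_TRIGGER_IN_PROGRESS  0x4000
#define AFTER_TRIGGER_DONE         0x8000
```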

+   if (!partRel->rd_rel->relispartition)
+   elog(ERROR, "cannot find ancestors of a non-partition result relation");

It would be better to include the relation name in the error message.

+   /* Ignore the root ancestor, because ...?? */

Please fill out the remainder of the comment.

+   if (!trig->tgisclone &&
+   RI_FKey_trigger_type(trig->tgfoid) == RI_TRIGGER_PK)
+   {
+   has_noncloned_fkey = true;

The variable name says fkey, but the constant being checked is
RI_TRIGGER_PK, not RI_TRIGGER_FK. Maybe add a comment explaining why.

Cheers


Re: freeing bms explicitly

2022-03-21 Thread Zhihong Yu
>
> Hi,
> I was looking at calls to bms_free() in PG code.
>
> e.g. src/backend/commands/publicationcmds.c line 362
>
> bms_free(bms);
>
> The above is just an example, there're other calls to bms_free().
> Since the bms is allocated from some execution context, I wonder why this
> call is needed.
>
> When the underlying execution context wraps up, isn't the bms freed ?
>
> Cheers
>
>
>


Re: freeing bms explicitly

2022-03-21 Thread Zhihong Yu
On Mon, Mar 21, 2022 at 3:05 PM Tom Lane  wrote:

> Zhihong Yu  writes:
> >> I was looking at calls to bms_free() in PG code.
> >> e.g. src/backend/commands/publicationcmds.c line 362
> >>  bms_free(bms);
> >> The above is just an example, there're other calls to bms_free().
> >> Since the bms is allocated from some execution context, I wonder why
> this
> >> call is needed.
> >>
> >> When the underlying execution context wraps up, isn't the bms freed ?
>
> Yeah, that's kind of pointless --- and the pfree(rfnode) after it is even
> more pointless, since it'll free only the top node of that expression
> tree.  Not to mention the string returned by TextDatumGetCString, and
> whatever might be leaked during the underlying catalog accesses.
>
> If we were actually worried about transient space consumption of this
> function, it'd be necessary to do a lot more than this.  It doesn't
> look to me like it's worth worrying about though -- it doesn't seem
> like it could be hit more than once per query in normal cases.
>
> regards, tom lane
>

Thanks Tom for replying.

What do you think of the following patch ?

Cheers


rfcolumn-free.patch
Description: Binary data


Re: Window Function "Run Conditions"

2022-03-22 Thread Zhihong Yu
On Tue, Mar 22, 2022 at 3:24 PM David Rowley  wrote:

> On Thu, 26 Aug 2021 at 14:54, Andy Fan  wrote:
> >
> > On Thu, Aug 19, 2021 at 2:35 PM David Rowley 
> wrote:
> > >
> > > On Thu, 19 Aug 2021 at 00:20, Andy Fan 
> wrote:
> > > > In the current master, the result is:
> > > >
> > > >  empno | salary | c | dr
> > > > ---++---+
> > > >  8 |   6000 | 4 |  1
> > >
> > > > In the patched version, the result is:
> > > >
> > > >  empno | salary | c | dr
> > > > ---++---+
> > > >  8 |   6000 | 1 |  1
> > >
> > > Thanks for taking it for a spin.
> > >
> > > That's a bit unfortunate.  I don't immediately see how to fix it other
> > > than to restrict the optimisation to only apply when there's a single
> > > WindowClause. It might be possible to relax it further and only apply
> > > if it's the final window clause to be evaluated, but in those cases,
> > > the savings are likely to be much less anyway as some previous
> > > WindowAgg will have exhausted all rows from its subplan.
> >
> > I am trying to hack the select_active_windows function to make
> > sure the WindowClause with Run Condition clause to be the last one
> > to evaluate (we also need to consider more than 1 window func has
> > run condition), at that time, the run condition clause is ready already.
> >
> > However there are two troubles in this direction: a).  This may conflict
> > with "the windows that need the same sorting are adjacent in the list."
> > b). "when two or more windows are order-equivalent then all peer rows
> > must be presented in the same order in all of them. .. (See General Rule
> 4 of
> >  in SQL2008 - SQL2016.)"
> >
> > In summary, I am not sure if it is correct to change the execution Order
> > of WindowAgg freely.
>
> Thanks for looking at that.
>
> My current thoughts are that it just feels a little too risky to
> adjust the comparison function that sorts the window clauses to pay
> attention to the run-condition.
>
> We would need to ensure that there's just a single window function
> with a run condition as it wouldn't be valid for there to be multiple.
> It would be easy enough to ensure we only push quals into just 1
> window clause, but that and meddling with the evaluation order has
> trade-offs.  To do that properly, we'd likely want to consider the
> costs when deciding which window clause would benefit from having
> quals pushed the most.  Plus, if we start messing with the evaluation
> order then we'd likely really want some sort of costing to check if
> pushing a qual and adjusting the evaluation order is actually cheaper
> than not pushing the qual and keeping the clauses in the order that
> requires the minimum number of sorts.   The planner is not really
> geared up for costing things like that properly, that's why we just
> assume the order with the least sorts is best. In reality that's often
> not going to be true as an index may exist and we might want to
> evaluate a clause first if we could get rid of a sort and index scan
> instead. Sorting the window clauses based on their SortGroupClause is
> just the best we can do for now at that stage in planning.
>
> I think it's safer to just disable the optimisation when there are
> multiple window clauses.  Multiple matching clauses are merged
> already, so it's perfectly valid to have multiple window functions,
> it's just they must share the same window clause.  I don't think
> that's terrible as with the major use case that I have in mind for
> this, the window function is only added to limit the number of rows.
> In most cases I can imagine, there'd be no reason to have an
> additional window function with different frame options.
>
> I've attached an updated patch.
>
Hi,
The following code seems to be common between the if and else blocks
(w.r.t. wfunc_left) of find_window_run_conditions().

+   op = get_opfamily_member(opinfo->opfamily_id,
+opinfo->oplefttype,
+opinfo->oprighttype,
+newstrategy);
+
+   newopexpr = (OpExpr *) make_opclause(op,
+opexpr->opresulttype,
+opexpr->opretset,
+otherexpr,
+(Expr *) wfunc,
+opexpr->opcollid,
+opexpr->inputcollid);
+   newopexpr->opfuncid = get_opcode(op);
+
+   *keep_original = true;
+   runopexpr = newopexpr;

It would be nice if this code could be shared.

+   WindowClause *wclause = (WindowClause *)
+   list_nth(subquery->windowClause,
+wfunc->winref - 1);

The code would be more readable if list_nth() is indented.

+   /* Check the left side of the OpExpr */


Re: freeing bms explicitly

2022-03-22 Thread Zhihong Yu
On Tue, Mar 22, 2022 at 8:45 PM Amit Kapila  wrote:

> On Tue, Mar 22, 2022 at 3:39 AM Zhihong Yu  wrote:
> >
> > On Mon, Mar 21, 2022 at 3:05 PM Tom Lane  wrote:
> >>
> >> Zhihong Yu  writes:
> >> >> I was looking at calls to bms_free() in PG code.
> >> >> e.g. src/backend/commands/publicationcmds.c line 362
> >> >>  bms_free(bms);
> >> >> The above is just an example, there're other calls to bms_free().
> >> >> Since the bms is allocated from some execution context, I wonder why
> this
> >> >> call is needed.
> >> >>
> >> >> When the underlying execution context wraps up, isn't the bms freed ?
> >>
> >> Yeah, that's kind of pointless --- and the pfree(rfnode) after it is
> even
> >> more pointless, since it'll free only the top node of that expression
> >> tree.  Not to mention the string returned by TextDatumGetCString, and
> >> whatever might be leaked during the underlying catalog accesses.
> >>
> >> If we were actually worried about transient space consumption of this
> >> function, it'd be necessary to do a lot more than this.  It doesn't
> >> look to me like it's worth worrying about though -- it doesn't seem
> >> like it could be hit more than once per query in normal cases.
> >>
> >> regards, tom lane
> >
> >
> > Thanks Tom for replying.
> >
> > What do you think of the following patch ?
> >
>
> Your patch looks good to me. I have found one more similar instance in
> the same file and changed that as well accordingly. Let me know what
> you think of the attached?
>
> --
> With Regards,
> Amit Kapila.
>

Hi, Amit:
The patch looks good to me.

Cheers


Re: freeing bms explicitly

2022-03-23 Thread Zhihong Yu
On Tue, Mar 22, 2022 at 9:04 PM Zhihong Yu  wrote:

>
>
> On Tue, Mar 22, 2022 at 8:45 PM Amit Kapila 
> wrote:
>
>> On Tue, Mar 22, 2022 at 3:39 AM Zhihong Yu  wrote:
>> >
>> > On Mon, Mar 21, 2022 at 3:05 PM Tom Lane  wrote:
>> >>
>> >> Zhihong Yu  writes:
>> >> >> I was looking at calls to bms_free() in PG code.
>> >> >> e.g. src/backend/commands/publicationcmds.c line 362
>> >> >>  bms_free(bms);
>> >> >> The above is just an example, there're other calls to bms_free().
>> >> >> Since the bms is allocated from some execution context, I wonder
>> why this
>> >> >> call is needed.
>> >> >>
>> >> >> When the underlying execution context wraps up, isn't the bms freed
>> ?
>> >>
>> >> Yeah, that's kind of pointless --- and the pfree(rfnode) after it is
>> even
>> >> more pointless, since it'll free only the top node of that expression
>> >> tree.  Not to mention the string returned by TextDatumGetCString, and
>> >> whatever might be leaked during the underlying catalog accesses.
>> >>
>> >> If we were actually worried about transient space consumption of this
>> >> function, it'd be necessary to do a lot more than this.  It doesn't
>> >> look to me like it's worth worrying about though -- it doesn't seem
>> >> like it could be hit more than once per query in normal cases.
>> >>
>> >> regards, tom lane
>> >
>> >
>> > Thanks Tom for replying.
>> >
>> > What do you think of the following patch ?
>> >
>>
>> Your patch looks good to me. I have found one more similar instance in
>> the same file and changed that as well accordingly. Let me know what
>> you think of the attached?
>>
>> --
>> With Regards,
>> Amit Kapila.
>>
>
> Hi, Amit:
> The patch looks good to me.
>
> Cheers
>

Tom:
 Do you mind taking a look at the latest patch ?

Thanks


Re: Skip partition tuple routing with constant partition key

2022-03-23 Thread Zhihong Yu
On Wed, Mar 23, 2022 at 5:52 AM Amit Langote 
wrote:

> Hi Greg,
>
> On Wed, Mar 16, 2022 at 6:54 AM Greg Stark  wrote:
> > There are a whole lot of different patches in this thread.
> >
> > However this last one https://commitfest.postgresql.org/37/3270/
> > created by Amit seems like a fairly straightforward optimization that
> > can be evaluated on its own separately from the others and seems quite
> > mature. I'm actually inclined to set it to "Ready for Committer".
>
> Thanks for taking a look at it.
>
> > Incidentally a quick read-through of the patch myself and the only
> > question I have is how the parameters of the adaptive algorithm were
> > chosen. They seem ludicrously conservative to me
>
> Do you think CACHE_BOUND_OFFSET_THRESHOLD_TUPS (1000) is too high?  I
> suspect maybe you do.
>
> Basically, the way this works is that once set, cached_bound_offset is
> not reset until encountering a tuple for which cached_bound_offset
> doesn't give the correct partition, so the threshold doesn't matter
> when the caching is active.  However, once reset, it is not again set
> till the threshold number of tuples have been processed and that too
> only if the binary searches done during that interval appear to have
> returned the same bound offset in succession a number of times.  Maybe
> waiting a 1000 tuples to re-assess that is a bit too conservative,
> yes.  I guess even as small a number as 10 is fine here?
>
> I've attached an updated version of the patch, though I haven't
> changed the threshold constant.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>
> On Wed, Mar 16, 2022 at 6:54 AM Greg Stark  wrote:
> >
> > There are a whole lot of different patches in this thread.
> >
> > However this last one https://commitfest.postgresql.org/37/3270/
> > created by Amit seems like a fairly straightforward optimization that
> > can be evaluated on its own separately from the others and seems quite
> > mature. I'm actually inclined to set it to "Ready for Committer".
> >
> > Incidentally a quick read-through of the patch myself and the only
> > question I have is how the parameters of the adaptive algorithm were
> > chosen. They seem ludicrously conservative to me and a bit of simple
> > arguments about how expensive an extra check is versus the time saved
> > in the boolean search should be easy enough to come up with to justify
> > whatever values make sense.
>
> Hi,

+ * Threshold of the number of tuples to need to have been processed before
+ * maybe_cache_partition_bound_offset() (re-)assesses whether caching must be

The first part of the comment should be:

Threshold of the number of tuples which need to have been processed

+   (double) pd->n_tups_inserted / pd->n_offset_changed > 1)

I think the floating-point division can be avoided - since both counters are
positive here, the condition can be written as:

  pd->n_tups_inserted > pd->n_offset_changed
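For non-negative counts with a positive divisor, the two forms are strictly equivalent, which a small exhaustive check confirms (the harness below is hypothetical; only the field names echo the patch):

```c
#include <assert.h>

/* Verify that "(double) a / b > 1" and "a > b" agree for all small
 * positive counts, so the integer comparison can replace the division. */
static int division_equivalence_holds(void)
{
    for (long n_tups_inserted = 0; n_tups_inserted <= 100; n_tups_inserted++)
    {
        /* divisor starts at 1: at least one offset change has happened */
        for (long n_offset_changed = 1; n_offset_changed <= 100; n_offset_changed++)
        {
            int by_division = (double) n_tups_inserted / (double) n_offset_changed > 1.0;
            int by_compare  = n_tups_inserted > n_offset_changed;
            if (by_division != by_compare)
                return 0;
        }
    }
    return 1;
}
```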

+   /* Check if the value is below the high bound */

high bound -> upper bound

Cheers


Re: ExecRTCheckPerms() and many prunable partitions

2022-03-23 Thread Zhihong Yu
On Wed, Mar 23, 2022 at 12:03 AM Amit Langote 
wrote:

> On Mon, Mar 14, 2022 at 4:36 PM Amit Langote 
> wrote:
> > Also needed fixes when rebasing.
>
> Needed another rebase.
>
> As the changes being made with the patch are non-trivial and the patch
> hasn't been reviewed very significantly since Alvaro's comments back
> in Sept 2021 which I've since addressed, I'm thinking of pushing this
> one into the version 16 dev cycle.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Hi,
For patch 1:

bq. makes permissions-checking needlessly expensive when many inheritance
children are added to the range range

'range' is repeated in the above sentence.

+ExecCheckOneRelPerms(RelPermissionInfo *perminfo)

Since RelPermissionInfo is for one relation, I think the 'One' in the
function name can be dropped.

+   else/* this isn't a child result rel */
+   resultRelInfo->ri_RootToChildMap = NULL;
...
+   resultRelInfo->ri_RootToChildMapValid = true;

Should the assignment of true value be moved into the if block (in the else
block, ri_RootToChildMap is assigned NULL) ?

+   /* Looks like the RTE doesn't, so try to find it the hard way. */

doesn't -> doesn't know

Cheers


runtime error copying oids field

2020-11-30 Thread Zhihong Yu
Hi,
In our testPgRegressTrigger test log, I saw the following (this was for a
relatively old version of PG):

197859  [ts-1]
 ../../../../../../src/postgres/src/backend/commands/indexcmds.c:1062:22:
runtime error: null pointer passed as argument 2, which is declared to
never be null
197860  [ts-1]
 /opt/yb-build/brew/linuxbrew-20181203T161736v9/include/string.h:43:28:
note: nonnull attribute specified here
197861  [ts-1]   #0 0xacbd0f in DefineIndex
$YB_SRC_ROOT/src/postgres/src/backend/commands/../../../../../../src/postgres/src/backend/commands/indexcmds.c:1062:4
197862  [ts-1]   #1 0x11441e0 in ProcessUtilitySlow
$YB_SRC_ROOT/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/utility.c:1436:7
197863  [ts-1]   #2 0x114141f in standard_ProcessUtility
$YB_SRC_ROOT/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/utility.c:962:4
197864  [ts-1]   #3 0x1140b65 in YBProcessUtilityDefaultHook
$YB_SRC_ROOT/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/utility.c:3574:3
197865  [ts-1]   #4 0x7f47d4950eac in pgss_ProcessUtility
$YB_SRC_ROOT/src/postgres/contrib/pg_stat_statements/../../../../../src/postgres/contrib/pg_stat_statements/pg_stat_statements.c:1120:4

This was the line runtime error was raised:

memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);

From RelationBuildPartitionDesc we can see that:

if (nparts > 0)
{
PartitionBoundInfo boundinfo;
int*mapping;
int next_index = 0;

result->oids = (Oid *) palloc0(nparts * sizeof(Oid));

The cause was that the oids field was never assigned, since nparts was 0.
This was verified by additional logging added just prior to the memcpy call.

I want to get the community's opinion on whether a null check should be
added prior to the memcpy() call.
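For reference, a minimal sketch of the guarded copy is below (the helper name is hypothetical, and Alvaro's fix downthread skips the whole block instead). The underlying issue is that passing a null pointer to memcpy is undefined behavior even when the byte count is zero, which is exactly what the sanitizer reports:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid */

/* Copy the partition OID array only when there are partitions; with
 * nparts == 0 the source pointer may be NULL and must not reach memcpy. */
static Oid *copy_part_oids(const Oid *src, int nparts)
{
    Oid *dst;

    if (nparts <= 0)
        return NULL;            /* nothing to copy; src may be NULL */

    dst = malloc(sizeof(Oid) * nparts);
    if (dst != NULL)
        memcpy(dst, src, sizeof(Oid) * nparts);
    return dst;
}
```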

Cheers


Re: runtime error copying oids field

2020-11-30 Thread Zhihong Yu
Hi,
See attached patch which is along the line Alvaro outlined.

Cheers

On Mon, Nov 30, 2020 at 3:01 PM Alvaro Herrera 
wrote:

> On 2020-Nov-30, Zhihong Yu wrote:
>
> > This was the line runtime error was raised:
> >
> > memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
> >
> > From RelationBuildPartitionDesc we can see that:
> >
> >   if (nparts > 0)
> >   {
> >   PartitionBoundInfo boundinfo;
> >   int*mapping;
> >   int next_index = 0;
> >
> >   result->oids = (Oid *) palloc0(nparts * sizeof(Oid));
> >
> > The cause was oids field was not assigned due to nparts being 0.
> > This is verified by additional logging added just prior to the memcpy
> call.
> >
> > I want to get the community's opinion on whether a null check should be
> > added prior to the memcpy() call.
>
> As far as I understand, we do want to avoid memcpy's of null pointers;
> see [1].
>
> In this case I think it'd be sane to skip the complete block, not just
> the memcpy, something like
>
> diff --git a/src/backend/commands/indexcmds.c
> b/src/backend/commands/indexcmds.c
> index ca24620fd0..d35deb433a 100644
> --- a/src/backend/commands/indexcmds.c
> +++ b/src/backend/commands/indexcmds.c
> @@ -1163,15 +1163,17 @@ DefineIndex(Oid relationId,
>
> if (partitioned)
> {
> +   PartitionDesc partdesc;
> +
> /*
>  * Unless caller specified to skip this step (via ONLY),
> process each
>  * partition to make sure they all contain a corresponding
> index.
>  *
>  * If we're called internally (no stmt->relation), recurse
> always.
>  */
> -   if (!stmt->relation || stmt->relation->inh)
> +   partdesc = RelationGetPartitionDesc(rel);
> +   if ((!stmt->relation || stmt->relation->inh) &&
> partdesc->nparts > 0)
> {
> -   PartitionDesc partdesc =
> RelationGetPartitionDesc(rel);
> int nparts = partdesc->nparts;
> Oid*part_oids = palloc(sizeof(Oid)
> * nparts);
> boolinvalidate_parent = false;
>
> [1]
> https://www.postgresql.org/message-id/flat/20200904023648.GB3426768%40rfd.leadboat.com
>


v01-check-nparts-for-defining-idx.patch
Description: Binary data


Re: runtime error copying oids field

2020-11-30 Thread Zhihong Yu
Alvaro, et al:
Please let me know how to proceed with the patch.

Running test suite with the patch showed no regression.

Cheers

On Mon, Nov 30, 2020 at 3:24 PM Zhihong Yu  wrote:

> Hi,
> See attached patch which is along the line Alvaro outlined.
>
> Cheers
>
> On Mon, Nov 30, 2020 at 3:01 PM Alvaro Herrera 
> wrote:
>
>> On 2020-Nov-30, Zhihong Yu wrote:
>>
>> > This was the line runtime error was raised:
>> >
>> > memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
>> >
>> > From RelationBuildPartitionDesc we can see that:
>> >
>> >   if (nparts > 0)
>> >   {
>> >   PartitionBoundInfo boundinfo;
>> >   int*mapping;
>> >   int next_index = 0;
>> >
>> >   result->oids = (Oid *) palloc0(nparts * sizeof(Oid));
>> >
>> > The cause was oids field was not assigned due to nparts being 0.
>> > This is verified by additional logging added just prior to the memcpy
>> call.
>> >
>> > I want to get the community's opinion on whether a null check should be
>> > added prior to the memcpy() call.
>>
>> As far as I understand, we do want to avoid memcpy's of null pointers;
>> see [1].
>>
>> In this case I think it'd be sane to skip the complete block, not just
>> the memcpy, something like
>>
>> diff --git a/src/backend/commands/indexcmds.c
>> b/src/backend/commands/indexcmds.c
>> index ca24620fd0..d35deb433a 100644
>> --- a/src/backend/commands/indexcmds.c
>> +++ b/src/backend/commands/indexcmds.c
>> @@ -1163,15 +1163,17 @@ DefineIndex(Oid relationId,
>>
>> if (partitioned)
>> {
>> +   PartitionDesc partdesc;
>> +
>> /*
>>  * Unless caller specified to skip this step (via ONLY),
>> process each
>>  * partition to make sure they all contain a
>> corresponding index.
>>  *
>>  * If we're called internally (no stmt->relation),
>> recurse always.
>>  */
>> -   if (!stmt->relation || stmt->relation->inh)
>> +   partdesc = RelationGetPartitionDesc(rel);
>> +   if ((!stmt->relation || stmt->relation->inh) &&
>> partdesc->nparts > 0)
>> {
>> -   PartitionDesc partdesc =
>> RelationGetPartitionDesc(rel);
>> int nparts = partdesc->nparts;
>> Oid*part_oids =
>> palloc(sizeof(Oid) * nparts);
>> boolinvalidate_parent = false;
>>
>> [1]
>> https://www.postgresql.org/message-id/flat/20200904023648.GB3426768%40rfd.leadboat.com
>>
>


Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-04 Thread Zhihong Yu
Hi, David:
For nodeResultCache.c :

+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0

I think it would be safer if the macro body were enclosed in parentheses (in
case the macro is expanded inside a larger expression).
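To illustrate the hazard with a toy stand-in (toy_equal plays the role of ResultCacheHash_equal; nothing here is the patch's actual code): without the outer parentheses, embedding the macro in a larger comparison regroups the operators. `SH_EQUAL_BARE(1, 1) == SH_EQUAL_BARE(2, 3)` expands to `toy_equal(1,1) == 0 == toy_equal(2,3) == 0`, which `==` left-associates into the wrong grouping:

```c
#include <assert.h>

/* Returns 0 when the two values are equal, like the real comparator. */
static int toy_equal(int a, int b)
{
    return a - b;
}

/* As in the patch: the expansion is not parenthesized. */
#define SH_EQUAL_BARE(a, b)  toy_equal(a, b) == 0

/* The safer form suggested in the review. */
#define SH_EQUAL_PAREN(a, b) (toy_equal(a, b) == 0)
```

Comparing "is 1 equal to 1" against "is 2 equal to 3" should yield false (true vs. false), which only the parenthesized form produces.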

+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey
*key1,
+ const ResultCacheKey *key2)

Since key2 is not used, maybe name it unused_key ?

+   /* Make a guess at a good size when we're not given a valid size. */
+   if (size == 0)
+   size = 1024;

Should the default size be logged ?

+   /* Update the memory accounting */
+   rcstate->mem_used -= freed_mem;

Maybe add an assertion that mem_used is >= 0 after the decrement (there is
an assertion in remove_cache_entry however, that assertion is after another
decrement).

+ * 'specialkey', if not NULL, causes the function to return false if the
entry
+ * entry which the key belongs to is removed from the cache.

duplicate entry (one at the end of first line and one at the beginning of
second line).

For cache_lookup(), the new key is allocated before checking
whether rcstate->mem_used > rcstate->mem_upperlimit. Since new entries
presumably have roughly the same size, can we check whether the upper limit
would be crossed (assuming the addition of the new entry) before allocating
the new entry ?
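A sketch of the suggested order of operations, assuming an estimated entry size is available (every name here is hypothetical, not the patch's API): test the memory limit first, evict if needed, and only then allocate.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Cache
{
    size_t mem_used;
    size_t mem_upperlimit;
    size_t est_entry_size;      /* assumption: entries are near this size */
} Cache;

static int evictions = 0;

/* Pretend to evict enough old entries to make room for one new one. */
static void cache_reduce_memory(Cache *c)
{
    c->mem_used = (c->mem_used > c->est_entry_size)
        ? c->mem_used - c->est_entry_size : 0;
    evictions++;
}

/* Check the limit *before* allocating the new entry. */
static void *cache_add_entry(Cache *c)
{
    if (c->mem_used + c->est_entry_size > c->mem_upperlimit)
        cache_reduce_memory(c);
    c->mem_used += c->est_entry_size;
    return malloc(c->est_entry_size);
}
```

The design question is whether an up-front size estimate is accurate enough; the patch as written sidesteps that by allocating first and reclaiming afterwards.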

+   if (unlikely(!cache_reduce_memory(rcstate, key)))
+   return NULL;

Does the new entry need to be released in the above case?

Cheers


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-04 Thread Zhihong Yu
There are two blocks with almost identical code (second occurrence in
cache_store_tuple):

+   if (rcstate->mem_used > rcstate->mem_upperlimit)
+   {

It would be nice if the code can be extracted to a method and shared.

node->rc_status = RC_END_OF_SCAN;
return NULL;
}
else

There are several places where the else keyword can be omitted because the
if block ends with a return. This would let the code in the else block move
one level left (for easier reading).
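The shape of that transformation, on a made-up example rather than the patch's code:

```c
#include <assert.h>
#include <string.h>

/* Nested if/else with a trailing else block. */
static const char *classify_with_else(int status)
{
    if (status < 0)
    {
        return "error";
    }
    else
    {
        if (status == 0)
            return "done";
        else
            return "running";
    }
}

/* Same behavior with early returns: one indentation level less,
 * and each case reads top to bottom. */
static const char *classify_early_return(int status)
{
    if (status < 0)
        return "error";
    if (status == 0)
        return "done";
    return "running";
}
```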

   if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))

I noticed that right_hashfn isn't used. Would this cause a warning from the
compiler (some compilers treat warnings as errors) ?
Maybe NULL can be passed as the last parameter. The return value
of get_op_hash_functions would keep its current meaning (find both hash
fn's).

rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;

Maybe (in a subsequent patch) a GUC variable could be introduced for tuning
the constant 0.98.

For +paraminfo_get_equal_hashops :

+   else
+   Assert(false);

Adding an elog would be good for debugging.

Cheers

On Fri, Dec 4, 2020 at 5:09 PM Zhihong Yu  wrote:

> Hi, David:
> For nodeResultCache.c :
>
> +#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
>
> I think it would be safer if the comparison is enclosed in parentheses (in
> case the macro appears in composite condition).
>
> +ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey
> *key1,
> + const ResultCacheKey *key2)
>
> Since key2 is not used, maybe name it unused_key ?
>
> +   /* Make a guess at a good size when we're not given a valid size. */
> +   if (size == 0)
> +   size = 1024;
>
> Should the default size be logged ?
>
> +   /* Update the memory accounting */
> +   rcstate->mem_used -= freed_mem;
>
> Maybe add an assertion that mem_used is >= 0 after the decrement (there is
> an assertion in remove_cache_entry however, that assertion is after another
> decrement).
>
> + * 'specialkey', if not NULL, causes the function to return false if the
> entry
> + * entry which the key belongs to is removed from the cache.
>
> duplicate entry (one at the end of first line and one at the beginning of
> second line).
>
> For cache_lookup(), new key is allocated before checking
> whether rcstate->mem_used > rcstate->mem_upperlimit. It seems new entries
> should probably have the same size.
> Can we check whether upper limit is crossed (assuming the addition of new
> entry) before allocating new entry ?
>
> +   if (unlikely(!cache_reduce_memory(rcstate, key)))
> +   return NULL;
>
> Does the new entry need to be released in the above case?
>
> Cheers
>


Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-06 Thread Zhihong Yu
> +   /* Make a guess at a good size when we're not given a valid size. */
> +   if (size == 0)
> +   size = 1024;
>
> Should the default size be logged ?

> I'm not too sure if I know what you mean here. Should it be a power of
> 2? It is.  Or do you mean I shouldn't use a magic number?

Using 1024 seems to be fine. I meant logging the default value of 1024 so
that users / devs can have a better idea of the actual value used (without
looking at the code).

>> Or do you think 98% is not a good number?

Since you have played with Result Cache, I would trust your judgment in
choosing the percentage.
It is fine not to expose this constant until the need arises.

Cheers

On Sun, Dec 6, 2020 at 5:15 PM David Rowley  wrote:

> On Sat, 5 Dec 2020 at 16:51, Zhihong Yu  wrote:
> >
> > There are two blocks with almost identical code (second occurrence in
> cache_store_tuple):
> >
> > +   if (rcstate->mem_used > rcstate->mem_upperlimit)
> > +   {
> > It would be nice if the code can be extracted to a method and shared.
>
> It's true, they're *almost* identical.  I quite like the fact that one
> of the cases can have an unlikely() macro in there. It's pretty
> unlikely that we'd go into cache overflow just when adding a new cache
> entry. work_mem would likely have to be set to a couple of dozen bytes
> for that to happen. 64k is the lowest it can be set.  However, I
> didn't really check to see if having that unlikely() macro increases
> performance.  I've changed things locally here to add a new function
> named cache_check_mem(). I'll keep that for now, but I'm not sure if
> there's enough code there to warrant a function. The majority of the
> additional lines are from the comment being duplicated.
>
> > node->rc_status = RC_END_OF_SCAN;
> > return NULL;
> > }
> > else
> >
> > There are several places where the else keyword for else block can be
> omitted because the if block ends with return.
> > This would allow the code in else block to move leftward (for easier
> reading).
>
> OK, I've removed the "else" in places where it can be removed.
>
> >if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
> >
> > I noticed that right_hashfn isn't used. Would this cause some warning
> from the compiler (for some compiler the warning would be treated as error)
> ?
> > Maybe NULL can be passed as the last parameter. The return value of
> get_op_hash_functions would keep the current meaning (find both hash fn's).
>
> It's fine not to use output parameters.  It's not the only one in the
> code base ignoring one from that very function. See
> execTuplesHashPrepare().
>
> > rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
> >
> > Maybe (in subsequent patch) GUC variable can be introduced for tuning
> the constant 0.98.
>
> I don't think exposing such knobs for users to adjust is a good idea.
> Can you think of a case where you'd want to change it? Or do you think
> 98% is not a good number?
>
> >
> > For +paraminfo_get_equal_hashops :
> >
> > +   else
> > +   Assert(false);
>
> I'm keen to leave it like it is.  I don't see any need to bloat the
> compiled code with an elog(ERROR).
>
> There's a comment in RelOptInfo.lateral_vars that mentions:
>
> /* LATERAL Vars and PHVs referenced by rel */
>
> So, if anyone, in the future, wants to add some other node type to
> that list then they'll have a bit more work to do. Plus, I'm only
> doing the same as what's done in create_lateral_join_info().
>
> I'll run the updated patch which includes the cache_check_mem()
> function for a bit and post an updated patch to the main thread a bit
> later.
>
> Thanks for having a look at this patch.
>
> David
>


Re: Parallel Inserts in CREATE TABLE AS

2020-12-06 Thread Zhihong Yu
Hi, Bharath :

+   (void) SetCurrentCommandIdUsedForWorker();
+   myState->output_cid = GetCurrentCommandId(false);

SetCurrentCommandIdUsedForWorker already has void as its return type, so the
'(void)' cast is not needed.

+* rd_createSubid is marked invalid, otherwise, the table is
+* not allowed to extend by the workers.

nit: to extend by the workers -> to be extended by the workers

For IsParallelInsertInCTASAllowed, the logic is inside the 'if
(IS_CTAS(into))' block.
You can return false when !IS_CTAS(into) - this would save some
indentation for the body.

+   if (rel && rel->relpersistence != RELPERSISTENCE_TEMP)
+   allowed = true;

Similarly, when the above condition doesn't hold, you can return false
directly - reducing the next if condition to 'if (queryDesc)'.

+   if (!(ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+   plannedstmt->parallelModeNeeded &&
+   plannedstmt->planTree &&
+   IsA(plannedstmt->planTree, Gather) &&
+   plannedstmt->planTree->lefttree &&
+   plannedstmt->planTree->lefttree->parallel_aware &&
+   plannedstmt->planTree->lefttree->parallel_safe))

The compound condition is negated. Maybe you can write it without negation:

+   return (ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+   plannedstmt->parallelModeNeeded &&
+   plannedstmt->planTree &&
+   IsA(plannedstmt->planTree, Gather) &&
+   plannedstmt->planTree->lefttree &&
+   plannedstmt->planTree->lefttree->parallel_aware &&
+   plannedstmt->planTree->lefttree->parallel_safe);

+* Write out the number of tuples this worker has inserted. Leader will
use
+* it to inform to the end client.

'inform to the end client' -> 'inform the end client' (without to)

Cheers

On Sun, Dec 6, 2020 at 4:37 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> Thanks Amit for the review comments.
>
> On Sat, Dec 5, 2020 at 4:27 PM Amit Kapila 
> wrote:
> >
> > > If I'm not wrong, I think currently we have no exec nodes for DDLs.
> > > I'm not sure whether we would like to introduce one for this.
> >
> > Yeah, I am also not in favor of having an executor node for CTAS but
> > OTOH, I also don't like the way you have jammed the relevant
> > information in generic PlanState. How about keeping it in GatherState
> > and initializing it in ExecCreateTableAs() after the executor start.
> > You are already doing special treatment for the Gather node in
> > ExecCreateTableAs (via IsParallelInsertInCTASAllowed) so we can as
> > well initialize the required information in GatherState in
> > ExecCreateTableAs. I think that might help in reducing the special
> > treatment for intoclause at different places.
> >
>
> Done. Added required info to GatherState node. While this reduced the
> changes at many other places, but had to pass the into clause and
> object id to ExecInitParallelPlan() as we do not send GatherState node
> to it. Hope that's okay.
>
> >
> > Few other assorted comments:
> > =
> > 1.
> > This looks a bit odd. The function name
> > 'IsParallelInsertInCTASAllowed' suggests that it just checks whether
> > parallelism is allowed but it is internally changing the plan_rows. It
> > might be better to do this separately if the parallelism is allowed.
> >
>
> Changed.
>
> >
> > 2.
> >  static void ExecShutdownGatherWorkers(GatherState *node);
> > -
> > +static void ExecParallelInsertInCTAS(GatherState *node);
> >
> > Spurious line removal.
> >
>
> Corrected.
>
> >
> > 3.
> > The comment and code appear a bit misleading as the function seems to
> > shutdown the workers rather than waiting for them to finish. How about
> > using something like below:
> >
> > /*
> > * Next, accumulate buffer and WAL usage.  (This must wait for the workers
> > * to finish, or we might get incomplete data.)
> > */
> > if (nworkers > 0)
> > {
> > int i;
> >
> > /* Wait for all vacuum workers to finish */
> > WaitForParallelWorkersToFinish(lps->pcxt);
> >
> > for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
> > }
> >
> > This is how it works for parallel vacuum.
> >
>
> Done.
>
> >
> > 4.
> > The above comment doesn't seem to convey what it intends to convey.
> > How about changing it slightly as: "We don't compute the
> > parallel_tuple_cost for CTAS because the number of tuples that are
> > transferred from workers to the gather node is zero as each worker
> > parallelly inserts the tuples that are resulted from its chunk of plan
> > execution. This change may make the parallel plan cheap among all
> > other plans, and influence the planner to consider this parallel
> > plan."
> >
>
> Changed.
>
> >
> > Then, we can also have an Assert for path->path.rows to zero for the
> CTAS case.
> >
>
> We cannot have Assert(path->path.rows == 0), because we are 

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

2020-12-07 Thread Zhihong Yu
>>  just removing the logic that has the
soft upper limit and just have it do cache evictions after each
allocation after the cache first fills

Yeah - having one fewer limit would simplify the code.

Cheers

On Mon, Dec 7, 2020 at 5:27 PM David Rowley  wrote:

> On Mon, 7 Dec 2020 at 14:25, Zhihong Yu  wrote:
> >
> > > +   /* Make a guess at a good size when we're not given a valid size.
> */
> > > +   if (size == 0)
> > > +   size = 1024;
> > >
> > > Should the default size be logged ?
> >
> > > I'm not too sure if I know what you mean here. Should it be a power of
> > > 2? It is.  Or do you mean I shouldn't use a magic number?
> >
> > Using 1024 seems to be fine. I meant logging the default value of 1024
> so that user / dev can have better idea the actual value used (without
> looking at the code).
>
> Oh, right. In EXPLAIN ANALYZE. Good point.  I wonder if that's going
> to be interesting enough to show.
>
> > >> Or do you think 98% is not a good number?
> >
> > Since you have played with Result Cache, I would trust your judgment in
> choosing the percentage.
> > It is fine not to expose this constant until the need arises.
>
> I did some experimentation with different values on a workload that
> never gets a cache hit and just always evicts the oldest entry.
> There's only very slight changes in performance between 90%, 98% and
> 100% with 1MB work_mem.
>
> times in milliseconds measured over 60 seconds on each run.
>
> 90% 98% 100%
> run1 2318 2339 2344
> run2 2339 2333 2309
> run3 2357 2339 2346
> avg (ms) 2338 2337 2333
>
> Perhaps this is an argument for just removing the logic that has the
> soft upper limit and just have it do cache evictions after each
> allocation after the cache first fills.
>
> Setup: same tables as [1]
> alter table hundredk alter column hundredk set (n_distinct = 10);
> analyze hundredk;
> alter system set work_mem = '1MB';
> select pg_reload_conf();
>
> Query
> select count(*) from hundredk hk inner join lookup l on hk.hundredk = l.a;
>
> David
>
> [1]
> https://www.postgresql.org/message-id/caaphdvrpcqyqdwergywx8j+2dlungxu+fosbq1uscxrunyx...@mail.gmail.com
>


Re: Parallel Inserts in CREATE TABLE AS

2020-12-09 Thread Zhihong Yu
Hi,

+   if (!OidIsValid(col->collOid) &&
+   type_is_collatable(col->typeName->typeOid))
+   ereport(ERROR,
...
+   attrList = lappend(attrList, col);

Should attrList be freed when ereport is called ?

+   query->CTASParallelInsInfo &= CTAS_PARALLEL_INS_UNDEF;

Since CTAS_PARALLEL_INS_UNDEF is 0, isn't the above equivalent to assigning
the value of 0 ?

Cheers

On Wed, Dec 9, 2020 at 5:43 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Dec 9, 2020 at 10:16 AM Dilip Kumar  wrote:
> >
> > On Tue, Dec 8, 2020 at 6:24 PM Hou, Zhijie 
> wrote:
> > >
> > > > > I'm not quite sure how to address this. Can we not allow the
> planner
> > > > > to consider that the select is for CTAS and check only after the
> > > > > planning is done for the Gather node and other checks?
> > > > >
> > > >
> > > > IIUC, you are saying that we should not influence the cost of gather
> node
> > > > even when the insertion would be done by workers? I think that
> should be
> > > > our fallback option anyway but that might miss some paths to be
> considered
> > > > parallel where the cost becomes more due to parallel_tuple_cost (aka
> tuple
> > > > transfer cost). I think the idea is we can avoid the tuple transfer
> cost
> > > > only when Gather is the top node because only at that time we can
> push
> > > > insertion down, right? How about if we have some way to detect the
> same
> > > > before calling generate_useful_gather_paths()? I think when we are
> calling
> > > > apply_scanjoin_target_to_paths() in grouping_planner(), if the
> > > > query_level is 1, it is for CTAS, and it doesn't have a chance to
> create
> > > > UPPER_REL (doesn't have grouping, order, limit, etc clause) then we
> can
> > > > probably assume that the Gather will be top_node. I am not sure
> about this
> > > > but I think it is worth exploring.
> > > >
> > >
> > > I took a look at the parallel insert patch and have the same idea.
> > > https://commitfest.postgresql.org/31/2844/
> > >
> > >  * Consider generating Gather or Gather Merge paths.  We must
> only do this
> > >  * if the relation is parallel safe, and we don't do it for
> child rels to
> > >  * avoid creating multiple Gather nodes within the same plan.
> We must do
> > >  * this after all paths have been generated and before
> set_cheapest, since
> > >  * one of the generated paths may turn out to be the cheapest
> one.
> > >  */
> > > if (rel->consider_parallel && !IS_OTHER_REL(rel))
> > > generate_useful_gather_paths(root, rel, false);
> > >
> > > IMO Gatherpath created here seems the right one which can possible
> ignore parallel cost if in CTAS.
> > > But We need check the following parse option which will create path to
> be the parent of Gatherpath here.
> > >
> > > if (root->parse->rowMarks)
> > > if (limit_needed(root->parse))
> > > if (root->parse->sortClause)
> > > if (root->parse->distinctClause)
> > > if (root->parse->hasWindowFuncs)
> > > if (root->parse->groupClause || root->parse->groupingSets ||
> root->parse->hasAggs || root->root->hasHavingQual)
> > >
> >
> > Yeah, and as I pointed earlier, along with this we also need to
> > consider that the RelOptInfo must be the final target(top level rel).
> >
>
> Attaching v10 patch set that includes the change suggested above for
> ignoring parallel tuple cost and also few more test cases. I split the
> patch as per Amit's suggestion. v10-0001 contains parallel inserts
> code without planner tuple cost changes and test cases. v10-0002 has
> required changes for ignoring planner tuple cost calculations.
>
> Please review it further.
>
> After the review and addressing all the comments, I plan to make some
> code common so that it can be used for Parallel Inserts in REFRESH
> MATERIALIZED VIEW. Thoughts?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Deleting older versions in unique indexes to avoid page splits

2020-12-09 Thread Zhihong Yu
Hi,
In v11-0001-Pass-down-logically-unchanged-index-hint.patch :

+   if (hasexpression)
+   return false;
+
+   return true;

The above can be written as 'return !hasexpression;'

For +index_unchanged_by_update_var_walker:

+ * Returns true when Var that appears within allUpdatedCols located.

the sentence seems incomplete.

Currently the return value of index_unchanged_by_update_var_walker() is the
reverse of index_unchanged_by_update().
Maybe it is easier to read the code if their return values have the same
meaning.

Cheers

On Wed, Dec 9, 2020 at 5:13 PM Peter Geoghegan  wrote:

> On Tue, Dec 1, 2020 at 2:18 PM Peter Geoghegan  wrote:
> > On Tue, Dec 1, 2020 at 1:50 AM Heikki Linnakangas 
> wrote:
> > > This is a wholly new concept with a lot of heuristics. It's a lot of
> > > swallow.
>
> Attached is v11, which cleans everything up around the tableam
> interface. There are only two patches in v11, since the tableam
> refactoring made it impossible to split the second patch into a heapam
> patch and nbtree patch (there is no reduction in functionality
> compared to v10).
>
> Most of the real changes in v11 (compared to v10) are in heapam.c.
> I've completely replaced the table_compute_xid_horizon_for_tuples()
> interface with a new interface that supports all existing requirements
> (from index deletions that support LP_DEAD deletion), while also
> supporting these new requirements (e.g. bottom-up index deletion). So
> now heap_compute_xid_horizon_for_tuples() becomes
> heap_compute_delete_for_tuples(), which has different arguments but
> the same overall structure. All of the new requirements can now be
> thought of as additive things that we happen to use for nbtree
> callers, that could easily also be used in other index AMs later on.
> This means that there is a lot less new code in heapam.c.
>
> Prefetching of heap blocks for the new bottom-up index deletion caller
> now works in the same way as it has worked in
> heap_compute_xid_horizon_for_tuples() since Postgres 12 (more or
> less). This is a significant improvement compared to my original
> approach.
>
> Changing heap_compute_xid_horizon_for_tuples() rather than adding a
> sibling function started to make sense when v10 of the patch taught
> regular nbtree LP_DEAD deletion (the thing that has been around since
> PostgreSQL 8.2) to add "extra" TIDs to check in passing, just in case
> we find that they're also deletable -- why not just have one function?
> It also means that hash indexes and GiST indexes now use the
> heap_compute_delete_for_tuples() function, though they get the same
> behavior as before. I imagine that it would be pretty straightforward
> to bring that same benefit to those other index AMs, since their
> implementations are already derived from the nbtree implementation of
> LP_DEAD deletion (and because adding extra TIDs to check in passing in
> these other index AMs should be fairly easy).
>
> > > > +} TM_IndexDelete;
> >
> > > > +} TM_IndexStatus;
> >
> > > Is it really significantly faster to have two arrays? If you had just
> > > one array, each element would be only 12 bytes long. That's not much
> > > smaller than TM_IndexDeletes, which is 8 bytes.
>
> I kept this facet of the design in v11, following some deliberation. I
> found that the TID sort operation appeared quite prominently in
> profiles, and so the optimizations mostly seemed to still make sense.
> I also kept one shellsort specialization. However, I removed the
> second specialized sort implementation, so at least there is only one
> specialization now (which is small anyway). I found that the second
> sort specialization (added to heapam.c in v10) really wasn't pulling
> its weight, even in more extreme cases of the kind that justified the
> optimizations in the first place.
>
> --
> Peter Geoghegan
>


query on smallint array column

2020-12-13 Thread Zhihong Yu
Hi,
I was experimenting with the following query.

create table sint1(k int primary key, arr smallint[]);
create index s1 on sint1(arr);
insert into sint1 select s, array[s*s, s] FROM generate_series(1, 10) AS s;
select * from sint1 where arr @> array[4];
ERROR:  operator does not exist: smallint[] @> integer[]
LINE 1: select * from sint1 where arr @> array[4];
  ^
HINT:  No operator matches the given name and argument types. You might
need to add explicit type casts.
---

I wonder if someone can enlighten me on the correct way to perform the type
cast.

Thanks


Re: query on smallint array column

2020-12-13 Thread Zhihong Yu
Thanks Pavel for fast response.

On Sun, Dec 13, 2020 at 9:51 AM Pavel Stehule 
wrote:

> Hi
>
> ne 13. 12. 2020 v 18:42 odesílatel Zhihong Yu  napsal:
>
>> Hi,
>> I was experimenting with the following query.
>>
>> create table sint1(k int primary key, arr smallint[]);
>> create index s1 on sint1(arr);
>> insert into sint1 select s, array[s*s, s] FROM generate_series(1, 10) AS
>> s;
>> select * from sint1 where arr @> array[4];
>> ERROR:  operator does not exist: smallint[] @> integer[]
>> LINE 1: select * from sint1 where arr @> array[4];
>>   ^
>> HINT:  No operator matches the given name and argument types. You might
>> need to add explicit type casts.
>> ---
>>
>> I wonder if someone can enlighten me on the correct way to perform the
>> type cast.
>>
>
>
> postgres=# select * from sint1 where arr @> array[4::smallint];
> ┌───┬┐
> │ k │  arr   │
> ╞═══╪╡
> │ 2 │ {4,2}  │
> │ 4 │ {16,4} │
> └───┴┘
> (2 rows)
>
> postgres=# select * from sint1 where arr @> array[4]::smallint[];
> ┌───┬┐
> │ k │  arr   │
> ╞═══╪╡
> │ 2 │ {4,2}  │
> │ 4 │ {16,4} │
> └───┴┘
> (2 rows)
>
> postgres=#
>
>
>>
>> Thanks
>>
>


Re: Parallel Inserts in CREATE TABLE AS

2020-12-14 Thread Zhihong Yu
For set_append_rel_size(), it seems this is the difference
between query_level != 1 and query_level == 1:

+   (root->parent_root->parse->CTASParallelInsInfo &
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND) &&

Maybe extract the common conditions into their own expression / variable so
that the code is easier to read.

Cheers

On Mon, Dec 14, 2020 at 4:50 AM Hou, Zhijie 
wrote:

> Hi
>
> > Attaching v11 patch set. Please review it further.
>
> Currently with the patch, we can allow parallel CTAS when topnode is
> Gather.
> When top-node is Append and Gather is the sub-node of Append, I think we
> can still enable
> Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such
> as:
>
> Append
> -->Gather
> ->Create table
> ->Seqscan
> -->Gather
> ->create table
> ->Seqscan
>
> And the use case seems common to me, such as:
> select * from A where xxx union all select * from B where xxx;
>
> I attatch a WIP patch which just show the possibility of this feature.
> The patch is based on the latest v11-patch.
>
> What do you think?
>
> Best regards,
> houzj
>
>
>
>
>
>


Re: postgres_fdw - cached connection leaks if the associated user mapping/foreign server is dropped

2020-12-14 Thread Zhihong Yu
Is the following sequence possible ?
In pgfdw_inval_callback():

entry->invalidated = true;
+   have_invalid_connections = true;

At which time the loop in pgfdw_xact_callback() is already running (but
past the above entry).
Then:

+   /* We are done closing all the invalidated connections so reset. */
+   have_invalid_connections = false;

At which time, there is still at least one invalid connection but the
global flag is off.

On Mon, Dec 14, 2020 at 6:39 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> Hi,
>
> As discussed in [1], in postgres_fdw the cached connections to remote
> servers can stay until the lifetime of the local session without
> getting a chance to disconnect (connection leak), if the underlying
> user mapping or foreign server is dropped in another session. Here are
> few scenarios how this can happen:
>
> Use case 1:
> 1) Run a foreign query in session 1 with server 1 and user mapping 1
> 2) Drop user mapping 1 in another session 2, an invalidation message
> gets generated which will have to be processed by all the sessions
> 3) Run the foreign query again in session 1, at the start of txn the
> cached entry gets invalidated via pgfdw_inval_callback() (as part of
> invalidation message processing). Whatever may be the type of foreign
> query (select, update, explain, delete, insert, analyze etc.), upon
> next call to GetUserMapping() from postgres_fdw.c, the cache lookup
> fails(with ERROR:  user mapping not found for "") since the user
> mapping 1 has been dropped in session 2 and the query will also fail
> before reaching GetConnection() where the connections associated with
> the invalidated entries would have got disconnected.
>
> So, the connection associated with invalidated entry would remain
> until the local session exits.
>
> Use case 2:
> 1) Run a foreign query in session 1 with server 1 and user mapping 1
> 2) Try to drop foreign server 1, then we would not be allowed because
> of dependency. Use CASCADE so that dependent objects i.e. user mapping
> 1 and foreign tables get dropped along with foreign server 1.
> 3) Run the foreign query again in session 1, at the start of txn, the
> cached entry gets invalidated via pgfdw_inval_callback() and the query
> fails because there is no foreign table.
>
> Note that the remote connection remains open in session 1 until the
> local session exits.
>
> To solve the above connection leak problem, it looks like the right
> place to close all the invalid connections is pgfdw_xact_callback(),
> once registered, which gets called at the end of every txn in the
> current session(by then all the sub txns also would have been
> finished). Note that if there are too many invalidated entries, then
> the following txn has to close all of them, but that's okay than
> having leaked connections and it's a one time job for the following
> one txn.
>
> Attaching a patch for the same.
>
> Thoughts?
>
> [1] -
> https://www.postgresql.org/message-id/flat/CALj2ACUOQYs%2BssxkxRvZ6Ja5%2Bsfc6a-s_0e-Nv2A591hEyOgiw%40mail.gmail.com
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


range_agg

2020-12-16 Thread Zhihong Yu
Hi,
w.r.t. patch v27.

+* The idea is to prepend underscores as needed until we make a name
+* that doesn't collide with anything ...

I wonder if other characters (e.g. [a-z0-9]) can be used so that a
non-colliding name can be found without calling truncate_identifier().

+   else if (strcmp(defel->defname, "multirange_type_name") == 0)
+   {
+   if (multirangeTypeName != NULL)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting or redundant options")));

Maybe make the error message a bit different from other occurrences of this
message (for example, by including multirangeTypeName)?

Thanks


Re: range_agg

2020-12-16 Thread Zhihong Yu
Letting user manually name the multirange (after a few automatic attempts)
seems reasonable.

Cheers

On Wed, Dec 16, 2020 at 3:34 PM Alexander Korotkov 
wrote:

> On Thu, Dec 17, 2020 at 1:03 AM Alexander Korotkov 
> wrote:
> > On Thu, Dec 17, 2020 at 12:54 AM Zhihong Yu  wrote:
> > > +* The idea is to prepend underscores as needed until we make a
> name that
> > > +* doesn't collide with anything ...
> > >
> > > I wonder if other characters (e.g. [a-z0-9]) can be used so that name
> without collision can be found without calling truncate_identifier().
> >
> > Probably.  But multiranges just shares naming logic already existing
> > in arrays.  If we're going to change this, I think we should change
> > this for arrays too.  And this change shouldn't be part of multirange
> > patch.
>
> I gave this another thought.  Now we have facility to name multirange
> types manually.  I think we should give up with underscore naming
> completely.  If both replacing "range" with "mutlirange" in the
> typename and appending "_multirange" to the type name failed (very
> unlikely), then let user manually name the multirange.  Any thoughts?
>
> --
> Regards,
> Alexander Korotkov
>


Re: On login trigger: take three

2020-12-16 Thread Zhihong Yu
Hi,
For EventTriggerOnConnect():

+   PG_CATCH();
+   {
...
+   AbortCurrentTransaction();
+   return;

Should runlist be freed in the catch block ?

+   gettext_noop("In case of errors in the ON client_connection
EVENT TRIGGER procedure, this parameter can be used to disable trigger
activation and provide access to the database."),

I think the text should be on two lines (current line too long).

Cheers


Re: Double partition lock in bufmgr

2020-12-18 Thread Zhihong Yu
Hi,
w.r.t. the code in BufferAlloc(), the pointers are compared.

Should we instead compare the tranche Id of the two LWLock ?

Cheers


Re: proposal: schema variables

2020-12-20 Thread Zhihong Yu
Hi,
I took a look at the rebased patch.

+  varisnotnull
+  boolean
+  
+  
+   True if the schema variable doesn't allow null value. The default
value is false.

I wonder whether the field can be named in a positive sense: e.g.
varallowsnull with a default of true.

+  vareoxaction
+   n = no action, d = drop the
variable,
+   r = reset the variable to its default value.

Looks like there is only one action allowed. I wonder if there is a
possibility of having two actions at the same time in the future.

+ The PL/pgSQL language has not packages
+ and then it has not package variables and package constants. The

'has not' -> 'has no'

+  a null value. A variable created as NOT NULL and without an
explicitely

explicitely -> explicitly

+   int nnewmembers;
+   Oid*oldmembers;
+   Oid*newmembers;

I wonder if naming nnewmembers newmembercount would be more readable.

For pg_variable_aclcheck:

+   return ACLCHECK_OK;
+   else
+   return ACLCHECK_NO_PRIV;

The 'else' can be omitted.

+ * Ownership check for a schema variables (specified by OID).

'a schema variable' (no s)

For NamesFromList():

+   if (IsA(n, String))
+   {
+   result = lappend(result, n);
+   }
+   else
+   break;

Is it guaranteed that there are no more names once the current n is not a String?

+* both variants, and returns InvalidOid with not_uniq flag,
when

'and return' (no s)

+   return InvalidOid;
+   }
+   else if (OidIsValid(varoid_without_attr))

'else' is not needed (since the if block ends with return).

For clean_cache_callback(),

+   if (hash_search(schemavarhashtab,
+   (void *) &svar->varid,
+   HASH_REMOVE,
+   NULL) == NULL)
+   elog(DEBUG1, "hash table corrupted");

Maybe add more information to the debug, such as var name.
Should we come out of the while loop in this scenario ?

Cheers


Re: proposal: schema variables

2020-12-20 Thread Zhihong Yu
Hi,
This is continuation of the previous review.

+* We should to use schema variable buffer, when
+* it is available.

'should use' (no to)

+   /* When buffer of used schema variables loaded from shared memory */

A verb seems missing in the above comment.

+   elog(ERROR, "unexpected non-SELECT command in LET ... SELECT");

Since non-SELECT is mentioned, I wonder if the trailing SELECT can be
omitted.

+* some collision can be solved simply here to reduce errors
based
+* on simply existence of some variables. Often error can be
using

'simply' occurs twice above - I think one occurrence should be enough.
If you want to keep the second, it should be 'simple'.

Cheers



Re: Weird special case in jsonb_concat()

2020-12-20 Thread Zhihong Yu
Hi,
w.r.t. the patch,

+select '[3]'::jsonb || '{}'::jsonb;
+ ?column?
+--
+ [3, {}]
+(1 row)
+
+select '3'::jsonb || '[]'::jsonb;

Should cases where the empty array precedes non-empty jsonb be added ?

select '[]'::jsonb || '3'::jsonb;
select '{}'::jsonb || '[3]'::jsonb;

Cheers


Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly

2020-12-22 Thread Zhihong Yu
Justin:
For reindex_index() :

+   if (options->tablespaceOid == MyDatabaseTableSpace)
+   options->tablespaceOid = InvalidOid;
...
+   if (set_tablespace &&
+   (options->tablespaceOid != oldTablespaceOid ||
+   (options->tablespaceOid == MyDatabaseTableSpace &&
OidIsValid(oldTablespaceOid

I wonder why the options->tablespaceOid == MyDatabaseTableSpace clause
appears again in the second if statement, since the first if statement
would already have assigned InvalidOid to options->tablespaceOid when
that condition was satisfied.

Cheers


On Tue, Dec 22, 2020 at 1:15 PM Justin Pryzby  wrote:

> On Tue, Dec 22, 2020 at 06:57:41PM +0900, Michael Paquier wrote:
> > On Tue, Dec 22, 2020 at 02:32:05AM -0600, Justin Pryzby wrote:
> > > Also, this one is going to be subsumed by ExecReindex(), so the palloc
> will go
> > > away (otherwise I would ask to pass it in from the caller):
> >
> > Yeah, maybe.  Still you need to be very careful if you have any
> > allocated variables like a tablespace or a path which requires to be
> > in the private context used by ReindexMultipleInternal() or even
> > ReindexRelationConcurrently(), so I am not sure you can avoid that
> > completely.  For now, we could choose the option to still use a
> > palloc(), and then save the options in the private contexts.  Forgot
> > that in the previous version actually.
>
> I can't see why this still uses memset instead of structure assignment.
>
> Now, I really think utility.c ought to pass in a pointer to a local
> ReindexOptions variable to avoid all the memory context, which is
> unnecessary
> and prone to error.
>
> ExecReindex() will set options.tablesapceOid, not a pointer.  Like this.
>
> I also changed the callback to be a ReindexOptions and not a pointer.
>
> --
> Justin
>


Re: [Patch] Optimize dropping of relation buffers using dlist

2020-12-22 Thread Zhihong Yu
Hi,
It is possible to come out of the nested loop without goto.

+   bool   cached = true;
...
+* to that fork during recovery.
+*/
+   for (i = 0; i < n && cached; i++)
...
+   if (!cached)
+   break;

Here I changed the initial value of cached to true so that we enter the
outer loop.
In place of the original goto, we break out of the inner loop, and the
cached check in the outer loop's condition then exits the outer loop.

Cheers

On Tue, Dec 22, 2020 at 8:22 PM tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

> From: Amit Kapila 
> > + /* Get the number of blocks for a relation's fork */
> > + block[i][j] = smgrnblocks(smgr_reln[i], j, &cached);
> > +
> > + if (!cached)
> > + goto buffer_full_scan;
> >
> > Why do we need to use goto here? We can simply break from the loop and
> > then check if (cached && nBlocksToInvalidate <
> > BUF_DROP_FULL_SCAN_THRESHOLD). I think we should try to avoid goto if
> > possible without much complexity.
>
> That's because two for loops are nested -- breaking there only exits the
> inner loop.  (I thought the same as you at first... And I understand some
> people have allergy to goto, I think modest use of goto makes the code
> readable.)
>
>
> Regards
> Takayuki Tsunakawa
>
>
>
>
>


Postgres Spark connector

2020-12-23 Thread Zhihong Yu
Hi,
I searched for Postgres support in Apache Spark.
I found Spark doc related to JDBC.

I wonder if the community is aware of Spark connector for Postgres
(hopefully open source) where predicate involving jsonb columns can be
pushed down.

Your input is appreciated.

Thanks


Re: SQL/JSON: functions

2020-12-25 Thread Zhihong Yu
For 0001-Common-SQL-JSON-clauses-v51.patch :

+   /*  | implementation_defined_JSON_representation_option (BSON, AVRO
etc) */

I don't find implementation_defined_JSON_representation_option in the
patchset. Maybe rephrase the above as a comment
without implementation_defined_JSON_representation_option ?

For getJsonEncodingConst(), should the method error out for the default
case of switch (encoding) ?

0002-SQL-JSON-constructors-v51.patch :

+   Assert(!OidIsValid(collation)); /* result is always an
json[b] type */

an json -> a json

+   /* XXX TEXTOID is default by standard */
+   returning->typid = JSONOID;

Comment doesn't seem to match the assignment.

For json_object_agg_transfn :

+   if (out->len > 2)
+   appendStringInfoString(out, ", ");

Why does the length need to be at least 3 (instead of 2)?

Cheers

On Fri, Dec 25, 2020 at 12:26 PM Nikita Glukhov 
wrote:

> On 17.09.2020 08:41, Michael Paquier wrote:
>
> On Sat, Jul 18, 2020 at 09:24:11AM -0400, Andrew Dunstan wrote:
>
> I think patches 5 and 6 need to be submitted to the next commitfest,
> This is far too much scope creep to be snuck in under the current CF item.
>
> I'll look at patches 1-4.
>
> Even with that, the patch set has been waiting on author for the last
> six weeks, so I am marking it as RwF for now.  Please feel free to
> resubmit.
>
> Attached 51st version of the patches rebased onto current master.
>
>
> There were some shift/reduce conflicts in SQL grammar that have appeared
> after "expr AS keyword" refactoring in 06a7c3154f.  I'm not sure if I resolved
> them correctly.  JSON TEXT pseudotype, introduced in #0006, caused a lot of
> grammar conflicts, so it was replaced with simple explicit pg_catalog.json.
>
> Also new CoercionForm COERCE_SQL_SYNTAX was introduced, and this reminds 
> custom
> function formats that I have used in earlier version of the patches for
> deparsing of SQL/JSON constructor expressions that were based on raw json[b]
> function calls.  These custom function formats were replaced in v43 with
> dedicated executor nodes for SQL/JSON constructors.  So, I'm not sure is it
> worth to try to replace back nodes with new COERCE_SQL_SYNTAX.
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: pg_stat_statements and "IN" conditions

2020-12-26 Thread Zhihong Yu
Hi,
A few comments.

+   "After this number of duplicating constants
start to merge them.",

duplicating -> duplicate

+   foreach(lc, (List *) expr)
+   {
+   Node * subExpr = (Node *) lfirst(lc);
+
+   if (!IsA(subExpr, Const))
+   {
+   allConst = false;
+   break;
+   }
+   }

It seems the above foreach loop (within foreach(temp, (List *) node)) could
be preceded by a check that allConst is still true; if it is already false,
the loop can be skipped entirely.

+   if (currentExprIdx == pgss_merge_threshold - 1)
+   {
+   JumbleExpr(jstate, expr);
+
+   /*
+* A const expr is already found, so JumbleExpr must
+* record it. Mark it as merged, it will be the
first
+* merged but still present in the statement query.
+*/
+   Assert(jstate->clocations_count > 0);
+   jstate->clocations[jstate->clocations_count -
1].merged = true;
+   currentExprIdx++;
+   }

The above snippet occurs a few times. Maybe extract into a helper method.

Cheers

On Sat, Dec 26, 2020 at 2:45 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:
> > > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:
> > >
> > > I would like to start another thread to follow up on [1], mostly to
> bump up the
> > > topic. Just to remind, it's about how pg_stat_statements jumbling
> ArrayExpr in
> > > queries like:
> > >
> > > SELECT something FROM table WHERE col IN (1, 2, 3, ...)
> > >
> > > The current implementation produces different jumble hash for every
> different
> > > number of arguments for essentially the same query. Unfortunately a
> lot of ORMs
> > > like to generate these types of queries, which in turn leads to
> > > pg_stat_statements pollution. Ideally we want to prevent this and have
> only one
> > > record for such a query.
> > >
> > > As the result of [1] I've identified two highlighted approaches to
> improve this
> > > situation:
> > >
> > > * Reduce the generated ArrayExpr to an array Const immediately, in
> cases where
> > >   all the inputs are Consts.
> > >
> > > * Make repeating Const to contribute nothing to the resulting hash.
> > >
> > > I've tried to prototype both approaches to find out pros/cons and be
> more
> > > specific. Attached patches could not be considered a completed piece
> of work,
> > > but they seem to work, mostly pass the tests and demonstrate the
> point. I would
> > > like to get some high level input about them and ideally make it clear
> what is
> > > the preferred solution to continue with.
> >
> > I've implemented the second approach mentioned above, this version was
> > tested on our test clusters for some time without visible issues. Will
> > create a CF item and would appreciate any feedback.
>
> After more testing I found couple of things that could be improved,
> namely in the presence of non-reducible constants one part of the query
> was not copied into the normalized version, and this approach also could
> be extended for Params. These are incorporated in the attached patch.
>


Re: SQL/JSON: functions

2020-12-26 Thread Zhihong Yu
Hi,
For ExecEvalJsonExprSubtrans(), if you check !subtrans first,

+   /* No need to use subtransactions. */
+   return func(op, econtext, res, resnull, p, error);

The return statement would allow omitting the else keyword and left-indenting
the code in the current if block.

For ExecEvalJsonExpr()

+   *resnull = !DatumGetPointer(res);
+   if (error && *error)
+   return (Datum) 0;

Suppose *resnull is false and *error is true: (Datum) 0 would be returned
with *resnull still false. Should *resnull be kept consistent with the
actual return value ?

For ExecEvalJson() :

+   Assert(*op->resnull);
+   *op->resnull = true;

I am not sure of the purpose of the assignment, since *op->resnull is
already true per the assertion.

For raw_expression_tree_walker :

+   if (walker(jfe->on_empty, context))
+   return true;

Should the if condition check that jfe->on_empty is non-NULL prior to walking ?

nit: for contain_mutable_functions_walker, if !IsA(jexpr->path_spec, Const)
is checked first (with an early return), the current if block can be
left-indented.

For JsonPathDatatypeStatus,

+   jpdsDateTime,   /* unknown datetime type */

Should the enum be named jpdsUnknownDateTime so that its meaning is clear
to people reading the code ?

For get_json_behavior(), I wonder if mapping from behavior->btype to the
string form would shorten the body of switch statement.
e.g.
char* map[] = {
  " NULL",
  " ERROR",
  " EMPTY",
...
};

Cheers

On Fri, Dec 25, 2020 at 5:19 PM Zhihong Yu  wrote:

> For 0001-Common-SQL-JSON-clauses-v51.patch :
>
> +   /*  | implementation_defined_JSON_representation_option (BSON,
> AVRO etc) */
>
> I don't find implementation_defined_JSON_representation_option in the
> patchset. Maybe rephrase the above as a comment
> without implementation_defined_JSON_representation_option ?
>
> For getJsonEncodingConst(), should the method error out for the default
> case of switch (encoding) ?
>
> 0002-SQL-JSON-constructors-v51.patch :
>
> +   Assert(!OidIsValid(collation)); /* result is always an
> json[b] type */
>
> an json -> a json
>
> +   /* XXX TEXTOID is default by standard */
> +   returning->typid = JSONOID;
>
> Comment doesn't seem to match the assignment.
>
> For json_object_agg_transfn :
>
> +   if (out->len > 2)
> +   appendStringInfoString(out, ", ");
>
> Why does the length need to be at least 3 (instead of 2) ?
>
> Cheers
>
> On Fri, Dec 25, 2020 at 12:26 PM Nikita Glukhov 
> wrote:
>
>> On 17.09.2020 08:41, Michael Paquier wrote:
>>
>> On Sat, Jul 18, 2020 at 09:24:11AM -0400, Andrew Dunstan wrote:
>>
>> I think patches 5 and 6 need to be submitted to the next commitfest,
>> This is far too much scope creep to be snuck in under the current CF item.
>>
>> I'll look at patches 1-4.
>>
>> Even with that, the patch set has been waiting on author for the last
>> six weeks, so I am marking it as RwF for now.  Please feel free to
>> resubmit.
>>
>> Attached 51st version of the patches rebased onto current master.
>>
>>
>> There were some shift/reduce conflicts in SQL grammar that have appeared
>> after "expr AS keyword" refactoring in 06a7c3154f.  I'm not sure if I 
>> resolved
>> them correctly.  JSON TEXT pseudotype, introduced in #0006, caused a lot of
>> grammar conflicts, so it was replaced with simple explicit pg_catalog.json.
>>
>> Also new CoercionForm COERCE_SQL_SYNTAX was introduced, and this reminds 
>> custom
>> function formats that I have used in earlier version of the patches for
>> deparsing of SQL/JSON constructor expressions that were based on raw json[b]
>> function calls.  These custom function formats were replaced in v43 with
>> dedicated executor nodes for SQL/JSON constructors.  So, I'm not sure is it
>> worth to try to replace back nodes with new COERCE_SQL_SYNTAX.
>>
>> --
>> Nikita Glukhov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>


Re: SQL/JSON: JSON_TABLE

2020-12-26 Thread Zhihong Yu
For new files introduced in the patches:

+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group

2021 is only a few days away. It would be good to update the year range.

For transformJsonTableColumn:

+   jfexpr->op =
+   jtc->coltype == JTC_REGULAR ? IS_JSON_VALUE :
+   jtc->coltype == JTC_EXISTS ? IS_JSON_EXISTS : IS_JSON_QUERY;

Should IS_JSON_EXISTS be mentioned in the comment preceding the method ?

For JsonTableDestroyOpaque():

+   state->opaque = NULL;

Should the memory pointed to by opaque be freed ?

+   l2 = list_head(tf->coltypes);
+   l3 = list_head(tf->coltypmods);
+   l4 = list_head(tf->colvalexprs);

Maybe the ListCell pointer variables can be named corresponding to the
lists they iterate so that the code is easier to understand.

+typedef enum JsonTablePlanJoinType
+{
+   JSTP_INNER = 0x01,
+   JSTP_OUTER = 0x02,
+   JSTP_CROSS = 0x04,

Since plan type enum starts with JSTP_, I think the plan join type should
start with JSTPJ_ or JSTJ_. Otherwise the following would be a bit hard to
read:

+   else if (plan->plan_type == JSTP_JOINED)
+   {
+   if (plan->join_type == JSTP_INNER ||
+   plan->join_type == JSTP_OUTER)

since different fields are checked in the two if statements but the
prefixes don't convey that.

+  Even though the path names are not incuded into the PLAN
DEFAULT

Typo: incuded -> included

+   int nchilds = 0;

nchilds -> nchildren

+#if 0 /* XXX it' unclear from the standard whether root path name is
mandatory or not */
+   if (plan && plan->plan_type != JSTP_DEFAULT && !rootPathName)

Do you plan to drop the if block ?

Cheers

On Fri, Dec 25, 2020 at 12:32 PM Nikita Glukhov 
wrote:

>
> On 03.08.2020 10:55, Michael Paquier wrote:
>
> On Sun, Jul 05, 2020 at 12:15:58PM -0500, Justin Pryzby wrote:
>
> It looks like this needs to be additionally rebased - I will set cfbot to
> "Waiting".
>
> ...  Something that has not happened in four weeks, so this is marked
> as returned with feedback.  Please feel free to resubmit once a rebase
> is done.
> --
> Michael
>
>
> Attached 44th version of the patches rebased onto current master
> (#0001 corresponds to v51 of SQL/JSON patches).
>
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: range_agg

2020-12-27 Thread Zhihong Yu
Hi,

This is not an ideal way to index multirages, but something we can easily
have.

typo: multiranges

Cheers

On Sun, Dec 27, 2020 at 1:50 AM Alexander Korotkov 
wrote:

> On Thu, Dec 17, 2020 at 10:10 PM Alexander Korotkov
>  wrote:
> >
> > I think this patch is very close to committable.  I'm going to spend
> > some more time further polishing it and commit (if I don't find a
> > major issue or face objections).
>
> The main patch is committed.  I've prepared a set of improvements.
> 0001 Fixes bug in bsearch comparison functions
> 0002 Implements missing @> (range,multirange) operator and its commutator
> 0003 Does refactors signatures of *_internal() multirange functions
> 0004 Adds cross-type (range, multirange) operators handling to
> existing range GiST opclass
> 0005 Adds support for GiST multirange indexing by approximation of
> multirange as the union range with no gaps
>
> The patchset is quite trivial.  I'm going to push it if there are no
> objections.
>
> The SP-GiST handling is more tricky and requires substantial work.
>
> --
> Regards,
> Alexander Korotkov
>


Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread Zhihong Yu
For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:

+   if (ignore &&
+   (root->parse->CTASParallelInsInfo &
+CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))

I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the
above if, since when ignore_parallel_tuple_cost() returns true,
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN has already been set.

+ * In this function we only care Append and Gather nodes.

'care' -> 'care about'

+   for (int i = 0; i < aps->as_nplans; i++)
+   {
+   parallel |= PushDownCTASParallelInsertState(dest,
+   aps->appendplans[i],
+   gather_exists);

It seems the loop termination condition can include parallel since we can
come out of the loop once parallel is true.

+   if (!allow && tuple_cost_flags && gather_exists)

As the above code shows, gather_exists is only checked when allow is false.

+* We set the flag for two cases when there is no parent path
will
+* be created(such as : limit,sort,distinct...):

Please correct the grammar : there are two verbs following 'when'

For set_append_rel_size:

+   {
+   root->parse->CTASParallelInsInfo |=
+
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+   }
+   }
+
+   if (root->parse->CTASParallelInsInfo &
+   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+   {
+   root->parse->CTASParallelInsInfo &=
+
~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;

In the if block for childrel->rtekind == RTE_SUBQUERY,
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it cleared
immediately afterwards ?

+   /* Set to this in case tuple cost needs to be ignored for Append cases.
*/
+   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3

Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn
on' or a similar term in the comment, because 'set to' normally implies
assignment.

Cheers

On Sun, Dec 27, 2020 at 12:50 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar 
> wrote:
> > I have reviewed part of v15-0001 patch, I have a few comments, I will
> > continue to review this.
>
> Thanks a lot.
>
> > 1.
> > Why is this temporary hack? and what is the plan for removing this hack?
>
> The changes in xact.c, xact.h and heapam.c are common to all the
> parallel insert patches - COPY, INSERT INTO SELECT. That was the
> initial comment, I forgot to keep it in sync with the other patches.
> Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan
> was to have these code in all the parallel inserts patch, whichever
> gets to review and commit first, others will update their patches
> accordingly.
>
> > 2.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> > +if (!IS_CTAS(into))
> > +return;
> >
> > When will this hit?  The functtion name suggest that it is from CTAS
> > but now you have a check that if it is
> > not for CTAS then return,  can you add the comment that when do you
> > expect this case?
>
> Yes it will hit for explain cases, but I choose to remove this and
> check outside in the explain something like:
> if (into)
> ChooseParallelInsertsInCTAS()
>
> > Also the function name should start in a new line
> > i.e
> > void
> > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
>
> Ah, missed that. Modified now.
>
> > 3.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> >
> > Push down to the Gather nodes?  I think the right statement will be
> > push down below the Gather node.
>
> Modified.
>
> > 4.
> > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> > {
> >  DR_intorel *myState = (DR_intorel *) self;
> >
> > -- Comment ->in parallel worker we don't need to crease dest recv
> blah blah
> > +if (myState->is_parallel_worker)
> > {
> > --parallel worker handling--
> > return;
> > }
> >
> > --non-parallel worker code stay right there, instead of moving to
> else
>
> Done.
>
> > 5.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> >
> > From function name and comments it appeared that this function will
> > return boolean saying whether
> > Parallel insert should be selected or not.  I t

Re: Parallel Inserts in CREATE TABLE AS

2020-12-29 Thread Zhihong Yu
w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch

+ * Push the dest receiver to Gather node when it is either at the top of
the
+ * plan or under top Append node unless it does not have any projections
to do.

I think the 'unless' should be 'if'. As can be seen from the body of the
method:

+   if (!ps->ps_ProjInfo)
+   {
+   GatherState *gstate = (GatherState *) ps;
+
+   parallel = true;

Cheers

On Mon, Dec 28, 2020 at 4:12 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Dec 28, 2020 at 10:46 AM Dilip Kumar 
> wrote:
> > Thanks for working on this, I will have a look at the updated patches
> soon.
>
> Attaching v17 patch set after addressing comments raised in other
> threads. Please consider this patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Deleting older versions in unique indexes to avoid page splits

2020-12-31 Thread Zhihong Yu
Hi, Peter:
Happy New Year.

For v12-0001-Pass-down-logically-unchanged-index-hint.patch

+   if (hasexpression)
+   return false;
+
+   return true;

The above can be written as return !hasexpression;
The negation is needed because the return value of
index_unchanged_by_update_var_walker() is the inverse of "index unchanged".
If you align the meaning of its return value with that of
index_unchanged_by_update(), the negation is not needed.

For v12-0002-Add-bottom-up-index-deletion.patch :

For struct xl_btree_delete:

+   /* DELETED TARGET OFFSET NUMBERS FOLLOW */
+   /* UPDATED TARGET OFFSET NUMBERS FOLLOW */
+   /* UPDATED TUPLES METADATA (xl_btree_update) ARRAY FOLLOWS */

I guess the comment is for illustration purposes. Maybe you can write the
comment in lower case.

+#define BOTTOMUP_FAVORABLE_STRIDE  3

Adding a comment on why the number 3 is chosen would be helpful for people
to understand the code.

Cheers

On Wed, Dec 30, 2020 at 6:55 PM Peter Geoghegan  wrote:

> On Wed, Dec 9, 2020 at 5:12 PM Peter Geoghegan  wrote:
> > Attached is v11, which cleans everything up around the tableam
> > interface. There are only two patches in v11, since the tableam
> > refactoring made it impossible to split the second patch into a heapam
> > patch and nbtree patch (there is no reduction in functionality
> > compared to v10).
>
> Attached is v12, which fixed bitrot against the master branch. This
> version has significant comment and documentation revisions. It is
> functionally equivalent to v11, though.
>
> I intend to commit the patch in the next couple of weeks. While it
> certainly would be nice to get a more thorough review, I don't feel
> that it is strictly necessary. The patch provides very significant
> benefits with certain workloads that have traditionally been
> considered an Achilles' heel for Postgres. Even zheap doesn't provide
> a solution to these problems. The only thing that I can think of that
> might reasonably be considered in competition with this design is
> WARM, which hasn't been under active development since 2017 (I assume
> that it has been abandoned by those involved).
>
> I should also point out that the design doesn't change the on-disk
> format in any way, and so doesn't commit us to supporting something
> that might become onerous in the event of somebody else finding a
> better way to address at least some of the same problems.
>
> --
> Peter Geoghegan
>


Re: Table AM modifications to accept column projection lists

2020-12-31 Thread Zhihong Yu
Hi, Soumyadeep:
Happy New Year.

+typedef struct neededColumnContext
+{
+   Bitmapset **mask;
+   int n;

+ * n specifies the number of allowed entries in mask: we use
+ * it for bounds-checking in the walker above.

I think the code would be easier to read if the above comment were moved or
copied to field n of neededColumnContext.

Cheers

On Thu, Dec 31, 2020 at 1:03 PM Soumyadeep Chakraborty <
soumyadeep2...@gmail.com> wrote:

> Hey Masahiko,
>
> I added it to the Jan CF (https://commitfest.postgresql.org/31/2922/).
>
> PFA a rebased version against latest head.
>
> Regards,
> Soumyadeep
>


Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-01 Thread Zhihong Yu
Hi, Bharath:

Happy new year.

+   appendStringInfo(&buf, "(%s, %s)", server->servername,
+entry->invalidated ? "false" : "true");

Would it be better to use 'invalidated' rather than 'false' in the string ?

For the first if block of postgres_fdw_disconnect():

+* Check if the connection associated with the given foreign server
is
+* in use i.e. entry->xact_depth > 0. Since we can not close it, so
+* error out.
+*/
+   if (is_in_use)
+   ereport(WARNING,

since is_in_use is only set in the if (server) block, I think the above
warning can be moved into that block.

Cheers

On Fri, Jan 1, 2021 at 2:04 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Dec 31, 2020 at 8:29 AM Bharath Rupireddy
>  wrote:
> > Right. I meant the "next use" as the select attempt on a foreign table
> > with that foreign server. If no select query is run, then at the end
> > of the current txn that connection gets closed. Yes internally such
> > connection gets closed in pgfdw_xact_callback.
> >
> > If the errdetail("Such connections get closed either in the next use
> > or at the end of the current transaction.") looks confusing, how about
> >
> > 1) errdetail("Such connection gets discarded while closing the remote
> > transaction.")/errdetail("Such connections get discarded while closing
> > the remote transaction.")
> > 2) errdetail("Such connection is discarded at the end of remote
> > transaction.")/errdetail("Such connections are discarded at the end of
> > remote transaction.")
> >
> > I prefer 2)  Thoughts?
> >
> > Because we already print a message in pgfdw_xact_callback -
> > elog(DEBUG3, "closing remote transaction on connection %p"
>
> I changed the message to "Such connection is discarded at the end of
> remote transaction.".
>
> I'm attaching v5 patch set i.e. all the patches 0001 ( for new
> functions), 0002 ( for GUC) and 0003 (for server level option). I have
> also made the changes for increasing the version of
> postgres_fdw--1.0.sql from 1.0 to 1.1.
>
> I have no open points from my end. Please consider the v5 patch set
> for further review.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: poc - possibility to write window function in PL languages

2021-01-01 Thread Zhihong Yu
Hi, Pavel:
Happy New Year.

+   command with clause WINDOW. The specific feature of
+   this functions is a possibility to two special storages with

this functions -> this function

possibility to two special storages: there is no verb.

'store with stored one value': store is repeated.

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

It would be better to change 2020 to 2021 in the new files.

For some functions, such as windowobject_get_func_arg_frame, it would be
better to add comment explaining their purposes.

For estimate_partition_context_size():
+errmsg("size of value is greather than limit (1024
bytes)")));

Please include the value of typlen in the message. There is a similar error
message in the else block, where the value of size should be included.

+   return *realsize;
+   }
+   else

The 'else' is not needed since the if block ends with return.

+   size += size / 3;

Please add a comment for the choice of constant 3.

+   /* by default we allocate 30 bytes */
+   *realsize = 0;

The value 30 may not be accurate - from the caller:

+   if (PG_ARGISNULL(2))
+   minsize = VARLENA_MINSIZE;
+   else
+   minsize = PG_GETARG_INT32(2);

VARLENA_MINSIZE is 32.

Cheers

On Fri, Jan 1, 2021 at 3:29 AM Pavel Stehule 
wrote:

> Hi
>
> rebase
>
> Regards
>
> Pavel
>
>


Re: faster ETL / bulk data load for heap tables

2021-01-01 Thread Zhihong Yu
Hi, Luc:
Happy New Year.

Looking at BufferAllocExtend()
in v1-0002-WIP-buffer-alloc-specialized-for-relation-extensi.patch. it
seems there is duplicate code with the existing BufferAlloc().

It would be good if some refactoring is done by extracting common code into
a helper function.

Thanks

On Fri, Jan 1, 2021 at 6:07 AM Luc Vlaming  wrote:

> Hi,
>
> In an effort to speed up bulk data loading/transforming I noticed that
> considerable time is spent in the relation extension lock. I know there
> are already many other efforts to increase the chances of using bulk
> loading [1], [2], [3], [4], efforts to make loading more parallel [5],
> and removing some syscalls [6], as well as completely new I/O systems
> [7] and some more extreme measures like disabling wal logging [8].
>
> Whilst they will all help, they will ultimately be stopped by the
> relation extension lock. Also it seems from the tests I've done so far
> that at least for bulk loading using pwrite() actually can carry us
> rather far as long as we are not doing it under a global lock. Moreover,
> the solution provided here might be an alternative to [7] because the
> results are quite promising, even with WAL enabled.
>
> Attached two WIP patches in the hopes of getting feedback.
> The first changes the way we do bulk loading: each backend now gets a
> standalone set of blocks allocated that are local to that backend. This
> helps both with reducing contention but also with some OS writeback
> mechanisms.
> The second patch reduces the time spent on locking the partition buffers
> by shifting around the logic to make each set of 128 blocks use the same
> buffer partition, and then adding a custom function to get buffer blocks
> specifically for extension, whilst keeping a previous partition lock,
> thereby reducing the amount of time we spent on futexes.
>
> The design:
> - add a set of buffers into the BulkInsertState that can be used by the
> backend without any locks.
> - add a ReadBufferExtendBulk function which extends the relation with
> BULK_INSERT_BATCH_SIZE blocks at once.
> - rework FileWrite to have a parameter to speed up relation extension by
> passing in if we are using filewrite just to extend the file. if
> supported uses ftruncate as this is much faster (also than
> posix_fallocate on my system) and according to the manpages
> (https://linux.die.net/man/2/ftruncate) should read as zeroed space. to
> be cleaned-up later possibly into a special function FileExtend().
> - rework mdextend to get a page count.
> - make a specialized version of BufferAlloc called BufferAllocExtend
> which keeps around the lock on the last buffer partition and tries to
> reuse this so that there are a lot less futex calls.
>
> Many things that are still to be done; some are:
> - reintroduce FSM again, and possibly optimize the lock usage there. in
> other words: this patch currently can only start the server and run COPY
> FROM and read queries.
> - look into the wal logging. whilst it seems to be fairly optimal
> already wrt the locking and such i noticed there seems to be an
> alternating pattern between the bgwriter and the workers. whilst setting
> some parameters bigger helped a lot (wal_writer_flush_after,
> wal_writer_delay, wal_buffers)
> - work nicely with the code from [6] so that the register_dirty_segment
> is indeed not needed anymore; or alternatively optimize that code so
> that less locks are needed.
> - make use of [9] in the fallback case in FileWrite() when
> ftruncate/fallocate is not available so that the buffer size can be
> reduced.
>
> First results are below; all tests were loading 32 times the same 1G
> lineitem csv into the same table. tests were done both on a nvme and the
> more parallel ones also with a tmpfs disk to see potential disk
> bottlenecks and e.g. potential wrt using NVDIMM.
> =
> using an unlogged table:
> NVME, UNLOGGED table, 4 parallel streams:   HEAD 171s, patched 113s
> NVME, UNLOGGED table, 8 parallel streams:   HEAD 113s, patched 67s
> NVME, UNLOGGED table, 16 parallel streams:  HEAD 112s, patched 42s
> NVME, UNLOGGED table, 32 parallel streams:  HEAD 121s, patched 36s
> tmpfs, UNLOGGED table, 16 parallel streams: HEAD 96s, patched 38s
> tmpfs, UNLOGGED table, 32 parallel streams: HEAD 104s, patched 25s
> =
> using a normal table, wal-level=minimal, 16mb wal segments:
> NVME, 4 parallel streams:   HEAD 237s, patched 157s
> NVME, 8 parallel streams:   HEAD 200s, patched 142s
> NVME, 16 parallel streams:  HEAD 171s, patched 145s
> NVME, 32 parallel streams:  HEAD 199s, patched 146s
> tmpfs, 16 parallel streams: HEAD 131s, patched 89s
> tmpfs, 32 parallel streams: HEAD 148s, patched 98s
> =
> using a normal table, wal-level=minimal, 256mb wal segments,
> wal_init_zero = off, wal_buffers = 262143, wal_writer_delay = 1ms,
> wal_writer_flush_after = 512MB
>
> NVME, 4 parallel streams:   HEAD 201s, patched 15

Re: Implement for window functions

2021-01-01 Thread Zhihong Yu
Krasiyan:
Happy New Year.

For WinGetFuncArgInPartition():

+   if (target > 0)
+   step = 1;
+   else if (target < 0)
+   step = -1;
+   else
+   step = 0;

When would the last else statement execute ? Since the above code is
for WINDOW_SEEK_CURRENT, I wonder why step should be 0.

Similar question for the last else statement below
in WinGetFuncArgInFrame():

+   else if (seektype == WINDOW_SEEK_TAIL)
+   step = -1;
+   else
+   step = 0;

Thanks

On Fri, Jan 1, 2021 at 12:59 PM Krasiyan Andreev  wrote:

> Hi, it looks like cfbot.cputube.org didn't recognize and can't apply a
> patch, so I resend it now with a different format, no other changes.
>
> На ср, 30.12.2020 г. в 22:16 ч. Krasiyan Andreev 
> написа:
>
>> It works - now it compiles clean and all checks are passed, thank you. I
>> will continue with more complex tests.
>>
>> На ср, 30.12.2020 г. в 21:50 ч. David Fetter  написа:
>>
>>> On Wed, Dec 30, 2020 at 09:32:26PM +0200, Krasiyan Andreev wrote:
>>> > Hi, after latest committed patches about multirange datatypes, I get a
>>> > compilation error,
>>>
>>> Oh, right. I'd been meaning to send a patch to fix that. Here it is.
>>>
>>> Best,
>>> David.
>>> --
>>> David Fetter  http://fetter.org/
>>> Phone: +1 415 235 3778
>>>
>>> Remember to vote!
>>> Consider donating to Postgres: http://www.postgresql.org/about/donate
>>>
>>


Re: poc - possibility to write window function in PL languages

2021-01-04 Thread Zhihong Yu
Hi, Pavel:
Thanks for the update.

I don't have other comment.

Cheers

On Mon, Jan 4, 2021 at 3:15 AM Pavel Stehule 
wrote:

> Hi
>
> pá 1. 1. 2021 v 18:57 odesílatel Zhihong Yu  napsal:
>
>> Hi, Pavel:
>> Happy New Year.
>>
>> +   command with clause WINDOW. The specific feature of
>> +   this functions is a possibility to two special storages with
>>
>> this functions -> this function
>>
>> possibility to two special storages: there is no verb.
>>
>> 'store with stored one value': store is repeated.
>>
>> + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
>>
>> It would be better to change 2020 to 2021 in the new files.
>>
>
> fixed
>
>>
>> For some functions, such as windowobject_get_func_arg_frame, it would be
>> better to add comment explaining their purposes.
>>
>
> It is commented before. These functions just call WinAPI functions
>
> /*
>  * High level access function. These functions are wrappers for windows API
>  * for PL languages based on usage WindowObjectProxy.
>  */
>
>
>
>> For estimate_partition_context_size():
>> +errmsg("size of value is greather than limit (1024
>> bytes)")));
>>
>> Please include the value of typlen in the message. There is similar error
>> message in the else block where value of size should be included.
>>
>> +   return *realsize;
>> +   }
>> +   else
>>
>> The 'else' is not needed since the if block ends with return.
>>
>
> yes, but it is there for better readability (symmetry)
>
>>
>> +   size += size / 3;
>>
>> Please add a comment for the choice of constant 3.
>>
>> +   /* by default we allocate 30 bytes */
>> +   *realsize = 0;
>>
>> The value 30 may not be accurate - from the caller:
>>
>> +   if (PG_ARGISNULL(2))
>> +   minsize = VARLENA_MINSIZE;
>> +   else
>> +   minsize = PG_GETARG_INT32(2);
>>
>> VARLENA_MINSIZE is 32.
>>
>> Cheers
>>
>> On Fri, Jan 1, 2021 at 3:29 AM Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> rebase
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>
> I am sending updated patch
>
> Thank you for comments
>
> Regards
>
> Pavel
>


Re: pg_stat_statements and "IN" conditions

2021-01-05 Thread Zhihong Yu
Hi, Dmitry:

+   int lastExprLenght = 0;

Did you mean to name the variable lastExprLength ?

w.r.t. extracting to a helper method: the second and third if (currentExprIdx
== pgss_merge_threshold - 1) blocks are similar.
It is up to you whether to create the helper method.
I am fine with the current formation.

Cheers

On Tue, Jan 5, 2021 at 4:51 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Sat, Dec 26, 2020 at 08:53:28AM -0800, Zhihong Yu wrote:
> > Hi,
> > A few comments.
> >
> > +   foreach(lc, (List *) expr)
> > +   {
> > +   Node * subExpr = (Node *) lfirst(lc);
> > +
> > +   if (!IsA(subExpr, Const))
> > +   {
> > +   allConst = false;
> > +   break;
> > +   }
> > +   }
> >
> > It seems the above foreach loop (within foreach(temp, (List *) node)) can
> > be preceded with a check that allConst is true. Otherwise the loop can be
> > skipped.
>
> Thanks for noticing. Now that I look at it closer I think it's the other
> way around, the loop above checking constants for the first expression
> is not really necessary.
>
> > +   if (currentExprIdx == pgss_merge_threshold - 1)
> > +   {
> > +   JumbleExpr(jstate, expr);
> > +
> > +   /*
> > +* A const expr is already found, so JumbleExpr
> must
> > +* record it. Mark it as merged, it will be the
> > first
> > +* merged but still present in the statement
> query.
> > +*/
> > +   Assert(jstate->clocations_count > 0);
> > +   jstate->clocations[jstate->clocations_count -
> > 1].merged = true;
> > +   currentExprIdx++;
> > +   }
> >
> > The above snippet occurs a few times. Maybe extract into a helper method.
>
> Originally I was hesitant to extract it was because it's quite small
> part of the code. But now I've realized that the part relevant to lists
> is not really correct, which makes those bits even more different, so I
> think it makes sense to leave it like that. What do you think?
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Zhihong Yu
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :

ParallelInsCmdEstimate :

+   Assert(pcxt && ins_info &&
+  (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)

Since the if condition is covered by the assertion, I wonder why the if
check is still needed.

Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo
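To illustrate the point, here is a minimal sketch of the pattern in question; the enum and function names are invented for illustration and are not the patch's own. With a single allowed command kind, the assertion already pins ins_cmd, so the if test that follows can never be false while assertions are enabled (though in builds where assertions compile away, the if test still guards the dispatch):

```c
#include <assert.h>

/*
 * Stand-in for the patch's ParallelInsertCmdKind; only one kind exists
 * so far, which is what makes the if test look redundant.
 */
typedef enum
{
    CMD_CREATE_TABLE_AS,
    CMD_REFRESH_MATVIEW            /* a possible future addition */
} InsCmdKind;

static int
estimate_info_size(InsCmdKind ins_cmd)
{
    assert(ins_cmd == CMD_CREATE_TABLE_AS);

    if (ins_cmd == CMD_CREATE_TABLE_AS)
        return 64;                 /* per-command size estimate */

    return 0;                      /* unreachable while only one kind exists */
}
```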

Cheers

On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
>  wrote:
> >
> > > +   if
> (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > > why would rel and relname not be there? if no rows have been inserted?
> > > because it seems from the intorel_startup function that that would be
> > > set as soon as startup was done, which i assume (wrongly?) is always
> done?
> >
> > Actually, that into clause rel variable is always being set in the
> gram.y for CTAS, Create Materialized View and SELECT INTO (because
> qualified_name non-terminal is not optional). My bad. I just added it as a
> sanity check. Actually, it's not required.
> >
> > create_as_target:
> > qualified_name opt_column_list table_access_method_clause
> > OptWith OnCommitOption OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > create_mv_target:
> > qualified_name opt_column_list table_access_method_clause
> opt_reloptions OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > into_clause:
> > INTO OptTempTableName
> > {
> > $$ = makeNode(IntoClause);
> >$$->rel = $2;
> >
> > I will change the below code:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> > +((DR_intorel *) gstate->dest)->into &&
> > +((DR_intorel *) gstate->dest)->into->rel &&
> > +((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > +{
> >
> > to:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +{
> >
> > I will update this in the next version of the patch set.
>
> Attaching v20 patch set that has above change in 0001 patch, note that
> 0002 to 0004 patches have no changes from v19. Please consider the v20
> patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Zhihong Yu
The plan sounds good.

Until the second command type is added, can you leave out the 'if (ins_cmd
== PARALLEL_INSERT_CMD_CREATE_TABLE_AS)' check and keep just the pair of
curly braces ?

You can add the if condition back when the second command type is added.

Cheers

On Tue, Jan 5, 2021 at 7:53 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Jan 6, 2021 at 8:19 AM Zhihong Yu  wrote:
> > For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
> >
> > ParallelInsCmdEstimate :
> >
> > +   Assert(pcxt && ins_info &&
> > +  (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> > +
> > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> >
> > Since the if condition is covered by the assertion, I wonder why the if
> check is still needed.
> >
> > Similar comment for SaveParallelInsCmdFixedInfo and
> SaveParallelInsCmdInfo
>
> Thanks.
>
> The idea is to have assertion with all the expected ins_cmd types, and
> then later to have selective handling for different ins_cmds. For
> example, if we add (in future) parallel insertion in Refresh
> Materialized View, then the code in those functions will be something
> like:
>
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind
> ins_cmd,
> +   void *ins_info)
> +{
> +Assert(pcxt && ins_info &&
> +   (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
> +(ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
> +
> +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +{
> +
> +}
> +   else if (ns_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> +   {
> +
> +   }
>
> Similarly for other functions as well.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Zhihong Yu
Hi,
For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :

workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.

It would be nice if the scenarios where a parallel plan is not chosen were
listed.

+   if ((root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_SELECT_QUERY) &&
+   (root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_CAN_IGN_TUP_COST))
+   {
+   /* We are ignoring the parallel tuple cost, so mark it. */
+   root->parse->parallelInsCmdTupleCostOpt |=
+
PARALLEL_INSERT_TUP_COST_IGNORED;

If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY
and PARALLEL_INSERT_CAN_IGN_TUP_COST are
set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.

Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting
(1) of the first two bits should suffice.
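A small sketch of the flag arithmetic being questioned, with shortened stand-in names for the patch's PARALLEL_INSERT_* bits: if the third bit is always set exactly when the first two are, it carries no extra information -- unless some code path can set the first two without ever reaching the place that actually ignores the tuple cost.

```c
#include <assert.h>
#include <stdint.h>

/* Shortened stand-ins for the patch's flag bits */
enum
{
    SELECT_QUERY     = 1 << 0,
    CAN_IGN_TUP_COST = 1 << 1,
    TUP_COST_IGNORED = 1 << 2
};

/* The check quoted above: both of the first two bits set */
static int
first_two_set(uint32_t opt)
{
    return (opt & SELECT_QUERY) != 0 && (opt & CAN_IGN_TUP_COST) != 0;
}
```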

Cheers

On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
>  wrote:
> >
> > > +   if
> (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > > why would rel and relname not be there? if no rows have been inserted?
> > > because it seems from the intorel_startup function that that would be
> > > set as soon as startup was done, which i assume (wrongly?) is always
> done?
> >
> > Actually, that into clause rel variable is always being set in the
> gram.y for CTAS, Create Materialized View and SELECT INTO (because
> qualified_name non-terminal is not optional). My bad. I just added it as a
> sanity check. Actually, it's not required.
> >
> > create_as_target:
> > qualified_name opt_column_list table_access_method_clause
> > OptWith OnCommitOption OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > create_mv_target:
> > qualified_name opt_column_list table_access_method_clause
> opt_reloptions OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > into_clause:
> > INTO OptTempTableName
> > {
> > $$ = makeNode(IntoClause);
> >$$->rel = $2;
> >
> > I will change the below code:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> > +((DR_intorel *) gstate->dest)->into &&
> > +((DR_intorel *) gstate->dest)->into->rel &&
> > +((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > +{
> >
> > to:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +{
> >
> > I will update this in the next version of the patch set.
>
> Attaching v20 patch set that has above change in 0001 patch, note that
> 0002 to 0004 patches have no changes from v19. Please consider the v20
> patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-06 Thread Zhihong Yu
Hi,
For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :

However these functions are not neither committed nor aborted at

I think the double negation was not intentional. Should be 'are neither ...'

For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
return statement ?

+   fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);

In the function name, Fdw and Xact each appear twice. Maybe one occurrence
of each could be dropped ?

+* we don't need to anything for this participant because all
foreign

'need to' -> 'need to do'

+   else if (TransactionIdDidAbort(xid))
+   return FDWXACT_STATUS_ABORTING;
+
the 'else' can be omitted since the preceding if would return.
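A sketch of the early-return style being suggested, with invented names: once a branch returns, the test that follows needs no else.

```c
#include <assert.h>

/* Stand-in for the patch's FDWXACT_STATUS_* values */
typedef enum
{
    ST_COMMITTING,
    ST_ABORTING,
    ST_UNKNOWN
} TxStatus;

static TxStatus
classify(int committed, int aborted)
{
    if (committed)
        return ST_COMMITTING;
    if (aborted)                   /* no 'else': the branch above returned */
        return ST_ABORTING;
    return ST_UNKNOWN;
}
```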

+   if (max_prepared_foreign_xacts <= 0)

I wonder when the value for max_prepared_foreign_xacts would be negative
(and whether that should be considered an error).

Cheers

On Wed, Jan 6, 2021 at 5:45 PM Masahiko Sawada 
wrote:

> On Mon, Dec 28, 2020 at 11:24 PM Masahiko Sawada 
> wrote:
> >
> > On Wed, Nov 25, 2020 at 9:50 PM Masahiko Sawada 
> wrote:
> > >
> > > Since the previous version conflicts with the current HEAD I've
> > > attached the rebased version patch set.
> >
> > Rebased the patch set again to the current HEAD.
> >
> > The discussion of this patch is very long so here is a short summary
> > of the current state:
> >
> > It’s still under discussion which approaches are the best for the
> > distributed transaction commit as a building block of built-in sharing
> > using foreign data wrappers.
> >
> > Since we’re considering that we use this feature for built-in
> > sharding, the design depends on the architecture of built-in sharding.
> > For example, with the current patch, the PostgreSQL node that received
> > a COMMIT from the client works as a coordinator and it commits the
> > transactions using 2PC on all foreign servers involved with the
> > transaction. This approach would be good with the de-centralized
> > sharding architecture but not with centralized architecture like the
> > GTM node of Postgres-XC and Postgres-XL that is a dedicated component
> > that is responsible for transaction management. Since we don't get a
> > consensus on the built-in sharding architecture yet, it's still an
> > open question that this patch's approach is really good as a building
> > block of the built-in sharding.
> >
> > On the other hand, this feature is not necessarily dedicated to the
> > built-in sharding. For example, the distributed transaction commit
> > through FDW is important also when atomically moving data between two
> > servers via FDWs. Using a dedicated process or server like GTM could
> > be an over solution. Having the node that received a COMMIT work as a
> > coordinator would be better and straight forward.
> >
> > There is no noticeable TODO in the functionality so far covered by
> > this patch set. This patchset adds new FDW APIs to support 2PC,
> > introduces the global transaction manager, and implement those FDW
> > APIs to postgres_fdw. Also, it has regression tests and documentation.
> > Transactions on foreign servers involved with the distributed
> > transaction are committed using 2PC. Committing using 2PC is performed
> > asynchronously and transparently to the user. Therefore, it doesn’t
> > guarantee that transactions on the foreign server are also committed
> > when the client gets an acknowledgment of COMMIT. The patch doesn't
> > cover synchronous foreign transaction commit via 2PC is not covered by
> > this patch as we still need a discussion on the design.
> >
>
> I've attached the rebased patches to make cfbot happy.
>
>
> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Zhihong Yu
Thanks for the clarification.

w.r.t. the commit message, maybe I was momentarily sidetracked by the
phrase "With this change".
It seems the scenarios you listed are known parallel safety constraints.

It would probably help to rephrase that sentence a little to make this clearer.

Cheers

On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu  wrote:
> >
> > Hi,
> > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
> >
> > workers to Gather node to 0. With this change, there are chances
> > that the planner may choose the parallel plan.
> >
> > It would be nice if the scenarios where a parallel plan is not chosen
> are listed.
>
> There are many reasons, the planner may not choose a parallel plan for
> the select part, for instance if there are temporary tables, parallel
> unsafe functions, or the parallelism GUCs are not set properly,
> foreign tables and so on. see
> https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
> think so, we will add all the scenarios into the commit message.
>
> Having said that, we have extensive comments in the code(especially in
> the function SetParallelInsertState) about when parallel inserts are
> chosen.
>
> + * Parallel insertions are possible only if the upper node is Gather.
>   */
> +if (!IsA(gstate, GatherState))
>  return;
>
> + * Parallelize inserts only when the upper Gather node has no
> projections.
>   */
> +if (!gstate->ps.ps_ProjInfo)
> +{
> +/* Okay to parallelize inserts, so mark it. */
> +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +((DR_intorel *) dest)->is_parallel = true;
> +
> +/*
> + * For parallelizing inserts, we must send some information so
> that the
> + * workers can build their own dest receivers. For CTAS, this
> info is
> + * into clause, object id (to open the created table).
> + *
> + * Since the required information is available in the dest
> receiver,
> + * store a reference to it in the Gather state so that it will be
> used
> + * in ExecInitParallelPlan to pick the information.
> + */
> +gstate->dest = dest;
> +}
> +else
> +{
> +/*
> + * Upper Gather node has projections, so parallel insertions are
> not
> + * allowed.
> + */
>
> > +   if ((root->parse->parallelInsCmdTupleCostOpt &
> > +PARALLEL_INSERT_SELECT_QUERY) &&
> > +   (root->parse->parallelInsCmdTupleCostOpt &
> > +PARALLEL_INSERT_CAN_IGN_TUP_COST))
> > +   {
> > +   /* We are ignoring the parallel tuple cost, so mark it. */
> > +   root->parse->parallelInsCmdTupleCostOpt |=
> > +
>  PARALLEL_INSERT_TUP_COST_IGNORED;
> >
> > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and
> PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED
> is implied.
> >
> > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the
> setting (1) of the first two bits should suffice.
>
> The way these flags work is as follows: before planning in CTAS, we
> set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
> gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
> actually ignored the tuple cost in cost_gather we set
> PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
> PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
> generate_useful_gather_paths, and the function
> generate_useful_gather_paths can return before reaching cost_gather,
> see below snippets. So, we need the 3 flags.
>
> void
> generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> ListCell   *lc;
> doublerows;
> double   *rowsp = NULL;
> List   *useful_pathkeys_list = NIL;
> Path   *cheapest_partial_path = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
> /* Should we override the rel's rowcount estimate? */
> if (override_rows)
> rowsp = &rows;
>
> /* generate the regular gather (merge) paths */
> generate_gather_paths(root, rel, override_rows);
>
> void
> generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> Path   *cheapest_partial_path;
> Path   *simple_gather_path;
> ListCell   *lc;
> doublerows;
> double   *rowsp = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Implement for window functions

2021-01-09 Thread Zhihong Yu
Hi,
For WinGetFuncArgInFrame():

+   if (winobj->null_treatment == NULL_TREATMENT_IGNORE)
{
...
+   if (target > 0)
+   step = 1;
+   else if (target < 0)
+   step = -1;
+   else if (seektype == WINDOW_SEEK_HEAD)
+   step = 1;
+   else if (seektype == WINDOW_SEEK_TAIL)
+   step = -1;
+   else
+   step = 0;
...
+   relpos = 0;
+   }

Why is relpos always set to 0 ?
In similar code in WinGetFuncArgInPartition(), I saw the following:

+   if (target > 0)
+   step = 1;
+   else if (target < 0)
+   step = -1;
+   else
+   step = 0;
+   relpos = step;

Maybe add a comment above the relpos assignment.

Thanks

On Sat, Jan 9, 2021 at 3:31 AM Krasiyan Andreev  wrote:

> Hi, the building warning below is fixed now, no other changes. Also, I can
> confirm that the corner case with offset=0 in lead and lag works correctly.
>
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
> -Wno-format-truncation -Wno-stringop-truncation -O2 -I../../../src/include
> -I/home/krasiyan/pgsql/postgresql/src/include  -D_GNU_SOURCE
> -I/usr/include/libxml2   -c -o nodeWindowAgg.o
> /home/krasiyan/pgsql/postgresql/src/backend/executor/nodeWindowAgg.c
> /home/krasiyan/pgsql/postgresql/src/backend/executor/nodeWindowAgg.c: In
> function ‘WinGetFuncArgInPartition’:
> /home/krasiyan/pgsql/postgresql/src/backend/executor/nodeWindowAgg.c:3274:10:
> warning: ‘step’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>  3274 |   relpos += step;
>   |   ~~~^~~
> /home/krasiyan/pgsql/postgresql/src/backend/executor/nodeWindowAgg.c: In
> function ‘WinGetFuncArgInFrame’:
> /home/krasiyan/pgsql/postgresql/src/backend/executor/nodeWindowAgg.c:3531:10:
> warning: ‘step’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>  3531 |   relpos += step;
>   |   ~~~^~~~~~~
>
>
>
> На пт, 8.01.2021 г. в 2:02 ч. Vik Fearing 
> написа:
>
>> On 1/1/21 10:21 PM, Zhihong Yu wrote:
>> > Krasiyan:
>> > Happy New Year.
>> >
>> > For WinGetFuncArgInPartition():
>> >
>> > +   if (target > 0)
>> > +   step = 1;
>> > +   else if (target < 0)
>> > +   step = -1;
>> > +   else
>> > +   step = 0;
>> >
>> > When would the last else statement execute ? Since the above code is
>> > for WINDOW_SEEK_CURRENT, I wonder why step should be 0.
>>
>> Hi.
>>
>> "lag(expr, 0) over w" is useless but valid.
>> --
>> Vik Fearing
>>
>


Re: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread Zhihong Yu
Hi,
For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :

+   /* Assume original queries have hasModifyingCTE set correctly */
+   if (parsetree->hasModifyingCTE)
+   hasModifyingCTE = true;

Since hasModifyingCTE is false by the time the above is run, it can be
simplified as:
hasModifyingCTE = parsetree->hasModifyingCTE

+   if (!hasSubQuery)
+   return false;
+
+   return true;

The above can be simplified as:
return hasSubQuery;

Cheers

On Thu, Feb 11, 2021 at 7:02 PM Greg Nancarrow  wrote:

> On Thu, Feb 11, 2021 at 11:17 PM Greg Nancarrow 
> wrote:
> >
> > Posting an updated set of patches. Changes are based on feedback, as
> > detailed below:
> >
>
> Oops, looks like I forgot "COSTS OFF" on some added EXPLAINs in the
> tests, and it caused some test failures in the PostgreSQL Patch Tester
> (cfbot).
> Also, I think that perhaps the localized temporary fix included in the
> patch for the hasModifyingCTE bug should be restricted to INSERT, even
> though the bug actually exists for SELECT too.
> Posting an updated set of patches to address these.
>
> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: Parallel INSERT (INTO ... SELECT ...)

2021-02-11 Thread Zhihong Yu
Greg:
bq. we should just return parsetree->hasModifyingCTE at this point,

Maybe you can clarify a bit.
The if (parsetree->hasModifyingCTE) check is followed by if
(!hasModifyingCTE).
When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be true,
resulting in the execution of the if (!hasModifyingCTE) block.

In your reply, did you mean that the if (!hasModifyingCTE) block is no
longer needed ? (I guess not)

Cheers

On Thu, Feb 11, 2021 at 8:14 PM Greg Nancarrow  wrote:

> On Fri, Feb 12, 2021 at 2:33 PM Zhihong Yu  wrote:
> >
> > For v17-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch :
> >
> > +   /* Assume original queries have hasModifyingCTE set correctly */
> > +   if (parsetree->hasModifyingCTE)
> > +   hasModifyingCTE = true;
> >
> > Since hasModifyingCTE is false by the time the above is run, it can be
> simplified as:
> > hasModifyingCTE = parsetree->hasModifyingCTE
> >
>
> Actually, we should just return parsetree->hasModifyingCTE at this
> point, because if it's false, we shouldn't need to continue the search
> (as we're assuming it has been set correctly for QSRC_ORIGINAL case).
>
> > +   if (!hasSubQuery)
> > +   return false;
> > +
> > +   return true;
> >
> > The above can be simplified as:
> > return hasSubQuery;
> >
>
> Yes, absolutely right, silly miss on that one!
> Thanks.
>
> This was only ever meant to be a temporary fix for this bug that
> affects this patch.
>
> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: Parallel INSERT (INTO ... SELECT ...)

2021-02-12 Thread Zhihong Yu
Greg:
Thanks for more debugging.

Cheers

On Thu, Feb 11, 2021 at 9:43 PM Greg Nancarrow  wrote:

> On Fri, Feb 12, 2021 at 3:21 PM Zhihong Yu  wrote:
> >
> > Greg:
> > bq. we should just return parsetree->hasModifyingCTE at this point,
> >
> > Maybe you can clarify a bit.
> > The if (parsetree->hasModifyingCTE) check is followed by if
> (!hasModifyingCTE).
> > When parsetree->hasModifyingCTE is false, !hasModifyingCTE would be
> true, resulting in the execution of the if (!hasModifyingCTE) block.
> >
> > In your reply, did you mean that the if (!hasModifyingCTE) block is no
> longer needed ? (I guess not)
> >
>
> Sorry for not making it clear. What I meant was that instead of:
>
> if (parsetree->querySource == QSRC_ORIGINAL)
> {
>   /* Assume original queries have hasModifyingCTE set correctly */
>   if (parsetree->hasModifyingCTE)
> hasModifyingCTE = true;
> }
>
> I thought I should be able to use the following (it the setting for
> QSRC_ORIGINAL can really be trusted):
>
> if (parsetree->querySource == QSRC_ORIGINAL)
> {
>   /* Assume original queries have hasModifyingCTE set correctly */
>   return parsetree->hasModifyingCTE;
> }
>
> (and then the "if (!hasModifyingCTE)" test on the code following
> immediately below it can be removed)
>
>
> BUT - after testing that change, the problem test case (in the "with"
> tests) STILL fails.
> I then checked if hasModifyingCTE is always false in the QSRC_ORIGINAL
> case (by adding an Assert), and it always is false.
> So actually, there is no point in having the "if
> (parsetree->querySource == QSRC_ORIGINAL)" code block - even the so
> called "original" query doesn't maintain the setting correctly (even
> though the actual original query sent into the query rewriter does).
> And also then the "if (!hasModifyingCTE)" test on the code following
> immediately below it can be removed.
>
> Regards,
> Greg Nancarrow
> Fujitsu Australia
>


Re: Why do we have MakeSingleTupleTableSlot instead of not using MakeTupleTableSlot?

2021-02-12 Thread Zhihong Yu
Hi,
MakeSingleTupleTableSlot can be defined as a macro, calling
MakeTupleTableSlot().
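A sketch of the macro idea with stand-in types and names (not the real executor API): the wrapper forwards to the underlying function at no call cost. A static inline function in the header would achieve the same without the usual macro pitfalls.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for TupleTableSlot */
typedef struct MySlot
{
    int natts;
} MySlot;

static MySlot *
MakeSlot(int natts)
{
    MySlot *slot = malloc(sizeof(MySlot));

    slot->natts = natts;
    return slot;
}

/* The wrapper adds nothing, so it can simply forward */
#define MakeSingleSlot(natts) MakeSlot(natts)
```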

Cheers

On Fri, Feb 12, 2021 at 5:44 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> Hi,
>
> I wonder, is there a specific reason that MakeTupleTableSlot is
> wrapped up in MakeSingleTupleTableSlot without doing anything than
> just returning the slot created by MakeTupleTableSlot? Do we really
> need MakeSingleTupleTableSlot? Can't we just use MakeTupleTableSlot
> directly? Am I missing something?
>
> I think we can avoid some unnecessary function call costs, for
> instance when called 1000 times inside table_slot_create from
> copyfrom.c or in some other places where MakeSingleTupleTableSlot is
> called in a loop.
>
> If it's okay to remove MakeSingleTupleTableSlot and use
> MakeTupleTableSlot instead, we might have to change in a lot of
> places. If we don't want to change in those many files, we could
> rename MakeTupleTableSlot to MakeSingleTupleTableSlot and change it in
> only a few places.
>
> Thoughts?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>


Re: Possible dereference null return (src/backend/replication/logical/reorderbuffer.c)

2021-02-12 Thread Zhihong Yu
Hi,
How about the following patch ?

ReorderBufferSetBaseSnapshot() can return a bool to indicate whether the
base snapshot is set up.

For the call from SnapBuildCommitTxn(), it seems xid is the top
transaction, so the return value doesn't need to be checked.
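A sketch of the proposed signature change, with stand-in types: report whether the base snapshot could be set instead of dereferencing a possibly-NULL transaction. Callers that know the transaction exists (the top-transaction case) may ignore the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for ReorderBufferTXN */
typedef struct Txn
{
    void   *base_snapshot;
} Txn;

static bool
set_base_snapshot(Txn *txn, void *snap)
{
    if (txn == NULL)
        return false;              /* caller decides whether to error out */
    txn->base_snapshot = snap;
    return true;
}

static bool
set_on_valid_txn(void)
{
    Txn     txn = {NULL};
    int     snap = 0;

    return set_base_snapshot(&txn, &snap);
}
```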

Cheers

On Fri, Feb 12, 2021 at 6:40 PM Michael Paquier  wrote:

> On Fri, Feb 12, 2021 at 03:56:02PM +0900, Kyotaro Horiguchi wrote:
> > If the return from the first call is a subtransaction, the second call
> > always obtain the top transaction.  If the top transaction actualy did
> > not exist, it's rather the correct behavior to cause SEGV, than
> > creating a bogus rbtxn. THus it is wrong to set create=true and
> > create_as_top=true.  We could change the assertion like Assert (txn &&
> > txn->base_snapshot) to make things clearer.
>
> I don't see much the point to change this code.  The result would be
> the same: a PANIC at this location.
> --
> Michael
>


reorder-buffer-base-snapshot.patch
Description: Binary data


Re: Possible dereference null return (src/backend/replication/logical/reorderbuffer.c)

2021-02-13 Thread Zhihong Yu
Hi,
+   (errmsg("BaseSnapshot cant't be setup at point %X/%X",
+   (uint32) (lsn >> 32), (uint32) lsn),
+errdetail("Top transaction is running.")));

Did you mean this errdetail:

Top transaction is not running.

Cheers

On Sat, Feb 13, 2021 at 12:34 PM Ranier Vilela  wrote:

>
> Em sáb., 13 de fev. de 2021 às 01:07, Zhihong Yu 
> escreveu:
>
>> Hi,
>> How about the following patch ?
>>
>> ReorderBufferSetBaseSnapshot() can return a bool to indicate whether the
>> base snapshot is set up.
>>
>> For the call by SnapBuildCommitTxn(), it seems xid is top transaction. So
>> the return value doesn't need to be checked.
>>
> IMO anything else is better than PANIC.
> Anyway, if all fails, reporting an error can contribute to checking where.
>
> Attached a patch suggestion v2.
> 1. SnapBuildProcessChange returns a result of
> ReorderBufferSetBaseSnapshot, so the caller can act accordingly.
> 2. SnapBuildCommitTxn can't ignore a result
> from ReorderBufferSetBaseSnapshot, even if it never fails.
>
> regards,
> Ranier Vilela
>


Re: Possible dereference null return (src/backend/replication/logical/reorderbuffer.c)

2021-02-13 Thread Zhihong Yu
Hi,
Patch v4 corrects a small typo:
+   (errmsg("BaseSnapshot cant't be setup at point %X/%X",

Cheers

On Sat, Feb 13, 2021 at 12:58 PM Ranier Vilela  wrote:

> Em sáb., 13 de fev. de 2021 às 17:48, Zhihong Yu 
> escreveu:
>
>> Hi,
>> +   (errmsg("BaseSnapshot cant't be setup at point %X/%X",
>> +   (uint32) (lsn >> 32), (uint32) lsn),
>> +errdetail("Top transaction is running.")));
>>
>> Did you mean this errdetail:
>>
>> Top transaction is not running.
>>
> Done.
>
> Thanks Zhihong.
> v3 based on your patch, attached.
>
> regards,
> Ranier Vilela
>


reorder-buffer-v4.patch
Description: Binary data


Re: CREATE INDEX CONCURRENTLY on partitioned index

2021-02-15 Thread Zhihong Yu
Hi,
For v13-0006-More-refactoring.patch :

+   /* It's not a shared catalog, so refuse to move it to shared tablespace
*/
+   if (params->tablespaceOid == GLOBALTABLESPACE_OID && false)
+   ereport(ERROR,

Do you intend to remove the ineffective check ?

+   else
+   heapRelation = table_open(heapId,
+ ShareUpdateExclusiveLock);
+   table_close(heapRelation, NoLock);

The table_open() seems to be unnecessary since there is no check after the
open.

+   // heapRelationIds = list_make1_oid(heapId);
If the code is not needed, you can remove the above.

For v13-0005-Refactor-to-allow-reindexing-all-index-partition.patch :

+   /* Skip invalid indexes, if requested */
+   if ((options & REINDEXOPT_SKIPVALID) != 0 &&
+   get_index_isvalid(partoid))

The comment seems to diverge from the name of the flag (which says skip
valid index).

Cheers

On Mon, Feb 15, 2021 at 11:34 AM Justin Pryzby  wrote:

> On Mon, Feb 15, 2021 at 10:06:47PM +0300, Anastasia Lubennikova wrote:
> > On 28.01.2021 17:30, Justin Pryzby wrote:
> > > On Thu, Jan 28, 2021 at 09:51:51PM +0900, Masahiko Sawada wrote:
> > > > On Mon, Nov 30, 2020 at 5:22 AM Justin Pryzby 
> wrote:
> > > > > On Sat, Oct 31, 2020 at 01:31:17AM -0500, Justin Pryzby wrote:
> > > > > > Forking this thread, since the existing CFs have been closed.
> > > > > >
> https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd
> > > > > >
> > > > > > The strategy is to create catalog entries for all tables with
> indisvalid=false,
> > > > > > and then process them like REINDEX CONCURRENTLY.  If it's
> interrupted, it
> > > > > > leaves INVALID indexes, which can be cleaned up with DROP or
> REINDEX, same as
> > > > > > CIC on a plain table.
> > > > > >
> > > > > > On Sat, Aug 08, 2020 at 01:37:44AM -0500, Justin Pryzby wrote:
> > > > > > > On Mon, Jun 15, 2020 at 09:37:42PM +0900, Michael Paquier
> wrote:
> > > > > > > Note that the mentioned problem wasn't serious: there was
> missing index on
> > > > > > > child table, therefor the parent index was invalid, as
> intended.  However I
> > > > > > > agree that it's not nice that the command can fail so easily
> and leave behind
> > > > > > > some indexes created successfully and some failed some not
> created at all.
> > > > > > >
> > > > > > > But I took your advice initially creating invalid inds.
> > > > > > ...
> > > > > > > That gave me the idea to layer CIC on top of Reindex, since I
> think it does
> > > > > > > exactly what's needed.
> > > > > > On Sat, Sep 26, 2020 at 02:56:55PM -0500, Justin Pryzby wrote:
> > > > > > > On Thu, Sep 24, 2020 at 05:11:03PM +0900, Michael Paquier
> wrote:
> > > > > > > > It would be good also to check if
> > > > > > > > we have a partition index tree that maps partially with a
> partition
> > > > > > > > table tree (aka no all table partitions have a partition
> index), where
> > > > > > > > these don't get clustered because there is no index to work
> on.
> > > > > > > This should not happen, since a incomplete partitioned index
> is "invalid".
> >
> > > > > > > I had been waiting to rebase since there haven't been any
> > > > > > > review comments and I expected additional, future conflicts.
> > > > > > >
> >
> > I attempted to review this feature, but the last patch conflicts with the
> > recent refactoring, so I wasn't able to test it properly.
> > Could you please send a new version?
>
> I rebased this yesterday, so here's my latest.
>
> > 2) Here we access a relation field after closing the relation. Is it safe?
> > /* save lockrelid and locktag for below */
> > heaprelid = rel->rd_lockInfo.lockRelId;
>
> Thanks, fixed this just now.
>
> > 3) The leaf_partitions() function only handles indexes, so I suggest
> > naming it more specifically and adding a comment about the meaning of
> > the 'options' parameter.
> >
> > 4) I don't quite understand the idea of the regression test. Why do we
> > expect to see invalid indexes there?
> > +"idxpart_a_idx1" UNIQUE, btree (a) INVALID
>
> Because of the unique failure:
> +create unique index concurrently on idxpart (a); -- partitioned, unique
> failure
> +ERROR:  could not create unique index "idxpart2_a_idx2_ccnew"
> +DETAIL:  Key (a)=(10) is duplicated.
> +\d idxpart
>
> This shows that CIC first creates catalog-only INVALID indexes, and then
> reindexes them to "validate".
>
> --
> Justin
>


Re: PATCH: Batch/pipelining support for libpq

2021-02-16 Thread Zhihong Yu
Hi,
+   if (querymode == QUERY_SIMPLE)
+   {
+   commandFailed(st, "startpipeline", "cannot use pipeline mode
with the simple query protocol");
+   st->state = CSTATE_ABORTED;
+   return CSTATE_ABORTED;

I wonder why the st->state is only assigned for this if block. The state is
not set for other cases where CSTATE_ABORTED is returned.

Should PQ_PIPELINE_OFF be returned for the following case ?

+PQpipelineStatus(const PGconn *conn)
+{
+   if (!conn)
+   return false;

Cheers

On Tue, Feb 16, 2021 at 3:14 PM Alvaro Herrera 
wrote:

> Here's a new version, where I've renamed everything to "pipeline".  I
> think the docs could use some additional tweaks now in order to make a
> coherent story on pipeline mode, how it can be used in a batched
> fashion, etc.
>
> Here's the renames I applied.  It's mostly mechanical, except
> PQbatchSendQueue is now PQsendPipeline:
>
> PQBatchStatus -> PGpipelineStatus (enum)
> PQBATCH_MODE_OFF -> PQ_PIPELINE_OFF
> PQBATCH_MODE_ON -> PQ_PIPELINE_ON
> PQBATCH_MODE_ABORTED -> PQ_PIPELINE_ABORTED
> PQbatchStatus -> PQpipelineStatus (function)
> PQenterBatchMode -> PQenterPipelineMode
> PQexitBatchMode -> PQexitPipelineMode
> PQbatchSendQueue -> PQsendPipeline
> PGRES_BATCH_END -> PGRES_PIPELINE_END
> PGRES_BATCH_ABORTED -> PGRES_PIPELINE_ABORTED
>
> Also, PQbatchStatus(conn) returns enum PGpipelineStatus (it previously
> returned int).
>
> I'm tempted to rename PGASYNC_QUEUED to PGASYNC_PIPELINE_IDLE (not sure
> if PGASYNC_PIPELINE_READY fits better with the existing one).
>
>
> In pgbench, I changed the metacommands to be \startpipeline and
> \endpipeline.  There's a failing Assert() there which I commented out;
> it needs to be fixed.
>
> --
> Álvaro Herrera39°49'30"S 73°17'W
>


Re: PoC Refactor AM analyse API

2021-02-18 Thread Zhihong Yu
Hi,

+   *totalrows = floor((liverows / bs.m) * totalblocks + 0.5);

Is the above equivalent to:

+   *totalrows = ceil((liverows / bs.m) * totalblocks);

For compare_rows(), it seems the computation of oa and ob can be delayed to
when ba == bb (after the first two if statements).

Cheers

On Thu, Feb 18, 2021 at 6:06 PM Denis Smirnov  wrote:

> Thanks for your review, Heikki.
>
> I have made the changes you have requested.
>
> 1. All modifications interconnected with column projection were reverted
> (they should be added in https://commitfest.postgresql.org/31/2922 if the
> current patch would be merged one day).
> 2. I have returned PROGRESS_ANALYZE_* states.
> 3. qsort() was moved into heapam_acquire_sample_rows(). Also a comment was
> added, that the acquire_sample_rows() AM function must return the tuples in
> a physical table order.
> 4. table_beginscan_analyze() was removed as a redundant function.
> 5. acquire_sample_rows() comment about reservoir was changed.
>
>
> Best regards,
> Denis Smirnov | Developer
> s...@arenadata.io
> Arenadata | Godovikova 9-17, Moscow 129085 Russia
>
>


Re: PoC Refactor AM analyse API

2021-02-19 Thread Zhihong Yu
Denis:
Thanks for considering my suggestion.

For #1, I didn't take your example into account. Thanks for pointing that
out.

Cheers

On Thu, Feb 18, 2021 at 11:59 PM Denis Smirnov  wrote:

> Hello,  Zhihong.
>
> Thanks for your comments.
>
> 1. I am afraid that an equivalence of "floor(val + 0.5)" to "ceil(val)" is
> incorrect: "floor(2.1 + 0.5)" returns 2 while "ceil(2.1)" returns 3. We
> can’t use such a replacement, as you have suggested.
>
> 2. >> For compare_rows(), it seems the computation of oa and ob can be
> delayed to when ba == bb (after the first two if statements).
> I have checked some examples of ASM code generated by different compilers
> with -O2/O3 flags on https://gcc.godbolt.org and didn’t see any big
> benefit in result CPU instructions. You can check yourself an attached
> example below.
>
>
>
> Best regards,
> Denis Smirnov | Developer
> s...@arenadata.io
> Arenadata | Godovikova 9-17, Moscow 129085 Russia
>
>


Re: PATCH: Batch/pipelining support for libpq

2021-02-19 Thread Zhihong Yu
Hi,
+static int pqBatchProcessQueue(PGconn *conn);

I was suggesting changing:

+int
+PQexitBatchMode(PGconn *conn)

to:

+static int
+PQexitBatchMode(PGconn *conn)

Cheers

On Fri, Feb 19, 2021 at 10:43 AM Alvaro Herrera 
wrote:

> On 2021-Jan-21, Zhihong Yu wrote:
>
> > It seems '\\gset or \\aset is not ' would correspond to the check more
> > closely.
> >
> > +   if (my_command->argc != 1)
> > +   syntax_error(source, lineno, my_command->first_line,
> > my_command->argv[0],
> >
> > It is possible that my_command->argc == 0 (where my_command->argv[0]
> > shouldn't be accessed) ?
>
> No -- process_backslash_command reads the word and sets it as argv[0].
>
> > +   appendPQExpBufferStr(&conn->errorMessage,
> > + libpq_gettext("cannot queue commands
> > during COPY\n"));
> > +   return false;
> > +   break;
> >
> > Is the break necessary ? Similar comment for pqBatchProcessQueue().
>
> Not necessary.  I removed them.
>
> > +int
> > +PQexitBatchMode(PGconn *conn)
> >
> > Since either 0 or 1 is returned, maybe change the return type to bool.
>
> I was kinda tempted to do that, but in reality a lot of libpq's API is
> defined like that -- to return 0 (failure)/1 (success) as ints, not
> bools.  For example see PQsendQuery().  I'm not inclined to do different
> for these new functions.  (One curious case is PQsetvalue, which returns
> int, and is documented as "returns zero for failure" and then uses
> "return false").
>
> > Also, the function operates on PGconn - should the function be static
> > (pqBatchProcessQueue is) ?
>
> I don't understand this suggestion.  How would an application call it,
> if we make it static?
>
> Thanks
>
> --
> Álvaro Herrera   Valdivia, Chile
> "Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad;
> jugar
> al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
> La Feria de las Tinieblas, R. Bradbury)
>


Re: PATCH: Batch/pipelining support for libpq

2021-02-19 Thread Zhihong Yu
Hi,
Thanks for the response.

On Fri, Feb 19, 2021 at 12:46 PM Alvaro Herrera 
wrote:

> Hi,
>
> On 2021-Feb-19, Zhihong Yu wrote:
>
> > Hi,
> > +static int pqBatchProcessQueue(PGconn *conn);
> >
> > I was suggesting changing:
> >
> > +int
> > +PQexitBatchMode(PGconn *conn)
> >
> > to:
> >
> > +static int
> > +PQexitBatchMode(PGconn *conn)
>
> I understand that, but what I'm saying is that it doesn't work.
> pqBatchProcessQueue can be static because it's only used internally in
> libpq, but PQexitBatchMode is supposed to be called by the client
> application.
>
> --
> Álvaro Herrera   Valdivia, Chile
> "No necesitamos banderas
>  No reconocemos fronteras"  (Jorge González)
>


Re: New Table Access Methods for Multi and Single Inserts

2021-02-20 Thread Zhihong Yu
Hi,
bq. case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes

Are there some other columns per row apart from the integer columns?
Since the 2 integer columns only occupy 8 bytes, I wonder where the other
32-8=24 bytes come from.

Thanks

On Fri, Feb 19, 2021 at 9:45 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Feb 17, 2021 at 12:46 PM Bharath Rupireddy
>  wrote:
> > Hi,
> >
> > I addressed the following review comments and attaching v3 patch set.
> >
> > 1) ExecClearTuple happens before ExecCopySlot in heap_multi_insert_v2
> > and this allowed us to remove clear_mi_slots flag from
> > TableInsertState.
> > 2) I retained the flushed variable inside TableInsertState so that the
> > callers can know whether the buffered slots have been flushed. If yes,
> > the callers can execute after insert row triggers or perform index
> > insertions. This is easier than passing the after insert row triggers
> > info and index info to new multi insert table am and let it do. This
> > way the functionalities can be kept separate i.e. multi insert ams do
> > only buffering, decisions on when to flush, insertions and the callers
> > will execute triggers or index insertions. And also none of the
> > existing table ams are performing these operations within them, so
> > this is inline with the current design of the table ams.
> > 3) I have kept the single and multi insert API separate. The previous
> > suggestion was to have only a single insert API and let the callers
> > provide initially whether they want multi or single inserts. One
> > problem with that approach is that we have to allow table ams to
> > execute the after row triggers or index insertions. That is something
> > I personally don't like.
> >
> > 0001 - new table ams implementation
> > 0002 - the new multi table ams used in CREATE TABLE AS and REFRESH
> > MATERIALIZED VIEW
> > 0003 - the new multi table ams used in COPY
> >
> > Please review the v3 patch set further.
>
> Below is the performance gain measured for CREATE TABLE AS with the
> new multi insert am propsed in this thread:
>
> case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
> 59 bytes, 100mn tuples
> on master - 185sec
> on master with multi inserts - 121sec, gain - 1.52X
>
> case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
> columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
> on master - 367sec
> on master with multi inserts - 291sec, gain - 1.26X
>
> case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn
> tuples
> on master - 130sec
> on master with multi inserts - 105sec, gain - 1.23X
>
> case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
> columns, tuple size 1064 bytes, 10mn tuples
> on master - 120sec
> on master with multi inserts - 115sec, gain - 1.04X
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>


Re: [PATCH] refactor ATExec{En,Dis}ableRowSecurity

2021-02-28 Thread Zhihong Yu
Hi,
For 0002-Further-refactoring.patch, should there be an assertion
inside ATExecSetRowSecurity() on the values for rls and force_rls ?
There could be 3 possible values: -1, 0 and 1.

Cheers

On Sun, Feb 28, 2021 at 1:19 PM Justin Pryzby  wrote:

> tablecmds.c is 17k lines long, this makes it ~30 lines shorter.
>


Re: Table AM modifications to accept column projection lists

2021-03-02 Thread Zhihong Yu
Hi,

+   /* Make sure the the new slot is not dependent on the original tuple */

There is duplicate 'the'.

For neededColumnContextWalker(),

+   else if(var->varattno == 0) {

I think the if following the else is not needed - I assume var->varattno
wouldn't be negative.
Similar comment for extract_scan_columns().

+   while ((col = bms_next_member(parent_cols, col)) >= 0)
+   {
+   Var *var = (Var *) list_nth(translated_vars, col - 1);

If col is 0, do you still want to call list_nth() ?

Cheers

On Tue, Mar 2, 2021 at 9:10 AM Jacob Champion  wrote:

> On Mon, 2021-03-01 at 23:13 +, Jacob Champion wrote:
> > On Mon, 2021-03-01 at 16:59 -0600, Justin Pryzby wrote:
> > > Why is this removed ?
> >
> > Mm, both of those analyze.c changes seem suspect. Possibly a mismerge
> > from the zedstore branch; let me double-check.
> >
> > > its (possessive) not it's ("it is")
> >
> > Thanks, I'll fix this at the same time.
>
> Both fixed in v3; thanks for the catch!
>
> --Jacob
>


Re: [POC] Fast COPY FROM command for the table with foreign partitions

2021-03-03 Thread Zhihong Yu
Hi,

This feature enables bulk COPY into foreign table in the case of
multi inserts is possible

'is possible' -> 'if possible'

FDWAPI was extended by next routines:

next routines -> the following routines

For postgresExecForeignCopy():

+   if ((!OK && PQresultStatus(res) != PGRES_FATAL_ERROR) ||

Is PGRES_FATAL_ERROR handled somewhere else ? I don't seem to find that in
the patch.

Cheers

On Wed, Mar 3, 2021 at 6:24 PM tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

> From: Justin Pryzby 
> > Find attached some language fixes.
>
> Thanks a lot!  (I wish there were some tool like "pgEnglish" that
> corrects English in code comments and docs.)
>
>
> > |/* Do this to ensure we've pumped libpq back to idle state */
> >
> > I don't know what you mean by "pumped"?
>
> I changed it to "have not gotten extra results" to match the error message.
>
>
> > The CopySendEndOfRow "case COPY_CALLBACK:" should have a "break;"
>
> Added.
>
> > This touches some of the same parts as my "bulk insert" patch:
> > https://commitfest.postgresql.org/32/2553/
>
> My colleague will be reviewing it.
>
>
> Regards
> Takayuki Tsunakawa
>
>


Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2021-03-05 Thread Zhihong Yu
Hi,

+  \gset and \aset cannot be used
+  pipeline mode, since query results are not immediately

'used pipeline mode' -> 'used in pipeline mode'

--- /dev/null
+++ b/src/test/modules/libpq_pipeline/libpq_pipeline.c
@@ -0,0 +1,1144 @@
+/*
+ * src/test/modules/libpq_pipeline/libpq_pipeline.c

Looks like license information is missing from the header.

Cheers

On Thu, Mar 4, 2021 at 1:40 PM Alvaro Herrera 
wrote:

> On 2021-Mar-04, Alvaro Herrera wrote:
>
> > I don't know where do __WSAFDIsSet and __imp_select come from or what to
> > do about them.  Let's see if adding pgport and pgcommon fixes things.
>
> Indeed all those other problems were fixed and these remain.  New
> failure is:
>
> "C:\projects\postgresql\pgsql.sln" (default target) (1) ->
> 6007"C:\projects\postgresql\libpq_pipeline.vcxproj" (default target) (55)
> ->
> 6008(Link target) ->
> 6009  libpq_pipeline.obj : error LNK2019: unresolved external symbol
> __WSAFDIsSet referenced in function test_pipelined_insert
> [C:\projects\postgresql\libpq_pipeline.vcxproj]
> 6010  libpq_pipeline.obj : error LNK2019: unresolved external symbol
> __imp_select referenced in function test_pipelined_insert
> [C:\projects\postgresql\libpq_pipeline.vcxproj]
> 6011  .\Release\libpq_pipeline\libpq_pipeline.exe : fatal error LNK1120: 2
> unresolved externals [C:\projects\postgresql\libpq_pipeline.vcxproj]
>
> I did notice that isolationtester.c is using select(), and one
> difference is that it includes  which libpq_pipeline.c
> does not -- and it also pulls in ws2_32.lib.  Let's see if those two
> changes fix things.
>
> --
> Álvaro Herrera   Valdivia, Chile
>


Re: Add client connection check during the execution of the query

2021-03-05 Thread Zhihong Yu
For v4-0002-some-fixups.patch :

+   if (client_connection_check_interval > 0)
+   enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,

+   /* Start timeout for checking if the client has gone away if necessary.
*/
+   if (client_connection_check_interval)

It would be better if the second if condition is aligned with that of the
first (> 0).

For v4-0001-Detect-dropped-connections-while-running-queries.patch :

+Sets a time interval, in milliseconds, between periodic

I wonder if the interval should be expressed in seconds. Since the
description says: while running very long queries.

Cheers

On Fri, Mar 5, 2021 at 8:07 PM Thomas Munro  wrote:

> On Mon, Mar 1, 2021 at 6:18 PM Thomas Munro 
> wrote:
> > I've done a quick rebase of this the patch and added it to the
> > commitfest.  No other changes.  Several things were mentioned earlier
> > that still need to be tidied up.
>
> Rebased again due to bitrot.  This time I did some actual work:
>
> 1.  I didn't like the way it was rearming the timer *in the timer
> handler*; I think it should be done in the CFI(), and only if it
> determines that you're still running a query (otherwise you'll get
> periodic wakeups while you're idle between queries, which is bad for
> the arctic ice cap; we already handle client going away efficiently
> between queries with WaitEventSet socket readiness).
> 2.  The timer handler surely has to set the latch to close a race (cf.
> other similar handlers; between the CFI() and the beginning of the
> sleep, you could handle the signal, set the flag, and then go to sleep
> for 100 years).
> 3.  The test might as well use pg_sleep() instead of doing a plpgsql
> busy loop of SELECT queries.
> 4.  I prefer the name CLIENT_CONNECTION_CHECK_TIMEOUT instead of
> SKIP_CLIENT_CHECK_TIMEOUT; let's make up only one new name for a
> concept instead of two.
> 5.  Miniscule doc change.
>
> I put these into a separate patch for ease of review.  I don't claim
> this is ready -- still needs more testing etc -- but it seems to be
> generating the right system calls at the right times now.
>


Re: Parallel INSERT (INTO ... SELECT ...)

2021-03-06 Thread Zhihong Yu
Hi,
Does CATALOG_VERSION_NO need to be bumped (introduction of
partitionOids field) ?

Cheers

On Sat, Mar 6, 2021 at 2:19 AM Amit Kapila  wrote:

> On Fri, Mar 5, 2021 at 6:34 PM Greg Nancarrow  wrote:
> >
> > For the time being at least, I am posting an updated set of patches,
> > as I found that the additional parallel-safety checks on DOMAIN check
> > constraints to be somewhat inefficient and could also be better
> > integrated into max_parallel_hazard(). I also updated the basic tests
> > with a test case for this.
> >
>
> Thanks, your changes look good to me. I went ahead and changed the
> patch to track the partitionOids once rather than in two separate
> lists and use that list in PlanCacheRelCallback to invalidate the
> plans. I made few other changes:
> a. don't need to retain the lock on indexes as we can't change index
> expressions
> b. added/updated comments at few places in the code.
> c. made a few other cosmetic changes
> d. ran pgindent
> e. slightly modify the commit message.
> f. combine the code and test case patch
>
> For now, I have left 0005 and 0006 patches, we can come back to those
> once we are done with the first set of patches. The first patch looks
> good to me and I think we can commit it and then bikeshed about
> GUC/reloption patch.
>
>
>
> --
> With Regards,
> Amit Kapila.
>


Re: Parallel INSERT (INTO ... SELECT ...)

2021-03-06 Thread Zhihong Yu
I was looking at src/backend/nodes/readfuncs.c

READ_NODE_FIELD(relationOids);
+   READ_NODE_FIELD(partitionOids);

READ_NODE_FIELD would call nodeRead() for partitionOids. However, such a
field may not exist.
Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {'
check, I was curious whether CATALOG_VERSION_NO should be bumped.

Cheers

On Sat, Mar 6, 2021 at 6:31 PM Amit Kapila  wrote:

> On Sat, Mar 6, 2021 at 9:13 PM Zhihong Yu  wrote:
> >
> > Hi,
> > Does CATALOG_VERSION_NO need to be bumped (introduction of partitionOids
> field) ?
> >
>
> Good question. I usually update CATALOG_VERSION_NO when the patch
> changes any of the system catalogs. This is what is also mentioned in
> catversion.h. See the following text in that file: "The catalog
> version number is used to flag incompatible changes in the PostgreSQL
> system catalogs.  Whenever anyone changes the format of a system
> catalog relation, or adds, deletes, or modifies standard
> catalog entries in such a way that an updated backend wouldn't work
> with an old database (or vice versa), the catalog version number
> should be changed.".
>
> I might be missing something here but why you think that due to
> partitionOids field (in plannedstmt or at another place) we need to
> update CATALOG_VERSION_NO?
>
> Anyway, thanks for bringing this up.
>
> --
> With Regards,
> Amit Kapila.
>


Re: Parallel INSERT (INTO ... SELECT ...)

2021-03-06 Thread Zhihong Yu
For cfffe83ba82021a1819a656e7ec5c28fb3a99152, if a bool was written (true |
false), READ_INT_FIELD calls atoi() where atoi("true") returns 0
and atoi("false") returns 0 as well.

I am not sure if the new release containing these commits had a higher cat
version compared to the previous release.
If the new release did have a higher cat version, I guess there was no
issue, by chance.

Cheers

On Sat, Mar 6, 2021 at 8:12 PM Amit Kapila  wrote:

> On Sun, Mar 7, 2021 at 8:24 AM Zhihong Yu  wrote:
> >
> > I was looking at src/backend/nodes/readfuncs.c
> >
> > READ_NODE_FIELD(relationOids);
> > +   READ_NODE_FIELD(partitionOids);
> >
> > READ_NODE_FIELD would call nodeRead() for partitionOids. However, such
> field may not exist.
> > Since there is no 'if (strncmp(":partitionOids", token, length) == 0) {'
> check, I was curious whether CATALOG_VERSION_NO should be bumped.
> >
>
> Won't that be true only if we are reading the stored planned stmt from
> disk? But is it possible in any way? The last two commits in this
> function (cfffe83b and 45639a05) also didn't bump the catversion.
>
> --
> With Regards,
> Amit Kapila.
>


Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW

2021-03-07 Thread Zhihong Yu
Hi,

+* EXPLAIN ANALYZE CREATE TABLE AS or REFRESH MATERIALIZED VIEW
+* WITH NO DATA is weird.

Maybe it is clearer to spell out WITH NO DATA for both statements, instead
of sharing it.

-   if (!stmt->skipData)
+   if (!stmt->skipData && !explainInfo)
...
+   else if (explainInfo)

It would be cleaner to put the 'if (explainInfo)' as the first check. That
way, the check for skipData can be simplified.

Cheers



On Sun, Mar 7, 2021 at 1:34 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sun, Mar 7, 2021 at 12:13 PM Japin Li  wrote:
> >
> > On Sun, 07 Mar 2021 at 14:25, Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote:
> > > On Sun, Mar 7, 2021 at 11:49 AM Japin Li  wrote:
> > >>
> > >> On Fri, 05 Mar 2021 at 19:48, Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote:
> > >> > Attaching v5 patch set for further review.
> > >> >
> > >>
> > >> The v5 patch looks good to me, if there is no objection, I'll change
> the
> > >> cf status to "Ready for Committer" in few days.
> > >
> > > Thanks for the review.
> > >
> > > As I mentioned upthread, I have 2 open points:
> > > 1) In the patch I have added a new mat view info parameter to
> > > ExplainOneQuery(), do we also need to add it to
> > > ExplainOneQuery_hook_type? IMO, we should not (for now), because this
> > > would create a backward compatibility issue.
> >
> > Sorry, I do not know how PostgreSQL handle the backward compatibility
> issue.
> > Is there a guideline?
>
> I'm not aware of any guidelines as such, but we usually avoid any
> changes to existing API, adding/making changes to system catalogs and
> so on.
>
> > > 2) Do we document (under respective command pages or somewhere else)
> > > that we allow explain/explain analyze for a command?
> > >
> >
> > IMO, we can add a new page to list the commands that can be
> explain/explain analyze,
> > since it's clear for users.
>
> We are listing all the supported commands in explain.sgml, so added
> the CREATE MATERIALIZED VIEW(it's missing even though it's supported
> prior to this patch) and REFRESH MATERIALIZED VIEW there.
>
> Attaching v6 patch set. Please have a look.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: "has_column_privilege()" issue with attnums and non-existent columns

2021-03-07 Thread Zhihong Yu
Joe:
I don't seem to find the attachment.

Maybe attach again ?

Thanks

On Sun, Mar 7, 2021 at 11:12 AM Joe Conway  wrote:

> On 3/3/21 9:43 AM, Joe Conway wrote:
> > On 3/3/21 8:50 AM, David Steele wrote:
> >> On 1/29/21 4:56 AM, Joe Conway wrote:
> >>> On 1/29/21 12:13 AM, Ian Lawrence Barwick wrote:
>  On Thu, Jan 28, 2021 at 17:18, Peter Eisentraut wrote:
>   I'm not convinced the current behavior is wrong.  Is there some
>   practical use case that is affected by this behavior?
> 
>  I was poking around at the function with a view to using it for
> something and was
>  curious what it did with bad input.
> 
>  Providing the column name of a dropped column:
> 
>   Me: "Hey Postgres, do I have privileges on the dropped column
> 'bar' of my
>  table 'foo'?"
>   Pg: "That column doesn't even exist - here, have an error
> instead."
>   Me: "Hey Postgres, does some other less-privileged user have
> privileges on the
>    dropped column 'bar' of my table 'foo'?
>   Pg: "That column doesn't even exist - here, have an error
> instead."
> 
>  Providing the attnum of a dropped column:
> 
>   Me: "Hey Postgres, here's the attnum of the dropped column
> 'bar', does some
>    other less-privileged user have privileges on that column?"
>   Pg: "That column doesn't even exist - here, have a NULL".
>   Me: "Hey Postgres, here's the attnum of the dropped column 'bar'
> on this table
>    I own, do I have privileges on that column?"
>   Pg: "Yup. And feel free to throw any other smallint at me, I'll
> pretend that
>    represents a column too even if it never existed.".
> 
>  Looking at the code, particularly the cited comment, it does seems
> the intent was
>  to return NULL in all cases where an invalid attnum was provided, but
> that gets
>  short-circuited by the assumption table owner = has privilege on any
> column.
> >>>
> >>> Nicely illustrated :-)
> >>>
>  Not the most urgent or exciting of issues, but seems inconsistent to
> me.
> >>>
> >>> +1
> >>
> >> Peter, did Ian's explanation answer your concerns?
> >>
> >> Joe (or Peter), any interest in reviewing/committing this patch?
> >
> > Sure, I'll take a look
>
> Based on Tom's commit here:
>
>   https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3d0f68dd3
>
> it appears to me that the intent is to return NULL.
>
> However I was not to happy with the way the submitted patch added an
> argument to
> column_privilege_check(). It seems to me that it is better to just reorder
> the
> checks in column_privilege_check() a bit, as in the attached.
>
> I wasn't entirely sure it was ok to split up the check for dropped
> attribute and
> pg_attribute_aclcheck like I did, but when I tested the race condition (by
> pausing at pg_attribute_aclcheck and dropping the column in another
> session)
> everything seemed to work fine.
>
> Any comments?
>
> Thanks,
>
> Joe
>
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development
>
>
>


Re: A qsort template

2021-03-13 Thread Zhihong Yu
Hi,

For 0001-Add-bsearch-and-unique-templates-to-sort_template.h.patch :

+ * Remove duplicates from an array.  Return the new size.
+ */
+ST_SCOPE size_t
+ST_UNIQUE(ST_ELEMENT_TYPE *array,

The array is supposed to be sorted, right ?
The comment should mention this.

Cheers

On Sat, Mar 13, 2021 at 6:36 PM Thomas Munro  wrote:

> On Sat, Mar 13, 2021 at 3:49 PM Thomas Munro 
> wrote:
> > On Fri, Mar 12, 2021 at 7:58 AM Andres Freund 
> wrote:
> > > I wish we had the same for bsearch... :)
> >
> > Glibc already has the definition of the traditional void-based
> > function in /usr/include/bits/stdlib-bsearch.h, so the generated code
> > when the compiler can see the comparator definition is already good in
> > eg lazy_tid_reaped() and eg some nbtree search routines.  We could
> > probably expose more trivial comparators in headers to get more of
> > that, and we could perhaps put our own bsearch definition in a header
> > for other platforms that didn't think of that...
> >
> > It might be worth doing type-safe macro templates as well, though (as
> > I already did in an earlier proposal[1]), just to have nice type safe
> > code though, not sure, I'm thinking about that...
>
> I remembered a very good reason to do this: the ability to do
> branch-free comparators in more places by introducing optional wider
> results.  That's good for TIDs (needs 49 bits), and places that want
> to "reverse" a traditional comparator (just doing -result on an int
> comparator that might theoretically return INT_MIN requires at least
> 33 bits).  So I rebased the relevant parts of my earlier version, and
> went through and wrote a bunch of examples to demonstrate all this
> stuff actually working.
>
> There are two categories of change in these patches:
>
> 0002-0005: Places that sort/unique/search OIDs, BlockNumbers and TIDs,
> which can reuse a small set of typed functions (a few more could be
> added, if useful).  See sortitemptr.h and sortscalar.h.  Mostly this
> is just a notational improvement, and an excuse to drop a bunch of
> duplicated code.  In a few places this might really speed something
> important up!  Like VACUUM's lazy_tid_reaped().
>
> 0006-0009.  Places where a specialised function is generated for one
> special purpose, such as ANALYZE's HeapTuple sort, tidbitmap.c's
> pagetable sort,  some places in nbtree code etc.  These may require
> some case-by-case research on whether the extra executable size is
> worth the speedup, and there are surely more opportunities like that;
> I just picked on these arbitrarily.
>


Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.

2021-03-16 Thread Zhihong Yu
Hi,

+ * Simple signal handler for processes HAVE NOT yet touched or DO NOT

I think there should be a 'which' between processes and HAVE. It seems the
words in capital letters should be in lower case.

+ * Simple signal handler for processes have touched shared memory to
+ * exit quickly.

Add 'which' between processes and have.

unlink(fname);
+
+   elog(DEBUG2, "removing stats file \"%s\"", fname);

It seems the return value from unlink should be checked and reflected in
the debug message.

Thanks

On Tue, Mar 16, 2021 at 4:09 PM Masahiro Ikeda 
wrote:

> On 2021-03-16 13:44, kuroda.hay...@fujitsu.com wrote:
> > Dear Ikeda-san
> >
> > I think the idea is good.
> >
> > I read the patch and other sources, and I found
> > process_startup_packet_die also execute _exit(1).
> > I think they can be combined into one function and moved to
> > interrupt.c, but
> > some important comments might be removed. How do you think?
>
> Hi, Kuroda-san.
> Thanks for your comments.
>
> I agreed that your idea.
> I combined them into one function and moved the comments to
> the calling function side.
> (v2-0001-pgstat_avoid_writing_on_sigquit.patch)
>
> Regards,
> --
> Masahiro Ikeda
> NTT DATA CORPORATION


Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-14 Thread Zhihong Yu
Hi,
For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :

+   boolhave_notwophase = false;

Maybe name the variable have_no_twophase so that it is easier to read.

+* Two-phase commit is not required if the number of servers performed

performed -> performing

+errmsg("cannot process a distributed transaction that has
operated on a foreign server that does not support two-phase commit
protocol"),
+errdetail("foreign_twophase_commit is \'required\' but the
transaction has some foreign servers which are not capable of two-phase
commit")));

The lines are really long. Please wrap into more lines.



On Wed, Jan 13, 2021 at 9:50 PM Masahiko Sawada 
wrote:

> On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu  wrote:
> >
> > Hi,
>
> Thank you for reviewing the patch!
>
> > For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :
> >
> > However these functions are not neither committed nor aborted at
> >
> > I think the double negation was not intentional. Should be 'are neither
> ...'
>
> Fixed.
>
> >
> > For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
> return statement ?
>
> Hmm, you mean that we need MAXALIGN(size) after adding the size of
> FdwXactData structs?
>
> Size
> FdwXactShmemSize(void)
> {
> Sizesize;
>
> /* Size for foreign transaction information array */
> size = offsetof(FdwXactCtlData, fdwxacts);
> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>sizeof(FdwXact)));
> size = MAXALIGN(size);
> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>sizeof(FdwXactData)));
>
> return size;
> }
>
> I don't think we need to do that. Looking at other similar code such
> as TwoPhaseShmemSize() doesn't do that. Why do you think we need that?
>
> >
> > +   fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
> >
> > For the function name, Fdw and Xact appear twice, each. Maybe one of
> them can be dropped ?
>
> Agreed. Changed to FdwXactInsertEntry().
>
> >
> > +* we don't need to anything for this participant because all
> foreign
> >
> > 'need to' -> 'need to do'
>
> Fixed.
>
> >
> > +   else if (TransactionIdDidAbort(xid))
> > +   return FDWXACT_STATUS_ABORTING;
> > +
> > the 'else' can be omitted since the preceding if would return.
>
> Fixed.
>
> >
> > +   if (max_prepared_foreign_xacts <= 0)
> >
> > I wonder when the value for max_prepared_foreign_xacts would be negative
> (and whether that should be considered an error).
> >
>
> Fixed to (max_prepared_foreign_xacts == 0)
>
> Attached the updated version patch set.
>
> Regards,
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>


Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-14 Thread Zhihong Yu
For v32-0002-postgres_fdw-supports-commit-and-rollback-APIs.patch :

+   entry->changing_xact_state = true;
...
+   entry->changing_xact_state = abort_cleanup_failure;

I don't see a return statement between the two assignments. I wonder why
entry->changing_xact_state is set to true and then assigned again later.

For v32-0007-Introduce-foreign-transaction-launcher-and-resol.patch :

bq. This commits introduces to new background processes: foreign

commits introduces to new -> commit introduces two new

+FdwXactExistsXid(TransactionId xid)

Since Xid is the parameter to this method, I think the Xid suffix can be
dropped from the method name.

+ * Portions Copyright (c) 2020, PostgreSQL Global Development Group

Please correct year in the next patch set.

+FdwXactLauncherRequestToLaunch(void)

Since the launcher's job is to 'launch', I think the Launcher can be
omitted from the method name.

+/* Report shared memory space needed by FdwXactRsoverShmemInit */
+Size
+FdwXactRslvShmemSize(void)

Are both Rsover and Rslv referring to resolver? It would be better to use
the whole word, which reduces confusion.
Plus, FdwXactRsoverShmemInit should be FdwXactRslvShmemInit (or
FdwXactResolveShmemInit)

+fdwxact_launch_resolver(Oid dbid)

The above method is not in camel case. It would be better if method names
are consistent (in casing).

+errmsg("out of foreign transaction resolver slots"),
+errhint("You might need to increase
max_foreign_transaction_resolvers.")));

It would be nice to include the value of max_foreign_xact_resolvers

For fdwxact_resolver_onexit():

+   LWLockAcquire(FdwXactLock, LW_EXCLUSIVE);
+   fdwxact->locking_backend = InvalidBackendId;
+   LWLockRelease(FdwXactLock);

There is no call inside the for loop that may take a long time. I wonder
if the lock can be acquired before the for loop and released after it.

+FXRslvLoop(void)

Please use Resolver instead of Rslv

+   FdwXactResolveFdwXacts(held_fdwxacts, nheld);

Fdw and Xact are repeated twice each in the method name. Probably the
method name can be made shorter.

Cheers

On Thu, Jan 14, 2021 at 11:04 AM Zhihong Yu  wrote:

> Hi,
> For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :
>
> +   boolhave_notwophase = false;
>
> Maybe name the variable have_no_twophase so that it is easier to read.
>
> +* Two-phase commit is not required if the number of servers performed
>
> performed -> performing
>
> +errmsg("cannot process a distributed transaction that has
> operated on a foreign server that does not support two-phase commit
> protocol"),
> +errdetail("foreign_twophase_commit is \'required\' but
> the transaction has some foreign servers which are not capable of two-phase
> commit")));
>
> The lines are really long. Please wrap into more lines.
>
>
>
> On Wed, Jan 13, 2021 at 9:50 PM Masahiko Sawada 
> wrote:
>
>> On Thu, Jan 7, 2021 at 11:44 AM Zhihong Yu  wrote:
>> >
>> > Hi,
>>
>> Thank you for reviewing the patch!
>>
>> > For pg-foreign/v31-0004-Add-PrepareForeignTransaction-API.patch :
>> >
>> > However these functions are not neither committed nor aborted at
>> >
>> > I think the double negation was not intentional. Should be 'are neither
>> ...'
>>
>> Fixed.
>>
>> >
>> > For FdwXactShmemSize(), is another MAXALIGN(size) needed prior to the
>> return statement ?
>>
>> Hmm, you mean that we need MAXALIGN(size) after adding the size of
>> FdwXactData structs?
>>
>> Size
>> FdwXactShmemSize(void)
>> {
>> Sizesize;
>>
>> /* Size for foreign transaction information array */
>> size = offsetof(FdwXactCtlData, fdwxacts);
>> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>>sizeof(FdwXact)));
>> size = MAXALIGN(size);
>> size = add_size(size, mul_size(max_prepared_foreign_xacts,
>>sizeof(FdwXactData)));
>>
>> return size;
>> }
>>
>> I don't think we need to do that. Looking at other similar code such
>> as TwoPhaseShmemSize() doesn't do that. Why do you think we need that?
>>
>> >
>> > +   fdwxact = FdwXactInsertFdwXactEntry(xid, fdw_part);
>> >
>> > For the function name, Fdw and Xact appear twice, each. Maybe one of
>> them can be dropped ?
>>
>> Agreed. Changed to FdwXactInsertEntry().
>>
>> >
>> > +* we don't need to anything for this participant beca

Re: Transactions involving multiple postgres foreign servers, take 2

2021-01-15 Thread Zhihong Yu
Hi,
For v32-0004-Add-PrepareForeignTransaction-API.patch :

+ * Whenever a foreign transaction is processed, the corresponding FdwXact
+ * entry is update. To avoid holding the lock during transaction
processing
+ * which may take an unpredicatable time the in-memory data of foreign

entry is update -> entry is updated

unpredicatable -> unpredictable

+   int nlefts = 0;

nlefts -> nremaining

+   elog(DEBUG1, "left %u foreign transactions", nlefts);

The message can be phrased as "%u foreign transactions remaining"

+FdwXactResolveFdwXacts(int *fdwxact_idxs, int nfdwxacts)

Fdw and Xact are repeated. Seems one should suffice. How about naming the
method FdwXactResolveTransactions() ?
Similar comment for FdwXactResolveOneFdwXact(FdwXact fdwxact)

For get_fdwxact():

+   /* This entry matches the condition */
+   found = true;
+   break;

Instead of breaking and returning, you can return within the loop directly.

Cheers

On Thu, Jan 14, 2021 at 9:17 PM Masahiko Sawada 
wrote:

> On Fri, Jan 15, 2021 at 4:03 AM Zhihong Yu  wrote:
> >
> > Hi,
> > For v32-0008-Prepare-foreign-transactions-at-commit-time.patch :
>
> Thank you for reviewing the patch!
>
> >
> > +   boolhave_notwophase = false;
> >
> > Maybe name the variable have_no_twophase so that it is easier to read.
>
> Fixed.
>
> >
> > +* Two-phase commit is not required if the number of servers
> performed
> >
> > performed -> performing
>
> Fixed.
>
> >
> > +errmsg("cannot process a distributed transaction that
> has operated on a foreign server that does not support two-phase commit
> protocol"),
> > +errdetail("foreign_twophase_commit is \'required\' but
> the transaction has some foreign servers which are not capable of two-phase
> commit")));
> >
> > The lines are really long. Please wrap into more lines.
>
> Hmm, we can do that but if we do that, it makes grepping by the error
> message hard. Please refer to the documentation about the formatting
> guideline[1]:
>
> Limit line lengths so that the code is readable in an 80-column
> window. (This doesn't mean that you must never go past 80 columns. For
> instance, breaking a long error message string in arbitrary places
> just to keep the code within 80 columns is probably not a net gain in
> readability.)
>
> These changes have been made in the local branch. I'll post the
> updated patch set after incorporating all the comments.
>
> Regards,
>
> [1] https://www.postgresql.org/docs/devel/source-format.html
>
> --
> Masahiko Sawada
> EnterpriseDB:  https://www.enterprisedb.com/
>

