Re: Is it worth pushing conditions to sublink/subplan?

2022-01-05 Thread wenjing
I corrected it according to your suggestion.

thanks

Wenjing.


On Sat, Dec 25, 2021 at 02:26, Zhihong Yu wrote:

>
>
> On Thu, Dec 23, 2021 at 3:52 AM 曾文旌(义从) 
> wrote:
>
>>
>> Fixed a bug found during testing.
>>
>>
>> Wenjing
>>
>>
>>>> Hi,
> +   if (condition_is_safe_pushdown_to_sublink(rinfo,
> expr_info->outer))
> +   {
> +   /* replace qual expr from outer var = const to var = const
> and push down to sublink query */
> +   sublink_query_push_qual(subquery, (Node
> *)copyObject(rinfo->clause), expr_info->outer, expr_info->inner);
>
> Since sublink_query_push_qual() is always guarded
> by condition_is_safe_pushdown_to_sublink(), it seems
> sublink_query_push_qual() can be folded into
> condition_is_safe_pushdown_to_sublink().
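A sketch of how that fold might look (the combined signature and the helper
name are my guesses, not the patch's):

static bool
condition_is_safe_pushdown_to_sublink(RestrictInfo *rinfo, Query *subquery,
                                      Var *outer, Var *inner)
{
    /* the existing safety checks, unchanged */
    if (!pushdown_safety_checks(rinfo, outer))
        return false;

    /* replace "outer var = const" with "var = const" and push it down */
    sublink_query_push_qual(subquery, (Node *) copyObject(rinfo->clause),
                            outer, inner);
    return true;
}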
>
> For generate_base_implied_equalities():
>
> +   if (ec->ec_processed)
> +   {
> +   ec_index++;
> +   continue;
> +   }
> +   else if (list_length(ec->ec_members) > 1)
>
> Minor comment: the keyword `else` can be omitted (due to `continue` above).
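For illustration, the restructured loop body would read:

if (ec->ec_processed)
{
    ec_index++;
    continue;
}
if (list_length(ec->ec_members) > 1)
{
    /* handle the multi-member class, as in the patch */
}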
>
> +* Since there may be an unexpanded sublink in the targetList,
> +* we'll skip it for now.
>
> Since there may be an -> If there is an
>
> +   {"lazy_process_sublink", PGC_USERSET, QUERY_TUNING_METHOD,
> +   gettext_noop("enable lazy process sublink."),
>
> Looking at existing examples from src/backend/utils/misc/guc.c,
> enable_lazy_sublink_processing seems to be consistent with existing guc
> variable naming.
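For reference, a bool entry following that naming would look roughly like
this in ConfigureNamesBool (the description text is mine, not the patch's):

{
    {"enable_lazy_sublink_processing", PGC_USERSET, QUERY_TUNING_METHOD,
        gettext_noop("Enables lazy processing of sublinks."),
        NULL
    },
    &enable_lazy_sublink_processing,
    false,
    NULL, NULL, NULL
},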
>
> +lazy_process_sublinks(PlannerInfo *root, bool single_result_rte)
>
> lazy_process_sublinks -> lazily_process_sublinks
>
> +   else
> +   {
> /* There shouldn't be any OJ info to translate, as yet */
> Assert(subroot->join_info_list == NIL);
>
> Indentation for the else block is off.
>
> +   if (istop)
> +   f->quals = preprocess_expression_ext(root, f->quals,
> EXPRKIND_QUAL, false);
> +   else
> +   f->quals = preprocess_expression_ext(root, f->quals,
> EXPRKIND_QUAL, true);
>
> The above can be written as:
>
> +   f->quals = preprocess_expression_ext(root, f->quals,
> EXPRKIND_QUAL, !istop);
>
> For find_equal_conditions_contain_uplevelvar_in_sublink_query():
> +   context.has_unexpected_expr == false &&
> `!context.has_unexpected_expr` should suffice
>
> equal_expr_safety_check -> is_equal_expr_safe
>
> Cheers
>
>


0001-poc-pushdown-qual-to-sublink-v5.patch
Description: Binary data


Re: [Proposal] Global temporary tables

2021-09-22 Thread wenjing
On Jul 14, 2021, at 10:56, Ming Li wrote:

Hi Wenjing,

Some suggestions may help:

1) It seems that no test case covers the below scenario: 2 sessions attach
the same gtt, and insert/update/select concurrently. It is better to use
the test framework in src/test/isolation like the code changes in
https://commitfest.postgresql.org/24/2233/.


I rewrote the cases under regress to make them easier to read, and I used the
isolation module to add some concurrent cases and fix some bugs.

Please check the code (v52) and give me feedback.


Wenjing


2) CREATE GLOBAL TEMP SEQUENCE also need to be supported
in src/bin/psql/tab-complete.c


On Wed, Jul 14, 2021 at 10:36 AM wenjing  wrote:

> Rebase code based on the latest version.
>
> Regards,
> wenjing
>
>


Re: [Proposal] Global temporary tables

2021-09-23 Thread wenjing
On Sun, Mar 28, 2021 at 9:07 PM, Andrew Dunstan wrote:

>
> On 3/17/21 7:59 AM, wenjing wrote:
> > ok
> >
> > The cause of the problem is that the name of the dependent function
> > (readNextTransactionID) has changed. I fixed it.
> >
> > This patch (V43) is based on 9fd2952cf4920d563e9cea51634c5b364d57f71a
> >
> > Wenjing
> >
> >
>
> I have fixed this patch so that
>
> a) it applies cleanly
>
> b) it uses project best practice for catalog Oid assignment.
>
> However, as noted elsewhere it fails the recovery TAP test.
>
> I also note this:
>
>
> diff --git a/src/test/regress/parallel_schedule
> b/src/test/regress/parallel_schedule
> index 312c11a4bd..d44fa62f4e 100644
> --- a/src/test/regress/parallel_schedule
> +++ b/src/test/regress/parallel_schedule
> @@ -129,3 +129,10 @@ test: fast_default
>
>  # run stats by itself because its delay may be insufficient under heavy
> load
>  test: stats
> +
> +# global temp table test
> +test: gtt_stats
> +test: gtt_function
> +test: gtt_prepare
> +test: gtt_parallel_1 gtt_parallel_2
> +test: gtt_clean
>
>
> Tests that need to run in parallel should use either the isolation
> tester framework (which is explicitly for testing things concurrently)
> or the TAP test framework.
>
> Adding six test files to the regression test suite for this one feature
> is not a good idea. You should have one regression test script ideally,
> and it should be added as appropriate to both the parallel and serial
> schedules (and not at the end). Any further tests should be added using
> the other frameworks mentioned.
>
Thank you for your advice.
I have simplified the cases in regress and moved the further tests into the
isolation tester framework based on your suggestion.
In the process I found a few bugs and fixed them.

Please review the GTT v52 and give me feedback.
https://commitfest.postgresql.org/31/2349/


Wenjing



>
>
> cheers
>
>
> andrew
>
>
> --
>
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Is it worth pushing conditions to sublink/subplan?

2021-08-16 Thread Wenjing
Hi Hackers,

Recently, an issue has been bothering me. It concerns condition push-down in
SQL.
I use cases from the regression tests as examples.
I found that a condition such as (b = 1) can be pushed down into a subquery,
but it cannot be pushed down into a sublink/subplan.
If a sublink/subplan references a partitioned table, pushing the conditions
down would make them available for partition pruning.
So, is it worth pushing conditions down to sublinks/subplans?
Does anybody have any ideas?


regards,
Wenjing


example:
create table p (a int, b int, c int) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create table q (a int, b int, c int) partition by list (a);
create table q1 partition of q for values in (1) partition by list (b);
create table q11 partition of q1 for values in (1) partition by list (c);
create table q111 partition of q11 for values in (1);
create table q2 partition of q for values in (2) partition by list (b);
create table q21 partition of q2 for values in (1);
create table q22 partition of q2 for values in (2);
insert into q22 values (2, 2, 3);


postgres=# explain (costs off)
postgres-# select temp.b  from 
postgres-# (
postgres(# select a,b from ab x where x.a = 1
postgres(# union all 
postgres(# (values(1,1)) 
postgres(# ) temp,
postgres-# ab y
postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
QUERY PLAN 
---
 Nested Loop
   ->  Seq Scan on ab_a1_b1 y
 Filter: ((b = 1) AND (a = 1))
   ->  Append
 ->  Subquery Scan on "*SELECT* 1"
   ->  Seq Scan on ab_a1_b1 x
 Filter: ((a = 1) AND (b = 1))
 ->  Result
(8 rows)

The condition (b = 1) can be pushed down into the subquery.

postgres=# explain (costs off)
postgres-# select
postgres-# y.a,
postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
postgres-# from ab y where a = 1 and b = 1;
QUERY PLAN 
---
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1))
   SubPlan 1
 ->  Append
   ->  Seq Scan on ab_a1_b1 x_1
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a1_b2 x_2
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a1_b3 x_3
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a2_b1 x_4
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a2_b2 x_5
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a2_b3 x_6
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a3_b1 x_7
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a3_b2 x_8
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a3_b3 x_9
 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (b = 1 and a = 1) cannot be pushed down to the sublink/subplan
in the target list.

postgres=# explain (costs off)
postgres-# select y.a
postgres-# from ab y 
postgres-# where
postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
postgres-# y.a = 1 and y.b = 1;
QUERY PLAN 
---
 Seq Scan on ab_a1_b1 y
   Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
   SubPlan 1
 ->  Append
   ->  Seq Scan on ab_a1_b1 x_1
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a1_b2 x_2
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a1_b3 x_3
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a2_b1 x_4
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a2_b2 x_5
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a2_b3 x_6
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a3_b1 x_7
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a3_b2 x_8
 Filter: ((y.a = a) AND (y.b = b))
   ->  Seq Scan on ab_a3_b3 x_9
 Filter: ((y.a = a) AND (y.b = b))
(22 rows)

The conditions (b = 1 and a = 1) cannot be pushed down to the sublink/subplan
in the WHERE clause.







Re: Is it worth pushing conditions to sublink/subplan?

2021-08-16 Thread Wenjing


> On Aug 16, 2021, at 17:15, Wenjing wrote:
> 
> Hi Hackers,
> 
> Recently, an issue has been bothering me. It concerns condition push-down in
> SQL.
> I use cases from the regression tests as examples.
> I found that a condition such as (b = 1) can be pushed down into a subquery,
> but it cannot be pushed down into a sublink/subplan.
> If a sublink/subplan references a partitioned table, pushing the conditions
> down would make them available for partition pruning.
> So, is it worth pushing conditions down to sublinks/subplans?
> Does anybody have any ideas?
> 
> 
> regards,
> Wenjing
> 
> 
> example:
> create table p (a int, b int, c int) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create table q (a int, b int, c int) partition by list (a);
> create table q1 partition of q for values in (1) partition by list (b);
> create table q11 partition of q1 for values in (1) partition by list (c);
> create table q111 partition of q11 for values in (1);
> create table q2 partition of q for values in (2) partition by list (b);
> create table q21 partition of q2 for values in (1);
> create table q22 partition of q2 for values in (2);
> insert into q22 values (2, 2, 3);
Sorry, I messed up the structure of the tables.
It should be:
create table ab (a int not null, b int not null) partition by list (a);
create table ab_a2 partition of ab for values in(2) partition by list (b);
create table ab_a2_b1 partition of ab_a2 for values in (1);
create table ab_a2_b2 partition of ab_a2 for values in (2);
create table ab_a2_b3 partition of ab_a2 for values in (3);
create table ab_a1 partition of ab for values in(1) partition by list (b);
create table ab_a1_b1 partition of ab_a1 for values in (1);
create table ab_a1_b2 partition of ab_a1 for values in (2);
create table ab_a1_b3 partition of ab_a1 for values in (3);
create table ab_a3 partition of ab for values in(3) partition by list (b);
create table ab_a3_b1 partition of ab_a3 for values in (1);
create table ab_a3_b2 partition of ab_a3 for values in (2);
create table ab_a3_b3 partition of ab_a3 for values in (3);


> 
> 
> postgres=# explain (costs off)
> postgres-# select temp.b  from 
> postgres-# (
> postgres(# select a,b from ab x where x.a = 1
> postgres(# union all 
> postgres(# (values(1,1)) 
> postgres(# ) temp,
> postgres-# ab y
> postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
> QUERY PLAN 
> ---
>  Nested Loop
>->  Seq Scan on ab_a1_b1 y
>  Filter: ((b = 1) AND (a = 1))
>->  Append
>  ->  Subquery Scan on "*SELECT* 1"
>->  Seq Scan on ab_a1_b1 x
>  Filter: ((a = 1) AND (b = 1))
>  ->  Result
> (8 rows)
> 
> The condition (b = 1) can be pushed down into the subquery.
> 
> postgres=# explain (costs off)
> postgres-# select
> postgres-# y.a,
> postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
> postgres-# from ab y where a = 1 and b = 1;
> QUERY PLAN 
> ---
>  Seq Scan on ab_a1_b1 y
>Filter: ((a = 1) AND (b = 1))
>SubPlan 1
>  ->  Append
>->  Seq Scan on ab_a1_b1 x_1
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a1_b2 x_2
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a1_b3 x_3
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b1 x_4
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b2 x_5
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b3 x_6
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b1 x_7
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b2 x_8
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b3 x_9
>  Filter: ((y.a = a) AND (y.b = b))
> (22 rows)
> 
> The conditions (b = 1 and a = 1) cannot be pushed down to the sublink/subplan
> in the target list.
> 
> postgres=# explain (costs off)
> postgres-# select y.a
> postgres-# from ab y 
> postgres-# where
> postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
> postgres-# y.a = 1 and y.b = 1;
> QUERY PLAN 
> ---
>  Seq Scan on ab_a1_b1 y
>Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
>SubPlan 1
> 

Re: [Proposal] Global temporary tables

2022-02-28 Thread Wenjing Zeng


> On Feb 25, 2022, at 15:45, Andres Freund wrote:
> 
> Hi,
> 
> 
> This is a huge thread. Realistically reviewers and committers can't reread
> it. I think there needs to be more of a description of how this works included
> in the patchset and *why* it works that way. The readme does a bit of that,
> but not particularly well.
Thank you for reviewing the design and code.
I'm always trying to improve it. If anything is confusing or needs
clarification, please point it out.


> 
> 
> On 2022-02-25 14:26:47 +0800, Wenjing Zeng wrote:
>> +++ b/README.gtt.txt
>> @@ -0,0 +1,172 @@
>> +Global Temporary Table(GTT)
>> +=
>> +
>> +Feature description
>> +-
>> +
>> +Previously, temporary tables are defined once and automatically
>> +exist (starting with empty contents) in every session before using them.
> 
> I think for a README "previously" etc isn't good language - if it were
> commited, it'd not be understandable anymore. It makes more sense for commit
> messages etc.
Thanks for pointing it out. I will adjust the description.

> 
> 
>> +Main design ideas
>> +-
>> +In general, GTT and LTT use the same storage and buffer design and
>> +implementation. The storage files for both types of temporary tables are 
>> named
>> +as t_backendid_relfilenode, and the local buffer is used to cache the data.
> 
> What does "named ast_backendid_relfilenode" mean?
That describes the storage file naming format for temporary tables.
The name starts with 't', followed by the backend ID and the relfilenode,
joined by underscores.
The file naming rules are the same as for LTTs, and the data in the files is
no different from that of regular tables or LTTs.
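For reference, this matches how GetRelationPath() in src/common/relpath.c
already builds the path for a local temporary relation's main fork
(simplified; the variable names follow that function):

/* temp relation: "t<backendid>_<relfilenode>" under the database dir */
path = psprintf("base/%u/t%d_%u", dbNode, backendId, relNode);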

> 
> 
>> +The schema of GTTs is shared among sessions while their data are not. We 
>> build
>> +a new mechanisms to manage those non-shared data and their statistics.
>> +Here is the summary of changes:
>> +
>> +1) CATALOG
>> +GTTs store session-specific data. The storage information of GTTs'data, 
>> their
>> +transaction information, and their statistics are not stored in the catalog.
>> +
>> +2) STORAGE INFO & STATISTICS INFO & TRANSACTION INFO
>> +In order to maintain durability and availability of GTTs'session-specific 
>> data,
>> +their storage information, statistics, and transaction information is 
>> managed
>> +in a local hash table tt_storage_local_hash.
> 
> "maintain durability"? Durable across what? In the context of databases it's
> typically about crash safety, but that can't be the case here.
It means that a GTT's transaction information (relfrozenxid/relminmxid),
storage information (relfilenode), and statistics (relpages) are maintained
in the hash table rather than in pg_class.
This allows a GTT to keep its own local data in each session and avoids
frequent catalog changes.
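A minimal sketch of how such a session-local hash table could be set up
(only the table name comes from the README; the entry layout and the init
function are my assumptions):

/* hypothetical entry layout for tt_storage_local_hash */
typedef struct gtt_local_entry
{
    Oid           relid;          /* hash key: the GTT's OID */
    Oid           relfilenode;    /* this session's storage */
    TransactionId relfrozenxid;   /* session-local freeze horizon */
    BlockNumber   relpages;       /* session-local stats */
} gtt_local_entry;

static HTAB *tt_storage_local_hash = NULL;

static void
gtt_storage_hash_init(void)
{
    HASHCTL ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(gtt_local_entry);
    tt_storage_local_hash = hash_create("gtt storage local hash",
                                        128, &ctl,
                                        HASH_ELEM | HASH_BLOBS);
}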

> 
> 
>> +3) DDL
>> +Currently, GTT supports almost all table'DDL except CLUSTER/VACUUM FULL.
>> +Part of the DDL behavior is limited by shared definitions and multiple 
>> copies of
>> +local data, and we added some structures to handle this.
> 
>> +A shared hash table active_gtt_shared_hash is added to track the state of 
>> the
>> +GTT in a different session. This information is recorded in the hash table
>> +during the DDL execution of the GTT.
> 
>> +The data stored in a GTT can only be modified or accessed by owning session.
>> +The statements that only modify data in a GTT do not need a high level of
>> +table locking. The operations making those changes include truncate GTT,
>> +reindex GTT, and lock GTT.
> 
> I think you need to introduce a bit more terminology for any of this to make
> sense. Sometimes GTT means the global catalog entity, sometimes, like here, it
> appears to mean the session specific contents of a GTT.
> 
> What state of a GTT in another session?
> 
> 
> How do GTTs handle something like BEGIN; TRUNCATE some_gtt_table; ROLLBACK;?

A GTT behaves exactly like a regular table here.
Specifically, the current session's latest relfilenode is stored in the hash
table, and TRUNCATE may change it.
If the transaction rolls back, the old relfilenode becomes visible again, just
as it would in pg_class.
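A sketch of that rollback behavior, using the hypothetical entry layout
above (all helper names are invented for illustration):

/* TRUNCATE: install fresh storage, remember the old one for abort */
static void
gtt_truncate_local(gtt_local_entry *entry)
{
    Oid old_rfn = entry->relfilenode;

    entry->relfilenode = gtt_create_new_storage(entry->relid);
    gtt_remember_for_abort(entry->relid, old_rfn);
}

/* on transaction abort: drop the new file, re-enable the old one */
static void
gtt_abort_callback(Oid relid, Oid old_rfn)
{
    gtt_local_entry *entry = gtt_lookup_entry(relid);

    gtt_drop_storage(entry->relfilenode);
    entry->relfilenode = old_rfn;
}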

> 
> 
>> +1.2 on commit clause
>> +LTT's status associated with on commit DELETE ROWS and on commit PRESERVE 
>> ROWS
>> +is not stored in catalog. Instead, GTTs need a bool value 
>> on_commit_delete_rows
>> +in reloptions which is shared among sessions.
> 
> Why?
The

Re: [Proposal] Global temporary tables

2022-02-28 Thread Wenjing Zeng


> On Feb 27, 2022, at 08:21, Justin Pryzby wrote:
> 
> I read through this.
> Find attached some language fixes.  You should be able to apply each "fix"
> patch on top of your own local branch with git am, and then squish them
> together.  Let me know if you have trouble with that.
> 
> I think get_seqence_start_value() should be static.  (Or otherwise, it should
> be in lsyscache.c).
> 
> The include added to execPartition.c seems to be unused.
> 
> +#define RELATION_IS_TEMP_ON_CURRENT_SESSION(relation) \
> +#define RELATION_IS_TEMP(relation) \
> +#define RelpersistenceTsTemp(relpersistence) \
> +#define RELATION_GTT_ON_COMMIT_DELETE(relation)\
> 
> => These macros can evaluate their arguments multiple times.
> You should add a comment to warn about that.  And maybe avoid passing them a
> function argument, like: RelpersistenceTsTemp(get_rel_persistence(rte->relid))
> 
> +list_all_backend_gtt_frozenxids should return TransactionId not int.
> The function name should say "oldest" and not "all" ?
> 
> I think the GUC should have a longer name.  max_active_gtt is too short for a
> global var.
> 
> +#define MIN_NUM_ACTIVE_GTT 0
> +#define DEFAULT_NUM_ACTIVE_GTT 1000
> +#define MAX_NUM_ACTIVE_GTT 100
> 
> +int max_active_gtt = MIN_NUM_ACTIVE_GTT
> 
> It's being initialized to MIN, but then the GUC machinery sets it to DEFAULT.
> By convention, it should be initialized to default.
> 
> fout->remoteVersion >= 14
> 
> => should say 15
> 
> describe.c has gettext_noop("session"), which is a half-truth.  The data is
> per-session but the table definition is persistent.
Thanks for your advice; I will try to merge this part of the code.
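For illustration, the multiple-evaluation hazard looks like this (the macro
body is a guess; RELPERSISTENCE_GLOBAL_TEMP would come from the patch):

#define RelpersistenceTsTemp(relpersistence) \
    ((relpersistence) == RELPERSISTENCE_TEMP || \
     (relpersistence) == RELPERSISTENCE_GLOBAL_TEMP)

/*
 * The argument is substituted twice, so this call runs the syscache
 * lookup in get_rel_persistence() twice:
 */
if (RelpersistenceTsTemp(get_rel_persistence(rte->relid)))
    mark_as_temp(rte);          /* hypothetical caller */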

> 
> You redirect stats from pg_class and pg_statistics to a local hash table.
> This is pretty hairy :(
> I guess you'd also need to handle pg_statistic_ext and ext_data.
> pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to 
> look
> at pg_get_gtt_statistics.
> 
> I wonder if there's a better way to do it, like updating pg_statistic but
> forcing the changes to be rolled back when the session ends...  But I think
> that would make longrunning sessions behave badly, the same as "longrunning
> transactions".

There are three kinds of session-level GTT data that need to be managed:
1 session-level storage info, like relfilenode
2 session-level transaction info, like relfrozenxid
3 session-level stats, like relpages or column stats

I think 1 and 2 are necessary, but not 3.
In a previous email, it was suggested that GTT statistics not be processed,
meaning that GTT statistics would be recorded neither in the local hash table
nor in the catalog.
In my observation, very few users run a manual ANALYZE on temporary tables to
get an accurate query plan.
Doing this will also avoid catalog bloat and performance problems.


> 
> Have you looked at Gilles Darold's GTT extension ?
If you are referring to https://github.com/darold/pgtt, yes.
It is smart to use an unlogged table as a template and then use an LTT to read
and write the data.
For that implementation, I want to point out two things:
1 For the first insert into a GTT in each session, a CREATE TABLE or CREATE
INDEX is implicitly executed.
2 The catalog bloat caused by LTTs still exists.


Regards, Wenjing.


> <0002-f-0002-gtt-v64-doc.txt><0004-f-0003-gtt-v64-implementation.txt><0006-f-0004-gtt-v64-regress.txt>



Re: [Proposal] Global temporary tables

2022-03-01 Thread Wenjing Zeng



> On Feb 27, 2022, at 12:13, Andres Freund wrote:
> 
> Hi,
> 
> On 2022-02-27 04:17:52 +0100, Pavel Stehule wrote:
>>> You redirect stats from pg_class and pg_statistics to a local hash table.
>>> This is pretty hairy :(
> 
> As is I think the patch is architecturally completely unacceptable. Having
> code everywhere to redirect to manually written in-memory catalog table code
> isn't maintainable.
> 
> 
>>> I guess you'd also need to handle pg_statistic_ext and ext_data.
>>> pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
>>> look
>>> at pg_get_gtt_statistics.
>> 
>> Without this, the GTT will be terribly slow like current temporary tables
>> with a lot of problems with bloating of pg_class, pg_attribute and
>> pg_depend tables.
> 
> I think it's not a great idea to solve multiple complicated problems at
> once...

I'm trying to break down the entire implementation into multiple sub-patches.


Regards, Wenjing.


> 
> Greetings,
> 
> Andres Freund
> 
> 





why does reindex invalidate relcache without modifying system tables

2021-12-27 Thread wenjing zeng
Hi Tom

I would like to ask you about a detail of index builds.
I found that in the index_update_stats function, i.e. in the CREATE
INDEX/REINDEX/TRUNCATE INDEX path,
the relcache is invalidated whether or not the index information is updated.
I want to know why it was done this way.
The code is:
if (dirty)
{
    heap_inplace_update(pg_class, tuple);
    /* the above sends a cache inval message */
}
else
{
    /* no need to change tuple, but force relcache inval anyway */
    CacheInvalidateRelcacheByTuple(tuple);
}

There's a deliberate comment here, so I assume you wrote that part for a
reason.

The reason I ask is that
1 similar places, such as vac_update_relstats()/vac_update_datfrozenxid(),
don't do this;
2 a local temp table with ON COMMIT DELETE ROWS rebuilds its indexes at every
transaction commit,
which causes the temp table's relcache entry to be rebuilt over and over again.

Looking forward to your reply.

Thanks


Wenjing






Re: why does reindex invalidate relcache without modifying system tables

2022-01-03 Thread wenjing zeng



> On Dec 27, 2021, at 23:54, Tom Lane wrote:
> 
> wenjing zeng  writes:
>> I found that in the index_update_stats function, i.e. in the CREATE
>> INDEX/REINDEX/TRUNCATE INDEX path,
>> the relcache is invalidated whether or not the index information is updated.
>> I want to know why it was done this way.
> 
> Did you read the function's header comment?  It says
> 
> * NOTE: an important side-effect of this operation is that an SI invalidation
> * message is sent out to all backends --- including me --- causing relcache
> * entries to be flushed or updated with the new data.  This must happen even
> * if we find that no change is needed in the pg_class row.  When updating
> * a heap entry, this ensures that other backends find out about the new
> * index.  When updating an index, it's important because some index AMs
> * expect a relcache flush to occur after REINDEX.
> 
> That is, what we need to force an update of is either the relcache's
> rd_indexlist list (for a table) or rd_amcache (for an index).
> 
> In the REINDEX case, we could conceivably skip the flush on the table,
> but not on the index.  I don't think it's worth worrying about though,
> because REINDEX will very probably have an update for the table's
> physical size data (relpages and/or reltuples), so that it's unlikely
> that the no-change path would be taken anyway.
> 
>   regards, tom lane
Thank you for your explanation, which clears up my doubts.

Wenjing



Re: [Proposal] Global temporary tables

2020-07-22 Thread wenjing zeng


> On Jul 14, 2020, at 10:28 PM, Pavel Stehule wrote:
> 
> 
> 
> On Fri, Jul 10, 2020 at 11:04, wenjing zeng wrote:
> Hi all
> 
> I started using my personal email to respond to community issue.
> 
> 
> 
>> On Jul 7, 2020, at 6:05 PM, Pavel Stehule wrote:
>> 
>> Hi
>>  
>> GTT Merge the latest PGMaster and resolves conflicts.
>> 
>> 
>> 
>> I tested it and it looks fine. I think it is very usable in current form, 
>> but still there are some issues:
>> 
>> postgres=# create global temp table foo(a int);
>> CREATE TABLE
>> postgres=# insert into foo values(10);
>> INSERT 0 1
>> postgres=# alter table foo add column x int;
>> ALTER TABLE
>> postgres=# analyze foo;
>> WARNING:  reloid 16400 not support update attstat after add colunm
>> WARNING:  reloid 16400 not support update attstat after add colunm
>> ANALYZE
> This is a limitation that we can completely eliminate.
> 
>> 
>> Please, can you summarize what is done, what limits are there, what can be 
>> implemented hard, what can be implemented easily?
> Sure.
> 
> The current version of the GTT implementation supports all regular table 
> operations.
> 1 what is done
> 1.1 insert/update/delete on GTT.
> 1.2 The GTT supports all types of indexes, and the query statement supports 
> the use of GTT indexes to speed up the reading of data in the GTT.
> 1.3 GTT statistics: each session keeps a local copy of the statistics, which
> is provided to the optimizer to choose the best query plan.
> 1.4 analyze vacuum GTT.
> 1.5 truncate cluster GTT.
> 1.6 all DDL on GTT.
> 1.7 GTT table can use  GTT sequence  or Regular sequence.
> 1.8 Support for creating views on GTT.
> 1.9 Support for creating views on foreign key.
> 1.10 support global temp partition.
> 
> I feel like I cover all the necessary GTT requirements.
> 
> As for CLUSTER on a GTT, I think it's complicated.
> I'm not sure the current implementation is quite reasonable. Maybe you can 
> help review it.
> 
> 
>> 
>> 
>> 
>> I found one open question - how can be implemented table locks - because 
>> data is physically separated, then we don't need table locks as protection 
>> against race conditions. 
> Yes, but GTT DML and DDL still require table locking.
> 1 The DML requires table locks (RowExclusiveLock) to ensure that 
> definitions do not change during run time (the DDL may modify or delete them).
> This part of the implementation does not actually change the code,
> because the DML on GTT does not block each other between sessions.
> 
> 2 truncate/analyze/vacuum/reindex/cluster on a GTT are now like DML:
> they only modify local data and do not modify the GTT definition.
> So I lowered the table lock level held by the GTT, only need RowExclusiveLock.
> 
> 3 For DDLs that need to be modified the GTT table definition(Drop GTT Alter 
> GTT), 
> an exclusive level of table locking is required(AccessExclusiveLock), 
> as is the case for regular table.
> This part of the implementation also does not actually change the code.
> 
> Summary: What I have done is to adjust the GTT lock levels in different types 
> of statements based on the above thinking.
> For example, for TRUNCATE on a GTT, I reduced the table lock level held on
> the GTT to RowExclusiveLock,
> so we can truncate data in the same GTT from different sessions at the same
> time.
> 
> What do you think about table locks on GTT?
> 
> I am thinking about explicit LOCK statements. Some applications use explicit 
> locking from some reasons - typically as protection against race conditions. 
> 
> But on GTT race conditions are not possible. So my question is - does the 
> exclusive lock on GTT  protection other sessions do insert into their own 
> instances of the same GTT?
In my opinion, since a GTT always works on the session's private data, there
is nothing the lock needs to protect, so the LOCK statement should do nothing
(the same is true for Oracle's GTT).

What do you think?

> 
> What is a level where table locks are active? shared part of GTT or session 
> instance part of GTT?
I don't quite understand what you mean; could you explain it a little?



Wenjing



> 
> 
> 
> 
> 
> Wenjing
> 
> 
>> 
>> Now, table locks are implemented on a global level. So exclusive lock on GTT 
>> in one session block insertion on the second session. Is it expected 
>> behaviour? It is safe, but maybe it is too strict. 
>> 
>> We should define what table lock is meaning on GTT.
>> 
>> Regards
>> 
>> Pavel
>>  
>> Pavel
>> 
>> 
>>> With Regards,
>>> Prabhat Kumar Sahu
>>> EnterpriseDB: http://www.enterprisedb.com
>> 
>> 
> 



Re: [Proposal] Global temporary tables

2021-10-14 Thread wenjing zeng


> On Oct 13, 2021, at 13:08, Andrew Bille wrote:
> 
> Thanks for the fix. It works for me.
> 
> Now I'm exploring another crash related to GTT, but I need a few days to 
> present a simple repro.

I would be deeply grateful.
Perhaps you can share the crash stack so that the analysis can start as soon
as possible.


Wenjing

> 
> On Sat, Oct 9, 2021 at 2:41 PM wenjing wrote:
> 
> Thank you for pointing it out. 
> This is a bug that occurs during transaction rollback and process exit. I
> fixed it; please confirm.
> 
> Wenjing 



Re: Asynchronous and "direct" IO support for PostgreSQL.

2023-01-17 Thread Wenjing Zeng


> On Sep 1, 2021, at 13:56, Andres Freund wrote:
> 
> Hi,
> 
> Attached is an updated patch AIO series. The major changes are:
> - rebased onto master (Andres)
> - lots of progress on posix AIO backend (Thomas)
> - lots of progress towards a windows native AIO implementation - not yet quite
>  merged (Thomas & David)
> - considerably improved "worker" io_method (Thomas)
> - some preliminary patches merged (Thomas) and thus dropped
> - error handling overhaul, AIO references now use resource owners
> - quite a few more localized bugfixes
> - further CI improvements
> 
> Unfortunately there's a few tests that don't pass on windows. At least some of
> those failures also happen on master - hence the alternative output file added
> in the last commit.
> 
> Thanks to Thomas there's now a new wiki page for AIO support:
> https://wiki.postgresql.org/wiki/AIO
> It's currently mostly a shared todo list
> 
> My own next steps are to try to get some of the preliminary patches merged
> into master, and to address some of the higher level things that aren't yet
> quite right with the AIO interface, and to split the "main" AIO patch into
> smaller patches.
> 
> I hope that we soon send in a new version with native AIO support for
> windows. I'm mostly interested in that to make sure that we get the shared
> infrastructure right.
> 
> Melanie has some work improving bitmap heap scan AIO support and some IO stats
> / explain improvements.
> 
> I think a decent and reasonably simple example for the way the AIO interface
> can be used to do faster IO is
> v3-0028-aio-Use-AIO-in-nbtree-vacuum-scan.patch.gz which adds AIO for nbtree
> vacuum. It's not perfectly polished, but I think it shows that it's not too
> hard to add AIO usage to individual once the general infrastructure is in
> place.
> 
> I've attached the code for posterity, but the series is large enough that I
> don't think it makes sense to do that all that often... The code is at
> https://github.com/anarazel/postgres/tree/aio

Hi Andres,

I noticed this feature and did some testing.
In the code on GitHub's aio branch, the function

static void
pgaio_write_smgr_retry(PgAioInProgress *io)
{
    uint32 off;
    AioBufferTag *tag = &io->scb_data.write_smgr.tag;
    SMgrRelation reln = smgropen(tag->rlocator.locator, tag->rlocator.backend);

    io->op_data.read.fd = smgrfd(reln, tag->forkNum, tag->blockNum, &off);
    Assert(off == io->op_data.read.offset);
}

seems like it should instead use the write fields:
io->op_data.write.fd = smgrfd(reln, tag->forkNum, tag->blockNum, &off);
Assert(off == io->op_data.write.offset);


Best regards,
Wenjing

> 
> Greetings,
> 
> Andres Freund
> 



Re: CREATE TABLE ( .. STORAGE ..)

2022-01-20 Thread wenjing zeng
Hi

For the patch create_table_storage-v1:

1
+ALTER opt_column ColId SET STORAGE name

+opt_column_storage:
+   STORAGE ColId                   { $$ = $2; }

Should these both use name, or both ColId? They are equivalent, but the
grammar should be consistent.

2 For the new ColumnDef member storage_name, did you forget to update
_copyColumnDef() and _equalColumnDef()?
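For reference, the missing pieces would follow the usual copyfuncs.c /
equalfuncs.c pattern, roughly (assuming storage_name is a string field):

/* in _copyColumnDef() */
COPY_STRING_FIELD(storage_name);

/* in _equalColumnDef() */
COMPARE_STRING_FIELD(storage_name);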


Regards
Wenjing


> On Dec 27, 2021, at 15:51, Teodor Sigaev wrote:
> 
> Hi!
> 
> Working on pluggable toaster (mostly, for JSONB improvements, see links 
> below) I had found that STORAGE attribute on column is impossible to set  in 
> CREATE TABLE command but COMPRESS option is possible. It looks unreasonable. 
> Suggested patch implements this possibility.
> 
> [1] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfnyc-2021.pdf
> [2] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgvision-2021.pdf
> [3] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfonline-2021.pdf
> [4] http://www.sai.msu.su/~megera/postgres/talks/bytea-pgconfonline-2021.pdf
> 
> PS I will propose pluggable toaster patch a bit later
> -- 
> Teodor Sigaev  E-mail: teo...@sigaev.ru  WWW: http://www.sigaev.ru/





Re: [PATCH] Implement INSERT SET syntax

2022-01-21 Thread wenjing zeng


Since this feature adds INSERT OVERRIDING SET syntax, it would be good to add
some related test cases.


Regards
Wenjing


> 2021年9月22日 07:38,Rachel Heaton  写道:
> 
>> On 4/23/20 8:04 PM, Gareth Palmer wrote:
>>> 
>>> Thank you for the review, attached is v7 of the patch which should
>>> apply correcly to HEAD.
>>> 
> 
> Hello Gareth,
> 
> This patch no longer applies to HEAD, can you please submit a rebased version?
> 
> Thanks,
> Rachel
> 
> 


0001-insert-overriding-set-case.patch
Description: Binary data


Re: [Commitfest 2022-07] Begins Now

2022-07-15 Thread Wenjing Zeng
Hi Jacob

About the Global temporary table entry:
https://commitfest.postgresql.org/36/2349/
Please move the Global Temporary Table patch to next month's commitfest
(2022-08); I need more time to work through the existing issues.

Thanks
Wenjing


> On Jul 9, 2022, at 07:42, Jacob Champion wrote:
> 
> On 7/1/22 08:08, Jacob Champion wrote:
>> It's been July everywhere on Earth for a few hours, so the July
>> commitfest is now in progress:
>> 
>>https://commitfest.postgresql.org/38/
> One week down, three to go.
> 
> I forgot to put the overall status in the last email. We started the
> month with the following stats:
> 
>Needs review: 214
>Waiting on Author: 36
>Ready for Committer:   23
>Committed: 21
>Moved to next CF:   1
>Withdrawn:  5
>Rejected:   2
>Returned with Feedback: 3
>--
>Total:305
> 
> And as of this email, we're now at
> 
>Needs review: 193
>Waiting on Author: 38
>Ready for Committer:   24
>Committed: 37
>Moved to next CF:   2
>Withdrawn:  6
>Rejected:   2
>Returned with Feedback: 3
>--
>Total:305
> 
> That's sixteen patchsets committed in the first week.
> 
> Have a good weekend,
> --Jacob
> 
> 



Re: [Proposal] Global temporary tables

2021-07-29 Thread wenjing zeng


> On Jul 28, 2021, at 23:09, Tony Zhu wrote:
> 
> Hi Wenjing
> 
> would you please rebase the code?
Thank you for your attention.
According to my testing, the latest patch applies to the current master and
passes the tests:
https://www.travis-ci.com/github/wjzeng/postgres/builds
If you have any questions, please give me feedback.


Wenjing


> 
> Thank you very much
> Tony
> 
> The new status of this patch is: Waiting on Author



Re: [Proposal] Global temporary tables

2020-07-10 Thread wenjing zeng
Hi all

I started using my personal email to respond to community issues.



> On Jul 7, 2020, at 6:05 PM, Pavel Stehule wrote:
> 
> Hi
>  
> GTT Merge the latest PGMaster and resolves conflicts.
> 
> 
> 
> I tested it and it looks fine. I think it is very usable in current form, but 
> still there are some issues:
> 
> postgres=# create global temp table foo(a int);
> CREATE TABLE
> postgres=# insert into foo values(10);
> INSERT 0 1
> postgres=# alter table foo add column x int;
> ALTER TABLE
> postgres=# analyze foo;
> WARNING:  reloid 16400 not support update attstat after add colunm
> WARNING:  reloid 16400 not support update attstat after add colunm
> ANALYZE
This is a limitation that we can completely eliminate.

> 
> Please, can you summarize what is done, what limits are there, what can be 
> implemented hard, what can be implemented easily?
Sure.

The current version of the GTT implementation supports all regular table 
operations.
1 what is done
1.1 insert/update/delete on GTT.
1.2 The GTT supports all types of indexes, and the query statement supports the 
use of GTT indexes to speed up the reading of data in the GTT.
1.3 GTT statistics: each session keeps a local copy of the statistics, which
is provided to the optimizer to choose the best query plan.
1.4 analyze vacuum GTT.
1.5 truncate cluster GTT.
1.6 all DDL on GTT.
1.7 GTT table can use  GTT sequence  or Regular sequence.
1.8 Support for creating views on GTT.
1.9 Support for creating views on foreign key.
1.10 support global temp partition.

I feel like I cover all the necessary GTT requirements.

As for CLUSTER on a GTT, I think it's complicated.
I'm not sure the current implementation is quite reasonable. Maybe you can help 
review it.


> 
> 
> 
> I found one open question - how can be implemented table locks - because data 
> is physically separated, then we don't need table locks as protection against 
> race conditions. 
Yes, but GTT DML and DDL still require table locking.
1 The DML requires table locks (RowExclusiveLock) to ensure that 
definitions do not change during run time (the DDL may modify or delete them).
This part of the implementation does not actually change the code,
because the DML on GTT does not block each other between sessions.

2 truncate/analyze/vacuum/reindex/cluster on a GTT are now like DML:
they only modify local data and do not modify the GTT definition.
So I lowered the table lock level held by the GTT, only need RowExclusiveLock.

3 For DDLs that need to be modified the GTT table definition(Drop GTT Alter 
GTT), 
an exclusive level of table locking is required(AccessExclusiveLock), 
as is the case for regular table.
This part of the implementation also does not actually change the code.

Summary: What I have done is to adjust the GTT lock levels in different types 
of statements based on the above thinking.
For example, for TRUNCATE on a GTT, I reduced the table lock level held on
the GTT to RowExclusiveLock,
so we can truncate data in the same GTT from different sessions at the same
time.
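A minimal sketch of that lock-level choice (the persistence check and the
call site are hypothetical, for illustration only):

Relation rel;
LOCKMODE lockmode = AccessExclusiveLock;   /* regular-table default */

if (relation_is_global_temp(relid))        /* hypothetical check */
    lockmode = RowExclusiveLock;           /* data is session-private */

rel = table_open(relid, lockmode);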

What do you think about table locks on GTT?


Wenjing


> 
> Now, table locks are implemented on a global level. So exclusive lock on GTT 
> in one session block insertion on the second session. Is it expected 
> behaviour? It is safe, but maybe it is too strict. 
> 
> We should define what table lock is meaning on GTT.
> 
> Regards
> 
> Pavel
>  
> Pavel
> 
> 
>> With Regards,
>> Prabhat Kumar Sahu
>> EnterpriseDB: http://www.enterprisedb.com
> 
> 



Re: [Proposal] Global temporary tables

2020-07-13 Thread wenjing zeng


> On Jul 10, 2020, at 5:03 PM, wenjing zeng wrote:
> 
> Hi all
> 
> I started using my personal email to respond to community issue.
> 
> 
> 
>> On Jul 7, 2020, at 6:05 PM, Pavel Stehule wrote:
>> 
>> Hi
>>  
>> GTT Merge the latest PGMaster and resolves conflicts.
>> 
>> 
>> 
>> I tested it and it looks fine. I think it is very usable in current form, 
>> but still there are some issues:
>> 
>> postgres=# create global temp table foo(a int);
>> CREATE TABLE
>> postgres=# insert into foo values(10);
>> INSERT 0 1
>> postgres=# alter table foo add column x int;
>> ALTER TABLE
>> postgres=# analyze foo;
>> WARNING:  reloid 16400 not support update attstat after add colunm
>> WARNING:  reloid 16400 not support update attstat after add colunm
>> ANALYZE
> This is a limitation that we can completely eliminate.
> 
>> 
>> Please, can you summarize what is done, what limits are there, what can be 
>> implemented hard, what can be implemented easily?
> Sure.
> 
> The current version of the GTT implementation supports all regular table 
> operations.
> 1 what is done
> 1.1 insert/update/delete on GTT.
> 1.2 The GTT supports all types of indexes, and the query statement supports 
> the use of GTT indexes to speed up the reading of data in the GTT.
> 1.3 GTT statistics: each session keeps a local copy of the statistics, which
> is provided to the optimizer to choose the best query plan.
> 1.4 analyze vacuum GTT.
> 1.5 truncate cluster GTT.
> 1.6 all DDL on GTT.
> 1.7 GTT table can use  GTT sequence  or Regular sequence.
> 1.8 Support for creating views on GTT.
> 1.9 Support for creating views on foreign key.
> 1.10 support global temp partition.
> 
> I feel like I cover all the necessary GTT requirements.
> 
> As for CLUSTER on a GTT, I think it's complicated.
> I'm not sure the current implementation is quite reasonable. Maybe you can 
> help review it.
> 
> 
>> 
>> 
>> 
>> I found one open question - how can be implemented table locks - because 
>> data is physically separated, then we don't need table locks as protection 
>> against race conditions. 
> Yes, but GTT DML and DDL still require table locking.
> 1 The DML requires table locks (RowExclusiveLock) to ensure that 
> definitions do not change during run time (the DDL may modify or delete them).
> This part of the implementation does not actually change the code,
> because the DML on GTT does not block each other between sessions.
As a side note, since the same row of GTT data cannot be modified by
different sessions,
I don't see the need to track a GTT's pg_class.relminmxid.
What do you think?


Wenjing


> 
> 2 truncate/analyze/vacuum/reindex/cluster on a GTT are now like DML:
> they only modify local data and do not modify the GTT definition.
> So I lowered the table lock level held by the GTT, only need RowExclusiveLock.
> 
> 3 For DDLs that need to be modified the GTT table definition(Drop GTT Alter 
> GTT), 
> an exclusive level of table locking is required(AccessExclusiveLock), 
> as is the case for regular table.
> This part of the implementation also does not actually change the code.
> 
> Summary: What I have done is to adjust the GTT lock levels in different types 
> of statements based on the above thinking.
> For example, for TRUNCATE on a GTT, I reduced the table lock level held on
> the GTT to RowExclusiveLock,
> so we can truncate data in the same GTT from different sessions at the same
> time.
> 
> What do you think about table locks on GTT?
> 
> 
> Wenjing
> 
> 
>> 
>> Now, table locks are implemented on a global level. So exclusive lock on GTT 
>> in one session block insertion on the second session. Is it expected 
>> behaviour? It is safe, but maybe it is too strict. 
>> 
>> We should define what table lock is meaning on GTT.
>> 
>> Regards
>> 
>> Pavel
>>  
>> Pavel
>> 
>> 
>>> With Regards,
>>> Prabhat Kumar Sahu
>>> EnterpriseDB: http://www.enterprisedb.com
>> 
>> 
>