Re: Support "Right Semi Join" plan shapes

2024-01-21 Thread wenhui qiu
Hi vignesh C, I saw this patch has passed CI (
https://cirrus-ci.com/build/6109321080078336); can we push it?

Best wish

On Tue, 9 Jan 2024 at 18:49, Richard Guo wrote:

>
> On Sun, Jan 7, 2024 at 3:03 PM vignesh C  wrote:
>
>> One of the tests in CFBot has failed at [1] with:
>> -   Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
>> -   Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6,
>> r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND EXISTS
>> (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
>> ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3))) ORDER BY
>> r1."C 1" ASC NULLS LAST
>> -(4 rows)
>> +   Sort Key: t1.c1
>> +   ->  Foreign Scan
>> + Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8
>> + Relations: (public.ft1 t1) SEMI JOIN (public.ft2 t2)
>> + Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5,
>> r1.c6, r1.c7, r1.c8 FROM "S 1"."T 1" r1 WHERE ((r1."C 1" < 20)) AND
>> EXISTS (SELECT NULL FROM "S 1"."T 1" r3 WHERE ((r3."C 1" > 10)) AND
>> ((date(r3.c5) = '1970-01-17'::date)) AND ((r3.c3 = r1.c3)))
>> +(7 rows)
>
>
> Thanks.  I looked into it and have figured out why the plan differs.
> With this patch the SEMI JOIN that is pushed down to the remote server
> is now implemented using JOIN_RIGHT_SEMI, whereas previously it was
> implemented using JOIN_SEMI.  Consequently, this leads to changes in the
> costs of the paths: path with the sort pushed down to remote server, and
> path with the sort added atop the foreign join.  And at last the latter
> one wins by a slim margin.
>
> I think we can simply update the expected file to fix this plan diff, as
> attached.
>
> Thanks
> Richard
>


Re: Support "Right Semi Join" plan shapes

2024-01-23 Thread wenhui qiu
Hi vignesh C,
   Many thanks. I have marked it as "ready for committer".

Best wish

On Tue, 23 Jan 2024 at 10:56, vignesh C wrote:

> On Mon, 22 Jan 2024 at 11:27, wenhui qiu  wrote:
> >
> > Hi  vignesh CI saw this path has been passed (
> https://cirrus-ci.com/build/6109321080078336),can we push it?
>
> If you have found no comments from your review and testing, let's mark
> it as "ready for committer".
>
> Regards,
> Vignesh
>


Re: Support "Right Semi Join" plan shapes

2024-02-07 Thread wenhui qiu
Hi Alena Rybakina,
 I saw that this code snippet also disables mergejoin; I think it has the same effect:
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+

Regards

On Tue, 30 Jan 2024 at 14:51, Alena Rybakina wrote:

> Hi! Thank you for your work on this subject.
>
> I have reviewed your patch and I think it is better to add an Assert for
> JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
> prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
> and MergeJoin).
> Mostly I'm suggesting this because of the set_join_pathlist_hook
> function, which is in the add_paths_to_joinrel function, which allows
> you to create a custom node. What do you think?
>
> --
> Regards,
> Alena Rybakina
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Thoughts about NUM_BUFFER_PARTITIONS

2024-02-08 Thread wenhui qiu
Hi hackers,
While reading the documentation at https://www.interdb.jp/pg/pgsql08/03.html,
I came across this paragraph:
/*
The BufMappingLock is split into partitions to reduce contention in the
buffer table (the default is 128 partitions). Each BufMappingLock partition
guards a portion of the corresponding hash bucket slots.
*/,

Physical servers with terabytes of RAM are now commonplace, so I am looking at
the comments in the source code to see whether they still match current
hardware capabilities. The comment says this may become a parameter in the
future. I am a DBA, and I have tried to think about implementing this
parameter, but I am not a professional kernel developer, so I would like a
senior community developer to implement it.

/*
 * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
 * here, but we need them to figure out offsets within MainLWLockArray, and
 * having this file include lock.h or bufmgr.h would be backwards.
 */

/* Number of partitions of the shared buffer mapping hashtable */
#define NUM_BUFFER_PARTITIONS  128

/*
 * The number of partitions for locking purposes.  This is set to match
 * NUM_BUFFER_PARTITIONS for now, on the basis that whatever's good enough
for
 * the buffer pool must be good enough for any other purpose.  This could
 * become a runtime parameter in future.
 */
#define DSHASH_NUM_PARTITIONS_LOG2 7
#define DSHASH_NUM_PARTITIONS (1 << DSHASH_NUM_PARTITIONS_LOG2)
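
To make the partitioning scheme concrete, here is a minimal standalone sketch
(illustrative only, not PostgreSQL source; in the server this mapping is done
by macros such as BufTableHashPartition in buf_internals.h) of how a
buffer-tag hash code selects one of the NUM_BUFFER_PARTITIONS BufMappingLock
partitions:

```
/*
 * Standalone illustration (not PostgreSQL source) of how a buffer-tag
 * hash code is mapped to one of the NUM_BUFFER_PARTITIONS partitions of
 * BufMappingLock.  Each partition guards the hash-table buckets whose
 * hash codes map to it.
 */
#include <stdint.h>
#include <stdio.h>

#define NUM_BUFFER_PARTITIONS 128	/* compile-time constant today */

static uint32_t
buf_hash_partition(uint32_t hashcode)
{
	return hashcode % NUM_BUFFER_PARTITIONS;
}

int
main(void)
{
	uint32_t	hashes[] = {0x12345678, 0xdeadbeef, 0x0000007f};

	for (int i = 0; i < 3; i++)
		printf("hash %08x -> partition %u\n",
			   (unsigned) hashes[i],
			   (unsigned) buf_hash_partition(hashes[i]));
	return 0;
}
```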


Re: Thoughts about NUM_BUFFER_PARTITIONS

2024-02-08 Thread wenhui qiu
Hi Heikki Linnakangas,
I think the larger the shared buffer pool, the higher the probability that
multiple backend processes access bucket slots guarded by the same
BufMappingLock partition simultaneously (the buffer hash table is sized as
InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS)). When I have free time, I want
to run this test myself. I have seen some existing tests, but the report is in
Chinese.


Best wishes

On Thu, 8 Feb 2024 at 19:26, Heikki Linnakangas wrote:

> On 08/02/2024 12:17, wenhui qiu wrote:
> > HI hackers
> >  When I read this text in this document there is a paragraph in
> > it(https://www.interdb.jp/pg/pgsql08/03.html
> > <https://www.interdb.jp/pg/pgsql08/03.html>)
> > /*
> > The BufMappingLock is split into partitions to reduce contention in the
> > buffer table (the default is 128 partitions). Each BufMappingLock
> > partition guards a portion of the corresponding hash bucket slots.
> > */,
> >
> > Physical servers with terabytes of RAM are now commonplace,I'm looking
> > at the comments inside the source code.I'm looking at the comments
> > inside the source code to see if they still match the current hardware
> > capability?
>
> The optimal number of partitions has more to do with the number of
> concurrent processes using the buffer cache, rather than the size of the
> cache. The number of CPUs in servers has increased too, but not as much
> as RAM.
>
> But yeah, it's a valid question if the settings still make sense with
> modern hardware.
>
> > The  comment says that in the future there may be a
> > parameter,Iam a  dba now and I try to think of implementing this
> > parameter, but I'm not a professional kernel developer, I still want the
> > community senior developer to implement this parameter
>
> The first thing to do would be to benchmark with different
> NUM_BUFFER_PARTITIONS settings, and see if there's benefit in having
> more partitions.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>


Re: Commitfest Manager for March

2024-03-20 Thread wenhui qiu
 Hi Aleksander Alekseev,
  Could you take a look at the patch (
https://commitfest.postgresql.org/47/4284/)? What is your opinion?

Thanks

On Tue, 12 Mar 2024 at 21:41, Aleksander Alekseev 
wrote:

> Hi Andrey,
>
> > > If you need any help please let me know.
> >
> > Aleksander, I would greatly appreciate if you join me in managing CF.
> Together we can move more stuff :)
> > Currently, I'm going through "SQL Commands". And so far I had not come
> to "Performance" and "Server Features" at all... So if you can handle
> updating statuses of that sections - that would be great.
>
> OK, I'll take care of the "Performance" and "Server Features"
> sections. I submitted my summaries of the entries triaged so far to
> the corresponding thread [1].
>
> [1]:
> https://www.postgresql.org/message-id/CAJ7c6TN9SnYdq%3DkfP-txgo5AaT%2Bt9YU%2BvQHfLBZqOBiHwoipAg%40mail.gmail.com
>
> --
> Best regards,
> Aleksander Alekseev
>
>
>


Re: Thoughts about NUM_BUFFER_PARTITIONS

2024-02-19 Thread wenhui qiu
Hi Japin Li,
   Thank you for such important information! Got it.

On Mon, 19 Feb 2024 at 10:26, Japin Li wrote:

>
> On Mon, 19 Feb 2024 at 00:56, Tomas Vondra 
> wrote:
> > On 2/18/24 03:30, Li Japin wrote:
> >>
> >> I find it seems need to change MAX_SIMUL_LWLOCKS if we enlarge the
> NUM_BUFFER_PARTITIONS,
> >> I didn’t find any comments to describe the relation between
> MAX_SIMUL_LWLOCKS and
> >> NUM_BUFFER_PARTITIONS, am I missing someghing?
> >
> > IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
> > higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
> > the partition locks if needed.
> >
>
> Thanks for the explanation!  Got it.
>
> > There's other places that acquire a bunch of locks, and all of them need
> > to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
> > GIST_MAX_SPLIT_PAGES.
> >
> >
> > regards
>


Re: Thoughts about NUM_BUFFER_PARTITIONS

2024-02-19 Thread wenhui qiu
Hi Heikki Linnakangas,
   Looking through the git log, I found this commit:
https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892
but I don't see an email thread discussing it. As the commit log tells us, we
don't know exactly how large a value is optimal, and I believe it would be
more flexible to make it a parameter. Thank you very much, Tomas Vondra, for
explaining the relationship; I see that MAX_SIMUL_LWLOCKS was simply doubled
in this commit. Is there a more appropriate ratio between them?



```
commit 3acc10c997f916f6a741d0b4876126b7b08e3892
Author: Robert Haas 
Date:   Thu Oct 2 13:58:50 2014 -0400

Increase the number of buffer mapping partitions to 128.

Testing by Amit Kapila, Andres Freund, and myself, with and without
other patches that also aim to improve scalability, seems to indicate
that this change is a significant win over the current value and over
smaller values such as 64.  It's not clear how high we can push this
value before it starts to have negative side-effects elsewhere, but
going this far looks OK.

```
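
For reference, the two compile-time constants involved look roughly like this
in current sources (paraphrased from lwlock.h and lwlock.c, so treat the exact
values as approximate):

```
/* src/include/storage/lwlock.h */
#define NUM_BUFFER_PARTITIONS  128	/* buffer mapping hash partitions */

/* src/backend/storage/lmgr/lwlock.c */
#define MAX_SIMUL_LWLOCKS  200		/* max LWLocks one backend may hold at once */
```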

On Tue, 20 Feb 2024 at 09:36, wenhui qiu wrote:

> Hi Japlin Li
>Thank you for such important information ! Got it
>
> On Mon, 19 Feb 2024 at 10:26, Japin Li wrote:
>
>>
>> On Mon, 19 Feb 2024 at 00:56, Tomas Vondra 
>> wrote:
>> > On 2/18/24 03:30, Li Japin wrote:
>> >>
>> >> I find it seems need to change MAX_SIMUL_LWLOCKS if we enlarge the
>> NUM_BUFFER_PARTITIONS,
>> >> I didn’t find any comments to describe the relation between
>> MAX_SIMUL_LWLOCKS and
>> >> NUM_BUFFER_PARTITIONS, am I missing someghing?
>> >
>> > IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
>> > higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
>> > the partition locks if needed.
>> >
>>
>> Thanks for the explanation!  Got it.
>>
>> > There's other places that acquire a bunch of locks, and all of them need
>> > to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
>> > GIST_MAX_SPLIT_PAGES.
>> >
>> >
>> > regards
>>
>


Re: Thoughts about NUM_BUFFER_PARTITIONS

2024-02-23 Thread wenhui qiu
Hi Tomas Vondra,
Thanks for the information! I found that Postgres Pro Enterprise has already
implemented this as a parameter. However, it defaults to 16 partitions and
maxes out at 128, the same maximum as in PostgreSQL. I would be glad if the
developers could explain the purpose of this. Maybe 128 partitions is the
optimal value, and the parameter simply makes it easier to adjust the number
of partitions in the future if it really turns out not to be enough; the code
comments also say they hope to turn this into a parameter in the future.

( https://postgrespro.com/docs/enterprise/16/runtime-config-locks )


log2_num_lock_partitions (integer) #
<https://postgrespro.com/docs/enterprise/16/runtime-config-locks#GUC-LOG2-NUM-LOCK-PARTITIONS>

This controls how many partitions the shared lock tables are divided into.
Number of partitions is calculated by raising 2 to the power of this
parameter. The default value is 4, which corresponds to 16 partitions, and
the maximum is 8. This parameter can only be set in the postgresql.conf file
or on the server command line.

Best wish


On Tue, 20 Feb 2024 at 21:55, Tomas Vondra 
wrote:

> Hi,
>
> On 2/20/24 03:16, wenhui qiu wrote:
> > Hi Heikki Linnakangas
> >I saw git log found this commit:
> >
> https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892
> > ,I don't seem to see an email discussing this commit. As the commit log
> > tells us, we don't know exactly how large a value is optimal, and I
> believe
> > it's more flexible to make it as a parameter.Thank you very much
> > tomas.vondra for explaining the relationship, i see that
> MAX_SIMUL_LWLOCKS
> > was just doubled in this commit, is there a more appropriate ratio
> between
> > them?
> >
>
> I think the discussion for that commit is in [1] (and especially [2]).
>
> That being said, I don't think MAX_SIMUL_LOCKS and NUM_BUFFER_PARTITIONS
> need to be in any particular ratio. The only requirement is that there
> needs to be enough slack, and 72 locks seemed to work quite fine until
> now - I don't think we need to change that.
>
> What might be necessary is improving held_lwlocks - we treat it as LIFO,
> but more as an expectation than a hard rule. I'm not sure how often we
> violate that rule (if at all), but if we do then it's going to get more
> expensive as we increase the number of locks. But I'm not sure this is
> actually a problem in practice, we usually hold very few LWLocks at the
> same time.
>
> As for making this a parameter, I'm rather opposed to the idea. If we
> don't have a very clear idea how to set this limit, what's the chance
> users with little knowledge of the internals will pick a good value?
> Adding yet another knob would just mean users start messing with it in
> random ways (typically increasing it to very high value, because "more
> is better"), causing more harm than good.
>
> Adding it as a GUC would also require making some parts dynamic (instead
> of just doing static allocation with compile-time constants). That's not
> great, but I'm not sure how significant the penalty might be.
>
>
> IMHO adding a GUC might be acceptable only if we fail to come up with a
> good value (which is going to be a trade off), and if someone
> demonstrates a clear benefit of increasing the value (which I don't
> think happen in this thread yet).
>
>
> regards
>
>
> [1]
>
> https://www.postgresql.org/message-id/flat/CAA4eK1LSTcMwXNO8ovGh7c0UgCHzGbN%3D%2BPjggfzQDukKr3q_DA%40mail.gmail.com
>
> [2]
>
> https://www.postgresql.org/message-id/CA%2BTgmoY58dQi8Z%3DFDAu4ggxHV-HYV03-R9on1LSP9OJU_fy_zA%40mail.gmail.com
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: Support "Right Semi Join" plan shapes

2024-03-03 Thread wenhui qiu
Hi Richard,
 The last commitfest for v17 is now starting; can you respond to Alena
Rybakina's points?


Regards

On Thu, 8 Feb 2024 at 13:50, wenhui qiu  wrote:

> Hi Alena Rybakina
>  I saw this code snippet also disable mergejoin ,I think it same  effect
> + /*
> + * For now we do not support RIGHT_SEMI join in mergejoin.
> + */
> + if (jointype == JOIN_RIGHT_SEMI)
> + {
> + *mergejoin_allowed = false;
> + return NIL;
> + }
> +
>
> Regards
>
> On Tue, 30 Jan 2024 at 14:51, Alena Rybakina wrote:
>
>> Hi! Thank you for your work on this subject.
>>
>> I have reviewed your patch and I think it is better to add an Assert for
>> JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
>> prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
>> and MergeJoin).
>> Mostly I'm suggesting this because of the set_join_pathlist_hook
>> function, which is in the add_paths_to_joinrel function, which allows
>> you to create a custom node. What do you think?
>>
>> --
>> Regards,
>> Alena Rybakina
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>


Re: Support "Right Semi Join" plan shapes

2024-03-04 Thread wenhui qiu
Hi Richard,
 Agreed, +1. I think it can be pushed now.

Richard

On Tue, 5 Mar 2024 at 10:44, Richard Guo  wrote:

>
> On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina 
> wrote:
>
>> I have reviewed your patch and I think it is better to add an Assert for
>> JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
>> prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
>> and MergeJoin).
>
>
> Hmm, I don't see why this is necessary.  The planner should already
> guarantee that we won't have nestloops/mergejoins with right-semi joins.
>
> Thanks
> Richard
>


Re: Support "Right Semi Join" plan shapes

2024-03-05 Thread wenhui qiu
Hi Alena Rybakina,
For merge join it is handled here:
+ /*
+ * For now we do not support RIGHT_SEMI join in mergejoin.
+ */
+ if (jointype == JOIN_RIGHT_SEMI)
+ {
+ *mergejoin_allowed = false;
+ return NIL;
+ }
+
Thanks

On Wed, 6 Mar 2024 at 04:10, Alena Rybakina 
wrote:

> To be honest, I didn't see it in the code, could you tell me where they
> are, please?
> On 05.03.2024 05:44, Richard Guo wrote:
>
>
> On Tue, Jan 30, 2024 at 2:51 PM Alena Rybakina 
> wrote:
>
>> I have reviewed your patch and I think it is better to add an Assert for
>> JOIN_RIGHT_SEMI to the ExecMergeJoin and ExecNestLoop functions to
>> prevent the use of RIGHT_SEMI for these types of connections (NestedLoop
>> and MergeJoin).
>
>
> Hmm, I don't see why this is necessary.  The planner should already
> guarantee that we won't have nestloops/mergejoins with right-semi joins.
>
> Thanks
> Richard
>
> --
> Regards,
> Alena Rybakina
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: Add 64-bit XIDs into PostgreSQL 15

2023-12-14 Thread wenhui qiu
Hi Maxim Orlov,
Good news: xid64 has completed a successful first phase. I tried to change the
patch status (https://commitfest.postgresql.org/43/3594/), but it seems
incorrect.

On Wed, 13 Dec 2023 at 20:26, Maxim Orlov wrote:

> Hi!
>
> Just to keep this thread up to date, here's a new version after recent
> changes in SLRU.
> I'm also change order of the patches in the set, to make adding initdb MOX
> options after the
> "core 64 xid" patch, since MOX patch is unlikely to be committed and now
> for test purpose only.
>
> --
> Best regards,
> Maxim Orlov.
>


Support "Right Semi Join" plan shapes

2023-12-14 Thread wenhui qiu
Hi Richard Guo,
   I see that the test samples are all EXISTS subqueries. I think semi join
should also support IN and ANY subqueries; would you run more tests on IN and
ANY subqueries?


Best wishes


Re: Add 64-bit XIDs into PostgreSQL 15

2023-12-15 Thread wenhui qiu
Hi Pavel Borisov
Many thanks

Best wishes

On Fri, 15 Dec 2023 at 17:13, Pavel Borisov wrote:

> Hi, Wenhui!
>
> On Fri, 15 Dec 2023 at 05:52, wenhui qiu  wrote:
>
>> Hi Maxim Orlov
>> Good news,xid64 has achieved a successful first phase,I tried to
>> change the path status (https://commitfest.postgresql.org/43/3594/) ,But
>> it seems incorrect
>>
>> On Wed, 13 Dec 2023 at 20:26, Maxim Orlov wrote:
>>
>>> Hi!
>>>
>>> Just to keep this thread up to date, here's a new version after recent
>>> changes in SLRU.
>>> I'm also change order of the patches in the set, to make adding initdb
>>> MOX options after the
>>> "core 64 xid" patch, since MOX patch is unlikely to be committed and now
>>> for test purpose only.
>>>
>>
> If the patch is RwF the CF entry is finished and can't be enabled, rather
> the patch needs to be submitted in a new entry, which I have just done.
> https://commitfest.postgresql.org/46/4703/
>
> Please feel free to submit your review.
>
> Kind regards,
> Pavel Borisov,
> Supabase
>


Re: Transaction timeout

2023-12-20 Thread wenhui qiu
Hi Junwang Zhao,
   Agreed, +1.

Best wishes

On Wed, 20 Dec 2023 at 10:35, Junwang Zhao wrote:

> On Wed, Dec 20, 2023 at 9:58 AM Thomas wen
>  wrote:
> >
> > Hi Junwang Zhao
> >  #should we invalidate lock_timeout? Or maybe just document this.
> > I think you mean when lock_time is greater than trasaction-time
> invalidate lock_timeout or  needs to be logged ?
> >
> I mean the interleaving of the gucs, which is lock_timeout and the new
> introduced transaction_timeout,
> if lock_timeout >= transaction_timeout, seems no need to enable
> lock_timeout.
> >
> >
> >
> > Best whish
> > 
> > From: Junwang Zhao 
> > Sent: 20 December 2023 9:48
> > To: Andrey M. Borodin 
> > Cc: Japin Li ; 邱宇航 ; Fujii Masao
> ; Andrey Borodin ;
> Andres Freund ; Michael Paquier <
> michael.paqu...@gmail.com>; Nikolay Samokhvalov ;
> pgsql-hackers ;
> pgsql-hackers@lists.postgresql.org 
> > Subject: Re: Transaction timeout
> >
> > On Tue, Dec 19, 2023 at 10:51 PM Junwang Zhao  wrote:
> > >
> > > On Tue, Dec 19, 2023 at 6:27 PM Andrey M. Borodin <
> x4...@yandex-team.ru> wrote:
> > > >
> > > >
> > > >
> > > > > On 19 Dec 2023, at 13:26, Andrey M. Borodin 
> wrote:
> > > > >
> > > > > I don’t have Windows machine, so I hope CF bot will pick this.
> > > >
> > > > I used Github CI to produce version of tests that seems to be is
> stable on Windows.
> > > > Sorry for the noise.
> > > >
> > > >
> > > > Best regards, Andrey Borodin.
> > >
> > > +   
> > > +If transaction_timeout is shorter than
> > > +idle_in_transaction_session_timeout or
> > > statement_timeout
> > > +transaction_timeout will invalidate longer
> timeout.
> > > +   
> > >
> > > When transaction_timeout is *equal* to
> idle_in_transaction_session_timeout
> > > or statement_timeout, idle_in_transaction_session_timeout and
> statement_timeout
> > > will also be invalidated, the logic in the code seems right, though
> > > this document
> > > is a little bit inaccurate.
> > >
> >
> > Unlike statement_timeout, this timeout can
> only occur
> > while waiting for locks.  Note that if
> > statement_timeout
> > is nonzero, it is rather pointless to set
> > lock_timeout to
> > the same or larger value, since the statement timeout would
> always
> > trigger first.  If log_min_error_statement is
> set to
> > ERROR or lower, the statement that timed out
> will be
> > logged.
> >
> >
> > There is a note about statement_timeout and lock_timeout, set both
> > and lock_timeout >= statement_timeout is pointless, but this logic seems
> not
> > implemented in the code. I am wondering if lock_timeout >=
> transaction_timeout,
> > should we invalidate lock_timeout? Or maybe just document this.
> >
> > > --
> > > Regards
> > > Junwang Zhao
> >
> >
> >
> > --
> > Regards
> > Junwang Zhao
> >
> >
>
>
> --
> Regards
> Junwang Zhao
>
>
>


Re: Support "Right Semi Join" plan shapes

2023-12-27 Thread wenhui qiu
: ((a.fivethous <> fivethous) AND (a.twothousand =
twothousand))
   ->  Materialize
 ->  Seq Scan on int4_tbl i4
(9 rows)
test=# set enable_nestloop =off;
SET
test=# explain (costs off ) SELECT *
FROM int4_tbl i4, tenk1 a
WHERE (a.twothousand, a.fivethous) IN (
SELECT b.twothousand, b.fivethous
FROM tenk1 b
WHERE a.twothousand = b.twothousand  and  a.fivethous <> b.fivethous
)
AND i4.f1 = a.tenthous;
   QUERY PLAN

 Hash Join
   Hash Cond: (a.tenthous = i4.f1)
   ->  Seq Scan on tenk1 a
 Filter: (SubPlan 1)
 SubPlan 1
   ->  Seq Scan on tenk1 b
 Filter: ((a.fivethous <> fivethous) AND (a.twothousand =
twothousand))
   ->  Hash
 ->  Seq Scan on int4_tbl i4
(9 rows)



On Fri, 15 Dec 2023 at 14:40, wenhui qiu wrote:

> Hi Richard Guo   I see that the test samples are all (exists)
> subqueries ,I think semi join should also support ( in) and ( any)
> subqueries. would you do more test on  ( in) and ( any) subqueries?
>
>
> Best whish
>


Re: Support "Right Semi Join" plan shapes

2024-06-24 Thread wenhui qiu
Hi Japin Li,
 Thank you for your review; this way the comments are more accurate and
complete. Thanks also to the author for updating the patch. I also tested the
new patch, and it looks good to me.


Regards

On Tue, 25 Jun 2024 at 08:51, Japin Li wrote:

> On Mon, 24 Jun 2024 at 17:59, Richard Guo  wrote:
> > Thank you for reviewing.
> >
> > On Mon, Jun 24, 2024 at 1:27 PM Li Japin  wrote:
> >> +   /*
> >> +* For now we do not support RIGHT_SEMI join in mergejoin or
> nestloop
> >> +* join.
> >> +*/
> >> +   if (jointype == JOIN_RIGHT_SEMI)
> >> +   return;
> >> +
> >>
> >> How about adding some reasons here?
> >
> > I've included a brief explanation in select_mergejoin_clauses.
> >
>
> Thank you for updating the patch.
>
> >> + * this is a right-semi join, or this is a right/right-anti/full join
> and
> >> + * there are nonmergejoinable join clauses.  The executor's mergejoin
> >>
> >> Maybe we can put the right-semi join together with the
> right/right-anti/full
> >> join.  Is there any other significance by putting it separately?
> >
> > I don't think so.  The logic is different: for right-semi join we will
> > always set *mergejoin_allowed to false, while for right/right-anti/full
> > join it is set to false only if there are nonmergejoinable join clauses.
> >
>
> Got it.  Thanks for the explanation.
>
> >> Maybe the following comments also should be updated. Right?
> >
> > Correct.  And there are a few more places where we need to mention
> > JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
> > for SpecialJoinInfo.
> >
> >
> > I noticed that this patch changes the plan of a query in join.sql from
> > a semi join to right semi join, compromising the original purpose of
> > this query, which was to test the fix for neqjoinsel's behavior for
> > semijoins (see commit 7ca25b7d).
> >
> > --
> > -- semijoin selectivity for <>
> > --
> > explain (costs off)
> > select * from int4_tbl i4, tenk1 a
> > where exists(select * from tenk1 b
> >  where a.twothousand = b.twothousand and a.fivethous <>
> b.fivethous)
> >   and i4.f1 = a.tenthous;
> >
> > So I've changed this test case a bit so that it is still testing what it
> > is supposed to test.
> >
> > In passing, I've also updated the commit message to clarify that this
> > patch does not address the support of "Right Semi Join" for merge joins.
> >
>
> Tested and looks good to me!
>
> --
> Regrads,
> Japin Li
>


Linux likely() unlikely() for PostgreSQL

2024-06-30 Thread wenhui qiu
Hi hackers,
   While reading https://en.wikipedia.org/wiki/Branch_predictor, I thought of
the Linux likely()/unlikely() hints, and it occurred to me that some code in
src/backend/executor/execMain.c could be optimized with them. For example:
	if (ExecutorStart_hook)
		(*ExecutorStart_hook) (queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);
}

void
ExecutorRun(QueryDesc *queryDesc,
			ScanDirection direction, uint64 count,
			bool execute_once)
{
	if (ExecutorRun_hook)
		(*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
	else
		standard_ExecutorRun(queryDesc, direction, count, execute_once);
}

The check could instead become:

if (unlikely(ExecutorRun_hook))

I hope to get guidance from community experts. Many thanks.
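
For context, here is a minimal standalone sketch of the idiom (illustrative
only; the hook and function names below are made up for the example, and
PostgreSQL itself already defines likely()/unlikely() on top of
__builtin_expect in src/include/c.h):

```
#include <stdio.h>

/* GCC/Clang branch-prediction hints, with a portable fallback */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)	__builtin_expect((x) != 0, 1)
#define unlikely(x)	__builtin_expect((x) != 0, 0)
#else
#define likely(x)	((x) != 0)
#define unlikely(x)	((x) != 0)
#endif

typedef void (*run_hook_type) (int arg);

static run_hook_type run_hook = NULL;	/* hooks are usually not installed */

static void
standard_run(int arg)
{
	printf("standard path: %d\n", arg);
}

static void
run(int arg)
{
	/* hint that the hook is normally NULL, keeping the default path hot */
	if (unlikely(run_hook))
		run_hook(arg);
	else
		standard_run(arg);
}

int
main(void)
{
	run(42);
	return 0;
}
```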

Thanks


Re: Linux likely() unlikely() for PostgreSQL

2024-06-30 Thread wenhui qiu
Hi Tom, Matthias,
Thank you for the explanations. Maybe future compilers will be able to do this
kind of prediction intelligently on their own?


Thanks

On Sun, 30 Jun 2024 at 23:23, Tom Lane wrote:

> Matthias van de Meent  writes:
> > On Sun, 30 Jun 2024, 15:56 wenhui qiu,  wrote:
> >> if (unlikely(ExecutorRun_hook)),
>
> > While hooks are generally not installed by default, I would advise
> > against marking the hooks as unlikely, as that would unfairly penalize
> > the performance of extensions that do utilise this hook (or hooks in
> > general when applied to all hooks).
>
> In general, we have a policy of using likely/unlikely very sparingly,
> and only in demonstrably hot code paths.  This hook call certainly
> doesn't qualify as hot.
>
> Having said that ... something I've been wondering about is how to
> teach the compiler that error paths are unlikely.  Doing this
> across-the-board wouldn't be "sparingly", but I think surely it'd
> result in better code quality on average.  This'd be easy enough
> to do in Assert:
>
>  #define Assert(condition) \
> do { \
> -   if (!(condition)) \
> +   if (unlikely(!(condition))) \
> ExceptionalCondition(#condition, __FILE__,
> __LINE__); \
> } while (0)
>
> but on the other hand we don't care that much about micro-optimization
> of assert-enabled builds, so maybe that's not worth the trouble.  The
> real win would be if constructs like
>
> if (trouble)
> ereport(ERROR, ...);
>
> could be interpreted as
>
> if (unlikely(trouble))
> ereport(ERROR, ...);
>
> But I surely don't want to make thousands of such changes manually.
> And it's possible that smart compilers already realize this, using
> a heuristic that any path that ends in pg_unreachable() must be
> unlikely.  Is there a way to encourage compilers to believe that?
>
> regards, tom lane
>


Re: Support "Right Semi Join" plan shapes

2024-07-04 Thread wenhui qiu
Hi Richard Guo,
 Thank you for updating the patch. I tested v8, and it looks good to me.



Thanks

On Thu, 4 Jul 2024 at 17:18, Richard Guo wrote:

> On Fri, Jun 28, 2024 at 3:21 PM Richard Guo 
> wrote:
> > On Fri, Jun 28, 2024 at 2:54 PM Richard Guo 
> wrote:
> > > I've refined this test case further to make it more stable by using an
> > > additional filter 'a.tenthous < 5000'.  Besides, I noticed a surplus
> > > blank line in ExecHashJoinImpl().  I've removed it in the v7 patch.
> >
> > BTW, I've also verified the empty-rel optimization for hash join and
> > AFAICT it works correctly for the new right-semi join.
>
> Here is a new rebase.
>
> Barring objections, I'm planning to push it soon.
>
> Thanks
> Richard
>


Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables

2024-07-07 Thread wenhui qiu
Hi feichanghong,
 Thanks for updating the patch. I think the threshold could be configured as a
GUC parameter; PostgreSQL has many hard-coded constants whose code comments
explicitly say they may later become parameters. Now that more and more
applications that previously used Oracle are migrating to PostgreSQL, there
will be more and more scenarios where temporary tables are used heavily.
Oracle's global temporary tables are optimized for this business scenario and
work well there, so such workloads hit very tricky performance issues after
migrating to PostgreSQL. I'm sure the patch has value.
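
As an aside, for readers following the thread, here is a small standalone
sketch (purely illustrative; it is not the attached patch, and the OIDs and
the threshold value are made up) of the general idea being discussed: only pay
for an approximate membership filter over the relations touched in the
transaction once the number of registered ON COMMIT DELETE ROWS entries passes
a threshold, so the common case with few temp tables stays as cheap as today.

```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FILTER_THRESHOLD 10		/* below this, just scan the small list */
#define FILTER_BITS (1u << 16)

static uint8_t filter[FILTER_BITS / 8];

static uint32_t
hash1(uint32_t x) { x ^= x >> 16; x *= 0x7feb352dU; x ^= x >> 15; return x; }

static uint32_t
hash2(uint32_t x) { x *= 0x846ca68bU; x ^= x >> 16; return x; }

static void
filter_add(uint32_t oid)
{
	uint32_t	a = hash1(oid) % FILTER_BITS;
	uint32_t	b = hash2(oid) % FILTER_BITS;

	filter[a / 8] |= 1u << (a % 8);
	filter[b / 8] |= 1u << (b % 8);
}

static bool
filter_maybe_contains(uint32_t oid)
{
	uint32_t	a = hash1(oid) % FILTER_BITS;
	uint32_t	b = hash2(oid) % FILTER_BITS;

	return (filter[a / 8] & (1u << (a % 8))) &&
		(filter[b / 8] & (1u << (b % 8)));
}

int
main(void)
{
	/* hypothetical OIDs of temp tables actually touched in the transaction */
	uint32_t	touched[] = {16384, 16390, 16402};
	/* hypothetical OIDs registered with ON COMMIT DELETE ROWS */
	uint32_t	oncommit[] = {16384, 16385, 16386, 16387, 16388, 16389,
							  16390, 16391, 16392, 16393, 16394, 16402};
	int			n_oncommit = 12;

	if (n_oncommit >= FILTER_THRESHOLD)
	{
		/* build the filter once, then skip untouched tables cheaply */
		for (int i = 0; i < 3; i++)
			filter_add(touched[i]);
		for (int i = 0; i < n_oncommit; i++)
			if (filter_maybe_contains(oncommit[i]))
				printf("would truncate OID %u\n", (unsigned) oncommit[i]);
	}
	return 0;
}
```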

Best Regards

On Sat, 6 Jul 2024 at 03:40, feichanghong wrote:

> The patch in the attachment, compared to the previous one, adds a
> threshold for
> using the bloom filter. The current ON_COMMITS_FILTER_THRESHOLD is set to
> 64,
> which may not be the optimal value. Perhaps this threshold could be
> configured
> as a GUC parameter?
> --
> Best Regards,
> Fei Changhong
>


Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables

2024-07-07 Thread wenhui qiu
Hi feichanghong,
I don't think it is acceptable for a patch that fixes one problem to introduce
a performance regression. Could we instead take Tom's suggestion and optimise
PreCommit_on_commit_actions? Otherwise I think it misses the forest for the
trees.



Best Regards,

On Mon, 8 Jul 2024 at 10:35, feichanghong wrote:

> Hi wenhui,
>
> Thank you for your suggestions. I have supplemented some performance tests.
>
> Here is the TPS performance data for different numbers of temporary tables
> under different thresholds, as compared with the head (98347b5a). The
> testing
> tool used is pgbench, with the workload being to insert into one temporary
> table (when the number of temporary tables is 0, the workload is SELECT 1):
>
> | table num | 0| 1| 5   | 10
> | 100 | 1000|
>
> |---|--|--|-|-|-|-|
> | head 98347b5a | 39912.722209 | 10064.306268 | 7452.071689 | 5641.487369
> | 1073.203851 | 114.530958  |
> | threshold 1   | 40332.367414 | 7078.117192  | 7044.951156 | 7020.249434
> | 6893.652062 | 5826.597260 |
> | threshold 5   | 40173.562744 | 10017.532933 | 7023.770203 | 7024.283577
> | 6919.769315 | 5806.314494 |
>
> Here is the TPS performance data for different numbers of temporary tables
> at a threshold of 5, compared with the head (commit 98347b5a). The testing
> tool
> is pgbench, with the workload being to insert into all temporary tables:
>
> | table num | 1   | 5   | 10  | 100|
> 1000  |
>
> |---|-|-|-||---|
> | head 98347b5a | 7243.945042 | 3627.290594 | 2262.594766 | 297.856756 |
> 27.745808 |
> | threshold 5   | 7287.764656 | 3130.814888 | 2038.308763 | 288.226032 |
> 27.705149 |
>
> According to test results, the patch does cause some performance loss with
> fewer temporary tables, but benefits are substantial when many temporary
> tables
> are used. The specific threshold could be set to 10 (HDDs may require a
> smaller
> one).
>
> I've provided two patches in the attachments, both with a default
> threshold of 10.
> One has the threshold configured as a GUC parameter, while the other is
> hardcoded
> to 10.
> --
> Best Regards,
> Fei Changhong
>


Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables

2024-07-07 Thread wenhui qiu
Hi feichanghong,
I think adding a short-circuit check this way is better than implementing a
global temporary table. There is a patch that implements a global temporary
table (
https://www.postgresql.org/message-id/1a1a6edc-d0ec-47b0-bd21-c2acbaea6...@alibaba-inc.com);
you can consult with its authors, who work at Alibaba.


Best Regards,

On Mon, 8 Jul 2024 at 12:42, feichanghong wrote:

> Hi wenhui,
>
> On Jul 8, 2024, at 12:18, wenhui qiu  wrote:
>
> Hi feichanghong
> I don't think it's acceptable to introduce a patch to fix a problem
> that leads to performance degradation, or can we take tom's suggestion to
> optimise PreCommit_on_commit_actions?  I think it to miss the forest for
> the trees
>
>
> You're right, any performance regression is certainly unacceptable. That's
> why
>
> we've introduced a threshold. The bloom filter optimization is only applied
>
> when the number of temporary tables exceeds this threshold. Test data also
>
> reveals that with a threshold of 10, barring cases where all temporary
> tables
>
> are implicated in a transaction, there's hardly any performance loss.
>
>
> "Improve PreCommit_on_commit_actions by having it just quit immediately if
>
> XACT_FLAGS_ACCESSEDTEMPNAMESPACE is not set" can only reduce the overhead
> of
>
> traversing the OnCommitItem List but still doesn't address the issue with
>
> temporary table truncation.
>
>
> Looking forward to more suggestions!
>
> Best Regards,
> Fei Changhong
>
>


Re: allow changing autovacuum_max_workers without restarting

2024-04-16 Thread wenhui qiu
Agreed, +1. From a DBA perspective, I would prefer that this parameter be
dynamically modifiable rather than adding a new parameter. What is more
difficult is how to reach the target value smoothly when the setting is
considered too large and needs to be lowered.



Regards

On Tue, 16 Apr 2024 at 01:41, Imseih (AWS), Sami  wrote:

> > Another option could be to just remove the restart-only GUC and hard-code
> > the upper limit of autovacuum_max_workers to 64 or 128 or something.
> While
> > that would simplify matters, I suspect it would be hard to choose an
> > appropriate limit that won't quickly become outdated.
>
> Hardcoded values are usually hard to deal with because they are hidden
> either
> In code or in docs.
>
> > When I thought about this, I considered proposing to add a new GUC for
> > "autovacuum_policy_workers".
>
> > autovacuum_max_workers would be the same as before, requiring a restart
> > to change.  The policy GUC would be the soft limit, changable at runtime
>
> I think autovacuum_max_workers should still be the GUC that controls
> the number of concurrent autovacuums. This parameter is already well
> established and changing the meaning now will be confusing.
>
> I suspect most users will be glad it's now dynamic, but will probably
> be annoyed if it's no longer doing what it's supposed to.
>
> Regards,
>
> Sami
>
>


Re: POC: make mxidoff 64 bits

2024-04-23 Thread wenhui qiu
Hi Maxim Orlov,
   Thank you so much for your tireless work on this. Increasing the WAL size
by a few bytes should have very little impact with today's disk performance.
(Logical replication also increased the WAL volume considerably, yet it is a
milestone feature that the community keeps improving.) I believe removing
PostgreSQL's troublesome transaction ID wraparound would be another milestone
feature, so adding a few bytes is worth it!

Best regards

On Tue, 23 Apr 2024 at 17:37, Heikki Linnakangas  wrote:

> On 23/04/2024 11:23, Maxim Orlov wrote:
> > PROPOSAL
> > Make multixact offsets 64 bit.
>
> +1, this is a good next step and useful regardless of 64-bit XIDs.
>
> > @@ -156,7 +148,7 @@
> >   ((uint32) ((0x % MULTIXACT_MEMBERS_PER_PAGE) + 1))
> >
> >  /* page in which a member is to be found */
> > -#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId)
> MULTIXACT_MEMBERS_PER_PAGE)
> > +#define MXOffsetToMemberPage(xid) ((xid) / (MultiXactOffset)
> MULTIXACT_MEMBERS_PER_PAGE)
> >  #define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) /
> SLRU_PAGES_PER_SEGMENT)
> >
> >  /* Location (byte offset within page) of flag word for a given member */
>
> This is really a bug fix. It didn't matter when TransactionId and
> MultiXactOffset were both typedefs of uint32, but it was always wrong.
> The argument name 'xid' is also misleading.
>
> I think there are some more like that, MXOffsetToFlagsBitShift for example.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>
>
>


Re: Support "Right Semi Join" plan shapes

2024-04-28 Thread wenhui qiu
Hi Richard,
 Thank you so much for your tireless work on this. I see the new version of
the patch improves some of the comments; I think it can be committed in July.


Thanks

On Thu, 25 Apr 2024 at 11:28, Richard Guo  wrote:

> Here is another rebase with a commit message to help review.  I also
> tweaked some comments.
>
> Thanks
> Richard
>


Re: Proposal to Enable/Disable Index using ALTER INDEX

2024-09-10 Thread wenhui qiu
Hi Shayon,
   Thank you for your work on this; I think it is great to have this feature
implemented. I checked the documentation of other databases: both MySQL 8.0
and Oracle support it, while SQL Server requires the index to be rebuilt after
it is disabled. There, disabling an index is effectively equivalent to
dropping it, except that the index's metadata is retained:
https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/invisible-indexes.html
https://learn.microsoft.com/en-us/sql/t-sql/statements/alter-index-transact-sql?view=sql-server-ver16
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/ALTER-INDEX.html
> A disabled index would still receive updates and enforce constraints as
> usual but would not be used for queries. This allows users to assess
> whether an index impacts query performance before deciding to drop it
> entirely.
In MySQL 8.0 and Oracle the setting is "invisible" rather than "disabled", and
the index is always kept up to date, so I would suggest changing the statement
to set the index invisible and visible.



Thanks

On Tue, 10 Sep 2024 at 06:17, David Rowley wrote:

> On Tue, 10 Sept 2024 at 09:39, Shayon Mukherjee  wrote:
> > Adding and removing indexes is a common operation in PostgreSQL. On
> larger databases, however, these operations can be resource-intensive. When
> evaluating the performance impact of one or more indexes, dropping them
> might not be ideal since as a user you may want a quicker way to test their
> effects without fully committing to removing & adding them back again.
> Which can be a time taking operation on larger tables.
> >
> > Proposal:
> > I propose adding an ALTER INDEX command that allows for enabling or
> disabling an index globally. This could look something like:
> >
> > ALTER INDEX index_name ENABLE;
> > ALTER INDEX index_name DISABLE;
> >
> > A disabled index would still receive updates and enforce constraints as
> usual but would not be used for queries. This allows users to assess
> whether an index impacts query performance before deciding to drop it
> entirely.
>
> I personally think having some way to alter an index to stop it from
> being used in query plans would be very useful for the reasons you
> mentioned.  I don't have any arguments against the syntax you've
> proposed.  We'd certainly have to clearly document that constraints
> are still enforced. Perhaps there is some other syntax which would
> self-document slightly better. I just can't think of it right now.
>
> > Implementation:
> > To keep this simple, I suggest toggling the indisvalid flag in pg_index
> during the enable/disable operation.
>
> That's not a good idea as it would allow ALTER INDEX ... ENABLE; to be
> used to make valid a failed concurrently created index.  I think this
> would need a new flag and everywhere in the planner would need to be
> adjusted to ignore indexes when that flag is false.
>
> > Additional Considerations:
> > - Keeping the index up-to-date while it’s disabled seems preferable, as
> it avoids the need to rebuild the index if it’s re-enabled later. The
> alternative would be dropping and rebuilding the index upon re-enabling,
> which I believe would introduce additional overhead in terms of application
> logic & complexity.
>
> I think the primary use case here is to assist in dropping useless
> indexes in a way that can very quickly be undone if the index is more
> useful than thought. If you didn't keep the index up-to-date then that
> would make the feature useless for that purpose.
>
> If we get the skip scan feature for PG18, then there's likely going to
> be lots of people with indexes that they might want to consider
> removing after upgrading. Maybe this is a good time to consider this
> feature as it possibly won't ever be more useful than it will be after
> we get skip scans.
>
> David
>
>
>


Re: Can we rely on the ordering of paths in pathlist?

2024-09-17 Thread wenhui qiu
Hi Richard Guo,
I tried to change the comment; could you help me check whether this is
correct? Many thanks.
-  /*
-  * get_cheapest_parallel_safe_total_inner
-  *  Find the unparameterized parallel-safe path with the least total cost.
-  */
+ /*
+  * get_cheapest_parallel_safe_total_inner
+  *  Skip paths that do not meet the criteria; find the unparameterized
+  *  parallel-safe path with the least total cost, and return NULL if no
+  *  such path exists.
+  */

Thanks

On Wed, 31 Jul 2024 at 09:21, wenhui qiu wrote:

> Hi Richard Guo
> Today is the last day of the commitfest cycle.I think this patch
> should be commented ,Except for the comments, I tested it good to me
>
>
> Thanks
>
> On Thu, 25 Jul 2024 at 16:18, wenhui qiu wrote:
>
>> Hi Richard Guo
>> Is it necessary to add some comments here?
>>
>>
>> + if (!innerpath->parallel_safe ||
>> + !bms_is_empty(PATH_REQ_OUTER(innerpath)))
>> + continue;
>> +
>> + if (matched_path != NULL &&
>> + compare_path_costs(matched_path, innerpath, TOTAL_COST) <= 0)
>> + continue;
>> +
>> + matched_path = innerpath;
>>
>> On Wed, 10 Jan 2024 at 15:08, Richard Guo wrote:
>>
>>>
>>> On Tue, Apr 11, 2023 at 11:43 AM Andy Fan 
>>> wrote:
>>>
>>>> On Tue, Apr 11, 2023 at 11:03 AM Richard Guo 
>>>> wrote:
>>>>
>>>>> As the comment above add_path() says, 'The pathlist is kept sorted by
>>>>> total_cost, with cheaper paths at the front.'  And it seems that
>>>>> get_cheapest_parallel_safe_total_inner() relies on this ordering
>>>>> (without being mentioned in the comments, which I think we should do).
>>>>>
>>>>
>>>> I think the answer for ${subject} should be yes. Per the comments in
>>>> add_partial_path, we have
>>>>
>>>>  * add_partial_path
>>>>  *
>>>>  *  As in add_path, the partial_pathlist is kept sorted with the
>>>> cheapest
>>>>  *  total path in front.  This is depended on by multiple places, which
>>>>  *  just take the front entry as the cheapest path without searching.
>>>>  *
>>>>
>>>
>>> I'm not sure about this conclusion.  Surely we can depend on that the
>>> partial_pathlist is kept sorted by total_cost ASC.  This is emphasized
>>> in the comment of add_partial_path, and also leveraged in practice, such
>>> as in many places we just use linitial(rel->partial_pathlist) as the
>>> cheapest partial path.
>>>
>>> But get_cheapest_parallel_safe_total_inner works on pathlist not
>>> partial_pathlist.  And for pathlist, I'm not sure if it's a good
>>> practice to depend on its ordering.  Because
>>>
>>> 1) the comment of add_path only mentions that add_path_precheck relies
>>> on this ordering, but it does not mention that other functions also do;
>>>
>>> 2) other than add_path_precheck, I haven't observed any other functions
>>> that rely on this ordering.  The only exception as far as I notice is
>>> get_cheapest_parallel_safe_total_inner.
>>>
>>> On the other hand, if we declare that we can rely on the pathlist being
>>> sorted in ascending order by total_cost, we should update the comment
>>> for add_path to highlight this aspect.  We should also include a comment
>>> for get_cheapest_parallel_safe_total_inner to clarify why an early exit
>>> is possible, similar to what we do for add_path_precheck.  Additionally,
>>> in several places, we can optimize our code by taking advantage of this
>>> fact and terminate the iteration through the pathlist early when looking
>>> for the cheapest path of a certain kind.
>>>
>>> Thanks
>>> Richard
>>>
>>


Re: Can we rely on the ordering of paths in pathlist?

2024-07-25 Thread wenhui qiu
Hi Richard Guo
Is it necessary to add some comments here?


+ if (!innerpath->parallel_safe ||
+ !bms_is_empty(PATH_REQ_OUTER(innerpath)))
+ continue;
+
+ if (matched_path != NULL &&
+ compare_path_costs(matched_path, innerpath, TOTAL_COST) <= 0)
+ continue;
+
+ matched_path = innerpath;
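
For readers, here is the same loop with some reconstructed context and
explanatory comments added (the surrounding foreach and variable declarations
are my own paraphrase of get_cheapest_parallel_safe_total_inner, not the exact
patch text):

```
Path	   *matched_path = NULL;
ListCell   *l;

foreach(l, paths)
{
	Path	   *innerpath = (Path *) lfirst(l);

	/* consider only parallel-safe, unparameterized paths */
	if (!innerpath->parallel_safe ||
		!bms_is_empty(PATH_REQ_OUTER(innerpath)))
		continue;

	/* keep the path with the lowest total cost seen so far */
	if (matched_path != NULL &&
		compare_path_costs(matched_path, innerpath, TOTAL_COST) <= 0)
		continue;

	matched_path = innerpath;
}
```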

On Wed, 10 Jan 2024 at 15:08, Richard Guo wrote:

>
> On Tue, Apr 11, 2023 at 11:43 AM Andy Fan 
> wrote:
>
>> On Tue, Apr 11, 2023 at 11:03 AM Richard Guo 
>> wrote:
>>
>>> As the comment above add_path() says, 'The pathlist is kept sorted by
>>> total_cost, with cheaper paths at the front.'  And it seems that
>>> get_cheapest_parallel_safe_total_inner() relies on this ordering
>>> (without being mentioned in the comments, which I think we should do).
>>>
>>
>> I think the answer for ${subject} should be yes. Per the comments in
>> add_partial_path, we have
>>
>>  * add_partial_path
>>  *
>>  *  As in add_path, the partial_pathlist is kept sorted with the cheapest
>>  *  total path in front.  This is depended on by multiple places, which
>>  *  just take the front entry as the cheapest path without searching.
>>  *
>>
>
> I'm not sure about this conclusion.  Surely we can depend on that the
> partial_pathlist is kept sorted by total_cost ASC.  This is emphasized
> in the comment of add_partial_path, and also leveraged in practice, such
> as in many places we just use linitial(rel->partial_pathlist) as the
> cheapest partial path.
>
> But get_cheapest_parallel_safe_total_inner works on pathlist not
> partial_pathlist.  And for pathlist, I'm not sure if it's a good
> practice to depend on its ordering.  Because
>
> 1) the comment of add_path only mentions that add_path_precheck relies
> on this ordering, but it does not mention that other functions also do;
>
> 2) other than add_path_precheck, I haven't observed any other functions
> that rely on this ordering.  The only exception as far as I notice is
> get_cheapest_parallel_safe_total_inner.
>
> On the other hand, if we declare that we can rely on the pathlist being
> sorted in ascending order by total_cost, we should update the comment
> for add_path to highlight this aspect.  We should also include a comment
> for get_cheapest_parallel_safe_total_inner to clarify why an early exit
> is possible, similar to what we do for add_path_precheck.  Additionally,
> in several places, we can optimize our code by taking advantage of this
> fact and terminate the iteration through the pathlist early when looking
> for the cheapest path of a certain kind.
>
> Thanks
> Richard
>


Re: Add 64-bit XIDs into PostgreSQL 15

2024-07-25 Thread wenhui qiu
> Maybe a setting similar to max_wal_size could be better for that?
+1

Thanks

On Thu, 25 Jul 2024 at 21:31, Peter Eisentraut wrote:

> On 25.07.24 13:09, Heikki Linnakangas wrote:
> >> However, if there is no more disaster threshold at 2^31, what is the
> >> guidance for setting these?  Or more radically, why even run
> >> transaction-count-based vacuum at all?
> >
> > To allow the CLOG to be truncated. There's no disaster anymore, but
> > without freezing, the clog will grow indefinitely.
>
> Maybe a setting similar to max_wal_size could be better for that?
>
>
>
>


Re: Can we rely on the ordering of paths in pathlist?

2024-07-30 Thread wenhui qiu
Hi Richard Guo,
Today is the last day of the commitfest cycle. I think this patch still needs
the comments addressed; apart from the comments, I tested it and it looks good
to me.


Thanks

On Thu, 25 Jul 2024 at 16:18, wenhui qiu wrote:

> Hi Richard Guo
> Is it necessary to add some comments here?
>
>
> + if (!innerpath->parallel_safe ||
> + !bms_is_empty(PATH_REQ_OUTER(innerpath)))
> + continue;
> +
> + if (matched_path != NULL &&
> + compare_path_costs(matched_path, innerpath, TOTAL_COST) <= 0)
> + continue;
> +
> + matched_path = innerpath;
>
> On Wed, 10 Jan 2024 at 15:08, Richard Guo wrote:
>
>>
>> On Tue, Apr 11, 2023 at 11:43 AM Andy Fan 
>> wrote:
>>
>>> On Tue, Apr 11, 2023 at 11:03 AM Richard Guo 
>>> wrote:
>>>
>>>> As the comment above add_path() says, 'The pathlist is kept sorted by
>>>> total_cost, with cheaper paths at the front.'  And it seems that
>>>> get_cheapest_parallel_safe_total_inner() relies on this ordering
>>>> (without being mentioned in the comments, which I think we should do).
>>>>
>>>
>>> I think the answer for ${subject} should be yes. Per the comments in
>>> add_partial_path, we have
>>>
>>>  * add_partial_path
>>>  *
>>>  *  As in add_path, the partial_pathlist is kept sorted with the cheapest
>>>  *  total path in front.  This is depended on by multiple places, which
>>>  *  just take the front entry as the cheapest path without searching.
>>>  *
>>>
>>
>> I'm not sure about this conclusion.  Surely we can depend on that the
>> partial_pathlist is kept sorted by total_cost ASC.  This is emphasized
>> in the comment of add_partial_path, and also leveraged in practice, such
>> as in many places we just use linitial(rel->partial_pathlist) as the
>> cheapest partial path.
>>
>> But get_cheapest_parallel_safe_total_inner works on pathlist not
>> partial_pathlist.  And for pathlist, I'm not sure if it's a good
>> practice to depend on its ordering.  Because
>>
>> 1) the comment of add_path only mentions that add_path_precheck relies
>> on this ordering, but it does not mention that other functions also do;
>>
>> 2) other than add_path_precheck, I haven't observed any other functions
>> that rely on this ordering.  The only exception as far as I notice is
>> get_cheapest_parallel_safe_total_inner.
>>
>> On the other hand, if we declare that we can rely on the pathlist being
>> sorted in ascending order by total_cost, we should update the comment
>> for add_path to highlight this aspect.  We should also include a comment
>> for get_cheapest_parallel_safe_total_inner to clarify why an early exit
>> is possible, similar to what we do for add_path_precheck.  Additionally,
>> in several places, we can optimize our code by taking advantage of this
>> fact and terminate the iteration through the pathlist early when looking
>> for the cheapest path of a certain kind.
>>
>> Thanks
>> Richard
>>
>


Re: bgwrite process is too lazy

2024-10-06 Thread wenhui qiu
Hi Andres Freund,
  Thank you for the explanation.

> I doubt that slowdown is caused by bgwriter not being active enough. I suspect
> what you're seeing is one or more of:
>
> a) The overhead of doing full page writes (due to increasing the WAL
>    volume). You could verify whether that's the case by turning
>    full_page_writes off (but note that that's not generally safe!) or see if
>    the overhead shrinks if you set wal_compression=zstd or wal_compression=lz4
>    (don't use pglz, it's too slow).
>
> b) The overhead of renaming WAL segments during recycling. You could see if
>    this is related by specifying --wal-segsize 512 or such during initdb.
I am aware of these optimizations, but they only mitigate the impact; I
deliberately did not turn on WAL compression during the stress test.

shared_buffers = '32GB'
bgwriter_delay = '10ms'
bgwriter_lru_maxpages = '8192'
bgwriter_lru_multiplier = '10.0'
wal_buffers = '64MB'
checkpoint_completion_target = '0.999'
checkpoint_timeout = '600'
max_wal_size = '32GB'
min_wal_size = '16GB'

I think in business scenarios with many reads and few writes it is indeed
desirable to keep as many dirty pages in memory as possible. However, in
scenarios such as push systems and task-scheduling systems, which also have
heavy reads and writes, the impact of checkpoints is more obvious. An adaptive
bgwriter, or a bgwriter triggered when dirty pages reach a certain watermark,
would remove the performance jitter caused by checkpoints. From what I
understand, quite a few commercial databases based on PostgreSQL have added
such an adaptive dirty-page flushing feature, and their internal reports show
the whole stress-test run staying very smooth. Since it is a trade secret, I
don't know how they implemented it.


Re: New PostgreSQL Contributors

2024-10-04 Thread wenhui qiu
Congratulations to all!


On Fri, 4 Oct 2024 at 22:50, Nathan Bossart 
wrote:

> On Fri, Oct 04, 2024 at 03:55:23PM +0200, Christoph Berg wrote:
> > New PostgreSQL Contributors:
> >
> > * Antonin Houska
> > * Ants Aasma
> > * Georgios Kokolatos
> > * Henrietta Dombrovskaya
> > * Ian Lawrence Barwick
> > * Jelte Fennema-Nio
> > * Karen Jex
> > * Pavlo Golub
> > * Zhang Mingli
> >
> > New PostgreSQL Major Contributors:
> >
> > * Bertrand Drouvot
> > * Laurenz Albe
> > * Richard Guo
>
> Congratulations to all!
>
> --
> nathan
>
>
>


Re: bgwrite process is too lazy

2024-10-03 Thread wenhui qiu
Hi  Andres

> It's implied, but to make it more explicit: One big efficiency advantage of
> writes by checkpointer is that they are sorted and can often be combined into
> larger writes. That's often a lot more efficient: For network attached storage
> it saves you iops, for local SSDs it's much friendlier to wear leveling.

Thank you for the explanation. I think bgwriter can also merge I/O: it writes
asynchronously into the file-system cache, and the OS schedules the writeback.



> Another aspect is that checkpointer's writes are much easier to pace over time
> than e.g. bgwriters, because bgwriter is triggered by a fairly short term
> signal.  Eventually we'll want to combine writes by bgwriter too, but that's
> always going to be more expensive than doing it in a large batched fashion
> like checkpointer does.

> I think we could improve checkpointer's pacing further, fwiw, by taking into
> account that the WAL volume at the start of a spread-out checkpoint typically
> is bigger than at the end.

I'm also very keen to improve checkpoints. Whenever I run a stress test,
bgwriter writes no dirty pages as long as the data set is smaller than
shared_buffers; before the first checkpoint the benchmark TPS is stable and at
its highest for the whole run. Other databases flush dirty pages at a certain
frequency, at fixed intervals, or at dirty-page watermarks, and checkpoints
have a much smaller impact on their performance.


Thanks


On Fri, 4 Oct 2024 at 03:40, Andres Freund wrote:

> Hi,
>
> On 2024-10-02 18:36:44 +0200, Tomas Vondra wrote:
> > On 10/2/24 17:02, Tony Wayne wrote:
> > >
> > >
> > > On Wed, Oct 2, 2024 at 8:14 PM Laurenz Albe  wrote:
> > >
> > > On Wed, 2024-10-02 at 16:48 +0800, wenhui qiu wrote:
> > > > Whenever I check the checkpoint information in a log, most dirty
> > > pages are written by the checkpoint process
> > >
> > > That's exactly how it should be!
> > >
> > > is it because if bgwriter frequently flushes, the disk io will be
> more?🤔
> >
> > Yes, pretty much. But it's also about where the writes happen.
> >
> > Checkpoint flushes dirty buffers only once per checkpoint interval,
> > which is the lowest amount of write I/O that needs to happen.
> >
> > Every other way of flushing buffers is less efficient, and is mostly a
> > sign of memory pressure (shared buffers not large enough for active part
> > of the data).
>
> It's implied, but to make it more explicit: One big efficiency advantage of
> writes by checkpointer is that they are sorted and can often be combined
> into
> larger writes. That's often a lot more efficient: For network attached
> storage
> it saves you iops, for local SSDs it's much friendlier to wear leveling.
>
>
> > But it's also happens about where the writes happen. Checkpoint does
> > that in the background, not as part of regular query execution. What we
> > don't want is for the user backends to flush buffers, because it's
> > expensive and can cause result in much higher latency.
> >
> > The bgwriter is somewhere in between - it's happens in the background,
> > but may not be as efficient as doing it in the checkpointer. Still much
> > better than having to do this in regular backends.
>
> Another aspect is that checkpointer's writes are much easier to pace over
> time
> than e.g. bgwriters, because bgwriter is triggered by a fairly short term
> signal.  Eventually we'll want to combine writes by bgwriter too, but
> that's
> always going to be more expensive than doing it in a large batched fashion
> like checkpointer does.
>
> I think we could improve checkpointer's pacing further, fwiw, by taking
> into
> account that the WAL volume at the start of a spread-out checkpoint
> typically
> is bigger than at the end.
>
> Greetings,
>
> Andres Freund
>
>
>


optimize the value of vacthresh and anlthresh

2024-11-04 Thread wenhui qiu
Hi hackers
A few days ago I was looking at the SQL Server documentation and found that
SQL Server optimized its algorithm for deciding when to update statistics in
the 2016 version. I think we can learn from SQL Server's implementation to
improve how autovacuum is triggered on large tables. The document link:
https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16
Following the SQL Server approach, I have submitted a patch (please see the
attachment) and hope to get help from expert hackers.


0001-optimize-the-value-of-vacthresh-and-anlthresh.patch
Description: Binary data


Re: POC: make mxidoff 64 bits

2024-10-24 Thread wenhui qiu
Hi Maxim Orlov
> After a bit of thought, I've realized that to be conservative here is the
way to go.
>We can reuse a maximum of existing logic. I mean, we can remove offset
wraparound "error logic" and reuse "warning logic". But set the threshold
for "warning >logic" to a much higher value. For now, I choose 2^32-1. In
other world, legit logic, in my view, here would be to trigger autovacuum
if the number of offsets (i.e. >difference nextOffset - oldestOffset)
exceeds 2^32-1. PFA patch set.
Good point, couldn't agree with you more. xid64 is the solution to the
wraparound problem, so the previous error message is no longer meaningful. But
we might want to refine the warning output a little (for example, hint at
checking the underlying reasons why the age keeps increasing), even though we
no longer have to worry about xid wraparound.

+/*
+ * Multixact members warning threshold.
+ *
+ * If difference bettween nextOffset and oldestOffset exceed this value, we
+ * trigger autovacuum in order to release the disk space if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0x)
Can we refine this comment a bit? For example:
+/*
+ * Multixact members warning threshold.
+ *
+ * If the difference between nextOffset and oldestOffset exceeds this value, we
+ * trigger autovacuum in order to release the disk space and reduce table
+ * bloat if possible.
+ */
+#define MULTIXACT_MEMBER_AUTOVAC_THRESHOLD UINT64CONST(0x)

Thanks

Maxim Orlov  于2024年10月23日周三 23:55写道:

> After a bit of thought, I've realized that to be conservative here is the
> way to go.
>
> We can reuse a maximum of existing logic. I mean, we can remove offset
> wraparound "error logic" and reuse "warning logic". But set the threshold
> for "warning logic" to a much higher value. For now, I choose 2^32-1. In
> other world, legit logic, in my view, here would be to trigger autovacuum
> if the number of offsets (i.e. difference nextOffset - oldestOffset)
> exceeds 2^32-1. PFA patch set.
>
> --
> Best regards,
> Maxim Orlov.
>


Re: Changing the default random_page_cost value

2024-10-24 Thread wenhui qiu
Hi Greg Sabino Mullane
Another thing: simply changing the configuration template is not effective by
itself; the DEFAULT_RANDOM_PAGE_COST constant needs to be modified as well:
{
{"random_page_cost", PGC_USERSET, QUERY_TUNING_COST,
gettext_noop("Sets the planner's estimate of the cost of a "
"nonsequentially fetched disk page."),
NULL,
GUC_EXPLAIN
},
&random_page_cost,
DEFAULT_RANDOM_PAGE_COST, 0, DBL_MAX,
NULL, NULL, NULL
},

src/include/optimizer/cost.h
/* defaults for costsize.c's Cost parameters */
/* NB: cost-estimation code should use the variables, not these constants!
*/
/* If you change these, update backend/utils/misc/postgresql.conf.sample */
#define DEFAULT_SEQ_PAGE_COST 1.0
#define DEFAULT_RANDOM_PAGE_COST 4.0
#define DEFAULT_CPU_TUPLE_COST 0.01
#define DEFAULT_CPU_INDEX_TUPLE_COST 0.005
#define DEFAULT_CPU_OPERATOR_COST 0.0025
#define DEFAULT_PARALLEL_TUPLE_COST 0.1
#define DEFAULT_PARALLEL_SETUP_COST 1000.0

Thanks

>
>
>
>


Re: New GUC autovacuum_max_threshold ?

2024-11-13 Thread wenhui qiu
Hi
> In any case, we should do the tests that Robert suggested and/or come up
> with a good mathematical model, because we are in the dark at the moment.
I think SQL Server has given us great inspiration
>I think we should indeed provide a retro-compatible behaviour (so maybe
> another GUC after all).
I am ready to implement a new GUC parameter that lets database administrators
choose the calculation method (with the original formula as the default).


Frédéric Yhuel  于2024年11月13日周三 18:03写道:

>
>
> On 11/9/24 16:59, Nathan Bossart wrote:
> > AFAICT the main advantage of these formulas is that you don't need
> another
> > GUC, but they also makes the existing ones more difficult to configure.
>
> I wouldn't say that's the main advantage. It doesn't seem very clean to
> me to cap to a fixed value. Because you could take Robert's
> demonstration with a bigger table, and come to the same conclusion:
>
> Let's compare the current situation to the situation post-Nathan's-patch
> with a cap of 100M. Consider a table 100 times larger than the one of
> Robert's previous example, so pgbench scale factor 2_560_000, size on
> disk 32TB.
> Currently, that table will be vacuumed for bloat when the number of
> dead tuples exceeds 20% of the table size, because that's the default
> value of autovacuum_vacuum_scale_factor. The table has 256 billion
> tuples, so that means that we're going to vacuum it when there are
> more than 51 billion dead tuples. Post-patch, we will vacuum when we
> have 100 million dead tuples. Suppose a uniform workload that slowly
> updates rows in the table. If we were previously autovacuuming the
> table once per day (1440 minutes) we're now going to try to vacuum it
> almost every minute (1440 minutes / 512 = 168 seconds).
>
> (compare with every 55 min with my formula)
>
> Of course, this a theoretical example that is probably unrealistic. I
> don't know, really. I don't know if Robert's example was realistic in
> the first place.
>
> In any case, we should do the tests that Robert suggested and/or come up
> with a good mathematical model, because we are in the dark at the moment.
>
> > Plus, there's no way to go back to the existing behavior.
>
> I think we should indeed provide a retro-compatible behaviour (so maybe
> another GUC after all).
>
>


Re: New GUC autovacuum_max_threshold ?

2024-11-10 Thread wenhui qiu
Hi Nathan Bossart
> AFAICT the main advantage of these formulas is that you don't need another
> GUC, but they also makes the existing ones more difficult to configure.
> Plus, there's no way to go back to the existing behavior.
That problem does exist, but I think the formula should not be linear in the
first place. SQL Server implemented and optimized this eight years ago, and I
think we can definitely draw on its experience. Many people may worry that more
frequent vacuums will hurt I/O performance, but here too we can learn from how
SQL Server handles its statistics updates.

Nathan Bossart  于2024年11月9日周六 23:59写道:

> On Sat, Nov 09, 2024 at 10:08:51PM +0800, wenhui qiu wrote:
> >   Sorry ,I forgot to explain the reason in my last email,In fact, I
> > submitted the patch to the community,(frederic.yh...@dalibo.com) told me
> > there has a same idea ,so ,
> >   Let me explain those two formulas here,about (   vacthresh =
> (float4)
> > fmin(vac_base_thresh + (vac_scale_factor * reltuples), sqrt(1000.0 *
> > reltuples));   A few days ago, I was looking at the sql server
> > documentation and found that sql server has optimized the algorithm
> related
> > to updating statistics in the 2016 ,version,I think we can also learn
> from
> > the implementation method of sql server to optimize the problem of
> > automatic vacuum triggered by large tables,The Document link(
> >
> https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16
> > ),about ( vacthresh =  (float4) fmin(vac_base_thresh + vac_scale_factor *
> >  reltuples,vac_base_thresh+ vac_scale_factor * log2(reltuples) *
> 1);)I
> > came to the conclusion by trying to draw a function graph,I personally
> > think it is a smooth formula
>
> AFAICT the main advantage of these formulas is that you don't need another
> GUC, but they also makes the existing ones more difficult to configure.
> Plus, there's no way to go back to the existing behavior.
>
> --
> nathan
>


Re: [PATCH] Support Int64 GUCs

2024-09-26 Thread wenhui qiu
 Hi Alexander
  I think we need int64 GUCs because of parameters such as
autovacuum_freeze_table_age and autovacuum_freeze_max_age: when a table's age
exceeds either of them, an aggressive vacuum is performed. Once we implement
xid64, does it still make sense to confine these to the int range? By the way,
I would also suggest logging a warning when a table's age exceeds the int32
maximum. We could then pick reasonable defaults for these parameters, for
example autovacuum_freeze_max_age = 4294967295 or 8589934592.
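
Just to illustrate the kind of warning I have in mind (the variable names
below are my own placeholders, not code from any existing patch), something
along these lines during vacuum:

    /* Illustrative sketch only; relfrozenxid_age and rel are placeholders. */
    if ((int64) relfrozenxid_age > (int64) PG_INT32_MAX)
        ereport(WARNING,
                (errmsg("relation \"%s\" has an age of %lld, beyond the old 32-bit xid range",
                        RelationGetRelationName(rel),
                        (long long) relfrozenxid_age),
                 errhint("Check why autovacuum has not been able to freeze this table.")));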


Thanks

Alexander Korotkov  于2024年9月26日周四 02:05写道:

> Hi, Tom!
>
> On Wed, Sep 25, 2024 at 6:08 PM Tom Lane  wrote:
> > FWIW, I agree with the upthread opinions that we shouldn't do this
> > (invent int64 GUCs).  I don't think we need the added code bloat
> > and risk of breaking user code that isn't expecting this new GUC
> > type.  We invented the notion of GUC units in part to ensure that
> > int32 GUCs could be adapted to handle potentially-large numbers.
> > And there's always the fallback position of using a float8 GUC
> > if you really feel you need a wider range.
>
> Thank you for your feedback.
> Do you think we don't need int64 GUCs just now, when 64-bit
> transaction ids are far from committable shape?  Or do you think we
> don't need int64 GUCs even if we have 64-bit transaction ids?  If yes,
> what do you think we should use for *_age variables with 64-bit
> transaction ids?
>
> --
> Regards,
> Alexander Korotkov
> Supabase
>
>
>


Re: Patch: Show queries of processes holding a lock

2024-10-01 Thread wenhui qiu
Hi Alexey Orlov
 Thank you for your work on this patch. The lock information is recorded in
detail, which makes it easy to trace the lock contention of that moment from
the log. But I have a concern: frequent calls to
pgstat_get_backend_current_activity() under heavy lock contention or high
concurrency may cause performance degradation, especially when processes
frequently enter and leave lock waits. Could you add a GUC parameter to turn
this feature on or off? After all, even setting the default of log_lock_waits
to on raised concerns from many people in the community (
https://commitfest.postgresql.org/49/4718/).
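
Just to illustrate the idea (the enum values and variable names below are my
own placeholders, not anything from Alexey's patch), log_lock_waits could even
become an enum so that the holder's query is fetched only in a verbose mode,
along the lines of this guc_tables.c-style fragment:

    /* Sketch only; the enum values and variable are placeholders. */
    static const struct config_enum_entry log_lock_waits_options[] = {
        {"off", LOG_LOCK_WAITS_OFF, false},
        {"on", LOG_LOCK_WAITS_ON, false},
        {"verbose", LOG_LOCK_WAITS_VERBOSE, false}, /* also log the holder's query */
        {NULL, 0, false}
    };

    {
        {"log_lock_waits", PGC_SUSET, LOGGING_WHAT,
            gettext_noop("Logs long lock waits, optionally with the lock holder's query."),
            NULL
        },
        &log_lock_waits_mode,
        LOG_LOCK_WAITS_OFF, log_lock_waits_options,
        NULL, NULL, NULL
    },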



Thanks

Alexey Orlov  于2024年10月1日周二 16:04写道:

> Hi, there!
>
> I created patch improving the log messages generated by
> log_lock_waits.
>
> Sample output (log_lock_waits=on required):
>
> session 1:
> CREATE TABLE foo (val integer);
> INSERT INTO foo (val) VALUES (1);
> BEGIN;
> UPDATE foo SET val = 3;
>
> session 2:
> BEGIN;
> UPDATE TABLE foo SET val = 2;
>
> Output w/o patch:
>
> LOG: process 3133043 still waiting for ShareLock on transaction 758
> after 1000.239 ms
> DETAIL: Process holding the lock: 3132855. Wait queue: 3133043.
> CONTEXT: while updating tuple (0,7) in relation "foo"
> STATEMENT: update foo SET val = 2;
>
> Output with path
>
> LOG: process 3133043 still waiting for ShareLock on transaction 758
> after 1000.239 ms
> DETAIL: Process holding the lock: 3132855. Wait queue: 3133043.
> Process 3132855: update foo SET val = 3;
> CONTEXT: while updating tuple (0,7) in relation "foo"
> STATEMENT: update foo SET val = 2;
>
> As you can see information about query that holds the lock goes into log.
>
> If this approach proves unacceptable, we can make the log_lock_waits
> parameter as an enum
> and display the query if the log_lock_waits=verbose (for example).
>
> What do you think?
>
> Regards,
>
> --
> Orlov Alexey
>


Re: bgwrite process is too lazy

2024-10-02 Thread wenhui qiu
Hi Tomas
 Thank you for explaining. If we do not change that behaviour, could we at
least lower the minimum value of bgwriter_delay to 2ms, taking
autovacuum_vacuum_cost_delay as a reference? That would let the bgwriter write
more dirty pages and reduce the performance impact when a checkpoint occurs.
After all, stretching out checkpoints also stretches out crash recovery, so
there is a balance to find and checkpoint_timeout / max_wal_size cannot be
increased indefinitely.
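
To put rough numbers on it (assuming the default bgwriter_lru_maxpages of 100
and 8kB pages): at the current 10ms floor for bgwriter_delay the bgwriter can
flush at most 100 pages x 100 rounds/s = 10,000 pages/s, about 78MB/s; with a
2ms floor that ceiling becomes 100 x 500 = 50,000 pages/s, about 390MB/s,
which gives it a much better chance of absorbing dirty pages ahead of the
checkpointer on fast storage.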



Thanks

Tomas Vondra  于2024年10月3日周四 00:36写道:

> On 10/2/24 17:02, Tony Wayne wrote:
> >
> >
> > On Wed, Oct 2, 2024 at 8:14 PM Laurenz Albe  > <mailto:laurenz.a...@cybertec.at>> wrote:
> >
> >     On Wed, 2024-10-02 at 16:48 +0800, wenhui qiu wrote:
> > > Whenever I check the checkpoint information in a log, most dirty
> > pages are written by the checkpoint process
> >
> > That's exactly how it should be!
> >
> > is it because if bgwriter frequently flushes, the disk io will be
> more?🤔
>
> Yes, pretty much. But it's also about where the writes happen.
>
> Checkpoint flushes dirty buffers only once per checkpoint interval,
> which is the lowest amount of write I/O that needs to happen.
>
> Every other way of flushing buffers is less efficient, and is mostly a
> sign of memory pressure (shared buffers not large enough for active part
> of the data).
>
> But it's also happens about where the writes happen. Checkpoint does
> that in the background, not as part of regular query execution. What we
> don't want is for the user backends to flush buffers, because it's
> expensive and can cause result in much higher latency.
>
> The bgwriter is somewhere in between - it's happens in the background,
> but may not be as efficient as doing it in the checkpointer. Still much
> better than having to do this in regular backends.
>
>
> regards
>
> --
> Tomas Vondra
>
>
>


Re: optimize the value of vacthresh and anlthresh

2024-11-06 Thread wenhui qiu
Hi Frédéric
Many thanks for your email. I'll go and take a look.

On Wed, 6 Nov 2024 at 17:47, Frédéric Yhuel 
wrote:

>
>
> On 11/4/24 09:30, wenhui qiu wrote:
> > Hi hackers
> >  A few days ago, I was looking at the sql server documentation and
> > found that sql server has optimized the algorithm related to updating
> > statistics in the 2016 ,version,I think we can also learn from the
> > implementation method of sql server to optimize the problem of automatic
> > vacuum triggered by large tables,The Document
> > link(
> https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16
> <
> https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16>),I
> refer to the sql server implementation method, submit a patch, please see
> the attachment, hope to get the help of expert hackers
>
> similar idea here :
>
> https://www.postgresql.org/message-id/6a2ac9b7-6535-4bb1-8274-0647f7c31c82%40dalibo.com
>


Re: New GUC autovacuum_max_threshold ?

2024-11-06 Thread wenhui qiu
Hi frederic.yhuel

> Thank you. FWIW, I would prefer a sub-linear growth, so maybe something
> like this

>   vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
>   vac_base_thresh + vac_scale_factor * pow(reltuples, 0.7) * 100);

>   This would give :

>   * 386M (instead of 5.1 billion currently) for a 25.6 billion tuples
table ;
>   * 77M for a 2.56 billion tuples table (Robert's example) ;
>   * 15M (instead of 51M currently) for a 256M tuples table ;
>   * 3M (instead of 5M currently) for a 25.6M tuples table.
> The other advantage is that you don't need another GUC.
Agreed, we just need to change the calculation formula, but I prefer a formula
like the ones below because they give a smoother value:

vacthresh = (float4) fmin(vac_base_thresh + vac_scale_factor * reltuples,
                          vac_base_thresh + vac_scale_factor * log2(reltuples) * 1);
or
vacthresh = (float4) fmin(vac_base_thresh + vac_scale_factor * reltuples,
                          sqrt(1000.0 * reltuples));
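
To illustrate, with the default autovacuum_vacuum_scale_factor = 0.2 and
autovacuum_vacuum_threshold = 50, the sqrt variant grows much more slowly than
the current linear formula:

    reltuples = 10,000,000:     linear = 2,000,050       sqrt(1000 * reltuples) = 100,000
    reltuples = 1,000,000,000:  linear = 200,000,050     sqrt(1000 * reltuples) = 1,000,000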

Frédéric Yhuel  于2024年8月12日周一 21:41写道:

>
>
> On 8/7/24 23:39, Nathan Bossart wrote:
> > I've attached a new patch to show roughly what I think this new GUC
> should
> > look like.  I'm hoping this sparks more discussion, if nothing else.
> >
>
> Thank you. FWIW, I would prefer a sub-linear growth, so maybe something
> like this:
>
> vacthresh = Min(vac_base_thresh + vac_scale_factor * reltuples,
> vac_base_thresh + vac_scale_factor * pow(reltuples, 0.7) * 100);
>
> This would give :
>
> * 386M (instead of 5.1 billion currently) for a 25.6 billion tuples table ;
> * 77M for a 2.56 billion tuples table (Robert's example) ;
> * 15M (instead of 51M currently) for a 256M tuples table ;
> * 3M (instead of 5M currently) for a 25.6M tuples table.
>
> The other advantage is that you don't need another GUC.
>
> > On Tue, Jun 18, 2024 at 12:36:42PM +0200, Frédéric Yhuel wrote:
> >> By the way, I wonder if there were any off-list discussions after
> Robert's
> >> conference at PGConf.dev (and I'm waiting for the video of the conf).
> >
> > I don't recall any discussions about this idea, but Robert did briefly
> > mention it in his talk [0].
> >
> > [0] https://www.youtube.com/watch?v=RfTD-Twpvac
> >
>
> Very interesting, thanks!
>
>
>


Re: pg_stat_statements: Avoid holding excessive lock

2024-11-08 Thread wenhui qiu
Hi Karina Litskevich
> + /*
> +  * There is no need to hold entry->mutex when reading
stats_since and
> +  * minmax_stats_since for (unlike counters) they are always
written
> +  * while holding pgss->lock exclusively. We are holding
pgss->lock
> +  * shared so there should be no race here.
> +  */
>   stats_since = entry->stats_since;
>   minmax_stats_since = entry->minmax_stats_since;
> - SpinLockRelease(&entry->mutex);

>> The comment could be simpler, say a "The spinlock is not required when
>> reading these two as they are always updated when holding pgss->lock
>> exclusively.".  Or something like that.
Agreed. It reduces the time the lock is held, and the new comment is short and
concise. Sounds good.

Michael Paquier  于2024年11月8日周五 14:08写道:

> On Thu, Nov 07, 2024 at 04:08:30PM +0300, Karina Litskevich wrote:
> > Thank you for your feedback and the shorter wording of the comment.
> > I used it in the new version of the patch.
>
> After a second look, sounds good to me.  Let's wait a bit and see of
> others have comments or thoughts to share.
> --
> Michael
>


Re: Auto Vacuum optimisation

2024-11-27 Thread wenhui qiu
Hi Japin
> Worth a try.
> However, I don't think we need to based on {vac,anl}_scale_factor when
using
> the sqrt algorithm.
  Here is a new patch. I removed the log2 algorithm and the DEBUG2 log, kept
the enumeration type consistent, and the sqrt algorithm is still based on
{vac,anl}_scale_factor. The main purpose is to give DBAs the flexibility to
make small adjustments. I am also referencing another discussion and trying to
stay in line with it.
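
Roughly, the shape I have in mind is the following (just a sketch; the GUC
name and enum values are placeholders and the actual patch may differ):

    /* Sketch only; autovacuum_vacuum_threshold_mode and its values are placeholders. */
    switch (autovacuum_vacuum_threshold_mode)
    {
        case VACUUM_THRESHOLD_MODE_DEFAULT:
            vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
            break;
        case VACUUM_THRESHOLD_MODE_SQRT:
            /* sub-linear growth, still shaped by the per-table scale factor */
            vacthresh = (float4) vac_base_thresh +
                vac_scale_factor * sqrt(1000.0 * reltuples);
            break;
    }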

On Thu, Nov 28, 2024 at 1:46 PM Japin Li  wrote:

> On Wed, 27 Nov 2024 at 15:51, wenhui qiu  wrote:
> > HI Japin Li
> >  Thank you for you reply, Do you think that removing the log2
> algorithm because it will be more frequent,I think sql
> > server chose sqrt algorithm after their long-term verification, I think
> we need to retain sqrt algorithm,The relevant
> > documents are here(
> >
> https://learn.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver16),I
> will provide a
> > new patch later with your  reply
> >
>
> Worth a try.
>
> However, I don't think we need to based on {vac,anl}_scale_factor when
> using
> the sqrt algorithm.
>
> --
> Regrads,
> Japin Li
>


0002-Add-guc-to-optimize-the-autovacuum-tigger-algorithm.patch
Description: Binary data


Re: A way to build PSQL 17.1 source on AIX platform

2024-11-19 Thread wenhui qiu
No way, unfortunately: AIX support has been removed. You can see this thread:
https://commitfest.postgresql.org/50/5003/

On Tue, 19 Nov 2024 at 23:26, Raghu Dev Ramaiah 
wrote:

> Hi
>
> I am getting error while building PSQL 17.1 on AIX box-
>
> bash-5.1# uname -a
> AIX AIXLPHPOM01 1 7 00F8B83A4C00
>
> bash-5.1# ./configure --prefix=/home/raghu/postgres/psql17/build
> --without-readline
> checking build system type... powerpc-ibm-aix7.1.4.0
> checking host system type... powerpc-ibm-aix7.1.4.0
> checking which template to use... configure: error:
> ***
> PostgreSQL has apparently not been ported to your platform yet.
> To try a manual configuration, look into the src/template directory
> for a similar platform and use the '--with-template=' option.
>
> Please also contact  to see about
> rectifying this.  Include the above 'checking host system type...'
> line.
> ***
>
> Is there an alternate way I can build PSQL 17.1 source code on AIX
> platform? Please let me know..thanks.
>
> Regards
> Raghu
>
>
>


Auto Vacuum optimisation

2024-11-18 Thread wenhui qiu
Hi hackers and Japin Li
  There was an earlier discussion about autovacuum optimisation (
https://www.postgresql.org/message-id/6a2ac9b7-6535-4bb1-8274-0647f7c31c82%40dalibo.com).
Referring to that discussion, I implemented a GUC that selects how the
autovacuum trigger threshold is calculated. This is my first patch for the
community; I hope for support from the experts.


0001-Add-guc-to-optimize-the-autovacuum-tigger-algorithm.patch
Description: Binary data


Re: An inefficient query caused by unnecessary PlaceHolderVar

2024-11-27 Thread wenhui qiu
Hi Richard
> BTW, since commit cb8e50a4a, we've chosen not to wrap a non-var
>  expression if it contains Vars/PHVs of the pulled-up subquery and does
> not contain non-strict constructs.  I wonder if we can apply the same
> optimization from this patch to non-var expressions: for a LATERAL
> subquery, if a non-var expression contains Vars/PHVs of the
> lowest_nullable_relids and does not contain non-strict constructs, it
>could also escape being wrapped.  Any thoughts?
Agreed. The patch tested fine and looks good to me. In addition, could the
test case perhaps include a multi-level nested subquery?



Thanks

On Wed, Nov 27, 2024 at 4:46 PM Richard Guo  wrote:

> On Fri, Nov 22, 2024 at 5:08 AM Dmitry Dolgov <9erthali...@gmail.com>
> wrote:
> > The patch looks good to me, the implementation is concise and clear. I
> can't
> > imagine any visible overhead due to storing lowest_nullable_relids in
> this
> > context. The only nitpick I have is about this commentary:
> >
> >   /*
> >* Simple Vars always escape being wrapped, unless they are
> >* lateral references to something outside the subquery being
> > -  * pulled up.  (Even then, we could omit the PlaceHolderVar if
> > -  * the referenced rel is under the same lowest outer join, but
> > -  * it doesn't seem worth the trouble to check that.)
> > +  * pulled up and the referenced rel is not under the same
> > +  * lowest outer join.
> >*/
> >
> > It mentions "lowest outer join", as in the original version of the text.
> Would
> > it be more precise to mention nullable side of the outer join as well?
>
> Thank you for your review.
>
> I ended up using 'under the same lowest nulling outer join' to
> keep consistent with the wording used elsewhere.  Please see the
> updated patch attached.
>
> BTW, since commit cb8e50a4a, we've chosen not to wrap a non-var
> expression if it contains Vars/PHVs of the pulled-up subquery and does
> not contain non-strict constructs.  I wonder if we can apply the same
> optimization from this patch to non-var expressions: for a LATERAL
> subquery, if a non-var expression contains Vars/PHVs of the
> lowest_nullable_relids and does not contain non-strict constructs, it
> could also escape being wrapped.  Any thoughts?
>
> Thanks
> Richard
>


Re: UUID v7

2024-11-24 Thread wenhui qiu
Hi Andrey M. Borodin
 It's not just MariaDB; Percona also implements a UUID plugin:

https://docs.percona.com/percona-server/8.4/uuid-versions.html#functions-available-in-uuid_vx

Thanks

Andrey M. Borodin  于2024年11月23日周六 16:21写道:

>
>
> > On 23 Nov 2024, at 10:58, Masahiko Sawada  wrote:
> >
> > I've attached an updated patch that squashed changes I made for v33.
> > We're still discussing increasing entropy on Windows and macOS, but
> > the patch seems to be in good shape.
>
> +1, thanks!
>
> PFA version with improved comment.
>
> Sergey Prokhorenko just draw my attention to the new release of MariaDB
> [0]. They are doing very similar UUID v7 generation as we do [1].
>
>
> Best regards, Andrey Borodin.
>
>
> [0]
> https://mariadb.com/resources/blog/announcing-mariadb-community-server-11-7-rc-with-vector-search-and-11-6-ga/
> [1]
> https://github.com/mariadb/server/blob/main/plugin/type_uuid/sql_type_uuid_v7.h#L32
>
>
>


adjust_limit_rows_costs algorithm

2024-12-23 Thread wenhui qiu
Hi hackers
    Many people have probably run into the problem of ORDER BY col LIMIT 1
queries where the filter cannot be applied through the ordering index. Here is
an example from my test:
create extension pg_trgm ;

CREATE TABLE public.dba_users (
userid integer primary key,
username character varying(64),
password character varying(64)
);

sysbench=# create extension pg_trgm ;
CREATE EXTENSION
sysbench=# CREATE TABLE public.dba_users (
sysbench(#
sysbench(# userid integer primary key,
sysbench(#
sysbench(# username character varying(64),
sysbench(#
sysbench(# password character varying(64)
sysbench(#
sysbench(# );
CREATE TABLE
sysbench=# insert into dba_users select
generate_series(1,600),md5(random()::varchar(64)),md5(random()::varchar(64));
INSERT 0 600
sysbench=# CREATE INDEX dba_users_username_idx ON public.dba_users USING
gin (username gin_trgm_ops);
CREATE INDEX
sysbench=#
sysbench=#
sysbench=#
sysbench=#
sysbench=# explain analyze  select userid from dba_users where  username
like '%%' order by userid limit 1;
 QUERY PLAN
-
 Limit  (cost=0.43..408.59 rows=1 width=4) (actual time=1433.469..1433.470
rows=0 loops=1)
   ->  Index Scan using dba_users_pkey on dba_users  (cost=0.43..244892.51
rows=600 width=4) (actual time=1433.468..1433.468 rows=0 loops=1)
 Filter: ((username)::text ~~ '%%'::text)
 Rows Removed by Filter: 600
 Planning Time: 0.384 ms
 Execution Time: 1433.489 ms
(6 rows)

sysbench=# explain analyze  select userid from dba_users where  username
like '%%' order by userid+0 limit 1;
QUERY
PLAN
--
 Limit  (cost=2300.03..2300.03 rows=1 width=8) (actual time=54.562..54.563
rows=0 loops=1)
   ->  Sort  (cost=2300.03..2301.53 rows=600 width=8) (actual
time=54.560..54.561 rows=0 loops=1)
 Sort Key: ((userid + 0))
 Sort Method: quicksort  Memory: 25kB
 ->  Bitmap Heap Scan on dba_users  (cost=57.22..2297.03 rows=600
width=8) (actual time=54.526..54.527 rows=0 loops=1)
   Recheck Cond: ((username)::text ~~ '%%'::text)
   Rows Removed by Index Recheck: 41310
   Heap Blocks: exact=31810
   ->  Bitmap Index Scan on dba_users_username_idx
 (cost=0.00..57.07 rows=600 width=0) (actual time=7.307..7.307 rows=41310
loops=1)
 Index Cond: ((username)::text ~~
'%%'::text)
 Planning Time: 0.238 ms
 Execution Time: 54.625 ms
(12 rows)
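
The 408.59 limit cost in the first plan comes directly from the linear
interpolation in adjust_limit_rows_costs (quoted below): with the index scan
estimated at startup cost 0.43, total cost 244892.51 and 600 output rows,
LIMIT 1 is costed as

    0.43 + (244892.51 - 0.43) * 1 / 600 = about 408.6

i.e. the planner assumes the single matching row turns up after reading 1/600
of the scan, while in reality the filter matches nothing and the whole index
has to be read.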



/*
 * adjust_limit_rows_costs
 *    Adjust the size and cost estimates for a LimitPath node according to the
 *    offset/limit.
 *
 * This is only a cosmetic issue if we are at top level, but if we are
 * building a subquery then it's important to report correct info to the outer
 * planner.
 *
 * When the offset or count couldn't be estimated, use 10% of the estimated
 * number of rows emitted from the subpath.
 *
 * XXX we don't bother to add eval costs of the offset/limit expressions
 * themselves to the path costs. In theory we should, but in most cases those
 * expressions are trivial and it's just not worth the trouble.
 */
void
adjust_limit_rows_costs(double *rows,   /* in/out parameter */
                        Cost *startup_cost, /* in/out parameter */
                        Cost *total_cost,   /* in/out parameter */
                        int64 offset_est,
                        int64 count_est)
{
    double      input_rows = *rows;
    Cost        input_startup_cost = *startup_cost;
    Cost        input_total_cost = *total_cost;

    if (offset_est != 0)
    {
        double      offset_rows;

        if (offset_est > 0)
            offset_rows = (double) offset_est;
        else
            offset_rows = clamp_row_est(input_rows * 0.10);
        if (offset_rows > *rows)
            offset_rows = *rows;
        if (input_rows > 0)
            *startup_cost +=
                (input_total_cost - input_startup_cost)
                * offset_rows / input_rows;
        *rows -= offset_rows;
        if (*rows < 1)
            *rows = 1;
    }

    if (count_est != 0)
    {
        double      count_rows;

        if (count_est > 0)
            count_rows = (double) count_est;
        else
            count_rows = clamp_row_est(input_rows * 0.10);
        if (count_rows > *rows)
            count_rows = *rows;
        if (input_rows > 0)
            *total_cost = *startup_cost +
                (input_total_cost - input_startup_cost)
                * count_rows / input_rows;
        *rows = count_rows;
        if (*rows < 1)
            *rows = 1;
    }
}


I think the total_cost interpolation cannot be linear, because real data
distributions are not uniform. I tried changing it to a logarithmic formula:

$ git diff
diff --git a/src/backend/optimizer/util/pathnode.c
b/src/backend/optimizer/util/pathnode.c
index 4f74caf..2ab59e9 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4074,7 +4074,7 @@ adjust_limit_rows_costs(double *rows, /* in/out
parameter */
if (input_rows > 0)
*total_

Re: add vacuum starttime columns

2024-12-30 Thread wenhui qiu
Hi Sami
   Many people have encountered situations where autovacuum or auto analyze on
tables are not triggered in time, leading to suboptimal execution plans and
some performance issues. When analyzing such problems, we often need to
trace back to when autovacuum or auto analyze was triggered for the
corresponding table. However, if the log_autovacuum_min_duration parameter
is not configured (as I’ve encountered in many cases where this parameter
is either not set or has an inappropriate value), we cannot determine the
trigger time and duration of the operation.

Our idea is to directly query the pg_stat_all_tables view to obtain the
start time of (auto)vacuum/(auto)analyze for a table. This would help us
monitor the duration of vacuum/analyze and determine whether it is
necessary to tune and speed up the vacuum/analyze process.

Of course, to observe the duration of vacuum operations we can configure the
log_autovacuum_min_duration parameter, but if there are many tables in the
database the vacuum entries in the logs can be quite numerous, making them
difficult to analyze.

Additionally, we are currently considering whether it would be possible to
add a last_(auto)vacuum_start field to the pg_stat_all_tables view. For
tables where a vacuum operation is in progress, the last_(auto)vacuum field
may not be updated, and it may not be possible to estimate the vacuum
duration using just these two fields.

However, this could still indicate whether a vacuum is in progress (if
last_(auto)vacuum_start is more recent than last_(auto)vacuum, it means a
vacuum is ongoing). While it is possible to monitor vacuum activity through
the pg_stat_progress_vacuum view, this view itself does not record
timestamps, so additional views might be needed, which would be less
convenient than querying pg_stat_all_tables directly.

Therefore, we personally believe that adding such fields would be
beneficial for monitoring the execution of (auto)vacuum and (auto)analyze.

Thanks






On Tue, Dec 31, 2024 at 12:40 PM Sami Imseih  wrote:

> The last_(auto)vacuum is useful because it allows
> the user to monitor vacuum to ensure that vacuums
> are completing on a relation at expected intervals.
> I am not sure what value a start time will provide.
> Can you provide a reason for this?
>
> Regards,
>
> Sami Imseih
>
>
>


Re: [RFC] Lock-free XLog Reservation from WAL

2025-01-06 Thread wenhui qiu
Hi Zhiguo
Thank you for your reply. Then you'll have to demonstrate that 128 is the
optimal value, otherwise people will have a hard time agreeing with you on
this patch.

Thanks

On Mon, Jan 6, 2025 at 2:46 PM Zhou, Zhiguo  wrote:

> Hi Yura and Wenhui,
>
> Thanks for kindly reviewing this work!
>
> On 1/3/2025 9:01 PM, wenhui qiu wrote:
> > Hi
> >  Thank you for your path,NUM_XLOGINSERT_LOCKS increase to 128,I
> > think it will be challenged,do we make it guc ?
> >
>
> I noticed there have been some discussions (for example, [1] and its
> responses) about making NUM_XLOGINSERT_LOCKS a GUC, which seems to be a
> controversial proposal. Given that, we may first focus on the lock-free
> XLog reservation implementation, and leave the increase of
> NUM_XLOGINSERT_LOCKS for a future patch, where we would provide more
> quantitative evidence for the various implementations. WDYT?
>
>
> > On Fri, 3 Jan 2025 at 20:36, Yura Sokolov  > <mailto:y.soko...@postgrespro.ru>> wrote:
> >
> > Good day, Zhiguo.
> >
> > Idea looks great.
> >
> > Minor issue:
> > - you didn't remove use of `insertpos_lck` from `ReserveXLogSwitch`.
> >
> > I initially thought it became un-synchronized against
> > `ReserveXLogInsertLocation`, but looking closer I found it is
> > synchronized with `WALInsertLockAcquireExclusive`.
> > Since there are no other `insertpos_lck` usages after your patch, I
> > don't see why it should exists and be used in `ReserveXLogSwitch`.
> >
> > Still I'd prefer to see CAS loop in this place to be consistent with
> > other non-locking access. And it will allow to get rid of
> > `WALInsertLockAcquireExclusive`, (though probably it is not a big
> > issue).
> >
>
> Exactly, it should be safe to remove `insertpos_lck`. And I agree with
> you on getting rid of `WALInsertLockAcquireExclusive` with CAS loop
> which should significantly reduce the synchronization cost here
> especially when we intend to increase NUM_XLOGINSERT_LOCKS. I will try
> it in the next version of patch.
>
>
> > Major issue:
> > - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read with
> on
> > platforms where MAXALIGN != 8 or without native 64 load/store. Branch
> > with 'memcpy` is rather obvious, but even pointer de-referencing on
> > "lucky case" is not safe either.
> >
> > I have no idea how to fix it at the moment.
> >
>
> Indeed, non-atomic write/read operations can lead to safety issues in
> some situations. My initial thought is to define a bit near the
> prev-link to flag the completion of the update. In this way, we could
> allow non-atomic or even discontinuous write/read operations on the
> prev-link, while simultaneously guaranteeing its atomicity through
> atomic operations (as well as memory barriers) on the flag bit. What do
> you think of this as a viable solution?
>
>
> > Readability issue:
> > - It would be good to add `Assert(ptr >= upto)` into `GetXLogBuffer`.
> > I had hard time to recognize `upto` is strictly not in the future.
> > - Certainly, final version have to have fixed and improved comments.
> > Many patch's ideas are strictly non-obvious. I had hard time to
> > recognize patch is not a piece of ... (excuse me for the swear
> > sentence).
>
> Thanks for the suggestion and patience. It's really more readable after
> inserting the assertion, I will fix it and improve other comments in the
> following patches.
>
>
> > Indeed, patch is much better than it looks on first sight.
> > I came with alternative idea yesterday, but looking closer to your
> > patch
> > today I see it is superior to mine (if atomic access will be fixed).
> >
> > 
> >
> > regards,
> > Yura Sokolov aka funny-falcon
> >
> >
>
> Regards,
> Zhiguo
>
>
> [1]
> https://www.postgresql.org/message-id/2266698.1704854297%40sss.pgh.pa.us
>
>


Re: [RFC] Lock-free XLog Reservation from WAL

2025-01-06 Thread wenhui qiu
Hi Zhiguo
> Maybe we could leave the NUM_XLOGINSERT_LOCKS unchanged in this patch,
> as it is not a hard dependency of the lock-free algorithm. And when this
> patch has been fully accepted, we could then investigate the more proper
> way of increasing NUM_XLOGINSERT_LOCKS. WDYT?
If that value is not a hard dependency, then the best approach is not to
change it here.

Thanks

On Mon, Jan 6, 2025 at 4:49 PM Zhou, Zhiguo  wrote:

> Maybe we could leave the NUM_XLOGINSERT_LOCKS unchanged in this patch,
> as it is not a hard dependency of the lock-free algorithm. And when this
> patch has been fully accepted, we could then investigate the more proper
> way of increasing NUM_XLOGINSERT_LOCKS. WDYT?
>
> On 1/6/2025 4:35 PM, wenhui qiu wrote:
> > HI Zhiguo
> >  Thank you for your reply ,Then you'll have to prove that 128 is the
> > optimal value, otherwise they'll have a hard time agreeing with you on
> > this patch.
> >
> > Thanks
> >
> > On Mon, Jan 6, 2025 at 2:46 PM Zhou, Zhiguo  > <mailto:zhiguo.z...@intel.com>> wrote:
> >
> >     Hi Yura and Wenhui,
> >
> > Thanks for kindly reviewing this work!
> >
> > On 1/3/2025 9:01 PM, wenhui qiu wrote:
> >  > Hi
> >  >  Thank you for your path,NUM_XLOGINSERT_LOCKS increase to
> > 128,I
> >  > think it will be challenged,do we make it guc ?
> >  >
> >
> > I noticed there have been some discussions (for example, [1] and its
> > responses) about making NUM_XLOGINSERT_LOCKS a GUC, which seems to
> be a
> > controversial proposal. Given that, we may first focus on the
> lock-free
> > XLog reservation implementation, and leave the increase of
> > NUM_XLOGINSERT_LOCKS for a future patch, where we would provide more
> > quantitative evidence for the various implementations. WDYT?
> >
> >
> >  > On Fri, 3 Jan 2025 at 20:36, Yura Sokolov
> > mailto:y.soko...@postgrespro.ru>
> >  > <mailto:y.soko...@postgrespro.ru
> > <mailto:y.soko...@postgrespro.ru>>> wrote:
> >  >
> >  > Good day, Zhiguo.
> >  >
> >  > Idea looks great.
> >  >
> >  > Minor issue:
> >  > - you didn't remove use of `insertpos_lck` from
> > `ReserveXLogSwitch`.
> >  >
> >  > I initially thought it became un-synchronized against
> >  > `ReserveXLogInsertLocation`, but looking closer I found it is
> >  > synchronized with `WALInsertLockAcquireExclusive`.
> >  > Since there are no other `insertpos_lck` usages after your
> > patch, I
> >  > don't see why it should exists and be used in
> > `ReserveXLogSwitch`.
> >  >
> >  > Still I'd prefer to see CAS loop in this place to be
> > consistent with
> >  > other non-locking access. And it will allow to get rid of
> >  > `WALInsertLockAcquireExclusive`, (though probably it is not a
> big
> >  > issue).
> >  >
> >
> > Exactly, it should be safe to remove `insertpos_lck`. And I agree
> with
> > you on getting rid of `WALInsertLockAcquireExclusive` with CAS loop
> > which should significantly reduce the synchronization cost here
> > especially when we intend to increase NUM_XLOGINSERT_LOCKS. I will
> try
> > it in the next version of patch.
> >
> >
> >  > Major issue:
> >  > - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/
> > read with on
> >  > platforms where MAXALIGN != 8 or without native 64 load/
> > store. Branch
> >  > with 'memcpy` is rather obvious, but even pointer de-
> > referencing on
> >  > "lucky case" is not safe either.
> >  >
> >  > I have no idea how to fix it at the moment.
> >  >
> >
> > Indeed, non-atomic write/read operations can lead to safety issues in
> > some situations. My initial thought is to define a bit near the
> > prev-link to flag the completion of the update. In this way, we could
> > allow non-atomic or even discontinuous write/read operations on the
> > prev-link, while simultaneously guaranteeing its atomicity through
> > atomic operations (as well as memory barriers) on the flag bit. What
> do
> > you think of this as a viable solution?
> >
> >
> >  > Readability issue:
> &

Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-26 Thread wenhui qiu
Hi

As far as I know, instances with more than 10,000 tables are quite common, so
I still think the maximum could reasonably be raised to 256MB to better
accommodate installations with many tables.

On Thu, 26 Dec 2024 at 23:23, Robert Treat  wrote:

> On Wed, Dec 25, 2024 at 12:26 PM Tomas Vondra  wrote:
> >
> > Hi,
> >
> > On 12/23/24 07:35, wenhui qiu wrote:
> > > Hi Tomas
> > >  This is a great feature.
> > > + /*
> > > + * Define (or redefine) custom GUC variables.
> > > + */
> > > + DefineCustomIntVariable("stats_history.size",
> > > + "Sets the amount of memory available for past events.",
> > > + NULL,
> > > + &statsHistorySizeMB,
> > > + 1,
> > > + 1,
> > > + 128,
> > > + PGC_POSTMASTER,
> > > + GUC_UNIT_MB,
> > > + NULL,
> > > + NULL,
> > > + NULL);
> > > +
> > > RAM is in terabytes now, the statsHistorySize is 128MB ,I think can
> > > increase to store more history record ?
> > >
> >
> > Maybe, the 128MB is an arbitrary (and conservative) limit - it's enough
> > for ~500k events, which seems good enough for most systems. Of course,
> > on systems with many relations might need more space, not sure.
> >
> > I was thinking about specifying the space in more natural terms, either
> > as amount of time ("keep 1 day of history") or number of entries ("10k
> > entries"). That would probably mean the memory can't be allocated as
> > fixed size.
> >
>
> Based on the above, a rough calculation is that this is enough for
> holding 1 year of hourly vacuum runs for 50 tables, or a year of daily
> vacuums for 1000 tables. Most folks will fall somewhere in that range
> (and won't really need a year's history) but that seems like plenty
> for a default.
>
> > But maybe it'd be possible to just write the entries to a file. We don't
> > need random access to past entries (unlike e.g. pg_stat_statements), and
> > people won't query that very often either.
> >
>
> Yeah, workloads will vary, but it doesn't seem like they would more
> than query workloads do.
>
> Robert Treat
> https://xzilla.net
>


Re: transaction lost when delete clog file after normal shutdown

2024-12-24 Thread wenhui qiu
Hi Michael Paquier
 Thank you for the information you provided.


Thanks


On Tue, 24 Dec 2024 at 13:13, Michael Paquier  wrote:

> On Tue, Dec 24, 2024 at 09:55:09AM +0800, wenhui qiu wrote:
> > However, on the other hand, oracle has many solutions to open the
> database
> > after the data files are damaged, and his intention should be to start
> the
> > database even if some critical files are damaged to salvage the data
> > inside.Because there's a lot of that going on in the real world, and you
> > can't change that by firing the dba.
>
> So does Postgres, at least partially depending on the state of the
> cluster (for example, see ways to bypass catalog indexes to be able to
> log in).  FWIW, I can be easily convinced that more tooling in this
> area to help folks do low-level chirurgy on the on-disk files of a
> data folder while a server is running is cool to have, like
> pg_surgery:
> https://www.postgresql.org/docs/devel/pgsurgery.html
>
> If you have suggestions about ways that would help, feel free to
> propose them.
>
> Anyway, if it comes to corruption, these tools should only be used if
> you don't have a backup, and only to retrieve data from a data folder
> to then do a logical copy of it to a freshly initialized data folder,
> most likely on a different host or partition, if you suspect that your
> disk is at fault for example.
>
> If you see an issue in the backend code, even a tiny window where we
> could lose data because we are missing a flush or manipulate files so
> as consistency is not guaranteed post-crash, that would be worth
> discussing on the ground of being a legit bug.  Manual removal of
> on-disk files is not that.
> --
> Michael
>


Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-26 Thread wenhui qiu
Hi Tomas

> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
Agreed.
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).
Yes. What I mean is that a small maximum is unfriendly to large instances: in
real production systems we often encounter many sub-tables (partitions) that
share the same table structure.
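
To put numbers on it, using the ~500k events per 128MB figure mentioned above
(roughly 270 bytes per event): an instance with 10,000 tables that are each
vacuumed or analyzed once an hour generates about 240,000 events per day, so
128MB holds only about two days of history, and more aggressive autovacuum
settings shrink that further.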


On Fri, Dec 27, 2024 at 1:58 AM Tomas Vondra  wrote:

> On 12/26/24 17:00, wenhui qiu wrote:
> > Hi
> >
> > As far as I know, more than 10,000 tables  of instances  are often
> > encountered,
> > So I insist that the maximum can be appropriately increased to 256MB,
> > Can be more adaptable to many table situations
> >
>
> If 128MB is insufficient, why would 256MB be OK? A factor of 2x does not
> make a fundamental difference ...
>
> Anyway, the 128MB value is rather arbitrary. I don't mind increasing the
> limit, or possibly removing it entirely (and accepting anything the
> system can handle).
>
>
> regards
>
> --
> Tomas Vondra
>
>


Re: New GUC autovacuum_max_threshold ?

2025-02-03 Thread wenhui qiu
Hi Nathan
> I had the opportunity to bring this patch up for discussion at the
> developer meeting at FOSDEM PGDay last week [0].  We discussed a subset
> of the topics folks have already written about in this thread, and AFAICT
> there was general approval among the attendees for proceeding with the
> "hard cap" approach due to its user-friendliness.  Given that, I am
> planning to commit the attached patch in the near future (although I may
> fiddle with the commit message a bit more).
Thanks for your work on this; that is good news. Any method that addresses how
autovacuum is triggered on large tables is a good one. When more comprehensive
vacuum statistics become available in the future, we can refine the
calculation then.


On Tue, Feb 4, 2025 at 3:51 AM Nathan Bossart 
wrote:

> I had the opportunity to bring this patch up for discussion at the
> developer meeting at FOSDEM PGDay last week [0].  We discussed a subset
> of the topics folks have already written about in this thread, and AFAICT
> there was general approval among the attendees for proceeding with the
> "hard cap" approach due to its user-friendliness.  Given that, I am
> planning to commit the attached patch in the near future (although I may
> fiddle with the commit message a bit more).
>
> On Tue, Jan 14, 2025 at 11:35:17PM +0300, Alena Rybakina wrote:
> > #autovacuum_vacuum_max_threshold = 1# max number of row
> updates
> > # before vacuum; -1 disables max
> > # threshold
> >
> > I think instead of "# threshold" should be "#vacuum"?
>
> That would more closely match the description of
> autovacuum_vacuum_insert_threshold, which refers to "insert vacuums," but I
> felt it would be weird to refer to "max vacuums."  IMHO it is clearer to
> say that -1 disables the maximum threshold here.
>
> > There is a typo:
> >
> > * if (threshold > vac_max_thresh)
> > * threshold = vac_max_thres; - here
>
> Fixed.
>
> > I think you should add more information to the description of the
> > Relations_needs_vacanalyze function: what is vac_max_thresh and how is it
> > calculated. It is not clear what the below condition means.
> >
> > /* -1 is used to disable max threshold */
> > vac_max_thresh= (relopts&& relopts->vacuum_max_threshold>= -1)
> > ? relopts->vacuum_max_threshold
> > : autovacuum_vac_max_thresh;
>
> I looked at the commentary for this function and felt that the comments for
> this new parameter are in line with the comments for all the adjacent
> parameters.  There may be an opportunity to improve this commentary, but
> IMHO that would be better handled in a separate patch that improved it for
> all these parameters.
>
> [0] https://2025.fosdempgday.org/devmeeting
>
> --
> nathan
>


Re: Compression of bigger WAL records

2025-01-30 Thread wenhui qiu
Hi Andrey
 I have a question: if wal_compression_threshold is set to more than the WAL
block size, then FPIs would not be compressed. If so, might it make sense to
cap the maximum value of this parameter at the WAL block size?

Best regards


On Thu, Jan 30, 2025 at 9:26 PM Andrey Borodin  wrote:

>
>
> > On 23 Jan 2025, at 20:13, Japin Li  wrote:
> >
> >
> > I find this feature interesting;
>
> Thank you for your interest in the patch!
>
> > however, it cannot be applied to the current
> > master (b35434b134b) due to commit 32a18cc0a73.
>
> PFA a rebased version.
>
> >
> > I see the patch compresses the WAL record according to the
> wal_compression,
> > IIRC the wal_compression is only used for FPI, right?  Maybe we should
> update
> > the description of this parameter.
>
> Yes, I'll udpate documentation in future versions too.
>
> > I see that the wal_compression_threshold defaults to 512. I wonder if you
> > chose this value based on testing or randomly.
>
> Voices in my head told me it's a good number.
>
>
> > On 28 Jan 2025, at 22:10, Fujii Masao 
> wrote:
> >
> > I like the idea of WAL compression more.
>
> Thank you!
>
> > With the current approach, each backend needs to allocate memory twice
> > the size of the total WAL record. Right? One area is for the gathered
> > WAL record data (from rdt and registered_buffers), and the other is for
> > storing the compressed data.
>
> Yes, exactly. And also a decompression buffer for each WAL reader.
>
> > Could this lead to potential memory usage
> > concerns? Perhaps we should consider setting a limit on the maximum
> > memory each backend can use for WAL compression?
>
> Yes, the limit makes sense.
>
> Also, we can reduce memory consumption by employing a streaming
> compression. Currently, I'm working on a prototype of such technology,
> because it would allow wholesale WAL compression. The idea is to reuse
> compression context from previous records to better compress new records.
> This would allow efficient compression of even very small records. However,
> there is exactly 0 chance to get it done in a decent shape before feature
> freeze.
>
> The chances of getting currently proposed approach to v18 seems slim
> either... I'm hesitating to register this patch on the CF. What do you
> think?
>
>
> Best regards, Andrey Borodin.
>
>


Uncached buffered IO

2025-02-09 Thread wenhui qiu
Hi hackers,
https://lwn.net/Articles/998783/ -- would uncached buffered I/O help with
double caching? PostgreSQL's double caching is often a source of complaints,
and this new Linux kernel feature seems to help mitigate the performance
jitter caused by page cache reclaim when free memory runs low.




Thanks


Re: Trigger more frequent autovacuums of heavy insert tables

2025-02-06 Thread wenhui qiu
Hi
> We could add autovacuum_vacuum_insert_max_threshold, but with an
>  insert-only workload, we can expect that the cold data is being
>  frozen. By calculating the threshold based on unfrozen data, we are
>  effectively capping the threshold for inserted data without adding
>  another guc. If any of that data is being unfrozen via updates or
>  deletes, then the autovacuum_vacuum_max_threshold would apply.

>  Perhaps I'm missing a case where calculating the insert threshold on
>  unfrozen data would not act as a cap, in which case I could get on
>  board with a guc.
Actually, I like your solution. I even think the regular threshold formula
could use that pcnt_unfrozen factor:
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples * pcnt_unfrozen;
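
For example, with the default autovacuum_vacuum_scale_factor = 0.2 and
autovacuum_vacuum_threshold = 50, a 100-million-row table that is 90% frozen
(pcnt_unfrozen = 0.1) would get

    vacthresh = 50 + 0.2 * 100,000,000 * 0.1 = 2,000,050

instead of 20,000,050, so dead tuples in the unfrozen part of the table
trigger a vacuum ten times sooner.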

Thanks

On Thu, Feb 6, 2025 at 11:42 PM Melanie Plageman 
wrote:

> Attached v6 is rebased over 306dc520b9dfd60
>
> On Wed, Feb 5, 2025 at 8:54 PM wenhui qiu  wrote:
> >
> > Hi Melanie Plageman
> >Thank you for working on this ,Actually, there were two patches aimed
> at optimizing vacuum-triggered processes, and one of them reached a
> consensus and has been committed:
> https://commitfest.postgresql.org/52/5046/  ,
> https://commitfest.postgresql.org/51/5395/, Maybe referring to the
> already committed patch and setting a maximum value for
> vacuum_max_ins_threshold would be more acceptable.
>
> We could add autovacuum_vacuum_insert_max_threshold, but with an
> insert-only workload, we can expect that the cold data is being
> frozen. By calculating the threshold based on unfrozen data, we are
> effectively capping the threshold for inserted data without adding
> another guc. If any of that data is being unfrozen via updates or
> deletes, then the autovacuum_vacuum_max_threshold would apply.
>
> Perhaps I'm missing a case where calculating the insert threshold on
> unfrozen data would not act as a cap, in which case I could get on
> board with a guc.
>
>
> - Melanie
>


Re: Trigger more frequent autovacuums of heavy insert tables

2025-02-05 Thread wenhui qiu
Hi Melanie Plageman
   Thank you for working on this. Actually, there were two patches aimed at
improving how autovacuum gets triggered, and one of them reached consensus and
has been committed: https://commitfest.postgresql.org/52/5046/ and
https://commitfest.postgresql.org/51/5395/. Maybe following the
already-committed patch and adding a maximum value for the insert threshold
would be more acceptable.


Thanks

On Thu, Feb 6, 2025 at 6:08 AM Melanie Plageman 
wrote:

> On Thu, Jan 16, 2025 at 5:50 PM Melanie Plageman
>  wrote:
> >
> > On Thu, Jan 16, 2025 at 4:43 PM Melanie Plageman
> >  wrote:
> > >
> > > On Fri, Oct 25, 2024 at 11:14 AM Melanie Plageman
> > >  wrote:
> > > >
> > > > I've done something similar to this in attached v2.
> > >
> > > This needed a rebase. See attached v4.
> >
> > Whoops -- docs didn't build. Attached v5.
>
> Outside of the positive performance impact of vacuuming pages before
> they go cold (detailed in my first email [1]), there is also a
> substantial positive effect with this patch for large tables with
> substantial cold regions: fewer anti-wraparound vacuums and more
> frequent normal/aggressive vacuums
>
> With the default vacuum settings, you often see an append-only table
> devolve to _only_ anti-wraparound vacuums after the first aggressive
> vacuum. I ran an insert-only workload for an hour (with 32 clients and
> synchronous commit off to maximize the amount of data inserted) with
> the default vacuum settings. On master, after the first aggressive
> vacuum, we do only anti-wraparound vacuums (and only two of these are
> triggered). With the patch, after the first aggressive vacuum, 10 more
> vacuums are triggered -- none of which are anti-wraparound vacuums.
>
> I attached a chart comparing the autovacuums triggered on master vs
> with the patch.
>
> Besides the performance benefit of spreading the freezing work over
> more normal vacuums (thereby disrupting foreground workloads less),
> anti-wraparound vacuums are not auto canceled by DDL -- making them
> more of a nuisance to users.
>
> [1]
> https://www.postgresql.org/message-id/CAAKRu_aj-P7YyBz_cPNwztz6ohP%2BvWis%3Diz3YcomkB3NpYA--w%40mail.gmail.com
>


Re: [PATCH] Support Int64 GUCs

2024-12-10 Thread wenhui qiu
Hi Aleksander
   Didn't you mark this patch as rejected last time? Do we still need to
continue with this approach?


Thanks

On Sun, Dec 8, 2024 at 10:16 PM Evgeny Voropaev 
wrote:

> Hi hackers!
>
> Upgraded the "Int64 GUC" patch in order to conform to f3f06b13308e3
> updates. Rebased and tested upon the current master (3f9b9621766). The
> patch is still needed to be up to date as a part of the xid64 solution.
>
> Best regards,
> Evgeny Voropaev,
> TantorLabs, LLC.
> 
> 


Re: Add 64-bit XIDs into PostgreSQL 15

2024-12-16 Thread wenhui qiu
Hi Evgeny
 The xid64 patch has been split across several threads; the current one should
be this: (
https://www.postgresql.org/message-id/flat/CACG=ezawg7_nt-8ey4akv2w9lculthhknwcawmbgeetnjrj...@mail.gmail.com)
We can run some tests on that patch so that it can be merged earlier.


Thanks

On Tue, Dec 10, 2024 at 7:47 PM Evgeny  wrote:

> The v56 patch version includes redundant additions which are not quite
> relevant to the xid64. The v57 version is refined.
>
> Best regards,
> Evgeny Voropaev,
> Tantor Labs, LLC.
> 
> 
>


Re: Add 64-bit XIDs into PostgreSQL 15

2024-12-17 Thread wenhui qiu
Hi Evgeny

> we are causing the target solution to become staled
   I think the community is not positive enough about xid64; there is also a
lot of resistance, for example because the WAL volume becomes larger.
>I also hope you and the community will support attempts to retain the
> whole xid64 solution in the consistent state because only the consistent
> state allows us to continually test and fix the xid64 solution.
My point of view is the same as yours; I just want to kindly remind you that
the community's preference is to split the work into small patches. Look at
DDL support for logical replication, which was even more popular and yet there
has been no news of it for a while now.


Thanks


On Tue, Dec 17, 2024 at 3:28 PM Evgeny Voropaev <
evgeny.vorop...@tantorlabs.com> wrote:

> > Hi Evgeny
> >  xid64 path split several threads ,The current one should be
> > this:(
> https://www.postgresql.org/message-id/flat/CACG=ezawg7_nt-8ey4akv2w9lculthhknwcawmbgeetnjrj...@mail.gmail.com)
>
> > ,We can do some tests on path so that  can merge earlier
> > Thanks
> Hello Wenhui!
>
> Thank you for pointing out the specific topic developing the Multixact
> ID Offset MXIDOFF64 solution. I agree to the fullest extent with
> community's decision to split the xid64 into smaller parts and
> subsequently merge them into the master. In the meantime, abandoning the
> full version of the original patch and developing only its fragments, we
> are causing the target solution to become staled. That is why I have
> tried to put the whole solution together. The xid64 patch (v56-v58)
> includes the MXIDOFF patch as a distinct files. The first four
> patch-files of the xid64 are exactly the 9th version of the MXIDOFF64.
> If new versions of the xid64 patch update the parts relevant to
> MXIDOFF64, we will add them into MXIDOFF64 thread as a next version. I
> also hope you and the community will support attempts to retain the
> whole xid64 solution in the consistent state because only the consistent
> state allows us to continually test and fix the xid64 solution.
>
> Best regards,
> Evgeny Voropaev,
> Tantor Labs LLC.
>
>
>


Re: PoC: history of recent vacuum/checkpoint runs (using new hooks)

2024-12-22 Thread wenhui qiu
Hi Tomas
 This is a great feature.
+ /*
+ * Define (or redefine) custom GUC variables.
+ */
+ DefineCustomIntVariable("stats_history.size",
+ "Sets the amount of memory available for past events.",
+ NULL,
+ &statsHistorySizeMB,
+ 1,
+ 1,
+ 128,
+ PGC_POSTMASTER,
+ GUC_UNIT_MB,
+ NULL,
+ NULL,
+ NULL);
+
Servers now come with terabytes of RAM, while statsHistorySize is capped at
128MB; could the limit be increased so that more history records can be kept?

Thanks

On Sun, Dec 22, 2024 at 4:28 AM Tomas Vondra  wrote:

> Hi,
>
> Our runtime stats system is great, but it only keeps a snapshot of
> cumulative stats / snapshot. And while that works for user workloads, it
> may not be quite sufficient when analyzing maintenance operations like
> vacuum/checkpoint, etc.
>
> For those operations it's useful to have information about individual
> runs, not just the cumulative counters (or even deltas between regular
> snapshots). There's also the issue that we only keep a very limited
> subset of available information - just look at the info included in
> VACUUM VERBOSE or with log_checkpoints=on, and how little of that is
> available in pg_stats_. For vacuum we have the vacuum/analyze counts,
> and timestamp of the last operation, but that's it. VACUUM VERBOSE
> provides way more information, but we can only guess based on the
> regular pgstat counters.
>
> Yes, we can get this info written to server log using log_checkpoints
> and log_autovacuum_min_duration (AFAIK there's no way to ensure logging
> for manual VACUUM). But processing this information is not particularly
> convenient, as it requires parsing the log, the message format is
> suitable more for humans, etc. And it'd be very convenient to be able to
> query this information, just like the other pgstat catalogs.
>
> I wonder if we might/should do two things, to improve this:
>
> 1) Introduce hooks that allow doing some custom stuff with info about
> those actions, after logging it. The attached 0001 and 0002 patches do
> this for vacuum and checkpoint.
>
> 2) Allow keeping information about events. The 0003 patch does that in
> an extension, leveraging the new hooks, but it'd certainly possible to
> do in core too.
>
> I realize our current pgstat collector is keeping per-object stats, not
> per-event. We might add this to per-object stats (e.g. each table would
> have stats about vacuum runs), but that wouldn't work for checkpoints.
> There's also the question of memory consumption - I'm sure we don't want
> to keep infinite history of vacuum runs, for example.
>
> So the extension simply maintains a fixed-size circular queue, i.e. when
> it gets full it starts evicting oldest entries. 1MB is enough for
> storing ~4k VACUUM runs - I'm sure it can be made more compact.
>
> I don't think there's a better way to do this. I've considered if this
> might be done using emit_log_hook, but (a) that only helps when we end
> up logging the event (and I'd like to do this always), and (b) it'd
> require parsing the server log. So it's not much better than just doing
> that, I think ...
>
>
> Opinions?
>
> --
> Tomas Vondra
>


Re: New GUC autovacuum_max_threshold ?

2025-01-09 Thread wenhui qiu
Hi Nathan, Frédéric Yhuel,
 On 1/7/25 23:57, Nathan Bossart wrote:
> Here is a rebased patch for cfbot.  AFAICT we are still pretty far from
> consensus on which approach to take, unfortunately.
>

> For what it's worth, although I would have preferred the sub-linear
> growth thing, I'd much rather have this than nothing.
Agreed, better late than never. But I personally think a GUC parameter could
also be added, allowing users to choose the algorithm that works better for
them, especially since SQL Server is a pioneer in this area.


On Fri, Jan 10, 2025 at 2:20 AM Nathan Bossart 
wrote:

> On Wed, Jan 08, 2025 at 09:32:58PM +, Vinícius Abrahão wrote:
> > Please also provide the tests on the new parameter you want to introduce.
>
> I skimmed around and didn't see any existing tests for these kinds of
> parameters, which of course isn't a great reason not to add tests, but it's
> also not clear what such tests might look like.  Do you have any ideas?
>
> --
> nathan
>


Re: POC: make mxidoff 64 bits

2025-01-20 Thread wenhui qiu
Hi Maxim,
> Looks like there is a bit of a pause in the discussion. Here is a small
update. Consider v12.
> No major changes, rebase to the actual master and a squash of multiple
commits to make a
> patch set easy to reviewer.

> AFAICs, we are reached a consensus on a core patch for switching to 64
bits offsets. The
> only concern is about more comprehensive test coverage for pg_upgrade, is
it?
Agreed. When pg_upgrade hits the extreme case (oldestOffsetKnown == false),
we can just follow the solution mentioned by Heikki Linnakangas.

Thanks

On Thu, Jan 16, 2025 at 9:32 PM Maxim Orlov  wrote:

> Looks like there is a bit of a pause in the discussion. Here is a small
> update. Consider v12.
> No major changes, rebase to the actual master and a squash of multiple
> commits to make a
> patch set easy to reviewer.
>
> AFAICs, we are reached a consensus on a core patch for switching to 64
> bits offsets. The
> only concern is about more comprehensive test coverage for pg_upgrade, is
> it?
>
> --
> Best regards,
> Maxim Orlov.
>


Re: Increase NUM_XLOGINSERT_LOCKS

2025-01-22 Thread wenhui qiu
Hi Japin,
 Thank you for your test. It seems NUM_XLOGINSERT_LOCKS = 64 works well; I
don't think it needs to grow much beyond that. What do you think?

Regards


On Thu, Jan 23, 2025 at 10:30 AM Japin Li  wrote:

> On Sat, 18 Jan 2025 at 14:53, Yura Sokolov 
> wrote:
> > Since it seems Andres missed my request to send answer's copy,
> > here it is:
> >
> > On 2025-01-16 18:55:47 +0300, Yura Sokolov wrote:
> >> 16.01.2025 18:36, Andres Freund пишет:
> >>> Hi,
> >>>
> >>> On 2025-01-16 16:52:46 +0300, Yura Sokolov wrote:
>  Good day, hackers.
> 
>  Zhiguo Zhow proposed to transform xlog reservation to lock-free
> > algorighm to
>  increment NUM_XLOGINSERT_LOCKS on very huge (480vCPU) servers. [1]
> 
>  While I believe lock-free reservation make sense on huge server,
> > it is hard
>  to measure on small servers and personal computers/notebooks.
> 
>  But increase of NUM_XLOGINSERT_LOCKS have measurable performance
> > gain (using
>  synthetic test) even on my working notebook:
> 
> Ryzen-5825U (8 cores, 16 threads) limited to 2GHz , Ubuntu 24.04
> >>>
> >>> I've experimented with this in the past.
> >>>
> >>>
> >>> Unfortunately increasing it substantially can make the contention on
> the
> >>> spinlock *substantially* worse.
> >>>
> >>> c=80 && psql -c checkpoint -c 'select pg_switch_wal()' && pgbench
> >-n -M prepared -c$c -j$c -f <(echo "SELECT
> >pg_logical_emit_message(true, 'test', repeat('0', 1024*1024));";)
> >   -P1 -T15
> >>>
> >>> On a 2x Xeon Gold 5215, with max_wal_size = 150GB and the workload
> >ran a few
> >>> times to ensure WAL is already allocated.
> >>>
> >>> With
> >>> NUM_XLOGINSERT_LOCKS = 8:   1459 tps
> >>> NUM_XLOGINSERT_LOCKS = 80:  2163 tps
> >>
> >> So, even in your test you have +50% gain from increasing
> >> NUM_XLOGINSERT_LOCKS.
> >>
> >> (And that is why I'm keen on smaller increase, like upto 64, not 128).
> >
> > Oops, I swapped the results around when reformatting the results,
> > sorry! It's
> > the opposite way.  I.e. increasing the locks hurts.
> >
> > Here's that issue fixed and a few more NUM_XLOGINSERT_LOCKS.  This is a
> > slightly different disk (the other seems to have to go the way of the
> dodo),
> > so the results aren't expected to be exactly the same.
> >
> > NUM_XLOGINSERT_LOCKS  TPS
> > 1   2583
> > 2   2524
> > 4   2711
> > 8 2788
> > 16  1938
> > 32  1834
> > 64  1865
> > 128 1543
> >
> >
> >>>
> >>> The main reason is that the increase in insert locks puts a lot
> >more pressure
> >>> on the spinlock.
> >>
> >> That it addressed by Zhiguo Zhow and me in other thread [1]. But
> >   increasing
> >> NUM_XLOGINSERT_LOCKS gives benefits right now (at least on smaller
> >> installations), and "lock-free reservation" should be measured
> >   against it.
> >
> > I know that there's that thread, I just don't see how we can increase
> > NUM_XLOGINSERT_LOCKS due to the regressions it can cause.
> >
> >
> >>> Secondarily it's also that we spend more time iterating
> >>> through the insert locks when waiting, and that that causes a lot
> >of cacheline
> >>> pingpong.
> >>
> >> Waiting is done with LWLockWaitForVar, and there is no wait if
> >   `insertingAt`
> >> is in future. It looks very efficient in master branch code.
> >
> > But LWLockWaitForVar is called from WaitXLogInsertionsToFinish, which
> just
> > iterates over all locks.
> >
>
> Hi, Yura Sokolov
>
> I tested the patch on Hygon C86 7490 64-core using benchmarksql 5.0 with
> 500 warehouses and 256 terminals run time 10 mins:
>
> | case   | min  | avg  | max  |
> |+--+--+--|
> | master (4108440)   | 891,225.77   | 904,868.75   | 913,708.17   |
> | lock 64| 1,007,716.95 | 1,012,013.22 | 1,018,674.00 |
> | lock 64 attempt 1  | 1,016,716.07 | 1,017,735.55 | 1,019,328.36 |
> | lock 64 attempt 2  | 1,015,328.31 | 1,018,147.74 | 1,021,513.14 |
> | lock 128   | 1,010,147.38 | 1,014,128.11 | 1,018,672.01 |
> | lock 128 attempt 1 | 1,018,154.79 | 1,023,348.35 | 1,031,365.42 |
> | lock 128 attempt 2 | 1,013,245.56 | 1,018,984.78 | 1,023,696.00 |
>
> I didn't NUM_XLOGINSERT_LOCKS with 16 and 32, however, I tested it with
> 256,
> and got the following error:
>
> 2025-01-23 02:23:23.828 CST [333524] PANIC:  too many LWLocks taken
>
> I hope this test will be helpful.
>
> --
> Regrads,
> Japin Li
>


Re: [RFC] Lock-free XLog Reservation from WAL

2025-01-03 Thread wenhui qiu
Hi,
Thank you for your patch. Increasing NUM_XLOGINSERT_LOCKS to 128 will likely
be challenged; should we make it a GUC instead?
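
For context, here is a minimal standalone sketch (this is not the proposed
patch) of the reservation idea described in the proposal quoted below:
replacing the spinlock-protected update of CurrBytePos with a single atomic
fetch_add, so each backend reserves its byte range lock-free. The thread
count and record sizes here are made up.

/* build with: cc -pthread reserve_sketch.c */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

/* stands in for XLogCtlInsert.CurrBytePos */
static atomic_ulong CurrBytePos;

static void *
reserve(void *arg)
{
    unsigned long size = (unsigned long) (uintptr_t) arg;

    /*
     * fetch_add returns the old value, i.e. the start of this record; in
     * the proposal that same value is what gets written as the prev-link
     * of whichever record is reserved next.
     */
    unsigned long start = atomic_fetch_add(&CurrBytePos, size);

    printf("reserved bytes [%lu, %lu)\n", start, start + size);
    return NULL;
}

int
main(void)
{
    pthread_t   t[4];

    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, reserve, (void *) (uintptr_t) (100 * (i + 1)));
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}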


On Fri, 3 Jan 2025 at 20:36, Yura Sokolov  wrote:

> 02.01.2025 10:05, Zhou, Zhiguo wrote:
> > Hi all,
> >
> > I am reaching out to solicit your insights and comments on a recent
> proposal regarding the "Lock-free XLog Reservation from WAL." We have
> identified some challenges with the current WAL insertions, which require
> space reservations in the WAL buffer which involve updating two
> shared-memory statuses in XLogCtlInsert: CurrBytePos (the start position of
> the current XLog) and PrevBytePos (the prev-link to the previous XLog).
> Currently, the use of XLogCtlInsert.insertpos_lck ensures consistency but
> introduces lock contention, hindering the parallelism of XLog insertions.
> >
> > To address this issue, we propose the following changes:
> >
> > 1. Removal of PrevBytePos: This will allow increments of the CurrBytePos
> (a single uint64 field) to be implemented with an atomic operation
> (fetch_add).
> > 2. Updating Prev-Link of next XLog: Based on the fact that the prev-link
> of the next XLog always points to the head of the current Xlog,we will
> slightly exceed the reserved memory range of the current XLog to update the
> prev-link of the next XLog, regardless of which backend acquires the next
> memory space. The next XLog inserter will wait until its prev-link is
> updated for CRC calculation before starting its own XLog copy into the WAL.
> > 3. Breaking Sequential Write Convention: Each backend will update the
> prev-link of its next XLog first, then return to the header position for
> the current log insertion. This change will reduce the dependency of XLog
> writes on previous ones (compared with the sequential writes).
> > 4. Revised GetXLogBuffer: To support #3, we need update this function to
> separate the LSN it intends to access from the LSN it expects to update in
> the insertingAt field.
> > 5. Increase NUM_XLOGINSERT_LOCKS: With the above changes, increasing
> NUM_XLOGINSERT_LOCKS, for example to 128, could effectively enhance the
> parallelism.
> >
> > The attached patch could pass the regression tests (make check, make
> check-world), and in the performance test of this POC on SPR (480 vCPU)
> shows that this optimization could help the TPCC benchmark better scale
> with the core count and as a result the performance with full cores enabled
> could be improved by 2.04x.
> >
> > Before we proceed with further patch validation and refinement work, we
> are eager to hear the community's thoughts and comments on this
> optimization so that we can confirm our current work aligns with
> expectations.
>
> Good day, Zhiguo.
>
> Idea looks great.
>
> Minor issue:
> - you didn't remove use of `insertpos_lck` from `ReserveXLogSwitch`.
>
> I initially thought it became un-synchronized against
> `ReserveXLogInsertLocation`, but looking closer I found it is
> synchronized with `WALInsertLockAcquireExclusive`.
> Since there are no other `insertpos_lck` usages after your patch, I
> don't see why it should exists and be used in `ReserveXLogSwitch`.
>
> Still I'd prefer to see CAS loop in this place to be consistent with
> other non-locking access. And it will allow to get rid of
> `WALInsertLockAcquireExclusive`, (though probably it is not a big issue).
>
> Major issue:
> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read with on
> platforms where MAXALIGN != 8 or without native 64 load/store. Branch
> with 'memcpy` is rather obvious, but even pointer de-referencing on
> "lucky case" is not safe either.
>
> I have no idea how to fix it at the moment.
>
> Readability issue:
> - It would be good to add `Assert(ptr >= upto)` into `GetXLogBuffer`.
> I had hard time to recognize `upto` is strictly not in the future.
> - Certainly, final version have to have fixed and improved comments.
> Many patch's ideas are strictly non-obvious. I had hard time to
> recognize patch is not a piece of ... (excuse me for the swear sentence).
>
> Indeed, patch is much better than it looks on first sight.
> I came with alternative idea yesterday, but looking closer to your patch
> today I see it is superior to mine (if atomic access will be fixed).
>
> 
>
> regards,
> Yura Sokolov aka funny-falcon
>
>
>


Re: POC: track vacuum/analyze cumulative time per relation

2025-01-03 Thread wenhui qiu
Hi Sami,
 Thank you for your patch. It seems several in-flight patches monitor vacuum
status; can we combine their good ideas?

On Fri, 3 Jan 2025 at 02:24, Sami Imseih  wrote:

> Hi,
>
> After a recent question regarding tracking vacuum start_time in
> pg_stat_all_tables [1], it dawned on me that this view is missing
> an important cumulative metric, which is how much time is spent
> performing vacuum per table.
>
> Currently, the only way a user can get this
> information is if they enable autovacuum logging or track timing
> for manual vacuums. Even then, if a user wants to trend
> the time spent vacuuming over time, they must store the
> timing data somewhere and perform the calculations.
>
> Also, unless autovacuum logging is enabled for all a/v
> operations, they could have gaps in their analysis.
>
> Having the total (auto)vacuum elapsed time
> along side the existing (auto)vaccum_count
> allows a user to track the average time an
> operating overtime and to find vacuum tuning
> opportunities.
>
> The same can also be said for (auto)analyze.
>
> attached a patch ( without doc changes)
> that adds 4 new columns:
>
> total_autovacuum_time
> total_vacuum_time
> total_autoanalyze_time
> total_analyze_time
>
> Below is an example of output and how it
> can be used to derive the average vacuum
> operation time.
>
> postgres=# select
> relname,
> autovacuum_count,
> total_autovacuum_time,
> total_autovacuum_time/NULLIF(autovacuum_count,0) average_autovac_time,
> vacuum_count,
> total_vacuum_time,
> total_vacuum_time/NULLIF(vacuum_count,0) average_vac_time
> from pg_catalog.pg_stat_all_tables
> where relname = 'pgbench_history';
> -[ RECORD 1 ]-+-
> relname   | pgbench_history
> autovacuum_count  | 3
> total_autovacuum_time | 1689
> average_autovac_time  | 563
> vacuum_count  | 1
> total_vacuum_time | 1
> average_vac_time  | 1
>
> It should be noted that the timing is only tracked
> when the vacuum or analyze operation completes,
> as is the case with the other metrics.
>
> Also, there is another discussion in-flight [2] regarding
> tracking vacuum run history in a view, but this serves a
> different purpose as this will provide all the metrics
> that are other wise exposed in vacuum logging
> via sql. This history will also be required to drop
> entries using some criteria to keep the cache from
> growing infinitely.
>
> Feedback for the attached patch is appreciated!
>
> Regards,
>
> Sami Imseih
> Amazon Web Services (AWS)
>
> [1]
> https://www.postgresql.org/message-id/flat/CAGjGUAKQ4UBNdkjunH2qLsdUVG-3F9gCuG0Kb0hToo%2BuMmSteQ%40mail.gmail.com
> [2]
> https://www.postgresql.org/message-id/flat/b68ab452-c41f-4d04-893f-eaab84f1855b%40vondra.me
>


Re: transaction lost when delete clog file after normal shutdown

2024-12-23 Thread wenhui qiu
> Why would you trust the other 99.999 TB if something corrupted the clog
> file?
+1

On Tue, 24 Dec 2024 at 02:06, Jan Wieck  wrote:

> On 12/23/24 06:01, 章晨曦@易景科技 wrote:
> > Yes, of course we can solve this by restoring from backup.
> > But if the database volumn is large, say, 100TB or more, the cost
> > is really too expensive just because the tiny clog file corrupt.
>
> Why would you trust the other 99.999 TB if something corrupted the clog
> file?
>
>
> Regards, Jan
>
> >
> > Regards,
> > Jet
> >
> > Daniel Gustafsson 在2024年12月23日 周一 18:43 写道:
> >  > On 23 Dec 2024, at 11:36, 章晨曦@易景科技
> >  wrote:
> >
> >  > Human errors, disk errors, or even cosmic rays ...
> >
> > That sounds exactly like the scenario which backups are made for.  In
> > all these
> > error cases there is no way of being sure that no other part of the
> > system has
> > been compromised, so you restore from backup with WAL replay.
> >
> > --
> > Daniel Gustafsson
> >
>
>
>
>


Re: transaction lost when delete clog file after normal shutdown

2024-12-23 Thread wenhui qiu
Hi Jan Wieck,
> Why would you trust the other 99.999 TB if something corrupted the clog
> file?
On the other hand, Oracle has many ways to open a database after its data
files have been damaged; the intention is to be able to start the database
even when some critical files are corrupted, so that the data inside can be
salvaged. There is a lot of that happening in the real world, and you can't
change it by firing the DBA.

On Tue, Dec 24, 2024 at 9:45 AM wenhui qiu  wrote:

> >Why would you trust the other 99.999 TB if >something corrupted the clog
> >file?
> +1
>
> On Tue, 24 Dec 2024 at 02:06, Jan Wieck  wrote:
>
>> On 12/23/24 06:01, 章晨曦@易景科技 wrote:
>> > Yes, of course we can solve this by restoring from backup.
>> > But if the database volumn is large, say, 100TB or more, the cost
>> > is really too expensive just because the tiny clog file corrupt.
>>
>> Why would you trust the other 99.999 TB if something corrupted the clog
>> file?
>>
>>
>> Regards, Jan
>>
>> >
>> > Regards,
>> > Jet
>> >
>> > Daniel Gustafsson 在2024年12月23日 周一 18:43 写道:
>> >  > On 23 Dec 2024, at 11:36, 章晨曦@易景科技
>> >  wrote:
>> >
>> >  > Human errors, disk errors, or even cosmic rays ...
>> >
>> > That sounds exactly like the scenario which backups are made for.  In
>> > all these
>> > error cases there is no way of being sure that no other part of the
>> > system has
>> > been compromised, so you restore from backup with WAL replay.
>> >
>> > --
>> > Daniel Gustafsson
>> >
>>
>>
>>
>>


Re: POC: make mxidoff 64 bits

2025-03-26 Thread wenhui qiu
Hi Maxim Orlov, Heikki Linnakangas,
Thank you for working on this. The code freeze is only a few days away, and
things have been quiet for a while, but I believe implementing xid64 is
absolutely necessary. For instance, there is often concern about performance
jitter caused by frequent freezes; if xid64 is implemented, the freeze
process can be much smoother.



Thanks

On Fri, Mar 7, 2025 at 7:30 PM Maxim Orlov  wrote:

> Here is a rebase, v14.
>
> --
> Best regards,
> Maxim Orlov.
>


Re: GSoC 2025 - Looking for Beginner-Friendly PostgreSQL Project

2025-03-26 Thread wenhui qiu
Hi Kruti,
 You can start by reviewing patches at
https://commitfest.postgresql.org/53/; choose a patch with fewer changes to
begin with. That is my small suggestion.

Thanks

On Thu, Mar 27, 2025 at 2:15 AM Kruti Popat  wrote:

> Hello PostgreSQL Developers,
>
> My name is Kruti Popat, and I am an IT engineering student interested in
> contributing to PostgreSQL for GSoC 2025.
>
> I am comfortable with C, C++, and have a basic understanding of
> PostgreSQL. I am eager to explore database internals and open-source
> development. While I am new to PostgreSQL development, I am willing to
> learn and contribute effectively.
>
> Could you please suggest some beginner-friendly project ideas that align
> with my skills? Also, how can I make my first meaningful contribution to
> get started?
>
> Looking forward to your guidance.
>
> Thanks,
> Kruti Popat
>
>


Re: An incorrect check in get_memoize_path

2025-04-07 Thread wenhui qiu
Hi Richard,
Thank you for working on this.

- *
- * XXX this could be enabled if the remaining join quals were made part of
- * the inner scan's filter instead of the join filter. Maybe it's worth
- * considering doing that?
- */
- if (extra->inner_unique &&
- (inner_path->param_info == NULL ||
- bms_num_members(inner_path->param_info->ppi_serials) <
- list_length(extra->restrictlist)))
- return NULL;
+ if (extra->inner_unique)
+ {
+ Bitmapset *ppi_serials;
+
+ if (inner_path->param_info == NULL)
+ return NULL;
+
+ ppi_serials = inner_path->param_info->ppi_serials;
+
+ foreach_node(RestrictInfo, rinfo, extra->restrictlist)
+ {
+ if (!bms_is_member(rinfo->rinfo_serial, ppi_serials))
+ return NULL;
+ }
+ }

The refined version introduces a more precise validation by:
1. Explicitly verifying each restriction – Instead of relying on a count
comparison, it checks whether every restriction's serial number is included
in the inner path’s parameter info (ppi_serials).
2. Reducing false negatives – The optimizer now only discards a path if a
restriction is definitively unhandled, allowing more valid plans to be
considered.

On Mon, Apr 7, 2025 at 3:50 PM Richard Guo  wrote:

> While reviewing another patch [1], I noticed an incorrect check in
> get_memoize_path.
>
> if (extra->inner_unique &&
> (inner_path->param_info == NULL ||
>  bms_num_members(inner_path->param_info->ppi_serials) <
>  list_length(extra->restrictlist)))
> return NULL;
>
> This check is to ensure that, for unique joins, all the restriction
> clauses are parameterized to enable the use of Memoize nodes.
> However, ppi_clauses includes join clauses available from all outer
> rels, not just the current outer rel, while extra->restrictlist only
> includes the restriction clauses for the current join.  This means the
> check could pass even if a restriction clause isn't parameterized, as
> long as another join clause, which doesn't belong to the current join,
> is included in ppi_clauses.  It's not very hard to construct a query
> that exposes this issue.
>
> create table t (a int, b int);
>
> explain (costs off)
> select * from t t0 left join
>   (t t1 left join t t2 on t1.a = t2.a
>join lateral
>  (select distinct on (a, b) a, b, t0.a+t1.a+t2.a as x from t t3 offset
> 0) t3
>  on t1.a = t3.a and coalesce(t2.b) = t3.b)
>   on t0.b = t3.b;
>QUERY PLAN
> -
>  Nested Loop Left Join
>->  Seq Scan on t t0
>->  Nested Loop
>  Join Filter: (COALESCE(t2.b) = t3.b)
>  ->  Hash Left Join
>Hash Cond: (t1.a = t2.a)
>->  Seq Scan on t t1
>->  Hash
>  ->  Seq Scan on t t2
>  ->  Subquery Scan on t3
>Filter: ((t0.b = t3.b) AND (t1.a = t3.a))
>->  Unique
>  ->  Sort
>Sort Key: t3_1.a, t3_1.b
>->  Seq Scan on t t3_1
> (15 rows)
>
> Consider the join to t3.  It is a unique join, and not all of its
> restriction clauses are parameterized.  Despite this, the check still
> passes.
>
> Interestingly, although the check passes, no Memoize nodes are added
> on top of t3's paths because paraminfo_get_equal_hashops insists that
> all ppi_clauses must satisfy clause_sides_match_join (though I
> actually question whether this is necessary, but that's a separate
> issue).  However, this does not justify the check being correct.
>
> Hence, I propose the attached patch for the fix.
>
> BTW, there is a XXX comment there saying that maybe we can make the
> remaining join quals part of the inner scan's filter instead of the
> join filter.  I don't think this is possible in all cases.  In the
> above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
> filter, according to join_clause_is_movable_into.  So I removed that
> comment in the patch while we're here.
>
> Any thoughts?
>
> [1] https://postgr.es/m/60bf8d26-7c7e-4915-b544-afdb90200...@gmail.com
>
> Thanks
> Richard
>


Re: Move tests of contrib/spi/ out of the core regression tests

2025-04-08 Thread wenhui qiu
Hi
> This is too late for v18 of course, so I'll park it in the next CF.

> In my opinion this would still be totally OK for v18. It's just test
> changes. I would rather commit cleanups like this sooner than later.
Agreed, +1.

On Tue, Apr 8, 2025 at 5:22 PM Heikki Linnakangas  wrote:

> On 08/04/2025 04:59, Tom Lane wrote:
> > The attached patch removes test cases concerned with contrib/spi/
> > from the core regression tests and instead puts them into new
> > test files in contrib/spi/ itself.
>
> +1
>
> > My original motivation for looking at this was the discussion in
> > [1] about whether to remove contrib/spi/refint.c entirely, since
> > it's rather buggy and not a great example of our modern coding
> > practices.  So I wondered whether the core test cases that use it
> > were contributing any significant amount of code coverage on the
> > core code.  (Spoiler: nope.)  But I think this is generally good
> > cleanup anyway, because it locates the test code for contrib/spi
> > where a person would expect to find that, and removes some rather
> > grotty coding in src/test/regress's Makefile and meson.build.
> > As a side benefit, it removes some small number of cycles from
> > the core tests, which seems like a good thing.
> >
> > The tests for the refint module are just moved over verbatim,
> > except for using CREATE EXTENSION instead of manual declaration
> > of the C functions.  Also, I kept the tests for COMMENT ON TRIGGER
> > in the core tests, by applying them to a different trigger.
> >
> > The tests for autoinc were kind of messy, because the behavior of
> > autoinc itself was impossibly intertwined with the behavior of
> > "ttdummy", which is an undocumented C function in regress.c.
> > After some thought I decided we could just nuke ttdummy altogether,
> > so the new autoinc.sql test is much simpler and more straightforward.
>
> My only worry would if 'ttdummy' was a good showcase for how to write a
> trigger function in C. But it's not a particularly good example. There
> is a better example in the docs, and there's the 'autoinc' trigger too.
>
> > This is too late for v18 of course, so I'll park it in the next CF.
>
> In my opinion this would still be totally OK for v18. It's just test
> changes. I would rather commit cleanups like this sooner than later.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>
>


Re: TOAST versus toast

2025-04-13 Thread wenhui qiu
Hi,
I think this point is of no real significance. Besides, this wording has
been in the documentation for over ten years, and everyone has become
accustomed to it. It feels like effort spent on something that does not
really matter.


On Sat, 12 Apr 2025 at 10:31, David G. Johnston 
wrote:

> On Sun, Mar 16, 2025 at 8:33 PM Peter Smith  wrote:
>
>> Thanks for your suggestions. At this point option (1) is looking most
>> attractive. Probably, I will just withdraw the CF entry soon unless
>> there is some new interest. Just chipping away fixing a few places
>> isn't going to achieve the consistency this thread was aiming for.
>>
>>
> I've moved this back to waiting on author pending a final decision.
> Interested parties might still chime in but it doesn't seem like it is
> actively looking for reviewers at this point.
>
> David J.
>
>


Re: POC: make mxidoff 64 bits

2025-04-14 Thread wenhui qiu
Hi Heikki,
Could you kindly help create an entry in
https://commitfest.postgresql.org/53/? I cannot find this patch in
Commitfest 2025-07.


Thanks

On Wed, Apr 2, 2025 at 2:26 AM Heikki Linnakangas  wrote:

> On 01/04/2025 21:25, Heikki Linnakangas wrote:
> > On 07/03/2025 13:30, Maxim Orlov wrote:
> >> Here is a rebase, v14.
> >
> > Thanks! I did some manual testing of this. I created a little helper
> > function to consume multixids, to test the autovacuum behavior, and
> > found one issue:
>
> Forgot to attach the test function I used, here it is.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)


Re: An incorrect check in get_memoize_path

2025-04-14 Thread wenhui qiu
Hi,
  No objections. It's a pity that PostgreSQL 18 has already reached code
freeze.


Thanks

On Mon, Apr 14, 2025 at 4:21 PM Andrei Lepikhov  wrote:

> On 4/14/25 08:49, Richard Guo wrote:
> > On Mon, Apr 7, 2025 at 4:50 PM Richard Guo 
> wrote:
> >> Hence, I propose the attached patch for the fix.
> >>
> >> BTW, there is a XXX comment there saying that maybe we can make the
> >> remaining join quals part of the inner scan's filter instead of the
> >> join filter.  I don't think this is possible in all cases.  In the
> >> above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
> >> filter, according to join_clause_is_movable_into.  So I removed that
> >> comment in the patch while we're here.
> >>
> >> Any thoughts?
> >
> > Here is an updated patch with a commit message.  Regarding
> > backporting, I'm inclined not to, given the lack of field reports.
> > Any objections to pushing it?
> No objections.
> By the way, I think you should be less shy about adding to CC people who
> have been involved in the discussion (at least David should have a
> chance to know). Also, don't hesitate to use the 'Reviewed-by' tag to
> let people know who else may remember the context of the problem, just
> in case.
>
> --
> regards, Andrei Lepikhov
>
>
>


Re: New committer: Jacob Champion

2025-04-14 Thread wenhui qiu
Congratulations

On Mon, Apr 14, 2025 at 6:56 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> > The Core Team would like to extend our congratulations to Jacob
> > Champion, who has accepted an invitation to become our newest PostgreSQL
> > committer.
>
> Congrats, Jacob!
>
> --
> Best regards,
> Aleksander Alekseev
>
>
>


Re: POC: Parallel processing of indexes in autovacuum

2025-04-17 Thread wenhui qiu
Hi Maxim Orlov,
 Thank you for working on this; I like the idea, but I have a suggestion:
autovacuum_max_workers no longer requires a restart to change, so I think
these GUCs could be made reloadable in the same way, rather than requiring a
restart:
+#max_parallel_index_autovac_workers = 0 # this feature disabled by default
+ # (change requires restart)
+#autovac_idx_parallel_min_rows = 0
+ # (change requires restart)
+#autovac_idx_parallel_min_indexes = 2
+ # (change requires restart)

Thanks

On Wed, Apr 16, 2025 at 7:05 PM Maxim Orlov  wrote:

> Hi!
>
> The VACUUM command can be executed with the parallel option. As
> documentation states, it will perform index vacuum and index cleanup
> phases of VACUUM in parallel using *integer* background workers. But such
> an interesting feature is not used for an autovacuum. After a quick look
> at the source codes, it became clear to me that when the parallel option
> was added, the corresponding option for autovacuum wasn't implemented, 
> although
> there are no clear obstacles to this.
>
> Actually, one of our customers step into a problem with autovacuum on a
> table with many indexes and relatively long transactions. Of course, long
> transactions are an ultimate evil and the problem can be solved by calling
> running vacuum and a cron task, but, I think, we can do better.
>
> Anyhow, what about adding parallel option for an autovacuum? Here is a
> POC patch for proposed functionality. For the sake of simplicity's, several
> GUC's have been added. It would be good to think through the parallel
> launch condition without them.
>
> As always, any thoughts and opinions are very welcome!
>
> --
> Best regards,
> Maxim Orlov.
>


Re: Introduce some randomness to autovacuum

2025-04-25 Thread wenhui qiu
Hi, I like your idea. It would be even better if the tables were weighted by
size, so that larger tables are more likely to be picked first.
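
For illustration only (this is a standalone sketch, not PostgreSQL code and
not part of the patch), size-weighted selection could look roughly like
this: the first table to vacuum is picked with probability proportional to
its size, so large tables are favoured while nothing is starved. The table
names and sizes below are invented.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct
{
    const char *name;
    long        relpages;   /* weight: the bigger the table, the higher the chance */
} TableInfo;

static int
pick_weighted(const TableInfo *tabs, int ntabs)
{
    long        total = 0;
    long        r;

    for (int i = 0; i < ntabs; i++)
        total += tabs[i].relpages;

    r = rand() % total;     /* the backend would use pg_prng_uint64_range() instead */

    for (int i = 0; i < ntabs; i++)
    {
        if (r < tabs[i].relpages)
            return i;
        r -= tabs[i].relpages;
    }
    return ntabs - 1;       /* not reached */
}

int
main(void)
{
    TableInfo   tabs[] = {{"tiny", 10}, {"medium", 1000}, {"huge", 100000}};

    srand((unsigned) time(NULL));
    printf("vacuum \"%s\" first\n", tabs[pick_weighted(tabs, 3)].name);
    return 0;
}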

On Fri, 25 Apr 2025 at 22:03, Junwang Zhao  wrote:

> Hi hackers,
>
> After watching Robert's talk[1] on autovacuum and participating in the
> related
> workshop yesterday, it appears that people are inclined to use
> prioritization
> to address the issues highlighted in Robert's presentation. Here I list two
> of the failure modes that were discussed.
>
> - Spinning. Running repeatedly on the same table but not accomplishing
> anything useful.
> - Starvation. autovacuum can't vacuum everything that needs vacuuming.
> - ...
>
> The prioritization way needs some basic stuff that postgres doesn't have
> now.
>
> I had a random thought that introducing some randomness might help
> mitigate some of the issues mentioned above. Before performing vacuum
> on the collected tables, we could rotate the table_oids list by a random
> number within the range [0, list_length(table_oids)]. This way, every table
> would have an equal chance of being vacuumed first, thus no spinning and
> starvation.
>
> Even if there is a broken table that repeatedly gets stuck, this random
> approach would still provide opportunities for other tables to be vacuumed.
> Eventually, the system would converge.
>
> The change is something like the following, I haven't tested the code,
> just posted it here for discussion, let me know your thoughts.
>
> diff --git a/src/backend/postmaster/autovacuum.c
> b/src/backend/postmaster/autovacuum.c
> index 16756152b71..6273d22 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -79,6 +79,7 @@
>  #include "catalog/pg_namespace.h"
>  #include "commands/dbcommands.h"
>  #include "commands/vacuum.h"
> +#include "common/pg_prng.h"
>  #include "common/int.h"
>  #include "lib/ilist.h"
>  #include "libpq/pqsignal.h"
> @@ -2267,6 +2268,25 @@ do_autovacuum(void)
>
>"Autovacuum Portal",
>
>ALLOCSET_DEFAULT_SIZES);
>
> +   /*
> +* Randomly rotate the list of tables to vacuum.  This is to avoid
> +* always vacuuming the same table first, which could lead to
> spinning
> +* on the same table or vacuuming starvation.
> +*/
> +   if (list_length(table_oids) > 2)
> +   {
> +   int rand = 0;
> +   static pg_prng_state prng_state;
> +   List   *tmp_oids = NIL;
> +
> +   pg_prng_seed(&prng_state, (uint64) (getpid() ^
> time(NULL)));
> +   rand = (int) pg_prng_uint64_range(&prng_state, 0,
> list_length(table_oids) - 1);
> +   if (rand != 0) {
> +   tmp_oids = list_copy_tail(table_oids, rand);
> +   table_oids = list_copy_head(table_oids,
> list_length(table_oids) - rand);
> +   table_oids = list_concat(table_oids, tmp_oids);
> +   }
> +   }
> /*
>  * Perform operations on collected tables.
>  */
>
>
> [1] How Autovacuum Goes Wrong: And Can We Please Make It Stop Doing
> That? https://www.youtube.com/watch?v=RfTD-Twpvac
>
>
> --
> Regards
> Junwang Zhao
>
>
>


Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-18 Thread wenhui qiu
Hi,
> Ok you mean that xid64 will remove the need for freezing... it's a way to
> see things.
Once xid64 is implemented there will be no need to trigger
vacuum_failsafe_age; there is a long enough window to vacuum and freeze, so
the impact on performance is much smaller. I suspect the problem described
here may come from hitting vacuum_failsafe_age.


Thanks



On Wed, 19 Feb 2025 at 01:13, Sébastien  wrote:

> Ok you mean that xid64 will remove the need for freezing... it's a way to
> see things.
>
> Le mar. 18 févr. 2025 à 15:57, Greg Sabino Mullane  a
> écrit :
>
>> On Tue, Feb 18, 2025 at 9:17 AM Sébastien  wrote:
>>
>>> Sorry it won't work. It just delays the problem. But still the freeze
>>> procedure must rewrite all pages.
>>>
>>
>> Actually, a 64-bit transaction ID allows for quite a "delay" - like
>> hundreds of millions of years at your current rate. :)
>>
>> (Yes, there are other reasons to vacuum, and other limits and problems
>> would arise. You'd have 99 problems, but a vacuum freeze ain't one.)
>>
>> Cheers,
>> Greg
>>
>> --
>> Crunchy Data - https://www.crunchydata.com
>> Enterprise Postgres Software Products & Tech Support
>>
>>
>
> --
> Sébastien Caunes
> +33 6 7 229 229 7
>


Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-17 Thread wenhui qiu
Hello Sébastien,
  This case could be addressed by xid64, but it seems very few people are
interested in it. To me, though, xid64 should be implemented as soon as
possible.


Thanks

On Mon, Feb 17, 2025 at 9:47 PM Sébastien  wrote:

> Hello weuhui,
>
> It's not a problem with heavy insert table but heavy delete. Also
> triggering more frequent autovacuum will not help because autovacuum does
> not delete recently dead tuples when a large and slow vacuum freeze
> operation older than their delete is still running in parallel. The
> solution was to increase the priority and speed of the vaccum freeze
> opeartion.
>
> Anyway, there should be a way to insert freeze data other than copy that
> does not work with foreign tables. (INSERT into my_table select * from
> foreign_table)
>
>
>
> Le lun. 17 févr. 2025 à 09:46, wenhui qiu  a
> écrit :
>
>> HI Sébastien
>>  You can check out the email subject:Trigger more frequent
>> autovacuums of heavy insert tables , I think it can alleviate the problem
>>
>> Thanks
>>
>> On Sat, Feb 15, 2025 at 3:13 AM Andres Freund  wrote:
>>
>>> Hi,
>>>
>>> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
>>> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
>>> > large-scale cold data imports, reducing the impact on system
>>> performance
>>> > post-import. For large imports, migrations and major version upgrades.
>>> > Business Use-case:
>>> >
>>> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
>>> heavily
>>> > loaded server, we observed that the system struggled significantly
>>> weeks
>>> > later when the autovacuum process had to freeze all the imported data
>>> > pages. This led to severe performance degradation, requiring manual
>>> > intervention to prioritize vacuum jobs to complete them as quickly as
>>> > possible.
>>>
>>> What version of postgres was this?  What batch sizes do you need to
>>> support?
>>> I.e. is all of this data inserted at once, or in steps?
>>>
>>> As already discussed, it seems unlikely that we'll ever support INSERT
>>> FROZEN,
>>> due to the potential of causing concurrent queries to give bogus
>>> answers. But
>>> there's actually a lot we can do to improve this short of INSERT FROZEN.
>>>
>>> The reason I asked for the version is that the behaviour would e.g.
>>> likely be
>>> worse before autovacuum_vacuum_insert_scale_factor existed. We are
>>> working on
>>> improvements around that in 18 too, ensuring that the gap between insert
>>> triggered vacuums does not grow forever.
>>>
>>> Several recent releases have also improved the situation around this in
>>> other
>>> ways, e.g. by just making vacuuming faster and by avoiding doing
>>> redundant
>>> work in more cases (by increasing relfrozenzid more aggressively).
>>>
>>> We've also been talking about performing freezing during e.g.
>>> checkpoints, if
>>> possible.
>>>
>>> If you're inserting all the data in a single transaction however, it'll
>>> be
>>> hard to improve most of this, because while that long long transaction
>>> runs,
>>> we can't do anything that needs to know the transaction has finished.
>>> OTOH,
>>> if it were a single transaction, you could already use COPY FREEZE.
>>>
>>>
>>> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
>>> currently
>>> does not use the bulk-insert logic, leading the INSERT to cause a lot
>>> more WAL
>>> to be emitted compared to inserting the same data via COPY.
>>>
>>>
>>> > This issue is particularly critical during database *migrations* or
>>> *version
>>> > upgrades*, where a full data reload is often necessary. Each time a
>>> major
>>> > PostgreSQL upgrade occurs, users must reimport large datasets, leading
>>> to
>>> > the same problem of vacuum storms post-import. An INSERT FROZEN feature
>>> > would allow importing data that is known to be immutable, preventing
>>> > unnecessary vacuum overhead and reducing system strain.
>>>
>>> What are you using for such upgrades or migrations? I'd not expect
>>> INSERT to
>>> be used, due to the overhead that has compared to COPY.
>>>
>>> Greetings,
>>>
>>> Andres Freund
>>>
>>>
>>>
>
> --
> Sébastien Caunes
> +33 6 7 229 229 7
>


Re: Trigger more frequent autovacuums of heavy insert tables

2025-02-28 Thread wenhui qiu
Hi
> + * It could be the stats were updated manually and relallfrozen >
> + * relpages. Clamp relallfrozen to relpages to avoid nonsensical
> + * calculations.
> + */
> + relallfrozen = Min(relallfrozen, relpages);
> + pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);
> + }
> +
Based on this code, the pcnt_unfrozen value could potentially be 0, which
would indicate that everything is frozen. Therefore, is it necessary to
handle the case where the value is 0?
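
For reference, a small standalone sketch (not the patch itself) of how the
insert threshold behaves as pcnt_unfrozen shrinks, assuming the threshold
keeps the form base + scale_factor * reltuples * pcnt_unfrozen from the
quoted patch and the default GUC values; when everything is frozen the
scaled term disappears and only the base threshold remains.

#include <stdio.h>

int
main(void)
{
    double reltuples = 1000000.0;
    double vac_ins_base_thresh = 1000.0;   /* default autovacuum_vacuum_insert_threshold */
    double vac_ins_scale_factor = 0.2;     /* default autovacuum_vacuum_insert_scale_factor */
    double pcnt_unfrozen[] = {1.0, 0.5, 0.0};

    for (int i = 0; i < 3; i++)
        printf("pcnt_unfrozen = %.1f -> insert threshold = %.0f\n",
               pcnt_unfrozen[i],
               vac_ins_base_thresh +
               vac_ins_scale_factor * reltuples * pcnt_unfrozen[i]);
    return 0;
}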





On Sat, Mar 1, 2025 at 1:54 AM Nathan Bossart 
wrote:

> On Wed, Feb 26, 2025 at 04:48:20PM -0500, Melanie Plageman wrote:
> > Makes sense. Thanks Robert and Nathan. Attached v11 changes the docs
> > wording and is rebased.
>
> 0001 LGTM.
>
> >  
> > - Specifies a fraction of the table size to add to
> > - autovacuum_vacuum_insert_threshold
> > - when deciding whether to trigger a VACUUM.
> > - The default is 0.2 (20% of table size).
> > - This parameter can only be set in the
> postgresql.conf
> > - file or on the server command line;
> > - but the setting can be overridden for individual tables by
> > - changing table storage parameters.
> > +Specifies a fraction of the active (unfrozen) table size to add
> to
> > +autovacuum_vacuum_insert_threshold
> > +when deciding whether to trigger a VACUUM.
> > +The default is 0.2 (20% of active table
> size).
> > +This parameter can only be set in the
> postgresql.conf
> > +file or on the server command line;
> > +but the setting can be overridden for individual tables by
> > +changing table storage parameters.
> >  
>
> nitpick: There might be an unintentional indentation change here.
>
> I'm wondering about the use of the word "active," too.  While it's
> qualified by the "(unfrozen)" after it, I'm worried it might not be
> descriptive enough.  For example, I might consider a frozen page that's in
> the buffer cache and is being read by queries to be "active."  And it
> doesn't seem clear to me that it's referring to unfrozen pages and not
> unfrozen tuples.  Perhaps we should say something like "a fraction of the
> unfrozen pages in the table to add...".
>
> > + /*
> > +  * If we have data for relallfrozen, calculate the
> unfrozen percentage
> > +  * of the table to modify insert scale factor. This helps
> us decide
> > +  * whether or not to vacuum an insert-heavy table based on
> the number
> > +  * of inserts to the "active" part of the table.
> > +  */
> > + if (relpages > 0 && relallfrozen > 0)
>
> So, if we don't have this data, we just use reltuples, which is the
> existing behavior and should trigger vacuums less aggressively than if we
> _did_ have the data.  That seems like the correct choice to me.
>
> > + /*
> > +  * It could be the stats were updated manually and
> relallfrozen >
> > +  * relpages. Clamp relallfrozen to relpages to
> avoid nonsensical
> > +  * calculations.
> > +  */
> > + relallfrozen = Min(relallfrozen, relpages);
> > + pcnt_unfrozen = 1 - ((float4) relallfrozen /
> relpages);
>
> Makes sense.
>
> --
> nathan
>
>
>


Re: Trigger more frequent autovacuums

2025-03-07 Thread wenhui qiu
Sorry, the previous attachment contained a wrong version of the
pcnt_visibletuples debug output; kindly check the v3 attachment instead.

On Fri, Mar 7, 2025 at 5:37 PM wenhui qiu  wrote:

> Hi
>The more accurate data I've found is tabentry->live_tuples; provides
> the second version
>
> #Here's a simple test I did
>
> test=# select count(*) from join1;
>   count
> -
>  2289001
> (1 row)
>
>
> test=# update join1 set name=md5(now()::text) where id<100;
> UPDATE 1938700
> test=# select 1938700/2289001;
>  ?column?
> --
> 0
> (1 row)
>
> test=# select 1938700/2289001::float;
>   ?column?
> 
>  0.8469633696097119
> (1 row)
>
> test=#
>
>
>
> test=# select count(*) from join1;
>   count
> -
>  2289001
> (1 row)
> test=# update join1 set name=md5(now()::text) where id<=8;
> UPDATE 159901
> test=# select 159901/2289001::float;
>   ?column?
> -
>  0.06985623859491542
> (1 row)
>
>
> test=# select * from pg_stat_all_tables where relname='join1';
> -[ RECORD 1 ]--+--
> relid  | 16385
> schemaname | public
> relname| join1
> seq_scan   | 17
> last_seq_scan  | 2025-03-07 15:34:02.793659+08
> seq_tup_read   | 14994306
> idx_scan   | 7
> last_idx_scan  | 2025-03-07 15:34:23.404788+08
> idx_tup_fetch  | 500281
> n_tup_ins  | 2289001
> n_tup_upd  | 2268604
> n_tup_del  | 0
> n_tup_hot_upd  | 399
> n_tup_newpage_upd  | 2268205
> n_live_tup | 2286701
> n_dead_tup | 159901
> n_mod_since_analyze| 159901
> n_ins_since_vacuum | 0
> last_vacuum| 2025-03-06 18:18:11.318419+08
> last_autovacuum| 2025-03-07 15:25:53.055576+08
> last_analyze   | 2025-03-06 18:18:11.424253+08
> last_autoanalyze   | 2025-03-07 15:25:53.456656+08
> vacuum_count   | 3
> autovacuum_count   | 3
> analyze_count  | 2
> autoanalyze_count  | 4
> total_vacuum_time  | 205
> total_autovacuum_time  | 2535
> total_analyze_time | 203
> total_autoanalyze_time | 1398
>
> test=#
> test=# update join1 set name=md5(now()::text) where id<=8;
> UPDATE 159901
>
>
> test=# \x
> Expanded display is on.
> test=# select (n_live_tup)/(n_live_tup+n_dead_tup)::float from
> pg_stat_all_tables where relname='join1';
> -[ RECORD 1 ]
> ?column? | 0.8774142777358045
>
> test=# select *  from pg_stat_all_tables where relname='join1';
> -[ RECORD 1 ]--+--
> relid  | 16385
> schemaname | public
> relname| join1
> seq_scan   | 17
> last_seq_scan  | 2025-03-07 15:34:02.793659+08
> seq_tup_read   | 14994306
> idx_scan   | 8
> last_idx_scan  | 2025-03-07 15:46:38.331795+08
> idx_tup_fetch  | 660182
> n_tup_ins  | 2289001
> n_tup_upd  | 2428505
> n_tup_del  | 0
> n_tup_hot_upd  | 424
> n_tup_newpage_upd  | 2428081
> n_live_tup | 2289001
> n_dead_tup | 319802
> n_mod_since_analyze| 0
> n_ins_since_vacuum | 0
> last_vacuum| 2025-03-06 18:18:11.318419+08
> last_autovacuum| 2025-03-07 15:25:53.055576+08
> last_analyze   | 2025-03-06 18:18:11.424253+08
> last_autoanalyze   | 2025-03-07 15:47:35.950932+08
> vacuum_count   | 3
> autovacuum_count   | 3
> analyze_count  | 2
> autoanalyze_count  | 5
> total_vacuum_time  | 205
> total_autovacuum_time  | 2535
> total_analyze_time | 203
> total_autoanalyze_time | 1770
>
> test=#
> tail -n 1000 postgresql-Fri_17.csv |grep join1
> 2025-03-07 17:30:12.782 +08,,,755739,,67cabca4.b881b,3,,2025-03-07
> 17:30:12 +08,2017/2,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
> 228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
> 1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
> 2025-03-07 17:31:12.803 +08,,,756028,,67cabce0.b893c,3,,2025-03-07
> 17:31:12 +08,2003/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
> 228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
> 1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
> 2025-03-07 17:32:12.822 +08,,,756405,,67cabd1c.b8ab5,3,,2025-03-07
> 17:32:12 +08,2006/4,0,DEBUG,0,"vacthresh: 457850.2187

Re: Trigger more frequent autovacuums

2025-03-07 Thread wenhui qiu
Hi,
   The more accurate data source I've found is tabentry->live_tuples; here
is a second version of the patch.

#Here's a simple test I did

test=# select count(*) from join1;
  count
-
 2289001
(1 row)


test=# update join1 set name=md5(now()::text) where id<100;
UPDATE 1938700
test=# select 1938700/2289001;
 ?column?
--
0
(1 row)

test=# select 1938700/2289001::float;
  ?column?

 0.8469633696097119
(1 row)

test=#



test=# select count(*) from join1;
  count
-
 2289001
(1 row)
test=# update join1 set name=md5(now()::text) where id<=8;
UPDATE 159901
test=# select 159901/2289001::float;
  ?column?
-
 0.06985623859491542
(1 row)


test=# select * from pg_stat_all_tables where relname='join1';
-[ RECORD 1 ]--+--
relid  | 16385
schemaname | public
relname| join1
seq_scan   | 17
last_seq_scan  | 2025-03-07 15:34:02.793659+08
seq_tup_read   | 14994306
idx_scan   | 7
last_idx_scan  | 2025-03-07 15:34:23.404788+08
idx_tup_fetch  | 500281
n_tup_ins  | 2289001
n_tup_upd  | 2268604
n_tup_del  | 0
n_tup_hot_upd  | 399
n_tup_newpage_upd  | 2268205
n_live_tup | 2286701
n_dead_tup | 159901
n_mod_since_analyze| 159901
n_ins_since_vacuum | 0
last_vacuum| 2025-03-06 18:18:11.318419+08
last_autovacuum| 2025-03-07 15:25:53.055576+08
last_analyze   | 2025-03-06 18:18:11.424253+08
last_autoanalyze   | 2025-03-07 15:25:53.456656+08
vacuum_count   | 3
autovacuum_count   | 3
analyze_count  | 2
autoanalyze_count  | 4
total_vacuum_time  | 205
total_autovacuum_time  | 2535
total_analyze_time | 203
total_autoanalyze_time | 1398

test=#
test=# update join1 set name=md5(now()::text) where id<=8;
UPDATE 159901


test=# \x
Expanded display is on.
test=# select (n_live_tup)/(n_live_tup+n_dead_tup)::float from
pg_stat_all_tables where relname='join1';
-[ RECORD 1 ]
?column? | 0.8774142777358045

test=# select *  from pg_stat_all_tables where relname='join1';
-[ RECORD 1 ]--+--
relid  | 16385
schemaname | public
relname| join1
seq_scan   | 17
last_seq_scan  | 2025-03-07 15:34:02.793659+08
seq_tup_read   | 14994306
idx_scan   | 8
last_idx_scan  | 2025-03-07 15:46:38.331795+08
idx_tup_fetch  | 660182
n_tup_ins  | 2289001
n_tup_upd  | 2428505
n_tup_del  | 0
n_tup_hot_upd  | 424
n_tup_newpage_upd  | 2428081
n_live_tup | 2289001
n_dead_tup | 319802
n_mod_since_analyze| 0
n_ins_since_vacuum | 0
last_vacuum| 2025-03-06 18:18:11.318419+08
last_autovacuum| 2025-03-07 15:25:53.055576+08
last_analyze   | 2025-03-06 18:18:11.424253+08
last_autoanalyze   | 2025-03-07 15:47:35.950932+08
vacuum_count   | 3
autovacuum_count   | 3
analyze_count  | 2
autoanalyze_count  | 5
total_vacuum_time  | 205
total_autovacuum_time  | 2535
total_analyze_time | 203
total_autoanalyze_time | 1770

test=#
tail -n 1000 postgresql-Fri_17.csv |grep join1
2025-03-07 17:30:12.782 +08,,,755739,,67cabca4.b881b,3,,2025-03-07 17:30:12
+08,2017/2,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
2025-03-07 17:31:12.803 +08,,,756028,,67cabce0.b893c,3,,2025-03-07 17:31:12
+08,2003/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
2025-03-07 17:32:12.822 +08,,,756405,,67cabd1c.b8ab5,3,,2025-03-07 17:32:12
+08,2006/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.00 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0
2025-03-07 17:33:12.842 +08,,,757026,,67cabd58.b8d22,3,,2025-03-07 17:33:12
+08,2009/4,0,DEBUG,0,"vacthresh: 457850.218750,anlthresh:
228950.109375, the join1 has 2289001.000000 reltuples, pcnt_unfrozen:
1.00, pcnt_visibletuples: 0.877414 ","","autovacuum worker",,0

On Fri, Mar 7, 2025 at 2:22 PM wenhui qiu  wrote:

> HI Nathan Bossart Melanie Plageman
>
> Firstly, congratulations on the submission of this path:
> https://commitfest.postgresql.org/patch/5320/
>
> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
> anlthresh = (f

Trigger more frequent autovacuums

2025-03-06 Thread wenhui qiu
Hi Nathan Bossart, Melanie Plageman,

First of all, congratulations on this patch:
https://commitfest.postgresql.org/patch/5320/

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor *
reltuples;
Of these three calculations, two have not yet benefited from this
optimisation; the patch gives us the key value pcnt_unfrozen, and I think we
can also consider applying it to the vacthresh and anlthresh calculations.
To that end, I've added a new pcnt_unrelallvisible parameter modelled on
pcnt_unfrozen, though I'm not sure whether using it is a good idea. I'd like
to hear your opinions on this.
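
For illustration, a rough sketch of applying the same idea to vacthresh and
anlthresh (this is only an outline, not the attached diff; it assumes
relallvisible is read from pg_class in the same way the committed patch
reads relallfrozen, and it mirrors that patch's clamping):

    /* sketch only -- variable names follow the debug output below */
    float4 pcnt_unrelallvisible = 1.0;

    if (relpages > 0 && relallvisible > 0)
    {
        /* clamp, as done for relallfrozen, in case stats were set manually */
        relallvisible = Min(relallvisible, relpages);
        pcnt_unrelallvisible = 1 - ((float4) relallvisible / relpages);
    }

    vacthresh = (float4) vac_base_thresh +
                vac_scale_factor * reltuples * pcnt_unrelallvisible;
    anlthresh = (float4) anl_base_thresh +
                anl_scale_factor * reltuples * pcnt_unrelallvisible;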
#Here's a simple test I did
test=# select count(*) from join1;
  count
-
 2289001
(1 row)

test=# update join1 set name=md5(now()::text) where id<=2;
UPDATE 70001
test=#

2025-03-07 14:03:33.968 +08,,,607191,,67ca8c35.943d7,2,,2025-03-07 14:03:33
+08,2005/2,0,DEBUG,0,"vacthresh: 222674.75,anlthresh: 11371.118164,
the join1 has 2291275.00 reltuples, pcnt_unfrozen: 0.485810,
pcnt_unrelallvisible: 0.049410 ","","autovacuum worker",,0


Optimising-the-vacuum-algorithm.diff
Description: Binary data


Re: Trigger more frequent autovacuums

2025-03-10 Thread wenhui qiu
Hi Melanie Plageman,
   Thank you for your reply. My calculation logic is to compute the
proportion of live ("active") tuples; what I really want to know is whether
this algorithm is correct and acceptable. The intent is that when this
proportion is relatively low, vacuum is more likely to be triggered. For
example, take a table with one million rows of which 199,000 have been
updated:
current formula: 50 + 1,000,000 * 0.2 = 200,050, so the dead tuples do not
yet cross the threshold;
new calculation: approximately 50 + 1,000,000 * 0.2 * 0.8 = 160,050, so
autovacuum is triggered.

If this algorithm is accepted, I will follow your suggestion, or you could
provide a patch with a better algorithm; I mainly want to move these
threshold formulas forward. In fact, I highly admire the solution provided
by SQL Server.



On Fri, Mar 7, 2025 at 11:48 PM Melanie Plageman 
wrote:

> On Fri, Mar 7, 2025 at 6:19 AM wenhui qiu  wrote:
> >
> > Sorry ,A wrong version of debug pcnt_visibletuples ,kindly please check
> the v3 attachment
>
> I looked at v3. I think I need more than the logging message to
> understand your goal here. Could you explain the algorithm and why you
> think it makes sense and what scenarios it is meant to handle better?
>
> Thinking about it conceptually, I don't think this makes sense:
>
> pcnt_visibletuples = (float4) (livetuples / (livetuples + vactuples));
> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples *
> pcnt_visibletuples
> do_vacuum = vactuples > vacthresh
>
> livetuples + deadtuples is approx reltuples (little more complicated
> than this, but), so this is basically
> livetuples/reltuples*reltuples -> livetuples
>
> So vactuples > vacthresh is basically just deadtuples > livetuples
>
> Maybe you think that we should be comparing the portion of the table
> that is dead to the portion of the table that is live, but that
> doesn't seem to be what you mean the algorithm to do based on the one
> comment you have.
>
> The anlthresh calculation is a different discussion, since
> mod_since_analyze is calculated in a different way (tuples updated +
> tuples inserted + tuples_deleted). But I am also skeptical of this
> one.
>
> I think you need to explain more conceptually about why you think
> these ways of calculating the thresholds makes sense.
>
> - Melanie
>


Re: [Feature Request] INSERT FROZEN to Optimize Large Cold Data Imports and Migrations

2025-02-17 Thread wenhui qiu
Hi Sébastien,
 You can check out the thread with the subject "Trigger more frequent
autovacuums of heavy insert tables"; I think it can alleviate this problem.

Thanks

On Sat, Feb 15, 2025 at 3:13 AM Andres Freund  wrote:

> Hi,
>
> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
> > large-scale cold data imports, reducing the impact on system performance
> > post-import. For large imports, migrations and major version upgrades.
> > Business Use-case:
> >
> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
> heavily
> > loaded server, we observed that the system struggled significantly weeks
> > later when the autovacuum process had to freeze all the imported data
> > pages. This led to severe performance degradation, requiring manual
> > intervention to prioritize vacuum jobs to complete them as quickly as
> > possible.
>
> What version of postgres was this?  What batch sizes do you need to
> support?
> I.e. is all of this data inserted at once, or in steps?
>
> As already discussed, it seems unlikely that we'll ever support INSERT
> FROZEN,
> due to the potential of causing concurrent queries to give bogus answers.
> But
> there's actually a lot we can do to improve this short of INSERT FROZEN.
>
> The reason I asked for the version is that the behaviour would e.g. likely
> be
> worse before autovacuum_vacuum_insert_scale_factor existed. We are working
> on
> improvements around that in 18 too, ensuring that the gap between insert
> triggered vacuums does not grow forever.
>
> Several recent releases have also improved the situation around this in
> other
> ways, e.g. by just making vacuuming faster and by avoiding doing redundant
> work in more cases (by increasing relfrozenzid more aggressively).
>
> We've also been talking about performing freezing during e.g. checkpoints,
> if
> possible.
>
> If you're inserting all the data in a single transaction however, it'll be
> hard to improve most of this, because while that long long transaction
> runs,
> we can't do anything that needs to know the transaction has finished. OTOH,
> if it were a single transaction, you could already use COPY FREEZE.
>
>
> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
> currently
> does not use the bulk-insert logic, leading the INSERT to cause a lot more
> WAL
> to be emitted compared to inserting the same data via COPY.
>
>
> > This issue is particularly critical during database *migrations* or
> *version
> > upgrades*, where a full data reload is often necessary. Each time a major
> > PostgreSQL upgrade occurs, users must reimport large datasets, leading to
> > the same problem of vacuum storms post-import. An INSERT FROZEN feature
> > would allow importing data that is known to be immutable, preventing
> > unnecessary vacuum overhead and reducing system strain.
>
> What are you using for such upgrades or migrations? I'd not expect INSERT
> to
> be used, due to the overhead that has compared to COPY.
>
> Greetings,
>
> Andres Freund
>
>
>


Re: Trigger more frequent autovacuums of heavy insert tables

2025-02-24 Thread wenhui qiu
Hi Melanie
> relallvisible. It seems like we should make it consistent. Perhaps we
> should just remove it from heap_vacuum_rel(). Then add an assert in
>  all these places to at least protect development mistakes.
I think there's some objection to that.

Thanks

On Tue, Feb 25, 2025 at 7:35 AM Melanie Plageman 
wrote:

> On Mon, Feb 24, 2025 at 4:53 PM Nathan Bossart 
> wrote:
> >
> > On Thu, Feb 20, 2025 at 07:35:32PM -0500, Melanie Plageman wrote:
> > > Attache v7 doesn't cap the result for manual stats updating done with
> > > relation_statistics_update().
> >
> > Unfortunately, this already needs a rebase...
>
> Thanks! Attached v8 does just that.
>
> I've also taken a pass at updating the stats import tests to include
> relallfrozen. I'm not totally sure how I feel about the results. I
> erred on the side of putting relallfrozen wherever relallvisible was.
> But, that means relallfrozen is included in test cases where it
> doesn't seem important to the test case. But, that was true with
> relallvisible too.
>
> Anyway, I talked to Corey off-list last week. Maybe he will have a
> chance to weigh in on the test cases. Also, I had forgotten to include
> relallfrozen in pg_clear_relation_stats(). I've fixed that in v8, but
> now I'm worried there are some other places I may have missed.
>
> > > I did, however, keep the cap for the
> > > places where vacuum/analyze/create index update the stats. There the
> > > number for relallfrozen is coming directly from visibilitymap_count(),
> > > so it should be correct. I could perhaps add an assert instead, but I
> > > didn't think that really made sense. An assert is meant to help the
> > > developer and what could the developer do about the visibility map
> > > being corrupted.
> >
> > This might just be personal preference, but I think this is exactly the
> > sort of thing an assertion is meant for.  If we expect the value to
> always
> > be correct, and it's not, then our assumptions were wrong or someone has
> > broken something.  In both of these cases, I as a developer would really
> > like to know so that I don't introduce a latent bug.  If we want Postgres
> > to gracefully handle or detect visibility map corruption, then maybe we
> > should do both or PANIC.
>
> I'm on the fence about adding a PANIC. We do PANIC in other places
> where we notice corruption (like PageAddItemExtended()).  But, in most
> of the cases, it seems like we are PANICing because there isn't a
> reasonable way to accomplish the intended task. In this case, we
> probably can't trust the visibility map counts for that page, but the
> pg_class columns are just estimates, so just capping relallfrozen
> might be good enough.
>
> I will note that in the other place where we may notice corruption in
> the VM, in lazy_scan_prune(), we do visibilitymap_clear() and print a
> WARNING -- as opposed to PANICing. Perhaps that's because there is no
> need to panic, since we are already fixing the problem with
> visibiliytmap_clear().
>
> An assert would only help if the developer did something while
> developing that corrupted the visibility map. It doesn't help keep
> bogus values out of pg_class if a user's visibility map got corrupted
> in some way. But, maybe that isn't needed.
>
> > I do see that heap_vacuum_rel() already caps relallvisible to relpages,
> but
> > it's not clear to me whether that's something that we expect to regularly
> > happen in practice or what the adverse effects might be.  So perhaps I'm
> > misunderstanding the scope and severity of bogus results from
> > visibilitymap_count().  Commit e6858e6, which added this code, doesn't
> say
> > anything about safety, but commit 3d351d9 changed the comment in question
> > to its current wording.  After a very quick skim of the latter's thread
> > [0], I don't see any discussion about this point, either.
>
> Thanks for doing this archaeology.
>
> I am hesitant to keep the current cap on relallvisible in
> heap_vacuum_rel() but then not include an equivalent cap for
> relallfrozen. And I think it would be even more confusing for
> relallvisible to be capped but relallfrozen has an assert instead.
>
> None of the other locations where relallvisible is updated
> (do_analyze_rel(), index_update_stats()) do this capping of
> relallvisible. It seems like we should make it consistent. Perhaps we
> should just remove it from heap_vacuum_rel(). Then add an assert in
> all these places to at least protect development mistakes.
>
> - Melanie
>


Anti join confusion

2025-02-23 Thread wenhui qiu
Hi Richard Guo
   I found that the patch at https://commitfest.postgresql.org/patch/3235/
already supports anti join, but I've found that in many cases it doesn't
work: it always uses a SubPlan. Here's my testing process.

###
create table join1 (id integer,name varchar(300),k1 integer);
create table join2 (id integer,name varchar(300),score integer);
insert into join1 values (
generate_series(1,2),'AAaaaaaA',10);
insert into join1 values (
generate_series(1,2),'AAaaaaaA',10);
insert into join1 values (
generate_series(1,2),'AAaaaaaA',10);
insert into join1 values (
generate_series(50201,50300),'AASaaaaaA',10);
insert into join1 values (
generate_series(50201,50300),'AASaaaaaA',10);
insert into join1 values (
generate_series(150201,1350300),'AASaaaaaA',10);
insert into join2 values (
generate_series(1,4),'AAaaaaaA',1);
insert into join2 values (
generate_series(1,4),'AAABBAAAaaaaaA',2);
insert into join2 values (
generate_series(20001,22000),'AAaaaaaA',3);
insert into join2 values (
generate_series(150201,950300),'AAaaaaaA',3);
create index idx_j1 on join1(id);
create index idx_j2 on join2(id);
VACUUM ANALYZE JOIN1;
VACUUM ANALYZE JOIN2;

test=#  explain  SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 WHERE T2.ID>1)
test-# ;
   QUERY PLAN
---------------------------------------------------------------------------
 Gather  (cost=1000.42..9016319078.86 rows=630150 width=8)
   Workers Planned: 2
   ->  Parallel Seq Scan on join1 t1  (cost=0.42..9016255063.86 rows=262562
width=8)
 Filter: (NOT (ANY (id = (SubPlan 1).col1)))
 SubPlan 1
   ->  Materialize  (cost=0.42..32181.54 rows=863294 width=4)
 ->  Index Only Scan using idx_j2 on join2 t2
 (cost=0.42..24492.07 rows=863294 width=4)
   Index Cond: (id > 1)
(8 rows)

test=#  explain  SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 );
   QUERY PLAN
---------------------------------------------------------------------------
 Gather  (cost=1000.42..8633476697.61 rows=630150 width=8)
   Workers Planned: 2
   ->  Parallel Seq Scan on join1 t1  (cost=0.42..8633412682.61 rows=262562
width=8)
 Filter: (NOT (ANY (id = (SubPlan 1).col1)))
 SubPlan 1
   ->  Materialize  (cost=0.42..30676.42 rows=882100 width=4)
 ->  Index Only Scan using idx_j2 on join2 t2
 (cost=0.42..22819.92 rows=882100 width=4)
(7 rows)

test=#  explain  SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID < 1000);
                              QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on join1 t1  (cost=61.73..38730.47 rows=630150 width=8)
   Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
   SubPlan 1
 ->  Index Only Scan using idx_j2 on join2 t2  (cost=0.42..57.06
rows=1865 width=4)
   Index Cond: (id < 1000)
(5 rows)

test=#  explain  SELECT T1.id,T1.K1 FROM join1 t1 WHERE T1.id NOT IN
(SELECT T2.id FROM join2 t2 where T2.ID = 1000);
                              QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on join1 t1  (cost=4.45..38673.19 rows=630150 width=8)
   Filter: (NOT (ANY (id = (hashed SubPlan 1).col1)))
   SubPlan 1
 ->  Index Only Scan using idx_j2 on join2 t2  (cost=0.42..4.44 rows=1
width=4)
   Index Cond: (id = 1000)
(5 rows)

Thanks


Re: Anti join confusion

2025-02-24 Thread wenhui qiu
Hi Laurenz
Actually, many PostgreSQL forks have already implemented this. For example,
when the relevant column has a NOT NULL constraint, many databases (MySQL,
SQL Server, Oracle) can treat a NOT IN subquery the same way as NOT EXISTS
and plan it as an anti join.


Thanks

On Mon, Feb 24, 2025 at 7:55 PM Laurenz Albe 
wrote:

> On Mon, 2025-02-24 at 17:12 +0800, wenhui qiu wrote:
> > Do we have plans for  NOT IN subquery  pull up?
>
> As mentioned before, that is not possible.
> Best practice is to avoid NOT IN with subqueries altogether.
> The result is usually not what people expect it to be.
>
> Yours,
> Laurenz Albe
>


Re: AIX support

2025-04-03 Thread wenhui qiu
Hi Srirama,
 It's getting close to code freeze. Any updates from your end?


Thanks

On Tue, Mar 18, 2025 at 12:58 AM Srirama Kucherlapati 
wrote:

> Hi Team,
>
> Here are the updates on the meson on AIX.  Our team had pushed the fixes
> meson github here …
>
>
>
> https://github.com/mesonbuild/meson/pull/14335
>
>
>
> #14335 Enhance AIX shared library build to use an export List.
> 


Re: Use CLOCK_MONOTONIC_COARSE for instr_time when available

2025-03-26 Thread wenhui qiu
Hi

> As far as I know, our usage of instr_time really needs the highest
> resolution available, because we are usually trying to measure pretty
> short intervals.  You say that this patch reduces execution time,
> and I imagine that's true ... but I wonder if it doesn't do so at
> the cost of totally destroying the reliability of the output numbers.

I strongly agree; that seems like focusing on the small stuff while missing
the big picture.


Re: Requested WAL segment xxx has already been removed

2025-07-15 Thread wenhui qiu
Hi,

> What is really painful right now, logical walsenders can only look into
> pg_wal, and unfortunately replication slots don't give 100% guarantee for
> WAL retention because of max_slot_wal_keep_size.
> That is, using restore_command for logical walsenders would be really
> helpful and solve some problems and pain points with logical replication.

restore_command has to rely on ssh or NFS shared storage, and in most
companies security-audit requirements make it impossible to establish ssh
mutual trust between hosts. It would be very convenient if this feature were
implemented.


Thanks

On Tue, Jul 15, 2025 at 5:24 PM Alexander Kukushkin 
wrote:

> Hi,
>
> On Mon, 14 Jul 2025 at 11:24, Japin Li  wrote:
>
>> The configuration is as expected. My test script simulates two distinct
>> hosts
>> by utilizing local archive storage.
>>
>> For physical replication across distinct hosts without shared WAL archive
>> storage, WALs are archived locally (in my test).
>>
>> When the primary's walsender needs a WAL file from the archive that's not
>> in
>> its pg_wal directory, manual copying is required to the primary's pg_wal
>> or the
>> standby's pg_wal (or its archive directory, and use restore_command to
>> fetch it).
>>
>> What prevents us from using the primary's restore_command to retrieve the
>> necessary WALs?
>>
>
> I am just talking about the practical side of local archive storage.
> Such archives will be gone along with the server in case of disaster and
> therefore they bring only a little value.
> With the same success, physical standby can use restore_command to copy
> files from the archive on the primary via ssh/rsync or similar. This
> approach is used for ages and works just fine.
>
> What is really painful right now, logical walsenders can only look into
> pg_wal, and unfortunately replication slots don't give 100% guarantee for
> WAL retention because of max_slot_wal_keep_size.
> That is, using restore_command for logical walsenders would be really
> helpful and solve some problems and pain points with logical replication.
>
> However, if we start calling restore_command also for physical walsenders
> it might result in increased resource usage on primary without providing
> much additional value. For example, restore_command is failing, but standby
> indefinitely continues making replication connection attempts.
>
> I don't mind if it will also work for physical replication, but IMO there
> should be a possibility to opt out from it.
>
> Regards,
> --
> Alexander Kukushkin
>


Re: Small optimization with expanding dynamic hash table

2025-07-11 Thread wenhui qiu
Hi
> The v2 patch maybe more clear:
> We can calc bucket just by hashvalue & high_mask when expanding table
> because the if condition in calc_bucket() must be false.
I think you should add a comment about this to the patch so that code
reviewers can clearly see your optimization.
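
For reference, calc_bucket() in src/backend/utils/hash/dynahash.c looks
roughly like this (quoted from my reading of the current sources, only for
illustration):

    static inline uint32
    calc_bucket(HASHHDR *hctl, uint32 hash_val)
    {
        uint32      bucket;

        bucket = hash_val & hctl->high_mask;
        if (bucket > hctl->max_bucket)
            bucket = bucket & hctl->low_mask;

        return bucket;
    }

If I understand the v2 patch correctly, the point is that while
expand_table() is relocating the entries of old_bucket, hash_val &
hctl->high_mask can only be old_bucket or the newly added bucket, and both
are <= hctl->max_bucket, so the low_mask fallback above is never taken and
the relocation loop can test something like

    if ((currElement->hashvalue & hctl->high_mask) == old_bucket)

directly. A comment to that effect in the patch would make the reasoning
easy for reviewers to verify.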

Thanks

On Thu, Jul 10, 2025 at 10:46 AM cca5507  wrote:

> Hi,
>
> The v2 patch maybe more clear:
>
> We can calc bucket just by hashvalue & high_mask when expanding table
> because the if condition in calc_bucket() must be false.
>
> --
> Regards,
> ChangAo Chen
>
>


Re: Making pg_rewind faster

2025-07-06 Thread wenhui qiu
Hi
> Thanks, LGTM.
I think this can be registered in https://commitfest.postgresql.org/54;
https://commitfest.postgresql.org/53 will be closed soon.



Thanks

On Fri, Jul 4, 2025 at 10:50 AM Japin Li  wrote:

> On Thu, 03 Jul 2025 at 12:59, John H  wrote:
> > Hi,
> >
> > On Wed, Jul 2, 2025 at 6:40 PM Japin Li  wrote:
> >>
> >> >
> >>
> >> Splitting the logs from $PGDATA is definitely better. The question is
> whether
> >> it's worth implementing this directly in core or if a prominent note in
> the
> >> documentation would suffice.
> >>
> >
> > I can work on the documentation update as a separate patch if folks
> > think this is worthwhile.
> >
> >> >> On Wed, Jul 2, 2025 at 10:21 AM Japin Li 
> wrote:
> >>
> >> Exactly!  It's confusing that getFileType() returns file_content_type_t
> >> instead of file_type_t.
> >>
> >
> > Ah yes that is confusing, updated in patch.
> >
> >> For v5 patch:
> >>
> >> 1.
> >> We could simply use the global WalSegSz variable within
> decide_file_action(),
> >> eliminating the need to pass wal_segsz_bytes as an argument.
> >>
> >
> > Good point.
> >
> >> 2.
> >> For last_common_segno, we could implement it similarly to WalSegSz,
> avoiding a
> >> signature change for decide_file_actions() and decide_file_action().
> I'm not
> >> insisting on this approach, however.
> >>
> >
> > I made it a global as well, and had to include access/xlog_internal.h
> > in pg_rewind.h but I don't feel strongly about it either way.
> >
>
> Thanks, LGTM.
>
> --
> Regards,
> Japin Li
>

