Re: planner chooses incremental but not the best one

2023-12-22 Thread ywgrit
The one scenario I can come up with a possible solution for so far is when
the query's predicate columns and group columns belong to the same table.

For a query that contains a WHERE clause, num_groups could perhaps be
estimated according to the following formula.

num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where clause
             * ndistinct(pred_col_1, pred_col_2, ... pred_col_n,
                         sort_col_1, sort_col_2, ... sort_col_m)
             / ndistinct(pred_col_1, pred_col_2, ... pred_col_n)

ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where clause =
ndistinct(pred_col_1, pred_col_2, ... pred_col_n) * selectivity of rel.

Here the pred_cols are the columns involved in the WHERE clause and the
sort_cols are the columns involved in the GROUP BY clause.

The reason for multiplying by the selectivity of rel directly is that the
selectivity of rel depends only on the pred_cols, not the sort_cols. So the
above formula can be simplified as follows.

num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
sort_col_2, ... sort_col_m) * selectivity of rel.
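
As a made-up worked example (the numbers are mine, not from the thread):
with ndistinct(b) = 10, ndistinct(b, c) = 1000, and selectivity 0.1 for a
predicate on a uniformly distributed b, both forms of the formula give
num_groups = 1000 * (10 * 0.1) / 10 = 1000 * 0.1 = 100.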

The correctness of the above formula depends on the following conditions.

   - The ndistinct(pred_col_1, pred_col_2, ... pred_col_n) and
     ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
     sort_col_2, ... sort_col_m) statistics already exist, and need to be
     accurate (see the sketch after this list).

   - Both the pred_cols and the sort_cols are uniformly distributed; if
     not, statistics such as MCV are needed for correction.

   - The tuples count of rel is the total number of tuples of the table,
     not the number of filtered tuples.
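
For the first condition, the multi-column ndistinct statistics can be
collected with extended statistics. A minimal sketch (the table and
statistics names are mine, for illustration only):

CREATE TABLE t (a int, b int, c int);
INSERT INTO t SELECT i, i % 10, i % 100 FROM generate_series(1, 100000) i;

-- gives the planner ndistinct(b, c); the per-column ndistinct values
-- come from plain ANALYZE
CREATE STATISTICS t_b_c (ndistinct) ON b, c FROM t;
ANALYZE t;

-- inspect the collected multi-column ndistinct estimate
SELECT s.stxkeys, d.stxdndistinct
FROM pg_statistic_ext s
JOIN pg_statistic_ext_data d ON d.stxoid = s.oid
WHERE s.stxname = 't_b_c';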

After experimentation in the scenario mentioned in the previous thread, the
estimated num_groups is 3. The accuracy of the result strongly relies on
the uniform distribution of b, which allows ndistinct(pred_col_1,
pred_col_2, ... pred_col_n) with where clause to be estimated accurately.

I'd like to hear your opinions.

Regards.

ywgrit.

Tomas Vondra  于2023年12月18日周一 20:53写道:

>
>
> On 12/18/23 11:40, Richard Guo wrote:
> >
> > On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra
> > mailto:tomas.von...@enterprisedb.com>>
> > wrote:
> >
> > Oh! Now I see what you meant by using the new formula in 84f9a35e3
> > depending on how we sum tuples. I agree that seems like the right
> thing.
> >
> > I'm not sure it'll actually help with the issue, though - if I apply
> the
> > patch, the plan does not actually change (and the cost changes just a
> > little bit).
> >
> > I looked at this only very briefly, but I believe it's due to the
> > assumption of independence I mentioned earlier - we end up using the
> new
> > formula introduced in 84f9a35e3, but it assumes the
> > selectivity and number of groups are independent. But that's not the
> > case here, because the groups are very clearly correlated (with the
> > condition on "b").
> >
> >
> > You're right.  The patch allows us to adjust the estimate of distinct
> > values for appendrels using the new formula introduced in 84f9a35e3.
> > But if the restrictions are correlated with the grouping expressions,
> > the new formula does not behave well.  That's why the patch does not
> > help in case [1], where 'b' and 'c' are correlated.
> >
> > OTOH, if the restrictions are not correlated with the grouping
> > expressions, the new formula would perform quite well.  And in this case
> > the patch would help a lot, as shown in [2] where estimate_num_groups()
> > gives a much more accurate estimate with the help of this patch.
> >
> > So this patch could be useful in certain situations.  I'm wondering if
> > we should at least have this patch (if it is right).
> >
>
> I do agree the patch seems to do the right thing, and it's worth pushing
> on its own.
>
> >
> > If that's the case, I'm not sure how to fix this :-(
> >
> >
> > The commit message of 84f9a35e3 says
> >
> > This could possibly be improved upon in the future by identifying
> > correlated restrictions and using a hybrid of the old and new
> > formulae.
> >
> > Maybe this is something we can consider trying.  But anyhow this is not
> > an easy task I suppose.
>
> Yeah, if it was easy, it'd have been done in 84f9a35e3 already ;-)
>
> The challenge is where to get usable information about correlation
> between columns. I only have a couple of very rough ideas of what we might
> try. For example, if we have multi-column ndistinct statistics, we might
> look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
>
> ndistinct(b,c,d) / ndistinct(b,c)
>
> If we know how many distinct values we have for the predicate column, we
> could then estimate the number of groups. I mean, we know that for the
> restriction "WHERE b = 3" we only have 1 distinct value, so we could
> estimate the number of groups as
>
> 1 * ndistinct(b,c)
>
> I'm well aware this is only a very trivial example, and for more complex
> examples it's likely way more complicated. But hopefully it illustrates
> the gener

Re: Trigger violates foreign key constraint

2023-12-22 Thread Pavel Luzanov
One more undocumented issue with system triggers. It might be worth
considering them together.


CREATE ROLE app_owner;

CREATE TABLE t (
    id    int PRIMARY KEY,
    parent_id int REFERENCES t(id)
);

ALTER TABLE t OWNER TO app_owner;

-- No actions by application owner
REVOKE ALL ON t FROM app_owner;

INSERT INTO t VALUES (1,NULL);

DELETE FROM t;
ERROR:  permission denied for table t
CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."t" x WHERE "id" 
OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"


It is not at all obvious why the superuser cannot delete the row that he 
just added. The reason is that system triggers are executed with the 
rights of the table owner, not the current role. But I can't find a 
description of this behavior in the documentation.
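
A quick way to confirm the cause (a sketch: restoring the owner's
privileges lets the internal referential-integrity query run again):

GRANT ALL ON t TO app_owner;
DELETE FROM t;  -- now succeeds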


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: Avoid computing ORDER BY junk columns unnecessarily

2023-12-22 Thread Xiaoran Wang
Hi Heikki,
I haven't dug into your patch yet, but for this problem, I have another
idea.
---
explain verbose
  select foo from mytable order by sha256(bar::bytea);

QUERY PLAN


  Index Scan using mytable_sha256_idx on public.mytable
(cost=0.29..737.28 rows=1 width=64)
Output: foo, sha256((bar)::bytea)
(2 rows)

The index is used to satisfy the ORDER BY, but the expensive ORDER BY
expression is still computed for every row, just to be thrown away by
the junk filter.
--
How about returning the ORDER BY column value together with 'xs_heaptid'?
Then we could use that value directly instead of computing it when using
an index scan to fetch the ordered data.
Another problem I am concerned about is that if we exclude junk columns
in the sort plan, we may change its behavior. I'm not sure if it can lead
to some other issues.

Heikki Linnakangas  于2023年12月22日周五 02:39写道:

> Problem
> ---
>
> We are using junk columns for (at least) two slightly different purposes:
>
> 1. For passing row IDs and other such data from lower plan nodes to
> LockRows / ModifyTable.
>
> 2. To represent ORDER BY and GROUP BY columns that don't appear in the
> SELECT list. For example, in a query like:
>
>  SELECT foo FROM mytable ORDER BY bar;
>
> The parser adds 'bar' to the target list as a junk column. You can see
> that with EXPLAIN VERBOSE:
>
> explain (verbose, costs off)
>select foo from mytable order by bar;
>
>  QUERY PLAN
> --
>   Sort
> Output: foo, bar
> Sort Key: mytable.bar
> ->  Seq Scan on public.mytable
>   Output: foo, bar
> (5 rows)
>
> The 'bar' column gets filtered away in the executor, by the so-called
> junk filter. That's fine for simple cases like the above, but in some
> cases, that causes the ORDER BY value to be computed unnecessarily. For
> example:
>
> create table mytable (foo text, bar text);
> insert into mytable select g, g from generate_series(1, 1) g;
> create index on mytable (sha256(bar::bytea));
> explain verbose
>   select foo from mytable order by sha256(bar::bytea);
>
> QUERY PLAN
>
>
> 
>   Index Scan using mytable_sha256_idx on public.mytable
> (cost=0.29..737.28 rows=1 width=64)
> Output: foo, sha256((bar)::bytea)
> (2 rows)
>
> The index is used to satisfy the ORDER BY, but the expensive ORDER BY
> expression is still computed for every row, just to be thrown away by
> the junk filter.
>
> This came up with pgvector, as the vector distance functions are pretty
> expensive. All vector operations are expensive, so one extra distance
> function call per row doesn't necessarily make that much difference, but
> it sure looks silly. See
> https://github.com/pgvector/pgvector/issues/359#issuecomment-1840786021
> (thanks Matthias for the investigation!).
>
> Solution
> 
>
> The obvious solution is that the planner should not include those junk
> columns in the plan. But how exactly to implement that is a different
> matter.
>
> I came up with the attached patch set, which adds a projection to all
> the paths at the end of planning in grouping_planner(). The projection
> filters out the unnecessary junk columns. With that, the plan for the
> above example:
>
> postgres=# explain verbose select foo from mytable order by
> sha256(bar::bytea);
>QUERY PLAN
>
>
> ---
>   Index Scan using mytable_sha256_idx on public.mytable
> (cost=0.29..662.24 rows=1 width=4)
> Output: foo
> (2 rows)
>
>
> Problems with the solution
> --
>
> So this seems to work, but I have a few doubts:
>
> 1. Because Sort cannot project, this adds an extra Result node on top of
> Sort nodes when the ORDER BY is implemented by sorting:
>
> postgres=# explain verbose select foo from mytable order by bar;
> QUERY PLAN
>
>
> 
>   Result  (cost=818.39..843.39 rows=1 width=4)
> Output: foo
> ->  Sort  (cost=818.39..843.39 rows=1 width=8)
>   Output: foo, bar
>   Sort Key: mytable.bar
>   ->  Seq Scan on public.mytable  (cost=0.00..154.00 rows=1
> width=8)
> Output: foo, bar
> (7 rows)
>
>  From a performance point of view, I think that's not as bad as it
> sounds. Remember that without this patch, the executor needs to execute
> the junk filter to filter out the extra column instead. It's not clear
> that an extra Result is worse than that, although I haven't tried
> benchmarking it.
>
> This 

Re: Set all variable-length fields of pg_attribute to null on column drop

2023-12-22 Thread Alvaro Herrera
On 2023-Nov-30, Peter Eisentraut wrote:

> I noticed that when a column is dropped, RemoveAttributeById() clears out
> certain fields in pg_attribute, but it leaves the variable-length fields at
> the end (attacl, attoptions, and attfdwoptions) unchanged. This is probably
> harmless, but it seems wasteful and unclean, and leaves potentially dangling
> data lying around (for example, attacl could contain references to users
> that are later also dropped).

Yeah, this looks like an ancient oversight -- when DROP COLUMN was added
we didn't have any varlena fields in this catalog, and when the first
one was added (attacl in commit 3cb5d6580a33) resetting it on DROP
COLUMN was overlooked.

LGTM.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)




Re: planner chooses incremental but not the best one

2023-12-22 Thread Sébastien Lardière

On 15/12/2023 09:58, Richard Guo wrote:


On Thu, Dec 14, 2023 at 6:02 PM Richard Guo  
wrote:


It seems that we need to improve estimate of distinct values in
estimate_num_groups() when taking the selectivity of restrictions into
account.

In 84f9a35e3 we changed to a new formula to perform such estimation.
But that does not apply to the case here, because for an appendrel,
set_append_rel_size() always sets "raw tuples" count equal to "rows",
and that would make estimate_num_groups() skip the adjustment of the
estimate using the new formula.


I'm wondering why we set the appendrel's 'tuples' equal to its 'rows'.
Why don't we set it to the accumulated estimate of tuples from each live
child, like attached?  I believe this aligns more closely with reality.

And this would also allow us to adjust the estimate for the number of
distinct values in estimate_num_groups() for appendrels using the new
formula introduced in 84f9a35e3.  As I experimented, this can improve
the estimate for appendrels.  For instance,

create table t (a int, b int, c float) partition by range(a);
create table tp1 partition of t for values from (0) to (1000);
create table tp2 partition of t for values from (1000) to (2000);

insert into t select i%2000, (10 * random())::int, random() from 
generate_series(1,100) i;

analyze t;

explain analyze select b from t where c < 0.1 group by b;

-- on master
 HashAggregate  (cost=18659.28..19598.74 rows=93946 width=4)
                (actual time=220.760..234.439 rows=63224 loops=1)

-- on patched
 HashAggregate  (cost=18659.28..19294.25 rows=63497 width=4)
                (actual time=235.161..250.023 rows=63224 loops=1)

With the patch the estimate for the number of distinct 'b' values is
more accurate.

BTW, this patch does not change any existing regression test results.  I
attempted to devise a regression test that shows how this change can
improve query plans, but failed.  Should I try harder to find such a
test case?



Hi,

thank you for the patch; I've tried it and it works with the scenario
you provided.


As Nicolas's co-worker, I've been involved in this case, but unfortunately
we're not able to test the patch with the actual data at the moment; I'll
ask the real owner for a dump.


About the regression test, I don't know how to implement it either.

best regards,

--
Sébastien


Re: Stack overflow issue

2023-12-22 Thread Egor Chindyaskin

On 24/11/2023 21:14, Heikki Linnakangas wrote:

> What do you think?

Hello! Thank you for researching the problem! I'm more of a tester than
a developer, so I was able to check the patches from that side.
I've configured the server with CFLAGS=" -O0" and cassert enabled and 
checked the following queries:


#CommitTransactionCommand
(n=100; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT 
s$i;"; done; printf "ERROR; COMMIT;") | psql >/dev/null


#ShowTransactionStateRec
(n=100; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT 
s$i;"; done; printf "SET log_min_messages = 'DEBUG5'; SAVEPOINT sp;") | 
psql >/dev/null


#MemoryContextCheck
(n=100; printf "begin;"; for ((i=1;i<=$n;i++)); do printf "savepoint 
s$i;"; done; printf "release s1;" ) | psql >/dev/null


#MemoryContextStatsInternal
(n=100; printf "BEGIN;"; for ((i=1;i<=$n;i++)); do printf "SAVEPOINT 
s$i;"; done; printf "SELECT 
pg_log_backend_memory_contexts(pg_backend_pid())") | psql >/dev/null


On my system, every one of those queries led to a server crash at a number
of savepoints in the range from 174,400 to 174,700.
With your patches applied, the savepoint counter goes well beyond these
values; I stopped at approximately 300,000 savepoints.

Your patches look good to me.

Best regards,
Egor Chindyaskin
Postgres Professional: http://postgrespro.com/




Re: brininsert optimization opportunity

2023-12-22 Thread James Wang
Hi all, not sure how to "Specify thread msgid" - I chose one which I think
is close to my new feature request.

query:

SELECT count(1) FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id WHERE
t1.a_indexed_col='some_value' OR t2.a_indexed_col='some_value';

Can the server automatically replace the OR logic above with UNION please?
I.e. replace it with:

(SELECT count(1) FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id WHERE 
t1.a_indexed_col='some_value' )
UNION
(SELECT count(1) FROM table1 t1 JOIN table2 t2 ON t1.id = t2.id WHERE
t2.a_indexed_col='some_value');

Thanks

RE: Synchronizing slots from primary to standby

2023-12-22 Thread Zhijie Hou (Fujitsu)
On Thursday, December 21, 2023 5:39 PM Bertrand Drouvot 
 wrote:
> 
> On Thu, Dec 21, 2023 at 02:23:12AM +, Zhijie Hou (Fujitsu) wrote:
> > On Wednesday, December 20, 2023 8:42 PM Zhijie Hou (Fujitsu)
>  wrote:
> > >
> > > Attach the V51 patch set which addressed Kuroda-san's comments.
> > > I also tried to improve the test in 0003 to make it stable.
> >
> > The patches conflict with a recent commit dc21234.
> > Here is the rebased V51_2 version, there is no code changes in this version.
> >
> 
> Thanks!
> 
> I've a few remarks regarding 0001:

Thanks for the comments!

> 
> 1 ===
> 
> In the commit message what about replacing "Allow logical walsenders to wait
> for the physical standbys" with "Force some logical walsenders to wait for the
> physical standbys"?

I feel 'Allow' is OK, as the GUC standby_slot_names is optional for the
user. ISTM 'force' means we would always wait for physical standbys
regardless of the GUC.

> 
> Also I think it would be better to first explain what we are trying to 
> achieve and
> after explain how we do it (adding a new flag in CREATE SUBSCRIPTION and so
> on).

Noted. We are about to split the patches, so will improve each commit message 
after that.

> 
> 4 ===
> 
> @@ -248,10 +262,13 @@ ReplicationSlotValidateName(const char *name, int
> elevel)
>   * during getting changes, if the two_phase option is enabled it can skip
>   * prepare because by that time start decoding point has been moved. So
> the
>   * user will only get commit prepared.
> + * failover: If enabled, allows the slot to be synced to physical standbys so
> + * that logical replication can be resumed after failover.
> 
> s/allows/forces ?

I think whether the slot is synced also depends on the
GUC setting on standby, so I feel 'allow' is fine here.

> 
> 5 ===
> 
> +   bool        ok;
> 
> parse_ok maybe?

The flag is also used to store the slot type check result, so I feel 'ok' is
better here.

> 
> 6 ===
> 
> +   /* Need a modifiable copy of string. */
> +   rawname = pstrdup(*newval);
> 
> It seems to me that the single line comments in the neighborhood functions
> (see
> RestoreSlotFromDisk() for example) don't finish with ".". Worth to follow the
> same format for all what we add in slot.c?

I felt we have both styles in slot.c, but it seems Kuroda-san also
prefers removing the ".", so I will address it.

> 
> 7 ===
> 
> +static void
> +parseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)
> 
> ParseAlterReplSlotOptions instead?

I think it followed parseCreateReplSlotOptions, but I agree that it looks
inconsistent with other names. Will address.

> 11 ===
> 
> +* When the wait event is WAIT_FOR_STANDBY_CONFIRMATION, wait on
> another
> +* CV that is woken up by physical walsenders when the walreceiver has
> +* confirmed the receipt of LSN.
> 
> s/that is woken up by/that is broadcasted by/ ?

Will reword the comment here.

> 
> 12 ===
> 
> We are mentioning in several places that the replication can be resumed after 
> a
> failover. Should we add a few words about possible lag? (see [1])
> 
> [1]:
> https://www.postgresql.org/message-id/CAA4eK1KihniOK21mEVYtSOHRQiG
> NyToUmENWp7hPbH_PMsqzkA%40mail.gmail.com

It feels like an implementation detail to me, but noted. We will think more
about the documentation.


The comments not mentioned above look good to me.

Best Regards,
Hou zj




Re: [meson] expose buildtype debug/optimization info to pg_config

2023-12-22 Thread Junwang Zhao
Hi,

On Fri, Dec 15, 2023 at 10:20 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-12-14 17:24:58 +0800, Junwang Zhao wrote:
> > On Thu, Dec 14, 2023 at 4:50 PM Peter Eisentraut  
> > wrote:
> > >
> > > On 12.12.23 11:40, Junwang Zhao wrote:
> > > > build system using configure set VAL_CFLAGS with debug and
> > > > optimization flags, so pg_config will show these infos. Some
> > > > extensions depend on the mechanism.
> > > >
> > > > This patch exposes these flags with a typo fixed together.
> > >
> > > I have committed the typo fix.
> > >
> > > But I would like to learn more about the requirements of extensions in
> > > this area.  This seems a bit suspicious to me.
> >
> > This is what I found when building citus against an installation
> > of meson debug build pg instance, since the CFLAGS doesn't
> > contain -g flag, the binary doesn't include the debug information,
> > which is different behavior from configure building system.
>
> Hm. I'm not sure it's the right call to make extensions build the same way as
> the main postgres install with regard to optimization and debug info. So I
> feel a bit hesitant around generating -g and particularly -Ox. But it's
> historically what we've done...
>
> If we want to do so, I think this should not check buildtype, but debug.

I'm confused about which *debug* you mean; can you be more specific?
>
>
> > Another issue I found is that some C++
> > extensions (adjust/parquet_fdw for example) don't build against
> > the meson generated pgxs.mk, since it doesn't set the CXX
> > command. CXX is only set when llvm option is enabled, which
> > is different from old building system.
>
> I wanted to skip the C++ tests when we don't need C++, because it makes
> configure take longer. But I could be convinced that we should always at least
> determine the C++ compiler for Makefile.global.

The first idea that came to my mind was using the *project* command
to set [`c`, `cpp`], but this might be a little bit confusing for somebody.

Then I tried another way: adding a 'pgxscpp' option to let the user
choose whether to set the C++ compiler for Makefile.global.
It works but may not be an ideal way; see the attached.


>
> Greetings,
>
> Andres Freund



-- 
Regards
Junwang Zhao


0001-PGXS-determine-C-compiler-for-Makefile.global.patch
Description: Binary data


Re: Optimization outcome depends on the index order

2023-12-22 Thread Alexander Korotkov
On Fri, Dec 22, 2023 at 8:53 AM Andrei Lepikhov 
wrote:
> On 21/12/2023 12:10, Alexander Korotkov wrote:
>  > I took a closer look at the patch in [9].  I should drop my argument
>  > about breaking the model, because add_path() already considers other
>  > aspects than just costs.  But I have two more note about that patch:
>  >
>  > 1) It seems that you're determining the fact that the index path
>  > should return strictly one row by checking path->rows <= 1.0 and
>  > indexinfo->unique.  Is it really guaranteed that in this case quals
>  > are matching unique constraint?  path->rows <= 1.0 could be just an
>  > estimation error.  Or one row could be correctly estimated, but it's
>  > going to be selected by some quals matching unique constraint and
>  > other quals in recheck.  So, it seems there is a risk to select
>  > suboptimal index due to this condition.
>
> Operating inside the optimizer, we consider all estimations to be the
> truth. This patch modifies only one place: having two equal assumptions,
> we just choose the one that generally looks more stable.
> Filtered tuples should be calculated and included in the cost of the
> path. The decision on the equality of paths has been made in view of the
> estimation of these filtered tuples.

Even if estimates are accurate, the conditions in the patch don't
guarantee there is actually a unique condition.

# create table t as select i/1000 a, i % 1000 b, i % 1000 c from
generate_series(1,100) i;
# create unique index t_unique_idx on t(a,b);
# create index t_another_idx on t(a,c);
# \d t
 Table "public.t"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
 b  | integer |   |  |
 c  | integer |   |  |
Indexes:
"t_another_idx" btree (a, c)
"t_unique_idx" UNIQUE, btree (a, b)
# set enable_bitmapscan = off; explain select * from t where a = 1 and c =
1;
SET
Time: 0.459 ms
QUERY PLAN
--
 Index Scan using t_unique_idx on t  (cost=0.42..1635.16 rows=1 width=12)
   Index Cond: (a = 1)
   Filter: (c = 1)
(3 rows)


>  > 2) Even for non-unique indexes this patch is putting new logic on top
>  > of the subsequent code.  How we can prove it's going to be a win?
>  > That could lead, for instance, to dropping parallel-safe paths in
>  > cases we didn't do so before.
> Because we must trust all predictions made by the planner, we just
> choose the most trustworthy path. According to the planner logic, it is
> a path with a smaller selectivity. We can make mistakes anyway just
> because of the nature of estimation.

Even if we need to take selectivity into account here, it's still not clear
why this should be on top of other logic later in add_path().

>  > Anyway, please start a separate thread if you're willing to put more
>  > work into this.
>
> Done

Thanks.

--
Regards,
Alexander Korotkov


Re: Set log_lock_waits=on by default

2023-12-22 Thread Christoph Berg
Re: Robert Haas
> On Thu, Dec 21, 2023 at 8:29 AM Laurenz Albe  wrote:
> > Here is a patch to implement this.
> > Being stuck behind a lock for more than a second is almost
> > always a problem, so it is reasonable to turn this on by default.
> 
> I think it depends somewhat on the lock type, and also on your
> threshold for what constitutes a problem. For example, you can wait
> for 1 second for a relation extension lock pretty easily, I think,
> just because the I/O system is busy. Or I think also a VXID lock held
> by some transaction that has a tuple locked could be not particularly
> exciting. A conflict on a relation lock seems more likely to represent
> a real issue, but I guess it's all kind of a judgement call. A second
> isn't really all that long on an overloaded system, and I see an awful
> lot of overloaded systems (because those are the people who call me).

If a system is so busy that it's waiting that long for the disk, I would
like PG to tell me about it. Likewise, if my transactions are slow
because they are waiting for each other, I'd also like PG to tell me.
Especially as the second condition can't be detected by looking at
"it's slow because CPU or IO is at 100%".

In any case, setting log_lock_waits=on by default helps.
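
For anyone who wants this behavior today, a minimal sketch (note that the
one-second threshold currently comes from deadlock_timeout):

ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';  -- also the logging threshold
SELECT pg_reload_conf();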

In fact, everyone I talked to was wondering why log_checkpoints was
turned on by default, and not this parameter. The info provided by
log_lock_waits is much more actionable than the stream of
log_checkpoint messages.

> Just a random idea but what if we separated log_lock_waits from
> deadlock_timeout? Say, it becomes time-valued rather than
> Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
> probably be more interested in hearing about a lock wait that was more
> than say 10 seconds, but I don't necessarily want to wait 10 seconds
> for the deadlock detector to trigger.

That's also a good point, but I'd like to see log_lock_waits default
to 'on' independently from having this extra change.

> In general, I do kind of like the idea of trying to log more problem
> situations by default, so that when someone has a major issue, you
> don't have to start by having them change all the logging settings and
> then wait until they get hosed a second time before you can
> troubleshoot anything. I'm just concerned that 1s might be too
> sensitive for a lot of users who aren't as, let's say, diligent about
> keeping the system healthy as you probably are.

I don't think 1s would be too sensitive by default.

Christoph




Re: Synchronizing slots from primary to standby

2023-12-22 Thread shveta malik
On Thu, Dec 21, 2023 at 6:37 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shveta,
>
> Thanks for updating the patch! Here is my comments for v52-0002.

Thanks for the feedback Kuroda-san. I have addressed these in v53.

> ~
> system-views.sgml
>
> 01.
>
> ```
> +
> + 
> +  
> +   sync_state char
> +  
> +  
> +  Defines slot synchronization state. This is meaningful on the physical
> +  standby which has configured  = 
> true.
> +  Possible values are:
> +   
> +
> + n = none for user created slots,
> ...
> ```
>
> Hmm. I'm not sure why we must show a single character to a user. I'm OK for
> pg_subscription.srsubstate because it is a "catalog" - the actual value would 
> be
> recorded in the heap. But pg_replication_slot is just a view so that we can 
> replace
> internal representations to other strings. E.g., 
> pg_replication_slots.wal_status.
> How about using {none, initialized, ready} or something?

Done.

> ~
> postmaster.c
>
> 02. bgworker_should_start_now
>
> ```
> +if (start_time == BgWorkerStart_ConsistentState_HotStandby &&
> + pmState != PM_RUN)
> +return true;
> ```
>
> I'm not sure the second condition is really needed. The line will be executed 
> when
> pmState is PM_HOT_STANDBY. Is there a possibility that pmState is changed 
> around here?

'case PM_RUN:' is a fall-through and thus we need to have this second
condition under 'case PM_HOT_STANDBY' for
BgWorkerStart_ConsistentState_HotStandby to avoid the worker getting
started on non-standby.

> ~
> libpqwalreceiver.c
>
> 03. PQWalReceiverFunctions
>
> ```
> +.walrcv_get_dbname_from_conninfo = libpqrcv_get_dbname_from_conninfo,
> ```
>
> Just to confirm - is there a rule for ordering?

No, I don't think so. I am not aware of any.

> ~
> slotsync.c
>
> 04. SlotSyncWorkerCtx
>
> ```
> typedef struct SlotSyncWorkerCtx
> {
> pid_t   pid;
> slock_t mutex;
> } SlotSyncWorkerCtx;
>
> SlotSyncWorkerCtx *SlotSyncWorker = NULL;
> ```
>
> Per other files like launcher.c, should we use a name like 
> "SlotSyncWorkerCtxStruct"?

Modified.

> 05. SlotSyncWorkerRegister()
>
> Your coding will work well, but there is another approach which validates
> slotsync parameters here. In this case, the postmaster should exit ASAP. This 
> can
> notify that there are some wrong settings to users earlier. Thought?

I think the postmaster should not exit. IMO, the slot-sync worker, being a
child process of the postmaster, should not control the start or exit of
the postmaster. The worker should only exit itself if the slot-sync GUCs
are not set. Have you seen any other case where the postmaster exits if any
of its bgworker processes has invalid GUCs?

> 06. wait_for_primary_slot_catchup
>
> ```
> +CHECK_FOR_INTERRUPTS();
> +
> +/* Handle any termination request if any */
> +ProcessSlotSyncInterrupts(wrconn);
> ```
>
> ProcessSlotSyncInterrupts() also has CHECK_FOR_INTERRUPTS(), so no need to 
> call.

yes, removed.

> 07. wait_for_primary_slot_catchup
>
> ```
> +/*
> + * XXX: Is waiting for 2 seconds before retrying enough or more or
> + * less?
> + */
> +rc = WaitLatch(MyLatch,
> +   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
> +   2000L,
> +   WAIT_EVENT_REPL_SLOTSYNC_PRIMARY_CATCHUP);
> +
> +ResetLatch(MyLatch);
> +
> +/* Emergency bailout if postmaster has died */
> +if (rc & WL_POSTMASTER_DEATH)
> +proc_exit(1);
> ```
>
> Is there any reasons not to use WL_EXIT_ON_PM_DEATH event? If not, you can 
> use.

I think we should use WL_EXIT_ON_PM_DEATH. Corrected now.

> 08. synchronize_slots
>
> ```
> +SpinLockAcquire(&WalRcv->mutex);
> +if (!WalRcv ||
> +(WalRcv->slotname[0] == '\0') ||
> +XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
> +{
> ...
> ```
>
> Assuming that WalRcv is still NULL. In this case, does the first 
> SpinLockAcquire()
> lead a segmentation fault?

It may. Thanks for pointing this out. Modified.

> 09. synchronize_slots
>
> ```
> +elog(DEBUG2, "slot sync worker's query:%s \n", s.data);
> ```
>
> The query is not dynamical one, so I think no need to print even if the debug
> mode.

Okay. Removed.

> 10. synchronize_one_slot
>
> IIUC, this function can synchronize slots even if the used plugin on primary 
> is
> not installed on the secondary server. If the slot is created by the slotsync
> worker, users will recognize it after the server is promoted and the decode is
> starting. I felt it is not good specification. Can we detect in the validation
> phase?

Noted the concern. Let me review this further; I will get back to you.

> ~
> not the source code
>
> 11.
>
> I tested the typical case - promoting a publisher from a below diagram.
> A physical replication slot "physical" was specified as standby_slot_names.
>
> ```
> node A (primary) -

Re: Avoid computing ORDER BY junk columns unnecessarily

2023-12-22 Thread Heikki Linnakangas

On 22/12/2023 11:05, Xiaoran Wang wrote:
I haven't dug into your patch yet, but for this problem, I have another 
idea.

---
explain verbose
   select foo from mytable order by sha256(bar::bytea);

                                             QUERY PLAN


   Index Scan using mytable_sha256_idx on public.mytable
(cost=0.29..737.28 rows=1 width=64)
     Output: foo, sha256((bar)::bytea)
(2 rows)

The index is used to satisfy the ORDER BY, but the expensive ORDER BY
expression is still computed for every row, just to be thrown away by
the junk filter.
--
How about returning the ORDER BY column value together with 'xs_heaptid'?
Then we could use that value directly instead of computing it when using
an index scan to fetch the ordered data.


Hmm, so return the computed column from the index instead of recomputing 
it? Yeah, that makes sense too and would help in this example. It won't
help in all cases though; the index might not store the original value
in the first place.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Set log_lock_waits=on by default

2023-12-22 Thread Laurenz Albe
On Thu, 2023-12-21 at 09:14 -0500, Robert Haas wrote:
> 
> I think it depends somewhat on the lock type, and also on your
> threshold for what constitutes a problem. For example, you can wait
> for 1 second for a relation extension lock pretty easily, I think,
> just because the I/O system is busy. Or I think also a VXID lock held
> by some transaction that has a tuple locked could be not particularly
> exciting. A conflict on a relation lock seems more likely to represent
> a real issue, but I guess it's all kind of a judgement call. A second
> isn't really all that long on an overloaded system, and I see an awful
> lot of overloaded systems (because those are the people who call me).

Sure, you don't want "log_lock_waits = on" in all conceivable databases.
I have seen applications that use database locks to synchronize
application threads (*shudder*).  If it is normal for your database
to experience long lock waits, disable the parameter.

My point is that in the vast majority of cases, long lock waits
indicate a problem that you would like to know about, so the parameter
should default to "on".

(Out of curiosity: what would ever wait for a VXID lock?)

> Just a random idea but what if we separated log_lock_waits from
> deadlock_timeout? Say, it becomes time-valued rather than
> Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
> probably be more interested in hearing about a lock wait that was more
> than say 10 seconds, but I don't necessarily want to wait 10 seconds
> for the deadlock detector to trigger.

That is an appealing thought, but as far as I know, "log_lock_waits"
is implemented by the deadlock detector, which is why it is tied to
"deadlock_timeout".  So if we want that, we'd need a separate "live
lock detector".  I don't know if we want to go there.

Yours,
Laurenz Albe




Re: Built-in CTYPE provider

2023-12-22 Thread Daniel Verite
Robert Haas wrote:

> For someone who is currently defaulting to es_ES.utf8 or fr_FR.utf8,
> a change to C.utf8 would be a much bigger problem, I would
> think. Their alphabet isn't in code point order, and so things would
> be alphabetized wrongly.

> That might be OK if they don't care about ordering for any purpose
> other than equality lookups, but otherwise it's going to force them
> to change the default, where today they don't have to do that.

Sure, in whatever collation setup we expose, we need to keep
it possible and even easy to sort properly with linguistic rules.

But some reasons to use $LANG as the default locale/collation
are no longer as good as they used to be, I think.

Starting with v10/ICU we have many pre-created ICU locales with
fixed names, and starting with v16, we can simply write "ORDER BY
textfield COLLATE unicode" which is good enough in most cases. So
the configuration "bytewise sort by default" / "linguistic sort on-demand"
has become more realistic.
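
As a minimal sketch of that configuration (my example, assuming an ICU
build on v16+ where the built-in "unicode" collation exists):

CREATE TABLE people (name text COLLATE "C");  -- bytewise, immutable indexes
INSERT INTO people VALUES ('Zebra'), ('apple'), ('Éclair');
SELECT name FROM people ORDER BY name;                  -- bytewise order
SELECT name FROM people ORDER BY name COLLATE unicode;  -- linguistic order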

By contrast in the pre-v10 days with only libc collations, an
application could have no idea which collations were going to be
available on the server, and how they were named precisely, as this
varies across OSes and across installs even with the same OS.
On Windows, I think that before v16 initdb did not create any libc
collation beyond C/POSIX and the default language/region of the OS.

In that libc context, if a db wants the C locale by default for
performance and truly immutable indexes, but the client app needs to
occasionally do in-db linguistic sorts, the app needs to figure out
which collation name will work for that. This is hard if you don't
target a specific installation that guarantees that such or such
collation is going to be installed.
Whereas if the linguistic locale is the default, the app never needs
to know its name or anything about it. So it's done that way,
linguistic by default. But that leaves databases with many
indexes sorted linguistically instead of bytewise for fields
that semantically never need any linguistic sort.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Trigger violates foreign key constraint

2023-12-22 Thread Laurenz Albe
On Fri, 2023-12-22 at 10:59 +0300, Pavel Luzanov wrote:
> Please, consider small suggestion to replace last sentence.
> 
> - This is not considered a bug, and it is the responsibility of the user 
> to write triggers so that such problems are avoided.
> + It is the trigger programmer's responsibility to avoid such scenarios.
> 
> To be consistent with the sentence about recursive trigger calls: [1]
> "It is the trigger programmer's responsibility to avoid infinite 
> recursion in such scenarios."

Yes, that is better - shorter and avoids the passive voice.  Changed.

> Also I don't really like "This is not considered a bug" part, since it 
> looks like an excuse.

In a way, it is an excuse, so why not be honest about it.

The example you provided in your other message (cascading triggers
fail if the table owner has revoked the required permissions from
herself) is not really about breaking foreign keys.  You hit a
surprising error, but referential integrity will be maintained.

Patch v3 is attached.

Yours,
Laurenz Albe
From f47c149edd529dc7f1f39977b3d01ee501e19fab Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Fri, 22 Dec 2023 12:33:40 +0100
Subject: [PATCH v3] Document foreign key internals

Warn the user that foreign keys are implemented as triggers, and
that user-defined triggers can interact with them and break
referential integrity.

Author: Laurenz Albe
Reviewed-by: David G. Johnston, Pavel Luzanov
Discussion: https://postgr.es/m/b81fe38fcc25a81be6e2e5b3fc1ff624130762fa.camel%40cybertec.at
---
 doc/src/sgml/ddl.sgml | 12 
 1 file changed, 12 insertions(+)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index d2951cd754..03c5619e9e 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1224,6 +1224,18 @@ CREATE TABLE posts (
 syntax in the reference documentation for
 .

+
+   
+
+ Foreign key constraints are implemented as system triggers in
+ PostgreSQL.  As such, they are subject to the
+ trigger firing rules described in .
+ In particular, user-defined triggers on the referencing table can cancel
+ or modify the effects of cascading deletes or updates, thereby breaking
+ referential integrity.  This is not considered a bug, and it is the
+ trigger programmer's responsibility to avoid such problems.
+
+   
   
 
   
-- 
2.43.0



Re: Transaction timeout

2023-12-22 Thread Junwang Zhao
On Fri, Dec 22, 2023 at 1:39 PM Japin Li  wrote:
>
>
> On Tue, 19 Dec 2023 at 22:06, Japin Li  wrote:
> > On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin  
> > wrote:
> >>> On 19 Dec 2023, at 13:26, Andrey M. Borodin  wrote:
> >>>
> >>> I don’t have Windows machine, so I hope CF bot will pick this.
> >>
> >> I used Github CI to produce version of tests that seems to be is stable on 
> >> Windows.
> >
> > It still failed on Windows Server 2019 [1].
> >
> > diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out 
> > C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out
> > --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 
> > 10:34:30.354721100 +
> > +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out  
> > 2023-12-19 10:38:25.877981600 +
> > @@ -100,7 +100,7 @@
> >  step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE 
> > application_name = 'isolation/timeouts/stt2'
> >  count
> >  -
> > -0
> > +1
> >  (1 row)
> >
> >  step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET 
> > statement_timeout = '10s'; SET lock_timeout = '10s'; SET 
> > transaction_timeout = '10s';
> >
> > [1] 
> > https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs
>
> Hi,
>
> I try to split the test for transaction timeout, and all passed on my CI [1].
>
> OTOH, I find if I set transaction_timeout in a transaction, it will not take
> effect immediately.  For example:
>
> [local]:2049802 postgres=# BEGIN;
> BEGIN
> [local]:2049802 postgres=*# SET transaction_timeout TO '1s';
when this executes, TransactionTimeout is still 0, so this command will
not set the timeout
> SET
> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;  -- wait 10s
when this command gets executed, start_xact_command will enable the timer
>relname
> --
>  pg_statistic
> (1 row)
>
> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;
> FATAL:  terminating connection due to transaction timeout
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> It looks odd.  Does this is expected? I'm not read all the threads,
> am I missing something?

I think this is by design. If you debug statement_timeout, you'll see the
same behaviour: the timeout is armed for each command, so a value set
inside a transaction only takes effect from the next command; you just
aren't aware of this.

I doubt people will set this in a transaction.
>
> [1] https://cirrus-ci.com/build/6574686130143232
>
> --
> Regrads,
> Japin Li
> ChengDu WenWu Information Technology Co., Ltd.
>


-- 
Regards
Junwang Zhao




Re: remaining sql/json patches

2023-12-22 Thread jian he
Hi
v33-0007-SQL-JSON-query-functions.patch, commit message:
This introduces the SQL/JSON functions for querying JSON data using
jsonpath expressions. The functions are:

should it be "These functions are"

+   
+Returns true if the SQL/JSON path_expression
+applied to the context_item using the
+values yields any items.
+The ON ERROR clause specifies what is returned if
+an error occurs; the default is to return FALSE.
+Note that if the path_expression
+is strict, an error is generated if it
yields no items.
+   

I think the following description is more accurate.
+Note that if the path_expression
+is strict and the ON
ERROR clause is  ERROR,
+an error is generated if it yields no items.
+   

+/*
+ * transformJsonTable -
+ * Transform a raw JsonTable into TableFunc.
+ *
+ * Transform the document-generating expression, the row-generating expression,
+ * the column-generating expressions, and the default value expressions.
+ */
+ParseNamespaceItem *
+transformJsonTable(ParseState *pstate, JsonTable *jt)
+{
+ JsonTableParseContext cxt;
+ TableFunc  *tf = makeNode(TableFunc);
+ JsonFuncExpr *jfe = makeNode(JsonFuncExpr);
+ JsonExpr   *je;
+ JsonTablePlan *plan = jt->plan;
+ char*rootPathName = jt->pathname;
+ char*rootPath;
+ bool is_lateral;
+
+ if (jt->on_empty)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("ON EMPTY not allowed in JSON_TABLE"),
+ parser_errposition(pstate,
+ exprLocation((Node *) jt->on_empty;

This error may be slightly misleading? You can add ON EMPTY inside the
COLUMNS part, like the following:
SELECT * FROM (VALUES ('1'), ('"1"')) vals(js) LEFT OUTER JOIN
JSON_TABLE(vals.js::jsonb, '$' COLUMNS (a int PATH '$' default 1 ON
empty)) jt ON true;

+  
+   Each NESTED PATH clause can generate one or more
+   columns. Columns produced by NESTED PATHs at the
+   same level are considered to be siblings,
+   while a column produced by a NESTED PATH is
+   considered to be a child of the column produced by a
+   NESTED PATH or row expression at a higher level.
+   Sibling columns are always joined first. Once they are processed,
+   the resulting rows are joined to the parent row.
+  
Does changing to the following make sense?
+   considered to be a child of the column produced by a
+   the resulting rows are joined to the parent row.

It seems like `format json_representation` is not listed in the
documentation, but json_representation is under "Parameters"; do we need
to add a section to explain it?
Even though I think currently we can only do `FORMAT JSON`.

SELECT * FROM JSON_TABLE(jsonb '123', '$' COLUMNS (item int PATH '$'
empty on empty)) bar;
ERROR:  cannot cast jsonb array to type integer
The error is the same as the output of the following:
SELECT * FROM JSON_TABLE(jsonb '123', '$' COLUMNS (item int PATH '$'
empty array on empty )) bar;
but these two are different things?

+ /* FALLTHROUGH */
+ case JTC_EXISTS:
+ case JTC_FORMATTED:
+ {
+ Node   *je;
+ CaseTestExpr *param = makeNode(CaseTestExpr);
+
+ param->collation = InvalidOid;
+ param->typeId = cxt->contextItemTypid;
+ param->typeMod = -1;
+
+ if (rawc->wrapper != JSW_NONE &&
+ rawc->quotes != JS_QUOTES_UNSPEC)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot use WITH WRAPPER clause for formatted colunmns"
+ " without also specifying OMIT/KEEP QUOTES"),
+ parser_errposition(pstate, rawc->location)));

typo, should be "formatted columns".
I suspect people will be confused by the meaning of "formatted column".
Maybe we can replace this part, "cannot use WITH WRAPPER clause for
formatted columns", with
"SQL/JSON WITH WRAPPER behavior must not be specified when FORMAT
clause is used".

SELECT * FROM JSON_TABLE(jsonb '"world"', '$' COLUMNS (item text
FORMAT JSON PATH '$' with wrapper KEEP QUOTES));
ERROR:  cannot use WITH WRAPPER clause for formatted colunmns without
also specifying OMIT/KEEP QUOTES
LINE 1: ...T * FROM JSON_TABLE(jsonb '"world"', '$' COLUMNS (item text ...
 ^
this error is misleading, since now I am using a WITH WRAPPER clause for
formatted columns and have specified KEEP QUOTES.

in parse_expr.c, we have errmsg("SQL/JSON QUOTES behavior must not be
specified when WITH WRAPPER is used").

+/*
+ * Fetch next row from a cross/union joined scan.
+ *
+ * Returns false at the end of a scan, true otherwise.
+ */
+static bool
+JsonTablePlanNextRow(JsonTablePlanState * state)
+{
+ JsonTableJoinState *join;
+
+ if (state->type == JSON_TABLE_SCAN_STATE)
+ return JsonTableScanNextRow((JsonTableScanState *) state);
+
+ join = (JsonTableJoinState *) state;
+ if (join->advanceRight)
+ {
+ /* fetch next inner row */
+ if (JsonTablePlanNextRow(join->right))
+ return true;
+
+ /* inner rows are exhausted */
+ if (join->cross)
+ join->advanceRight = false; /* next outer row */
+ else
+ return false; /* end of scan */
+ }
+
+ while (!join->advanceRight)
+ {

Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-12-22 Thread Andrey M. Borodin


> On 19 Dec 2023, at 10:34, Dilip Kumar  wrote:

Just a side note.
It seems like the commit log is a kind of antipattern for data contention,
even when we build a super-optimized SLRU: nearby **bits** are written by
different CPUs. I think that banks and locks are a good thing. But we could
also reorganize the log so that
status of transaction 0 is on a page 0 at bit offset 0
status of transaction 1 is on a page 1 at bit offset 0
status of transaction 2 is on a page 2 at bit offset 0
status of transaction 3 is on a page 3 at bit offset 0
status of transaction 4 is on a page 0 at bit offset 2
status of transaction 5 is on a page 1 at bit offset 2
status of transaction 6 is on a page 2 at bit offset 2
status of transaction 7 is on a page 3 at bit offset 2
etc...
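
As a quick sanity check of that mapping (assuming 4 pages and 2 status bits
per transaction, as above):

SELECT xid,
       xid % 4       AS page_no,
       (xid / 4) * 2 AS bit_offset  -- ignoring page-size wraparound
FROM generate_series(0, 7) AS xid;

which reproduces the page/offset sequence listed above.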

And it would be even better if the page for transaction statuses were
somehow determined by backend id, or at least by cache line. Can we
allocate a range (sizeof(cacheline)) of xids\subxids\multixacts\whatever
for each backend?

This does not matter much because:
0. The patch set in the current thread produces a robust SLRU anyway.
1. One day we are going to throw away SLRU anyway.


Best regards, Andrey Borodin.

Re: Remove MSVC scripts from the tree

2023-12-22 Thread Andrew Dunstan



On 2023-12-21 Th 18:20, Michael Paquier wrote:

On Thu, Dec 21, 2023 at 03:43:32PM -0500, Andrew Dunstan wrote:

On 2023-12-21 Th 03:01, Michael Paquier wrote:

Andrew, was the original target of pgperlsyncheck committers and
hackers who played with the MSVC scripts but could not run sanity
checks on Windows (see [1])?


yes.

Okay, thanks.  Wouldn't it be better to remove it at the end?  With
the main use case behind its introduction being gone, it is less
attractive to keep maintaining it.  If some people have been using it
in their workflows, I'm OK to keep it but the rest of the tree can be
checked at runtime as well.


I'm actually a bit dubious about win32tzlist.pl. Win32::Registry is not
present in a recent Strawberry Perl installation, and its latest version
says it is obsolete, although it's still included in the cpan bundle
libwin32.

I wonder who has actually run the script any time recently?

Hmm...  I've never run it with meson on Win32.



Turns out I was wrong - Windows sometimes doesn't find files nicely. It 
is present in my Strawberry installation.






In any case, we can probably work around the syncheck issue by making the
module a runtime requirement rather than a compile time requirement, by
using "require" instead of "use".

Interesting.  Another trick would be needed for HKEY_LOCAL_MACHINE,
like what the dummylib but local to win32tzlist.pl.  Roughly among
these lines:
-use Win32::Registry;
+use Config;
+
+require Win32::Registry;
  
  my $tzfile = 'src/bin/initdb/findtimezone.c';
  
+if ($Config{osname} ne 'MSWin32' && $Config{osname} ne 'msys')

+{
+   use vars qw($HKEY_LOCAL_MACHINE);
+}



I've done it a bit differently, but the same idea. I have tested that 
what I committed passes checks on Unix and works on Windows.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-12-22 Thread John Naylor
On Thu, Dec 21, 2023 at 6:27 PM Andres Freund  wrote:
>
> Could either of you summarize what the design changes you've made in the last
> months are and why you've done them? Unfortunately this thread is very long,
> and the comments in the file just say "FIXME" in places that apparently are
> affected by design changes.  This makes it hard to catch up here.

I'd be happy to try, since we are about due for a summary. I was also
hoping to reach a coherent-enough state sometime in early January to
request your feedback, so good timing. Not sure how much detail to go
into, but here goes:

Back in May [1], the method of value storage shifted towards "combined
pointer-value slots", which was described and recommended in the
paper. There were some other changes for simplicity and efficiency,
but none as far-reaching as this.

This is enabled by using the template architecture that we adopted
long ago for different reasons. Fixed length values are either stored
in the slot of the last-level node (if the value fits into the
platform's pointer), or are a "single-value" leaf (otherwise).

For tid store, we want to eventually support bitmap heap scans (in
addition to vacuum), and in doing so make it independent of heap AM.
That means value types similar to PageTableEntry tidbitmap.c, but with
a variable number of bitmapwords.

That required radix tree to support variable length values. That has
been the main focus in the last several months, and it basically works
now.

To my mind, the biggest architectural issues in the patch today are:

- Variable-length values means that pointers are passed around in
places. This will require some shifting responsibility for locking to
the caller, or longer-term maybe a callback interface. (This is new,
the below are pre-existing issues.)
- The tid store has its own "control object" (when shared memory is
needed) with its own lock, in addition to the same for the associated
radix tree. This leads to unnecessary double-locking. This area needs
some attention.
- Memory accounting is still unsettled. The current thinking is to cap
max block/segment size, scaled to a fraction of m_w_m, but there are
still open questions.

There has been some recent effort toward finishing work started
earlier, like shrinking nodes. There a couple places that can still
use either simplification or optimization, but otherwise work fine.
Most of the remaining fixmes/todos/wips are trivial; a few are
actually outdated now that I look again, and will be removed shortly.
The regression tests could use some tidying up.

-John

[1] 
https://www.postgresql.org/message-id/CAFBsxsFyWLxweHVDtKb7otOCR4XdQGYR4b%2B9svxpVFnJs08BmQ%40mail.gmail.com




Re: Transaction timeout

2023-12-22 Thread Japin Li


On Fri, 22 Dec 2023 at 20:29, Junwang Zhao  wrote:
> On Fri, Dec 22, 2023 at 1:39 PM Japin Li  wrote:
>>
>>
>> On Tue, 19 Dec 2023 at 22:06, Japin Li  wrote:
>> > On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin  
>> > wrote:
>> >>> On 19 Dec 2023, at 13:26, Andrey M. Borodin  wrote:
>> >>>
>> >>> I don’t have Windows machine, so I hope CF bot will pick this.
>> >>
>> >> I used Github CI to produce version of tests that seems to be is stable 
>> >> on Windows.
>> >
>> > It still failed on Windows Server 2019 [1].
>> >
>> > diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out 
>> > C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out
>> > --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 
>> > 10:34:30.354721100 +
>> > +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out  
>> > 2023-12-19 10:38:25.877981600 +
>> > @@ -100,7 +100,7 @@
>> >  step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE 
>> > application_name = 'isolation/timeouts/stt2'
>> >  count
>> >  -
>> > -0
>> > +1
>> >  (1 row)
>> >
>> >  step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET 
>> > statement_timeout = '10s'; SET lock_timeout = '10s'; SET 
>> > transaction_timeout = '10s';
>> >
>> > [1] 
>> > https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs
>>
>> Hi,
>>
>> I try to split the test for transaction timeout, and all passed on my CI [1].
>>
>> OTOH, I find if I set transaction_timeout in a transaction, it will not take
>> effect immediately.  For example:
>>
>> [local]:2049802 postgres=# BEGIN;
>> BEGIN
>> [local]:2049802 postgres=*# SET transaction_timeout TO '1s';
> when this execute, TransactionTimeout is still 0, this command will
> not set timeout
>> SET
>> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;  -- wait 
>> 10s
> when this command get execute, start_xact_command will enable the timer

Thanks for your explanation, got it.

>>relname
>> --
>>  pg_statistic
>> (1 row)
>>
>> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;
>> FATAL:  terminating connection due to transaction timeout
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Succeeded.
>>
>> It looks odd.  Does this is expected? I'm not read all the threads,
>> am I missing something?
>
> I think this is by design, if you debug statement_timeout, it's the same
> behaviour, the timeout will be set for each command after the second
> command was called, you just aren't aware of this.
>

I try to set idle_in_transaction_session_timeout after begin transaction,
it changes immediately, so I think transaction_timeout should also be take
immediately.

> I doubt people will set this in a transaction.

Maybe not,


--
Regrads,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: Synchronizing slots from primary to standby

2023-12-22 Thread Bertrand Drouvot
Hi,

On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote:
> PFA v53. Changes are:

Thanks!

> patch002:
> 2) Addressed comments in [2] for v52-002.
> 3) Fixed CFBot failure. The failure was caused by an assert in
> wait_for_primary_slot_catchup() for null confirmed_lsn received. In
> wait_for_primary_slot_catchup(), we had an assumption that if
> restart_lsn is valid and 'conflicting' is also false, then we must
> have non-null confirmed_lsn. But this is not true. It is possible to
> get null values for confirmed_lsn and catalog_xmin if on the primary
> server the slot is just created with a valid restart_lsn and slot-sync
> worker has fetched the slot before the primary server could set valid
> confirmed_lsn and catalog_xmin. In
> pg_create_logical_replication_slot(), there is a small window between
> CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets
> restart_lsn and DecodingContextFindStartpoint() which sets
> confirmed_lsn. If the slot-sync worker fetches the slot in this
> window, confirmed_lsn received will be NULL. Corrected the code to
> remove assert and added one additional condition that confirmed_lsn
> should be valid before moving the slot to 'r'.
> 
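
For illustration, a minimal sketch of the corrected slot-sync condition,
with hypothetical field and helper names (the actual patch code differs):

    /* Only persist the synced slot once the primary has filled in all
     * the fields; a just-created slot can have a valid restart_lsn while
     * confirmed_lsn and catalog_xmin are still NULL. */
    if (!XLogRecPtrIsInvalid(remote_slot->restart_lsn) &&
        !remote_slot->conflicting &&
        !XLogRecPtrIsInvalid(remote_slot->confirmed_lsn))
    {
        /* all fields are available: safe to sync */
        local_slot_update_and_persist(remote_slot);   /* move to 'r' */
    }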

Looking at v53-0002 commit message:

It states:

"
If a logical slot on the primary is valid but is invalidated on the standby,
then that slot is dropped and recreated on the standby in next sync-cycle.
"

and one of the reasons mentioned is:

"
- The primary changes wal_level to a level lower than logical.
"

I think that as long as there is still a logical replication slot on the
primary, that should not be possible. The primary should fail to start with
messages like:

"
2023-12-22 14:06:09.281 UTC [31824] FATAL:  logical replication slot 
"logical_slot" exists, but wal_level < logical
"

Now, if:

- The standby is shutdown
- All the logical replication slots are removed on the primary
- wal_level is set to < logical on the primary and it is restarted

Then when the standby starts, the "synced" slots will be invalidated and later 
removed but not re-created on the next sync-cycle (because they don't exist
anymore on the primary).

Worth rewording that part a bit?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Transaction timeout

2023-12-22 Thread Junwang Zhao
On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
>
>
> On Fri, 22 Dec 2023 at 20:29, Junwang Zhao  wrote:
> > On Fri, Dec 22, 2023 at 1:39 PM Japin Li  wrote:
> >>
> >>
> >> On Tue, 19 Dec 2023 at 22:06, Japin Li  wrote:
> >> > On Tue, 19 Dec 2023 at 18:27, Andrey M. Borodin  
> >> > wrote:
> >> >>> On 19 Dec 2023, at 13:26, Andrey M. Borodin  
> >> >>> wrote:
> >> >>>
> >> >>> I don’t have a Windows machine, so I hope the CF bot will pick this up.
> >> >>
> >> >> I used GitHub CI to produce a version of the tests that seems to be
> >> >> stable on Windows.
> >> >
> >> > It still failed on Windows Server 2019 [1].
> >> >
> >> > diff -w -U3 C:/cirrus/src/test/isolation/expected/timeouts.out 
> >> > C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out
> >> > --- C:/cirrus/src/test/isolation/expected/timeouts.out2023-12-19 
> >> > 10:34:30.354721100 +
> >> > +++ C:/cirrus/build/testrun/isolation/isolation/results/timeouts.out  
> >> > 2023-12-19 10:38:25.877981600 +
> >> > @@ -100,7 +100,7 @@
> >> >  step stt3_check_stt2: SELECT count(*) FROM pg_stat_activity WHERE 
> >> > application_name = 'isolation/timeouts/stt2'
> >> >  count
> >> >  -
> >> > -0
> >> > +1
> >> >  (1 row)
> >> >
> >> >  step itt4_set: SET idle_in_transaction_session_timeout = '1ms'; SET 
> >> > statement_timeout = '10s'; SET lock_timeout = '10s'; SET 
> >> > transaction_timeout = '10s';
> >> >
> >> > [1] 
> >> > https://api.cirrus-ci.com/v1/artifact/task/4707530400595968/testrun/build/testrun/isolation/isolation/regression.diffs
> >>
> >> Hi,
> >>
> >> I tried to split the tests for transaction timeout, and they all passed on
> >> my CI [1].
> >>
> >> OTOH, I find that if I set transaction_timeout in a transaction, it will
> >> not take effect immediately.  For example:
> >>
> >> [local]:2049802 postgres=# BEGIN;
> >> BEGIN
> >> [local]:2049802 postgres=*# SET transaction_timeout TO '1s';
> > when this executes, TransactionTimeout is still 0, so this command will
> > not set the timeout
> >> SET
> >> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;  -- wait 
> >> 10s
> > when this command gets executed, start_xact_command will enable the timer
>
> Thanks for your explanation, got it.
>
> >>relname
> >> --
> >>  pg_statistic
> >> (1 row)
> >>
> >> [local]:2049802 postgres=*# SELECT relname FROM pg_class LIMIT 1;
> >> FATAL:  terminating connection due to transaction timeout
> >> server closed the connection unexpectedly
> >> This probably means the server terminated abnormally
> >> before or while processing the request.
> >> The connection to the server was lost. Attempting reset: Succeeded.
> >>
> >> It looks odd.  Is this expected? I haven't read all the threads;
> >> am I missing something?
> >
> > I think this is by design; if you debug statement_timeout, it's the same
> > behaviour: the timeout is set for each command starting with the second
> > command, you just aren't aware of this.
> >
>
> I tried setting idle_in_transaction_session_timeout after beginning a
> transaction, and it takes effect immediately, so I think transaction_timeout
> should also take effect immediately.

Ah, right, idle_in_transaction_session_timeout is set after the SET
command finishes and before the backend sends *ready for query*
to the client, so the value of the GUC is already set before the
next command.

I bet you must have checked this ;)

>
> > I doubt people will set this in a transaction.
>
> Maybe not.
>
>
> --
> Regards,
> Japin Li
> ChengDu WenWu Information Technology Co., Ltd.



-- 
Regards
Junwang Zhao




Re: Transaction timeout

2023-12-22 Thread Japin Li


On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
> On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
>> I tried setting idle_in_transaction_session_timeout after beginning a
>> transaction, and it takes effect immediately, so I think transaction_timeout
>> should also take effect immediately.
>
> Ah, right, idle_in_transaction_session_timeout is set after the SET
> command finishes and before the backend sends *ready for query*
> to the client, so the value of the GUC is already set before the
> next command.
>

I mean, is it possible to set transaction_timeout before the next command?


--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-22 Thread Japin Li


On Thu, 21 Dec 2023 at 21:03, Pavel Stehule  wrote:
> Hi
>
> On Thu, 21 Dec 2023 at 13:37, Ishaan Adarsh wrote:
>
>> The recent documentation patches are part of my GSoC 2023 project
>> 
>> to develop a comprehensive PostgreSQL extension development tutorial, it
>> assumes only a basic knowledge of Postgres and the target programming
>> language.
>>
>> The entire project is available on GitHub: Postgres-extension-tutorial
>> .
>> It covers many topics, including prerequisites, writing extensions,
>> creating Makefiles, using procedural languages, incorporating external
>> languages, writing regression tests, and managing extension releases. The
>> patch submitted for procedural languages, specifically PL/pgSQL and
>> PL/Python, is part of the procedural language section within the broader
>> tutorial.
>>
>> Based on the feedback, I think there is a real need for this, as this
>> is a very important and growing part of the Postgres ecosystem. Currently,
>> all the extension material is scattered and very limited. There are
>> various third-party blog posts focusing on different areas, and sometimes
>> contradictory ones. The main motivation behind making this is to make the
>> barrier to entry less prohibitive for new contributors.
>>
>> I would greatly appreciate your input on how to add it to the existing
>> documentation (this is where I have major doubts) and any suggestions on
>> how to proceed. If there are areas where the existing documentation is
>> already sufficient or if there are ways to improve the overall structure, I
>> am open to making adjustments.
>>
>
> https://www.postgresql.org/docs/current/plpgsql-development-tips.html and a
> new section - deployment or packaging into extensions
>
> I agree that https://www.postgresql.org/docs/current/plpgsql-overview.html is
> under-dimensioned, but packaging should not be there
>

It seems redundant if we add this for each PL; maybe a separate section
describing how to package PLs into extensions would be better.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: Teach predtest about IS [NOT] proofs

2023-12-22 Thread James Coleman
On Thu, Dec 14, 2023 at 4:38 PM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Wed, Dec 13, 2023 at 1:36 PM Tom Lane  wrote:
> >> I don't have an objection in principle to adding more smarts to
> >> predtest.c.  However, we should be wary of slowing down cases where
> >> no BooleanTests are present to be optimized.  I wonder if it could
> >> help to use a switch on nodeTag rather than a series of if(IsA())
> >> tests.  (I'd be inclined to rewrite the inner if-then-else chains
> >> as switches too, really.  You get some benefit from the compiler
> >> noticing whether you've covered all the enum values.)
>
> > I think I could take this on; would you prefer it as a patch in this
> > series? Or as a new patch thread?
>
> No, keep it in the same thread (and make a CF entry, if you didn't
> already).  It might be best to make a series of 2 patches, first
> just refactoring what's there per this discussion, and then a
> second one to add BooleanTest logic.

CF entry is already created; I'll keep it here then.

> >> I note you've actively broken the function's ability to cope with
> >> NULL input pointers.  Maybe we don't need it to, but I'm not going
> >> to accept a patch that just side-swipes that case without any
> >> justification.
>
> > [ all callers have previously used predicate_classify ]
>
> OK, fair enough.  The checks for nulls are probably from ancient
> habit, but I agree we could remove 'em here.
>
> >> Perhaps, rather than hoping people will notice comments that are
> >> potentially offscreen from what they're modifying, we should relocate
> >> those comment paras to be adjacent to the relevant parts of the
> >> function?
>
> > Splitting up that block comment makes sense to me.
>
> Done, let's make it so.
>
> >> I've not gone through the patch in detail to see whether I believe
> >> the proposed proof rules.  It would help to have more comments
> >> justifying them.
>
> > Most of them are sufficiently simple -- e.g., X IS TRUE implies X --
> > that I don't think there's a lot to say in justification. In some
> > cases I've noted the cases that force only strong or weak implication.
>
> Yeah, it's the strong-vs-weak distinction that makes me cautious here.
> One's high-school-algebra instinct for what's obviously true tends to
> not think about NULL/UNKNOWN, and you do have to consider that.
>
> >>> As noted in a TODO in the patch itself, I think it may be worth 
> >>> refactoring
> >>> the test_predtest module to run the "x, y" case as well as the "y, x" case
>
> >> I think that's actively undesirable.  It is not typically the case that
> >> a proof rule for A => B also works in the other direction, so this would
> >> encourage wasting cycles in the tests.  I fear it might also cause
> >> confusion about which direction a proof rule is supposed to work in.
>
> > That makes sense in the general case.
>
> > Boolean expressions seem like a special case in that regard: (subject
> > to what it looks like) would you be OK with a wrapping function that
> > does both directions (with output that shows which direction is being
> > tested) used only for the cases where we do want to check both
> > directions?
>
> Using a wrapper where appropriate would remove the inefficiency
> concern, but I still worry whether it will promote confusion about
> which direction we're proving things in.  You'll need to be very clear
> about the labeling.

I've not yet applied all of your feedback, but I wanted to get an
initial read on your thoughts on how using switch statements ends up
looking. Attached is a single (pure refactor) patch that converts the
various if/else levels that check things like node tag and
boolean/null test type into switch statements. I've retained 'default'
keyword usages for now for simplicity (my intuition is that we
generally prefer to list out all options for compiler safety benefits,
though I'm not 100% sure that's useful in the outer node tag check
since it's unlikely that someone adding a new node would modify
this...).

My big question is: are you comfortable with the indentation explosion
this creates? IMO it's a lot wordier, but it is also more obvious what
the structural goal is. I'm not sure how we want to make the right
trade-off though.

Once there's agreement on this part, I'll add back a second patch
applying my changes on top of the refactor as well as apply other
feedback (e.g., splitting up the block comment).

Regards,
James Coleman


v2-0001-WIP-use-switch-statements.patch
Description: Binary data


Re: Avoid computing ORDER BY junk columns unnecessarily

2023-12-22 Thread Tom Lane
Heikki Linnakangas  writes:
> Hmm, so return the computed column from the index instead of recomputing 
> it? Yeah, that makes sense too and would help in this example.

Yeah, that's been on the to-do list for ages.  The main problems are
(1) we need the planner to not spend too much effort on looking for
subexpression matches, and (2) amcanreturn ability isn't implemented
by the executor in plain indexscans.  There's another thread right now
discussing fixing (2), after which we could perhaps work on this.

> It won't 
> help in all cases though, the index might not store the original value 
> in the first place.

I'm a little skeptical that an index could produce an accurate ORDER BY
result if it doesn't store the values-to-be-sorted exactly.  Any loss
of information would compromise its ability to sort nearly-identical
values correctly.  A more credible argument is that the index might
expose amcanorder ability but not amcanreturn; but what I'm saying is
that that's probably an AM implementation gap that ought to be fixed.

How much of your patchset still makes sense if we assume that we
can always extract the ORDER BY column values from the index?

regards, tom lane




Re: Transaction timeout

2023-12-22 Thread Junwang Zhao
On Fri, Dec 22, 2023 at 10:44 PM Japin Li  wrote:
>
>
> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
> > On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
> >> I tried setting idle_in_transaction_session_timeout after beginning a
> >> transaction, and it takes effect immediately, so I think transaction_timeout
> >> should also take effect immediately.
> >
> > Ah, right, idle_in_transaction_session_timeout is set after the SET
> > command finishes and before the backend sends *ready for query*
> > to the client, so the value of the GUC is already set before the
> > next command.
> >
>
> I mean, is it possible to set transaction_timeout before the next command?
>
Yeah, it's possible: set transaction_timeout when the session first
goes into *idle in transaction* mode; see the attached files.

>
> --
> Regrads,
> Japin Li
> ChengDu WenWu Information Technology Co., Ltd.



-- 
Regards
Junwang Zhao


v15-0001-Introduce-transaction_timeout.patch
Description: Binary data


v15-0002-set-transaction_timeout-before-next-command.patch
Description: Binary data


Re: Avoid computing ORDER BY junk columns unnecessarily

2023-12-22 Thread Heikki Linnakangas

On 22/12/2023 17:24, Tom Lane wrote:
> Heikki Linnakangas  writes:
>> It won't
>> help in all cases though, the index might not store the original value
>> in the first place.
>
> I'm a little skeptical that an index could produce an accurate ORDER BY
> result if it doesn't store the values-to-be-sorted exactly.  Any loss
> of information would compromise its ability to sort nearly-identical
> values correctly.

In the context of pgvector, its ordering is approximate anyway. Aside
from that, there's one trick that it implements: it compares squares of
distances, avoiding a sqrt() calculation. (I wonder if we could do the
same in GiST opclasses.)
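
The sqrt() shortcut works because sqrt() is monotonic, so ordering by the
squared distance yields the same ranking; for illustration, against a
hypothetical points(id, x, y) table:

    -- same row order as ORDER BY sqrt((x - 3.0)^2 + (y - 4.0)^2)
    SELECT id
    FROM points
    ORDER BY (x - 3.0)^2 + (y - 4.0)^2
    LIMIT 5;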



> A more credible argument is that the index might
> expose amcanorder ability but not amcanreturn; but what I'm saying is
> that that's probably an AM implementation gap that ought to be fixed.
>
> How much of your patchset still makes sense if we assume that we
> can always extract the ORDER BY column values from the index?

That would make it much less interesting. But I don't think that's a
good assumption. Especially in the kNN case, the ORDER BY value would
not be stored in the index. Most likely the index needs to calculate it
in some form, but it might take shortcuts like avoiding the sqrt().


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

2023-12-22 Thread Pavel Stehule
On Fri, 22 Dec 2023 at 15:50, Japin Li wrote:

>
> On Thu, 21 Dec 2023 at 21:03, Pavel Stehule wrote:
> > Hi
> >
> > On Thu, 21 Dec 2023 at 13:37, Ishaan Adarsh wrote:
> >
> >> The recent documentation patches are part of my GSoC 2023 project
> >> <https://wiki.postgresql.org/wiki/GSoC_2023#Postgres_extension_tutorial_.2F_quick_start>
> >> to develop a comprehensive PostgreSQL extension development tutorial, it
> >> assumes only a basic knowledge of Postgres and the target programming
> >> language.
> >>
> >> The entire project is available on GitHub: Postgres-extension-tutorial
> >> <https://github.com/IshaanAdarsh/Postgres-extension-tutorial/blob/main/SGML/intro_and_toc.md>.
> >> It covers many topics, including prerequisites, writing extensions,
> >> creating Makefiles, using procedural languages, incorporating external
> >> languages, writing regression tests, and managing extension releases. The
> >> patch submitted for procedural languages, specifically PL/pgSQL and
> >> PL/Python, is part of the procedural language section within the broader
> >> tutorial.
> >>
> >> Based on the feedback, I think there is a real need for this, as this
> >> is a very important and growing part of the Postgres ecosystem. Currently,
> >> all the extension material is scattered and very limited. There are
> >> various third-party blog posts focusing on different areas, and sometimes
> >> contradictory ones. The main motivation behind making this is to make the
> >> barrier to entry less prohibitive for new contributors.
> >>
> >> I would greatly appreciate your input on how to add it to the existing
> >> documentation (this is where I have major doubts) and any suggestions on
> >> how to proceed. If there are areas where the existing documentation is
> >> already sufficient or if there are ways to improve the overall structure,
> >> I am open to making adjustments.
> >>
> >
> > https://www.postgresql.org/docs/current/plpgsql-development-tips.html and a
> > new section - deployment or packaging into extensions
> >
> > I agree that https://www.postgresql.org/docs/current/plpgsql-overview.html
> > is under-dimensioned, but packaging should not be there
> >
>
> It seems redundant if we add this for each PL; maybe a separate section
> describing how to package PLs into extensions would be better.
>

I don't have a strong opinion about it. In my personal experience, 99% of PL
code is PL/pgSQL, so it can be there, and other PLs can be referenced there.
I am not sure whether there is some common part for all PLs.

Regards

Pavel

>
> --
> Regards,
> Japin Li
> ChengDu WenWu Information Technology Co., Ltd.
>


Re: Optimization outcome depends on the index order

2023-12-22 Thread Andrei Lepikhov

On 22/12/2023 11:48, Alexander Korotkov wrote:

>> Because we must trust all predictions made by the planner, we just
>> choose the most trustworthy path. According to the planner logic, it is
>> a path with a smaller selectivity. We can make mistakes anyway just
>> because of the nature of estimation.
>
> Even if we need to take selectivity into account here, it's still not
> clear why this should be on top of other logic later in add_path().

I got your point now, thanks for pointing it out. In the next version of
the patch, selectivity is used as a criterion only in the case of
COSTS_EQUAL.


--
regards,
Andrei Lepikhov
Postgres Professional
From 45bda9784d28dc9cec90c5b33285023a49850800 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 27 Nov 2023 11:23:48 +0700
Subject: [PATCH] Choose an index path with the best selectivity estimation.

In the case when the optimizer predicts only one row, prefer choosing UNIQUE
indexes. In other cases, if the optimizer treats indexes as equal, make a last
attempt by selecting the index with the smaller selectivity - this decision
removes the dependency on the order of indexes in an index list (good for
reproducing some issues) and adds one more objective argument for choosing a
specific index.
---
 src/backend/optimizer/util/pathnode.c | 32 +++
 src/test/regress/expected/functional_deps.out | 39 +++
 src/test/regress/sql/functional_deps.sql  | 32 +++
 3 files changed, 103 insertions(+)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0b1d17b9d3..984c974b57 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -454,6 +454,38 @@ add_path(RelOptInfo *parent_rel, Path *new_path)
 		costcmp = compare_path_costs_fuzzily(new_path, old_path,
 											 STD_FUZZ_FACTOR);
 
+		/*
+		 * Apply some heuristics on index paths.
+		 */
+		if (costcmp == COSTS_EQUAL)
+		{
+			IndexPath  *inp = (IndexPath *) new_path;
+			IndexPath  *iop = (IndexPath *) old_path;
+
+			if (IsA(new_path, IndexPath) && IsA(old_path, IndexPath))
+			{
+				/*
+				 * When both paths are predicted to produce only one tuple,
+				 * the optimiser should prefer choosing a unique index scan
+				 * in all cases.
+				 */
+				if (inp->indexinfo->unique && !iop->indexinfo->unique)
+					costcmp = COSTS_BETTER1;
+				else if (!inp->indexinfo->unique && iop->indexinfo->unique)
+					costcmp = COSTS_BETTER2;
+				else if (costcmp != COSTS_DIFFERENT)
+					/*
+					 * If the optimiser doesn't have an obviously stable choice
+					 * of unique index, increase the chance of avoiding mistakes
+					 * by choosing an index with smaller selectivity.
+					 * This option makes decision more conservative and looks
+					 * debatable.
+					 */
+					costcmp = (inp->indexselectivity < iop->indexselectivity) ?
+						COSTS_BETTER1 : COSTS_BETTER2;
+			}
+		}
+
 		/*
 		 * If the two paths compare differently for startup and total cost,
 		 * then we want to keep both, and we can skip comparing pathkeys and
diff --git a/src/test/regress/expected/functional_deps.out b/src/test/regress/expected/functional_deps.out
index 32381b8ae7..7057254278 100644
--- a/src/test/regress/expected/functional_deps.out
+++ b/src/test/regress/expected/functional_deps.out
@@ -230,3 +230,42 @@ EXECUTE foo;
 ALTER TABLE articles DROP CONSTRAINT articles_pkey RESTRICT;
 EXECUTE foo;  -- fail
 ERROR:  column "articles.keywords" must appear in the GROUP BY clause or be used in an aggregate function
+/*
+ * Corner case of the PostgreSQL optimizer:
+ *
+ * ANDed clauses selectivity multiplication increases total selectivity error.
+ * If such non-true selectivity is so tiny that row estimation predicts the
+ * absolute minimum number of tuples (1), the optimizer can't choose between
+ * different indexes and picks a first from the index list (last created).
+ */
+CREATE TABLE t AS (  -- selectivity(c1)*selectivity(c2)*nrows <= 1
+SELECT  gs AS c1,
+gs AS

Re: Avoid computing ORDER BY junk columns unnecessarily

2023-12-22 Thread Tom Lane
Heikki Linnakangas  writes:
> On 22/12/2023 17:24, Tom Lane wrote:
>> How much of your patchset still makes sense if we assume that we
>> can always extract the ORDER BY column values from the index?

> That would make it much less interesting. But I don't think that's a 
> good assumption. Especially in the kNN case, the ORDER BY value would 
> not be stored in the index. Most likely the index needs to calculate it 
> in some form, but it might take shortcuts like avoiding the sqrt().

Yeah, fair point.  I'll try to take a look at your patchset after
the holidays.

regards, tom lane




Re: ci: Build standalone INSTALL file

2023-12-22 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Dec 21, 2023 at 02:22:02PM -0500, Tom Lane wrote:
>> Here's a draft patch for this.  Most of it is mechanical removal of
>> infrastructure for building the INSTALL file.  If anyone wants to
>> bikeshed on the new wording of README, feel free.

> Thanks for putting this together.  That looks reasonable.

Thanks for checking it.  Pushed --- we can tweak things later
if we decide the web-redirect idea is superior to this.

regards, tom lane




date_trunc function in interval version

2023-12-22 Thread Przemysław Sztoch

Hello.
Here is a date_trunc(interval, timestamptz, timezone) function.
The first parameter can be '5 year', '2 month', '6 hour', '3 hour',
'15 minute', '10 second', etc.
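
For illustration, usage would presumably look like this (hypothetical
session, assuming the signature above):

    -- truncate to 15-minute boundaries
    SELECT date_trunc('15 minutes'::interval,
                      timestamptz '2023-12-22 10:38:25+00', 'UTC');
    -- expected result: 2023-12-22 10:30:00+00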

--
Przemysław Sztoch | Mobile +48 509 99 00 66
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index eebc59172b..90fd253d0c 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -4940,6 +4940,177 @@ timestamptz_trunc_zone(PG_FUNCTION_ARGS)
PG_RETURN_TIMESTAMPTZ(result);
 }
 
+/*
+ * Common code for timestamptz_trunc_int() and timestamptz_trunc_int_zone().
+ *
+ * tzp identifies the zone to truncate with respect to.  We assume
+ * infinite timestamps have already been rejected.
+ */
+static TimestampTz
+timestamptz_trunc_int_internal(Interval *interval, TimestampTz timestamp, pg_tz *tzp)
+{
+   TimestampTz result;
+   int         tz;
+   int         interval_parts = 0;
+   bool        bad_interval = false;
+   bool        redotz = false;
+   fsec_t      fsec;
+   struct pg_tm tt,
+              *tm = &tt;
+
+   if (interval->month != 0)
+   {
+   interval_parts++;
+   /* 1200 = hundred years */
+   if ((1200/interval->month) * interval->month != 1200)
+   bad_interval = true;
+   }
+   if (interval->day != 0)
+   {
+   interval_parts++;
+   if (interval->day != 1 && interval->day != 7)
+   bad_interval = true;
+   }
+   if (interval->time != 0)
+   {
+   interval_parts++;
+   if (interval->time > USECS_PER_SEC)
+   {
+   if ((interval->time % USECS_PER_SEC) != 0)
+   bad_interval = true;
+           if ((USECS_PER_DAY/interval->time) * interval->time != USECS_PER_DAY)
+   bad_interval = true;
+   }
+       else if (interval->time < USECS_PER_SEC && (USECS_PER_SEC/interval->time) * interval->time != USECS_PER_SEC)
+   bad_interval = true;
+   }
+   if (interval_parts != 1 || bad_interval)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("interval has to be a divisor of a day, 
week or century.")));
+   return 0;
+   }
+
+   if (timestamp2tm(timestamp, &tz, tm, &fsec, NULL, tzp) != 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+errmsg("timestamp out of range")));
+
+   if (interval->month != 0)
+   {
+   int months;
+   months = (tm->tm_year - 1) * 12 + tm->tm_mon - 1;
+   months -= months % interval->month;
+   tm->tm_year = (months / 12) + 1;
+   tm->tm_mon = (months % 12) + 1;
+   tm->tm_mday = 1;
+   tm->tm_hour = 0;
+   tm->tm_min = 0;
+   tm->tm_sec = 0;
+   fsec = 0;
+   redotz = true;
+   }
+   else if (interval->day == 7)
+   {
+   int woy;
+   woy = date2isoweek(tm->tm_year, tm->tm_mon, tm->tm_mday);
+
+   /*
+* If it is week 52/53 and the month is January, then the
+* week must belong to the previous year. Also, some
+* December dates belong to the next year.
+*/
+   if (woy >= 52 && tm->tm_mon == 1)
+   --tm->tm_year;
+   if (woy <= 1 && tm->tm_mon == MONTHS_PER_YEAR)
+   ++tm->tm_year;
+       isoweek2date(woy, &(tm->tm_year), &(tm->tm_mon), &(tm->tm_mday));
+   tm->tm_hour = 0;
+   tm->tm_min = 0;
+   tm->tm_sec = 0;
+   fsec = 0;
+   redotz = true;
+   }
+   else if (interval->day == 1)
+   {
+   tm->tm_hour = 0;
+   tm->tm_min = 0;
+   tm->tm_sec = 0;
+   fsec = 0;
+   redotz = true;  /* for all cases > HOUR */
+   }
+   else if (interval->time > USECS_PER_SEC)
+   {
+   int seconds;
+   seconds = tm->tm_hour * 3600 + tm->tm_min * 60 + tm->tm_sec;
+   seconds -= seconds % (interval->time / USECS_PER_SEC);
+   tm->tm_hour = seconds / 3600;
+   tm->tm_min = (seconds / 60) % 60;
+   tm->tm_sec = seconds % 60;
+   fsec = 0;
+   redotz = (interval->time > USECS_PER_HOUR);
+   }
+   else if (interval->time == USECS_PER_SEC)
+   fsec = 0;
+   else if (interval->time > 0)
+   fsec -= fsec % interval->time;
+
+

Re: date_trunc function in interval version

2023-12-22 Thread Pavel Stehule
Hi

On Fri, 22 Dec 2023 at 20:26, Przemysław Sztoch wrote:

> Hello.
> Here is a date_trunc(interval, timestamptz, timezone) function.
> The first parameter can be '5 year', '2 month', '6 hour', '3 hour',
> '15 minute', '10 second', etc.
>

should it not be named interval_trunc instead? In this case a good name can
be hard to choose, but with the name date_trunc it can be hard to find it.

Regards

Pavel


> --
> Przemysław Sztoch | Mobile +48 509 99 00 66
>


Re: Teach predtest about IS [NOT] proofs

2023-12-22 Thread Tom Lane
James Coleman  writes:
> I've not yet applied all of your feedback, but I wanted to get an
> initial read on your thoughts on how using switch statements ends up
> looking. Attached is a single (pure refactor) patch that converts the
> various if/else levels that check things like node tag and
> boolean/null test type into switch statements. I've retained 'default'
> keyword usages for now for simplicity (my intuition is that we
> generally prefer to list out all options for compiler safety benefits,
> though I'm not 100% sure that's useful in the outer node tag check
> since it's unlikely that someone adding a new node would modify
> this...).

> My big question is: are you comfortable with the indentation explosion
> this creates? IMO it's a lot wordier, but it is also more obvious what
> the structural goal is. I'm not sure how we want to make the right
> trade-off though.

Yeah, I see what you mean.  Also, I'd wanted to shove most of
the text in the function header in-line and get rid of the short
restatements of those paras.  I carried that through just for
predicate_implied_by_simple_clause, as attached.  The structure is
definitely clearer, but we end up with an awful lot of indentation,
which makes the comments less readable than I'd like.  (I did some
minor rewording to make them flow better.)

On balance I think this is probably better than what we have, but
maybe we'd be best off to avoid doubly nested switches?  I think
there's a good argument for the outer switch on nodeTag, but
maybe we're getting diminishing returns from an inner switch.

regards, tom lane

diff --git a/src/backend/optimizer/util/predtest.c b/src/backend/optimizer/util/predtest.c
index fe83e45311..a9accbed53 100644
--- a/src/backend/optimizer/util/predtest.c
+++ b/src/backend/optimizer/util/predtest.c
@@ -1087,38 +1087,12 @@ arrayexpr_cleanup_fn(PredIterInfo info)
 }
 
 
-/*--
+/*
  * predicate_implied_by_simple_clause
  *	  Does the predicate implication test for a "simple clause" predicate
  *	  and a "simple clause" restriction.
  *
  * We return true if able to prove the implication, false if not.
- *
- * We have several strategies for determining whether one simple clause
- * implies another:
- *
- * A simple and general way is to see if they are equal(); this works for any
- * kind of expression, and for either implication definition.  (Actually,
- * there is an implied assumption that the functions in the expression are
- * immutable --- but this was checked for the predicate by the caller.)
- *
- * Another way that always works is that for boolean x, "x = TRUE" is
- * equivalent to "x", likewise "x = FALSE" is equivalent to "NOT x".
- * These can be worth checking because, while we preferentially simplify
- * boolean comparisons down to "x" and "NOT x", the other form has to be
- * dealt with anyway in the context of index conditions.
- *
- * If the predicate is of the form "foo IS NOT NULL", and we are considering
- * strong implication, we can conclude that the predicate is implied if the
- * clause is strict for "foo", i.e., it must yield false or NULL when "foo"
- * is NULL.  In that case truth of the clause ensures that "foo" isn't NULL.
- * (Again, this is a safe conclusion because "foo" must be immutable.)
- * This doesn't work for weak implication, though.
- *
- * Finally, if both clauses are binary operator expressions, we may be able
- * to prove something using the system's knowledge about operators; those
- * proof rules are encapsulated in operator_predicate_proof().
- *--
  */
 static bool
 predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
@@ -1127,65 +1101,115 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
 	/* Allow interrupting long proof attempts */
 	CHECK_FOR_INTERRUPTS();
 
-	/* First try the equal() test */
+	/*
+	 * A simple and general rule is that a clause implies itself, hence we
+	 * check if they are equal(); this works for any kind of expression, and
+	 * for either implication definition.  (Actually, there is an implied
+	 * assumption that the functions in the expression are immutable --- but
+	 * this was checked for the predicate by the caller.)
+	 */
 	if (equal((Node *) predicate, clause))
 		return true;
 
-	/* Next see if clause is boolean equality to a constant */
-	if (is_opclause(clause) &&
-		((OpExpr *) clause)->opno == BooleanEqualOperator)
+	/* Our remaining strategies are all clause-type-specific */
+	switch (nodeTag(clause))
 	{
-		OpExpr	   *op = (OpExpr *) clause;
-		Node	   *rightop;
-
-		Assert(list_length(op->args) == 2);
-		rightop = lsecond(op->args);
-		/* We might never see a null Const here, but better check anyway */
-		if (rightop && IsA(rightop, Const) &&
-			!((Const *) rightop)->constisnull)
-		{
-			Node	   *leftop = linitial(op->args);
-
-			if (DatumGetBool(((Const *) rightop)->constvalue))
+		case T_OpExpr:
 			{
-/* X = true implies X */
-if (equal(predicate, leftop))

Re: pg_upgrade --copy-file-range

2023-12-22 Thread Peter Eisentraut

On 13.11.23 08:15, Peter Eisentraut wrote:
> On 08.10.23 07:15, Thomas Munro wrote:
>>> About your patch:
>>>
>>> I think you should have a "check" function called from
>>> check_new_cluster().  That check function can then also handle the "not
>>> supported" case, and you don't need to handle that in
>>> parseCommandLine().  I suggest following the clone example for these,
>>> since the issues there are very similar.
>>
>> Done.
>
> This version looks good to me.
>
> Tiny nit:  You copy-and-pasted "%s/PG_VERSION.clonetest"; maybe choose a
> different suffix.

Thomas, are you planning to proceed with this patch?





Re: pg_upgrade --copy-file-range

2023-12-22 Thread Thomas Munro
On Sat, Dec 23, 2023 at 9:40 AM Peter Eisentraut  wrote:
> On 13.11.23 08:15, Peter Eisentraut wrote:
> > On 08.10.23 07:15, Thomas Munro wrote:
> >>> About your patch:
> >>>
> >>> I think you should have a "check" function called from
> >>> check_new_cluster().  That check function can then also handle the "not
> >>> supported" case, and you don't need to handle that in
> >>> parseCommandLine().  I suggest following the clone example for these,
> >>> since the issues there are very similar.
> >>
> >> Done.
> >
> > This version looks good to me.
> >
> > Tiny nit:  You copy-and-pasted "%s/PG_VERSION.clonetest"; maybe choose a
> > different suffix.
>
> Thomas, are you planning to proceed with this patch?

Yes.  Sorry for being slow... got stuck working on an imminent new
version of streaming read.  I will be defrosting my commit bit and
committing this one and a few things shortly.

As it happens I was just thinking about this particular patch because
I suddenly had a strong urge to teach pg_combinebackup to use
copy_file_range.  I wonder if you had the same idea...




Re: Set all variable-length fields of pg_attribute to null on column drop

2023-12-22 Thread Peter Eisentraut

On 22.12.23 10:05, Alvaro Herrera wrote:
> On 2023-Nov-30, Peter Eisentraut wrote:
>
>> I noticed that when a column is dropped, RemoveAttributeById() clears out
>> certain fields in pg_attribute, but it leaves the variable-length fields at
>> the end (attacl, attoptions, and attfdwoptions) unchanged. This is probably
>> harmless, but it seems wasteful and unclean, and leaves potentially dangling
>> data lying around (for example, attacl could contain references to users
>> that are later also dropped).
>
> Yeah, this looks like an ancient oversight -- when DROP COLUMN was added
> we didn't have any varlena fields in this catalog, and when the first
> one was added (attacl in commit 3cb5d6580a33) resetting it on DROP
> COLUMN was overlooked.
>
> LGTM.

committed





Re: [PATCHES] Post-special page storage TDE support

2023-12-22 Thread David Christensen
Hi again!

Per some offline discussion with Stephen, I've continued to work on some
modifications here; this particular patchset is intended to facilitate
review by highlighting the mechanical nature of many of these changes.  As
such, I have taken the following approach to this rework:

0001 - Create PageUsableSpace to represent space post-smgr
0002 - Add support for fast, non-division-based div/mod algorithms
0003 - Use fastdiv code in visibility map
0004 - Make PageUsableSpace incorporate variable-sized limit
0005 - Add Calc, Limit and Dynamic forms of all variable constants
0006 - Split MaxHeapTuplesPerPage into Limit and Dynamic variants
0007 - Split MaxIndexTuplesPerPage into Limit and Dynamic variants
0008 - Split MaxHeapTupleSize into Limit and Dynamic variants
0009 - Split MaxTIDsPerBTreePage into Limit and Dynamic variant

0001 - 0003 have appeared in this thread or in other forms on the list
already, though 0001 refactors things slightly more aggressively, but uses
StaticAssert() to ensure that this change is still sane.
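
For illustration, the core of a non-division div/mod scheme (as in 0002) is
a multiply-shift by a precomputed reciprocal; a minimal sketch, not the
patch's code, and exact only while n * d stays below 2^32 (d >= 2):

    #include <stdint.h>

    /* precompute once per divisor d */
    static inline uint64_t
    fastdiv_init(uint32_t d)
    {
        return (UINT64_C(1) << 32) / d + 1;    /* ~ceil(2^32 / d) */
    }

    /* n / d without a division instruction */
    static inline uint32_t
    fastdiv(uint32_t n, uint64_t m)
    {
        return (uint32_t) ((n * m) >> 32);
    }

    /* n % d via the quotient */
    static inline uint32_t
    fastmod(uint32_t n, uint64_t m, uint32_t d)
    {
        return n - fastdiv(n, m) * d;
    }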

0004 adds the ReservedPageSpace variable, and also redefines the previous
BLCKSZ - SizeOfPageHeaderData as PageUsableSpaceMax; there are a few
related fixups.

0005 adds the macros to compute the former constants while leaving their
original definitions to evaluate to the same place (the infamous Calc* and
*Limit, plus we invite *Dynamic to the party as well; the names are
terrible and there must be something better)
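
For illustration, the three forms for one constant might look like the
following, patterned on the existing MaxHeapTuplesPerPage definition; the
names here are hypothetical, not necessarily the patch's exact macros:

    #define CalcMaxHeapTuplesPerPage(usable) \
        ((int) ((usable) / \
                (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))

    /* compile-time upper bound, for sizing static arrays and the like */
    #define MaxHeapTuplesPerPageLimit \
        CalcMaxHeapTuplesPerPage(PageUsableSpaceMax)

    /* runtime value honoring the cluster's reserved page space */
    #define MaxHeapTuplesPerPageDynamic \
        CalcMaxHeapTuplesPerPage(PageUsableSpace)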

0006 - 0009 are all the same approach; we undefine the old constant name
and modify the existing uses of this symbol to be either the *Limit or
*Dynamic, depending on if the changed available space would impact the
calculations.  Since we are touching every use of this symbol, this
facilitates review of the impact, though I would contend that almost every
piece I've spot-checked seems like it really does need to know about the
runtime limit.  Perhaps there is more we could do here.  I could also see a
variable per constant rather than recalculating this every time, in which
case the *Dynamic would just be the variable and we'd need a hook to
initialize this or otherwise set on first use.

There are a number of additional things remaining to be done to get this to
fully work, but I did want to get some of this out there for review.

Still to do (almost all in some form in original patch, so just need to
extract the relevant pieces):
- set reserved-page-size via initdb
- load reserved-page-size from pg_control
- apply to the running cluster
- some form of compatibility for these constants in common and ensuring
bin/ works
- some toast-related changes (this requires a patch to support dynamic
relopts, which I can extract, as the existing code is using a constant
lookup table)
- probably some more pieces I'm forgetting

Thanks,
David


v2-0005-Add-Calc-Limit-and-Dynamic-forms-of-all-variable-.patch
Description: Binary data


v2-0001-Create-PageUsableSpace-to-represent-space-post-sm.patch
Description: Binary data


v2-0002-Add-support-for-fast-non-division-based-div-mod-a.patch
Description: Binary data


v2-0003-Use-fastdiv-code-in-visibility-map.patch
Description: Binary data


v2-0004-Make-PageUsableSpace-incorporate-variable-sized-l.patch
Description: Binary data


v2-0007-Split-MaxIndexTuplesPerPage-into-Limit-and-Dynami.patch
Description: Binary data


v2-0008-Split-MaxHeapTupleSize-into-Limit-and-Dynamic-var.patch
Description: Binary data


v2-0006-Split-MaxHeapTuplesPerPage-into-Limit-and-Dynamic.patch
Description: Binary data


v2-0009-Split-MaxTIDsPerBTreePage-into-Limit-and-Dynamic-.patch
Description: Binary data


Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.

2023-12-22 Thread Alexander Korotkov
Hi, Anton!

On Mon, Dec 4, 2023 at 3:50 AM Anton A. Melnikov 
wrote:

> Thanks for the remarks!
>
> On 28.11.2023 21:34, Alexander Korotkov wrote:
> > After examining the second patch
> > ("v2-0001-Add-restartpoint-stats.patch"), it appears that adding
> > additional statistics as outlined in the patch is the most suitable
> > approach to address the concerns raised. This solution provides more
> > visibility into the system's behavior without altering its core
> > mechanics.
>
> Agreed. I left only this variant of the patch and reworked it due to commit
> 96f05261.
> So the new counters are in the pg_stat_checkpointer view now.
> Please see the v3-0001-add-restartpoints-stats.patch attached.
>
>
> > However, it's essential that this additional functionality
> > is accompanied by comprehensive documentation to ensure clear
> > understanding and ease of use by the PostgreSQL community.
> >
> > Please consider expanding the documentation to include detailed
> > explanations of the new statistics and their implications in various
> > scenarios.
>
> In the separate v3-0002-doc-for-restartpoints-stats.patch I added the
> definitions
> of the new counters into the "28.2.15. pg_stat_checkpointer" section
> and explanations of them with examples into the "30.5. WAL Configuration"
> one.
>
> Would be glad for any comments and concerns.
>

I made some grammar corrections to the docs and have written the commit
message.

I think this patch now looks good.  I'm going to push this if there are no
objections.

--
Regards,
Alexander Korotkov


0001-Enhance-checkpointer-restartpoint-statistics-v4.patch
Description: Binary data


Re: authentication/t/001_password.pl trashes ~/.psql_history

2023-12-22 Thread Tom Lane
I wrote:
> I happened to notice this stuff getting added to my .psql_history:
> \echo background_psql: ready
> SET password_encryption='scram-sha-256';
> ;
> \echo background_psql: QUERY_SEPARATOR
> SET scram_iterations=42;
> ;
> \echo background_psql: QUERY_SEPARATOR
> \password scram_role_iter
> \q

> After grepping for these strings, this is evidently the fault of
> src/test/authentication/t/001_password.pl by way of BackgroundPsql.pm,
> which fires up an interactive psql run that is not given the -n switch.

> Currently the only other user of interactive_psql() seems to be
> psql/t/010_tab_completion.pl, which avoids this problem by
> explicitly redirecting the history file.  We could have 001_password.pl
> do likewise, or we could have it pass the -n switch, but I think we're
> going to have this problem resurface repeatedly if we leave it to the
> outer test script to remember to do it.


After studying this some more, my conclusion is that BackgroundPsql.pm
failed to borrow as much as it should have from 010_tab_completion.pl.
Specifically, we want all the environment-variable changes that that
script performed to be applied in any test using an interactive psql.
Maybe ~/.inputrc and so forth would never affect any other test scripts,
but that doesn't seem like a great bet.

So that leads me to the attached proposed patch.

regards, tom lane

diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index 4cd0fa4680..f2d2809aef 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -46,25 +46,6 @@ $node->safe_psql('postgres',
 	  . "CREATE TYPE enum1 AS ENUM ('foo', 'bar', 'baz', 'BLACK');\n"
 	  . "CREATE PUBLICATION some_publication;\n");
 
-# Developers would not appreciate this test adding a bunch of junk to
-# their ~/.psql_history, so be sure to redirect history into a temp file.
-# We might as well put it in the test log directory, so that buildfarm runs
-# capture the result for possible debugging purposes.
-my $historyfile = "${PostgreSQL::Test::Utils::log_path}/010_psql_history.txt";
-$ENV{PSQL_HISTORY} = $historyfile;
-
-# Another pitfall for developers is that they might have a ~/.inputrc
-# file that changes readline's behavior enough to affect this test.
-# So ignore any such file.
-$ENV{INPUTRC} = '/dev/null';
-
-# Unset $TERM so that readline/libedit won't use any terminal-dependent
-# escape sequences; that leads to way too many cross-version variations
-# in the output.
-delete $ENV{TERM};
-# Some versions of readline inspect LS_COLORS, so for luck unset that too.
-delete $ENV{LS_COLORS};
-
 # In a VPATH build, we'll be started in the source directory, but we want
 # to run in the build directory so that we can use relative paths to
 # access the tab_comp_dir subdirectory; otherwise the output from filename
@@ -91,8 +72,13 @@ open $FH, ">", "tab_comp_dir/afile456"
 print $FH "other stuff\n";
 close $FH;
 
+# Arrange to capture, not discard, the interactive session's history output.
+# Put it in the test log directory, so that buildfarm runs capture the result
+# for possible debugging purposes.
+my $historyfile = "${PostgreSQL::Test::Utils::log_path}/010_psql_history.txt";
+
 # fire up an interactive psql session
-my $h = $node->interactive_psql('postgres');
+my $h = $node->interactive_psql('postgres', history_file => $historyfile);
 
 # Simple test case: type something and see if psql responds as expected
 sub check_completion
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index f43e54282f..4d207730c9 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2128,14 +2128,17 @@ Errors occurring later are the caller's problem.
 
 Be sure to "quit" the returned object when done with it.
 
-The only extra parameter currently accepted is
-
 =over
 
 =item extra_params => ['--single-transaction']
 
 If given, it must be an array reference containing additional parameters to B.
 
+=item history_file => B
+
+Cause the interactive B session to write its command history to B.
+If not given, the history is sent to /dev/null.
+
 =back
 
 This requires IO::Pty in addition to IPC::Run.
@@ -2148,6 +2151,29 @@ sub interactive_psql
 
 	local %ENV = $self->_get_env();
 
+	# Since the invoked psql will believe it's interactive, it will use
+	# readline/libedit if available.  We need to adjust some environment
+	# settings to prevent unwanted side-effects.
+
+	# Developers would not appreciate tests adding a bunch of junk to
+	# their ~/.psql_history, so redirect readline history somewhere else.
+	# If the calling script doesn't specify anything, just bit-bucket it.
+	my $history_file = $params{history_file};
+	$history_file ||= '/dev/null';
+	$ENV{PSQL_HISTORY} = $history_file;
+
+	# Another pitfall for developers is that they might have a ~/.inputrc
+	# file that changes readline's behavior enough to affect 

Re: date_trunc function in interval version

2023-12-22 Thread Przemysław Sztoch

In my opinion, date_trunc is a very good name.
The truncated data is of timestamp type, not interval.
The first parameter has the same meaning in the original date_trunc and in
my new version.

The new version only provides more granularity.

Pavel Stehule wrote on 12/22/2023 8:43 PM:
> Hi
>
> On Fri, 22 Dec 2023 at 20:26, Przemysław Sztoch <przemys...@sztoch.pl> wrote:
>
>> Hello.
>> Here is a date_trunc(interval, timestamptz, timezone) function.
>> The first parameter can be '5 year', '2 month', '6 hour', '3 hour',
>> '15 minute', '10 second', etc.
>
> should it not be named interval_trunc instead? In this case a good name
> can be hard to choose, but with the name date_trunc it can be hard to
> find it.
>
> Regards
>
> Pavel
>
>> --
>> Przemysław Sztoch | Mobile +48 509 99 00 66




--
Przemysław Sztoch | Mobile +48 509 99 00 66


Fixing pgbench init overflow

2023-12-22 Thread Chen Hao Hsu
Hello,

pgbench mixes int and int64 when initializing the tables.
When a large enough scale factor is passed, initPopulateTable
overflows, leading to it never completing, i.e.

2147400000 of 2200000000 tuples (97%) of
pgbench_accounts done (elapsed 4038.83 s, remaining 98.93 s)
-2147467296 of 2200000000 tuples (-97%) of
pgbench_accounts done (elapsed 4038.97 s, remaining -8176.86 s)


Attached is a patch that fixes this, pgbench -i -s 22000 works now.
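
For illustration, the shape of the overflow and of the fix, as a simplified
sketch rather than the actual initPopulateTable() code:

    int     naccounts = 100000;
    int     scale = 22000;

    /* buggy: computed in 32-bit arithmetic, wraps past INT_MAX */
    /* int  total = naccounts * scale; */

    /* fixed: widen before multiplying */
    int64   total = (int64) naccounts * scale;

    for (int64 k = 0; k < total; k++)
    {
        /* ... generate and copy one row ... */
    }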

-- 
John Hsu - Amazon Web Services


0001-Fix-pgbench-init-overflow-when-total-number-of-rows.patch
Description: Binary data


Re: Transaction timeout

2023-12-22 Thread Japin Li


On Fri, 22 Dec 2023 at 23:30, Junwang Zhao  wrote:
> On Fri, Dec 22, 2023 at 10:44 PM Japin Li  wrote:
>>
>>
>> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
>> > On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
>> >> I tried setting idle_in_transaction_session_timeout after beginning a
>> >> transaction, and it takes effect immediately, so I think
>> >> transaction_timeout should also take effect immediately.
>> >
>> > Ah, right, idle_in_transaction_session_timeout is set after the SET
>> > command finishes and before the backend sends *ready for query*
>> > to the client, so the value of the GUC is already set before the
>> > next command.
>> >
>>
>> I mean, is it possible to set transaction_timeout before the next command?
>>
> Yeah, it's possible: set transaction_timeout when the session first
> goes into *idle in transaction* mode; see the attached files.
>

Thanks for updating the patch, LGTM.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: date_trunc function in interval version

2023-12-22 Thread John Naylor
On Sat, Dec 23, 2023 at 5:26 AM Przemysław Sztoch  wrote:
>
> In my opinion date_trunc is very good name.
> Truncated data is timestamp type, not interval.
> First parameter has same meaning in original date_trunc and in my new version.
> New version provides only more granularity.

I haven't looked at the patch, but your description sounds awfully
close to date_bin(), which already takes an arbitrary interval.
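
For comparison, date_bin() (available since PostgreSQL 14) already supports
arbitrary strides relative to an origin:

    -- bin into 15-minute buckets counted from the given origin
    SELECT date_bin('15 minutes',
                    TIMESTAMP '2023-12-22 10:38:25',
                    TIMESTAMP '2001-01-01');
    -- result: 2023-12-22 10:30:00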




Re: A typo in a messsage?

2023-12-22 Thread John Naylor
On Fri, Dec 22, 2023 at 1:50 PM Kyotaro Horiguchi
 wrote:
>
> I found the following message introduced by a recent commit.
>
> > errdetail("The first unsummarized LSN is this range is %X/%X.",
>
> Shouldn't the "is" following "LSN" be "in"?

I think you're right, will push.




Re: Transaction timeout

2023-12-22 Thread Japin Li


On Sat, 23 Dec 2023 at 08:32, Japin Li  wrote:
> On Fri, 22 Dec 2023 at 23:30, Junwang Zhao  wrote:
>> On Fri, Dec 22, 2023 at 10:44 PM Japin Li  wrote:
>>>
>>>
>>> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
>>> > On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
>>> >> I tried setting idle_in_transaction_session_timeout after beginning a
>>> >> transaction, and it takes effect immediately, so I think
>>> >> transaction_timeout should also take effect immediately.
>>> >
>>> > Ah, right, idle_in_transaction_session_timeout is set after the SET
>>> > command finishes and before the backend sends *ready for query*
>>> > to the client, so the value of the GUC is already set before the
>>> > next command.
>>> >
>>>
>>> I mean, is it possible to set transaction_timeout before the next command?
>>>
>> Yeah, it's possible: set transaction_timeout when the session first
>> goes into *idle in transaction* mode; see the attached files.
>>
>
> Thanks for updating the patch, LGTM.

Sorry for the noise!

Reading the previous threads, I found why the author enables transaction_timeout
in start_xact_command().

The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example:

SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND 
CHAIN; SELECT 2, pg_sleep(1); COMMIT;

The transaction_timeout does not reset when executing COMMIT AND CHAIN.

[1] 
https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com
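
For illustration, one way to handle the chained case would be to re-arm the
timer when the chained transaction starts; a minimal sketch (hypothetical
placement, the attached patch may differ):

    /* After COMMIT/ROLLBACK AND CHAIN starts the new transaction,
     * restart the timer so the chained transaction gets a fresh
     * transaction_timeout allowance. */
    if (TransactionTimeout > 0 && get_timeout_active(TRANSACTION_TIMEOUT))
    {
        disable_timeout(TRANSACTION_TIMEOUT, false);
        enable_timeout_after(TRANSACTION_TIMEOUT, TransactionTimeout);
    }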

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: Transaction timeout

2023-12-22 Thread Junwang Zhao
On Sat, Dec 23, 2023 at 10:40 AM Japin Li  wrote:
>
>
> On Sat, 23 Dec 2023 at 08:32, Japin Li  wrote:
> > On Fri, 22 Dec 2023 at 23:30, Junwang Zhao  wrote:
> >> On Fri, Dec 22, 2023 at 10:44 PM Japin Li  wrote:
> >>>
> >>>
> >>> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
> >>> > On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
> >>> >> I tried setting idle_in_transaction_session_timeout after beginning a
> >>> >> transaction, and it takes effect immediately, so I think
> >>> >> transaction_timeout should also take effect immediately.
> >>> >
> >>> > Ah, right, idle_in_transaction_session_timeout is set after the SET
> >>> > command finishes and before the backend sends *ready for query*
> >>> > to the client, so the value of the GUC is already set before the
> >>> > next command.
> >>> >
> >>>
> >>> I mean, is it possible to set transaction_timeout before the next command?
> >>>
> >> Yeah, it's possible: set transaction_timeout when the session first
> >> goes into *idle in transaction* mode; see the attached files.
> >>
> >
> > Thanks for updating the patch, LGTM.
>
> Sorry for the noise!
>
> Reading the previous threads, I found why the author enables transaction_timeout
> in start_xact_command().
>
> The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example:

I didn't read the previous threads, sorry about that; let's stick to v14.

>
> SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND 
> CHAIN; SELECT 2, pg_sleep(1); COMMIT;
>
> The transaction_timeout does not reset when executing COMMIT AND CHAIN.
>
> [1] 
> https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com
>
> --
> Regards,
> Japin Li
> ChengDu WenWu Information Technology Co., Ltd.



-- 
Regards
Junwang Zhao




Re: Transaction timeout

2023-12-22 Thread Japin Li
On Sat, 23 Dec 2023 at 10:40, Japin Li  wrote:
> On Sat, 23 Dec 2023 at 08:32, Japin Li  wrote:
>> On Fri, 22 Dec 2023 at 23:30, Junwang Zhao  wrote:
>>> On Fri, Dec 22, 2023 at 10:44 PM Japin Li  wrote:


 On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
 > On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
 >> I tried setting idle_in_transaction_session_timeout after beginning a
 >> transaction, and it takes effect immediately, so I think
 >> transaction_timeout should also take effect immediately.
 >
 > Ah, right, idle_in_transaction_session_timeout is set after the SET
 > command finishes and before the backend sends *ready for query*
 > to the client, so the value of the GUC is already set before the
 > next command.
 >

 I mean, is it possible to set transaction_timeout before the next command?

>>> Yeah, it's possible: set transaction_timeout when the session first
>>> goes into *idle in transaction* mode; see the attached files.
>>>
>>
>> Thanks for updating the patch, LGTM.
>
> Sorry for the noise!
>
> Reading the previous threads, I found why the author enables transaction_timeout
> in start_xact_command().
>
> The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example:
>
> SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND 
> CHAIN; SELECT 2, pg_sleep(1); COMMIT;
>
> The transaction_timeout does not reset when executing COMMIT AND CHAIN.
>
> [1] 
> https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com

Attached is v16 to solve this. Any suggestions?

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

>From a9d79c5a013da8fa707556f87a34ff3ade779729 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" 
Date: Sun, 3 Dec 2023 23:18:00 +0500
Subject: [PATCH v16 1/2] Introduce transaction_timeout

This commit adds a timeout that is expected to be used to prevent
long-running queries. Any session within a transaction will be
terminated after spanning longer than this timeout.

However, this timeout is not applied to prepared transactions.
Only transactions with user connections are affected.

Author: Andrey Borodin 
Reviewed-by: Nikolay Samokhvalov 
Reviewed-by: Andres Freund 
Reviewed-by: Fujii Masao 
Reviewed-by: bt23nguyent 
Reviewed-by: Yuhang Qiu 
Reviewed-by: Japin Li 

Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
---
 doc/src/sgml/config.sgml  | 35 +++
 src/backend/postmaster/autovacuum.c   |  2 +
 src/backend/storage/lmgr/proc.c   |  1 +
 src/backend/tcop/postgres.c   | 27 +++-
 src/backend/utils/errcodes.txt|  1 +
 src/backend/utils/init/globals.c  |  1 +
 src/backend/utils/init/postinit.c | 10 +++
 src/backend/utils/misc/guc_tables.c   | 11 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/bin/pg_dump/pg_backup_archiver.c  |  2 +
 src/bin/pg_dump/pg_dump.c |  2 +
 src/bin/pg_rewind/libpq_source.c  |  1 +
 src/include/miscadmin.h   |  1 +
 src/include/storage/proc.h|  1 +
 src/include/utils/timeout.h   |  1 +
 src/test/isolation/Makefile   |  5 +-
 src/test/isolation/expected/timeouts.out  | 63 ++-
 src/test/isolation/specs/timeouts.spec| 30 +
 18 files changed, 190 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b5624ca884..d62edcf83b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9134,6 +9134,41 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  transaction_timeout (integer)
+  
+   transaction_timeout configuration parameter
+  
+  
+  
+   
+Terminate any session that stays in a transaction longer than the
+specified amount of time. The limit applies both to explicit
+transactions (started with BEGIN) and to an
+implicitly started transaction corresponding to a single statement.
+This limit does not apply to prepared transactions.
+If this value is specified without units, it is taken as milliseconds.
+A value of zero (the default) disables the timeout.
+   
+
+   
+If transaction_timeout is shorter than
+idle_in_transaction_session_timeout or statement_timeout,
+transaction_timeout will invalidate the longer timeout.
+   
+
+   
+Setting transaction_timeout in
+postgresql.conf is not recommended because it would
+affect all sessions.
+   
+
+   
+Prepared transactions are not subject to this timeout.
+   
+  
+ 
+
  
   lock_timeout (integer)
   
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index b0

Commitfest manager January 2024

2023-12-22 Thread vignesh C
Hi,

I didn't see anyone volunteering for the January Commitfest, so I'll
volunteer to be CF manager for January 2024 Commitfest.

Regards,
Vignesh




Re: Fixing backslash dot for COPY FROM...CSV

2023-12-22 Thread vignesh C
On Fri, 22 Dec 2023 at 01:17, Daniel Verite  wrote:
>
> vignesh C wrote:
>
> > Thanks for the updated patch. Any reason why this is handled only in CSV?
> > postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out';
> > COPY 1
> > postgres=# select * from test1;
> >  c1
> > ---
> > line1
> > (1 row)
>
> I believe it's safer not to change anything in the normal "non-csv"
> text mode.
> The current doc says that \. will not be taken as data in this format.
> From https://www.postgresql.org/docs/current/sql-copy.html :
>
>Any other backslashed character that is not mentioned in the above
>table will be taken to represent itself. However, beware of adding
>backslashes unnecessarily, since that might accidentally produce a
>string matching the end-of-data marker (\.) or the null string (\N
>by default). These strings will be recognized before any other
>backslash processing is done.

Thanks for the clarification. Then let's keep it as you have implemented it.
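
For anyone following along, a minimal text-format example of the documented
behavior (table and data are made up for illustration):

    CREATE TABLE test1 (c1 text);
    COPY test1 FROM STDIN;    -- text format, not CSV
    line1
    \.

The "\." line is consumed as the end-of-data marker rather than stored, so
a data line consisting of just "\." cannot be loaded in text mode; that is
why restricting the fix to CSV makes sense.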

Regards,
Vignesh




Re: Transaction timeout

2023-12-22 Thread Junwang Zhao
On Sat, Dec 23, 2023 at 11:17 AM Japin Li  wrote:
>
> On Sat, 23 Dec 2023 at 10:40, Japin Li  wrote:
> > On Sat, 23 Dec 2023 at 08:32, Japin Li  wrote:
> >> On Fri, 22 Dec 2023 at 23:30, Junwang Zhao  wrote:
> >>> On Fri, Dec 22, 2023 at 10:44 PM Japin Li  wrote:
> 
> 
>  On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
>  > On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
>  >> I tried to set idle_in_transaction_session_timeout after beginning a
>  >> transaction; it changed immediately, so I think transaction_timeout
>  >> should also take effect immediately.
>  >
>  > Ah, right, idle_in_transaction_session_timeout is set after the set
>  > command finishes and before the backend sends *ready for query*
>  > to the client, so the value of the GUC is already set before the
>  > next command.
>  >
> 
>  I mean, is it possible to set transaction_timeout before the next command?
> 
> >>> Yeah, it's possible: set transaction_timeout when the session first
> >>> goes into *idle in transaction* mode; see the attached files.
> >>>
> >>
> >> Thanks for updating the patch, LGTM.
> >
> > Sorry for the noise!
> >
> > Reading the previous threads, I found why the author enables
> > transaction_timeout in start_xact_command().
> >
> > The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example:
> >
> > SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND 
> > CHAIN; SELECT 2, pg_sleep(1); COMMIT;
> >
> > The transaction_timeout does not reset when executing COMMIT AND CHAIN.
> >
> > [1] 
> > https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com
>
> Attached is v16 to solve this. Any suggestions?

I've checked this with *COMMIT AND CHAIN* and *ABORT AND CHAIN*;
both work as expected. Thanks for the update.
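
Concretely, my understanding of the example upthread is that with v16
applied this session now completes instead of being terminated:

    SET transaction_timeout TO '2s';
    BEGIN;
    SELECT 1, pg_sleep(1);
    COMMIT AND CHAIN;       -- the 2s timer restarts here
    SELECT 2, pg_sleep(1);  -- only ~1s into the chained transaction
    COMMIT;

ABORT AND CHAIN restarts the timer the same way.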

>
> --
> Regards,
> Japin Li
> ChengDu WenWu Information Technology Co., Ltd.
>


-- 
Regards
Junwang Zhao




Re: Fixing pgbench init overflow

2023-12-22 Thread Japin Li


On Sat, 23 Dec 2023 at 07:18, Chen Hao Hsu  wrote:
> Hello,
>
> pgbench mixes int and int64 to initialize the tables.
> When a large enough scale factor is passed, initPopulateTable
> overflows leading to it never completing, ie.
>
> 214740 of 22 tuples (97%) of
> pgbench_accounts done (elapsed 4038.83 s, remaining 98.93 s)
> -214740 of 22 tuples (-97%) of
> pgbench_accounts done (elapsed 4038.97 s, remaining -8176.86 s)
>
>
> Attached is a patch that fixes this, pgbench -i -s 22000 works now.

I think the following line alone is enough to fix this.

+   int64   k;

There is no need to modify the type of `n`, right?
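
For the record, the arithmetic behind the overflow: pgbench_accounts gets
scale * 100000 rows, and 22000 * 100000 = 2,200,000,000, which exceeds
INT_MAX (2,147,483,647), so an int row counter wraps negative. A standalone
illustration in plain C (not pgbench code):

    #include <limits.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int64_t total = (int64_t) 22000 * 100000;  /* 2,200,000,000 rows */
        int     k = INT_MAX;                       /* 2,147,483,647 */

        k++;    /* wraps to a negative value on typical two's-complement
                 * machines (formally undefined behavior for signed int) */

        /* Once k is negative, the loop condition (int64_t) k < total
         * stays true forever, and progress reports go negative. */
        printf("k = %d, still below total: %s\n",
               k, ((int64_t) k < total) ? "yes" : "no");
        return 0;
    }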

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: Transaction timeout

2023-12-22 Thread Li Japin


> On Dec 23, 2023, at 11:35, Junwang Zhao  wrote:
> 
> On Sat, Dec 23, 2023 at 11:17 AM Japin Li  wrote:
>> 
>>> On Sat, 23 Dec 2023 at 10:40, Japin Li  wrote:
>>> On Sat, 23 Dec 2023 at 08:32, Japin Li  wrote:
 On Fri, 22 Dec 2023 at 23:30, Junwang Zhao  wrote:
> On Fri, Dec 22, 2023 at 10:44 PM Japin Li  wrote:
>> 
>> 
>> On Fri, 22 Dec 2023 at 22:37, Junwang Zhao  wrote:
>>> On Fri, Dec 22, 2023 at 10:25 PM Japin Li  wrote:
 I tried to set idle_in_transaction_session_timeout after beginning a
 transaction; it changed immediately, so I think transaction_timeout
 should also take effect immediately.
>>> 
>>> Ah, right, idle_in_transaction_session_timeout is set after the set
>>> command finishes and before the backend sends *ready for query*
>>> to the client, so the value of the GUC is already set before the
>>> next command.
>>> 
>> 
>> I mean, is it possible to set transaction_timeout before the next command?
>> 
> Yeah, it's possible: set transaction_timeout when the session first
> goes into *idle in transaction* mode; see the attached files.
> 
 
 Thanks for updating the patch, LGTM.
>>> 
>>> Sorry for the noise!
>>> 
>>> Reading the previous threads, I found why the author enables
>>> transaction_timeout in start_xact_command().
>>> 
>>> The v15 patch cannot handle COMMIT AND CHAIN, see [1]. For example:
>>> 
>>> SET transaction_timeout TO '2s'; BEGIN; SELECT 1, pg_sleep(1); COMMIT AND 
>>> CHAIN; SELECT 2, pg_sleep(1); COMMIT;
>>> 
>>> The transaction_timeout does not reset when executing COMMIT AND CHAIN.
>>> 
>>> [1] 
>>> https://www.postgresql.org/message-id/a906dea1-76a1-4f26-76c5-a7efad3ef5b8%40oss.nttdata.com
>> 
>> Attached is v16 to solve this. Any suggestions?
> 
> I've checked this with *COMMIT AND CHAIN* and *ABORT AND CHAIN*;
> both work as expected. Thanks for the update.
> 

Thanks for your testing and reviewing!

Re: Fixing pgbench init overflow

2023-12-22 Thread Tatsuo Ishii
 

> 
> On Sat, 23 Dec 2023 at 07:18, Chen Hao Hsu  wrote:
>> Hello,
>>
>> pgbench mixes int and int64 to initialize the tables.
>> When a large enough scale factor is passed, initPopulateTable
>> overflows leading to it never completing, ie.
>>
>> 214740 of 22 tuples (97%) of
>> pgbench_accounts done (elapsed 4038.83 s, remaining 98.93 s)
>> -214740 of 22 tuples (-97%) of
>> pgbench_accounts done (elapsed 4038.97 s, remaining -8176.86 s)
>>
>>
>> Attached is a patch that fixes this, pgbench -i -s 22000 works now.
> 
> I think the following line alone is enough to fix this.
> 
> + int64   k;
> 
> There is no need to modify the type of `n`, right?

You are right. n holds the return value of pg_snprintf, i.e. the byte
length of the formatted data, which is an int, not an int64.
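
So after the fix the relevant declarations in initPopulateTable() look
roughly like this (types as in Japin's v2 patch below; comments mine):

    int     n;    /* pg_snprintf() result: byte length of one line */
    int64   k;    /* row counter: can exceed INT_MAX at large scales */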

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Fixing pgbench init overflow

2023-12-22 Thread Japin Li

On Sat, 23 Dec 2023 at 15:22, Tatsuo Ishii  wrote:
>  
> 
>
>>
>> On Sat, 23 Dec 2023 at 07:18, Chen Hao Hsu  wrote:
>>> Hello,
>>>
>>> pgbench mixes int and int64 to initialize the tables.
>>> When a large enough scale factor is passed, initPopulateTable
>>> overflows leading to it never completing, ie.
>>>
>>> 214740 of 22 tuples (97%) of
>>> pgbench_accounts done (elapsed 4038.83 s, remaining 98.93 s)
>>> -214740 of 22 tuples (-97%) of
>>> pgbench_accounts done (elapsed 4038.97 s, remaining -8176.86 s)
>>>
>>>
>>> Attached is a patch that fixes this, pgbench -i -s 22000 works now.
>>
>> I think the following line alone is enough to fix this.
>>
>> +int64   k;
>>
>> There is no need to modify the type of `n`, right?
>
> You are right. n holds the return value of pg_snprintf, i.e. the byte
> length of the formatted data, which is an int, not an int64.
>

Thanks for your confirmation! Please consider the v2 patch for review.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.

>From cee64786a2353cbf77d67db416ba0596a461690b Mon Sep 17 00:00:00 2001
From: John Hsu 
Date: Fri, 22 Dec 2023 22:38:15 +
Subject: [PATCH v2 1/1] Fix pgbench init overflow when total number of rows is
 greater than int32

Previously, when the number of rows exceeded what an int32 can
hold, the row counter would overflow and pgbench would never
finish. This change uses int64 to align with the comparison.
---
 src/bin/pgbench/pgbench.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 2e1650d0ad..5b6549c89d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4908,7 +4908,7 @@ initPopulateTable(PGconn *con, const char *table, int64 base,
   initRowMethod init_row)
 {
 	int			n;
-	int			k;
+	int64			k;
 	int			chars = 0;
 	PGresult   *res;
 	PQExpBufferData sql;

base-commit: 3e2e0d5ad7fcb89d18a71cbfc885ef184e1b6f2e
-- 
2.41.0