Re: Problem, partition pruning for prepared statement with IS NULL clause.
On Wed, 11 Oct 2023 at 22:59, Sergei Glukhov wrote:
> > Unfortunately, I'd not long sent the last email and realised that the
> > step_lastkeyno parameter is now unused and can just be removed from
> > both get_steps_using_prefix() and get_steps_using_prefix_recurse().
> > This requires some comment rewriting so I've attempted to do that too
> > in the attached updated version.
>
> Thanks, verified again and everything is fine!

Thanks for looking.

I spent quite a bit more time on this again today and fiddled lots more
with the comments and tests.

I also did more testing after finding a way to easily duplicate the
quals to cause multiple quals per partition key. The equivalence class
code will only make ECs for mergejoin-able clauses, so if we just find a
type that's not mergejoin-able but hashable, we can duplicate the quals
with a hash partitioned table

-- find a suitable non-mergejoin-able type.
select oprleft::regtype from pg_operator where oprcanmerge=false and oprcanhash=true;
 oprleft
---------
 xid
 cid
 aclitem

create table hash_xid(a xid, b xid, c xid) partition by hash(a,b,c);
create table hash_xid1 partition of hash_xid for values with (modulus 2, remainder 0);
create table hash_xid2 partition of hash_xid for values with (modulus 2, remainder 1);

I tried out various combinations of the following query. Each equality
clause is duplicated 6 times. When I enable all 6 for each of the 3
columns, I see 216 pruning steps. That's 6*6*6, just what I expected.
The IS NULL quals are not duplicated since we can only set a bit once in
the nullkeys.

explain select * from hash_xid
where
a = '123'::xid and a = '123'::xid and a = '123'::xid and
a = '123'::xid and a = '123'::xid and a = '123'::xid and
--a is null and a is null and a is null and a is null and a is null and a is null and
b = '123'::xid and b = '123'::xid and b = '123'::xid and
b = '123'::xid and b = '123'::xid and b = '123'::xid and
--b is null and b is null and b is null and b is null and b is null and b is null and
c = '123'::xid and c = '123'::xid and c = '123'::xid and
c = '123'::xid and c = '123'::xid and c = '123'::xid;
--c is null and c is null and c is null and c is null and c is null and c is null;

putting a breakpoint at the final line of gen_prune_steps_from_opexps()
yields 216 steps.

I didn't include anything of the above as part of the additional tests.
Perhaps something like that is worthwhile in a reduced form. However,
someone might make xid mergejoinable some time, which would break the
test.

Thanks for reviewing the previous version of this patch.

Onto the other run-time one now...

David
Re: New WAL record to detect the checkpoint redo location
On Tue, Oct 10, 2023 at 02:43:34PM -0400, Robert Haas wrote: > - I combined what were previously 0002 and 0003 into a single patch, > since that's how this would get committed. > > - I fixed up some comments. > > - I updated commit messages. > > Hopefully this is getting close to good enough. I have looked at 0001, for now.. And it looks OK to me. +* Nonetheless, this case is simpler than the normal cases handled +* above, which must check for changes in doPageWrites and RedoRecPtr. +* Those checks are only needed for records that can contain +* full-pages images, and an XLOG_SWITCH record never does. +Assert(fpw_lsn == InvalidXLogRecPtr); Right, that's the core reason behind the refactoring. The assertion is a good idea. -- Michael signature.asc Description: PGP signature
Re: A new strategy for pull-up correlated ANY_SUBLINK
Hi Alena, On Thu, Oct 12, 2023 at 5:01 AM Alena Rybakina wrote: > Hi! > > I reviewed your patch and it was interesting for me! > > Thank you for the explanation. It was really informative for me! > Thanks for your interest in this, and I am glad to know it is informative. > Unfortunately, I found a request when sublink did not pull-up, as in the > examples above. I couldn't quite figure out why. > I'm not sure what you mean with the "above", I guess it should be the "below"? > explain (analyze, costs off, buffers) > select b.x, b.x, a.y > from b > left join a > on b.x=a.x and > > *b.t in (select max(a0.t) * > from a a0 > where a0.x = b.x and >a0.t = b.t); > ... >SubPlan 2 > Here the sublink can't be pulled up because of its reference to the LHS of left join, the original logic is that no matter the 'b.t in ..' returns the true or false, the rows in LHS will be returned. If we pull it up to LHS, some rows in LHS will be filtered out, which breaks its original semantics. I thought it would be: > > explain (analyze, costs off, buffers) > select b.x, b.x, a.y > from b > left join a on > b.x=a.x and > > *b.t = (select max(a0.t) * > from a a0 > where a0.x = b.x and >a0.t <= b.t); > QUERY > PLAN > > - > Hash Right Join (actual time=1.181..67.927 rows=1000 loops=1) >Hash Cond: (a.x = b.x) >*Join Filter: (b.t = (SubPlan 2))* >Buffers: shared hit=3546 >-> Seq Scan on a (actual time=0.022..17.109 rows=10 loops=1) > Buffers: shared hit=541 >-> Hash (actual time=1.065..1.068 rows=1000 loops=1) > Buckets: 4096 Batches: 1 Memory Usage: 72kB > Buffers: shared hit=5 > -> Seq Scan on b (actual time=0.049..0.401 rows=1000 loops=1) >Buffers: shared hit=5 >SubPlan 2 > -> Result (actual time=0.025..0.025 rows=1 loops=1000) >Buffers: shared hit=3000 >InitPlan 1 (returns $2) > -> Limit (actual time=0.024..0.024 rows=1 loops=1000) >Buffers: shared hit=3000 >-> Index Only Scan Backward using a_t_x_idx on a a0 > (actual time=0.023..0.023 rows=1 loops=1000) > Index Cond: ((t IS NOT NULL) AND (t <= b.t) AND > (x = b.x)) > Heap Fetches: 1000 > Buffers: shared hit=3000 > Planning Time: 0.689 ms > Execution Time: 68.220 ms > (23 rows) > > If you noticed, it became possible after replacing the "in" operator with > "=". > I didn't notice much difference between the 'in' and '=', maybe I missed something? > I took the liberty of adding this to your patch and added myself as > reviewer, if you don't mind. > Sure, the patch after your modification looks better than the original. I'm not sure how the test case around "because of got one row" is relevant to the current changes. After we reach to some agreement on the above discussion, I think v4 is good for committer to review! -- Best Regards Andy Fan
Some performance degradation in REL_16 vs REL_15
Greetings!

Found that the simple test pgbench -c20 -T20 -j8 gives approximately:
for REL_15_STABLE at 5143f76: 336+-1 TPS
and
for REL_16_STABLE at 4ac7635f: 324+-1 TPS

The performance drop is approximately 3.5%, while the corrected standard deviation is only 0.3%. See the raw_data.txt attached.

What do you think: is there any cause for concern here? And is it worth spending time bisecting for the commit where this degradation may have occurred?

Would be glad for any comments and concerns.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

raw_data.txt:
REL_15_STABLE at 5143f76, TPS
336.639765
334.376801
334.963121
336.23666
335.698673

REL_16_STABLE at 4ac7635f, TPS
324.373695
323.168622
323.728652
325.799901
324.81759
Re: Doc: Minor update for enable_partitionwise_aggregate
On Wed, 11 Oct 2023 at 19:38, David Rowley wrote: > > On Wed, 11 Oct 2023 at 16:26, Andrew Atkinson wrote: > > "which allows grouping or aggregation on partitioned tables to be performed > > separately for each partition." > > This looks good to me. I can take care of this. Pushed and backpatched to v11. David
Re: Some performance degradation in REL_16 vs REL_15
On Thu, 12 Oct 2023 at 21:01, Anton A. Melnikov wrote: > > Greetengs! > > Found that simple test pgbench -c20 -T20 -j8 gives approximately > for REL_15_STABLE at 5143f76: 336+-1 TPS > and > for REL_16_STABLE at 4ac7635f: 324+-1 TPS > > And is it worth spending time bisecting for the commit where this degradation > may have occurred? It would be interesting to know what's to blame here and if you can attribute it to a certain commit. David
Re: Tab completion for AT TIME ZONE
Michael Paquier writes: > On Fri, Apr 14, 2023 at 12:05:25PM +0200, Jim Jones wrote: >> The patch applies cleanly and it does what it is proposing. - and it's IMHO >> a very nice addition. >> >> I've marked the CF entry as "Ready for Committer". > > +/* ... AT TIME ZONE ... */ > + else if (TailMatches("AT")) > + COMPLETE_WITH("TIME ZONE"); > + else if (TailMatches("AT", "TIME")) > + COMPLETE_WITH("ZONE"); > + else if (TailMatches("AT", "TIME", "ZONE")) > + COMPLETE_WITH_TIMEZONE_NAME(); > > This style will for the completion of timezone values even if "AT" is > the first word of a query. Shouldn't this be more selective by making > sure that we are at least in the context of a SELECT query? It's valid anywhere an expression is, which is a lot more places than just SELECT queries. Off the top of my head I can think of WITH, INSERT, UPDATE, VALUES, CALL, CREATE TABLE, CREATE INDEX. As I mentioned upthread, the only place in the grammar where the word AT occurs is in AT TIME ZONE, so there's no ambiguity. Also, it doesn't complete time zone names after AT, it completes the literal words TIME ZONE, and you have to then hit tab again to get a list of time zones. If we (or the SQL committee) were to invent more operators that start with the word AT, we can add those to the first if clause above and complete with the appropriate values after each one separately. - ilmari
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Wed, Oct 11, 2023 at 4:27 PM Hayato Kuroda (Fujitsu) wrote:
>
> Thank you for reviewing! PSA new version.
>

Some more comments:

1. Let's restructure binary_upgrade_validate_wal_logical_end() a bit.
First, let's change its name to binary_upgrade_slot_has_pending_wal()
or something like that. Then move the context creation and free
related code into DecodingContextHasDecodedItems(). We can rename
DecodingContextHasDecodedItems() as
pg_logical_replication_slot_has_pending_wal() and place it in
slotfuncs.c. This will make the code structure similar to other slot
functions like pg_replication_slot_advance().

2.
+ * Returns true if there are no changes after the confirmed_flush_lsn.

How about something like: "Returns true if there are no decodable WAL
records after the confirmed_flush_lsn."?

3. Shouldn't we need to call CheckSlotPermissions() in
binary_upgrade_validate_wal_logical_end?

4.
+ /*
+  * Also, set processing_required flag if the message is not
+  * transactional. It is needed to notify the message's existence to
+  * the caller side. Usually, the flag is set when either the COMMIT or
+  * ABORT records are decoded, but this must be turned on here because
+  * the non-transactional logical message is decoded without waiting
+  * for these records.
+  */

The first sentence of the comments doesn't seem to be required as that
just says what the code does. So, let's slightly change it to: "We
need to set processing_required flag to notify the message's existence
to the caller side. Usually, the flag is set when either the COMMIT or
ABORT records are decoded, but this must be turned on here because the
non-transactional logical message is decoded without waiting for these
records."

--
With Regards,
Amit Kapila.
Special-case executor expression steps for common combinations
The attached patch adds special-case expression steps for common sets of steps in the executor to shave a few cycles off during execution, and make the JIT generated code simpler. * Adds EEOP_FUNCEXPR_STRICT_1 and EEOP_FUNCEXPR_STRICT_2 for function calls of strict functions with 1 or 2 arguments (EEOP_FUNCEXPR_STRICT remains used for > 2 arguments). * Adds EEOP_AGG_STRICT_INPUT_CHECK_ARGS_1 which is a special case for the common case of one arg aggs. * Replace EEOP_DONE with EEOP_DONE_RETURN and EEOP_DONE_NO_RETURN to be able to skip extra setup for steps which are only interested in the side effects. Stressing the EEOP_FUNCEXPR_STRICT_* steps specifically shows a 1.5% improvement and pgbench over the branch shows a ~1% improvement in TPS (both measured over 6 runs with outliers removed). EEOP_FUNCEXPR_STRICT_* (10M iterations): master : (7503.317, 7553.691, 7634.524) patched : (7422.756, 7455.120, 7492.393) pgbench: master : (3653.83, 3792.97, 3863.70) patched : (3743.04, 3830.02, 3869.80) This patch was extracted from a larger body of work from Andres [0] aiming at providing the necessary executor infrastructure for making JIT expression caching possible. This patch, and more which are to be submitted, is however separate in the sense that it is not part of the infrastructure, it's an improvements on its own. Thoughts? -- Daniel Gustafsson [0]: https://postgr.es/m/20191023163849.sosqbfs5yenoc...@alap3.anarazel.de v1-0001-Add-fast-path-expression-steps-for-common-combina.patch Description: Binary data
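To make the idea above a bit more concrete, here is a rough sketch (not taken from the attached patch, so the step and field names may differ) of what a dedicated interpreter case for a one-argument strict function could look like in execExprInterp.c. With the argument count known at expression-compile time, the per-argument NULL-check loop of the generic EEOP_FUNCEXPR_STRICT case collapses into a single test, and the code the JIT has to emit for the step is correspondingly simpler:

EEO_CASE(EEOP_FUNCEXPR_STRICT_1)
{
    /*
     * Sketch only: strict function known to take exactly one argument,
     * so no loop over the arguments is needed to check for NULLs.
     */
    FunctionCallInfo fcinfo = op->d.func.fcinfo_data;

    if (fcinfo->args[0].isnull)
        *op->resnull = true;        /* strict function, NULL input */
    else
    {
        fcinfo->isnull = false;
        *op->resvalue = op->d.func.fn_addr(fcinfo);
        *op->resnull = fcinfo->isnull;
    }

    EEO_NEXT();
}

The two-argument variant and EEOP_AGG_STRICT_INPUT_CHECK_ARGS_1 would presumably follow the same pattern, unrolling the loop that the generic steps need.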
Re: cataloguing NOT NULL constraints
Hi Alvaro,

25.08.2023 14:38, Alvaro Herrera wrote:
I have now pushed this again. Hopefully it'll stick this time.

I've discovered that that commit added several recursive functions, and some of them are not protected from stack overflow.
Namely, with "max_locks_per_transaction = 600" and default ulimit -s (8192), I observe server crashes with the following scripts:

# ATExecSetNotNull()
(n=4;
printf "create table t0 (a int, b int);";
for ((i=1;i<=$n;i++)); do printf "create table t$i() inherits(t$(( $i - 1 ))); "; done;
printf "alter table t0 alter b set not null;"
) | psql >psql.log

# dropconstraint_internal()
(n=2;
printf "create table t0 (a int, b int not null);";
for ((i=1;i<=$n;i++)); do printf "create table t$i() inherits(t$(( $i - 1 ))); "; done;
printf "alter table t0 alter b drop not null;"
) | psql >psql.log

# set_attnotnull()
(n=11;
printf "create table tp (a int, b int, primary key(a, b)) partition by range (a); create table tp0 (a int primary key, b int) partition by range (a);";
for ((i=1;i<=$n;i++)); do printf "create table tp$i partition of tp$(( $i - 1 )) for values from ($i) to (100) partition by range (a);"; done;
printf "alter table tp attach partition tp0 for values from (0) to (100);"
) | psql >psql.log
# this takes half an hour on my machine

Maybe you would find it appropriate to add check_stack_depth() to these functions.

(ATAddCheckNNConstraint() is protected because it calls AddRelationNewConstraints(), which in turn calls StoreRelCheck() -> CreateConstraintEntry() -> recordDependencyOnSingleRelExpr() -> find_expr_references_walker() -> expression_tree_walker() -> expression_tree_walker() -> check_stack_depth().)

(There were patches prepared for similar cases [1], but they don't cover new functions, of course, and I'm not sure how to handle all such instances.)

[1] https://commitfest.postgresql.org/45/4239/

Best regards,
Alexander
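For reference, the guard being suggested is the usual one-liner at the top of each function that can recurse per child relation; a minimal sketch of the pattern follows (the placement in, e.g., set_attnotnull() is hypothetical and its actual parameters are elided):

#include "miscadmin.h"          /* check_stack_depth() */

static void
set_attnotnull(/* ... existing parameters ... */)
{
    /*
     * Guard against runaway recursion through deep inheritance or
     * partition trees: raise the normal "stack depth limit exceeded"
     * ERROR instead of overflowing the C stack and crashing.
     */
    check_stack_depth();

    /* ... existing per-relation work, recursing into each child ... */
}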
Re: Logging parallel worker draught
On 10/11/23 17:26, Imseih (AWS), Sami wrote: Thank you for resurrecting this thread. Well, if you read Benoit's earlier proposal at [1] you'll see that he does propose to have some cumulative stats; this LOG line he proposes here is not a substitute for stats, but rather a complement. I don't see any reason to reject this patch even if we do get stats. I believe both cumulative statistics and logs are needed. Logs excel in pinpointing specific queries at precise times, while statistics provide a broader overview of the situation. Additionally, I often encounter situations where clients lack pg_stat_statements and can't restart their production promptly. Regarding the current patch, the latest version removes the separate GUC, but the user should be able to control this behavior. I created this patch in response to Amit Kapila's proposal to keep the discussion ongoing. However, I still favor the initial version with the GUCs. Query text is logged when log_min_error_statement > default level of "error". This could be especially problematic when there is a query running more than 1 Parallel Gather node that is in draught. In those cases each node will end up generating a log with the statement text. So, a single query execution could end up having multiple log lines with the statement text. ... I wonder if it will be better to accumulate the total # of workers planned and # of workers launched and logging this information at the end of execution? log_temp_files exhibits similar behavior when a query involves multiple on-disk sorts. I'm uncertain whether this is something we should or need to address. I'll explore whether the error message can be made more informative. [local]:5437 postgres@postgres=# SET work_mem to '125kB'; [local]:5437 postgres@postgres=# SET log_temp_files TO 0; [local]:5437 postgres@postgres=# SET client_min_messages TO log; [local]:5437 postgres@postgres=# WITH a AS ( SELECT x FROM generate_series(1,1) AS F(x) ORDER BY 1 ) , b AS (SELECT x FROM generate_series(1,1) AS F(x) ORDER BY 1 ) SELECT * FROM a,b; LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.20", size 122880 => First sort LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.19", size 14 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.23", size 14 LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.22", size 122880 => Second sort LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp138850.21", size 14 -- Benoit Lobréau Consultant http://dalibo.com
Re: Use virtual tuple slot for Unique node
On Tue, Oct 10, 2023 at 2:23 PM David Rowley wrote: > > On Wed, 27 Sept 2023 at 20:01, David Rowley wrote: > > > > On Sat, 23 Sept 2023 at 03:15, Heikki Linnakangas wrote: > > > So not a win in this case. Could you peek at the outer slot type, and > > > use the same kind of slot for the Unique's result? Or some more > > > complicated logic, like use a virtual slot if all the values are > > > pass-by-val? I'd also like to keep this simple, though... > > > > > > Would this kind of optimization make sense elsewhere? > > > > There are a few usages of ExecGetResultSlotOps(). e.g ExecInitHashJoin(). > > > > If I adjust the patch to: > > > > - ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple); > > + ExecInitResultTupleSlotTL(&uniquestate->ps, > > + > > ExecGetResultSlotOps(outerPlanState(uniquestate), > > + > > NULL)); > > Just to keep this from going cold, here's that in patch form for > anyone who wants to test. Thanks. I don't recollect why we chose MinimalTupleSlot here - may be because we expected the underlying node to always produce a minimal tupe. But Unique node copies the tuple returned by the underlying node. This copy is carried out by the TupleTableSlot specific copy function copyslot. Every implementation of this function first converts the source slot tuple into the required form and then copies it. Having both the TupleTableSlots, ouput slot from the underlying node and the output slot of Unique node, of the same type avoids the first step and just copies the slot. It makes sense that it performs better. The code looks fine to me. > > I spent a bit more time running some more benchmarks and I don't see > any workload where it slows things down. I'd be happy if someone else > had a go at finding a regression. I built on your experiments and I might have found a minor regression. Setup = drop table if exists t_int; create table t_int(a int, b int); insert into t_int select x, x from generate_series(1,100)x; create index on t_int (a,b); vacuum analyze t_int; drop table if exists t_text; create table t_text(a text, b text); insert into t_text select lpad(x::text, 1000, '0'), x::text from generate_series(1,100)x; create index on t_text (a,b); vacuum analyze t_text; drop table if exists t_mixed; -- this one is new but it doesn't matter much create table t_mixed(a text, b int); insert into t_mixed select lpad(x::text, 1000, '0'), x from generate_series(1,100)x; create index on t_mixed (a,b); vacuum analyze t_mixed; Queries and measurements (average execution time from 3 runs - on my Thinkpad T490) == Q1 select distinct a,b from t_int'; HEAD: 544.45 ms patched: 381.55 ms Q2 select distinct a,b from t_text HEAD: 609.90 ms patched: 513.42 ms Q3 select distinct a,b from t_mixed HEAD: 626.80 ms patched: 468.22 ms The more the pass by ref data, more memory is allocated which seems to reduce the gain by this patch. Above nodes use Buffer or HeapTupleTableSlot. Try some different nodes which output minimal or virtual TTS. 
set enable_hashagg to off; Q4 select distinct a,b from (select sum(a) over (order by a rows 2 preceding) a, b from t_int) q HEAD: 2529.58 ms patched: 2332.23 Q5 select distinct a,b from (select sum(a) over (order by a rows 2 preceding) a, b from t_int order by a, b) q HEAD: 2633.69 ms patched: 2255.99 ms Q6 select distinct a,b from (select string_agg(a, ', ') over (order by a rows 2 preceding) a, b from t_text) q HEAD: 108589.85 ms patched: 107226.82 ms Q7 select distinct a,b from (select string_agg(left(a, 100), ', ') over (order by a rows 2 preceding) a, b from t_text) q HEAD: 16070.62 ms patched: 16182.16 ms This one is surprising though. May be the advantage of using the same tuple table slot is so narrow when large data needs to be copied that the execution times almost match. The patched and unpatched execution times differ by the margin of error either way. -- Best Wishes, Ashutosh Bapat
Re: Special-case executor expression steps for common combinations
On 12/10/2023 12:48, Daniel Gustafsson wrote: The attached patch adds special-case expression steps for common sets of steps in the executor to shave a few cycles off during execution, and make the JIT generated code simpler. * Adds EEOP_FUNCEXPR_STRICT_1 and EEOP_FUNCEXPR_STRICT_2 for function calls of strict functions with 1 or 2 arguments (EEOP_FUNCEXPR_STRICT remains used for > 2 arguments). * Adds EEOP_AGG_STRICT_INPUT_CHECK_ARGS_1 which is a special case for the common case of one arg aggs. Are these relevant when JITting? I'm a little sad if the JIT compiler cannot unroll these on its own. Is there something we could do to hint it, so that it could treat the number of arguments as a constant? I understand that this can give a small boost in interpreter mode, so maybe we should do it in any case. But I'd like to know if we're missing a trick with the JITter, before we mask it with this. * Replace EEOP_DONE with EEOP_DONE_RETURN and EEOP_DONE_NO_RETURN to be able to skip extra setup for steps which are only interested in the side effects. I'm a little surprised if this makes a measurable performance difference, but sure, why not. It seems nice to be more explicit when you don't expect a return value. -- Heikki Linnakangas Neon (https://neon.tech)
Re: [PATCH] Compression dictionaries for JSONB
Hi hackers,

I would like to continue discussing compression dictionaries.

> So I summarized the requirements we agreed on so far and ended up with
> the following list:
[...]

Again, here is the summary of our current agreements, at least how I understand them. Please feel free to correct me where I'm wrong.

We are going to focus on supporting the:

```
SET COMPRESSION lz4 [WITH|WITHOUT] DICTIONARY
```

... syntax for now. From the UI perspective the rest of the agreements didn't change compared to the previous summary.

In the [1] discussion (cc: Robert) we agreed to use va_tag != 18 for the on-disk TOAST pointer representation to make TOAST pointers extendable. If va_tag has a different value (currently it's always 18), the TOAST pointer is followed by a utf8-like varint bitmask. This bitmask determines the rest of the content of the TOAST pointer and its overall size. This will allow us to extend TOAST pointers to include dictionary_id and also to extend them in the future, e.g. to support ZSTD and other compression algorithms, use 64-bit TOAST pointers, etc.

Several things occurred to me:

- Does anyone believe that va_tag should be part of the utf8-like bitmask in order to save a byte or two?

- The described approach means that compression dictionaries are not going to be used when data is compressed in-place (i.e. within a tuple), since no TOAST pointer is involved in this case. Also we will be unable to add additional compression algorithms here. Does anyone have problems with this? Should we use the reserved compression algorithm id instead as a marker of an extended TOAST?

- It would be nice to decompose the feature into several independent patches, e.g. modify TOAST first, then add compression dictionaries without automatic update of the dictionaries, then add the automatic update. I find it difficult to imagine, however, how to modify TOAST pointers and test the code properly without a dependency on a larger feature. Could anyone think of a trivial test case for extendable TOAST? Maybe something we could add to src/test/modules similarly to how we test SLRU, background workers, etc.

[1]: https://www.postgresql.org/message-id/flat/CAN-LCVMq2X%3Dfhx7KLxfeDyb3P%2BBXuCkHC0g%3D9GF%2BJD4izfVa0Q%40mail.gmail.com

--
Best regards,
Aleksander Alekseev
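To illustrate the "utf8-like varint bitmask" part of the proposal (a conceptual sketch only, not the proposed on-disk format): the number of leading 1-bits in the first byte can encode how many continuation bytes follow, so even a reader that does not understand newer flag bits can still compute the total size of the extended header, while the remaining bits mark which optional fields (for example a dictionary_id) are present:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Conceptual sketch only -- not the actual TOAST pointer layout.
 * The count of leading 1-bits in the first bitmask byte gives the number
 * of continuation bytes, so the overall length of the extended header is
 * computable without knowing what every flag bit means.
 */
static size_t
extended_hdr_size(const uint8_t *bitmask)
{
    uint8_t b = bitmask[0];
    size_t  nfollow = 0;

    while (b & 0x80)
    {
        nfollow++;
        b <<= 1;
    }
    return 1 + nfollow;         /* first byte plus continuation bytes */
}

/* Hypothetical flag bit: the pointer carries a dictionary_id field. */
#define XTP_HAS_DICTIONARY_ID   0x01

static bool
has_dictionary_id(const uint8_t *bitmask)
{
    return (bitmask[0] & XTP_HAS_DICTIONARY_ID) != 0;
}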
Re: Server crash on RHEL 9/s390x platform against PG16
Here is clang version: [edb@9428da9d2137]$ clang --version clang version 15.0.7 (Red Hat 15.0.7-2.el9) Target: s390x-ibm-linux-gnu Thread model: posix InstalledDir: /usr/bin Let me know if any further information is needed. On Mon, Oct 9, 2023 at 8:21 AM Suraj Kharage wrote: > It looks like an issue with JIT. If I disable the JIT then the above query > runs successfully. > > postgres=# set jit to off; > > SET > > postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON > rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON > rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden; > > pkey | val | pkey | label | hidden | pkey | val | pkey > > --+--+--+-++--+-+-- > > 1 | row1 |1 | hidden | t |1 | 1 | > > 1 | row1 |1 | hidden | t |2 | 1 | > > 2 | row2 |2 | visible | f |1 | 1 | > > 2 | row2 |2 | visible | f |2 | 1 | > > (4 rows) > > Any idea on this? > > On Mon, Sep 18, 2023 at 11:20 AM Suraj Kharage < > suraj.khar...@enterprisedb.com> wrote: > >> Few more details on this: >> >> (gdb) p val >> $1 = 0 >> (gdb) p i >> $2 = 3 >> (gdb) f 3 >> #3 0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at >> ../../../../src/include/executor/tuptable.h:472 >> 472 return slot->tts_ops->copy_minimal_tuple(slot); >> (gdb) p *slot >> $3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops = >> 0x1b6dcc8 , tts_tupleDescriptor = 0x202e0e8, tts_values = >> 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid = >> {ip_blkid = {bi_hi = 65535, >> bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0} >> (gdb) p *slot->tts_tupleDescriptor >> $2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr >> = 0x0, attrs = 0x202cd28} >> >> (gdb) p slot.tts_values[3] >> $4 = 0 >> (gdb) p slot.tts_values[2] >> $5 = 1 >> (gdb) p slot.tts_values[1] >> $6 = 34027556 >> >> >> As per the resultslot, it has 0 value for the third attribute (column >> lable). >> Im testing this on the docker container and facing some issues with gdb >> hence could not able to debug it further. 
>> >> Here is a explain plan: >> >> postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT >> JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN >> rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by >> rm32044_t1.pkey,label,hidden; >> >> QUERY PLAN >> >> >> - >> Incremental Sort >>Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, >> rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, >> rm32044_t4.pkey >>Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden >>Presorted Key: rm32044_t1.pkey >>-> Merge Left Join >> Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, >> rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, >> rm32044_t4.pkey >> Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey) >> -> Sort >>Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, >> rm32044_t1.pkey, rm32044_t1.val >>Sort Key: rm32044_t1.pkey >>-> Nested Loop >> Output: rm32044_t3.pkey, rm32044_t3.val, >> rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val >> -> Merge Left Join >>Output: rm32044_t3.pkey, rm32044_t3.val, >> rm32044_t4.pkey >>Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey) >>-> Sort >> Output: rm32044_t3.pkey, rm32044_t3.val >> Sort Key: rm32044_t3.pkey >> -> Seq Scan on public.rm32044_t3 >>Output: rm32044_t3.pkey, >> rm32044_t3.val >>-> Sort >> Output: rm32044_t4.pkey >> Sort Key: rm32044_t4.pkey >> -> Seq Scan on public.rm32044_t4 >>Output: rm32044_t4.pkey >> -> Materialize >>Output: rm32044_t1.pkey, rm32044_t1.val >>-> Seq Scan on public.rm32044_t1 >> Output: rm32044_t1.pkey, rm32044_t1.val >> -> Sort >>Output: rm32044_t2.pkey, rm32044_t2.label, >> rm32044_t2.hidden >>Sort Key: rm32044_t2.pkey >>-> Seq Scan on public.rm32044_t2 >> Output: rm32044_t2.pkey, rm32044_t2.label, >> rm32044_t2.hidden >> (34 rows) >> >> >> It seems like while building the innerslot for merge join, the value for >> attnum 1 is no
Test 026_overwrite_contrecord fails on very slow machines (under Valgrind)
Hello hackers, While investigating the recent skink failure [1], I've reproduced this failure under Valgrind on a slow machine and found that this happens due to the last checkpoint recorded in the segment 2, that is removed in the test: The failure log contains: 2023-10-10 19:10:08.212 UTC [2144251][startup][:0] LOG: invalid checkpoint record 2023-10-10 19:10:08.214 UTC [2144251][startup][:0] PANIC: could not locate a valid checkpoint record The line above: [19:10:02.701](318.076s) ok 1 - 00010001 differs from 00010002 tells us about the duration of previous operations (> 5 mins). src/test/recovery/tmp_check/log/026_overwrite_contrecord_primary.log: 2023-10-10 19:04:50.149 UTC [1845798][postmaster][:0] LOG: database system is ready to accept connections ... 2023-10-10 19:09:49.131 UTC [1847585][checkpointer][:0] LOG: checkpoint starting: time ... 2023-10-10 19:10:02.058 UTC [1847585][checkpointer][:0] LOG: checkpoint complete: ... lsn=0/*2093980*, redo lsn=0/1F62760 And here is one more instance of this failure [2]: 2022-11-08 02:35:25.826 UTC [1614205][][:0] PANIC: could not locate a valid checkpoint record 2022-11-08 02:35:26.164 UTC [1612967][][:0] LOG: startup process (PID 1614205) was terminated by signal 6: Aborted src/test/recovery/tmp_check/log/026_overwrite_contrecord_primary.log: 2022-11-08 02:29:57.961 UTC [1546469][][:0] LOG: database system is ready to accept connections ... 2022-11-08 02:35:10.764 UTC [1611737][][2/10:0] LOG: statement: SELECT pg_walfile_name(pg_current_wal_insert_lsn()) 2022-11-08 02:35:11.598 UTC [1546469][][:0] LOG: received immediate shutdown request The next successful run after the failure [1] shows the following duration: [21:34:48.556](180.150s) ok 1 - 00010001 differs from 00010002 And the last successful run: [03:03:53.892](126.206s) ok 1 - 00010001 differs from 00010002 So to fail on the test, skink should perform at least twice slower than usual, and may be it's an extraordinary condition indeed, but on the other hand, may be increase checkpoint_timeout as already done in several tests (015_promotion_pages, 038_save_logical_slots_shutdown, 039_end_of_wal, ...). [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-10-10%2017%3A10%3A11 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-11-07%2020%3A27%3A11 Best regards, Alexander
pg_upgrade's interaction with pg_resetwal seems confusing
In pg_upgrade, we reset WAL archives (remove WAL), transaction id, etc. in copy_xact_xlog_xid() for the new cluster. Then, we create new objects in the new cluster, and again towards the end of the upgrade we invoke pg_resetwal with the -o option to reset the next OID. Now, along with resetting the OID, pg_resetwal will again reset the WAL. I am not sure if that is intentional and it may not have any problem today except that it seems redundant to reset WAL again. However, this can be problematic for the ongoing work to upgrade the logical replication slots [1]. We want to create/migrate logical slots in the new cluster before calling the final pg_resetwal (which resets the next OID) to ensure that there is no new WAL inserted by background processes or otherwise between resetwal location and creation of slots. So, we thought that we would compute the next WAL location by doing the computation similar to what pg_resetwal does to reset a new WAL location, create slots using that location, and pass the same to pg_resetwal using the -l option. However, that doesn't work because pg_resetwal uses the passed -l option only as a hint but can reset the later WAL if present which can remove the WAL position we have decided as restart_lsn (point to start reading WAL) for slots. So, we came up with another idea that we will reset the WAL just before creating slots and use that location to create slots and then invent a new option in pg_resetwal where it won't reset the WAL. Now, as mentioned in the first paragraph, it seems we anyway don't need to reset the WAL at the end when setting the next OID for the new cluster with the -o option. If that is true, then I think even without slots work it will be helpful to have such an option in pg_resetwal. Thoughts? [1] - https://commitfest.postgresql.org/45/4273/ -- With Regards, Amit Kapila.
Re: Removing unneeded self joins
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov wrote: > On 4/10/2023 14:34, Alexander Korotkov wrote: > > > Relid replacement machinery is the most contradictory code here. We used > > > a utilitarian approach and implemented a simplistic variant. > > > > > > 2) It would be nice to skip the insertion of IS NOT NULL checks when > > > > they are not necessary. [1] points that infrastructure from [2] might > > > > be useful. The patchset from [2] seems committed mow. However, I > > > > can't see it is directly helpful in this matter. Could we just skip > > > > adding IS NOT NULL clause for the columns, that have > > > > pg_attribute.attnotnull set? > > > Thanks for the links, I will look into that case. > To be more precise, in the attachment, you can find a diff to the main > patch, which shows the volume of changes to achieve the desired behaviour. > Some explains in regression tests shifted. So, I've made additional tests: > > DROP TABLE test CASCADE; > CREATE TABLE test (a int, b int not null); > CREATE UNIQUE INDEX abc ON test(b); > explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a) > WHERE t1.b=t2.b; > CREATE UNIQUE INDEX abc1 ON test(a,b); > explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a) > WHERE t1.b=t2.b; > explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a) > WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a); > DROP INDEX abc1; > explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a) > WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b); > > We have almost the results we wanted to have. But in the last explain > you can see that nothing happened with the OR clause. We should use the > expression mutator instead of walker to handle such clauses. But It > doesn't process the RestrictInfo node ... I'm inclined to put a solution > of this issue off for a while. OK. I think it doesn't worth to eliminate IS NULL quals with this complexity (at least at this stage of work). I made improvements over the code. Mostly new comments, grammar corrections of existing comments and small refactoring. Also, I found that the suggestion from David Rowley [1] to qsort array of relations to faster find duplicates is still unaddressed. I've implemented it. That helps to evade quadratic complexity with large number of relations. Also I've incorporated improvements from Alena Rybakina except one for skipping SJ removal when no SJ quals is found. It's not yet clear for me if this check fix some cases. But at least optimization got skipped in some useful cases (as you can see in regression tests). Links 1. https://www.postgresql.org/message-id/CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ%40mail.gmail.com 2. https://www.postgresql.org/message-id/96f66ae3-df10-4060-9844-4c9633062cd3%40yandex.ru -- Regards, Alexander Korotkov 0001-Remove-useless-self-joins-v44.patch Description: Binary data
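As a side note, the sort-then-scan approach mentioned above is the standard way to avoid pairwise comparison; a generic sketch of the idea (not the patch code), assuming the candidates can be represented as plain integer indexes:

#include <stdbool.h>
#include <stdlib.h>

/* qsort comparator for an array of relation indexes */
static int
cmp_relid(const void *a, const void *b)
{
    int     ra = *(const int *) a;
    int     rb = *(const int *) b;

    return (ra > rb) - (ra < rb);
}

/*
 * Sort once (O(n log n)), then only adjacent entries need to be compared
 * to detect duplicates, instead of checking every pair (O(n^2)).
 */
static bool
contains_duplicate(int *relids, int n)
{
    qsort(relids, n, sizeof(int), cmp_relid);

    for (int i = 1; i < n; i++)
    {
        if (relids[i] == relids[i - 1])
            return true;
    }
    return false;
}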
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Amit, Thanks for your suggestion! PSA new version. > The other problem is that pg_resetwal removes all pre-existing WAL > files which in this case could lead to the removal of the WAL file > corresponding to restart_lsn. This is because at least the shutdown > checkpoint record will be written after the creation of slots which > could be in the new file used for restart_lsn. Then when we invoke > pg_resetwal, it can remove that file. > > One idea to deal with this could be to do the reset WAL stuff > (FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(), > WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then > create slots. If we do this, then we additionally need an option in > pg_resetwal which skips resetting the WAL as that would have been done > before creating the slots. Based on above idea, I made new version patch which some functionalities were exported from pg_resetwal. In this approach, pg_upgrade itself removed WALs and then create logical slots, then pg_resetwal would be called with new option --no-switch, which avoid to switch a WAL segment file. The option is only used for the upgrading purpose so it is not written in doc and usage(). This option is not required if pg_resetwal -o does not discard WAL records. Please see the fork thread [1]. We do not have to reserve future restart_lsn while creating a slot, so the binary function binary_upgrade_create_logical_replication_slot() was removed. Another advantage of this approach is to avoid calling pg_log_standby_snapshot() after the pg_resetwal. This was needed because of two reasons, but they were resolved automatically. 1) pg_resetwal removes all WAL files. 2) Logical slots requires a RUNNING_XACTS record for building a snapshot. [1]: https://www.postgresql.org/message-id/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg%40mail.gmail.com Best Regards, Hayato Kuroda FUJITSU LIMITED v49-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch Description: v49-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
RE: [PoC] pg_upgrade: allow to upgrade publisher node
Dear Amit, Thanks for reviewing! New patch is available at [1]. > > Some more comments: > 1. Let's restruture binary_upgrade_validate_wal_logical_end() a bit. > First, let's change its name to binary_upgrade_slot_has_pending_wal() > or something like that. Then move the context creation and free > related code into DecodingContextHasDecodedItems(). We can rename > DecodingContextHasDecodedItems() as > pg_logical_replication_slot_has_pending_wal() and place it in > slotfuncs.c. This will make the code structure similar to other slot > functions like pg_replication_slot_advance(). Seems clearer than mine. Fixed. > 2. + * Returns true if there are no changes after the confirmed_flush_lsn. > > How about something like: "Returns true if there are no decodable WAL > records after the confirmed_flush_lsn."? Fixed. > 3. Shouldn't we need to call CheckSlotPermissions() in > binary_upgrade_validate_wal_logical_end? Added, but actually it is not needed. This is because only superusers can connect to the server while upgrading. Please see below codes in InitPostgres(). ``` if (IsBinaryUpgrade && !am_superuser) { ereport(FATAL, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("must be superuser to connect in binary upgrade mode"))); } ``` > 4. > + /* > + * Also, set processing_required flag if the message is not > + * transactional. It is needed to notify the message's existence to > + * the caller side. Usually, the flag is set when either the COMMIT or > + * ABORT records are decoded, but this must be turned on here because > + * the non-transactional logical message is decoded without waiting > + * for these records. > + */ > > The first sentence of the comments doesn't seem to be required as that > just says what the code does. So, let's slightly change it to: "We > need to set processing_required flag to notify the message's existence > to the caller side. Usually, the flag is set when either the COMMIT or > ABORT records are decoded, but this must be turned on here because the > non-transactional logical message is decoded without waiting for these > records." Fixed. [1]: https://www.postgresql.org/message-id/TYAPR01MB5866B0614F80CE9F5EF051BDF5D3A%40TYAPR01MB5866.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Special-case executor expression steps for common combinations
On Thu, 12 Oct 2023 at 22:54, Daniel Gustafsson wrote: > EEOP_FUNCEXPR_STRICT_* (10M iterations): > master : (7503.317, 7553.691, 7634.524) > patched : (7422.756, 7455.120, 7492.393) > > pgbench: > master : (3653.83, 3792.97, 3863.70) > patched : (3743.04, 3830.02, 3869.80) > > Thoughts? Did any of these tests compile the expression with JIT? If not, how does the performance compare for a query that JITs the expression? David
Re: Problem, partition pruning for prepared statement with IS NULL clause.
On Mon, 9 Oct 2023 at 12:26, David Rowley wrote: > > On Sat, 7 Oct 2023 at 03:11, Sergei Glukhov wrote: > > I noticed that combination of prepared statement with generic plan and > > 'IS NULL' clause could lead partition pruning to crash. > > > Test case: > > -- > > set plan_cache_mode to force_generic_plan; > > prepare stmt AS select * from hp where a is null and b = $1; > > explain execute stmt('xxx'); > > Thanks for the detailed report and proposed patch. > > I think your proposed fix isn't quite correct. I think the problem > lies in InitPartitionPruneContext() where we assume that the list > positions of step->exprs are in sync with the keyno. If you look at > perform_pruning_base_step() the code there makes a special effort to > skip over any keyno when a bit is set in opstep->nullkeys. I've now also pushed the fix for the incorrect logic for nullkeys in ExecInitPruningContext(). I didn't quite find a test to make this work for v11. I tried calling execute 5 times as we used to have to before the plan_cache_mode GUC was added in v12, but the test case kept picking the custom plan. So I ended up pushing v11 without any test. This goes out of support in ~1 month, so I'm not too concerned about the lack of test. I did do a manual test to ensure it works with: create table hp (a int, b text, c int) partition by hash (a, b); create table hp0 partition of hp for values with (modulus 4, remainder 0); create table hp3 partition of hp for values with (modulus 4, remainder 3); create table hp1 partition of hp for values with (modulus 4, remainder 1); create table hp2 partition of hp for values with (modulus 4, remainder 2); prepare hp_q1 (text) as select * from hp where a is null and b = $1; (set breakpoint in choose_custom_plan() and have it return false when we hit it.) explain (costs off) execute hp_q1('xxx'); David
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
On Wed, Oct 11, 2023 at 5:57 PM Dilip Kumar wrote: > > On Wed, Oct 11, 2023 at 4:34 PM Dilip Kumar wrote: > In my last email, I forgot to give the link from where I have taken > the base path for dividing the buffer pool in banks so giving the same > here[1]. And looking at this again it seems that the idea of that > patch was from > Andrey M. Borodin and the idea of the SLRU scale factor were > introduced by Yura Sokolov and Ivan Lazarev. Apologies for missing > that in the first email. > > [1] https://commitfest.postgresql.org/43/2627/ In my last email I have just rebased the base patch, so now while reading through that patch I realized that there was some refactoring needed and some unused functions were there so I have removed that and also added some comments. Also did some refactoring to my patches. So reposting the patch series. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v2-0002-bank-wise-slru-locks.patch Description: Binary data v2-0003-Introduce-bank-wise-LRU-counter.patch Description: Binary data v2-0001-Divide-SLRU-buffers-into-banks.patch Description: Binary data
Re: RFC: Logging plan of the running query
On 2023-10-11 16:22, Ashutosh Bapat wrote: Like many others I think this feature is useful to debug a long running query. Sorry for jumping late into this. I have a few of high level comments Thanks for your comments! There is a lot of similarity between what this feature does and what auto explain does. I see the code is also duplicated. There is some merit in avoiding this duplication 1. we will get all the features of auto_explain automatically like choosing a format (this was expressed somebody earlier in this thread), setings etc. 2. avoid bugs. E.g your code switches context after ExplainState has been allocated. These states may leak depending upon when this function gets called. 3. Building features on top as James envisions will be easier. Considering the similarity with auto_explain I wondered whether this function should be part of auto_explain contrib module itself? If we do that users will need to load auto_explain extension and thus install executor hooks when this function doesn't need those. So may not be such a good idea. I didn't see any discussion on this. I once thought about adding this to auto_explain, but I left it asis for below reasons: - One of the typical use case of pg_log_query_plan() would be analyzing slow query on customer environments. On such environments, We cannot always control what extensions to install. Of course auto_explain is a major extension and it is quite possible that they installed auto_explain, but but it is also possible they do not. - It seems a bit counter-intuitive that pg_log_query_plan() is in an extension called auto_explain, since it `manually`` logs plans I tried following query to pass PID of a non-client backend to this function. #select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher'; pg_log_query_plan | application_name |backend_type ---+--+- t | | autovacuum launcher (1 row) I see "LOG: backend with PID 2733631 is not running a query or a subtransaction is aborted" in server logs. That's ok. But may be we should not send signal to these kinds of backends at all, thus avoiding some system calls. Agreed, it seems better. Attached patch checks if the backendType of target process is 'client backend'. =# select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher'; WARNING: PID 63323 is not a PostgreSQL client backend process pg_log_query_plan | application_name |backend_type ---+--+- f | | autovacuum launcher I am also wondering whether it's better to report the WARNING as status column in the output. E.g. instead of #select pg_log_query_plan(100); WARNING: PID 100 is not a PostgreSQL backend process pg_log_query_plan --- f (1 row) we output #select pg_log_query_plan(100); pg_log_query_plan | status ---+- f | PID 100 is not a PostgreSQL backend process (1 row) That looks neater and can easily be handled by scripts, applications and such. But it will be inconsistent with other functions like pg_terminate_backend() and pg_log_backend_memory_contexts(). It seems neater, but it might be inconvenient because we can no longer use it in select list like the following query as you wrote: #select pg_log_query_plan(pid), application_name, backend_type from pg_stat_activity where backend_type = 'autovacuum launcher'; I do share a concern that was discussed earlier. If a query is running longer, there's something problematic with it. A diagnostic intervention breaking it further would be unwelcome. 
James has run experiments to shake this code for any loose breakages. He has not found any. So may be we are good. And we wouldn't know about very rare corner cases so easily without using it in the field. So fine with it. If we could add some safety net that will be great but may not be necessary for the first cut. If there are candidates for the safety net, I'm willing to add them. -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom b7902cf43254450cc7831c235982438ea1e5e8b7 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Thu, 12 Oct 2023 22:03:48 +0900 Subject: [PATCH v31] Add function to log the plan of the query Currently, we have to wait for the query execution to finish to check its plan. This is not so convenient when investigating long-running queries on production environments where we cannot use debuggers. To improve this situation, this patch adds pg_log_query_plan() function that requests to log the plan of the specified backend process. By default, only superusers are allowed to request to log the plans because allowing any users to issue
PostgreSQL domains and NOT NULL constraint
Hello PostgreSQL's CREATE DOMAIN documentation (section Notes) describes a way how one can add NULL's to a column that has a domain with the NOT NULL constraint. https://www.postgresql.org/docs/current/sql-createdomain.html To me it seems very strange and amounts to a bug because it defeats the purpose of domains (to be a reusable assets) and constraints (to avoid any bypassing of these). Oracle 23c added the support of domains (https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/create-domain.html). I tested the same scenario both in PostgreSQL and Oracle (https://www.oracle.com/database/free/) and found out that in these situations Oracle does not allow NULL's to be added to the column. I do not know as to whether the behaviour that is implemented in PostgreSQL is specified by the standard. However, if it is not the case, then how it could be that Oracle can but PostgreSQL cannot. Best regards Erki Eessaar The scenario that I tested both in PostgreSQL (16) and Oracle (23c). *** /*PostgreSQL 16*/ CREATE DOMAIN d_name VARCHAR(50) NOT NULL; CREATE TABLE Product_state_type (product_state_type_code SMALLINT NOT NULL, name d_name, CONSTRAINT pk_product_state_type PRIMARY KEY (product_state_type_code), CONSTRAINT ak_product_state_type_name UNIQUE (name)); CREATE TABLE Product (product_code INTEGER NOT NULL, name d_name, product_state_type_code SMALLINT NOT NULL, CONSTRAINT pk_product PRIMARY KEY (product_code), CONSTRAINT fk_product_product_state_type FOREIGN KEY (product_state_type_code) REFERENCES Product_state_type(product_state_type_code) ON UPDATE CASCADE); INSERT INTO Product_state_type (product_state_type_code, name) VALUES (1, (SELECT name FROM Product_state_type WHERE FALSE)); /*Insertion succeeds, name is NULL!*/ INSERT INTO Product (product_code, name, product_state_type_code) SELECT 1 AS product_code, Product.name, 1 AS product_state_type_code FROM Product_state_type LEFT JOIN Product USING (product_state_type_code); /*Insertion succeeds, name is NULL!*/ /*Oracle 23c*/ CREATE DOMAIN d_name AS VARCHAR2(50) NOT NULL; CREATE TABLE Product_state_type (product_state_type_code NUMBER(4) NOT NULL, name d_name, CONSTRAINT pk_product_state_type PRIMARY KEY (product_state_type_code), CONSTRAINT ak_product_state_type_name UNIQUE (name)); CREATE TABLE Product (product_code NUMBER(8) NOT NULL, name d_name, product_state_type_code NUMBER(4) NOT NULL, CONSTRAINT pk_product PRIMARY KEY (product_code), CONSTRAINT fk_product_product_state_type FOREIGN KEY (product_state_type_code) REFERENCES Product_state_type(product_state_type_code)); INSERT INTO Product_state_type (product_state_type_code, name) VALUES (1, (SELECT name FROM Product_state_type WHERE FALSE)); /*Fails. Error report - SQL Error: ORA-01400: cannot insert NULL into ("SYSTEM"."PRODUCT_STATE_TYPE"."NAME") Help: https://docs.oracle.com/error-help/db/ora-01400/ 01400. 0 - "cannot insert NULL into (%s)" *Cause:An attempt was made to insert NULL into previously listed objects. *Action: These objects cannot accept NULL values.*/ INSERT INTO Product_state_type (product_state_type_code, name) VALUES (1, 'Active'); INSERT INTO Product (product_code, name, product_state_type_code) SELECT 1 AS product_code, Product.name, 1 AS product_state_type_code FROM Product_state_type LEFT JOIN Product USING (product_state_type_code); /*Fails. SQL Error: ORA-01400: cannot insert NULL into ("SYSTEM"."PRODUCT"."NAME") Help: https://docs.oracle.com/error-help/db/ora-01400/ 01400. 
0 - "cannot insert NULL into (%s)" *Cause:An attempt was made to insert NULL into previously listed objects. *Action: These objects cannot accept NULL values.*/
Re: Lowering the default wal_blocksize to 4K
On Wed, Oct 11, 2023 at 4:28 PM Thomas Munro wrote: > That leaves only the segments where a record starts exactly on the > first usable byte of a segment, which is why I was trying to think of > a way to cover that case too. I suggested we could notice and insert > a new record at that place. But Andres suggests it would be too > expensive and not worth worrying about. Hmm. Even in that case, xl_prev has to match. It's not like it's the wild west. Sure, it's not nearly as good of a cross-check, but it's something. It seems to me that it's not worth worrying very much about xlp_seg_size or xlp_blcksz changing undetected in that scenario - if you're doing that kind of advanced magic, you need to be careful enough to not mess it up, and if we still cross-check once per checkpoint cycle that's pretty good. I do worry a bit about the sysid changing under us, though. It's not that hard to get your WAL archives mixed up, and it'd be nice to catch that right away. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [RFC] Add jit deform_counter
Hi, On Fri, 8 Sept 2023 at 20:22, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Fri, Sep 08, 2023 at 03:34:42PM +0200, Daniel Gustafsson wrote: > > > On 5 Sep 2023, at 16:37, Daniel Gustafsson wrote: > > > > > I've gone over this version of the patch and I think it's ready to go in. > > > I'm > > > marking this Ready for Committer and will go ahead with it shortly > > > barring any > > > objections. > > > > Pushed, after another round of review with some minor fixes. I realized that pg_stat_statements is bumped to 1.11 with this patch but oldextversions test is not updated. So, I attached a patch for updating oldextversions. Regards, Nazir Bilal Yavuz Microsoft From d3c63a5d68ed76257d110db6377bd3ec859d65a4 Mon Sep 17 00:00:00 2001 From: Nazir Bilal Yavuz Date: Thu, 12 Oct 2023 14:44:38 +0300 Subject: [PATCH v1] Update oldextversions test for pg_stat_statements 1.11 pg_stat_statements 1.11 is introduced at 5a3423ad8e but oldextversions test is not updated. Add missing pg_stat_statements 1.11 test case to oldextversions test. --- .../expected/oldextversions.out | 58 +++ .../pg_stat_statements/sql/oldextversions.sql | 5 ++ 2 files changed, 63 insertions(+) diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out index efb2049ecff..64982aad601 100644 --- a/contrib/pg_stat_statements/expected/oldextversions.out +++ b/contrib/pg_stat_statements/expected/oldextversions.out @@ -250,4 +250,62 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements; t (1 row) +-- New views for pg_stat_statements in 1.11 +AlTER EXTENSION pg_stat_statements UPDATE TO '1.11'; +\d pg_stat_statements + View "public.pg_stat_statements" + Column | Type | Collation | Nullable | Default ++--+---+--+- + userid | oid | | | + dbid | oid | | | + toplevel | boolean | | | + queryid| bigint | | | + query | text | | | + plans | bigint | | | + total_plan_time| double precision | | | + min_plan_time | double precision | | | + max_plan_time | double precision | | | + mean_plan_time | double precision | | | + stddev_plan_time | double precision | | | + calls | bigint | | | + total_exec_time| double precision | | | + min_exec_time | double precision | | | + max_exec_time | double precision | | | + mean_exec_time | double precision | | | + stddev_exec_time | double precision | | | + rows | bigint | | | + shared_blks_hit| bigint | | | + shared_blks_read | bigint | | | + shared_blks_dirtied| bigint | | | + shared_blks_written| bigint | | | + local_blks_hit | bigint | | | + local_blks_read| bigint | | | + local_blks_dirtied | bigint | | | + local_blks_written | bigint | | | + temp_blks_read | bigint | | | + temp_blks_written | bigint | | | + blk_read_time | double precision | | | + blk_write_time | double precision | | | + temp_blk_read_time | double precision | | | + temp_blk_write_time| double precision | | | + wal_records| bigint | | | + wal_fpi| bigint | | | + wal_bytes | numeric | | | + jit_functions | bigint | | | + jit_generation_time| double precision | | | + jit_inlining_count | bigint | | | + jit_inlining_time | double precision | | | + jit_optimization_count | bigint | | | + jit_optimization_time | double precision | | | + jit_emission_count | bigint | | | + jit_emission_time | double precision | | | + jit_deform_count | bigint |
Re: [RFC] Add jit deform_counter
> On 12 Oct 2023, at 15:37, Nazir Bilal Yavuz wrote: > > Hi, > > On Fri, 8 Sept 2023 at 20:22, Dmitry Dolgov <9erthali...@gmail.com> wrote: >> >>> On Fri, Sep 08, 2023 at 03:34:42PM +0200, Daniel Gustafsson wrote: On 5 Sep 2023, at 16:37, Daniel Gustafsson wrote: >>> I've gone over this version of the patch and I think it's ready to go in. I'm marking this Ready for Committer and will go ahead with it shortly barring any objections. >>> >>> Pushed, after another round of review with some minor fixes. > > I realized that pg_stat_statements is bumped to 1.11 with this patch > but oldextversions test is not updated. So, I attached a patch for > updating oldextversions. Thanks for the patch, that was an oversight in the original commit for this. From a quick look it seems correct, I'll have another look later today and will then apply it. -- Daniel Gustafsson
Re: Lowering the default wal_blocksize to 4K
On Wed, Oct 11, 2023 at 6:11 PM Andres Freund wrote: > I think the question is what the point of the crosschecks in long page headers > is. It's pretty easy to see what the point of the xlp_sysid check is - make it > less likely to accidentally replay WAL from a different system. It's much > less clear what the point of xlp_seg_size and xlp_xlog_blcksz is - after all, > they are also in ControlFileData and the xlp_sysid check tied the control file > and WAL file together. Yeah, fair. > Let me rephrase my point: > > If somebody uses a modified pg_resetwal to change the xlog block size, then > tries to replay WAL from before that change, and is unlucky enough that the > LSN looked for in a segment is the start of a valid record both before/after > the pg_resetwal invocation, then yes, we might not catch that anymore if we > remove the block size check. But the much much more common case is that the > block size was *not* changed, in which case we *already* don't catch that > pg_resetwal was invoked. Hmm. Should we invent a mechanism just for that? > ISTM that the xlp_seg_size and xlp_xlog_blcksz checks in long page headers are > a belt and suspenders check that is very unlikely to ever catch a mistake that > wouldn't otherwise be caught. I think that's probably right. > I think that's what Thomas was proposing. Thinking about it a bit more I'm > not sure that having the data both in the checkpoint record itself and in > XLOG_CHECKPOINT_REDO buys much. But it's also pretty much free, so ... Yes. To me, having it in the redo record seems considerably more valuable. Because that's where we're going to begin replay, so we should catch most problems straight off. To escape detection at that point, you need to not just be pointed at the wrong WAL archive, but actually have files of diverse origin mixed together in the same WAL archive. That's a less-likely error, and we still have some ways of catching it if it happens. > Which would make the code more efficient... Right. > As outlined above, I don't think xlp_seg_size, xlp_xlog_blcksz buy us > anything, but that the protection by xlp_sysid is a bit more meaningful. So a > compromise position could be to include xlp_sysid in the page header, possibly > in a "chopped up" manner, as Matthias suggested. I'm not that keen on the idea of storing the upper half and lower half in alternate pages. That seems to me to add code complexity and cognitive burden with little increased likelihood of catching real problems. I'm not completely opposed to the idea if somebody wants to make it happen, but I bet it would be better to either store the whole thing or just cut it in half and store, say, the low-order bits. -- Robert Haas EDB: http://www.enterprisedb.com
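For concreteness, the "cut it in half and store the low-order bits" variant could look roughly like the sketch below. The field name xlp_sysid_lo is purely hypothetical (today only the long page header carries the full 64-bit xlp_sysid), and the check just mirrors what XLogReaderValidatePageHeader already does for the full value:

    /*
     * Hypothetical sketch: a short page header carrying only the low 32
     * bits of the system identifier.  xlp_sysid_lo is not an existing
     * field; "hdr" stands for that hypothetical header.
     */
    if (hdr->xlp_sysid_lo != (uint32) state->system_identifier)
    {
        report_invalid_record(state,
                              "WAL file appears to be from a different database system: low-order system identifier bits %u do not match expected %u",
                              hdr->xlp_sysid_lo,
                              (uint32) state->system_identifier);
        return false;
    }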
Re: Problem, partition pruning for prepared statement with IS NULL clause.
On 10/12/23 16:27, David Rowley wrote: > I've now also pushed the fix for the incorrect logic for nullkeys in ExecInitPruningContext(). Thanks! Regards, Gluh

Re: Lowering the default wal_blocksize to 4K
On Thu, 12 Oct 2023 at 16:36, Robert Haas wrote: > On Wed, Oct 11, 2023 at 4:28 PM Thomas Munro > wrote: > > That leaves only the segments where a record starts exactly on the > > first usable byte of a segment, which is why I was trying to think of > > a way to cover that case too. I suggested we could notice and insert > > a new record at that place. But Andres suggests it would be too > > expensive and not worth worrying about. > > Hmm. Even in that case, xl_prev has to match. It's not like it's the > wild west. Sure, it's not nearly as good of a cross-check, but it's > something. It seems to me that it's not worth worrying very much about > xlp_seg_size or xlp_blcksz changing undetected in that scenario - if > you're doing that kind of advanced magic, you need to be careful > enough to not mess it up, and if we still cross-check once per > checkpoint cycle that's pretty good. I do worry a bit about the sysid > changing under us, though. It's not that hard to get your WAL archives > mixed up, and it'd be nice to catch that right away. > This reminds me that xlp_tli is not being used to its full potential right now either. We only check that it's not going backwards, but there is at least one not very hard to hit way to get postgres to silently replay on the wrong timeline. [1] [1] https://www.postgresql.org/message-id/canwkhkmn3qwacvudzhb6wsvlrtkwebiyso-klfykkqvwuql...@mail.gmail.com -- Ants Aasma Senior Database Engineerwww.cybertec-postgresql.com
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
On 10/11/23 21:10, Michael Paquier wrote: On Thu, Oct 12, 2023 at 12:25:34PM +1300, Thomas Munro wrote: I'm planning to push 0002 (retries in frontend programs, which is where this thread began) and 0004 (add missing locks to SQL functions), including back-patches as far as 12, in a day or so. I'll abandon the others for now, since we're now thinking bigger[1] for backups, side stepping the problem. FWIW, 0003 looks like a low-risk improvement seen from here, so I'd be OK to use it at least for now on HEAD before seeing where the other discussions lead. 0004 would be OK if applied to v11, as well, but I also agree that it is not a big deal to let this branch be as it is now at this stage if you feel strongly this way. Agreed on 0002 and 0004, though I don't really think a back patch of 0004 to 11 is necessary. I'd feel differently if there was a single field report of this issue. I would prefer to hold off on applying 0003 to HEAD until we see how [1] pans out. Having said that, I have a hard time seeing [1] as being something we could back patch. The manipulation of backup_label is simple enough, but starting a cluster without pg_control is definitely going to change some things. Also, the requirement that backup software skip copying pg_control after a minor release is not OK. If we do back patch 0001 is 0003 really needed? Surely if 0001 works with other backup software it would work fine for pg_basebackup. Regards, -David [1] https://www.postgresql.org/message-id/flat/1330cb48-4e47-03ca-f2fb-b144b49514d8%40pgmasters.net
Re: The danger of deleting backup_label
Hi Thomas, On 10/11/23 18:10, Thomas Munro wrote: Even though I spent a whole bunch of time trying to figure out how to make concurrent reads of the control file sufficiently atomic for backups (pg_basebackup and low level filesystem tools), and we explored multiple avenues with varying results, and finally came up with something that basically works pretty well... actually I just hate all of that stuff, and I'm hoping to be able to just withdraw https://commitfest.postgresql.org/45/4025/ and chalk it all up to discovery/education and call *this* thread the real outcome of that preliminary work. So I'm +1 on the idea of putting a control file image into the backup label and I'm happy that you're looking into it. Well, hopefully this thread will *at least* be the solution going forward. Not sure about a back patch yet, see below... We could just leave the control file out of the base backup completely, as you said, removing a whole foot-gun. That's the plan. People following the 'low level' instructions will still get a copy of the control file from the filesystem, and I don't see any reliable way to poison that file without also making it so that a crash wouldn't also be prevented from recovering. I have wondered about putting extra "fingerprint" information into the control file such as the file's path and inode number etc, so that you can try to distinguish between a control file written by PostgreSQL, and a control file copied somewhere else, but that all feels too fragile, and at the end of the day, people following the low level backup instructions had better follow the low level backup instructions (hopefully via the intermediary of an excellent external backup tool). Not sure about the inode idea, because it seems OK for people to move a cluster elsewhere under a variety of circumstances. I do have an idea about how to mark a cluster in "recovery to consistency" mode, but not quite sure how to atomically turn that off at the end of recovery to consistency. I have some ideas I'll work on though. As Stephen mentioned[1], we could perhaps also complain if both backup label and control file exist, and then hint that the user should remove the *control file* (not the backup label!). I had originally suggested we would just overwrite the control file, but by explicitly complaining about it we would also bring the matter to tool/script authors' attention, ie that they shouldn't be backing that file up, or should be removing it in a later step if they copy everything. He also mentions that there doesn't seem to be anything stopping us from back-patching changes to the backup label contents if we go this way. I don't have a strong opinion on that and we could leave the question for later. I'm worried about the possibility of back patching this unless the solution comes out to be simpler than I think and that rarely comes to pass. Surely throwing errors on something that is currently valid (i.e. backup_label and pg_control both present). But perhaps there is a simpler, acceptable solution we could back patch (transparent to all parties except Postgres) and then a more advanced solution we could go forward with. I guess I had better get busy on this. Regards, -David [1] https://www.postgresql.org/message-id/ZL69NXjCNG%2BWHCqG%40tamriel.snowman.net
Pro et contra of preserving pg_proc oids during pg_upgrade
Hi hackers! Please advise on the idea of preserving pg_proc oids during pg_upgrade, in the same way as relfilenodes, type ids and so on. What are the possible downsides of such a solution? Thanks! -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: The danger of deleting backup_label
On 10/11/23 18:22, Michael Paquier wrote: On Tue, Oct 10, 2023 at 05:06:45PM -0400, David Steele wrote: That fails because there is a check to make sure the checkpoint is valid when pg_control is loaded. Another possibility is to use a special LSN like we use for unlogged tables. Anything >= 24 and < WAL segment size will work fine. Do we have any reason to do that in the presence of a backup_label file anyway? We'll know the LSN of the checkpoint based on what the base backup wants us to use. Using a fake-still-rather-valid value for the LSN in the control file to bypass this check does not address the issue you are pointing at: it is just avoiding this check. A reasonable answer would be, IMO, to just not do this check at all based on the control file in this case. Yeah, that's fair. And it looks like we are leaning towards excluding pg_control from the backup entirely, so the point is probably moot. If the contents of the control file are tweaked before sending it through a BASE_BACKUP, it would cover more than just pg_basebackup. Switching the way the control file is sent with new contents in sendFileWithContent() rather than sendFile() would be one way, for instance.. Good point, and that makes this even more compelling. If we include pg_control into backup_label then there is no need to modify pg_control (as above) -- we can just exclude it from the backup entirely. That will certainly require some rejigging in recovery but seems worth it for backup solutions that can't easily modify pg_control. The C-based solutions can do this pretty easily but it is a pretty high bar for anyone else. I have little idea about that, but I guess that you are referring to backrest here. Sure, pgBackRest, but there are other backup solutions written in C. My point is really that we should not depend on backup solutions being able to manipulate C structs. It looks the the solution we are working towards would not require that. Regards, -David
Re: Eager page freeze criteria clarification
Thanks for these notes. On Wed, Oct 11, 2023 at 8:43 PM Andres Freund wrote: > - We also discussed an idea by Robert to track the number of times we need to > dirty a page when unfreezing and to compare that to the number of pages > dirtied overall (IIRC), but I don't think we really came to a conclusion > around that - and I didn't write down anything so this is purely from > memory. See http://postgr.es/m/ca+tgmoycwisxlbl-pxu13oevthloxm20ojqjnrztkkhxsy9...@mail.gmail.com for my on-list discussion of this. > - Attributing "unfreezes" to specific vacuums would be powerful: > > - "Number of pages frozen during vacuum" and "Number of pages unfrozen that > were frozen during the same vacuum" provides numerator / denominator for > an "error rate" I want to highlight the denominator issue here. I think we all have the intuition that if we count the number of times that a recently-frozen page gets unfrozen, and that's a big number, that's bad, and a sign that we need to freeze less aggressively. But a lot of the struggle has been around answering the question "big compared to what". A lot of the obvious candidates fail to behave nicely in corner cases, as discussed in the above email. I think this is one of the better candidates so far proposed, possibly the best. > - This approach could provide "goals" for opportunistic freezing in a > somewhat understandable way. E.g. aiming to rarely unfreeze data that has > been frozen within 1h/1d/... This strikes me as another important point. Making the behavior understandable to users is going to be important, because sometimes whatever system we might craft will misbehave, and then people are going to need to be able to understand why it's misbehaving and how to tune/fix it so it works. > Around this point my laptop unfortunately ran out of battery. Possibly the > attendees of this mini summit also ran out of steam (and tea). When the tea is gone, there's little point in continuing. > I likely mangled this substantially, both when taking notes during the lively > discussion, and when revising them to make them a bit more readable. I think it's quite a good summary, actually. Thanks! -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
Nikita Malakhov writes: > Please advise on the idea of preserving pg_proc oids during pg_upgrade, in > a way like relfilenodes, type id and so on. What are possible downsides of > such a solution? You have the burden of proof backwards. That would add a great deal of new mechanism, and you haven't provided even one reason why it'd be worth doing. regards, tom lane
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
On 10/12/23 09:58, David Steele wrote: On Thu, Oct 12, 2023 at 12:25:34PM +1300, Thomas Munro wrote: I'm planning to push 0002 (retries in frontend programs, which is where this thread began) and 0004 (add missing locks to SQL functions), including back-patches as far as 12, in a day or so. I'll abandon the others for now, since we're now thinking bigger[1] for backups, side stepping the problem. FWIW, 0003 looks like a low-risk improvement seen from here, so I'd be OK to use it at least for now on HEAD before seeing where the other discussions lead. 0004 would be OK if applied to v11, as well, but I also agree that it is not a big deal to let this branch be as it is now at this stage if you feel strongly this way. Agreed on 0002 and 0004, though I don't really think a back patch of 0004 to 11 is necessary. I'd feel differently if there was a single field report of this issue. I would prefer to hold off on applying 0003 to HEAD until we see how [1] pans out. Having said that, I have a hard time seeing [1] as being something we could back patch. The manipulation of backup_label is simple enough, but starting a cluster without pg_control is definitely going to change some things. Also, the requirement that backup software skip copying pg_control after a minor release is not OK. After some more thought, I think we could massage the "pg_control in backup_label" method into something that could be back patched, with more advanced features (e.g. error on backup_label and pg_control both present on initial cluster start) saved for HEAD. Regards, -David
Re: PostgreSQL domains and NOT NULL constraint
Erki Eessaar writes: > PostgreSQL's CREATE DOMAIN documentation (section Notes) describes a way how > one can add NULL's to a column that has a domain with the NOT NULL constraint. > https://www.postgresql.org/docs/current/sql-createdomain.html > To me it seems very strange and amounts to a bug because it defeats the > purpose of domains (to be a reusable assets) and constraints (to avoid any > bypassing of these). I doubt we'd consider doing anything about that. The whole business of domains with NOT NULL constraints is arguably a defect of the SQL standard, because there are multiple ways to produce a value that is NULL and yet must be considered to be of the domain type. The subselect-with-no-output case that you show isn't even the most common one; I'd say that outer joins where there are domain columns on the nullable side are the biggest problem. There's been some discussion of treating the output of such a join, subselect, etc as being of the domain's base type not the domain proper. That'd solve this particular issue since then we'd decide we have to cast the base type back up to the domain type (and hence check its constraints) before inserting the row. But that choice just moves the surprise factor somewhere else, in that queries that used to produce one data type now produce another one. There are applications that this would break. Moreover, I do not think there's any justification for it in the SQL spec. Our general opinion about this is what is stated in the NOTES section of our CREATE DOMAIN reference page [1]: Best practice therefore is to design a domain's constraints so that a null value is allowed, and then to apply column NOT NULL constraints to columns of the domain type as needed, rather than directly to the domain type. regards, tom lane [1] https://www.postgresql.org/docs/current/sql-createdomain.html
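For readers following along, the best practice quoted above boils down to something like this (the domain and table names are made up for illustration):

    -- Leave the domain itself nullable...
    CREATE DOMAIN price AS numeric CHECK (VALUE >= 0);

    -- ...and attach NOT NULL to the columns that need it.
    CREATE TABLE item (
        id   integer PRIMARY KEY,
        cost price NOT NULL
    );

An outer join or empty-subselect result of type price can then still be NULL without violating any domain constraint, while the table column itself keeps rejecting NULLs.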
Re: Lowering the default wal_blocksize to 4K
On Thu, Oct 12, 2023 at 9:57 AM Ants Aasma wrote: > This reminds me that xlp_tli is not being used to its full potential right > now either. We only check that it's not going backwards, but there is at > least one not very hard to hit way to get postgres to silently replay on the > wrong timeline. [1] > > [1] > https://www.postgresql.org/message-id/canwkhkmn3qwacvudzhb6wsvlrtkwebiyso-klfykkqvwuql...@mail.gmail.com Maybe I'm missing something, but that seems mostly unrelated. What you're discussing there is the server's ability to figure out when it ought to perform a timeline switch. In other words, the server settles on the wrong TLI and therefore opens and reads from the wrong filename. But here, we're talking about the case where the server is correct about the TLI and LSN and hence opens exactly the right file on disk, but the contents of the file on disk aren't what they're supposed to be due to a procedural error. Said differently, I don't see how anything we could do with xlp_tli would actually fix the problem discussed in that thread. That can detect a situation where the TLI of the file doesn't match the TLI of the pages inside the file, but it doesn't help with the case where the server decided to read the wrong file in the first place. But this does make me wonder whether storing xlp_tli and xlp_pageaddr in every page is really worth the bit-space. That takes 12 bytes plus any padding it forces us to incur, but the actual entropy content of those 12 bytes must be quite low. In normal cases probably 7 or so of those bytes are going to consist entirely of zero bits (TLI < 256, LSN%8k == 0, LSN < 2^40). We could probably find a way of jumbling the LSN, TLI, and maybe some other stuff into an 8-byte quantity or even perhaps a 4-byte quantity that would do about as good a job catching problems as what we have now (e.g. LSN_HIGH32^LSN_LOW32^BITREVERSE(TLI)). In the event of a mismatch, the value actually stored in the page header would be harder for humans to understand, but I'm not sure that really matters here. Users should mostly be concerned with whether a WAL file matches the cluster where they're trying to replay it; forensics on misplaced or corrupted WAL files should be comparatively rare. -- Robert Haas EDB: http://www.enterprisedb.com
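To make the LSN_HIGH32^LSN_LOW32^BITREVERSE(TLI) idea concrete, here is a minimal sketch; the helper is hypothetical, not an existing macro, and the exact mixing function is of course up for debate:

    /*
     * Hypothetical sketch of a 4-byte "jumbled" page check value combining
     * the page address and timeline id, as floated above.
     */
    static inline uint32
    xlp_jumble(XLogRecPtr pageaddr, TimeLineID tli)
    {
        uint32      lsn_hi = (uint32) (pageaddr >> 32);
        uint32      lsn_lo = (uint32) pageaddr;
        uint32      tli_rev = 0;

        /* bit-reverse the timeline id so its entropy lands in the high bits */
        for (int i = 0; i < 32; i++)
            tli_rev |= ((tli >> i) & 1) << (31 - i);

        return lsn_hi ^ lsn_lo ^ tli_rev;
    }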
Re: logical decoding and replication of sequences, take 2
On 9/20/23 11:53, Dilip Kumar wrote: > On Wed, Aug 16, 2023 at 7:57 PM Tomas Vondra > wrote: >> > > I was reading through 0001, I noticed this comment in > ReorderBufferSequenceIsTransactional() function > > + * To decide if a sequence change should be handled as transactional or > applied > + * immediately, we track (sequence) relfilenodes created by each transaction. > + * We don't know if the current sub-transaction was already assigned to the > + * top-level transaction, so we need to check all transactions. > > It says "We don't know if the current sub-transaction was already > assigned to the top-level transaction, so we need to check all > transactions". But IIRC as part of the steaming of in-progress > transactions we have ensured that whenever we are logging the first > change by any subtransaction we include the top transaction ID in it. > Yeah, that's a stale comment - the actual code only searched through the top-level ones (and thus relying on the immediate assignment). As I wrote in the earlier response, I suspect this code originates from before I added the GetCurrentTransactionId() calls. That being said, I do wonder why with the immediate assignments we still need the bit in ReorderBufferAssignChild that says: /* * We already saw this transaction, but initially added it to the * list of top-level txns. Now that we know it's not top-level, * remove it from there. */ dlist_delete(&subtxn->node); I don't think that affects this patch, but it's a bit confusing. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: logical decoding and replication of sequences, take 2
On 9/13/23 15:18, Ashutosh Bapat wrote: > On Fri, Aug 18, 2023 at 4:28 PM Amit Kapila wrote: >> >> On Fri, Aug 18, 2023 at 10:37 AM Amit Kapila wrote: >>> >>> On Thu, Aug 17, 2023 at 7:13 PM Ashutosh Bapat >>> wrote: On Wed, Aug 16, 2023 at 7:56 PM Tomas Vondra wrote: > >> >> But whether or not that's the case, downstream should not request (and >> hence receive) any changes that have been already applied (and >> committed) downstream as a principle. I think a way to achieve this is >> to update the replorigin_session_origin_lsn so that a sequence change >> applied once is not requested (and hence sent) again. >> > > I guess we could update the origin, per attached 0004. We don't have > timestamp to set replorigin_session_origin_timestamp, but it seems we > don't need that. > > The attached patch merges the earlier improvements, except for the part > that experimented with adding a "fake" transaction (which turned out to > have a number of difficult issues). 0004 looks good to me. >>> >>> >>> + { >>> CommitTransactionCommand(); >>> + >>> + /* >>> + * Update origin state so we don't try applying this sequence >>> + * change in case of crash. >>> + * >>> + * XXX We don't have replorigin_session_origin_timestamp, but we >>> + * can just leave that set to 0. >>> + */ >>> + replorigin_session_origin_lsn = seq.lsn; >>> >>> IIUC, your proposal is to update the replorigin_session_origin_lsn, so >>> that after restart, it doesn't use some prior origin LSN to start with >>> which can in turn lead the sequence to go backward. If so, it should >>> be updated before calling CommitTransactionCommand() as we are doing >>> in apply_handle_commit_internal(). If that is not the intention then >>> it is not clear to me how updating replorigin_session_origin_lsn after >>> commit is helpful. >>> >> >> typedef struct ReplicationState >> { >> ... >> /* >> * Location of the latest commit from the remote side. >> */ >> XLogRecPtrremote_lsn; >> >> This is the variable that will be updated with the value of >> replorigin_session_origin_lsn. This means we will now track some >> arbitrary LSN location of the remote side in this variable. The above >> comment makes me wonder if there is anything we are missing or if it >> is just a matter of updating this comment because before the patch we >> always adhere to what is written in the comment. > > I don't think we are missing anything. This value is used to track the > remote LSN upto which all the commits from upstream have been applied > locally. Since a non-transactional sequence change is like a single > WAL record transaction, it's LSN acts as the LSN of the mini-commit. > So it should be fine to update remote_lsn with sequence WAL record's > end LSN. That's what the patches do. I don't see any hazard. But you > are right, we need to update comments. Here and also at other places > like > replorigin_session_advance() which uses remote_commit as name of the > argument which gets assigned to ReplicationState::remote_lsn. > I agree - updating the replorigin_session_origin_lsn shouldn't break anything. As you write, it's essentially a "mini-commit" and the commit order remains the same. I'm not sure about resetting replorigin_session_origin_timestamp to 0 though. It's not something we rely on very much (it may not correlated with the commit order etc.). But why should we set it to 0? We don't do that for regular commits, right? And IMO it makes sense to just use the timestamp of the last commit before the sequence change. 
FWIW I've left this in a separate commit, but I'll merge that into 0002 in the next patch version. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: logical decoding and replication of sequences, take 2
On 7/25/23 12:20, Amit Kapila wrote: > ... > > I have used the debugger to reproduce this as it needs quite some > coordination. I just wanted to see if the sequence can go backward and > didn't catch up completely before the sequence state is marked > 'ready'. On the publisher side, I created a publication with a table > and a sequence. Then did the following steps: > SELECT nextval('s') FROM generate_series(1,50); > insert into t1 values(1); > SELECT nextval('s') FROM generate_series(51,150); > > Then on the subscriber side with some debugging aid, I could find the > values in the sequence shown in the previous email. Sorry, I haven't > recorded each and every step but, if you think it helps, I can again > try to reproduce it and share the steps. > Amit, can you try to reproduce this backwards movement with the latest version of the patch? I have tried triggering that (mis)behavior, but I haven't been successful so far. I'm hesitant to declare it resolved, as it's dependent on timing etc. and you mentioned it required quite some coordination. Thanks! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Andres Freund writes: >> On 2023-09-25 15:42:26 -0400, Tom Lane wrote: >>> I just did a git bisect run to discover when the failure documented >>> in bug #18130 [1] started. And the answer is commit 82a4edabd. > Uh, huh. The problem is that COPY uses a single BulkInsertState for multiple > partitions. Which to me seems to run counter to the following comment: > *The caller can also provide a BulkInsertState object to optimize many > *insertions into the same relation. This keeps a pin on the current > *insertion target page (to save pin/unpin cycles) and also passes a > *BULKWRITE buffer selection strategy object to the buffer manager. > *Passing NULL for bistate selects the default behavior. > The reason this doesn't cause straight up corruption due to reusing a pin from > another relation is that b1ecb9b3fcfb added ReleaseBulkInsertStatePin() and a > call to it. But I didn't make ReleaseBulkInsertStatePin() reset the bulk > insertion state, which is what leads to the errors from the bug report. > Resetting the relevant BulkInsertState fields fixes the problem. But I'm not > sure that's the right fix. ISTM that independent of whether we fix this via > ReleaseBulkInsertStatePin() resetting the fields or via not reusing > BulkInsertState, we should add assertions defending against future issues like > this (e.g. by adding a relation field to BulkInsertState in cassert builds, > and asserting that the relation is the same as in prior calls unless > ReleaseBulkInsertStatePin() has been called). Ping? We really ought to have a fix for this committed in time for 16.1. regards, tom lane
Re: Eager page freeze criteria clarification
On Wed, Oct 11, 2023 at 8:43 PM Andres Freund wrote: > > Robert, Melanie and I spent an evening discussing this topic around > pgconf.nyc. Here are, mildly revised, notes from that: Thanks for taking notes! > The main thing we are worried about is repeated freezing / unfreezing of > pages within a relatively short time period. > > - Computing an average "modification distance" as I (Andres) proposed efor > each page is complicated / "fuzzy" > > The main problem is that it's not clear how to come up with a good number > for workloads that have many more inserts into new pages than modifications > of existing pages. > > It's also hard to use average for this kind of thing, e.g. in cases where > new pages are frequently updated, but also some old data is updated, it's > easy for the updates to the old data to completely skew the average, even > though that shouldn't prevent us from freezing. > > - We also discussed an idea by Robert to track the number of times we need to > dirty a page when unfreezing and to compare that to the number of pages > dirtied overall (IIRC), but I don't think we really came to a conclusion > around that - and I didn't write down anything so this is purely from > memory. I was under the impression that we decided we still had to consider the number of clean pages dirtied as well as the number of pages unfrozen. The number of pages frozen and unfrozen over a time period gives us some idea of if we are freezing the wrong pages -- but it doesn't tell us if we are freezing the right pages. A riff on an earlier example by Robert: While vacuuming a relation, we freeze 100 pages. During the same time period, we modify 1,000,000 previously clean pages. Of these 1,000,000 pages modified, 90 were frozen. So we unfroze 90% of the pages frozen during this time. Does this mean we should back off of trying to freeze any pages in the relation? > A rough sketch of a freezing heuristic: ... > - Attributing "unfreezes" to specific vacuums would be powerful: > > - "Number of pages frozen during vacuum" and "Number of pages unfrozen that > were frozen during the same vacuum" provides numerator / denominator for > an "error rate" > > - We can perform this attribution by comparing the page LSN with recorded > start/end LSNs of recent vacuums While implementing a rough sketch of this, I realized I had a question about this. vacuum 1 starts at lsn 10 and ends at lsn 200. It froze 100 pages. vacuum 2 then starts at lsn 600. 5 frozen pages with page lsn > 10 and < 200 were updated. We count those in vacuum 1's stats. 3 frozen pages with page lsn > 200 and < 600 were updated. Do we count those somewhere? > - This approach could provide "goals" for opportunistic freezing in a > somewhat understandable way. E.g. aiming to rarely unfreeze data that has > been frozen within 1h/1d/... Similar to the above question, if we are tracking pages frozen and unfrozen during a time period, if there are many vacuums in quick succession, we might care if a page was frozen by one vacuum and then unfrozen during a subsequent vacuum if not too much time has passed. - Melanie
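To make the attribution question above a bit more concrete, here is a rough sketch of the bookkeeping under discussion. The struct and function are hypothetical, purely to illustrate the LSN-range lookup; pages whose LSN falls between two tracked vacuums, like the 3 pages in the example, would land in the fall-through branch:

    /* Hypothetical bookkeeping for "unfreezes" attributed to recent vacuums. */
    typedef struct RecentVacuumStats
    {
        XLogRecPtr  start_lsn;      /* LSN when the vacuum started */
        XLogRecPtr  end_lsn;        /* LSN when the vacuum ended */
        uint64      pages_frozen;   /* pages frozen by this vacuum */
        uint64      pages_unfrozen; /* of those, pages later unfrozen */
    } RecentVacuumStats;

    static void
    attribute_unfreeze(RecentVacuumStats *vacuums, int nvacuums,
                       XLogRecPtr page_lsn)
    {
        for (int i = 0; i < nvacuums; i++)
        {
            if (page_lsn >= vacuums[i].start_lsn &&
                page_lsn < vacuums[i].end_lsn)
            {
                vacuums[i].pages_unfrozen++;
                return;
            }
        }
        /* frozen outside any tracked vacuum: unattributed */
    }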
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
On Wed, Oct 4, 2023 at 8:07 AM Peter Geoghegan wrote: > If you're willing to take over as committer here, I'll let the issue > of backpatching go. > > I only ask that you note why you've not backpatched in the commit message. Will do, but see also the last point below. I have looked over these patches in some detail and here are my thoughts: - I find the use of the word "generate" in error messages slightly odd. I think it's reasonable given the existing precedent, but the word I would have picked is "assign", which I see is what Aleksander actually had in v1. How would people feel about changing the two existing messages that say "database is not accepting commands that generate new MultiXactIds to avoid wraparound data loss ..." to use "assign" instead, and then make the new messages match that? - I think that 0002 needs a bit of wordsmithing. I will work on that. In particular, I don't like this sentence: "It increases downtime, makes monitoring impossible, disables replication, bypasses safeguards against wraparound, etc." While there's nothing untrue there, it feels more like a sentence from a pgsql-hackers email where most people participating in the discussion understand the general contours of the problem already than like polished documentation that really lays things out methodically. - I'm somewhat inclined to have a go at restructuring these patches a bit so that some of the documentation changes can potentially be back-patched without back-patching the message changes. Even if we eventually decide to back-patch everything or nothing, there are wording adjustments spread across all 3 patches that seem somewhat independent of the changes to the server messages. I think it would be clearer to have one patch that is mostly about documentation wording changes, and a second one that is about changing the server messages and then making documentation changes that are directly dependent on those message changes. And I might also be inclined to back-patch the former patch as far as it makes sense to do so, while leaving the latter one master-only. Comments? -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
On Thu, Oct 12, 2023 at 8:54 AM Robert Haas wrote: > - I find the use of the word "generate" in error messages slightly > odd. I think it's reasonable given the existing precedent, but the > word I would have picked is "assign", which I see is what Aleksander > actually had in v1. How would people feel about changing the two > existing messages that say "database is not accepting commands that > generate new MultiXactIds to avoid wraparound data loss ..." to use > "assign" instead, and then make the new messages match that? WFM. > - I think that 0002 needs a bit of wordsmithing. I will work on that. > In particular, I don't like this sentence: "It increases downtime, > makes monitoring impossible, disables replication, bypasses safeguards > against wraparound, etc." While there's nothing untrue there, it feels > more like a sentence from a pgsql-hackers email where most people > participating in the discussion understand the general contours of the > problem already than like polished documentation that really lays > things out methodically. I agree. > - I'm somewhat inclined to have a go at restructuring these patches a > bit so that some of the documentation changes can potentially be > back-patched without back-patching the message changes. Even if we > eventually decide to back-patch everything or nothing, there are > wording adjustments spread across all 3 patches that seem somewhat > independent of the changes to the server messages. I think it would be > clearer to have one patch that is mostly about documentation wording > changes, and a second one that is about changing the server messages > and then making documentation changes that are directly dependent on > those message changes. And I might also be inclined to back-patch the > former patch as far as it makes sense to do so, while leaving the > latter one master-only. No objections from me. -- Peter Geoghegan
Re: Parent/child context relation in pg_get_backend_memory_contexts()
Hi, On 2023-08-04 21:16:49 +0300, Melih Mutlu wrote: > Melih Mutlu , 16 Haz 2023 Cum, 17:03 tarihinde şunu > yazdı: > > > With this change, here's a query to find how much space used by each > > context including its children: > > > > > WITH RECURSIVE cte AS ( > > > SELECT id, total_bytes, id as root, name as root_name > > > FROM memory_contexts > > > UNION ALL > > > SELECT r.id, r.total_bytes, cte.root, cte.root_name > > > FROM memory_contexts r > > > INNER JOIN cte ON r.parent_id = cte.id > > > ), > > > memory_contexts AS ( > > > SELECT * FROM pg_backend_memory_contexts > > > ) > > > SELECT root as id, root_name as name, sum(total_bytes) > > > FROM cte > > > GROUP BY root, root_name > > > ORDER BY sum DESC; > > > > Given that the above query to get total bytes including all children is > still a complex one, I decided to add an additional info in > pg_backend_memory_contexts. > The new "path" field displays an integer array that consists of ids of all > parents for the current context. This way it's easier to tell whether a > context is a child of another context, and we don't need to use recursive > queries to get this info. I think that does make it a good bit easier. Both to understand and to use. > Here how pg_backend_memory_contexts would look like with this patch: > > postgres=# SELECT name, id, parent, parent_id, path > FROM pg_backend_memory_contexts > ORDER BY total_bytes DESC LIMIT 10; > name | id | parent | parent_id | path > -+-+--+---+-- > CacheMemoryContext | 27 | TopMemoryContext | 0 | {0} > Timezones | 124 | TopMemoryContext | 0 | {0} > TopMemoryContext| 0 | | | > MessageContext | 8 | TopMemoryContext | 0 | {0} > WAL record construction | 118 | TopMemoryContext | 0 | {0} > ExecutorState | 18 | PortalContext|17 | {0,16,17} > TupleSort main | 19 | ExecutorState|18 | {0,16,17,18} > TransactionAbortContext | 14 | TopMemoryContext | 0 | {0} > smgr relation table | 10 | TopMemoryContext | 0 | {0} > GUC hash table | 123 | GUCMemoryContext | 122 | {0,122} > (10 rows) Would we still need the parent_id column? > + > + > + > + context_id int4 > + > + > + Current context id > + > + I think the docs here need to warn that the id is ephemeral and will likely differ in the next invocation. > + > + > + parent_id int4 > + > + > + Parent context id > + > + > + > + > + > + path int4 > + > + > + Path to reach the current context from TopMemoryContext > + > + Perhaps we should include some hint here how it could be used? > > > > diff --git a/src/backend/utils/adt/mcxtfuncs.c > b/src/backend/utils/adt/mcxtfuncs.c > index 92ca5b2f72..81cb35dd47 100644 > --- a/src/backend/utils/adt/mcxtfuncs.c > +++ b/src/backend/utils/adt/mcxtfuncs.c > @@ -20,6 +20,7 @@ > #include "mb/pg_wchar.h" > #include "storage/proc.h" > #include "storage/procarray.h" > +#include "utils/array.h" > #include "utils/builtins.h" > > /* -- > @@ -28,6 +29,8 @@ > */ > #define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE1024 > > +static Datum convert_path_to_datum(List *path); > + > /* > * PutMemoryContextsStatsTupleStore > * One recursion level for pg_get_backend_memory_contexts. 
> @@ -35,9 +38,10 @@ > static void > PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, >TupleDesc > tupdesc, MemoryContext context, > - const char > *parent, int level) > + const char > *parent, int level, int *context_id, > + int parent_id, > List *path) > { > -#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 9 > +#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS 12 > > Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; > boolnulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; > @@ -45,6 +49,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, > MemoryContext child; > const char *name; > const char *ident; > + int current_context_id = (*context_id)++; > > Assert(MemoryContextIsValid(context)); > > @@ -103,13 +108,29 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate > *tupstore, > values[6] = Int64GetDatum(stat.freespace); > values[7] = Int64GetDatum(stat.freechunks); > values[8] = Int64GetDatum(stat.totalspace - stat.freespace); > + values[9] = Int32GetDatum(current_context_id); > + > + if(parent_id < 0) > +
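As a usage note on the proposed path column: with it, the total for a context including all of its children no longer needs a recursive CTE. A sketch against the patched view, using ExecutorState (id 18) from the example output earlier in the thread (bearing in mind the ids are ephemeral, as noted above):

    SELECT sum(total_bytes) AS total_bytes_with_children
    FROM pg_backend_memory_contexts
    WHERE id = 18
       OR 18 = ANY(path);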
Re: Performance degradation on concurrent COPY into a single relation in PG16.
Hi, On 2023-10-12 11:44:09 -0400, Tom Lane wrote: > Andres Freund writes: > >> On 2023-09-25 15:42:26 -0400, Tom Lane wrote: > >>> I just did a git bisect run to discover when the failure documented > >>> in bug #18130 [1] started. And the answer is commit 82a4edabd. > > > Uh, huh. The problem is that COPY uses a single BulkInsertState for > > multiple > > partitions. Which to me seems to run counter to the following comment: > > * The caller can also provide a BulkInsertState object to optimize many > > * insertions into the same relation. This keeps a pin on the current > > * insertion target page (to save pin/unpin cycles) and also passes a > > * BULKWRITE buffer selection strategy object to the buffer manager. > > * Passing NULL for bistate selects the default behavior. > > > The reason this doesn't cause straight up corruption due to reusing a pin > > from > > another relation is that b1ecb9b3fcfb added ReleaseBulkInsertStatePin() and > > a > > call to it. But I didn't make ReleaseBulkInsertStatePin() reset the bulk > > insertion state, which is what leads to the errors from the bug report. > > > Resetting the relevant BulkInsertState fields fixes the problem. But I'm not > > sure that's the right fix. ISTM that independent of whether we fix this via > > ReleaseBulkInsertStatePin() resetting the fields or via not reusing > > BulkInsertState, we should add assertions defending against future issues > > like > > this (e.g. by adding a relation field to BulkInsertState in cassert builds, > > and asserting that the relation is the same as in prior calls unless > > ReleaseBulkInsertStatePin() has been called). > > Ping? We really ought to have a fix for this committed in time for > 16.1. I kind of had hoped somebody would comment on the approach. Given that nobody has, I'll push the minimal fix of resetting the state in ReleaseBulkInsertStatePin(), even though I think architecturally that's not great. Greetings, Andres Freund
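For reference, the minimal fix described above would look roughly like this. ReleaseBulkInsertStatePin() already exists in heapam.c, but the exact body below is a sketch, not the committed change:

    void
    ReleaseBulkInsertStatePin(BulkInsertState bistate)
    {
        if (bistate->current_buf != InvalidBuffer)
            ReleaseBuffer(bistate->current_buf);

        /*
         * Also forget the buffer, so that the next insertion (possibly into
         * a different partition) does not reuse a pin and target page
         * belonging to another relation.
         */
        bistate->current_buf = InvalidBuffer;
    }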
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
Hi! Say we have data processed by some user function and we want to keep a reference to this function in our data. In this case we have two options: the first is to store the string output of regprocedure, which is not very convenient, and the second is to store its OID, which requires a slight modification of pg_upgrade (pg_dump and the function/procedure creation path). I've read the previous threads about using regproc, and I agree that this is not a very good use case anyway, but I haven't found any serious obstacles that would forbid modifying pg_upgrade this way. -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
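For illustration, the "string output" option mentioned above can lean on the regprocedure cast, which survives dump/restore and pg_upgrade because it resolves by signature rather than by OID (the function here is chosen arbitrarily):

    -- store the textual signature rather than the oid
    SELECT 'lower(text)'::regprocedure::text;   -- 'lower(text)'

    -- resolve it back to the (current) pg_proc row when needed
    SELECT oid, proname
    FROM pg_proc
    WHERE oid = 'lower(text)'::regprocedure;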
Re: Separate memory contexts for relcache and catcache
Hi, On 2023-08-09 15:02:31 +0300, Melih Mutlu wrote: > To quickly show how pg_backend_memory_contexts would look like, I did the > following: > > -Create some tables: > SELECT 'BEGIN;' UNION ALL SELECT format('CREATE TABLE %1$s(id serial > primary key, data text not null unique)', 'test_'||g.i) FROM > generate_series(0, 1000) g(i) UNION ALL SELECT 'COMMIT;';\gexec > > -Open a new connection and query pg_backend_memory_contexts [1]: > This is what you'll see before and after the patch. > -- HEAD: > name| used_bytes | free_bytes | total_bytes > +++- > CacheMemoryContext | 467656 | 56632 | 524288 > index info | 111760 | 46960 | 158720 > relation rules | 4416 | 3776 |8192 > (3 rows) > > -- Patch: > name | used_bytes | free_bytes | total_bytes > ---+++- > CatCacheMemoryContext | 217696 | 8 | 262144 > RelCacheMemoryContext | 248264 | 13880 | 262144 > index info| 111760 | 46960 | 158720 > CacheMemoryContext| 2336 | 5856 |8192 > relation rules| 4416 | 3776 |8192 > (5 rows) Have you checked what the source of the remaining allocations in CacheMemoryContext are? One thing that I had observed previously and reproduced with this patch, is that the first backend starting after a restart uses considerably more memory: first: ┌───┬┬┬─┐ │ name │ used_bytes │ free_bytes │ total_bytes │ ├───┼┼┼─┤ │ CatCacheMemoryContext │ 370112 │ 154176 │ 524288 │ │ RelCacheMemoryContext │ 244136 │ 18008 │ 262144 │ │ index info│ 104392 │ 45112 │ 149504 │ │ CacheMemoryContext│ 2304 │ 5888 │8192 │ │ relation rules│ 3856 │240 │4096 │ └───┴┴┴─┘ second: ┌───┬┬┬─┐ │ name │ used_bytes │ free_bytes │ total_bytes │ ├───┼┼┼─┤ │ CatCacheMemoryContext │ 215072 │ 47072 │ 262144 │ │ RelCacheMemoryContext │ 243856 │ 18288 │ 262144 │ │ index info│ 104944 │ 47632 │ 152576 │ │ CacheMemoryContext│ 2304 │ 5888 │8192 │ │ relation rules│ 3856 │240 │4096 │ └───┴┴┴─┘ This isn't caused by this patch, but it does make it easier to pinpoint than before. The reason is fairly simple: On the first start we start without being able to use relcache init files, in later starts we can. The reason the size increase is in CatCacheMemoryContext, rather than RelCacheMemoryContext, is simple: When using the init file the catcache isn't used, when not, we have to query the catcache a lot to build the initial relcache contents. Given the size of both CatCacheMemoryContext and RelCacheMemoryContext in a new backend, I think it might be worth using non-default aset parameters. A bit ridiculous to increase block sizes from 8k upwards in every single connection made to postgres ever. > - Run select on all tables > SELECT format('SELECT count(*) FROM %1$s', 'test_'||g.i) FROM > generate_series(0, 1000) g(i);\gexec > > - Then check pg_backend_memory_contexts [1] again: > --HEAD > name| used_bytes | free_bytes | total_bytes > +++- > CacheMemoryContext |8197344 | 257056 | 8454400 > index info |2102160 | 113776 | 2215936 > relation rules | 4416 | 3776 |8192 > (3 rows) > > --Patch > name | used_bytes | free_bytes | total_bytes > ---+++- > RelCacheMemoryContext |4706464 |3682144 | 8388608 > CatCacheMemoryContext |3489384 | 770712 | 4260096 > index info|2102160 | 113776 | 2215936 > CacheMemoryContext| 2336 | 5856 |8192 > relation rules| 4416 | 3776 |8192 > (5 rows) > > You can see that CacheMemoryContext does not use much memory without > catcache and relcache (at least in cases similar to above), and it's easy > to bloat catcache and relcache. That's why I think it would be useful to > see their usage separately. Yes, I think it'd be quite useful. 
There's ways to bloat particularly catcache much further, and it's hard to differentiate that from other sources of bloat right now. > +static void > +CreateCatCacheMemoryContext() We typically use (void) to differentiate from an
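To illustrate the "non-default aset parameters" point, a sketch of what creating the proposed context with a larger initial block size could look like; the context name comes from the patch, and the 64 kB figure is purely an assumption for illustration:

    /*
     * Sketch only: start the catcache context with bigger blocks instead of
     * growing from 8 kB upwards in every backend.
     */
    CatCacheMemoryContext =
        AllocSetContextCreate(TopMemoryContext,
                              "CatCacheMemoryContext",
                              0,                /* minContextSize */
                              64 * 1024,        /* initBlockSize (assumption) */
                              ALLOCSET_DEFAULT_MAXSIZE);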
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 9:57 AM Nikita Malakhov wrote: > Say, we have data processed by some user function and we want to keep > reference to this function > in our data. > Then you need to keep the user-visible identifier of said function (schema+name+input argument types - you'd probably want to incorporate version into the name) in your user-space code. Exposing runtime generated oids to user-space is not something I can imagine the system supporting. It goes against the very definition of "implementation detail" that user-space code is not supposed to depend upon. David J.
Re: Eager page freeze criteria clarification
On Thu, Oct 12, 2023 at 11:50 AM Melanie Plageman wrote: > I was under the impression that we decided we still had to consider > the number of clean pages dirtied as well as the number of pages > unfrozen. The number of pages frozen and unfrozen over a time period > gives us some idea of if we are freezing the wrong pages -- but it > doesn't tell us if we are freezing the right pages. A riff on an > earlier example by Robert: > > While vacuuming a relation, we freeze 100 pages. During the same time > period, we modify 1,000,000 previously clean pages. Of these 1,000,000 > pages modified, 90 were frozen. So we unfroze 90% of the pages frozen > during this time. Does this mean we should back off of trying to > freeze any pages in the relation? I didn't think we decided the thing your first sentence says you thought we decided ... but I also didn't think of this example. That said, it might be fine to back off freezing in this case because we weren't doing enough of it to make any real difference in the first place. Maybe there's a more moderate example where it feels like a bigger problem? > > - "Number of pages frozen during vacuum" and "Number of pages unfrozen > > that > > were frozen during the same vacuum" provides numerator / denominator for > > an "error rate" > > > > - We can perform this attribution by comparing the page LSN with recorded > > start/end LSNs of recent vacuums > > While implementing a rough sketch of this, I realized I had a question > about this. > > vacuum 1 starts at lsn 10 and ends at lsn 200. It froze 100 pages. > vacuum 2 then starts at lsn 600. > 5 frozen pages with page lsn > 10 and < 200 were updated. We count > those in vacuum 1's stats. 3 frozen pages with page lsn > 200 and < > 600 were updated. Do we count those somewhere? How did those pages get frozen when no VACUUM was running? The LSN of the frozen page just prior to unfreezing it should be from the operation that froze it, which should be some VACUUM. I think the case you're talking about could happen if we did on-access freezing, but today I believe we don't. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Wait events for delayed checkpoints
On Wed, Oct 11, 2023 at 9:13 PM Thomas Munro wrote: > You can't tell if your checkpointer is spending a lot of time waiting > around for flags in delayChkptFlags to clear. Trivial patch to add > that. I've managed to see it a few times when checkpointing > repeatedly with a heavy pgbench workload. > > I had to stop and think for a moment about whether these events belong > under "WaitEventIPC", "waiting for notification from another process" > or under "WaitEventTimeout", "waiting for a timeout to expire". I > mean, both? It's using sleep-and-poll instead of (say) a CV due to > the economics, we want to make the other side as cheap as possible, so > we don't care about making the checkpointer take some micro-naps in > this case. I feel like the key point here is that it's waiting for > another process to do stuff and unblock it. IPC seems right to me. Yeah, a timeout is being used, but as you say, that's an implementation detail. +1 for the idea, too. -- Robert Haas EDB: http://www.enterprisedb.com
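For context, the change amounts to wrapping the existing sleep-and-poll loop in CreateCheckPoint() with a wait event report, roughly like the sketch below; the event name is a guess at what the patch uses and may differ:

    /* Sketch: report an IPC wait event around the delayChkptFlags poll loop. */
    vxids = GetVirtualXIDsDelayingChkpt(&nvxids, DELAY_CHKPT_START);
    if (nvxids > 0)
    {
        pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
        do
        {
            pg_usleep(10000L);      /* wait a little and re-check the flags */
        } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START));
        pgstat_report_wait_end();
    }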
Re: On login trigger: take three
On Tue, Oct 10, 2023 at 3:43 PM Alexander Korotkov wrote: > Yep, in v43 it worked that way. One transaction has to wait for > another finishing update of pg_database tuple, then fails. This is > obviously ridiculous. Since overlapping setters of flag will have to > wait anyway, I changed lock mode in v44 for them to > AccessExclusiveLock. Now, waiting transaction then sees the updated > tuple and doesn't fail. Doesn't that mean that if you create the first login trigger in a database and leave the transaction open, nobody can connect to that database until the transaction ends? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Special-case executor expression steps for common combinations
Hi, On 2023-10-12 13:24:27 +0300, Heikki Linnakangas wrote: > On 12/10/2023 12:48, Daniel Gustafsson wrote: > > The attached patch adds special-case expression steps for common sets of > > steps > > in the executor to shave a few cycles off during execution, and make the JIT > > generated code simpler. > > > > * Adds EEOP_FUNCEXPR_STRICT_1 and EEOP_FUNCEXPR_STRICT_2 for function calls > > of > >strict functions with 1 or 2 arguments (EEOP_FUNCEXPR_STRICT remains > > used for > >> 2 arguments). > > * Adds EEOP_AGG_STRICT_INPUT_CHECK_ARGS_1 which is a special case for the > >common case of one arg aggs. > > Are these relevant when JITting? I'm a little sad if the JIT compiler cannot > unroll these on its own. Is there something we could do to hint it, so that > it could treat the number of arguments as a constant? I think it's mainly important for interpreted execution. > >skip extra setup for steps which are only interested in the side effects. > > I'm a little surprised if this makes a measurable performance difference, > but sure, why not. It seems nice to be more explicit when you don't expect a > return value. IIRC this is more interesting for JIT than the above, because it allows LLVM to know that the return value isn't needed and thus doesn't need to be computed. Greetings, Andres Freund
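For readers who have not looked at the patch, the one-argument special case boils down to a dedicated interpreter opcode along these lines; this is a sketch in the style of execExprInterp.c, not the patch text itself:

    EEO_CASE(EEOP_FUNCEXPR_STRICT_1)
    {
        FunctionCallInfo fcinfo = op->d.func.fcinfo_data;

        /* strict function with exactly one argument: test it directly */
        if (fcinfo->args[0].isnull)
            *op->resnull = true;
        else
        {
            fcinfo->isnull = false;
            *op->resvalue = op->d.func.fn_addr(fcinfo);
            *op->resnull = fcinfo->isnull;
        }

        EEO_NEXT();
    }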
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 7:36 AM Tom Lane wrote: > Nikita Malakhov writes: > > Please advise on the idea of preserving pg_proc oids during pg_upgrade, > in > > a way like relfilenodes, type id and so on. What are possible downsides > of > > such a solution? > > You have the burden of proof backwards. That would add a great deal > of new mechanism, and you haven't provided even one reason why it'd > be worth doing. > > I was curious about the comment regarding type oids being copied over and I found the commentary in pg_upgrade.c that describes which oids are copied over and why, but the IMPLEMENTATION seems to be out-of-sync with the actual implementation. """ It preserves the relfilenode numbers so TOAST and other references to relfilenodes in user data is preserved. (See binary-upgrade usage in pg_dump). We choose to preserve tablespace and database OIDs as well. """ David J.
Re: New WAL record to detect the checkpoint redo location
On Thu, Oct 12, 2023 at 3:27 AM Michael Paquier wrote: > I have looked at 0001, for now.. And it looks OK to me. Cool. I've committed that one. Thanks for the review. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 10:35 AM Tom Lane wrote: > You have the burden of proof backwards. That would add a great deal > of new mechanism, and you haven't provided even one reason why it'd > be worth doing. "A great deal of new mechanism" seems like a slight exaggeration. We preserve a bunch of kinds of OIDs already, and it wouldn't be any harder to preserve this one than the ones we preserve already, or so I think. So it would be some additional mechanism, but maybe not a great deal. As to whether it's a good idea, it isn't necessary for the system to operate properly, so we didn't, but it's a judgement call whether it's better for other reasons, like being able to have regprocedure columns survive an upgrade, or making users being less confused, or allowing people supporting PostgreSQL having an easier time debugging issues. Personally, I've never been quite sure we made the right decision there. I admit that I'm not particularly keen to try to add the amount of mechanism that would be required to preserve every single OID everywhere, but I also somehow feel like the fact that we don't is pretty weird. The pg_upgrade experience right now is a bit as if you woke up in the morning and found that city officials came by during the night and renumbered your house, thus changing your address. Then, they sent change of address forms to everyone who ever mails you anything, plus updated your address with your doctor's office and your children's school. In a way, there's no problem: nothing has really changed for you in any way that matters. Yet, I think that would feel pretty uncomfortable if it actually happened to you, and I think the pg_upgrade experience is uncomfortable in the same way. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila wrote: > Now, as mentioned in the first paragraph, it seems we anyway don't > need to reset the WAL at the end when setting the next OID for the new > cluster with the -o option. If that is true, then I think even without > slots work it will be helpful to have such an option in pg_resetwal. > > Thoughts? I wonder if we should instead provide a way to reset the OID counter with a function call inside the database, gated by IsBinaryUpgrade. Having something like pg_resetwal --but-dont-actually-reset-the-wal seems both self-contradictory and vulnerable to abuse that we might be better off not inviting. -- Robert Haas EDB: http://www.enterprisedb.com
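A very rough sketch of what such an in-database, upgrade-only function might look like, modeled on the existing binary_upgrade_* support functions; both the function name and the counter-setting helper are hypothetical:

    Datum
    binary_upgrade_set_next_oid(PG_FUNCTION_ARGS)
    {
        Oid         nextoid = PG_GETARG_OID(0);

        CHECK_IS_BINARY_UPGRADE;    /* errors out unless in -b mode */

        /*
         * Hypothetical helper: advance the cluster-wide OID counter to
         * nextoid.  No such setter is exposed today; this only shows the
         * IsBinaryUpgrade gating.
         */
        AdvanceNextObjectIdTo(nextoid);

        PG_RETURN_VOID();
    }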
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023, 11:21 Robert Haas wrote: > > The pg_upgrade experience right now is a bit as if you woke up in the > morning and found that city officials came by during the night and > renumbered your house, thus changing your address. Then, they sent > change of address forms to everyone who ever mails you anything, plus > updated your address with your doctor's office and your children's > school. In a way, there's no problem: nothing has really changed for > you in any way that matters. Yet, I think that would feel pretty > uncomfortable if it actually happened to you, and I think the > pg_upgrade experience is uncomfortable in the same way. > It's more like a lot number or surveying tract than a postal address. Useful for a single party, the builder or the government, but not something you give out to other people so they can find you. Whether or not we copy over oids should be decided based upon our internal needs, not end users'. Which is why the few that do get copied exist: because we store them in internal files that we want to copy as part of the upgrade. It also isn't like pg_dump/restore is going to retain them, and the less divergence between that and pg_upgrade, arguably the better. David J.
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 2:38 PM David G. Johnston wrote: > It's more like a lot number or surveying tract than an postal address. > Useful for a single party, the builder or the government, but not something > you give out to other people so they can find you. > > Whether or not we copy over oids should be done based upon our internal > needs, not end users. Which is why the fee that do get copied exists, > because we store them in internal files that we want to copy as part of the > upgrade. It also isn't like pg_dump/restore is going to retain them and the > less divergence between that and pg_upgrade arguably the better. We build the product for the end users. Their desires and needs are relevant. And if they're telling us we did it wrong, we need to listen to that. We don't have to do everything that everybody wants, but treating developer needs as strictly more important than end-user needs is self-defeating. I agree that there's a trade-off here. Preserving more OIDs requires more code and makes pg_dump and other things more complicated, which is not great. But, at least to me, arguing that there are no downsides of not preserving these OIDs is simply not a believable argument. Well, maybe somebody believes it. But I don't. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 11:43 AM Robert Haas wrote: > On Thu, Oct 12, 2023 at 2:38 PM David G. Johnston > wrote: > > It's more like a lot number or surveying tract than an postal address. > Useful for a single party, the builder or the government, but not something > you give out to other people so they can find you. > > > > Whether or not we copy over oids should be done based upon our internal > needs, not end users. Which is why the fee that do get copied exists, > because we store them in internal files that we want to copy as part of the > upgrade. It also isn't like pg_dump/restore is going to retain them and > the less divergence between that and pg_upgrade arguably the better. > > We build the product for the end users. Their desires and needs are > relevant. And if they're telling us we did it wrong, we need to listen > to that. We don't have to do everything that everybody wants, but > treating developer needs as strictly more important than end-user > needs is self-defeating. > Every catalog has both a natural and a surrogate key. Developers get to use the surrogate key while end-users get to use the natural one (i.e., the one they provided). I see no reason to change that specification. And I do believe there are no compelling reasons for an end-user to need to use the surrogate key instead of the natural one. The example provided by the OP isn't one, IMO, the overall goal can be accomplished via the natural key (if it cannot, maybe we need to make retrieving the natural key for a pg_proc record given an OID easier). The fact that OIDs are not even accessible via SQL further reinforces this belief. The only reason to need OIDs as a DBA is to perform joins among the catalogs and all such joins are local to the database and even session executing them - the specific values are immaterial. The behavior of pg_upgrade only preserving OIDs that are necessary due to the physical copying of data files from the old server to the new one seems sufficient both in terms of effort and the principle of doing the minimum amount to solve the problem at hand. David J.
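To make the "retrieving the natural key for a pg_proc record given an OID" point concrete, a minimal sketch of what is already possible today (lower() is just an arbitrary built-in standing in for a user function; no new mechanism is assumed):

-- regprocedure renders the natural key (schema-qualified name plus
-- argument types) for a pg_proc row, given only its OID.
SELECT p.oid, p.oid::regprocedure AS natural_key
FROM pg_proc AS p
WHERE p.proname = 'lower';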
building 32bit windows version
Greetings, I've been running into challenges building a 32-bit Windows version. I suspect there are no build farms for this and nobody really builds it. The reason I need these is to be able to build 32-bit DLLs for ODBC. At one time EDB used to provide binaries, but that no longer appears to be the case. Running build.bat in an x86 environment fails, but that can be easily fixed by adding $ENV{CONFIG}="x86"; in buildenv.pl. Building postgres then works as advertised; however, install fails with "Copying build output files...Could not copy release\zic\zic.exe to postgres\bin\zic.exe" Apparently 32-bit DLLs are required. If there is an easier way to get libpq.dll and the include files for building, I'm all ears. Dave Cramer
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
On Thu, Oct 12, 2023 at 12:01 PM Peter Geoghegan wrote: > No objections from me. Here is a doc-only patch that I think could be back-patched as far as emergency mode exists. It combines all of the wording changes to the documentation from v1-v3 of the previous version, but without changing the message text that is quoted in the documentation, and without adding more instances of similar message texts to the documentation, and with a bunch of additional hacking by me. Some things I changed: - I made it so that the MXID section refers back to the XID section instead of duplicating it, but with a short list of differences. - I weakened the existing claim that says you must be a superuser or VACUUM definitely won't fix it to say instead that you SHOULD run VACUUM as the superuser, because the former is false and the latter is true. - I made the list of steps for recovering more explicit. - I split out the bit about running autovacuum in the affected database into a separate step to be performed after VACUUM for continued good operation, rather than a necessary ingredient in recovery, because it isn't. - A bit of other minor rejiggering. I'm not forgetting about the rest of the proposed patch set, or the change I proposed earlier. I'm just posting this much now because this is how far I got today, and it would be useful to get comments before I go further. I think the residual portion of the patch set not included in this documentation patch will be quite small, and I think that's a good thing, but again, I don't intend to blow that off. -- Robert Haas EDB: http://www.enterprisedb.com v10-0001-Update-the-documentation-on-recovering-from-M-XI.patch Description: Binary data
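As a possible companion to the recovery steps, a rough sketch of the sort of monitoring query the text could point readers at for judging how close each database actually is (autovacuum_freeze_max_age defaults to 200 million, and the system stops assigning new XIDs at roughly 2 billion of XID age):

-- Age of the oldest unfrozen XID and MXID per database.
SELECT datname,
       age(datfrozenxid)    AS xid_age,
       mxid_age(datminmxid) AS mxid_age
FROM pg_database
ORDER BY xid_age DESC;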
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 3:36 PM David G. Johnston wrote: > Every catalog has both a natural and a surrogate key. Developers get to use > the surrogate key while end-users get to use the natural one (i.e., the one > they provided). I see no reason to change that specification. I agree with this. > And I do believe there are no compelling reasons for an end-user to need to > use the surrogate key instead of the natural one. But I disagree with this. > The example provided by the OP isn't one, IMO, the overall goal can be > accomplished via the natural key (if it cannot, maybe we need to make > retrieving the natural key for a pg_proc record given an OID easier). The > fact that OIDs are not even accessible via SQL further reinforces this > belief. The only reason to need OIDs as a DBA is to perform joins among the > catalogs and all such joins are local to the database and even session > executing them - the specific values are immaterial. This just all seems very simplistic to me. In theory it's true, but in practice it isn't. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
Hi, I've already implemented preserving PG_PROC oids during pg_upgrade in the same way as relfilenodes, etc. Actually, it is quite simple, and at first look there are no problems. About using the surrogate key: this feature is more for data generated by the DBMS itself, i.e. data processed by some extension and saved and re-processed automatically or at the user's request, but without bothering the user with these internal keys. The main question: are there pitfalls of which I am not aware? Thanks for your replies! -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 1:31 PM Nikita Malakhov wrote: > About using surrogate key - this feature is more for data generated by > the DBMS itself, i.e. data processed by some extension and saved > and re-processed automatically or by user's request, but without bothering > user with these internal keys. > Then what does it matter whether you spell it: 12345 or my_ext.do_something(int) ? Why do you require us to redefine the scope for which pg_proc.oid is useful in order to implement this behavior? Your extension breaks if your user uses logical backups or we otherwise get into a position where pg_upgrade cannot be used to migrate in the future. Is avoiding the textual representation so necessary that you need to add another dependency to the system? That just seems unwise regardless of how easy it may be to accomplish. David J.
Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound
On Thu, Oct 12, 2023 at 1:10 PM Robert Haas wrote: > On Thu, Oct 12, 2023 at 12:01 PM Peter Geoghegan wrote: > > No objections from me. > > Here is a doc-only patch that I think could be back-patched as far as > emergency mode exists. It combines all of the wording changes to the > documentation from v1-v3 of the previous version, but without changing > the message text that is quoted in the documentation, and without > adding more instances of similar message texts to the documentation, > and with a bunch of additional hacking by me. It's a bit weird that we're effectively saying "pay no attention to that terrible HINT"...but I get it. The simple fact is that the docs were written in a way that allowed misinformation to catch on -- the damage that needs to be undone isn't exactly limited to the docs themselves. Your choice to not backpatch the changes to the log messages makes a lot more sense, now that I see that I see the wider context built by this preparatory patch. Arguably, it would be counterproductive to pretend that we didn't make this mistake on the backbranches. Better to own the mistake. > Some things I changed: > > - I made it so that the MXID section refers back to the XID section > instead of duplicating it, but with a short list of differences. > - I weakened the existing claim that says you must be a superuser or > VACUUM definitely won't fix it to say instead that you SHOULD run > VACUUM as the superuser, because the former is false and the latter is > true. > - I made the list of steps for recovering more explicit. > - I split out the bit about running autovacuum in the affected > database into a separate step to be performed after VACUUM for > continued good operation, rather than a necessary ingredient in > recovery, because it isn't. > - A bit of other minor rejiggering. Those all make sense to me. > I'm not forgetting about the rest of the proposed patch set, or the > change I proposed earlier. I'm just posting this much now because this > is how far I got today, and it would be useful to get comments before > I go further. I think the residual portion of the patch set not > included in this documentation patch will be quite small, and I think > that's a good thing, but again, I don't intend to blow that off. Of course. Your general approach seems wise. Thanks for working on this. I will be relieved once this is finally taken care of. -- Peter Geoghegan
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
Hi, A textual representation requires a long text field, because it could contain a schema and arguments; it is awkward and inefficient to save as part of the data, and it must be parsed to retrieve the function OID. By using the OID directly (actually, the value of a regprocedure field) we avoid that, and the function can be retrieved by its primary key. Why can't pg_upgrade be used? OID preservation logic is already implemented for several OIDs in catalog tables, like pg_class, type, relfilenode, enum... I've mentioned twice that this logic is already implemented and I haven't encountered any problems with pg_upgrade. Actually, I've asked here because there are several references to PG_PROC oids from other tables in the system catalog, so I was worried that this logic could break something I do not know about. -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: Pro et contra of preserving pg_proc oids during pg_upgrade
On Thu, Oct 12, 2023 at 2:58 PM Nikita Malakhov wrote: > Why can't pg_upgrade be used? > We document both a pg_dump/pg_restore migration and a pg_upgrade one (not to mention that logical backup and restore would cause the oids to change). It seems odd to have a feature that requires pg_upgrade to be the chosen one. pg_upgrade is an option, not a requirement. Same goes for pg_basebackup. pg_upgrade itself warns that should the on-disk file format change then it would be unusable - though I suspect that we'd end up with some kind of hybrid approach in that case. > OID preservation logic is already implemented > for several OIDs in catalog tables, like pg_class, type, relfilenode, > enum... > > We are allowed to preserve oids if we wish but that doesn't mean we must, nor does doing so constitute a declaration that such oids are part of the public API. And I don't see us making OIDs part of the public API unless we modify pg_dump to include them in its output. > Actually, I've asked here because there are several references to PG_PROC > oids > from other tables in the system catalog > Of course there are, e.g., views depending on functions would result in those. But pg_upgrade et al. recompute the views, so the changing of oids isn't a problem. Long text fields are common in databases, and if there are concerns with parsing/interpretation we can add functions to make doing that simpler. David J.
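For what it's worth, the parsing in both directions is already just a cast away; a quick sketch, with a built-in standing in for the hypothetical my_ext.do_something(int) from upthread:

-- text -> OID and back; to_regprocedure() returns NULL instead of
-- raising an error when the named function does not exist.
SELECT 'pg_catalog.lower(text)'::regprocedure::oid AS proc_oid,
       'pg_catalog.lower(text)'::regprocedure      AS proc_signature,
       to_regprocedure('no_such_func(integer)')    AS missing;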
BRIN minmax multi - incorrect distance for infinite timestamp/date
Hi, Ashutosh Bapat reported me off-list a possible issue in how BRIN minmax-multi calculate distance for infinite timestamp/date values. The current code does this: if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2)) PG_RETURN_FLOAT8(0); so means infinite values are "very close" to any other value, and thus likely to be merged into a summary range. That's exactly the opposite of what we want to do, possibly resulting in inefficient indexes. Consider this example create table test (a timestamptz) with (fillfactor=50); insert into test select (now() + ((1 * random())::int || ' seconds')::interval) from generate_series(1,100) s(i); update test set a = '-infinity'::timestamptz where random() < 0.01; update test set a = 'infinity'::timestamptz where random() < 0.01; explain (analyze, timing off, costs off) select * from test where a = '2024-01-01'::timestamptz; QUERY PLAN -- Bitmap Heap Scan on test (actual rows=0 loops=1) Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) Rows Removed by Index Recheck: 680662 Heap Blocks: lossy=6024 -> Bitmap Index Scan on test_a_idx (actual rows=60240 loops=1) Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) Planning Time: 0.075 ms Execution Time: 106.871 ms (8 rows) Clearly, large part of the table gets scanned - this happens because when building the index, we end up with ranges like this: [-infinity,a,b,c,...,x,y,z,infinity] and we conclude that distance for [-infinity,a] is 0, and we combine these values into a range. And the same for [z,infinity]. But we should do exactly the opposite thing - never merge those. Attached is a patch fixing this, with which the plan looks like this: QUERY PLAN -- Bitmap Heap Scan on test (actual rows=0 loops=1) Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) -> Bitmap Index Scan on test_a_idx (actual rows=0 loops=1) Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone) Planning Time: 0.289 ms Execution Time: 9.432 ms (6 rows) Which seems much better. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Companydiff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c index f8b2a3f9bc..c8775c274e 100644 --- a/src/backend/access/brin/brin_minmax_multi.c +++ b/src/backend/access/brin/brin_minmax_multi.c @@ -2084,8 +2084,14 @@ brin_minmax_multi_distance_date(PG_FUNCTION_ARGS) DateADT dateVal1 = PG_GETARG_DATEADT(0); DateADT dateVal2 = PG_GETARG_DATEADT(1); + /* + * If either value is infinite, we treat them as in infinite distance. + * We deduplicate the values before calculating distances for them, so + * either one value is finite, or the sign is different - so the + * inifinite distance is appropriate for both cases. + */ if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2)) - PG_RETURN_FLOAT8(0); + PG_RETURN_FLOAT8(get_float8_infinity()); PG_RETURN_FLOAT8(dateVal1 - dateVal2); } @@ -2141,8 +2147,14 @@ brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS) Timestamp dt1 = PG_GETARG_TIMESTAMP(0); Timestamp dt2 = PG_GETARG_TIMESTAMP(1); + /* + * If either value is infinite, we treat them as in infinite distance. + * We deduplicate the values before calculating distances for them, so + * either one value is finite, or the sign is different - so the + * inifinite distance is appropriate for both cases. + */ if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2)) - PG_RETURN_FLOAT8(0); + PG_RETURN_FLOAT8(get_float8_infinity()); delta = dt2 - dt1;
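For anyone wanting to see the summarized ranges that end up absorbing the infinite values, a sketch using the pageinspect extension (assuming block 2 is the first regular BRIN page, which is the usual layout after the metapage and revmap; test_a_idx is the index from the plans above):

-- Inspect the ranges stored in the BRIN index built in the example above.
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT itemoffset, attnum, value
FROM brin_page_items(get_raw_page('test_a_idx', 2), 'test_a_idx');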
Re: On login trigger: take three
On Thu, Oct 12, 2023 at 8:35 PM Robert Haas wrote: > On Tue, Oct 10, 2023 at 3:43 PM Alexander Korotkov > wrote: > > Yep, in v43 it worked that way. One transaction has to wait for > > another finishing update of pg_database tuple, then fails. This is > > obviously ridiculous. Since overlapping setters of flag will have to > > wait anyway, I changed lock mode in v44 for them to > > AccessExclusiveLock. Now, waiting transaction then sees the updated > > tuple and doesn't fail. > > Doesn't that mean that if you create the first login trigger in a > database and leave the transaction open, nobody can connect to that > database until the transaction ends? It doesn't mean that, because when trying to reset the flag v44 takes a conditional lock. So, if another transaction is holding the lock, we will just skip resetting the flag, and the flag will be cleared on the first connection after that transaction ends. -- Regards, Alexander Korotkov
Re: Wait events for delayed checkpoints
On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote: > IPC seems right to me. Yeah, a timeout is being used, but as you say, > that's an implementation detail. > > +1 for the idea, too. Agreed that a timeout makes little sense in this context, and IPC looks correct. +pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START); do { pg_usleep(10000L); /* wait for 10 msec */ } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START)); +pgstat_report_wait_end(); HaveVirtualXIDsDelayingChkpt() immediately does a LWLockAcquire() which would itself report a wait event for ProcArrayLock, overwriting this new one, no? -- Michael signature.asc Description: PGP signature
Re: interval_ops shall stop using btequalimage (deduplication)
On Wed, Oct 11, 2023 at 01:00:44PM -0700, Peter Geoghegan wrote: > On Wed, Oct 11, 2023 at 11:38 AM Noah Misch wrote: > > Interesting. So, >99% of interval-type indexes, even ones WITH > > (deduplicate_items=off), will get amcheck failures. The <1% of exceptions > > might include indexes having allequalimage=off due to an additional column, > > e.g. a two-column (interval, numeric) index. If interval indexes are common > > enough and "pg_amcheck --heapallindexed" failures from $SUBJECT are > > relatively > > rare, that could argue for giving amcheck a special case. Specifically, > > downgrade its "metapage incorrectly indicates that deduplication is safe" > > from > > ERROR to WARNING for interval_ops only. > > I am not aware of any user actually running "deduplicate_items = off" > in production, for any index. It was added purely as a defensive thing > -- not because I anticipated any real need to disable deduplication. > Deduplication was optimized for being enabled by default. Sure. Low-importance background information: deduplicate_items=off got on my radar while I was wondering if ALTER INDEX ... SET (deduplicate_items=off) would clear allequalimage. If it had, we could have advised people to use ALTER INDEX, then rebuild only those indexes still failing "pg_amcheck --heapallindexed". ALTER INDEX doesn't do that, ruling out that idea. > > Without that special case (i.e. with > > the v1 patch), the release notes should probably resemble, "After updating, > > run REINDEX on all indexes having an interval-type column." > > +1 > > > There's little > > point in recommending pg_amcheck if >99% will fail. I'm inclined to bet > > that > > interval-type indexes are rare, so I lean against adding the amcheck special > > case. It's not a strong preference. Other opinions? > exactly one case like that post-fix (interval_ops is at least the only > affected core code opfamily), so why not point that out directly with > a HINT? A HINT could go a long way towards putting the problem in > context, without really adding a special case, and without any real > question of users being misled. Works for me. Added. Author: Noah Misch Commit: Noah Misch Dissociate btequalimage() from interval_ops, ending its deduplication. Under interval_ops, some equal values are distinguishable. One such pair is '24:00:00' and '1 day'. With that being so, btequalimage() breaches the documented contract for the "equalimage" btree support function. This can cause incorrect results from index-only scans. Users should REINDEX any btree indexes having interval-type columns. After updating, pg_amcheck will report an error for almost all such indexes. This fix makes interval_ops simply omit the support function, like numeric_ops does. Back-pack to v13, where btequalimage() first appeared. In back branches, for the benefit of old catalog content, btequalimage() code will return false for type "interval". Going forward, back-branch initdb will include the catalog change. Reviewed by Peter Geoghegan. 
Discussion: https://postgr.es/m/20231011013317.22.nmi...@google.com diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c index dbb83d8..3e07a3e 100644 --- a/contrib/amcheck/verify_nbtree.c +++ b/contrib/amcheck/verify_nbtree.c @@ -31,6 +31,7 @@ #include "access/xact.h" #include "catalog/index.h" #include "catalog/pg_am.h" +#include "catalog/pg_opfamily_d.h" #include "commands/tablecmds.h" #include "common/pg_prng.h" #include "lib/bloomfilter.h" @@ -338,10 +339,20 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed, errmsg("index \"%s\" metapage has equalimage field set on unsupported nbtree version", RelationGetRelationName(indrel; if (allequalimage && !_bt_allequalimage(indrel, false)) + { + boolhas_interval_ops = false; + + for (int i = 0; i < IndexRelationGetNumberOfKeyAttributes(indrel); i++) + if (indrel->rd_opfamily[i] == INTERVAL_BTREE_FAM_OID) + has_interval_ops = true; ereport(ERROR, (errcode(ERRCODE_INDEX_CORRUPTED), errmsg("index \"%s\" metapage incorrectly indicates that deduplication is safe", - RelationGetRelationName(indrel; + RelationGetRelationName(indrel)), +has_interval_ops +? errhint("This is known of \"interval\" indexes last built on a version predating 2023-11.") +: 0)); +
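Since the guidance will be to REINDEX any btree indexes having interval-type columns, an untested sketch of how users could enumerate the affected indexes (key columns only; expression columns are covered too, since pg_attribute stores the result type for index columns):

SELECT DISTINCT i.indexrelid::regclass AS index_to_reindex
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
JOIN pg_am AS am ON am.oid = c.relam AND am.amname = 'btree'
JOIN pg_attribute AS a ON a.attrelid = i.indexrelid AND a.attnum <= i.indnkeyatts
WHERE a.atttypid = 'interval'::regtype;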
Re: A new strategy for pull-up correlated ANY_SUBLINK
On 12.10.2023 10:52, Andy Fan wrote: Unfortunately, I found a request when sublink did not pull-up, as in the examples above. I couldn't quite figure out why. I'm not sure what you mean with the "above", I guess it should be the "below"? Yes, you are right) explain (analyze, costs off, buffers) select b.x, b.x, a.y from b left join a on b.x=a.x and *b.t in (select max(a0.t) * from a a0 where a0.x = b.x and a0.t = b.t); ... SubPlan 2 Here the sublink can't be pulled up because of its reference to the LHS of left join, the original logic is that no matter the 'b.t in ..' returns the true or false, the rows in LHS will be returned. If we pull it up to LHS, some rows in LHS will be filtered out, which breaks its original semantics. Thanks for the explanation, it became more clear to me here. I thought it would be: explain (analyze, costs off, buffers) select b.x, b.x, a.y from b left join a on b.x=a.x and *b.t = (select max(a0.t) * from a a0 where a0.x = b.x and a0.t <= b.t); QUERY PLAN - Hash Right Join (actual time=1.181..67.927 rows=1000 loops=1) Hash Cond: (a.x = b.x) *Join Filter: (b.t = (SubPlan 2))* Buffers: shared hit=3546 -> Seq Scan on a (actual time=0.022..17.109 rows=10 loops=1) Buffers: shared hit=541 -> Hash (actual time=1.065..1.068 rows=1000 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 72kB Buffers: shared hit=5 -> Seq Scan on b (actual time=0.049..0.401 rows=1000 loops=1) Buffers: shared hit=5 SubPlan 2 -> Result (actual time=0.025..0.025 rows=1 loops=1000) Buffers: shared hit=3000 InitPlan 1 (returns $2) -> Limit (actual time=0.024..0.024 rows=1 loops=1000) Buffers: shared hit=3000 -> Index Only Scan Backward using a_t_x_idx on a a0 (actual time=0.023..0.023 rows=1 loops=1000) Index Cond: ((t IS NOT NULL) AND (t <= b.t) AND (x = b.x)) Heap Fetches: 1000 Buffers: shared hit=3000 Planning Time: 0.689 ms Execution Time: 68.220 ms (23 rows) If you noticed, it became possible after replacing the "in" operator with "=". I didn't notice much difference between the 'in' and '=', maybe I missed something? It seems to me that the expressions "=" and "IN" are equivalent here due to the fact that the aggregated subquery returns only one value, and the result with the "IN" operation can be considered as the intersection of elements on the left and right. In this query, we have some kind of set on the left, among which there will be found or not only one element on the right. In general, this expression can be considered as b=const, so push down will be applied to b and we can filter b during its scanning by the subquery's result. But I think your explanation is necessary here, that this is all possible, because we can pull up the sublink here, since filtering is allowed on the right side (the nullable side) and does not break the semantics of LHS. But in contrast, I also added two queries where pull-up is impossible and it is not done here. Otherwise if filtering was applied on the left it would be mistake. To be honest, I'm not sure if this explanation is needed in the test anymore, so I didn't add it. 
explain (costs off) SELECT * FROM tenk1 A LEFT JOIN tenk2 B ON A.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd); QUERY PLAN - Nested Loop Left Join Join Filter: (SubPlan 2) -> Seq Scan on tenk1 a -> Materialize -> Seq Scan on tenk2 b SubPlan 2 -> Result InitPlan 1 (returns $1) -> Limit -> Index Scan using tenk2_hundred on tenk2 c Index Cond: (hundred IS NOT NULL) Filter: (odd = b.odd) (12 rows) explain (costs off) SELECT * FROM tenk1 A LEFT JOIN tenk2 B ON A.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd)); QUERY PLAN --- Nested Loop Left Join Join Filter: (hashed SubPlan 1) -> Seq Scan on tenk1 a -> Materialize -> Seq Scan on tenk2 b SubPlan 1 -> HashAggregate Group Key: c.odd -> Seq Scan on tenk2 c (9 rows) I took the liberty of adding this to your patch and added myself as reviewer, if you
Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
On Thu, Oct 12, 2023 at 10:41:39AM -0400, David Steele wrote: > After some more thought, I think we could massage the "pg_control in > backup_label" method into something that could be back patched, with more > advanced features (e.g. error on backup_label and pg_control both present on > initial cluster start) saved for HEAD. I doubt that anything changed in this area would be in the backpatchable zone, particularly as it would involve protocol changes within the replication commands, so I'd recommend focusing on HEAD. Backward-compatibility is not much of a concern as long as only the backend is involved. The real problem here would be on the frontend side, and how much effort we should put into keeping the pg_basebackup code compatible with older backends. -- Michael signature.asc Description: PGP signature
Re: interval_ops shall stop using btequalimage (deduplication)
On Thu, Oct 12, 2023 at 4:10 PM Noah Misch wrote: > > exactly one case like that post-fix (interval_ops is at least the only > > affected core code opfamily), so why not point that out directly with > > a HINT? A HINT could go a long way towards putting the problem in > > context, without really adding a special case, and without any real > > question of users being misled. > > Works for me. Added. Looks good. Thanks! -- Peter Geoghegan
Re: Making aggregate deserialization (and WAL receive) functions slightly faster
On Wed, 11 Oct 2023 at 08:52, Tom Lane wrote: > > David Rowley writes: > > I've attached a slightly more worked on patch that makes maxlen == 0 > > mean read-only. Unsure if a macro is worthwhile there or not. > > A few thoughts: Thank you for the review. I spent more time on this and did end up with 2 new init functions as you mentioned. One for strictly read-only (initReadOnlyStringInfo), which cannot be appended to, and as you mentioned, another (initStringInfoFromString) which can accept a palloc'd buffer which becomes managed by the stringinfo code. I know these names aren't exactly as you mentioned. I'm open to adjusting still. This means I got rid of the read-only conversion code in enlargeStringInfo(). I didn't do anything to try to handle buffer enlargement more efficiently in enlargeStringInfo() for the case where initStringInfoFromString sets maxlen to some non-power-of-2. The doubling code seems like it'll work ok without power-of-2 values, it'll just end up calling repalloc() with non-power-of-2 values. I did also wonder if resetStringInfo() would have any business touching the existing buffer in a read-only StringInfo and came to the conclusion that it wouldn't be very read-only if we allowed resetStringInfo() to do its thing on it. I added an Assert to fail if resetStringInfo() receives a read-only StringInfo. Also, since it's still being discussed, I left out the adjustment to LogicalParallelApplyLoop(). That also allows the tests to pass without the failing Assert that was checking for the NUL terminator. David diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c index d52c8963eb..ce9d5b4059 100644 --- a/src/backend/replication/logical/proto.c +++ b/src/backend/replication/logical/proto.c @@ -879,6 +879,7 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple) /* Read the data */ for (i = 0; i < natts; i++) { + char *buff; charkind; int len; StringInfo value = &tuple->colvalues[i]; @@ -899,19 +900,16 @@ logicalrep_read_tuple(StringInfo in, LogicalRepTupleData *tuple) len = pq_getmsgint(in, 4); /* read length */ /* and data */ - value->data = palloc(len + 1); - pq_copymsgbytes(in, value->data, len); + buff = palloc(len + 1); + pq_copymsgbytes(in, buff, len); /* * Not strictly necessary for LOGICALREP_COLUMN_BINARY, but * per StringInfo practice. */ - value->data[len] = '\0'; + buff[len] = '\0'; - /* make StringInfo fully valid */ - value->len = len; - value->cursor = 0; - value->maxlen = len; + initStringInfoFromString(value, buff, len); break; default: elog(ERROR, "unrecognized data representation type '%c'", kind); diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 597947410f..b574188d70 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -3582,10 +3582,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received) /* Ensure we are reading the data into our memory context. 
*/ MemoryContextSwitchTo(ApplyMessageContext); - s.data = buf; - s.len = len; - s.cursor = 0; - s.maxlen = -1; + initReadOnlyStringInfo(&s, buf, len); c = pq_getmsgbyte(&s); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index f3c9f1f9ba..94b963d6e6 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -1816,23 +1816,19 @@ exec_bind_message(StringInfo input_message) if (!isNull) { - const char *pvalue = pq_getmsgbytes(input_message, plength); + char *pvalue; /* -* Rather than copying data around, we just set up a phony -* StringInfo pointing to the correct portion of the message -* buffer. We assume we can scribble on th
Re: Test 026_overwrite_contrecord fails on very slow machines (under Valgrind)
On Thu, Oct 12, 2023 at 02:00:00PM +0300, Alexander Lakhin wrote: > So to fail on the test, skink should perform at least twice slower than > usual, and may be it's an extraordinary condition indeed, but on the other > hand, may be increase checkpoint_timeout as already done in several tests > (015_promotion_pages, 038_save_logical_slots_shutdown, 039_end_of_wal, ...). > > [1] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-10-10%2017%3A10%3A11 > [2] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-11-07%2020%3A27%3A11 Thanks for the investigation. Increasing the checkpoint timeout is not a perfect science, but at least it would work until a machine manages to be slower than the new limit, so I would be OK with your suggestion to raise the bar a bit more and prevent the race created by these extra time-triggered checkpoints. -- Michael signature.asc Description: PGP signature
Re: LLVM 16 (opaque pointers)
Hi, On 2023-10-11 21:59:50 +1300, Thomas Munro wrote: > +#else > + LLVMPassBuilderOptionsRef options; > + LLVMErrorRef err; > + int compile_optlevel; > + char *passes; > + > + if (context->base.flags & PGJIT_OPT3) > + compile_optlevel = 3; > + else > + compile_optlevel = 0; > + > + passes = > psprintf("default,mem2reg,function(no-op-function),no-op-module", > + compile_optlevel); I don't think the "function(no-op-function),no-op-module" bit does something particularly useful? I also don't think we should add the mem2reg pass outside of -O0 - running it after a real optimization pipeline doesn't seem useful and might even make the code worse? mem2reg is included in default (and obviously also in O3). Thanks for working on this stuff! I'm working on setting up buildfarm animals for 16, 17, each once with a normal and an assertion enabled LLVM build. Greetings, Andres Freund
Re: Test 026_overwrite_contrecord fails on very slow machines (under Valgrind)
Hi, On 2023-10-12 14:00:00 +0300, Alexander Lakhin wrote: > So to fail on the test, skink should perform at least twice slower than > usual The machine skink is hosted on runs numerous buildfarm animals (24 I think right now, about to be 28). While it has plenty of resources (16 cores/32 threads, 128GB RAM), test runtime is still pretty variable depending on what other tests are running at the same time... Greetings, Andres Freund
Re: Add support for AT LOCAL
On 10/10/23 05:34, Michael Paquier wrote: I am attaching a v5 that addresses the documentation bits, could you look at the business with date.c? Here is a v6 which hopefully addresses all of your concerns. -- Vik Fearing From 042ce9b581ca3b17afbf229d209ca59addb6c9a2 Mon Sep 17 00:00:00 2001 From: Vik Fearing Date: Wed, 4 Oct 2023 15:46:38 +0100 Subject: [PATCH v6] Add support for AT LOCAL When converting a timestamp to/from with/without time zone, the SQL Standard specifies an AT LOCAL variant of AT TIME ZONE which uses the session's time zone. --- doc/src/sgml/func.sgml| 103 +- src/backend/parser/gram.y | 7 ++ src/backend/utils/adt/date.c | 14 +++ src/backend/utils/adt/ruleutils.c | 10 +++ src/backend/utils/adt/timestamp.c | 20 + src/include/catalog/pg_proc.dat | 9 ++ src/test/regress/expected/timestamptz.out | 47 ++ src/test/regress/expected/timetz.out | 39 src/test/regress/sql/timestamptz.sql | 21 + src/test/regress/sql/timetz.sql | 17 10 files changed, 284 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index f1ad64c3d6..ce62cb37b5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10604,42 +10604,46 @@ SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-0 that date_bin can truncate to an arbitrary interval. The stride interval must be greater than zero and cannot contain units of month or larger. - AT TIME ZONE + AT TIME ZONE and AT LOCAL time zone conversion AT TIME ZONE + +AT LOCAL + + The AT TIME ZONE operator converts time stamp without time zone to/from time stamp with time zone, and time with time zone values to different time zones. shows its variants. - AT TIME ZONE Variants + AT TIME ZONE and AT LOCAL Variants Operator Description @@ -10658,93 +10662,186 @@ SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-0 Converts given time stamp without time zone to time stamp with time zone, assuming the given value is in the named time zone. timestamp '2001-02-16 20:38:40' at time zone 'America/Denver' 2001-02-17 03:38:40+00 + + + timestamp without time zone AT LOCAL + timestamp with time zone + + + Converts given time stamp without time zone to + time stamp with the session's + TimeZone value as time zone. + + + timestamp '2001-02-16 20:38:40' at local + 2001-02-17 03:38:40+00 + + + timestamp with time zone AT TIME ZONE zone timestamp without time zone Converts given time stamp with time zone to time stamp without time zone, as the time would appear in that zone. timestamp with time zone '2001-02-16 20:38:40-05' at time zone 'America/Denver' 2001-02-16 18:38:40 + + + timestamp with time zone AT LOCAL + timestamp without time zone + + + Converts given time stamp with time zone to + time stamp without time zone, as the time would + appear with the session's TimeZone value as time zone. + + + timestamp with time zone '2001-02-16 20:38:40-05' at local + 2001-02-16 18:38:40 + + + time with time zone AT TIME ZONE zone time with time zone Converts given time with time zone to a new time zone. Since no date is supplied, this uses the currently active UTC offset for the named destination zone. time with time zone '05:34:17-05' at time zone 'UTC' 10:34:17+00 + + + + time with time zone AT LOCAL + time with time zone + + + Converts given time with time zone to a new time + zone. Since no date is supplied, this uses the currently active UTC + offset for the session's TimeZone value. 
+ + + Assuming the session's TimeZone is set to UTC: + + + time with time zone '05:34:17-05' at local + 10:34:17+00 + + In these expressions, the desired time zone zone can be specified either as a text value (e.g., 'America/Los_Angeles') or as an interval (e.g., INTERVAL '-08:00'). In the text case, a ti
Re: CHECK Constraint Deferrable
On 10/10/23 15:12, Robert Haas wrote: On Mon, Oct 9, 2023 at 5:07 PM David G. Johnston wrote: 2. I don't think it's a good idea for the same patch to try to solve two problems unless they are so closely related that solving one without solving the other is not sensible. A NOT NULL constraint apparently is just a special case of a check constraint which seems closely related enough to match your definition. Yes, that might be true. I suppose I'd like to hear from the patch author(s) about that. I'm somewhat coming around to your idea that maybe both should be covered together, but I'm not the one writing the patch. Álvaro Herrera has put (and is still putting) immense effort into turning NOT NULL into a CHECK constraint. Honestly, I don't see why the two patches need to be combined. -- Vik Fearing
Re: Some performance degradation in REL_16 vs REL_15
On Thu, Oct 12, 2023 at 09:20:36PM +1300, David Rowley wrote: > It would be interesting to know what's to blame here and if you can > attribute it to a certain commit. +1. -- Michael signature.asc Description: PGP signature
Re: SQL:2011 application time
On 10/11/23 05:47, Paul Jungwirth wrote: +SELECT pg_get_indexdef(conindid, 0, true) FROM pg_constraint WHERE conname = 'temporal_rng_pk'; + pg_get_indexdef +--- + CREATE UNIQUE INDEX temporal_rng_pk ON temporal_rng USING gist (id, valid_at) Shouldn't this somehow show the operator classes for the columns? We are using different operator classes for the id and valid_at columns, aren't we? We only print the operator classes if they are not the default, so they don't appear here. I do suspect something more is desirable though. For exclusion constraints we replace everything before the columns with just "EXCLUDE USING gist". I could embed WITHOUT OVERLAPS but it's not valid syntax in CREATE INDEX. Let me know if you have any ideas. Why not? The standard does not mention indexes (although some discussions last week might change that) so we can change the syntax for it as we wish. Doing so would also allow us to use ALTER TABLE ... USING INDEX for such things. -- Vik Fearing
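For readers following along, a sketch of the kind of table the quoted \d+ output presumably comes from, assuming the syntax the patch introduces (the column types are guessed; only the table/column names and the WITHOUT OVERLAPS part are taken from the thread):

CREATE TABLE temporal_rng (
  id int4range,
  valid_at tsrange,
  CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);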
Re: Tab completion for AT TIME ZONE
On 10/12/23 10:27, Dagfinn Ilmari Mannsåker wrote: Michael Paquier writes: On Fri, Apr 14, 2023 at 12:05:25PM +0200, Jim Jones wrote: The patch applies cleanly and it does what it is proposing. - and it's IMHO a very nice addition. I've marked the CF entry as "Ready for Committer". +/* ... AT TIME ZONE ... */ + else if (TailMatches("AT")) + COMPLETE_WITH("TIME ZONE"); + else if (TailMatches("AT", "TIME")) + COMPLETE_WITH("ZONE"); + else if (TailMatches("AT", "TIME", "ZONE")) + COMPLETE_WITH_TIMEZONE_NAME(); This style will for the completion of timezone values even if "AT" is the first word of a query. Shouldn't this be more selective by making sure that we are at least in the context of a SELECT query? It's valid anywhere an expression is, which is a lot more places than just SELECT queries. Off the top of my head I can think of WITH, INSERT, UPDATE, VALUES, CALL, CREATE TABLE, CREATE INDEX. As I mentioned upthread, the only place in the grammar where the word AT occurs is in AT TIME ZONE, so there's no ambiguity. Also, it doesn't complete time zone names after AT, it completes the literal words TIME ZONE, and you have to then hit tab again to get a list of time zones. If we (or the SQL committee) were to invent more operators that start with the word AT, we can add those to the first if clause above and complete with the appropriate values after each one separately. Speaking of this... The SQL committee already has another operator starting with AT which is AT LOCAL. I am implementing it in https://commitfest.postgresql.org/45/4343/ where I humbly admit that I did not think of psql tab completion at all. These two patches are co-dependent and whichever goes in first the other will need to be adjusted accordingly. -- Vik Fearing
Re: On login trigger: take three
On Thu, Oct 12, 2023 at 6:54 PM Alexander Korotkov wrote: > On Thu, Oct 12, 2023 at 8:35 PM Robert Haas wrote: > > Doesn't that mean that if you create the first login trigger in a > > database and leave the transaction open, nobody can connect to that > > database until the transaction ends? > > It doesn't mean that, because when trying to reset the flag v44 does > conditional lock. So, if another transaction is holding the log we > will just skip resetting the flag. So, the flag will be cleared on > the first connection after that transaction ends. But in the scenario I am describing the flag is being set, not reset. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Wait events for delayed checkpoints
On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier wrote: > On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote: > > IPC seems right to me. Yeah, a timeout is being used, but as you say, > > that's an implementation detail. > > > > +1 for the idea, too. > > Agreed that timeout makes little sense in this context, and IPC looks > correct. > > +pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START); > do > { > pg_usleep(1L);/* wait for 10 msec */ > } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, >DELAY_CHKPT_START)); > +pgstat_report_wait_end(); > > HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire() > which would itself report a wait event for ProcArrayLock, overwriting > this new one, no? Ah, right: the wait event should be set and cleared around pg_usleep, not the whole loop. -- Robert Haas EDB: http://www.enterprisedb.com
Re: PostgreSQL domains and NOT NULL constraint
On 10/12/23 15:54, Tom Lane wrote: Erki Eessaar writes: PostgreSQL's CREATE DOMAIN documentation (section Notes) describes a way how one can add NULL's to a column that has a domain with the NOT NULL constraint. https://www.postgresql.org/docs/current/sql-createdomain.html To me it seems very strange and amounts to a bug because it defeats the purpose of domains (to be a reusable assets) and constraints (to avoid any bypassing of these). I doubt we'd consider doing anything about that. The whole business of domains with NOT NULL constraints is arguably a defect of the SQL standard, because there are multiple ways to produce a value that is NULL and yet must be considered to be of the domain type. The subselect-with-no-output case that you show isn't even the most common one; I'd say that outer joins where there are domain columns on the nullable side are the biggest problem. There's been some discussion of treating the output of such a join, subselect, etc as being of the domain's base type not the domain proper. That'd solve this particular issue since then we'd decide we have to cast the base type back up to the domain type (and hence check its constraints) before inserting the row. But that choice just moves the surprise factor somewhere else, in that queries that used to produce one data type now produce another one. There are applications that this would break. Moreover, I do not think there's any justification for it in the SQL spec. I do not believe this is a defect of the SQL standard at all. SQL:2023-2 Section 4.14 "Domains" clearly states "The purpose of a domain is to constrain the set of valid values that can be stored in a column of a base table by various operations." That seems very clear to me that *storing* a value in a base table must respect the domain's constraints, even if *operations* on those values might not respect all of the domain's constraints. Whether or not it is practical to implement that is a different story, but allowing the null value to be stored in a column of a base table whose domain specifies NOT NULL is frankly a bug. -- Vik Fearing
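For context, a minimal sketch of the loophole being referenced, modeled on the note in the CREATE DOMAIN documentation: a scalar subquery with no output rows yields a NULL that already has the domain type, so no coercion (and therefore no constraint check) happens on the way into the base table.

CREATE DOMAIN dnotnull AS integer NOT NULL;
CREATE TABLE t (d dnotnull);
INSERT INTO t VALUES ((SELECT d FROM t WHERE false));  -- no error raised
SELECT d IS NULL AS leaked FROM t;                     -- returns true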
Re: PostgreSQL domains and NOT NULL constraint
Vik Fearing writes: > On 10/12/23 15:54, Tom Lane wrote: >> There's been some discussion of treating the output of such a join, >> subselect, etc as being of the domain's base type not the domain >> proper. That'd solve this particular issue since then we'd decide >> we have to cast the base type back up to the domain type (and hence >> check its constraints) before inserting the row. But that choice >> just moves the surprise factor somewhere else, in that queries that >> used to produce one data type now produce another one. There are >> applications that this would break. Moreover, I do not think there's >> any justification for it in the SQL spec. > I do not believe this is a defect of the SQL standard at all. > SQL:2023-2 Section 4.14 "Domains" clearly states "The purpose of a > domain is to constrain the set of valid values that can be stored in a > column of a base table by various operations." So I wonder what is the standard's interpretation of regression=# create domain dpos as integer not null check (value > 0); CREATE DOMAIN regression=# create table t1 (x int, d dpos); CREATE TABLE regression=# create view v1 as select ty.d from t1 tx left join t1 ty using (x); CREATE VIEW regression=# \d+ v1 View "public.v1" Column | Type | Collation | Nullable | Default | Storage | Description +--+---+--+-+-+- d | dpos | | | | plain | View definition: SELECT ty.d FROM t1 tx LEFT JOIN t1 ty USING (x); If we are incorrect in ascribing the type "dpos" to v1.d, where in the spec contradicts that? (Or in other words, 4.14 might lay out some goals for the feature, but that's just empty words if it's not supported by accurate details in other places.) regards, tom lane
Re: Some performance degradation in REL_16 vs REL_15
Hi, On 2023-10-12 11:00:22 +0300, Anton A. Melnikov wrote: > Found that simple test pgbench -c20 -T20 -j8 gives approximately > for REL_15_STABLE at 5143f76: 336+-1 TPS > and > for REL_16_STABLE at 4ac7635f: 324+-1 TPS > > The performance drop is approximately 3,5% while the corrected standard > deviation is only 0.3%. > See the raw_data.txt attached. Could you provide a bit more details about how you ran the benchmark? The reason I am asking is that ~330 TPS is pretty slow for -c20. Even on spinning rust and using the default settings, I get considerably higher results. Oh - I do get results closer to yours if I use pgbench scale 1, causing a lot of row level contention. What scale did you use? Greetings, Andres Freund
Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set
On Tue, Sep 26, 2023 at 08:13:45AM +0200, Daniel Gustafsson wrote: >> On 26 Sep 2023, at 00:20, Nathan Bossart wrote: >> >> On Thu, Sep 21, 2023 at 11:18:00AM +0900, bt23nguyent wrote: >>> -basic_archive_configured(ArchiveModuleState *state) >>> +basic_archive_configured(ArchiveModuleState *state, char **logdetail) >> >> Could we do something more like GUC_check_errdetail() instead to maintain >> backward compatibility with v16? > > We'd still need something exported to call into which isn't in 16, so it > wouldn't be more than optically backwards compatible since a module written > for > 17 won't compile for 16, or am I missing something? I only mean that a module written for v16 could continue to be used in v17 without any changes. You are right that a module that uses this new functionality wouldn't compile for v16. But IMHO the interface is nicer, too, since module authors wouldn't need to worry about allocating the space for the string or formatting the message. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: PostgreSQL domains and NOT NULL constraint
On 10/13/23 02:44, Tom Lane wrote: Vik Fearing writes: On 10/12/23 15:54, Tom Lane wrote: There's been some discussion of treating the output of such a join, subselect, etc as being of the domain's base type not the domain proper. That'd solve this particular issue since then we'd decide we have to cast the base type back up to the domain type (and hence check its constraints) before inserting the row. But that choice just moves the surprise factor somewhere else, in that queries that used to produce one data type now produce another one. There are applications that this would break. Moreover, I do not think there's any justification for it in the SQL spec. I do not believe this is a defect of the SQL standard at all. SQL:2023-2 Section 4.14 "Domains" clearly states "The purpose of a domain is to constrain the set of valid values that can be stored in a column of a base table by various operations." So I wonder what is the standard's interpretation of regression=# create domain dpos as integer not null check (value > 0); CREATE DOMAIN regression=# create table t1 (x int, d dpos); CREATE TABLE regression=# create view v1 as select ty.d from t1 tx left join t1 ty using (x); CREATE VIEW regression=# \d+ v1 View "public.v1" Column | Type | Collation | Nullable | Default | Storage | Description +--+---+--+-+-+- d | dpos | | | | plain | View definition: SELECT ty.d FROM t1 tx LEFT JOIN t1 ty USING (x); If we are incorrect in ascribing the type "dpos" to v1.d, where in the spec contradicts that? (Or in other words, 4.14 might lay out some goals for the feature, but that's just empty words if it's not supported by accurate details in other places.) Objection, Your Honor: Relevance. Regardless of what the spec may or may not say about v1.d, it still remains that nulls should not be allowed in a *base table* if the domain says nulls are not allowed. Not mentioned in this thread but the constraints are also applied when CASTing to the domain. Now, to answer your straw man, this might be helpful: SQL:2023-2 Section 11.4 Syntax Rule 9, "If the descriptor of D includes any domain constraint descriptors, then T shall be a persistent base table.". Your v1 is not that and therefore arguably illegal. As you know, I am more than happy to (try to) amend the spec where needed, but Erki's complaint of a null value being allowed in a base table is clearly a bug in our implementation regardless of what we do with views. -- Vik Fearing
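To make the CAST point concrete, using the dpos domain from upthread: an explicit coercion does re-check the domain constraints, which is why forcing a re-cast on the way into a base table would close the hole.

SELECT CAST(NULL AS dpos);            -- ERROR: domain dpos does not allow null values
SELECT CAST(-1 AS dpos);              -- ERROR: value for domain dpos violates check constraint
SELECT (SELECT 1 WHERE false)::dpos;  -- also an error: the null result is re-checked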
Re: Removing unneeded self joins
On 12/10/2023 18:32, Alexander Korotkov wrote: On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov We have almost the results we wanted to have. But in the last explain you can see that nothing happened with the OR clause. We should use the expression mutator instead of walker to handle such clauses. But It doesn't process the RestrictInfo node ... I'm inclined to put a solution of this issue off for a while. OK. I think it doesn't worth to eliminate IS NULL quals with this complexity (at least at this stage of work). Yeah. I think It would be meaningful in the case of replacing also nested x IS NOT NULL with nothing. But it requires using a mutator instead of the walker and may be done more accurately next time. I made improvements over the code. Mostly new comments, grammar corrections of existing comments and small refactoring. Great! Also, I found that the suggestion from David Rowley [1] to qsort array of relations to faster find duplicates is still unaddressed. I've implemented it. That helps to evade quadratic complexity with large number of relations. I see. The thread is too long so far, thanks for the catch. Also I've incorporated improvements from Alena Rybakina except one for skipping SJ removal when no SJ quals is found. It's not yet clear for me if this check fix some cases. But at least optimization got skipped in some useful cases (as you can see in regression tests). Agree. I wouldn't say I like it too. But also, I suggest skipping some unnecessary assertions proposed in that patch: Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the negative numbers, at least? Assert(is_opclause(orinfo->clause)); - above we skip clauses with rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already checked as is_opclause. All these changes (see in the attachment) are optional. -- regards, Andrey Lepikhov Postgres Professional diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index f0746f35a3..7b8dc7a2b7 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -1710,8 +1710,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark, List *binfo_candidates = NIL; ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid}; - Assert(toKeep->relid != -1); - /* * Replace index of removing table with the keeping one. The technique of * removing/distributing restrictinfo is used here to attach just appeared @@ -2017,8 +2015,6 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses, /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */ continue; - Assert(is_opclause(orinfo->clause)); - oclause = bms_is_empty(orinfo->left_relids) ? get_rightop(orinfo->clause) : get_leftop(orinfo->clause); c2 = (bms_is_empty(orinfo->left_relids) ? 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 2ff4881fdf..96ebd6eed3 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -367,7 +367,6 @@ CatalogId CatalogIdMapEntry CatalogIndexState ChangeVarNodes_context -ReplaceVarnoContext CheckPoint CheckPointStmt CheckpointStatsData @@ -2341,6 +2340,7 @@ ReorderBufferUpdateProgressTxnCB ReorderTuple RepOriginId ReparameterizeForeignPathByChild_function +ReplaceVarnoContext ReplaceVarsFromTargetList_context ReplaceVarsNoMatchOption ReplicaIdentityStmt @@ -2474,6 +2474,7 @@ SeenRelsEntry SelectLimit SelectStmt Selectivity +SelfJoinCandidate SemTPadded SemiAntiJoinFactors SeqScan
Re: [dynahash] do not refill the hashkey after hash_search
On Thu, Sep 14, 2023 at 04:28:26PM +0800, Junwang Zhao wrote: > Add a v2 with some change to fix warnings about unused-parameter. > > I will add this to Commit Fest. This looks reasonable to me. I've marked the commitfest entry as ready-for-committer. I will plan on committing it in a couple of days unless John has additional feedback or would like to do the honors. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
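For readers who have not looked at the patch, the redundant pattern it removes looks roughly like the sketch below. The entry struct and key are invented for illustration; the relevant behavior is that hash_search() with HASH_ENTER copies the search key into a newly created entry by itself, so assigning the key again in the caller does nothing useful.

#include "postgres.h"
#include "utils/hsearch.h"

/* Hypothetical hash entry; the key member must come first. */
typedef struct MyEntry
{
	Oid			key;
	int			count;
} MyEntry;

static void
increment_count(HTAB *htab, Oid key)
{
	bool		found;
	MyEntry    *entry;

	entry = (MyEntry *) hash_search(htab, &key, HASH_ENTER, &found);
	if (!found)
	{
		/*
		 * dynahash has already copied 'key' into entry->key while creating
		 * the entry, so the once-common "entry->key = key;" line here is
		 * redundant; dropping such lines is what the patch does.
		 */
		entry->count = 0;
	}
	entry->count++;
}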
Re: PGDOCS - add more links in the pub/sub reference pages
On Thu, Oct 12, 2023 at 3:44 PM Amit Kapila wrote: > > On Mon, Oct 9, 2023 at 12:15 PM Peter Smith wrote: > > > > On Mon, Oct 9, 2023 at 3:32 PM Amit Kapila wrote: > > > > > > > In v1, I used the same pattern as on the CREATE SUBSCRIPTION page, > > which doesn't look like those... > > > > Yeah, I think it would have been better if we used params in the > CREATE SUBSCRIPTION page as well. I don't know if it is a good idea to > do now with this patch but it makes sense to be consistent. What do > you think? > OK, I have given those changes as separate patches: - 0002 (changes the CREATE PUBLICATION parameter ids) - 0003 (changes CREATE SUBSCRIPTION parameter ids) > > ~~~ > > > > The "Parameters" section describes some things that really are parameters: > > > > e.g. > > "sql-altersubscription-name" > > "sql-altersubscription-new-owner" > > "sql-altersubscription-new-name"> > > > > I agree, emphasising that those ones are parameters is better. Changed > > like this in v2. > > > > "sql-altersubscription-params-name" > > "sql-altersubscription-params-new-owner" > > "sql-altersubscription-params-new-name"> > > > > ~ > > > > But, the "Parameters" section also describes other SQL syntax clauses > > which are not really parameters at all. > > > > e.g. > > "sql-altersubscription-refresh-publication" > > "sql-altersubscription-enable" > > "sql-altersubscription-disable" > > > > So I felt those ones are more intuitive left as they are -- e.g., > > instead of having ids/linkends like: > > > > "sql-altersubscription-params-refresh-publication" > > "sql-altersubscription-params-enable" > > "sql-altersubscription-params-disable" > > > > I checked alter_role.sgml which has similar mixed usage and it is > using 'params' consistently in all cases. So, I would suggest > following a similar style here. > As you wish. Done that way in patch 0001. ~~ PSA the v5 patches. == Kind Regards, Peter Smith. Fujitsu Australia v5-0001-Add-more-pub-sub-links.patch Description: Binary data v5-0002-Change-ids-for-CREATE-PUBLICATION-parameters.patch Description: Binary data v5-0003-Change-ids-for-CREATE-SUBSCRIPTION-parameters.patch Description: Binary data
Re: Wait events for delayed checkpoints
On Fri, Oct 13, 2023 at 2:19 PM Robert Haas wrote: > On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier wrote: > > HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire() > > which would itself report a wait event for ProcArrayLock, overwriting > > this new one, no? > > Ah, right: the wait event should be set and cleared around pg_usleep, > not the whole loop. Duh. Yeah. Pushed like that. Thanks both.
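For anyone skimming the thread, the narrowed scope looks roughly like the sketch below (paraphrased, not the committed hunk; WAIT_EVENT_CHECKPOINT_DELAY is a placeholder rather than the exact enum name). Because the report calls bracket only the sleep, whatever wait event HaveVirtualXIDsDelayingChkpt() reports while acquiring ProcArrayLock is no longer overwritten for the rest of the loop.

	do
	{
		/* Report the delay only while actually sleeping. */
		pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY);
		pg_usleep(10000L);		/* wait for 10 msec */
		pgstat_report_wait_end();
	} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids, DELAY_CHKPT_START));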
Re: pg_upgrade's interaction with pg_resetwal seems confusing
On Fri, Oct 13, 2023 at 12:00 AM Robert Haas wrote:
>
> On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila wrote:
> > Now, as mentioned in the first paragraph, it seems we anyway don't
> > need to reset the WAL at the end when setting the next OID for the new
> > cluster with the -o option. If that is true, then I think even without
> > slots work it will be helpful to have such an option in pg_resetwal.
> >
> > Thoughts?
>
> I wonder if we should instead provide a way to reset the OID counter
> with a function call inside the database, gated by IsBinaryUpgrade.
>

I think the challenge in doing so would be that, when the server is running, a concurrent checkpoint can also update the OID counter value in the control file. See the code below:

CreateCheckPoint()
{
...
	LWLockAcquire(OidGenLock, LW_SHARED);
	checkPoint.nextOid = ShmemVariableCache->nextOid;
	if (!shutdown)
		checkPoint.nextOid += ShmemVariableCache->oidCount;
	LWLockRelease(OidGenLock);
...
	UpdateControlFile()
...
}

Now, we could try to pass some startup options such as checkpoint_timeout with a large value to ensure that a checkpoint won't interfere, but I'm not sure that would be bulletproof. Instead, how about allowing pg_upgrade to update the control file of the new cluster (with the required OID value) following the same method pg_resetwal uses in RewriteControlFile()?

> Having something like pg_resetwal --but-dont-actually-reset-the-wal
> seems both self-contradictory and vulnerable to abuse that we might be
> better off not inviting.
>

Fair point.

-- With Regards, Amit Kapila.
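To make the in-server alternative being discussed a bit more concrete, here is a rough and entirely hypothetical sketch of an IsBinaryUpgrade-gated helper in the style of the existing pg_upgrade_support.c functions; the function name is invented and this is not from any posted patch. As the CreateCheckPoint() excerpt above shows, the checkpointer reads the same counter under OidGenLock when it builds the control file contents, which is where the interaction described here comes from.

/* Hypothetical helper; not part of any posted patch. */
Datum
binary_upgrade_set_next_oid(PG_FUNCTION_ARGS)
{
	Oid			next_oid = PG_GETARG_OID(0);

	/* Errors out unless the server was started by pg_upgrade. */
	CHECK_IS_BINARY_UPGRADE;

	LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
	ShmemVariableCache->nextOid = next_oid;
	ShmemVariableCache->oidCount = 0;
	LWLockRelease(OidGenLock);

	PG_RETURN_VOID();
}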
Re: Add support for AT LOCAL
On Fri, Oct 13, 2023 at 02:20:59AM +0200, Vik Fearing wrote: > On 10/10/23 05:34, Michael Paquier wrote: > > I am attaching a v5 that addresses the documentation bits, could you > > look at the business with date.c? > > Here is a v6 Thanks for the new version. > which hopefully addresses all of your concerns. Mostly ;) The first thing I did was to extract the doc bits about timezone(zone, time) for AT TIME ZONE from v6 and apply them independently. I then looked at the rest, and it looked mostly OK to me, including the extra description you added for the fifth example in the docs. I tweaked a few things: adjusted the regression tests to make the views a bit more appealing to the eye, fixed an indentation issue so that koel would not complain, and did a catalog bump. Then I applied it. -- Michael signature.asc Description: PGP signature