Re: Collation versioning

2020-03-17 Thread Julien Rouhaud
On Tue, Mar 17, 2020 at 03:37:49PM +0900, Michael Paquier wrote:
> On Mon, Mar 16, 2020 at 03:05:20PM +0100, Julien Rouhaud wrote:
> > On Mon, Mar 16, 2020 at 04:57:38PM +0900, Michael Paquier wrote:
> >> This comes from a regression test doing a sanity check to look for
> >> catalogs which have a toastable column but no toast tables.  As an
> >> exception, it should be documented in the test's comment.  Actually,
> >> does it need to be an exception?  This does not depend on
> >> relation-level facilities so there should be no risk of recursive
> >> dependencies, though I have not looked in details at this part.
> >
> > I totally missed that, and I agree that there's no need for an exception, so
> > fixed.
>
> How long can collation version strings actually be?  Note also
> 96cdeae, which makes sense for pg_depend to have one.


Versions usually shouldn't be that long, but there were some previous
discussions on how to come up with a workaround on systems that don't
provide a version, such as using a hash of the underlying file.
Using a hash value big enough to require toasting wouldn't make much sense, but
it feels safer to be ready to handle any later use, whether for collations or
other kinds of objects.
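As a rough illustration of that workaround idea (entirely hypothetical; nothing in the patch set does this), a pseudo-version could be derived by hashing the locale data file. Any cryptographic digest yields a short fixed-length string, nowhere near TOAST territory:

```python
import hashlib

def pseudo_collation_version(path: str) -> str:
    """Derive a stand-in 'collation version' by hashing a locale data file.

    Illustrative only: the point is that even this fallback produces a
    short, fixed-length string (64 hex characters for SHA-256), so a
    version column would never come close to needing TOAST.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```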


> >> Regarding patch 0003, it would be nice to include some tests
> >> independent on the rest and making use of the new functions.  These
> >> normally go in regproc.sql.  For example with a collation that needs
> >> double quotes as this is not obvious:
> >> =# select regcollation('"POSIX"');
> >> regcollation
> >>  --
> >>  "POSIX"
> >> (1 row)
> >>
> >> On top of that, this needs tests with to_regcollation() and tests with
> >> schema-qualified collations.
> >
> >
> > Done too, using the same collation name, for both with and without schema
> > qualification.
>
> It seems to me that you could add an extra test with a catalog that
> does not exist, making sure that NULL is returned:
> SELECT to_regtype('ng_catalog."POSIX"');


Agreed, I'll add that, but using a name that looks less like a typo :)


>
> pg_collation_actual_version
> -   
> pg_collation_actual_version(oid)
> +   
> pg_collation_actual_version(regcollation)
>
> The function's input argument is not changed, why?


That's a mistake, will fix.


> Patch 0003 is visibly getting in shape, and that's an independent
> piece.  I guess that Thomas is looking at that, so let's wait for his
> input.
>
> Note that patch 0002 fails to compile because it is missing to include
> utils/builtins.h for CStringGetTextDatum(), and that you cannot pass
> down a NameData to this API, because it needs a simple char string or
> you would need NameStr() or such.  Anyway, it happens that you don't
> need recordDependencyOnVersion() at all, because it is removed by
> patch 0004 in the set, so you could just let it go.


Ah, good catch, I missed that during the NameData/text refactoring.  I'll fix it
anyway; better to have a clean history.




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-03-17 Thread Etsuro Fujita
Hi Ashutosh,

On Wed, Mar 4, 2020 at 1:48 AM Ashutosh Bapat
 wrote:
> I reviewed the patch. Except for the logic of matching the pairs of 
> partitions from already merged partitions, I think the code changes are good. 
> But there are several places where it needs further changes to comments. The 
> attached patch has those. I have described some of them below.

Thanks for reviewing!

> + * We can not perform partition-wise join if either of the joining
> + * relations is not partitioned.
>
> We are consistently using partitionwise instead of partition-wise.

Will fix.

> + /*
> + * See if the partition bounds for inputs are exactly the same, in
> + * which case we don't need to work hard: partitions with the same
> + * partition indexes will form join pairs, and the join rel will have
> + * the same partition bounds as inputs; otherwise try to merge the
> + * partition bounds along with generating join pairs.
>
> Phrase "joining relations" is better than "inputs", IMO. Updated in the
> attached patch.

"inputs" is used in many places in the planner performing join
planning, so I'm not sure "joining relations" is better than "inputs".

> + /*
> + * If the partition bounds for the join rel are not merged ones,
> + * inputs are guaranteed to have the same partition bounds, so
> + * partitions with the same partition indexes will form join pairs;
> + * else let get_matching_part_pairs() do the work.
> + */
> + if (joinrel->merged)
> + {
>
> This condition in the comment is opposite to the condition being checked in
> code, so looks confusing. BTW this comment is also repeated around line 1405.
> See attached patch for correction.

OK, I'll revise the comments as proposed.

> + /*
> + * If this segment of the join is empty, it means that this segment
>
> "partition of the join" looks consistent with other usages than "segment of 
> the
> join".

Actually, "segment" is used in the existing comments in the caller
function try_partitionwise_join(), so I think "segment" is better here
for consistency.

> + /*
> + * Get a relids set of partition(s) involved in this join segment that
> + * are from the rel1 side.
> + */
> + child_relids1 = bms_intersect(child_joinrel->relids,
> +  rel1->all_partrels);
>
> The partition bounds are sorted by their values. Even for a list partitioned
> table, the bounds are sorted by the least partition value. We do not allow
> multiple partitions from one side to be joined with one partition on the other
> and vice versa. All this together means that the partitions of the join
> relation are formed by joining partitions from either side in the order of
> their bounds. This means that the matching pairs of partitions can be found by
> matching relids of partitions of the join with those of the joining relation by
> traversing partitions from all the three relations only once in the order they
> appear in partition bounds of corresponding relations.

Consider this 2-way join for list partitioned tables:

CREATE TABLE plt1_ad (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_ad_p1 PARTITION OF plt1_ad FOR VALUES IN ('0001', '0003');
CREATE TABLE plt1_ad_p2 PARTITION OF plt1_ad FOR VALUES IN ('0002');
INSERT INTO plt1_ad SELECT i, i, to_char(i % 10, 'FM0000') FROM
generate_series(1, 100) i WHERE i % 10 in (1, 2, 3);
ANALYZE plt1_ad;
CREATE TABLE plt2_ad (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_ad_p1 PARTITION OF plt2_ad FOR VALUES IN ('0002', '0004');
CREATE TABLE plt2_ad_p2 PARTITION OF plt2_ad FOR VALUES IN ('0003');
INSERT INTO plt2_ad SELECT i, i, to_char(i % 10, 'FM0000') FROM
generate_series(1, 100) i WHERE i % 10 IN (2, 3, 4);
ANALYZE plt2_ad;

EXPLAIN (COSTS OFF)
SELECT t1.c, t1.a, t2.a FROM plt1_ad t1 INNER JOIN plt2_ad t2 ON (t1.c
= t2.c) WHERE t1.a < 10  ORDER BY t1.c, t1.a, t2.a;
                 QUERY PLAN
---------------------------------------------
 Sort
   Sort Key: t1.c, t1.a, t2.a
   ->  Append
 ->  Hash Join
   Hash Cond: (t2_1.c = t1_2.c)
   ->  Seq Scan on plt2_ad_p1 t2_1
   ->  Hash
 ->  Seq Scan on plt1_ad_p2 t1_2
   Filter: (a < 10)
 ->  Hash Join
   Hash Cond: (t2_2.c = t1_1.c)
   ->  Seq Scan on plt2_ad_p2 t2_2
   ->  Hash
 ->  Seq Scan on plt1_ad_p1 t1_1
   Filter: (a < 10)
(15 rows)

As you can see from the EXPLAIN result, the first partition on the
outer side matches the second partition on the inner side, and the
second partition on the outer side matches the first partition on the
inner side.  How does the algorithm you proposed work e.g., when an
N-way join for list partitioned tables contains this join as its lower
join?
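The counterexample can be boiled down to a toy simulation (hypothetical code, not the planner's data structures): pair list partitions by nonempty overlap of their accepted value sets, and the matches come out in the opposite order to the bounds.

```python
# Toy model of the plt1_ad/plt2_ad example above: each partition is the
# set of values it accepts, listed in bound order ('1'..'4' stand in
# for '0001'..'0004').
outer_parts = [{"1", "3"}, {"2"}]   # plt1_ad_p1, plt1_ad_p2
inner_parts = [{"2", "4"}, {"3"}]   # plt2_ad_p1, plt2_ad_p2

def join_pairs(outer, inner):
    """Return (outer_index, inner_index) pairs whose value sets overlap."""
    return [(i, j)
            for i, ovals in enumerate(outer)
            for j, ivals in enumerate(inner)
            if ovals & ivals]

# Outer partition 0 matches inner partition 1 and vice versa, so a
# single in-order sweep over both bound lists cannot find the pairs.
print(join_pairs(outer_parts, inner_parts))  # [(0, 1), (1, 0)]
```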

> If we use this
> algorithm, we don't need all_partrels to be collected and also don't need to
> search base or join relation. That, I think, will reduce the time and space
> complexity of this logic. Am I missing some

Missing errcode() in ereport

2020-03-17 Thread Masahiko Sawada
Hi,

In PageIsVerified() we report a WARNING as follows:

ereport(WARNING,
(ERRCODE_DATA_CORRUPTED,
 errmsg("page verification failed, calculated checksum
%u but expected %u",
checksum, p->pd_checksum)));

However the error message won't have an SQL error code due to the missing
errcode(). As far as I can see there are four places:

$ git grep "(ERRCODE" | grep -v errcode
contrib/adminpack/adminpack.c:
(ERRCODE_DUPLICATE_FILE,
contrib/adminpack/adminpack.c:  (ERRCODE_DUPLICATE_FILE,
contrib/adminpack/adminpack.c:
 (ERRCODE_UNDEFINED_FILE,
src/backend/storage/page/bufpage.c:
(ERRCODE_DATA_CORRUPTED,
src/pl/plpgsql/src/pl_exec.c:   else if
(ERRCODE_IS_CATEGORY(sqlerrstate) &&

Attached patch adds errcode() to these places.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


add_errcode.patch
Description: Binary data


Re: Missing errcode() in ereport

2020-03-17 Thread Amit Kapila
On Tue, Mar 17, 2020 at 2:08 PM Masahiko Sawada
 wrote:
>
> Hi,
>
> In PageIsVerified() we report a WARNING as follows:
>
> ereport(WARNING,
> (ERRCODE_DATA_CORRUPTED,
>  errmsg("page verification failed, calculated checksum
> %u but expected %u",
> checksum, p->pd_checksum)));
>
> However the error message won't have an SQL error code due to the missing
> errcode(). As far as I can see there are four places:
>
> $ git grep "(ERRCODE" | grep -v errcode
> contrib/adminpack/adminpack.c:
> (ERRCODE_DUPLICATE_FILE,
> contrib/adminpack/adminpack.c:  
> (ERRCODE_DUPLICATE_FILE,
> contrib/adminpack/adminpack.c:
>  (ERRCODE_UNDEFINED_FILE,
> src/backend/storage/page/bufpage.c:
> (ERRCODE_DATA_CORRUPTED,
> src/pl/plpgsql/src/pl_exec.c:   else if
> (ERRCODE_IS_CATEGORY(sqlerrstate) &&
>
> Attached patch adds errcode() to these places.
>

+1.  This looks like an oversight to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Missing errcode() in ereport

2020-03-17 Thread Julien Rouhaud
On Tue, Mar 17, 2020 at 10:00 AM Amit Kapila  wrote:
>
> On Tue, Mar 17, 2020 at 2:08 PM Masahiko Sawada
>  wrote:
> >
> > Hi,
> >
> > In PageIsVerified() we report a WARNING as follows:
> >
> > ereport(WARNING,
> > (ERRCODE_DATA_CORRUPTED,
> >  errmsg("page verification failed, calculated checksum
> > %u but expected %u",
> > checksum, p->pd_checksum)));
> >
> > However the error message won't have an SQL error code due to the missing
> > errcode(). As far as I can see there are four places:
> >
> > $ git grep "(ERRCODE" | grep -v errcode
> > contrib/adminpack/adminpack.c:
> > (ERRCODE_DUPLICATE_FILE,
> > contrib/adminpack/adminpack.c:  
> > (ERRCODE_DUPLICATE_FILE,
> > contrib/adminpack/adminpack.c:
> >  (ERRCODE_UNDEFINED_FILE,
> > src/backend/storage/page/bufpage.c:
> > (ERRCODE_DATA_CORRUPTED,
> > src/pl/plpgsql/src/pl_exec.c:   else if
> > (ERRCODE_IS_CATEGORY(sqlerrstate) &&
> >
> > Attached patch adds errcode() to these places.
> >
>
> +1.  This looks like an oversight to me.

good catch!  And patch LGTM.




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-17 Thread Fabien COELHO


About v13, seen as one patch:

The "pg_ls_dir_metadata" documentation suggests a variable number of
arguments with brackets, but the parameters are actually mandatory.


 postgres=# SELECT pg_ls_dir_metadata('.');
 ERROR:  function pg_ls_dir_metadata(unknown) does not exist
 LINE 1: SELECT pg_ls_dir_metadata('.');
   ^
 HINT:  No function matches the given name and argument types. You might need 
to add explicit type casts.
 postgres=# SELECT pg_ls_dir_metadata('.', true, true);
 …

The example in the documentation could be better indented. Also, ISTM that 
there are two implicit laterals (format & pg_ls_dir_recurse) that I would 
make explicit. I'd use the pcs alias explicitly, and meaningful 
aliases (e.g. ts instead of b, …).


On reflection, I think that a boolean "isdir" column is a bad idea because 
it is not extensible. I'd propose to switch to the standard "ls" approach 
of providing the type as one character: '-' for regular, 'd' for 
directory, 'l' for link, 's' for socket, 'c' for character special…
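For what the ls-style mapping would look like, here is a sketch in Python (illustration only; the server-side function would of course be C using stat()/lstat()):

```python
import os
import stat

def ls_type_char(path: str) -> str:
    """Map a file's mode to an ls-style type character:
    '-' regular, 'd' directory, 'l' link, 's' socket,
    'c' character device, 'b' block device, 'p' fifo."""
    mode = os.lstat(path).st_mode   # lstat() so symlinks report as 'l'
    if stat.S_ISREG(mode):
        return "-"
    if stat.S_ISDIR(mode):
        return "d"
    if stat.S_ISLNK(mode):
        return "l"
    if stat.S_ISSOCK(mode):
        return "s"
    if stat.S_ISCHR(mode):
        return "c"
    if stat.S_ISBLK(mode):
        return "b"
    if stat.S_ISFIFO(mode):
        return "p"
    return "?"
```

Being a single character, the scheme stays extensible: a new file type just claims a new letter, whereas a boolean "isdir" column cannot grow.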


ISTM that "lstat" is not available on Windows, which suggests always calling 
"stat", and then "lstat" on un*x and pg ports stuff on win.


I'm wondering about the restriction on directories only. Why should it not 
work on a file? Can it be easily extended to work on a simple file? If so, 
it could be just "pg_ls".


--
Fabien.

Re: Missing errcode() in ereport

2020-03-17 Thread Michael Paquier
On Tue, Mar 17, 2020 at 10:26:57AM +0100, Julien Rouhaud wrote:
> On Tue, Mar 17, 2020 at 10:00 AM Amit Kapila  wrote:
>> +1.  This looks like an oversight to me.
> 
> good catch!  And patch LGTM.

Definitely an oversight.  All stable branches down to 9.5 have
mistakes in the same area, with nothing extra by grepping around.
Amit, I guess that you will take care of it?
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] Generic type subscripting

2020-03-17 Thread Pavel Stehule
Hi


> Hi
>
> please rebase this patch
>

here is the attached fixed first patch

v30-0001-Base-implementation-of-subscripting-mechanism.patch

My objections are fixed. I checked this patch and:

There are no problems with the build (code, documentation).
All tests passed.
The code is well documented.

I like the functionality introduced by this patch. It opens a door for easy
work with json, jsonb, xml, ... and a lot of other types with array access
syntax.

This is a first but necessary step. Write operations are not
supported by PL/pgSQL, but plpgsql developers still get some benefits: it
works for read operations (in plpgsql).

I'll mark this patch as ready for committers

Thank you for your work.

Regards

Pavel





> Regards
>
> Pavel
>
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 20dc8c605b..728f781620 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2597,6 +2597,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
 JumbleExpr(jstate, (Node *) sbsref->refexpr);
 JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+APP_JUMB(sbsref->refnestedfunc);
 			}
 			break;
 		case T_FuncExpr:
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9d9e915979..db746c99b6 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1058,7 +1058,8 @@ AddNewRelationType(const char *typeName,
    -1,			/* typmod */
    0,			/* array dimensions for typBaseType */
    false,		/* Type NOT NULL */
-   InvalidOid); /* rowtypes never have a collation */
+   InvalidOid,  /* rowtypes never have a collation */
+   InvalidOid);	/* typsubshandler - none */
 }
 
 /* 
@@ -1339,7 +1340,8 @@ heap_create_with_catalog(const char *relname,
    -1,			/* typmod */
    0,			/* array dimensions for typBaseType */
    false,		/* Type NOT NULL */
-   InvalidOid); /* rowtypes never have a collation */
+   InvalidOid,  /* rowtypes never have a collation */
+   0);	/* array implementation */
 
 		pfree(relarrayname);
 	}
diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
index cd56714968..9eb43ec3d6 100644
--- a/src/backend/catalog/pg_type.c
+++ b/src/backend/catalog/pg_type.c
@@ -118,6 +118,7 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId)
 	values[Anum_pg_type_typtypmod - 1] = Int32GetDatum(-1);
 	values[Anum_pg_type_typndims - 1] = Int32GetDatum(0);
 	values[Anum_pg_type_typcollation - 1] = ObjectIdGetDatum(InvalidOid);
+	values[Anum_pg_type_typsubshandler - 1] = ObjectIdGetDatum(InvalidOid);
 	nulls[Anum_pg_type_typdefaultbin - 1] = true;
 	nulls[Anum_pg_type_typdefault - 1] = true;
 	nulls[Anum_pg_type_typacl - 1] = true;
@@ -158,10 +159,10 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId)
 		GenerateTypeDependencies(tup,
  pg_type_desc,
  NULL,
- NULL,
  0,
  false,
  false,
+ InvalidOid,
  false);
 
 	/* Post creation hook for new shell type */
@@ -219,7 +220,8 @@ TypeCreate(Oid newTypeOid,
 		   int32 typeMod,
 		   int32 typNDims,		/* Array dimensions for baseType */
 		   bool typeNotNull,
-		   Oid typeCollation)
+		   Oid typeCollation,
+		   Oid subscriptingHandlerProcedure)
 {
 	Relation	pg_type_desc;
 	Oid			typeObjectId;
@@ -372,6 +374,7 @@ TypeCreate(Oid newTypeOid,
 	values[Anum_pg_type_typtypmod - 1] = Int32GetDatum(typeMod);
 	values[Anum_pg_type_typndims - 1] = Int32GetDatum(typNDims);
 	values[Anum_pg_type_typcollation - 1] = ObjectIdGetDatum(typeCollation);
+	values[Anum_pg_type_typsubshandler - 1] = ObjectIdGetDatum(subscriptingHandlerProcedure);
 
 	/*
 	 * initialize the default binary value for this type.  Check for nulls of
@@ -720,6 +723,14 @@ GenerateTypeDependencies(HeapTuple typeTuple,
 	/* Normal dependency on the default expression. */
 	if (defaultExpr)
 		recordDependencyOnExpr(&myself, defaultExpr, NIL, DEPENDENCY_NORMAL);
+
+	if (OidIsValid(typeForm->typsubshandler))
+	{
+		referenced.classId = ProcedureRelationId;
+		referenced.objectId = typeForm->typsubshandler;
+		referenced.objectSubId = 0;
+		recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+	}
 }
 
 /*
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 8891b1d564..e925389297 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -115,6 +115,7 @@ static Oid	findTypeSendFunction(List *procname, Oid typeOid);
 static Oid	findTypeTypmodinFunction(List *procname);
 static Oid	findTypeTypmodoutFunction(List *procname);
 static Oid	findTypeAnalyzeFunction(List *procname, Oid typeOid);
+static Oid	findTypeSubscriptingFunction(List *procname, Oid typeOid, bool parseFunc);
 static Oid	findRangeSubOpclass(List *opcname, Oid subtyp

Re: Missing errcode() in ereport

2020-03-17 Thread Amit Kapila
On Tue, Mar 17, 2020 at 3:28 PM Michael Paquier  wrote:
>
> On Tue, Mar 17, 2020 at 10:26:57AM +0100, Julien Rouhaud wrote:
> > On Tue, Mar 17, 2020 at 10:00 AM Amit Kapila  
> > wrote:
> >> +1.  This looks like an oversight to me.
> >
> > good catch!  And patch LGTM.
>
> Definitely an oversight.  All stable branches down to 9.5 have
> mistakes in the same area, with nothing extra by grepping around.
> Amit, I guess that you will take care of it?
>

Yes, I will unless I see any objections in a day or so.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Generic type subscripting

2020-03-17 Thread Dmitry Dolgov
> On Tue, Mar 17, 2020 at 11:03:22AM +0100, Pavel Stehule wrote:
>
> here is the attached fixed first patch
>
> v30-0001-Base-implementation-of-subscripting-mechanism.patch
>
> My objections are fixed. I checked this patch and:
>
> There are no problems with the build (code, documentation).
> All tests passed.
> The code is well documented.
>
> I like the functionality introduced by this patch. It opens a door for easy
> work with json, jsonb, xml, ... and a lot of other types with array access
> syntax.
>
> This is a first but necessary step. Write operations are not
> supported by PL/pgSQL, but plpgsql developers still get some benefits: it
> works for read operations (in plpgsql).
>
> I'll mark this patch as ready for committers
>
> Thank you for your work.

Thanks a lot, Pavel!




Re: Improving connection scalability: GetSnapshotData()

2020-03-17 Thread David Rowley
Hi,

Nice performance gains.

On Sun, 1 Mar 2020 at 21:36, Andres Freund  wrote:
> The series currently consists of:
>
> 0001-0005: Fixes and assert improvements that are independent of the patch, 
> but
>are hit by the new code (see also separate thread).
>
> 0006: Move delayChkpt from PGXACT to PGPROC it's rarely checked & frequently 
> modified
>
> 0007: WIP: Introduce abstraction layer for "is tuple invisible" tests.
>
>   This is the most crucial piece. Instead of code directly using
>   RecentOldestXmin, there's a new set of functions for testing
>   whether an xid is visible (InvisibleToEveryoneTestXid() et al).
>
>   Those function use new horizon boundaries computed as part of
>   GetSnapshotData(), and recompute an accurate boundary when the
>   tested xid falls in between.
>
>   There's a bit more infrastructure needed - we need to limit how
>   often an accurate snapshot is computed. Probably to once per
>   snapshot? Or once per transaction?
>
>
>   To avoid issues with the lower boundary getting too old and
>   presenting a wraparound danger, I made all the xids be
>   FullTransactionIds. That imo is a good thing?
>
>
>   This patch currently breaks old_snapshot_threshold, as I've not
>   yet integrated it with the new functions. I think we can make the
>   old snapshot stuff a lot more precise with this - instead of
>   always triggering conflicts when a RecentGlobalXmin is too old, we
>   can do so only in the cases we actually remove a row. I ran out of
>   energy threading that through the heap_page_prune and
>   HeapTupleSatisfiesVacuum.
>
>
> 0008: Move PGXACT->xmin back to PGPROC.
>
>   Now that GetSnapshotData() doesn't access xmin anymore, we can
>   make it a normal field in PGPROC again.
>
>
> 0009: Improve GetSnapshotData() performance by avoiding indirection for xid 
> access.
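The two-boundary scheme described for 0007 can be sketched roughly as follows (all names invented for illustration; the real code operates on FullTransactionIds in shared state):

```python
class VisibilityTest:
    """Sketch of the lazy-horizon idea: keep two cached boundaries and
    answer most 'is this xid invisible to everyone?' tests from them,
    recomputing an accurate horizon only when an xid falls in between."""

    def __init__(self, invisible_below, visible_from, compute_horizon):
        self.invisible_below = invisible_below   # conservative lower bound
        self.visible_from = visible_from         # conservative upper bound
        self.compute_horizon = compute_horizon   # expensive accurate pass
        self.recomputations = 0

    def xid_invisible_to_everyone(self, xid):
        if xid < self.invisible_below:
            return True        # older than any horizon could possibly be
        if xid >= self.visible_from:
            return False       # some snapshot may still consider it running
        # In between: pay for one accurate horizon, then cache it.
        self.recomputations += 1
        self.invisible_below = self.compute_horizon()
        return xid < self.invisible_below
```

The appeal is that the snapshot path only has to maintain the two cheap boundaries; how often the accurate recomputation is allowed to run is exactly the open question in the quoted mail (once per snapshot, or once per transaction).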

I've only looked at 0001-0009 so far. I'm not quite the expert in this
area, so the review feels a bit superficial.  Here's what I noted down
during my pass.

0001

1. cant't -> can't

* snapshot cant't change in the midst of a relcache build, so there's no

0002

2. I don't quite understand your change in
UpdateSubscriptionRelState(). snap seems unused. Drilling down into
SearchSysCacheCopy2, in SearchCatCacheMiss() the systable_beginscan()
passes a NULL snapshot.

the whole patch does this. I guess I don't understand why 0002 does this.

0004

3. This comment seems to have the line order swapped in bt_check_every_level

/*
* RecentGlobalXmin/B-Tree page deletion.
* This assertion matches the one in index_getnext_tid().  See note on
*/
Assert(SnapshotSet());

0006

4. Did you consider the location of 'delayChkpt' in PGPROC? Couldn't
you slot it in somewhere it would fit in existing padding?

0007

5. GinPageIsRecyclable() has no comments at all. I know that
ginvacuum.c is not exactly the modal citizen for function header
comments, but likely this patch is no good reason to continue the
trend.

6. The comment rearrangement in bt_check_every_level should be in the
0004 patch.

7. struct InvisibleToEveryoneState could do with some comments
explaining the fields.

8. The header comment in GetOldestXminInt needs to be updated. It
talks about "if rel = NULL and there are no transactions", but there's
no parameter by that name now. Maybe the whole comment should be moved
down to the external implementation of the function.

9. I get the idea you don't intend to keep the debug message in
InvisibleToEveryoneTestFullXid(), but if you do, then shouldn't it be
using UINT64_FORMAT?

10. teh -> the

* which is based on teh value computed when getting the current snapshot.

11. InvisibleToEveryoneCheckXid and InvisibleToEveryoneCheckFullXid
seem to have their extern modifiers in the .c file.

0009

12. iare -> are

* These iare separate from the main PGPROC array so that the most heavily

13. is -> are

* accessed data is stored contiguously in memory in as few cache lines as

14. It doesn't seem to quite make sense to talk about "this proc" in:

/*
* TransactionId of top-level transaction currently being executed by this
* proc, if running and XID is assigned; else InvalidTransactionId.
*
* Each PGPROC has a copy of its value in PGPROC.xidCopy.
*/
TransactionId *xids;

maybe "this" can be replaced with "each"

I will try to continue with the remaining patches soon. However, it
would be good to get a more complete patchset. I feel there are quite
a few XXX comments remaining for things you need to think about later,
and ... it's getting late.




Assert() failures during RI checks

2020-03-17 Thread Antonin Houska
I was trying to figure out what exactly the "crosscheck snapshot" does in the
RI checks, and hit some assertion failures:

postgres=# create table p(i int primary key);
CREATE TABLE
postgres=# create table f (i int references p on delete cascade on update 
cascade deferrable initially deferred);
CREATE TABLE
postgres=# insert into p values (1);
INSERT 0 1
postgres=# begin isolation level repeatable read;
BEGIN
postgres=*# table f;
 i 
---
(0 rows)


In another session:

postgres=# insert into f values (1);
INSERT 0 1

Back in the first session:

postgres=*# delete from p where i=1;
TRAP: FailedAssertion("!(tp.t_data->t_infomask & HEAP_XMAX_INVALID)", File: 
"heapam.c", Line: 2652)

I'm not familiar enough with this code, but I wonder if it's only about
incorrect assertions. When I commented out some, I got an error message that
makes sense to me:

postgres=*# delete from p where i=1;
2020-03-17 11:59:19.214 CET [89379] ERROR:  could not serialize access due to 
concurrent update
2020-03-17 11:59:19.214 CET [89379] CONTEXT:  SQL statement "DELETE FROM ONLY 
"public"."f" WHERE $1 OPERATOR(pg_catalog.=) "i""
2020-03-17 11:59:19.214 CET [89379] STATEMENT:  delete from p where i=1;
ERROR:  could not serialize access due to concurrent update
CONTEXT:  SQL statement "DELETE FROM ONLY "public"."f" WHERE $1 
OPERATOR(pg_catalog.=) "i""


Similarly, if the test ends with an UPDATE statement, I get this failure:

postgres=*# update p set i=i+1 where i=1;
TRAP: FailedAssertion("!ItemPointerEquals(&oldtup.t_self, 
&oldtup.t_data->t_ctid)", File: "heapam.c", Line: 3275)

Likewise, with the Assert() statements commented out, the right thing seems to
happen:

2020-03-17 11:57:04.810 CET [88678] ERROR:  could not serialize access due to 
concurrent update
2020-03-17 11:57:04.810 CET [88678] CONTEXT:  SQL statement "UPDATE ONLY 
"public"."f" SET "i" = $1 WHERE $2 OPERATOR(pg_catalog.=) "i""
2020-03-17 11:57:04.810 CET [88678] STATEMENT:  update p set i=i+1 where i=1;
ERROR:  could not serialize access due to concurrent update
CONTEXT:  SQL statement "UPDATE ONLY "public"."f" SET "i" = $1 WHERE $2 
OPERATOR(pg_catalog.=) "i""

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-17 Thread Amit Kapila
On Mon, Mar 16, 2020 at 3:24 PM Dilip Kumar  wrote:
>

+
+ /*
+ * Indicate that the lock is released for certain types of locks
+ */
+#ifdef USE_ASSERT_CHECKING
+ CheckAndSetLockHeld(locallock, false);
+#endif
 }

 /*
@@ -1618,6 +1666,11 @@ GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner)
  locallock->numLockOwners++;
  if (owner != NULL)
  ResourceOwnerRememberLock(owner, locallock);
+
+ /* Indicate that the lock is acquired for certain types of locks. */
+#ifdef USE_ASSERT_CHECKING
+ CheckAndSetLockHeld(locallock, true);
+#endif
 }

There is no need to sprinkle USE_ASSERT_CHECKING at so many places;
having it inside the new function is sufficient.  I have changed that,
added a few more comments, and made minor changes.  See what you think
about the attached.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v12-0001-Assert-that-we-don-t-acquire-a-heavyweight-lock-on-a.patch
Description: Binary data


v12-0002-Add-assert-to-ensure-that-page-locks-don-t-participa.patch
Description: Binary data


v12-0003-Allow-relation-extension-lock-to-conflict-among-para.patch
Description: Binary data


v12-0004-Allow-page-lock-to-conflict-among-parallel-group-mem.patch
Description: Binary data


Re: plan cache overhead on plpgsql expression

2020-03-17 Thread David Steele

Hi Amit,

On 2/25/20 3:42 AM, Amit Langote wrote:

On Tue, Feb 25, 2020 at 4:16 PM Pavel Stehule  wrote:

I added this patch to a commitfest

https://commitfest.postgresql.org/27/2467/

It is a very interesting speedup and a good step in the direction of JIT expressions


Thank you.  I was planning to do that myself.

I will take a look at your other comments in a day or two.


Do you know when you'll have a chance to look at Pavel's comments?

Regards,
--
-David
da...@pgmasters.net




Re: Commitfest 2020-03 Now in Progress

2020-03-17 Thread David Steele

On 3/1/20 4:10 PM, David Steele wrote:

The last Commitfest for v13 is now in progress!

Current stats for the Commitfest are:

Needs review: 192
Waiting on Author: 19
Ready for Committer: 4
Total: 215


Halfway through, here's where we stand.  Note that a CF entry was added 
after the stats above were recorded so the total has increased.


Needs review: 151 (-41)
Waiting on Author: 20 (+1)
Ready for Committer: 9 (+5)
Committed: 25
Moved to next CF: 1
Withdrawn: 4
Returned with Feedback: 6
Total: 216

Regards,
--
-David
da...@pgmasters.net




Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-17 Thread Dean Rasheed
On Sat, 14 Mar 2020 at 18:45, Tomas Vondra  wrote:
>
> I realized there's one more thing that probably needs discussing.
> Essentially, these two clause types are the same:
>
>a IN (1, 2, 3)
>
>(a = 1 OR a = 2 OR a = 3)
>
> but with 8f321bd1 we only recognize the first one as compatible with
> functional dependencies. It was always the case that we estimated those
> two clauses a bit differently, but the differences were usually small.
> But now that we recognize IN as compatible with dependencies, the
> difference may be much larger, which bugs me a bit ...
>
> So I wonder if we should recognize the special form of an OR clause,
> with all Vars referencing to the same attribute etc. and treat this as
> supported by functional dependencies - the attached patch does that.
> MCV lists there's already no difference because OR clauses are
> supported.
>

Makes sense, and the patch looks straightforward enough.

> The question is whether we want to do this, and whether we should also
> teach the per-column estimates to recognize this special case of IN
> clause.

I'm not convinced about that second part though. I'd say that
recognising the OR clause for functional dependencies is sufficient to
prevent the large differences in estimates relative to the equivalent
IN clauses. The small differences between the way that OR and IN
clauses are handled have always been there, and I think that changing
that is out of scope for this work.

The other thing that I'm still concerned about is the possibility of
returning estimates with P(a,b) > P(a) or P(b). I think that such a
thing becomes much more likely with the new types of clause supported
here, because they now allow multiple values from each column, where
before we only allowed one. I took another look at the patch I posted
on the other thread, and I've convinced myself that it's correct.
Attached is an updated version, with some cosmetic tidying up and now
with some additional regression tests.
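The effect of the formula change can be checked numerically (a standalone sketch of the two formulas, not the patch itself):

```python
def old_combine(p_a, p_b, f):
    # Pre-patch:  P(a,b) = P(a) * [f + (1 - f) * P(b)]
    return p_a * (f + (1 - f) * p_b)

def new_combine(p_a, p_b, f):
    # Patched:    P(a,b) = f * Min(P(a), P(b)) + (1 - f) * P(a) * P(b)
    return f * min(p_a, p_b) + (1 - f) * p_a * p_b

# With a strong dependency and P(b) much smaller than P(a), the old
# formula produces a joint selectivity above P(b), which is impossible;
# the new one stays capped by min(P(a), P(b)).
p_a, p_b, f = 0.5, 0.1, 0.9
print(round(old_combine(p_a, p_b, f), 3))  # 0.455  (> P(b) = 0.1)
print(round(new_combine(p_a, p_b, f), 3))  # 0.095  (<= 0.1)
```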

Regards,
Dean
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
new file mode 100644
index 72dc1cd..e0cc2b9
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -30,6 +30,7 @@
 #include "utils/fmgroids.h"
 #include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
 
@@ -73,8 +74,6 @@ static double dependency_degree(int numr
 AttrNumber *dependency, VacAttrStats **stats, Bitmapset *attrs);
 static bool dependency_is_fully_matched(MVDependency *dependency,
 		Bitmapset *attnums);
-static bool dependency_implies_attribute(MVDependency *dependency,
-		 AttrNumber attnum);
 static bool dependency_is_compatible_clause(Node *clause, Index relid,
 			AttrNumber *attnum);
 static MVDependency *find_strongest_dependency(MVDependencies **dependencies,
@@ -614,19 +613,6 @@ dependency_is_fully_matched(MVDependency
 }
 
 /*
- * dependency_implies_attribute
- *		check that the attnum matches is implied by the functional dependency
- */
-static bool
-dependency_implies_attribute(MVDependency *dependency, AttrNumber attnum)
-{
-	if (attnum == dependency->attributes[dependency->nattributes - 1])
-		return true;
-
-	return false;
-}
-
-/*
  * statext_dependencies_load
  *		Load the functional dependencies for the indicated pg_statistic_ext tuple
  */
@@ -966,7 +952,7 @@ find_strongest_dependency(MVDependencies
  * between them, i.e. either (a=>b) or (b=>a). Assuming (a=>b) is the selected
  * dependency, we then combine the per-clause selectivities using the formula
  *
- *	   P(a,b) = P(a) * [f + (1-f)*P(b)]
+ *	   P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)
  *
  * where 'f' is the degree of the dependency.
  *
@@ -974,7 +960,7 @@ find_strongest_dependency(MVDependencies
  * recursively, starting with the widest/strongest dependencies. For example
  * P(a,b,c) is first split like this:
  *
- *	   P(a,b,c) = P(a,b) * [f + (1-f)*P(c)]
+ *	   P(a,b,c) = f * Min(P(a,b), P(c)) + (1-f) * P(a,b) * P(c)
  *
  * assuming (a,b=>c) is the strongest dependency.
  */
@@ -987,20 +973,27 @@ dependencies_clauselist_selectivity(Plan
 	RelOptInfo *rel,
 	Bitmapset **estimatedclauses)
 {
-	Selectivity s1 = 1.0;
+	Selectivity s1;
 	ListCell   *l;
-	Bitmapset  *clauses_attnums = NULL;
-	Bitmapset **list_attnums;
+	Bitmapset  *clauses_attnums;
+	AttrNumber *list_attnums;
 	int			listidx;
-	MVDependencies**dependencies = NULL;
-	int	ndependencies = 0;
+	MVDependencies **dependencies;
+	int			ndependencies;
+	int			total_ndeps;
+	MVDependency **dependencies_used;
+	int			ndependencies_used;
+	Bitmapset  *attrs_used;
+	int			nattrs_used;
+	Selectivity *attr_sel;
 	int			i;
+	int			attidx;
 
 	/* check if there's any stats that might be useful for us. */
 	if (!has_stats_of_kind(rel->statlist, STATS_EXT_DEPENDENCIES))
 		return 1.0;
 
-	list_attnums = (Bitmapset **) palloc(sizeof(B

Re: [HACKERS] make async slave to wait for lsn to be replayed

2020-03-17 Thread Kartyshov Ivan

I made some improvements over the old implementation of WAIT FOR.

Synopsis
========
WAIT FOR [ANY | SOME | ALL] event [, event ...]

where event is:
LSN value options
TIMESTAMP value

and options is:
TIMEOUT delay
UNTIL TIMESTAMP timestamp

ALL is the option used by default.

P.S. I am now testing the BEGIN-based WAIT prototype as discussed earlier.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8d91f3529e..8697f9807f 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -187,6 +187,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/wait.sgml b/doc/src/sgml/ref/wait.sgml
new file mode 100644
index 00..9a79524779
--- /dev/null
+++ b/doc/src/sgml/ref/wait.sgml
@@ -0,0 +1,138 @@
+
+
+
+ 
+  WAIT FOR
+ 
+
+ 
+  WAIT FOR
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAIT FOR
+  wait for the target LSN to be replayed
+ 
+
+ 
+
+WAIT FOR LSN 'lsn_number'
+WAIT FOR LSN 'lsn_number' TIMEOUT wait_timeout
+WAIT FOR LSN 'lsn_number' UNTIL TIMESTAMP wait_time
+WAIT FOR TIMESTAMP wait_time
+WAIT FOR ALL LSN 'lsn_number' TIMEOUT wait_timeout, TIMESTAMP wait_time
+WAIT FOR ANY LSN 'lsn_number', TIMESTAMP wait_time
+
+ 
+
+ 
+  Description
+
+  
+   WAIT FOR provides a simple
+   interprocess communication mechanism to wait for timestamp or the target log sequence
+   number (LSN) on standby in PostgreSQL
+   databases with master-standby asynchronous replication. When run with the
+   LSN option, the WAIT FOR command
+   waits for the specified LSN to be replayed. By default, wait
+   time is unlimited. Waiting can be interrupted using Ctrl+C, or
+   by shutting down the postgres server. You can also limit the wait
+   time using the TIMEOUT option.
+  
+
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Specify the target log sequence number to wait for.
+ 
+
+   
+
+   
+TIMEOUT wait_timeout
+
+ 
+  Limit the time interval to wait for the LSN to be replayed.
+  The specified wait_timeout must be an integer
+  and is measured in milliseconds.
+ 
+
+   
+
+   
+UNTIL TIMESTAMP wait_time
+
+ 
+  Limit the time to wait for the LSN to be replayed.
+  The specified wait_time must be timestamp.
+ 
+
+   
+
+  
+ 
+
+ 
+  Examples
+
+  
+   Run WAIT FOR from psql,
+   limiting wait time to 1 milliseconds:
+
+
+WAIT FOR LSN '0/3F07A6B1' TIMEOUT 1;
+NOTICE:  LSN is not reached. Try to increase wait time.
+LSN reached
+-
+ f
+(1 row)
+
+  
+
+  
+   Wait until the specified LSN is replayed:
+
+WAIT FOR LSN '0/3F07A611';
+LSN reached
+-
+ t
+(1 row)
+
+  
+
+  
+   Limit LSN wait time to 50 milliseconds, and then cancel the command:
+
+WAIT FOR LSN '0/3F0FF791' TIMEOUT 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to increase wait time.
+ERROR:  canceling statement due to user request
+ LSN reached
+-
+ f
+(1 row)
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAIT FOR statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b..588e96aa14 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -215,6 +215,7 @@
&update;
&vacuum;
&values;
+   &wait;
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4361568882..f7f5a76216 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -41,6 +41,7 @@
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
 #include "commands/tablespace.h"
+#include "commands/wait.h"
 #include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -7285,6 +7286,15 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * If lastReplayedEndRecPtr was updated,
+ * set latches in SHMEM array.
+ */
+if (XLogCtl->lastReplayedEndRecPtr >= GetMinWait())
+{
+	WaitSetLatch(XLogCtl->lastReplayedEndRecPtr);
+}
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3ce6..9b310926c1 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	user.o \
 	vacuum.o \
 	variable.o \
-	view.o
+	view.o \
+	wait.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/commands/wait.c b/src/backend/commands/wait.c
new file mode 100644
index 00..17a201b31c
--- /dev/null
+++ b/src/backend/commands/wait.c
@@ -0,0 +1,308 @@
+/*-
+ *
+ * wait.c
+ *	  Implements WAIT - a utility command that allow

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-17 Thread Dilip Kumar
On Tue, Mar 17, 2020 at 5:14 PM Amit Kapila  wrote:
>
> On Mon, Mar 16, 2020 at 3:24 PM Dilip Kumar  wrote:
> >
>
> +
> + /*
> + * Indicate that the lock is released for certain types of locks
> + */
> +#ifdef USE_ASSERT_CHECKING
> + CheckAndSetLockHeld(locallock, false);
> +#endif
>  }
>
>  /*
> @@ -1618,6 +1666,11 @@ GrantLockLocal(LOCALLOCK *locallock, ResourceOwner 
> owner)
>   locallock->numLockOwners++;
>   if (owner != NULL)
>   ResourceOwnerRememberLock(owner, locallock);
> +
> + /* Indicate that the lock is acquired for certain types of locks. */
> +#ifdef USE_ASSERT_CHECKING
> + CheckAndSetLockHeld(locallock, true);
> +#endif
>  }
>
> There is no need to sprinkle USE_ASSERT_CHECKING at so many places,
> having inside the new function is sufficient.  I have changed that,
> added few more comments and
> made minor changes.  See, what you think about attached?

Your changes look fine to me.  I have also verified all the tests and
everything works fine.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: plan cache overhead on plpgsql expression

2020-03-17 Thread Amit Langote
Hi David,

On Tue, Mar 17, 2020 at 8:53 PM David Steele  wrote:
>
> Hi Amit,
>
> On 2/25/20 3:42 AM, Amit Langote wrote:
> > On Tue, Feb 25, 2020 at 4:16 PM Pavel Stehule  
> > wrote:
> >> I added this patch to a commitfest
> >>
> >> https://commitfest.postgresql.org/27/2467/
> >>
> >> It is very interesting speedup and it is in good direction to JIT 
> >> expressions
> >
> > Thank you.  I was planning to do that myself.
> >
> > I will take a look at your other comments in a day or two.
>
> Do you know when you'll have chance to look at Pavel's comments?

Sorry, I had forgotten about this. I will try to post an update by Thursday.

-- 
Thank you,
Amit




How to install https://github.com/sraoss/pgsql-ivm on postgress

2020-03-17 Thread ankurthakkar
How to install https://github.com/sraoss/pgsql-ivm on postgress running on
Ubuntu and aws postgress



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: custom postgres launcher for tests

2020-03-17 Thread Ivan N. Taranov
> If we support a wrapper we should support it for all pg_ctl usage IMO.

As I understand it, you propose to patch pg_ctl.c & regress.c instead of
PostgresNode.pm & regress.c?

This is a deeper invasion into pg_ctl. There will be a conflict between the
environment variable and the pg_ctl -p parameter, and possibly induced
bugs.

I suggest microscopic changes for iterative recompilation/test/debug
without installation.

> For the sake of others with similar needs I attach my current
> wrapper/launcher script. To use it you have to:

IMHO, the method you proposed (wrapper/launcher) is more suitable for
complex testing.

I agree that my proposed way is incomplete.





Re: Missing errcode() in ereport

2020-03-17 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Mar 17, 2020 at 3:28 PM Michael Paquier  wrote:
>> Definitely an oversight.  All stable branches down to 9.5 have
>> mistakes in the same area, with nothing extra by grepping around.
>> Amit, I guess that you will take care of it?

> Yes, I will unless I see any objections in a day or so.

No need to wait, it's a pretty obvious thinko.

We might want to spend some effort thinking how to find or prevent
additional bugs of the same ilk ... but correcting the ones already
found needn't wait on that.

regards, tom lane




Re: How to install https://github.com/sraoss/pgsql-ivm on postgress

2020-03-17 Thread Chapman Flack
On 3/16/20 5:51 PM, ankurthakkar wrote:
> How to install https://github.com/sraoss/pgsql-ivm on postgress running on
> Ubuntu and aws postgress

That project appears to include its own modified version of
PostgreSQL, so to use it you would simply build it. It isn't something
that can just be installed into an unmodified PostgreSQL version.

Regards,
-Chap




Re: Collation versioning

2020-03-17 Thread opolofdez



Sent from my Redmi 4A

On 16 Mar 2020, at 3:05 p.m., Julien Rouhaud wrote:
On Mon, Mar 16, 2020 at 04:57:38PM +0900, Michael Paquier wrote:
> On Thu, Mar 12, 2020 at 03:00:26PM +0100, Julien Rouhaud wrote:
> > And v15 due to conflict with b08dee24a5 (Add pg_dump support for ALTER obj
> > DEPENDS ON EXTENSION).
>
> I have looked at patches 0001~0003 in the set for now.


Thanks!


> In patch 0002, you have the following addition:
> @@ -103,9 +103,10 @@ ORDER BY 1, 2;
>   pg_class    | relacl    | aclitem[]
>   pg_class    | reloptions    | text[]
>   pg_class    | relpartbound  | pg_node_tree
> + pg_depend   | refobjversion | text
> This comes from a regression test doing a sanity check to look for
> catalogs which have a toastable column but no toast tables.  As an
> exception, it should be documented in the test's comment.  Actually,
> does it need to be an exception?  This does not depend on
> relation-level facilities so there should be no risk of recursive
> dependencies, though I have not looked in details at this part.


I totally missed that, and I agree that there's no need for an exception, so
fixed.


> +  
> +   The only current use of refobjversion is to
> +   record dependencies between indexes and collation versions.
> +  
> [...]
> + 
> +  refobjversion
> +  text
> +  
> +  
> +   An optional version for the referenced object; see text
> +  
> + 
> Couldn't you merge both paragraphs here?


Done.


> Regarding patch 0003, it would be nice to include some tests
> independent on the rest and making use of the new functions.  These
> normally go in regproc.sql.  For example with a collation that needs
> double quotes as this is not obvious:
> =# select regcollation('"POSIX"');
> regcollation
>  --
>  "POSIX"
> (1 row)
>
> On top of that, this needs tests with to_regcollation() and tests with
> schema-qualified collations.


Done too, using the same collation name, for both with and without schema
qualification.


> Documentation for to_regcollation() is missing.  And it looks that
> many parts of the documentation are missing an update.  One example in
> datatype.sgml:
> Type oid represents an object identifier.  There are also
> several alias types for oid: regproc,
> regprocedure, regoper, regoperator,
> regclass, regtype, regrole,
> regnamespace, regconfig, and
> regdictionary.   shows an
> overview.
> At quick glance, there are more sections in need of a refresh..


Indeed.  I found missing references in datatype.sgml, func.sgml and
pgupgrade.sgml.

v16 attached.



Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 01:14:02AM +0100, Laurenz Albe wrote:
> lazy_check_needs_freeze() is only called for an aggressive vacuum, which
> this isn't.

> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1388,17 +1388,26 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
> LVRelStats *vacrelstats,
>   else
>   {
>   booltuple_totally_frozen;
> + boolfreeze_all;
>  
>   num_tuples += 1;
>   hastup = true;
>  
> + /*
> +  * If any tuple was already frozen in the block 
> and this is
> +  * an insert-only vacuum, we might as well 
> freeze all other
> +  * tuples in that block.
> +  */
> + freeze_all = params->is_insert_only && 
> has_dead_tuples;
> +

You're checking if any (previously-scanned) tuple was *dead*, but I think you
need to check nfrozen > 0.

Also, this will fail to freeze tuples on a page which *could* be
opportunistically frozen, but which *follow* the first tuple that *needs* to be
frozen.

I think Andres was thinking this would maybe be an optimization independent of
is_insert_only (?)

>   /*
>* Each non-removable tuple must be checked to 
> see if it needs
>* freezing.  Note we already have exclusive 
> buffer lock.
>*/
>   if (heap_prepare_freeze_tuple(tuple.t_data,
>   
>   relfrozenxid, relminmxid,
> - 
>   FreezeLimit, MultiXactCutoff,
> + 
>   freeze_all ? 0 : FreezeLimit,
> + 
>   freeze_all ? 0 : MultiXactCutoff,
>   
>   &frozen[nfrozen],
>   
>   &tuple_totally_frozen))

> + /* normal autovacuum shouldn't freeze aggressively */
> + *insert_only = false;

Aggressively is a bad choice of words.  In the context of vacuum, it usually
means "visit all pages, even those which are allvisible".

-- 
Justin




Re: WAL usage calculation patch

2020-03-17 Thread Julien Rouhaud
On Sun, Mar 15, 2020 at 09:52:18PM +0300, Kirill Bychik wrote:
> > > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik  
> > > > wrote:
> After extensive thinking and some code diving, I did not manage to
> come up with a sane idea on how to expose data about autovacuum WAL
> usage. Must be the flu.
>
> > > Anyways, I believe this change could be bigger than FPI. I propose to
> > > plan a separate patch for it, or even add it to the TODO after the
> > > core patch of wal usage is merged.
> >
> > Just in case, if the problem is a lack of time, I'd be happy to help
> > on that if needed.  Otherwise, I'll definitely not try to block any
> > progress for the feature as proposed.
>
> Please feel free to work on any extension of this patch idea. I lack
> both time and knowledge to do it all by myself.


I'm adding a 3rd patch on top of yours to expose the new WAL counters in
pg_stat_database, for vacuum and autovacuum.  I'm not really enthusiastic about
this approach but I didn't find anything better, and maybe this will raise some
better ideas.  The only sure thing is that we're not going to add a bunch of new
fields in pg_stat_all_tables anyway.

We can also drop this 3rd patch entirely if no one's happy about it without
impacting the first two.


> > > Please expect a new patch version next week, with FPI counters added.
>
> Please find attached patch version 003, with FP writes and minor
> corrections. Hope I use attachment versioning as expected in this
> group :)


Thanks!


> Test had been reworked, and I believe it should be stable now, the
> part which checks WAL is written and there is a correlation between
> affected rows and WAL records. I still have no idea how to test
> full-page writes against regular updates, it seems very unstable.
> Please share ideas if any.


I just reviewed the patches, and it globally looks good to me.  The way to
detect full page images looks sensible, but I'm really not familiar with that
code so additional review would be useful.

I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
used in the test.  Since I have to add all the patches to make the cfbot happy,
I slightly adapted the tests to reference the fp column too.  There was also a
minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
twice while wal_write_fp_records wasn't documented, so I also changed it.

Let me know if you're ok with those changes.
>From 295f328e7ea9fd9207df789dd3db5bab458decf3 Mon Sep 17 00:00:00 2001
From: Kirill Bychik 
Date: Tue, 17 Mar 2020 14:41:50 +0100
Subject: [PATCH v4 1/3] Track WAL usage.

---
 src/backend/access/transam/xlog.c   |  8 
 src/backend/access/transam/xloginsert.c |  6 +++
 src/backend/executor/execParallel.c | 22 ++-
 src/backend/executor/instrument.c   | 51 ++---
 src/include/executor/execParallel.h |  1 +
 src/include/executor/instrument.h   | 16 +++-
 src/tools/pgindent/typedefs.list|  1 +
 7 files changed, 95 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index de2d4ee582..7cab00323d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "commands/progress.h"
 #include "commands/tablespace.h"
 #include "common/controldata_utils.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
ProcLastRecPtr = StartPos;
XactLastRecEnd = EndPos;
 
+   /* Provide WAL update data to the instrumentation */
+   if (inserted)
+   {
+   pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+   pgWalUsage.wal_records++;
+   }
+
return EndPos;
 }
 
diff --git a/src/backend/access/transam/xloginsert.c 
b/src/backend/access/transam/xloginsert.c
index 2fa0a7f667..1f71cc0a76 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
 #include "access/xloginsert.h"
 #include "catalog/pg_control.h"
 #include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/origin.h"
@@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 */
bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
 
+   /*
+* Report a full page image constructed for the WAL 
record
+*/
+   pgWalUsage.wal_fp_records++;
+
/*
 * Construct XLogRecData entries for the page content.
 */
diff --git a/src/backend/executor/execParallel.c 
b/src/backend/executor/execParallel.c
index a753d6efa0..017367878f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
 #

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-17 Thread Tomas Vondra

On Tue, Mar 17, 2020 at 12:42:52PM +, Dean Rasheed wrote:

On Sat, 14 Mar 2020 at 18:45, Tomas Vondra  wrote:


I realized there's one more thing that probably needs discussing.
Essentially, these two clause types are the same:

   a IN (1, 2, 3)

   (a = 1 OR a = 2 OR a = 3)

but with 8f321bd1 we only recognize the first one as compatible with
functional dependencies. It was always the case that we estimated those
two clauses a bit differently, but the differences were usually small.
But now that we recognize IN as compatible with dependencies, the
difference may be much larger, which bugs me a bit ...

So I wonder if we should recognize the special form of an OR clause,
with all Vars referencing the same attribute etc., and treat this as
supported by functional dependencies - the attached patch does that.
For MCV lists there's already no difference, because OR clauses are
supported.



Makes sense, and the patch looks straightforward enough.


The question is whether we want to do this, and whether we should also
teach the per-column estimates to recognize this special case of IN
clause.


I'm not convinced about that second part though. I'd say that
recognising the OR clause for functional dependencies is sufficient to
prevent the large differences in estimates relative to the equivalent
IN clauses. The small differences between the way that OR and IN
clauses are handled have always been there, and I think that changing
that is out of scope for this work.



Not sure. I think the inconsistency between plain and extended stats may
be a bit surprising, but I agree that issue may be negligible.


The other thing that I'm still concerned about is the possibility of
returning estimates with P(a,b) > P(a) or P(b). I think that such a
thing becomes much more likely with the new types of clause supported
here, because they now allow multiple values from each column, where
before we only allowed one. I took another look at the patch I posted
on the other thread, and I've convinced myself that it's correct.
Attached is an updated version, with some cosmetic tidying up and now
with some additional regression tests.



Yeah, I agree that's something we need to fix. Do you plan to push the
fix, or do you want me to do it?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Use PKG_CHECK_MODULES to detect the libxml2 library

2020-03-17 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 16 Mar 2020, at 17:12, Tom Lane  wrote:
>> So I'm now leaning to "back-patch and make sure to mention this in
>> the next release notes".  Barring objections, I'll do that soon.

> None from me.

Done.  In the event, it only seemed practical to back-patch as far as
v10.  9.x didn't use pkg-config for anything, so our infrastructure
for it isn't there.

regards, tom lane




Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-17 Thread Dean Rasheed
On Tue, 17 Mar 2020 at 15:37, Tomas Vondra  wrote:
>
> On Tue, Mar 17, 2020 at 12:42:52PM +, Dean Rasheed wrote:
>
> >The other thing that I'm still concerned about is the possibility of
> >returning estimates with P(a,b) > P(a) or P(b). I think that such a
> >thing becomes much more likely with the new types of clause supported
> >here, because they now allow multiple values from each column, where
> >before we only allowed one. I took another look at the patch I posted
> >on the other thread, and I've convinced myself that it's correct.
> >Attached is an updated version, with some cosmetic tidying up and now
> >with some additional regression tests.
>
> Yeah, I agree that's something we need to fix. Do you plan to push the
> fix, or do you want me to do it?
>

I can push it. Have you had a chance to review it?

Regards,
Dean




Re: ALTER tbl rewrite loses CLUSTER ON index

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 02:33:32PM +0900, Michael Paquier wrote:
> > Yeah, in cluster(), mark_index_clustered().
> 
> Patch 0002 from Justin does that, I would keep this refactoring as
> HEAD-only material though, and I don't spot any other code paths in
> need of patching.
> 
> The commit message of patch 0001 is not what you wanted I guess.

That's what git-am does, and I didn't find any option to make it less
unreadable.  I guess I should just delete the email body it inserts.

|   The commit message is formed by the title taken from the "Subject: ", a
|   blank line and the body of the message up to where the patch begins.
|   Excess whitespace at the end of each line is automatically stripped.

-- 
Justin




Re: Collation versioning

2020-03-17 Thread Peter Eisentraut
Did we discuss the regcollation type?  In the current patch set, it's 
only used in two places in a new regression test, where it can easily be 
replaced by a join.  Do we need it?


I realize we've been adding new reg* types lately; I'm not sure what the 
current idea is on that.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Collation versioning

2020-03-17 Thread Christoph Berg
Re: Peter Eisentraut 2020-03-17 

> Did we discuss the regcollation type?  In the current patch set, it's only
> used in two places in a new regression test, where it can easily be replaced
> by a join.  Do we need it?
> 
> I realize we've been adding new reg* types lately; I'm not sure what the
> current idea is on that.

Not sure if that's the case there, but reg* typecasts are very handy
when used interactively in ad-hoc queries.

Christoph




Re: allow online change primary_conninfo

2020-03-17 Thread Sergei Kornilov
Hello

Sorry for the late replies.

> Yes. In my opinion, patch 0002 should not change the GUC mode of
> wal_receiver_create_temp_slot as the discussion here is about
> primary_conninfo, even if both may share some logic regarding WAL
> receiver shutdown and its restart triggered by the startup process.

Ok, I removed related changes from main patch. Along with minor merge conflict.

> Patch 0001 has actually been presented on this thread first:
> https://www.postgresql.org/message-id/753391579708...@iva3-77ae5995f07f.qloud-c.yandex.net
> And there is an independent patch registered in this CF:
> https://commitfest.postgresql.org/27/2456/

Yep, 0001 is a separate patch. I will post a copy of it here to make the
cfbot work.  Main patch 0002 requires resetting is_temp_slot in
RequestXLogStreaming to work properly.

> Should we add patch 0001 as an open item for v13 as there is a risk of
> forgetting this issue?

I think yes.

Well, it seems better to move this patch to the next commitfest?

regards, Sergei

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 98b033fc20..12362421d7 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -129,6 +129,7 @@ extern int	recoveryTargetAction;
 extern int	recovery_min_apply_delay;
 extern char *PrimaryConnInfo;
 extern char *PrimarySlotName;
+extern bool wal_receiver_create_temp_slot;
 
 /* indirectly set via GUC system */
 extern TransactionId recoveryTargetXid;
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index e08afc6548..cf3e43128c 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -23,7 +23,6 @@
 #include "utils/tuplestore.h"
 
 /* user-settable parameters */
-extern bool wal_receiver_create_temp_slot;
 extern int	wal_receiver_status_interval;
 extern int	wal_receiver_timeout;
 extern bool hot_standby_feedback;
@@ -321,7 +320,8 @@ extern void ShutdownWalRcv(void);
 extern bool WalRcvStreaming(void);
 extern bool WalRcvRunning(void);
 extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
- const char *conninfo, const char *slotname);
+ const char *conninfo, const char *slotname,
+ bool create_temp_slot);
 extern XLogRecPtr GetWalRcvWriteRecPtr(XLogRecPtr *latestChunkStart, TimeLineID *receiveTLI);
 extern int	GetReplicationApplyDelay(void);
 extern int	GetReplicationTransferLatency(void);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3813eadfb4..e0f3ed5c2a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -283,6 +283,7 @@ bool		StandbyModeRequested = false;
 char	   *PrimaryConnInfo = NULL;
 char	   *PrimarySlotName = NULL;
 char	   *PromoteTriggerFile = NULL;
+bool		wal_receiver_create_temp_slot = false;
 
 /* are we currently in standby mode? */
 bool		StandbyMode = false;
@@ -11901,7 +11902,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		}
 		curFileTLI = tli;
 		RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
-			 PrimarySlotName);
+			 PrimarySlotName, wal_receiver_create_temp_slot);
 		receivedUpto = 0;
 	}
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2ab15c3cbb..ff45482faa 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -15,6 +15,12 @@
  * WalRcv->receivedUpto variable in shared memory, to inform the startup
  * process of how far it can proceed with XLOG replay.
  *
+ * WAL receivers cannot load directly GUC parameters used when establishing
+ * their connection to the primary, and rely on parameter values passed down
+ * by the startup process when WAL streaming is requested.  This applies
+ * to for example the replication slot creation and the connection string to
+ * use for the connection with the primary.
+ *
  * If the primary server ends streaming, but doesn't disconnect, walreceiver
  * goes into "waiting" mode, and waits for the startup process to give new
  * instructions. The startup process will treat that the same as
@@ -73,7 +79,6 @@
 
 
 /* GUC variables */
-bool		wal_receiver_create_temp_slot;
 int			wal_receiver_status_interval;
 int			wal_receiver_timeout;
 bool		hot_standby_feedback;
@@ -348,42 +353,23 @@ WalReceiverMain(void)
 		WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);
 
 		/*
-		 * Create temporary replication slot if no slot name is configured or
-		 * the slot from the previous run was temporary, unless
-		 * wal_receiver_create_temp_slot is disabled.  We also need to handle
-		 * the case where the previous run used a temporary slot but
-		 * wal_receiver_create_temp_slot was changed in the meantime.  In that
-		 * case, we delete the old slot name in shared memory.  (This would
+		 * Create temporary replication slot if requested.  In that
+		 * case, we update slot name in shared memory.  (This would
 		

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-17 Thread Tomas Vondra

On Tue, Mar 17, 2020 at 04:14:26PM +, Dean Rasheed wrote:

On Tue, 17 Mar 2020 at 15:37, Tomas Vondra  wrote:


On Tue, Mar 17, 2020 at 12:42:52PM +, Dean Rasheed wrote:

>The other thing that I'm still concerned about is the possibility of
>returning estimates with P(a,b) > P(a) or P(b). I think that such a
>thing becomes much more likely with the new types of clause supported
>here, because they now allow multiple values from each column, where
>before we only allowed one. I took another look at the patch I posted
>on the other thread, and I've convinced myself that it's correct.
>Attached is an updated version, with some cosmetic tidying up and now
>with some additional regression tests.

Yeah, I agree that's something we need to fix. Do you plan to push the
fix, or do you want me to do it?



I can push it. Have you had a chance to review it?



Not yet, I'll take a look today.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BEFORE ROW triggers for partitioned tables

2020-03-17 Thread Ashutosh Bapat
On Fri, 13 Mar 2020 at 21:55, Alvaro Herrera wrote:

> On 2020-Mar-11, Ashutosh Bapat wrote:
>
> > On Thu, Feb 27, 2020 at 10:22 PM Alvaro Herrera
> >  wrote:
>
> > > * The new function I added, ReportTriggerPartkeyChange(), contains one
> > > serious bug (namely: it doesn't map attribute numbers properly if
> > > partitions are differently defined).
> >
> > IIUC the code in your patch, it seems you are just looking at
> > partnatts. But partition key can contain expressions also which are
> > stored in partexprs. So, I think the code won't catch change in the
> > partition key values when it contains expression. Using
> > RelationGetPartitionQual() will fix this problem and also problem of
> > attribute remapping across the partition hierarchy.
>
> Oh, of course.
>
> In fact, I don't need to deal with PartitionQual directly; I can just
> use ExecPartitionCheck(), since in ExecBRInsertTriggers et al we already
> have all we need.  v2 attached.
>

Thanks.


>  insert into parted values (1, 1, 'uno uno v2');  -- fail
>  ERROR:  moving row to another partition during a BEFORE trigger is not supported
>  DETAIL:  Before trigger "t", row was to be in partition "public.parted_1_1"
>
> Note that in this implementation I no longer know which column is the
> problematic one, but I suppose users have clue enough.  Wording
> suggestions welcome.
>

When we have an expression as a partition key, there may not be one particular
column which causes the row to move out of the partition.  So this should be
fine.
A slight wording suggestion below.

- * Complain if we find an unexpected trigger type.
- */
- if (!TRIGGER_FOR_AFTER(trigForm->tgtype))
- elog(ERROR, "unexpected trigger \"%s\" found",
- NameStr(trigForm->tgname));

!AFTER means INSTEAD OF and BEFORE. Do you intend to allow INSTEAD OF
triggers as well?
- */
- if (stmt->timing != TRIGGER_TYPE_AFTER)

Same comment as the above?

+ /*
+ * After a tuple in a partition goes through a trigger, the user
+ * could have changed the partition key enough that the tuple
+ * no longer fits the partition.  Verify that.
+ */
+ if (trigger->tgisclone &&

Why do we want to restrict this check only for triggers which are cloned
from the ancestors?

+ !ExecPartitionCheck(relinfo, slot, estate, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("moving row to another partition during a BEFORE trigger is not supported"),
+ errdetail("Before trigger \"%s\", row was to be in partition \"%s.%s\"",

In the error message you removed above, we mention the BEFORE FOR EACH ROW
trigger.  Should we continue to use the same terminology?

I was wondering whether it would be good to check the partition constraint
only once, i.e. after all before row triggers have been executed.  This would
avoid throwing an error in case multiple triggers together worked to keep the
tuple in the same partition when individual triggers caused it to move out of
that partition.  But then we would lose the opportunity to point out the
before row trigger which actually caused the row to move out of the
partition.  Anyway, I wanted to bring that up for discussion here.

@@ -307,7 +307,7 @@ CreatePartitionDirectory(MemoryContext mcxt)
  *
  * The purpose of this function is to ensure that we get the same
  * PartitionDesc for each relation every time we look it up.  In the
- * face of current DDL, different PartitionDescs may be constructed with
+ * face of concurrent DDL, different PartitionDescs may be constructed with

Thanks for catching it. Looks unrelated though.

+-- Before triggers and partitions

The test looks good.  Should we add a test for a partitioned table with the
partition key as an expression?

The approach looks good to me.

-- 
Best Wishes,
Ashutosh


Re: Collation versioning

2020-03-17 Thread Julien Rouhaud
On Tue, Mar 17, 2020 at 05:31:47PM +0100, Christoph Berg wrote:
> Re: Peter Eisentraut 2020-03-17 
> 
> > Did we discuss the regcollation type?  In the current patch set, it's only
> > used in two places in a new regression test, where it can easily be replaced
> > by a join.  Do we need it?


I originally wrote it for a previous version of the patchset, to shorten the
pg_dump query, but that went out when I replaced the DDL command with native
functions instead.  It didn't seem to hurt to keep it, so I relied on it in the
regression tests.


> > I realize we've been adding new reg* types lately; I'm not sure what the
> > current idea is on that.
>
> Not sure if that's the case there, but reg* typecasts are very handy
> when used interactively in ad-hoc queries.


+1.  I'm obviously biased, but I find it quite useful when querying pg_depend,
which may become more frequent once we start generating warnings about possibly
corrupted indexes.




Re: allow online change primary_conninfo

2020-03-17 Thread Alvaro Herrera
On 2020-Mar-17, Sergei Kornilov wrote:

> Well, it seems better to move this patch to next commitfest?

What?  You want to make wal_receiver_create_temp_slot unchangeable and
default to off in pg13, and delay the patch that fixes those things to
pg14?  That makes no sense to me.  Please keep them both here so that we
can get things to a usable state.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SQL/JSON: functions

2020-03-17 Thread Pavel Stehule
On Tue, 17 Mar 2020 at 01:55, Nikita Glukhov wrote:

> Attached 44th version of the patches.
>
>
> On 12.03.2020 16:41, Pavel Stehule wrote:
>
>
> On 12.03.2020 00:09 Nikita Glukhov wrote:
>
>> Attached 43rd version of the patches.
>>
>> The previous patch #4 ("Add function formats") was removed.
>> Instead, introduced new executor node JsonCtorExpr which is used to wrap
>> SQL/JSON constructor function calls (FuncExpr, Aggref, WindowFunc).
>>
>> Also, JsonIsPredicate node began to be used as executor node.
>> This helped to remove unnecessary json[b]_is_valid() user functions.
>>
>>
> It looks very well - all tests passed, code looks well.
>
> Now, when there are special nodes, then the introduction of
> COERCE_INTERNAL_CAST is not necessary, and it is my one and last
> objection against this patch set.
>
> I have removed patch #3 with COERCE_INTERNAL_CAST too.
>
> Coercions in JsonValueExpr in JsonCtorExpr, which were previously hidden with
> COERCE_INTERNAL_CAST and which were outputted as RETURNING or FORMAT JSON
> clauses, now moved into separate expressions.
>
I am looking at the code, and although the code is correct, it doesn't look
consistent with other node types.

I think JsonFormat and JsonReturning should be node types, not just
structs.  If these types were nodes, then you could simplify _readJsonExpr
and all node operations on this node.


>
>
> User functions json[b]_build_object_ext() and json[b]_build_array_ext() also
> can be easily removed.   But it seems harder to remove new aggregate functions
> json[b]_objectagg() and json[b]_agg_strict(), because they can't be called
> directly from JsonCtorExpr node.
>
>
I don't see a reason for further reduction now.  It would be great if you
can finalize the work you planned for pg13.

+    READ_ENUM_FIELD(on_error.btype, JsonBehaviorType);
+    READ_NODE_FIELD(on_error.default_expr);
+    READ_ENUM_FIELD(on_empty.btype, JsonBehaviorType);
+    READ_NODE_FIELD(on_empty.default_expr);

JsonBehavior is a node type.  Then why don't you write just

READ_NODE_FIELD(on_error);
READ_NODE_FIELD(on_empty)

??

And maybe the code can be better if you don't use JsonPassing struct (or
use it only inside gram.y) and pass just List *names, List *values.

Nodes should contain other nodes or scalar types.  Using structs (that
are not nodes) inside doesn't look consistent.



I found some unfinished code in the 0003 patch:
+
+json_name_and_value:
+/* TODO
+            KEY c_expr VALUE_P json_value_expr %prec POSTFIXOP
+                { $$ = makeJsonKeyValue($2, $4); }
+            |
+*/



> The support for json type in jsonpath also seems to be immature, so I will try
> to remove it in the next iteration.
>
What do you think?  This patch is a little bit off topic, so if you don't
need it, it can be removed.  Is there some dependency for "jsontable"?

Regards

Pavel





> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: Adding missing object access hook invocations

2020-03-17 Thread Andres Freund
Hi,

On 2020-03-16 16:03:51 -0700, Mark Dilger wrote:
> While working on object access hooks, I noticed several locations
> where I would expect the hook to be invoked, but no actual invocation.
> I think this just barely qualifies as a bug.  It's debatable because
> whether it is a bug depends on the user's expectations and whether not
> invoking the hook in these cases is defensible.  Does anybody have any
> recollection of an intentional choice not to invoke in these
> locations?

I am strongly against treating this as a bug, which'd likely imply
backpatching. New hook invocations are a noticeable behavioural change,
and very plausibly will break currently working extensions. That's fine
for a major version upgrade, but not for a minor one, unless there are
very good reasons.

Andres





Re: backend type in log_line_prefix?

2020-03-17 Thread Mike Palmiotto
> On 2020/03/15 19:32, Peter Eisentraut wrote:
> > On 2020-03-13 22:24, Peter Eisentraut wrote:
> >> On 2020-03-10 19:07, Alvaro Herrera wrote:
> >>> I like these patches; the first two are nice cleanup.
> >>>
> >>> My only gripe is that pgstat_get_backend_desc() is not really a pgstat
> >>> function; I think it should have a different name with a prototype in
> >>> miscadmin.h (next to the enum's new location, which I would put
> >>> someplace near the "pmod.h" comment rather than where you put it;
> >>> perhaps just above the AuxProcType definition), and implementation
> >>> probably in miscinit.c.
> >>
> >> I have committed the refactoring patches with adjustments along these
> >> lines.  The patch with the log_line_prefix and csvlog enhancements is
> >> still under discussion.
> >
> > I have committed that last one also, after some corrections.

Sorry for being late to this thread, but was wondering if anyone had
taken a look at the Process Centralization patchset that I submitted
to this CF:
https://www.postgresql.org/message-id/CAMN686HgTVRJBAw6hqFE4Lj8bgPLQqfp1c-%2BWBGUtEmg6wPVhg%40mail.gmail.com

There's quite a bit of that code that is in the same vein as the
MyBackendType changes proposed/merged in this thread.

I think we could reduce a large portion of redundant code (including
the pgstat_get_backend_desc code) while also
centralizing/standardizing process startup. A few useful features
(outside of code reduction) include the ability to identify backends
prior to their Main functions and cleaner control of SubPostmasterMain
logic (including implicit handling of shmem timing considerations).

If others think it's worthwhile, I will work on rebasing those changes
on the changes proposed/merged in this thread (re: MyBackendType).

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 10:21:48AM +0100, Fabien COELHO wrote:
> 
> About v13, seens as one patch:
> 
> Function "pg_ls_dir_metadata" documentation suggests a variable number of
> arguments with brackets, but parameters are really mandatory.

Fixed, and added tests on 1 and 3 arg versions of both pg_ls_dir() and
pg_ls_dir_metadata().

It seems like the only way to allow a variable number of arguments is with
multiple entries in pg_proc.dat, one for each number of arguments.  Is that
right?

> The example in the documentation could be better indented. Also, ISTM that
> there are two implicit laterals (format & pg_ls_dir_recurse) that I would
> make explicit. I'd use the pcs alias explicitly. I'd use meaningful aliases
> (eg ts instead of b, …).

> On reflection, I think that a boolean "isdir" column is a bad idea because
> it is not extensible. I'd propose to switch to the standard "ls" approach of
> providing the type as one character: '-' for regular, 'd' for directory, 'l'
> for link, 's' for socket, 'c' for character special…

I think that's outside the scope of the patch, since I'd want to change
pg_stat_file; that's where I borrowed "isdir" from, for consistency.

Note that both LS_DIR_HISTORIC and LS_DIR_MODERN include LS_DIR_SKIP_SPECIAL,
so only pg_ls_dir itself shows specials.  The way to do it would be to 1)
change pg_stat_file to expose the file's "type", 2) use pg_ls_dir() AS a,
lateral pg_stat_file(a) AS b, 3) then consider also changing LS_DIR_MODERN and
all the existing pg_ls_*.

> ISTM that "lstat" is not available on windows, which suggests to call "stat"
> always, and then "lstat" on un*x and pg ports stuff on win.

I believe that's handled here.
src/include/port/win32_port.h:#define lstat(path, sb) stat(path, sb)

> I'm wondering about the restriction on directories only. Why should it not
> work on a file? Can it be easily extended to work on a simple file? If so,
> it could be just "pg_ls".

I think that's a good idea, except it doesn't fit with what the code does:
AllocDir() and ReadDir().  Instead, use pg_stat_file() for that.

Hm, I realized that the existing pg_ls_dir_metadata was skipping links to
dirs, since !ISREG().  So I changed it to use both stat() and lstat().

-- 
Justin
>From d8294c4747c5ba1f3bec858c137cc2d31e5a0425 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Mon, 16 Mar 2020 14:12:55 -0500
Subject: [PATCH v14 1/8] Document historic behavior of links to directories..

Backpatch to 9.5: pg_stat_file
---
 doc/src/sgml/func.sgml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fc4d7f0f78..2c6142a0e0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21528,7 +21528,8 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 size, last accessed time stamp, last modified time stamp,
 last file status change time stamp (Unix platforms only),
 file creation time stamp (Windows only), and a boolean
-indicating if it is a directory.  Typical usages include:
+indicating if it is a directory (or a symbolic link to a directory).
+Typical usages include:
 
 SELECT * FROM pg_stat_file('filename');
 SELECT (pg_stat_file('filename')).modification;
-- 
2.17.0

>From 031ce2684587d6d8d84c6d4b6c87f5399b6efb78 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 17 Mar 2020 13:16:24 -0500
Subject: [PATCH v14 2/8] Add tests on pg_ls_dir before changing it

---
 src/test/regress/expected/misc_functions.out | 18 ++
 src/test/regress/sql/misc_functions.sql  |  5 +
 2 files changed, 23 insertions(+)

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index d3acb98d04..2e87c548eb 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -201,6 +201,24 @@ select count(*) > 0 from
  t
 (1 row)
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true
+ name 
+--
+ .
+(1 row)
+
+select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false
+ name 
+--
+(0 rows)
+
+select pg_ls_dir('does not exist', true, false); -- ok with missingok=true
+ pg_ls_dir 
+---
+(0 rows)
+
+select pg_ls_dir('does not exist'); -- fails with missingok=false
+ERROR:  could not open directory "does not exist": No such file or directory
 --
 -- Test adding a support function to a subject function
 --
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 094e8f8296..f6857ad177 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -60,6 +60,11 @@ select count(*) > 0 from
where spcname = 'pg_default') pts
   join pg_database db on pts.pts = db.oid;
 
+select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_d

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-17 Thread Tom Lane
Justin Pryzby  writes:
> It seems like the only way to allow a variable number of arguments is with
> multiple entries in pg_proc.dat, one for each number of arguments.  Is that
> right?

Another way to do it is to have one entry, put the full set of arguments
into the initial pg_proc.dat data, and then use CREATE OR REPLACE FUNCTION
later during initdb to install some defaults.  See existing cases in
system_views.sql, starting about line 1180.  Neither way is especially
pretty, so take your choice.

regards, tom lane




Re: logical copy_replication_slot issues

2020-03-17 Thread Alvaro Herrera
Thanks Arseny and Masahiko, I pushed this patch just now.  I changed
some comments while at it, hopefully they are improvements.

On 2020-Mar-09, Masahiko Sawada wrote:

> ctx = CreateInitDecodingContext(plugin, NIL,
> -   false,  /* do not build snapshot */
> +   false,  /* do not build data snapshot */
> restart_lsn,
> logical_read_local_xlog_page, NULL, NULL,
> NULL);
> 
> I'm not sure this change makes the comment better. Could you elaborate
> on the motivation of this change?

I addressed this issue by adding a comment in CreateInitDecodingContext
to explain the parameter, and then reference that comment's terminology
in this call.  I think it ends up clearer overall -- not that this whole
area is at all particularly clear.

Thanks again.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WAL usage calculation patch

2020-03-17 Thread Kirill Bychik
> > Please feel free to work on any extension of this patch idea. I lack
> > both time and knowledge to do it all by myself.
>
>
> I'm adding a 3rd patch on top of yours to expose the new WAL counters in
> pg_stat_database, for vacuum and autovacuum.  I'm not really enthusiastic
> with this approach but I didn't find better, and maybe this will raise some
> better ideas.  The only sure thing is that we're not going to add a bunch of
> new fields in pg_stat_all_tables anyway.
>
> We can also drop this 3rd patch entirely if no one's happy about it without
> impacting the first two.

No objections about 3rd on my side, unless we miss the CF completely.

As for the code, I believe:
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;

Could be done much simpler via the utility:
WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);

On a side note, I agree API to the buf/wal usage is far from perfect.

> > Test had been reworked, and I believe it should be stable now, the
> > part which checks WAL is written and there is a correlation between
> > affected rows and WAL records. I still have no idea how to test
> > full-page writes against regular updates, it seems very unstable.
> > Please share ideas if any.
>
>
> I just reviewed the patches, and it globally looks good to me.  The way to
> detect full page images looks sensible, but I'm really not familiar with that
> code so additional review would be useful.
>
> I noticed that the new wal_write_fp_records field in pg_stat_statements
> wasn't used in the test.  Since I have to add all the patches to make the
> cfbot happy, I slightly adapted the tests to reference the fp column too.
> There was also a minor issue in the documentation, as wal_records and
> wal_bytes were copy/pasted twice while wal_write_fp_records wasn't
> documented, so I also changed it.
>
> Let me know if you're ok with those changes.

Sorry for not getting wal_fp_usage into the docs, my fault.

As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.




Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM

2020-03-17 Thread Peter Geoghegan
On Mon, Mar 16, 2020 at 7:08 AM Michail Nikolaev wrote:
> -- ABSTRACT --
> There is a race condition between btree_xlog_unlink_page and _bt_walk_left.
> A lot of versions are affected including 12 and new-coming 13.
> Happens only on standby. Seems like could not cause invalid query results.

(CC'ing Heikki, just in case.)

Good catch! I haven't tried to reproduce the problem here just yet,
but your explanation is very easy for me to believe.

As you pointed out, the best solution is likely to involve having the
standby imitate the buffer lock acquisitions that take place on the
primary. We don't do that for page splits and page deletions. I think
that it's okay in the case of page splits, since we're only failing to
perform the same bottom-up lock coupling (I added something about that
specific thing to the README recently). Even btree_xlog_unlink_page()
would probably be safe if we didn't have to worry about backwards
scans, which are really a special case. But we do.

FWIW, while I agree that this issue is more likely to occur due to the
effects of commit 558a9165, especially when running your test case, my
own work on B-Tree indexes for Postgres 12 might also be a factor. I
won't get into the reasons now, since they're very subtle, but I have
observed that the Postgres 12 work tends to make page deletion occur
far more frequently with certain workloads. This was really obvious
when I examined the structure of B-Tree indexes over many hours while
BenchmarkSQL/TPC-C [1] ran, for example.

[1] https://github.com/petergeoghegan/benchmarksql
-- 
Peter Geoghegan




Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM

2020-03-17 Thread Peter Geoghegan
On Mon, Mar 16, 2020 at 10:20 PM Andrey M. Borodin  wrote:
> It seems to me that it's exactly the same check that I was trying to verify
> in the amcheck patch [0].  But there it was verified inside amcheck, whereas
> here it is verified by the index scan.

Maybe we can accept your patch after fixing this bug. My objection to
the patch was that it couples locks in a way that's not compatible
with btree_xlog_unlink_page(). But the problem now seems to have been
btree_xlog_unlink_page() itself. It's possible that there are problems
elsewhere, but my recollection is that btree_xlog_unlink_page() was
the problem.

-- 
Peter Geoghegan




Re: Adding missing object access hook invocations

2020-03-17 Thread Mark Dilger



> On Mar 17, 2020, at 11:49 AM, Andres Freund  wrote:
> 
> On 2020-03-16 16:03:51 -0700, Mark Dilger wrote:
>> While working on object access hooks, I noticed several locations
>> where I would expect the hook to be invoked, but no actual invocation.
>> I think this just barely qualifies as a bug.  It's debatable because
>> whether it is a bug depends on the user's expectations and whether not
>> invoking the hook in these cases is defensible.  Does anybody have any
>> recollection of an intentional choice not to invoke in these
>> locations?
> 
> I am strongly against treating this as a bug, which'd likely imply
> backpatching. New hook invocations are a noticeable behavioural change,
> and very plausibly will break currently working extensions. That's fine
> for a major version upgrade, but not for a minor one, unless there are
> very good reasons.

I agree that this does not need to be back-patched.  I was debating whether it 
constitutes a bug for the purpose of putting the fix into v13 vs. punting the 
patch forward to the v14 cycle.  I don't have a strong opinion on that.

Thoughts?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 10:24 -0500, Justin Pryzby wrote:
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -1388,17 +1388,26 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
> >else
> >{
> >booltuple_totally_frozen;
> > + boolfreeze_all;
> >   
> >num_tuples += 1;
> >hastup = true;
> >   
> > + /*
> > +  * If any tuple was already frozen in the block and this is
> > +  * an insert-only vacuum, we might as well freeze all other
> > +  * tuples in that block.
> > +  */
> > + freeze_all = params->is_insert_only && has_dead_tuples;
> > +
> 
> You're checking if any (previously-scanned) tuple was *dead*, but I think you
> need to check nfrozen>=0.

Yes, that was a silly typo.

> Also, this will fail to freeze tuples on a page which *could* be
> opportunistically frozen, but which *follow* the first tuple that *needs*
> to be frozen.

I am aware of that.  I was trying to see if that went in the direction that
Andres intends before trying more invasive modifications.

> I think Andres was thinking this would maybe be an optimization independent of
> is_insert_only (?)

I wasn't sure.

In the light of that, I have ripped out that code again.

Also, since aggressive^H^H^H^H^H^H^H^H^H^Hproactive freezing seems to be a
performance problem in some cases (pages with UPDATEs and DELETEs in otherwise
INSERT-mostly tables), I have done away with the whole freezing thing,
which made the whole patch much smaller and simpler.

Now all that is introduced are the threshold and scale factor and
the new statistics counter to track the number of inserts since the last
VACUUM.

> > + /* normal autovacuum shouldn't freeze aggressively */
> > + *insert_only = false;
> 
> Aggressively is a bad choice of words.  In the context of vacuum, it usually
> means "visit all pages, even those which are allvisible".

This is gone in the latest patch.

Updated patch attached.

Perhaps we can reach a consensus on this reduced functionality.

Yours,
Laurenz Albe
From 547481033898f6e8e028e45684d4bbaa86d6bc9c Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 17 Mar 2020 20:31:12 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000.
The scale factor defaults to 0, which means that it is
effectively disabled, but it offers some flexibility
to tune the feature similar to other autovacuum knobs.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 +++
 doc/src/sgml/maintenance.sgml |  5 +++
 doc/src/sgml/monitoring.sgml  |  5 +++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 ++
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 22 --
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/adt/pgstatfuncs.c   | 16 
 src/backend/utils/misc/guc.c  | 20 +
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 +++
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 17 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..0ed1bb9d5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7244,6 +7244,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WI

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 08:42:07PM +0100, Laurenz Albe wrote:
> Also, since aggressive^H^H^H^H^H^H^H^H^H^Hproactive freezing seems to be a
> performance problem in some cases (pages with UPDATEs and DELETEs in otherwise
> INSERT-mostly tables), I have done away with the whole freezing thing,
> which made the whole patch much smaller and simpler.
> 
> Now all that is introduced are the threshold and scale factor and
> the new statistics counter to track the number of inserts since the last
> VACUUM.
> 
> Updated patch attached.
> 
> Perhaps we can reach a consensus on this reduced functionality.

+1

I still suggest a scale_factor maximum of 1e10, like
4d54543efa5eb074ead4d0fadb2af4161c943044

which allows disabling it more effectively than a factor of 100, which would
progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, ..

I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be.  With
1e10, it's first vacuumed when there have been 10 billion inserts, if we
didn't previously hit the n_dead threshold.

I think that's ok?  If one wanted to disable it up to 1e11 tuples, I think
they'd disable autovacuum, or preferably just implement a vacuum job.

The commit message says:
|The scale factor defaults to 0, which means that it is
|effectively disabled, but it offers some flexibility
..but "it" is ambiguous, so it should say something like: "the table size
does not contribute to the autovacuum threshold".

-- 
Justin




Re: allow online change primary_conninfo

2020-03-17 Thread Sergei Kornilov
Hello

>>  Well, it seems better to move this patch to next commitfest?
>
> What? You want to make wal_receiver_create_temp_slot unchangeable and
> default to off in pg13, and delay the patch that fixes those things to
> pg14? That makes no sense to me.

I want to handle similar things in a similar way.  Does
wal_receiver_create_temp_slot have a good design?  I will change my patch in
the same way in that case.  But something like that was strongly rejected a
year ago.

> Please keep them both here so that we can get things to a usable state.

Yes, of course.

Here I attached 3 patches:
0001 is a copy from https://commitfest.postgresql.org/27/2456/ It changes
wal_receiver_create_temp_slot to PGC_POSTMASTER, changes the default value to
off, and moves the logic to the startup process.
0002 changes primary_conninfo and primary_slot_name to be PGC_SIGHUP
0003 changes wal_receiver_create_temp_slot back to be PGC_SIGHUP.  Michael
Paquier asks to remove this from 0002, you ask to leave it in this thread.
So, I made a separate patch on top of 0002.

Thank you

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1d2a77043..eb6b3fb876 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4170,7 +4170,11 @@ ANY num_sync ( ).
-The default is off.  This parameter can only be set at server start.
+The default is off.  This parameter can only be set in the 
+postgresql.conf file or on the server command line.
+   
+   
+The WAL receiver is restarted after an update of wal_receiver_create_temp_slot.

   
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e1f3e905bb..47d69d92d2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12278,8 +12278,10 @@ ProcessStartupSigHup(void)
 {
 	char	   *conninfo = pstrdup(PrimaryConnInfo);
 	char	   *slotname = pstrdup(PrimarySlotName);
+	bool		tempSlot = wal_receiver_create_temp_slot;
 	bool		conninfoChanged;
 	bool		slotnameChanged;
+	bool		tempSlotChanged = false;
 
 	ProcessConfigFile(PGC_SIGHUP);
 
@@ -12289,10 +12291,17 @@ ProcessStartupSigHup(void)
 	conninfoChanged = (strcmp(conninfo, PrimaryConnInfo) != 0);
 	slotnameChanged = (strcmp(slotname, PrimarySlotName) != 0);
 
+	/*
+	 * wal_receiver_create_temp_slot is used only when we have no slot
+	 * configured. We do not need to track this change if it has no effect.
+	 */
+	if (!slotnameChanged && strcmp(PrimarySlotName, "") == 0)
+		tempSlotChanged = (tempSlot != wal_receiver_create_temp_slot);
+
 	pfree(conninfo);
 	pfree(slotname);
 
-	if ((conninfoChanged || slotnameChanged) &&
+	if ((conninfoChanged || slotnameChanged || tempSlotChanged) &&
 		currentSource == XLOG_FROM_STREAM
 		&& WalRcvRunning())
 	{
@@ -12300,14 +12309,16 @@ ProcessStartupSigHup(void)
 			ereport(LOG,
 	(errmsg("The WAL receiver is going to be shut down due to change of %s",
 			"primary_conninfo")));
-		else if (conninfoChanged && slotnameChanged)
+		else if (conninfoChanged && (slotnameChanged || tempSlotChanged))
 			ereport(LOG,
 	(errmsg("The WAL receiver is going to be restarted due to change of %s and %s",
-			"primary_conninfo", "primary_slot_name")));
+			"primary_conninfo",
+			slotnameChanged ? "primary_slot_name" : "wal_receiver_create_temp_slot")));
 		else
 			ereport(LOG,
 	(errmsg("The WAL receiver is going to be restarted due to change of %s",
-			conninfoChanged ? "primary_conninfo" : "primary_slot_name")));
+			conninfoChanged ? "primary_conninfo"
+			: (slotnameChanged ? "primary_slot_name" : "wal_receiver_create_temp_slot"))));
 
 		pendingWalRcvRestart = true;
 	}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8d9bdc0eaf..a3650518c4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2030,7 +2030,7 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"wal_receiver_create_temp_slot", PGC_POSTMASTER, REPLICATION_STANDBY,
+		{"wal_receiver_create_temp_slot", PGC_SIGHUP, REPLICATION_STANDBY,
 			gettext_noop("Sets whether a WAL receiver should create a temporary replication slot if no permanent slot is configured."),
 		},
 		&wal_receiver_create_temp_slot,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 31096725fd..f01e43b818 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -321,7 +321,6 @@
 	# -1 allows indefinite delay
 #wal_receiver_create_temp_slot = off	# Create temp slot if primary_slot_name
 	# is not set.
-	# (change requires restart)
 #wal_receiver_status_interval = 10s	# send replies at least this often
 	# 0 disables
 #hot_standby_feedback = off		# send info from standby to prevent
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8b742c83c5..f1d2a77043 100644
--- a/doc/src/

Re: WAL usage calculation patch

2020-03-17 Thread Julien Rouhaud
On Tue, Mar 17, 2020 at 10:27:05PM +0300, Kirill Bychik wrote:
> > > Please feel free to work on any extension of this patch idea. I lack
> > > both time and knowledge to do it all by myself.
> >
> > I'm adding a 3rd patch on top of yours to expose the new WAL counters in
> > pg_stat_database, for vacuum and autovacuum.  I'm not really enthusiastic 
> > with
> > this approach but I didn't find better, and maybe this will raise some 
> > better
> > ideas.  The only sure thing is that we're not going to add a bunch of new
> > fields in pg_stat_all_tables anyway.
> >
> > We can also drop this 3rd patch entirely if no one's happy about it without
> > impacting the first two.
>
> No objections about 3rd on my side, unless we miss the CF completely.
>
> As for the code, I believe:
> + walusage.wal_records = pgWalUsage.wal_records -
> + walusage_start.wal_records;
> + walusage.wal_fp_records = pgWalUsage.wal_fp_records -
> + walusage_start.wal_fp_records;
> + walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
>
> Could be done much simpler via the utility:
> WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);


Indeed, but this function is private to instrument.c.  AFAICT
pg_stat_statements is already duplicating similar code for buffers rather than
having BufferUsageAccumDiff being exported, so I chose the same approach.

I'd be in favor of exporting both functions though.


> On a side note, I agree API to the buf/wal usage is far from perfect.


Yes, clearly.


> > > Test had been reworked, and I believe it should be stable now, the
> > > part which checks WAL is written and there is a correlation between
> > > affected rows and WAL records. I still have no idea how to test
> > > full-page writes against regular updates, it seems very unstable.
> > > Please share ideas if any.
> >
> >
> > I just reviewed the patches, and it globally looks good to me.  The way to
> > detect full page images looks sensible, but I'm really not familiar with 
> > that
> > code so additional review would be useful.
> >
> > I noticed that the new wal_write_fp_records field in pg_stat_statements 
> > wasn't
> > used in the test.  Since I have to add all the patches to make the cfbot 
> > happy,
> > I slightly adapted the tests to reference the fp column too.  There was 
> > also a
> > minor issue in the documentation, as wal_records and wal_bytes were 
> > copy/pasted
> > twice while wal_write_fp_records wasn't documented, so I also changed it.
> >
> > Let me know if you're ok with those changes.
>
> Sorry for not getting wal_fp_usage into the docs, my fault.
>
> As for the tests, please get somebody else to review this. I strongly
> believe checking full page writes here could be a source of
> instability.


I'm also a little bit dubious about it.  The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it.  At least keeping it for the temporary tables test
shouldn't be a problem.




Re: pgsql: walreceiver uses a temporary replication slot by default

2020-03-17 Thread Sergei Kornilov
Hello

> I have reworked that part, adding more comments about the use of GUC
> parameters when establishing the connection to the primary for a WAL
> receiver. And also I have added an extra comment to walreceiver.c
> about the use of GUcs in general, to avoid this stuff again in the
> future. There were some extra nits with the format of
> postgresql.conf.sample.

Thank you! I just noticed that you removed my proposed change to this condition 
in RequestXLogStreaming

-   if (slotname != NULL)
+   if (slotname != NULL && slotname[0] != '\0')

We need this change to set is_temp_slot properly. The PrimarySlotName GUC is
usually an empty string (its default), so checking only "slotname != NULL" is
not enough.

I attached patch with this change.

regards, Sergei

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 672bf6f1ee..8b742c83c5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4160,11 +4160,7 @@ ANY num_sync ( ).
-The default is on.  The only reason to turn this off would be if the
-remote instance is currently out of available replication slots.  This
-parameter can only be set in the postgresql.conf
-file or on the server command line.  Changes only take effect when the
-WAL receiver process starts a new connection.
+The default is off.  This parameter can only be set at server start.

   
  
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de2d4ee582..483fedb218 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -284,6 +284,7 @@ bool		StandbyModeRequested = false;
 char	   *PrimaryConnInfo = NULL;
 char	   *PrimarySlotName = NULL;
 char	   *PromoteTriggerFile = NULL;
+bool		wal_receiver_create_temp_slot = false;
 
 /* are we currently in standby mode? */
 bool		StandbyMode = false;
@@ -11951,7 +11952,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		}
 		curFileTLI = tli;
 		RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
-			 PrimarySlotName);
+			 PrimarySlotName, wal_receiver_create_temp_slot);
 		receivedUpto = 0;
 	}
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 25e0333c9e..779d19f1c1 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -15,6 +15,12 @@
  * WalRcv->receivedUpto variable in shared memory, to inform the startup
  * process of how far it can proceed with XLOG replay.
  *
+ * WAL receivers cannot load directly GUC parameters used when establishing
+ * their connection to the primary, and rely on parameter values passed down
+ * by the startup process when WAL streaming is requested.  This applies
+ * to for example the replication slot creation and the connection string to
+ * use for the connection with the primary.
+ *
  * If the primary server ends streaming, but doesn't disconnect, walreceiver
  * goes into "waiting" mode, and waits for the startup process to give new
  * instructions. The startup process will treat that the same as
@@ -74,7 +80,6 @@
 
 
 /* GUC variables */
-bool		wal_receiver_create_temp_slot;
 int			wal_receiver_status_interval;
 int			wal_receiver_timeout;
 bool		hot_standby_feedback;
@@ -349,42 +354,23 @@ WalReceiverMain(void)
 		WalRcvFetchTimeLineHistoryFiles(startpointTLI, primaryTLI);
 
 		/*
-		 * Create temporary replication slot if no slot name is configured or
-		 * the slot from the previous run was temporary, unless
-		 * wal_receiver_create_temp_slot is disabled.  We also need to handle
-		 * the case where the previous run used a temporary slot but
-		 * wal_receiver_create_temp_slot was changed in the meantime.  In that
-		 * case, we delete the old slot name in shared memory.  (This would
+		 * Create temporary replication slot if requested.  In that
+		 * case, we update slot name in shared memory.  (This would
 		 * all be a bit easier if we just didn't copy the slot name into
 		 * shared memory, since we won't need it again later, but then we
 		 * can't see the slot name in the stats views.)
 		 */
-		if (slotname[0] == '\0' || is_temp_slot)
+		if (is_temp_slot)
 		{
-			bool		changed = false;
+			snprintf(slotname, sizeof(slotname),
+	 "pg_walreceiver_%lld",
+	 (long long int) walrcv_get_backend_pid(wrconn));
 
-			if (wal_receiver_create_temp_slot)
-			{
-snprintf(slotname, sizeof(slotname),
-		 "pg_walreceiver_%lld",
-		 (long long int) walrcv_get_backend_pid(wrconn));
+			walrcv_create_slot(wrconn, slotname, true, 0, NULL);
 
-walrcv_create_slot(wrconn, slotname, true, 0, NULL);
-changed = true;
-			}
-			else if (slotname[0] != '\0')
-			{
-slotname[0] = '\0';
-changed = true;
-			}
-
-			if (changed)
-			{
-SpinLockAcquire(&walrcv->mutex);
-strlcpy(walrcv->slotname, slotname, NAMEDATALEN);
-walrcv->is_temp_slot = wal_receiver_create_temp_slot;
-SpinLockRelease

Re: Error on failed COMMIT

2020-03-17 Thread Bruce Momjian
On Fri, Mar  6, 2020 at 01:12:10PM -0500, Robert Haas wrote:
> On Fri, Mar 6, 2020 at 11:55 AM Dave Cramer  wrote:
> > There have been some arguments that the client can fix this easily.
> >
> > Turns out it is not as easy as one might think.
> >
> > If the client (in this case JDBC) uses conn.commit() then yes relatively 
> > easy as we know that commit is being executed.
> 
> Right...
> 
> > however if the client executes commit using direct SQL and possibly 
> > multiplexes a number of commands we would have to parse the SQL to figure 
> > out what is being sent. This could include a column named commit_date or a 
> > comment with commit embedded in it. It really doesn't make sense to have a 
> > full fledged PostgreSQL SQL parser in every client. This is something the 
> > server does very well.
> 
> That's true. If the command tag is either COMMIT or ROLLBACK then the
> statement was either COMMIT or ROLLBACK, but Vladimir's example query
> /*commit*/rollback does seem like a pretty annoying case. I was
> assuming that the JDBC driver required use of con.commit() in the
> cases we care about, but perhaps that's not so.

Let me try to summarize where I think we are on this topic.

First, Vik reported that we don't follow the SQL spec when issuing a
COMMIT WORK in a failed transaction.  We return success and issue the
ROLLBACK command tag, rather than erroring.  In general, if we don't
follow the spec, we should either have a good reason, or the breakage to
match the spec is too severe.  (I am confused why this has not been
reported before.)

Second, someone suggested that if COMMIT throws an error, that future
statements would be considered to be in the same transaction block until
ROLLBACK is issued.  It was determined that this is not required, and
that the API should have COMMIT WORK on a failed transaction still exit
the transaction block.  This behavior is much more friendly for SQL
scripts piped into psql.

Third, the idea that individual interfaces, e.g. JDBC, should throw an
error in this case while the server just changes the COMMIT return tag
to ROLLBACK is confusing.  People regularly test SQL commands in the
server before writing applications or while debugging, and a behavior
mismatch would cause confusion.

Fourth, it is not clear how many applications would break if COMMIT
started issuing an error rather than return success a with ROLLBACK tag.
Certainly SQL scripts would be fine.  They would have one additional
error in the script output, but if they had ON_ERROR_STOP enabled, they
would have exited before the COMMIT.  Applications that track statement
errors and issue rollbacks will be fine.  So, we are left with
applications that issue COMMIT and expect success after a transaction
block has failed.  Do we know how other database systems handle this?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 14:56 -0500, Justin Pryzby wrote:
> I still suggest scale_factor maximum of 1e10, like
> 4d54543efa5eb074ead4d0fadb2af4161c943044
> 
> Which alows more effectively disabling it than a factor of 100, which would
> progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, ..
> 
> I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be.  With
> 1e10, it's first vacuumed when there are 10 billion inserts, if we didn't
> previously hit the n_dead threshold.
> 
> I think that's ok?  If one wanted to disable it up to 1e11 tuples, I think
> they'd disable autovacuum, or preferably just implement a vacuum job.

Assume a scale factor >= 1, for example 2, and n live tuples.
The table has just been vacuumed.

Now we insert m tuples (which are live).

Then the condition

  threshold + scale_factor * live_tuples < newly_inserted_tuples

becomes

  1000 + 2 * (n + m) < m

which can never be true for non-negative n and m.

So a scale factor >= 1 disables the feature.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 10:01:15PM +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 14:56 -0500, Justin Pryzby wrote:
> > I still suggest scale_factor maximum of 1e10, like
> > 4d54543efa5eb074ead4d0fadb2af4161c943044
> > 
> > Which allows disabling it more effectively than a factor of 100, which would
> > progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, ..
> > 
> > I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be.  With
> > 1e10, it's first vacuumed when there are 10 billion inserts, if we didn't
> > previously hit the n_dead threshold.
> > 
> > I think that's ok?  If one wanted to disable it up to 1e11 tuples, I think
> > they'd disable autovacuum, or preferably just implement a vacuum job.
> 
> Assume a scale factor >= 1, for example 2, and n live tuples.
> The table has just been vacuumed.
> 
> Now we insert m tuples (which are live).
> 
> Then the condition
> 
>   threshold + scale_factor * live_tuples < newly_inserted_tuples
> 
> becomes
> 
>   1000 + 2 * (n + m) < m
> 
> which can never be true for non-negative n and m.
> 
> So a scale factor >= 1 disables the feature.

No, this is what we mailed about privately yesterday, and I demonstrated that
autovac can still run with factor=100.  I said:

|It's a multiplier, not a percent out of 100 (fraction is not a great choice of
|words).
|
|&autovacuum_vac_scale,
|0.2, 0.0, 100.0,
|
|The default is 0.2 (20%), so 100 means after updating/deleting 100*reltuples.

live tuples is an estimate, from the most recent vacuum OR analyze.

If 1.0 disabled the feature, it wouldn't make much sense to allow factor up to
100.

+   {
+   {"autovacuum_vacuum_insert_scale_factor", PGC_SIGHUP, 
AUTOVACUUM,
+   gettext_noop("Number of tuple inserts prior to vacuum 
as a fraction of reltuples."),
+   NULL
+   },
+   &autovacuum_vac_ins_scale,
+   0.0, 0.0, 100.0,
+   NULL, NULL, NULL
+   },

-- 
Justin




Re: Error on failed COMMIT

2020-03-17 Thread Vladimir Sitnikov
Bruce, thanks for taking the time to summarize.

Bruce>Fourth, it is not clear how many applications would break if COMMIT
Bruce>started issuing an error rather than return success

None.

Bruce>applications that issue COMMIT and expect success after a transaction
Bruce>block has failed

An application must expect an exception from a COMMIT statement like from any
other SQL statement.

The wire protocol specification explicitly says implementations must expect
error messages at any time.

---

Bruce>Do we know how other database systems handle this?

Oracle DB produces an error from COMMIT if a transaction can't be committed
(e.g. a failure in the processing of "on commit refresh materialized view").

---

The bug is that "deferred constraint violation" and "non-deferred constraint
violation" end up with **different** behavior for COMMIT.

A deferred violation produces an error, while a non-deferred violation
produces a "silent rollback".

In other words, there are already cases in PostgreSQL when commit produces
an error. It is nothing new.
The new part is that PostgreSQL must not produce "silent rollbacks".

Bruce>First, Vik reported that we don't follow the SQL spec

+1

Bruce>Second, someone suggested that if COMMIT throws an error, that future
Bruce>statements would be considered to be in the same transaction

No, please disregard that; it is ill-conceived. COMMIT (and/or ROLLBACK) must
terminate the transaction in any case.
The transaction must not exist after COMMIT finishes (successfully or not).
The same goes for PREPARE TRANSACTION: if it fails, the transaction must be
cleared.

A litmus test is "deferred constraint violation". It works OK in current
PostgreSQL.
If the database can't commit, it should respond with a clear error that
describes the reason for the failure.

Bruce>Third, the idea that individual interfaces, e.g. JDBC, should throw

Individual interfaces should not deviate from server behavior much.
They should convert server-provided errors to the language-native format.
They should not invent their own rules to convert server messages to errors.
That would provide a uniform PostgreSQL experience for the end-users.

Note: there are even multiple JDBC implementations for PostgreSQL, so slight
differences in transaction handling are the very last "feature" people want
from a PostgreSQL database.

Vladimir


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 16:07 -0500, Justin Pryzby wrote:
> > Assume a scale factor >= 1, for example 2, and n live tuples.
> > The table has just been vacuumed.
> > 
> > Now we insert m tuples (which are live).
> > 
> > Then the condition
> > 
> >threshold + scale_factor * live_tuples < newly_inserted_tuples
> > 
> > becomes
> > 
> >1000 + 2 * (n + m) < m
> > 
> > which can never be true for non-negative n and m.
> > 
> > So a scale factor >= 1 disables the feature.
> 
> No, this is what we mailed about privately yesterday, and I demonstrated that
> autovac can still run with factor=100.  I said:

I remember.
Can you point out where exactly the flaw in my reasoning is?

> > It's a multiplier, not a percent out of 100 (fraction is not a great choice 
> > of
> > words).
> > 
> > &autovacuum_vac_scale,
> > 0.2, 0.0, 100.0,
> > 
> > The default is 0.2 (20%), so 100 means after updating/deleting 
> > 100*reltuples.

Yes, exactly.

> If 1.0 disabled the feature, it wouldn't make much sense to allow factor up to
> 100.

True, we could set the upper limit to 2, but it doesn't matter much.

Note that this is different from autovacuum_vacuum_scale_factor,
because inserted tuples are live, while dead tuples are not.

Yours,
Laurenz Albe





Re: Attempt to consolidate reading of XLOG page

2020-03-17 Thread Alvaro Herrera
On 2019-Nov-20, Antonin Houska wrote:

> Alvaro Herrera  wrote:

> > What is logical_read_local_xlog_page all about?  Seems useless.  Let's
> > get rid of it.
> 
> It seems so. Should I post a patch for that?

No need .. it was simple enough.  Just pushed it.

Thanks

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 10:22:44PM +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 16:07 -0500, Justin Pryzby wrote:
> > > Assume a scale factor >= 1, for example 2, and n live tuples.
> > > The table has just been vacuumed.
> > > 
> > > Now we insert m tuples (which are live).

.. but not yet counted in reltuples.

On Tue, Mar 17, 2020 at 10:22:44PM +0100, Laurenz Albe wrote:
> Note that this is different from autovacuum_vacuum_scale_factor,
> because inserted tuples are live, while dead tuples are not.

But they're not counted in reltuples until after the next vacuum (or analyze),
which is circular, since it's exactly what we're trying to schedule.

reltuples = classForm->reltuples;
vactuples = tabentry->n_dead_tuples;
+   instuples = tabentry->inserts_since_vacuum;
anltuples = tabentry->changes_since_analyze;
 
vacthresh = (float4) vac_base_thresh + vac_scale_factor * 
reltuples;
+   vacinsthresh = (float4) vac_ins_base_thresh + 
vac_ins_scale_factor * reltuples;

-- 
Justin




Re: Auxiliary Processes and MyAuxProc

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 02:50:19PM -0400, Mike Palmiotto wrote:
> The patchset is now split out. I've just noticed that Peter Eisentraut
> included some changes for a generic MyBackendType, which I should have
> been aware of. I was unable to rebase due to these changes, but can
> fold these patches into that framework if others think it's
> worthwhile.

I don't have many comments on the patch, but it's easy enough to rebase.

I think maybe you'll want to do something more with this new function:
GetBackendTypeDesc()

+   /* Don't panic. */

+1

-- 
Justin
>From 4234a083913d93b27cd6763f41c092adc8d4cf6c Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Mon, 9 Mar 2020 22:12:10 +
Subject: [PATCH v2 01/12] Add subprocess infrastructure

This commit introduces a new infrastructure for postmaster subprocesses
initialization, forking, exec'ing and main function control. The intent
is to have all subprocess control information live in the process_types
struct, defined in subprocess.h. As more processes are added to the
struct, the fields will be expanded to differentiate actions.

There are several advantages to such an infrastructure:
1) Centralized definition of subprocess information
2) Standardized function arguments/naming schemes
3) Code minimization and removal of duplicate code
---
 src/backend/bootstrap/bootstrap.c | 74 +++
 src/backend/postmaster/Makefile   |  1 +
 src/backend/postmaster/bgwriter.c |  2 +-
 src/backend/postmaster/checkpointer.c |  2 +-
 src/backend/postmaster/startup.c  |  3 +-
 src/backend/postmaster/subprocess.c   | 62 ++
 src/backend/postmaster/walwriter.c|  2 +-
 src/backend/replication/walreceiver.c |  2 +-
 src/include/bootstrap/bootstrap.h |  3 +-
 src/include/postmaster/bgwriter.h |  4 +-
 src/include/postmaster/startup.h  |  5 +-
 src/include/postmaster/subprocess.h   | 44 
 src/include/postmaster/walwriter.h|  2 +-
 src/include/replication/walreceiver.h |  2 +-
 14 files changed, 140 insertions(+), 68 deletions(-)
 create mode 100644 src/backend/postmaster/subprocess.c
 create mode 100644 src/include/postmaster/subprocess.h

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 5480a024e0..9d41567467 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -35,6 +35,7 @@
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
+#include "postmaster/subprocess.h"
 #include "postmaster/walwriter.h"
 #include "replication/walreceiver.h"
 #include "storage/bufmgr.h"
@@ -56,8 +57,6 @@ uint32		bootstrap_data_checksum_version = 0;	/* No checksum */
 #define ALLOC(t, c) \
 	((t *) MemoryContextAllocZero(TopMemoryContext, (unsigned)(c) * sizeof(t)))
 
-static void CheckerModeMain(void);
-static void BootstrapModeMain(void);
 static void bootstrap_signals(void);
 static void ShutdownAuxiliaryProcess(int code, Datum arg);
 static Form_pg_attribute AllocateAttribute(void);
@@ -334,6 +333,12 @@ AuxiliaryProcessMain(int argc, char *argv[])
 		default:
 			MyBackendType = B_INVALID;
 	}
+
+	/*
+	 * We've read the arguments and know what backend type we are.
+	 */
+	InitMySubprocess((SubprocessType)MyAuxProcType);
+
 	if (IsUnderPostmaster)
 		init_ps_display(NULL);
 
@@ -418,56 +423,10 @@ AuxiliaryProcessMain(int argc, char *argv[])
 	 */
 	SetProcessingMode(NormalProcessing);
 
-	switch (MyAuxProcType)
-	{
-		case CheckerProcess:
-			/* don't set signals, they're useless here */
-			CheckerModeMain();
-			proc_exit(1);		/* should never return */
-
-		case BootstrapProcess:
-
-			/*
-			 * There was a brief instant during which mode was Normal; this is
-			 * okay.  We need to be in bootstrap mode during BootStrapXLOG for
-			 * the sake of multixact initialization.
-			 */
-			SetProcessingMode(BootstrapProcessing);
-			bootstrap_signals();
-			BootStrapXLOG();
-			BootstrapModeMain();
-			proc_exit(1);		/* should never return */
-
-		case StartupProcess:
-			/* don't set signals, startup process has its own agenda */
-			StartupProcessMain();
-			proc_exit(1);		/* should never return */
-
-		case BgWriterProcess:
-			/* don't set signals, bgwriter has its own agenda */
-			BackgroundWriterMain();
-			proc_exit(1);		/* should never return */
-
-		case CheckpointerProcess:
-			/* don't set signals, checkpointer has its own agenda */
-			CheckpointerMain();
-			proc_exit(1);		/* should never return */
-
-		case WalWriterProcess:
-			/* don't set signals, walwriter has its own agenda */
-			InitXLOGAccess();
-			WalWriterMain();
-			proc_exit(1);		/* should never return */
-
-		case WalReceiverProcess:
-			/* don't set signals, walreceiver has its own agenda */
-			WalReceiverMain();
-			proc_exit(1);		/* should never return */
+	/* Now jump into the subprocess main function and never look back! */
+	MySubprocess->entrypoint(argc, argv);
 
-		default:
-			elog(PANIC, "unrecognized process ty

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 16:34 -0500, Justin Pryzby wrote:
> > > > Now we insert m number tuples (which are live).
> 
> .. but not yet counted in reltuples.

Thanks for pointing out my mistake.

Here is another patch, no changes except setting the upper limit
for autovacuum_vacuum_insert_scale_factor to 1e10.

Yours,
Laurenz Albe
From cc44042d4a07804a21abe7ad54a8dfafd3162228 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 17 Mar 2020 22:51:46 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000.
The scale factor defaults to 0, which means that by
default the table size doesn't contribute,
but this way we have some flexibility to tune the
feature similar to other autovacuum knobs.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 +++
 doc/src/sgml/maintenance.sgml |  5 +++
 doc/src/sgml/monitoring.sgml  |  5 +++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 ++
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 22 --
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/adt/pgstatfuncs.c   | 16 
 src/backend/utils/misc/guc.c  | 20 +
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 +++
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 17 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..0ed1bb9d5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7244,6 +7244,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_threshold (integer)
+  
+   autovacuum_vacuum_insert_threshold
+   configuration parameter
+  
+  
+  
+   
+Specifies the number of inserted tuples needed to trigger a
+VACUUM in any one table.
+The default is 1000 tuples.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   autovacuum_analyze_threshold (integer)
   
@@ -7285,6 +7305,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_scale_factor (floating point)
+  
+   autovacuum_vacuum_insert_scale_factor
+   configuration parameter
+  
+  
+  
+   
+Specifies a fraction of the table size to add to
+autovacuum_vacuum_insert_threshold
+when deciding whether to trigger a VACUUM.
+The default is 0.0, which means that the table size has no effect.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   autovacuum_analyze_scale_factor (floating point)
   
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index ec8bdcd7a4..dbf418c62a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -786,6 +786,11 @@ vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuple
 vacuum is performed to freeze old tuples and advance
 relfrozenxid; otherwise, only pages that have been modified
 since the last vacuum are scanned.
+Finally, a threshold similar to the above is calculated from
+ and
+.
+Tables that have received more inserts than the 

Re: Parallel leader process info in EXPLAIN

2020-03-17 Thread James Coleman
On Thu, Nov 7, 2019 at 9:48 PM Thomas Munro  wrote:
>
> On Thu, Nov 7, 2019 at 11:37 PM Rafia Sabih  wrote:
> > ...
> > Also, I noticed that the worker details are displayed for sort node even 
> > without verbose, but for scans it is only with verbose. Am I missing
> > something, or is there something behind it? However, I am not sure if
> > this is introduced by this patch-set.
>
> Yeah, it's a pre-existing thing, but I agree it's an interesting
> difference.  We currently don't have a way to show a 'combined'
> version of a parallel (oblivious) sort: we always show the per-process
> version, and all this patch changes is how we label the leader's
> stats.  I suppose someone could argue that in non-VERBOSE mode we
> should show the total memory usage (sum from all processes). I suppose
> it's possible they use different sort types (one worker runs out of
> work_mem and another doesn't), and I'm not sure how you'd
> aggregate that.

Over at [1] (incremental sort patch) I had a similar question, since
each sort node (even non-parallel) can execute multiple tuplesorts.
The approach I took was to show both average and max for both disk and
memory usage as well as all sort strategies used. It looks like this:

   ->  Incremental Sort
 Sort Key: a, b
 Presorted Key: a
 Full-sort Groups: 4 (Methods: quicksort) Memory: 26kB (avg), 26kB (max)
 ->  Index Scan using idx_t_a...

It'd be great if that had a use here too :)

James

[1]: https://www.postgresql.org/message-id/CAAaqYe_ctGqQsauuYS5StPULkES7%3Dt8vNwvEPyzXQdbjAuZ6vA%40mail.gmail.com




Re: Error on failed COMMIT

2020-03-17 Thread Dave Cramer
On Tue, 17 Mar 2020 at 16:47, Bruce Momjian  wrote:

> On Fri, Mar  6, 2020 at 01:12:10PM -0500, Robert Haas wrote:
> > On Fri, Mar 6, 2020 at 11:55 AM Dave Cramer 
> wrote:
> > > There have been some arguments that the client can fix this easily.
> > >
> > > Turns out it is not as easy as one might think.
> > >
> > > If the client (in this case JDBC) uses conn.commit() then yes
> relatively easy as we know that commit is being executed.
> >
> > Right...
> >
> > > however if the client executes commit using direct SQL and possibly
> multiplexes a number of commands we would have to parse the SQL to figure
> out what is being sent. This could include a column named commit_date or a
> comment with commit embedded in it. It really doesn't make sense to have a
> full fledged PostgreSQL SQL parser in every client. This is something the
> server does very well.
> >
> > That's true. If the command tag is either COMMIT or ROLLBACK then the
> > statement was either COMMIT or ROLLBACK, but Vladimir's example query
> > /*commit*/rollback does seem like a pretty annoying case. I was
> > assuming that the JDBC driver required use of con.commit() in the
> > cases we care about, but perhaps that's not so.
>
> Let me try to summarize where I think we are on this topic.
>
> First, Vik reported that we don't follow the SQL spec when issuing a
> COMMIT WORK in a failed transaction.  We return success and issue the
> ROLLBACK command tag, rather than erroring.  In general, if we don't
> follow the spec, we should either have a good reason, or the breakage to
> match the spec is too severe.  (I am confused why this has not been
> reported before.)
>

Good question.

>
> Second, someone suggested that if COMMIT throws an error, that future
> statements would be considered to be in the same transaction block until
> ROLLBACK is issued.  It was determined that this is not required, and
> that the API should have COMMIT WORK on a failed transaction still exit
> the transaction block.  This behavior is much more friendly for SQL
> scripts piped into psql.
>
This is correct. The patch I provided does exactly this.
The rollback occurs. The transaction is finished, but an error message is
sent.


> Third, the idea that individual interfaces, e.g. JDBC, should throw an
> error in this case while the server just changes the COMMIT return tag
> to ROLLBACK is confusing.  People regularly test SQL commands in the
> server before writing applications or while debugging, and a behavior
> mismatch would cause confusion.
>

I'm not sure what you mean by this. The server would throw an error.

>
> Fourth, it is not clear how many applications would break if COMMIT
> started issuing an error rather than returning success with a ROLLBACK tag.
> Certainly SQL scripts would be fine.  They would have one additional
> error in the script output, but if they had ON_ERROR_STOP enabled, they
> would have exited before the commit.  Applications that track statement
> errors and issue rollbacks will be fine.  So, we are left with
> applications that issue COMMIT and expect success after a transaction
> block has failed.  Do we know how other database systems handle this?
>

Well, I know pgjdbc handles my patch fine without any changes to the code.
As I mentioned upthread, 2 of the 3 Go drivers already error if a rollback is
returned; 1 of them does not.

I suspect npgsql would be fine. Shay ?


Dave Cramer
www.postgres.rocks


Re: Error on failed COMMIT

2020-03-17 Thread Bruce Momjian
On Tue, Mar 17, 2020 at 07:15:05PM -0400, Dave Cramer wrote:
> On Tue, 17 Mar 2020 at 16:47, Bruce Momjian  wrote:
> Third, the idea that individual interfaces, e.g. JDBC, should throw an
> error in this case while the server just changes the COMMIT return tag
> to ROLLBACK is confusing.  People regularly test SQL commands in the
> server before writing applications or while debugging, and a behavior
> mismatch would cause confusion.
> 
> 
> I'm not sure what you mean by this. The server would throw an error. 

I am saying it is not wise to have interfaces behaving differently than
the server, for the reasons stated above.

> Fourth, it is not clear how many applications would break if COMMIT
> started issuing an error rather than returning success with a ROLLBACK tag.
> Certainly SQL scripts would be fine.  They would have one additional
> error in the script output, but if they had ON_ERROR_STOP enabled, they
> would have exited before the commit.  Applications that track statement
> errors and issue rollbacks will be fine.  So, we are left with
> applications that issue COMMIT and expect success after a transaction
> block has failed.  Do we know how other database systems handle this?
> 
> Well, I know pgjdbc handles my patch fine without any changes to the code.
> As I mentioned upthread, 2 of the 3 Go drivers already error if a rollback is
> returned; 1 of them does not.

Good point.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Error on failed COMMIT

2020-03-17 Thread Dave Cramer
On Tue, 17 Mar 2020 at 19:23, Bruce Momjian  wrote:

> On Tue, Mar 17, 2020 at 07:15:05PM -0400, Dave Cramer wrote:
> > On Tue, 17 Mar 2020 at 16:47, Bruce Momjian  wrote:
> > Third, the idea that individual interfaces, e.g. JDBC, should throw
> an
> > error in this case while the server just changes the COMMIT return
> tag
> > to ROLLBACK is confusing.  People regularly test SQL commands in the
> > server before writing applications or while debugging, and a behavior
> > mismatch would cause confusion.
> >
> >
> > I'm not sure what you mean by this. The server would throw an error.
>
> I am saying it is not wise to have interfaces behaving differently than
> the server, for the reasons stated above.
>
Agreed, and this is why I think it is important for the server to be
defining the behaviour instead of each interface deciding how to handle
this situation.


Dave Cramer
www.postgres.rocks



Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Mon, Mar 16, 2020 at 07:47:13AM -0500, Justin Pryzby wrote:
> Normally, when someone complains about bad plan related to no index-onlyscan,
> we tell them to run vacuum, and if that helps, then ALTER TABLE .. SET
> (autovacuum_vacuum_scale_factor=0.005).
> 
> If there's two thresholds (4 GUCs and 4 relopts) for autovacuum, then do we
> have to help determine which one was being hit, and which relopt to set?

I don't think we came to any resolution on this.

Right now, to encourage IOS, we'd tell someone to set
autovacuum_vacuum_scale_factor=0.005.  That wouldn't work for an insert-only
table, but I've never heard back from someone that it didn't work.

So with this patch, we'd maybe tell them to do this, to also get IOS on
insert-only tables?
|ALTER TABLE .. SET (autovacuum_vacuum_scale_factor=0.005, 
autovacuum_vacuum_insert_threshold=5);

> I wonder if the new insert GUCs should default to -1 (disabled)?  And the
> insert thresholds should be set by new insert relopt (if set), or by new 
> insert
> GUC (default -1), else normal relopt, or normal GUC.  The defaults would give
> 50 + 0.20*n.  When someone asks about IOS, we'd tell them to set
> autovacuum_vacuum_scale_factor=0.005, same as now.
> 
> vac_ins_scale_factor =
>   (relopts && relopts->vacuum_ins_scale_factor >= 0) ? 
> relopts->vacuum_ins_scale_factor :
>   autovacuum_vac_ins_scale >= 0 ? autovacuum_vac_ins_scale : 
>   (relopts && relopts->vacuum_scale_factor >= 0) ? 
> relopts->vacuum_scale_factor :
>   autovacuum_vac_scale;

-- 
Justin




Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-17 Thread Tom Lane
I wrote:
> Pavel Stehule  writes:
>> There was a problem just with anyrange type. This last version looks
>> perfect.

> If you think that "matching polymorphic types" is too vague, I'm
> not sure there's much daylight between there and spelling it out
> in full as this latest patch does.  "anyrange is the only problem"
> might be a tenable viewpoint today, but once this patchset goes
> in there's going to be much more scope for confusion about which
> arguments potentially match a polymorphic result.

On further reflection it seems like that's actually a fairly convincing
argument for going with the more-verbose style.  Hence, I pushed 0001
that way.

The cfbot will be unhappy at this point, but I need to rebase the
main patch again ...

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Andres Freund
Hi,

On 2020-03-17 01:14:02 +0100, Laurenz Albe wrote:
> lazy_check_needs_freeze() is only called for an aggressive vacuum, which
> this isn't.

Hm? I mean some of these will be aggressive vacuums, because it's older
than vacuum_freeze_table_age? And the lower age limit would make that
potentially more painful, no?

Greetings,

Andres Freund




Re: [PATCH] Opclass parameters

2020-03-17 Thread Nikita Glukhov

Attached new version of reordered patches.

Questionable patches for AM-specific per-attribute options were moved to
the end, so they can be skipped now.

On 16.03.2020 18:22, Alexander Korotkov wrote:


Hi!

I took a look on this patchset.  There is a first set of questions.

* Patchset badly needs comments.  I've to literally reverse engineer
to get what's going on.  But I still don't understand many things.

* I'm curious about what local_relopts.base field means.

void
extend_local_reloptions(local_relopts *opts, void *base, Size base_size)
{
 Assert(opts->base_size < base_size);
 opts->base = base;
 opts->base_size = base_size;
}

/*
  * add_local_reloption
  *  Add an already-created custom reloption to the local list.
  */
static void
add_local_reloption(local_relopts *relopts, relopt_gen *newoption, void *pval)
{
 local_relopt *opt = palloc(sizeof(*opt));

 opt->option = newoption;
 opt->offset = (char *) pval - (char *) relopts->base;

 relopts->options = lappend(relopts->options, opt);
}

Datum
ghstore_options(PG_FUNCTION_ARGS)
{
 local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0);
 GistHstoreOptions *options = NULL;

 extend_local_reloptions(relopts, options, sizeof(*options));
 add_local_int_reloption(relopts, "siglen",
 "signature length in bytes",
 SIGLEN_DEFAULT, 1, SIGLEN_MAX,
 &options->siglen);

 PG_RETURN_VOID();
}

It's not commented, but I guess it's used to calculate offsets from
pointers passed to add_local_*_reloption().  Would it be better to just pass
offsets to add_local_*_reloption()?


Yes, the 'base' field was used to calculate offsets.  Now I pass offsets
instead of pointers to the fields of the template structure (which gives us
additional type checking).  Some comments were added.


* It's generally unclear how does amattoptions and opclass options
interact.  As I get we now don't have an example where both
amattoptions and opclass options involved.  What is general benefit
from putting both two kind of options into single bytea?  Can opclass
options method do something useful with amattoptions?  For instance,
some amattoptions can be calculated from opclass options?  That would
be some point for putting these options together, but it doesn't look
like opclass options method can do this?


There are no examples of AM and opclass options interacting yet.  But the AM
and the opclass can register custom callbacks that will be called after
parsing, in their registration order.  In these callbacks it is possible to
post-process option values and to check the presence or absence of some
options.

The main benefit of putting both kinds of options into a single bytea is
that it does not require major modifications of the reloption processing
code.  It also does not require splitting the reloption list obtained from
SQL into two separate lists for AM and opclass options.


* Is the current opclass code safe for the introduction of new amattoptions?
For instance, would ghstore_*() work the same way, expecting a
GistHstoreOptions struct to be passed as opclass options, if GiST introduced
its own attoptions?  I guess not.  If I'm wrong, please clarify this.  And
the patchset needs a comment so one could understand this without guessing.


Yes, the code would be broken by the introduction of GiST per-attribute
options.  GistHstoreOptions should include GistAttOptions, which simply did
not exist in the previous version of the patches.  I added empty
XxxAttOptions for all AMs in patch #7, and GistAttOptions and GinAttOptions
are now included in the corresponding structures for opclass options.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Introduce-opclass-parameters-20200318.patch.gz
Description: application/gzip


0002-Use-opclass-parameters-in-GiST-tsvector_ops-20200318.patch.gz
Description: application/gzip


0003-Use-opclass-parameters-in-contrib_intarray-20200318.patch.gz
Description: application/gzip


0004-Use-opclass-parameters-in-contrib_ltree-20200318.patch.gz
Description: application/gzip


0005-Use-opclass-parameters-in-contrib_pg_trgm-20200318.patch.gz
Description: application/gzip


0006-Use-opclass-parameters-in-contrib_hstore-20200318.patch.gz
Description: application/gzip


0007-Introduce-amattoptions-20200318.patch.gz
Description: application/gzip


0008-Use-amattoptions-in-contrib_bloom-20200318.patch.gz
Description: application/gzip


0009-Remove-pg_index.indoption-20200318.patch.gz
Description: application/gzip


Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-17 Thread Tomas Vondra

On Tue, Mar 17, 2020 at 06:05:17PM +0100, Tomas Vondra wrote:

On Tue, Mar 17, 2020 at 04:14:26PM +, Dean Rasheed wrote:

On Tue, 17 Mar 2020 at 15:37, Tomas Vondra  wrote:


On Tue, Mar 17, 2020 at 12:42:52PM +, Dean Rasheed wrote:


The other thing that I'm still concerned about is the possibility of
returning estimates with P(a,b) > P(a) or P(b). I think that such a
thing becomes much more likely with the new types of clause supported
here, because they now allow multiple values from each column, where
before we only allowed one. I took another look at the patch I posted
on the other thread, and I've convinced myself that it's correct.
Attached is an updated version, with some cosmetic tidying up and now
with some additional regression tests.


Yeah, I agree that's something we need to fix. Do you plan to push the
fix, or do you want me to do it?



I can push it. Have you had a chance to review it?



Not yet, I'll take a look today.



OK, I took a look. I think from the correctness POV the patch is OK, but
I think the dependencies_clauselist_selectivity() function has become a bit
too complex. I've been able to parse it now, but I'm sure I'll have
trouble in the future :-(

Can we refactor / split it somehow and move bits of the logic to smaller
functions, or something like that?

Another thing I'd like to suggest is to keep the "old" formula, and instead
of just replacing it with

   P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b)

explain how the old formula may produce nonsensical selectivities and how
the new formula addresses that issue.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING

2020-03-17 Thread Bruce Momjian
On Thu, Feb 13, 2020 at 11:26:45AM -0500, Tom Lane wrote:
> I see where you're coming from, but I do not think that repeating the
> whole from_item syntax in UPDATE and DELETE is the best way forward.
> In the first place, we'd inevitably forget to update those copies,
> and in the second, I'm not sure that the syntax is all that helpful
> without all the supporting text in the SELECT ref page --- which
> surely we aren't going to duplicate.
> 
> I think the real problem with the places Alexey is on about is that
> they're too waffle-y.  They use wording like "similar to", leaving
> one wondering what discrepancies exist but are being papered over.
> In point of fact, as a look into gram.y will show, what you can
> write after UPDATE ... FROM or DELETE ... USING is *exactly* the
> same thing as what you can write after SELECT ... FROM.  So what
> I'm in favor of here is:
> 
> * Change the synopsis entries to look like "FROM from_item [, ...]"
> and "USING from_item [, ...]", so that they match the SELECT
> synopsis exactly.
> 
> * In the text, describe from_item as being exactly the same as
> it is in SELECT.
> 
> (Compare the handling of with_query, which has pretty much the
> same problem of being way too complex to document three times.)

I have implemented the ideas above in the attached patch.  I have
synchronized the syntax to match SELECT, and synchronized the paragraphs
describing the item.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml
index df8cea48cf..9f0ef0e681 100644
--- a/doc/src/sgml/ref/delete.sgml
+++ b/doc/src/sgml/ref/delete.sgml
@@ -23,7 +23,7 @@ PostgreSQL documentation
 
 [ WITH [ RECURSIVE ] with_query [, ...] ]
 DELETE FROM [ ONLY ] table_name [ * ] [ [ AS ] alias ]
-[ USING using_list ]
+[ USING using_item [, ...] ]
 [ WHERE condition | WHERE CURRENT OF cursor_name ]
 [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
 
@@ -117,17 +117,18 @@ DELETE FROM [ ONLY ] table_name [ *

 

-using_list
+using_item
 
  
-  A list of table expressions, allowing columns from other tables
-  to appear in the WHERE condition.  This is similar
-  to the list of tables that can be specified in the  of a
-  SELECT statement; for example, an alias for
-  the table name can be specified.  Do not repeat the target table
-  in the using_list,
-  unless you wish to set up a self-join.
+  A table expression allowing columns from other tables to appear
+  in the WHERE condition.  This is the same as
+  the table that can be specified in the  of a SELECT
+  statement; for example, an alias for the table name can be
+  specified.  Do not repeat the target table as a using_item unless you wish to set
+  up a self-join (in which case it must appear with an alias in the
+  using_item).
  
 

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index f58dcd8877..d1e74a7c3b 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -27,7 +27,7 @@ UPDATE [ ONLY ] table_name [ * ] [
   ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
   ( column_name [, ...] ) = ( sub-SELECT )
 } [, ...]
-[ FROM from_list ]
+[ FROM from_item [, ...] ]
 [ WHERE condition | WHERE CURRENT OF cursor_name ]
 [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
 
@@ -164,17 +164,18 @@ UPDATE [ ONLY ] table_name [ * ] [

 

-from_list
+from_item
 
  
-  A list of table expressions, allowing columns from other tables
-  to appear in the WHERE condition and the update
-  expressions. This is similar to the list of tables that can be
-  specified in the  of a SELECT
-  statement.  Note that the target table must not appear in the
-  from_list, unless you intend a self-join (in which
-  case it must appear with an alias in the from_list).
+  A table expression allowing columns from other tables to
+  appear in the WHERE condition and update
+  expressions. This is the same as the table that can be specified
+  in the  of a
+  SELECT statement; for example, an alias for
+  the table name can be specified.  Do not repeat the target table
+  as a from_item unless you intend a
+  self-join (in which case it must appear with an alias in the
+  from_item).
  
 

@@ -264,7 +265,7 @@ UPDATE count
   
When a FROM clause is present, what essentially happens
is that the target table is joined to the tables mentioned in the
-   from_list, and each output row of the join
+   from_item list, and each output row of the join
represents an update operation f

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Andres Freund
Hi,

On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > I think Andres was thinking this would maybe be an optimization
> > independent of is_insert_only (?)
>
> I wasn't sure.

I'm not sure myself - but I'm doubtful that using a 0 min age by default
will be ok.

I was trying to say (in a later email) that I think it might be a good
compromise to opportunistically freeze if we're dirtying the page
anyway, but not optimize WAL emission etc. That's a pretty simple
change, and it'd address a lot of the potential performance regressions,
while still freezing for the "first" vacuum in insert only workloads.


> Add "autovacuum_vacuum_insert_threshold" and
> "autovacuum_vacuum_insert_scale_factor" GUC and reloption.
> The default value for the threshold is 1000.
> The scale factor defaults to 0, which means that it is
> effectively disabled, but it offers some flexibility
> to tune the feature similar to other autovacuum knobs.

I don't think a default scale factor of 0 is going to be ok. For
large-ish tables this will basically cause permanent vacuums. And it'll
sometimes trigger for tables that actually coped well so far. 10 million
rows could be a few seconds, not more.

I don't think that the argument that otherwise a table might not get
vacuumed before autovacuum_freeze_max_age is convincing enough.

a) if that's indeed the argument, we should increase the default
  autovacuum_freeze_max_age - now that there's insert triggered vacuums,
  the main argument against that from before isn't valid anymore.

b) there's not really a good argument for vacuuming more often than
  autovacuum_freeze_max_age for such tables. It'll not be frequent
  enough to allow IOS for new data, and you're not preventing
  anti-wraparound vacuums from happening.

Greetings,

Andres Freund




Re: Auxiliary Processes and MyAuxProc

2020-03-17 Thread Alvaro Herrera
On 2020-Mar-17, Justin Pryzby wrote:

> +static PgSubprocess process_types[] = {
> + {
> + .desc = "checker",
> + .entrypoint = CheckerModeMain
> + },
> + {
> + .desc = "bootstrap",
> + .entrypoint = BootstrapModeMain
> + },

Maybe this stuff can be resolved using a technique like rmgrlist.h or
cmdtaglist.h.  That way it's not in two separate files.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Autovacuum on partitioned table

2020-03-17 Thread yuzuko
Hello,

> > > > +  */
> > > > + if (IsAutoVacuumWorkerProcess() &&
> > > > + rel->rd_rel->relispartition &&
> > > > + !(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
> > >
> > > I'm not sure I understand why we do this only on autovac. Why not all
> > > analyzes?
> >
> > +1.  If there is a reason, it should at least be documented in the
> > comment above.
> >
> When we analyze a partitioned table with the ANALYZE command,
> all inheritors, including the partitioned table itself, are analyzed
> at the same time.  In this case, if we call pgstat_report_partanalyze,
> the partitioned table's changes_since_analyze is updated
> according to the number of analyzed tuples in the partitions,
> as follows.  But I think it should be 0.
>
> \d+ p
>Partitioned table "public.p"
>  Column |  Type   | Collation | Nullable | Default | Storage | Stats
> target | Description
> +-+---+--+-+-+--+-
>  i  | integer |   |  | | plain   |  |
> Partition key: RANGE (i)
> Partitions: p_1 FOR VALUES FROM (0) TO (100),
>  p_2 FOR VALUES FROM (100) TO (200)
>
> insert into p select * from generate_series(0,199);
> INSERT 0 200
>
> (before analyze)
> -[ RECORD 1 ]---+--
> relname | p
> n_mod_since_analyze | 0
> -[ RECORD 2 ]---+--
> relname | p_1
> n_mod_since_analyze | 100
> -[ RECORD 3 ]---+--
> relname | p_2
> n_mod_since_analyze | 100
>
> (after analyze)
> -[ RECORD 1 ]---+--
> relname | p
> n_mod_since_analyze | 200
> -[ RECORD 2 ]---+--
> relname | p_1
> n_mod_since_analyze | 0
> -[ RECORD 3 ]---+--
> relname | p_2
> n_mod_since_analyze | 0
>
>
> I think that if we analyze the partition tree in order from the leaf
> partitions to the root table, this problem can be fixed.
> What do you think about it?
>

Attached is a new patch that fixes the above problem.  It also includes
modifications according to all the comments Alvaro and Amit made earlier in
this thread.

-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center


v6_autovacuum_on_partitioned_table.patch
Description: Binary data


Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-17 Thread Tom Lane
I wrote:
> The cfbot will be unhappy at this point, but I need to rebase the
> main patch again ...

And rebased.  Still not quite happy about some of the details in
enforce_generic_type_consistency, and I've not looked at the test
cases or documentation at all.  But this should make the cfbot
happy.

regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 410eaed..6407d3d 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4784,6 +4784,14 @@ SELECT * FROM pg_attribute

 

+anycompatible
+   
+
+   
+anycompatiblearray
+   
+
+   
 void

 
@@ -4889,6 +4897,34 @@ SELECT * FROM pg_attribute

 

+anycompatible
+Indicates that a function accepts any data type. Values
+are automatically converted to a common type.
+(see ).
+   
+
+   
+anycompatiblearray
+Indicates that a function accepts any array data type. The
+array elements are converted to a common element type.
+(see ).
+   
+
+   
+anycompatiblenonarray
+Indicates that a function accepts any non-array data type
+(see ).
+   
+
+   
+anycompatiblerange
+Indicates that a function accepts any range data type
+(see  and
+). The subtype can be used for
+deducing the common type.
+   
+
+   
 cstring
 Indicates that a function accepts or returns a null-terminated C string.

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index 97706da..931c23c 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -231,7 +231,7 @@
 
  Five pseudo-types of special interest are anyelement,
  anyarray, anynonarray, anyenum,
- and anyrange,
+ anyrange, anycompatible and anycompatiblearray,
  which are collectively called polymorphic types.
  Any function declared using these types is said to be
  a polymorphic function.  A polymorphic function can
@@ -270,6 +270,15 @@
 
 
 
+ A second family of polymorphic types consists of anycompatible and
+ anycompatiblearray. These types are similar to
+ anyelement and anyarray. Arguments declared
+ as anyelement require the same actual type for all passed values;
+ for anycompatible arguments, a common type is selected, and
+ all these arguments are cast to this common type.
+
+
+
  Thus, when more than one argument position is declared with a polymorphic
  type, the net effect is that only certain combinations of actual argument
  types are allowed.  For example, a function declared as
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 76fd938..22540bb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -404,10 +404,6 @@ ConstructTupleDescriptor(Relation heapRelation,
 		 */
 		keyType = amroutine->amkeytype;
 
-		/*
-		 * Code below is concerned to the opclasses which are not used with
-		 * the included columns.
-		 */
 		if (i < indexInfo->ii_NumIndexKeyAttrs)
 		{
 			tuple = SearchSysCache1(CLAOID, ObjectIdGetDatum(classObjectId[i]));
@@ -422,6 +418,10 @@ ConstructTupleDescriptor(Relation heapRelation,
 			 * If keytype is specified as ANYELEMENT, and opcintype is
 			 * ANYARRAY, then the attribute type must be an array (else it'd
 			 * not have matched this opclass); use its element type.
+			 *
+			 * We could also allow ANYCOMPATIBLE/ANYCOMPATIBLEARRAY here, but
+			 * there seems no need to do so; there's no reason to declare an
+			 * opclass as taking ANYCOMPATIBLEARRAY rather than ANYARRAY.
 			 */
 			if (keyType == ANYELEMENTOID && opclassTup->opcintype == ANYARRAYOID)
 			{
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 0cac936..6cdda35 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -262,6 +262,9 @@ ProcedureCreate(const char *procedureName,
 		case ANYARRAYOID:
 			variadicType = ANYELEMENTOID;
 			break;
+		case ANYCOMPATIBLEARRAYOID:
+			variadicType = ANYCOMPATIBLEOID;
+			break;
 		default:
 			variadicType = get_element_type(allParams[i]);
 			if (!OidIsValid(variadicType))
diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 5eac55a..694114a 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -320,6 +320,7 @@ interpret_function_parameter_list(ParseState *pstate,
 			switch (toid)
 			{
 case ANYARRAYOID:
+case ANYCOMPATIBLEARRAYOID:
 case ANYOID:
 	/* okay */
 	break;
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index c3fb51d..0a6c051 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -167,15 +167,17 @@ coerce_type(ParseState *pstate, Node *node,
 	}
 	if (targetTyp

Re: WIP: WAL prefetch (another approach)

2020-03-17 Thread Alvaro Herrera
On 2020-Mar-17, Thomas Munro wrote:

Hi Thomas

> On Sat, Mar 14, 2020 at 10:15 AM Alvaro Herrera
>  wrote:

> > I didn't manage to go over 0005 though, but I agree with Tomas that
> > having this be configurable in terms of bytes of WAL is not very
> > user-friendly.
> 
> The primary control is now maintenance_io_concurrency, which is
> basically what Tomas suggested.

> The byte-based control is just a cap to prevent it reading a crazy
> distance ahead, that also functions as the on/off switch for the
> feature.  In this version I've added "max" to the name, to make that
> clearer.

Mumble.  I guess I should wait to comment on this after reading 0005
more in depth.

> > First of all, let me join the crowd chanting that this is badly needed;
> > I don't need to repeat what Chittenden's talk showed.  "WAL recovery is
> > now 10x-20x times faster" would be a good item for pg13 press release,
> > I think.
> 
> We should be careful about over-promising here: Sean basically had a
> best case scenario for this type of techology, partly due to his 16kB
> filesystem blocks.  Common results may be a lot more pedestrian,
> though it could get more interesting if we figure out how to get rid
> of FPWs...

Well, in my mind it's an established fact that our WAL replay uses far
too little of the available I/O speed.  I guess if the system is
generating little WAL, then this change will show no benefit, but that's
not the kind of system that cares about this anyway -- for the others,
the parallelisation gains will be substantial, I'm sure.

> > > From a61b4e00c42ace5db1608e02165f89094bf86391 Mon Sep 17 00:00:00 2001
> > > From: Thomas Munro 
> > > Date: Tue, 3 Dec 2019 17:13:40 +1300
> > > Subject: [PATCH 1/5] Allow PrefetchBuffer() to be called with a 
> > > SMgrRelation.
> > >
> > > Previously a Relation was required, but it's annoying to have
> > > to create a "fake" one in recovery.

> While staring at this, I decided that SharedPrefetchBuffer() was a
> weird word order, so I changed it to PrefetchSharedBuffer().  Then, by
> analogy, I figured I should also change the pre-existing function
> LocalPrefetchBuffer() to PrefetchLocalBuffer().  Do you think this is
> an improvement?

Looks good.  I doubt you'll break anything by renaming that routine.

> > > From acbff1444d0acce71b0218ce083df03992af1581 Mon Sep 17 00:00:00 2001
> > > From: Thomas Munro 
> > > Date: Mon, 9 Dec 2019 17:10:17 +1300
> > > Subject: [PATCH 2/5] Rename GetWalRcvWriteRecPtr() to 
> > > GetWalRcvFlushRecPtr().
> > >
> > > The new name better reflects the fact that the value it returns
> > > is updated only when received data has been flushed to disk.
> > >
> > > An upcoming patch will make use of the latest data that was
> > > written without waiting for it to be flushed, so use more
> > > precise function names.
> >
> > Ugh.  (Not for your patch -- I mean for the existing naming convention).
> > It would make sense to rename WalRcvData->receivedUpto in this commit,
> > maybe to flushedUpto.
> 
> Ok, I renamed that variable and a related one.  There are more things
> you could rename if you pull on that thread some more, including
> pg_stat_wal_receiver's received_lsn column, but I didn't do that in
> this patch.

+1 for that approach.  Maybe we'll want to rename the SQL-visible name,
but I wouldn't burden this patch with that, lest we lose the entire
series to that :-)

> > > + pg_atomic_uint64 writtenUpto;
> >
> > Are we already using uint64s for XLogRecPtrs anywhere?  This seems
> > novel.  Given this, I wonder if the comment near "mutex" needs an
> > update ("except where atomics are used"), or perhaps just move the
> > member to after the line with mutex.
> 
> Moved.

LGTM.

> We use [u]int64 in various places in the replication code.  Ideally
> I'd have a magic way to say atomic so I didn't have to
> assume that pg_atomic_uint64 is the right atomic integer width and
> signedness, but here we are.  In dsa.h I made a special typedef for
> the atomic version of something else, but that's because the size of
> that thing varied depending on the build, whereas our LSNs are of a
> fixed width that ought to be en... .

Let's rewrite Postgres in Rust ...

> > I didn't understand the purpose of inc_counter() as written.  Why not
> > just pg_atomic_fetch_add_u64(..., 1)?
> 
> I didn't want counters that wrap at ~4 billion, but I did want to be
> able to read and write concurrently without tearing.  Instructions
> like "lock xadd" would provide more guarantees that I don't need,
> since only one thread is doing all the writing and there's no ordering
> requirement.  It's basically just counter++, but some platforms need a
> spinlock to perform atomic read and write of 64 bit wide numbers, so
> more hoop jumping is required.

Ah, I see, you don't want lock xadd ... That's non-obvious.  I suppose
the function could use more commentary on *why* you're doing it that way
then.

> > >  /*
> > >   *   smgrprefetch() -- Initiate asynchronous read of the s

Re: Autovacuum on partitioned table

2020-03-17 Thread Alvaro Herrera
On 2020-Mar-18, yuzuko wrote:

> > I think if we analyze partition tree in order from leaf partitions
> > to root table, this problem can be fixed.
> > What do you think about it?
> 
> Attach the new patch fixes the above problem.

Thanks for the new version.

I'm confused about some error messages in the regression test when a
column is mentioned twice, that changed from mentioning the table named
in the vacuum command, to mentioning the first partition.  Is that
because you changed an lappend() to lcons()?  I think you do this so
that the counters accumulate for the topmost parent that will be
processed at the end.  I'm not sure I like that too much ... I think
that needs more thought.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-17 Thread David Rowley
On Mon, 16 Mar 2020 at 06:01, Andy Fan  wrote:
>
> Hi All:
>
> I have re-implemented the patch based on David's suggestion/code,  Looks it
> works well.   The updated patch mainly includes:
>
> 1. Maintain the not_null_colno in RelOptInfo, which includes the not null from
> catalog and the not null from vars.

What about non-nullability that we can derive from means other than
NOT NULL constraints? Where will you track that now that you've
removed the UniqueKeySet type?

Traditionally we use attno or attnum rather than colno for variable
names containing attribute numbers.

> 3. postpone the propagate_unique_keys_to_joinrel call to 
> populate_joinrel_with_paths
>   since we know jointype at that time. so we can handle the semi/anti join 
> specially.

ok, but the join type was known already where I was calling the
function from. It just wasn't passed to the function.

> 4. Add the rule I suggested above: if both of the 2 relations yield a
> unique result, the join result will be unique as well. The UK can be
> ( (rel1_uk1, rel1_uk2) .. )

I see. So basically you're saying that the joinrel's uniquekeys should
be the cartesian product of the unique rels from either side of the
join.  I wonder if that's a special case we need to worry about too
much. Surely it only applies for clauseless joins.

> 5. If it is impossible for the unique key to be referenced by others, we
> can safely ignore it in order to keep the (join)rel->uniquekeys list
> short.

You could probably have an equivalent of has_useful_pathkeys() and
pathkeys_useful_for_ordering()

> 6. I only consider the not null check/opfamily check for the uniquekey which 
> comes
>from UniqueIndex.  I think that should be correct.
> 7. I defined each uniquekey as List of Expr,  so I didn't introduce new node 
> type.

Where will you store the collation Oid? I left comments to mention
that needed to be checked but just didn't wire it up.




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread James Coleman
On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > I think Andres was thinking this would maybe be an optimization 
> > > independent of
> > > is_insert_only (?)
> >
> > I wasn't sure.
>
> I'm not sure myself - but I'm doubtful that using a 0 min age by default
> will be ok.
>
> I was trying to say (in a later email) that I think it might be a good
> compromise to opportunistically freeze if we're dirtying the page
> anyway, but not optimize WAL emission etc. That's a pretty simple
> change, and it'd address a lot of the potential performance regressions,
> while still freezing for the "first" vacuum in insert only workloads.

If we have truly insert-only tables, then doesn't vacuuming with
freezing every tuple actually decrease total vacuum cost (perhaps
significantly) since otherwise every vacuum keeps having to scan the
heap for dead tuples on pages where we know there are none? Those
pages could conceptually be frozen and ignored, but are not frozen
because of the default behavior, correct?

We have tables that log each change to a business object (as I suspect
many transactional workloads do), and I've often thought that
immediately freezing every page as soon as it fills up would be a real
win for us.

If that's all true, it seems to me that removing that part of the
patch significantly lowers its value.

If we opportunistically freeze only if we're already dirtying a page,
would that help a truly insert-only workload? E.g., are there hint
bits on the page that would need to change the first time we vacuum a
full page with no dead tuples? I would have assumed the answer was
"no" (since if so I think it would follow that _all_ pages need to be
updated the first time they're vacuumed?). But if that's the case,
then this kind of opportunistic freezing wouldn't help this kind of
workload. Maybe there's something I'm misunderstanding about how
vacuum works though.

Thanks,
James




Re: Collation versioning

2020-03-17 Thread Michael Paquier
On Tue, Mar 17, 2020 at 06:43:51PM +0100, Julien Rouhaud wrote:
> On Tue, Mar 17, 2020 at 05:31:47PM +0100, Christoph Berg wrote:
>> Not sure if that's the case there, but reg* typecasts are very handy
>> when used interactively in ad-hoc queries.
> 
> +1.  I'm obviously biased, but I find it quite useful when querying pg_depend,
> which may become more frequent once we start generating warnings about 
> possibly
> corrupted indexes.

That means fewer joins for lookup queries, and collations can be
schema-qualified, so I would be in favor of adding it rather than not.
Now, it is true as well that ::regcollation is not a mandatory
requirement for the feature discussed on this thread.
--
Michael


signature.asc
Description: PGP signature


Re: Standby got fatal after the crash recovery

2020-03-17 Thread Ian Barwick

On 2020/03/17 12:53, Thunder wrote:

Sorry.
We are using pg11, and cloned from tag REL_11_BETA2.


In that case you should upgrade to the current version
in the PostgreSQL 11 series (at the time of writing 11.7).


Regards

Ian Barwick

--
Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Standby got fatal after the crash recovery

2020-03-17 Thread Michael Paquier
On Wed, Mar 18, 2020 at 11:19:22AM +0900, Ian Barwick wrote:
> On 2020/03/17 12:53, Thunder wrote:
> > Sorry.
> > We are using pg11, and cloned from tag REL_11_BETA2.
> 
> In that case you should upgrade to the current version
> in the PostgreSQL 11 series (at the time of writing 11.7).

Definitely.  The closest things I can see in this area are 9a1bd8 and
c34f80 which had symptoms similar to what you have mentioned here with
btree_xlog_delete(), but that's past history.
--
Michael


signature.asc
Description: PGP signature


Re: A bug when use get_bit() function for a long bytea string

2020-03-17 Thread movead...@highgo.ca

Hello thanks for the detailed review,

>I think the second argument indicates the bit position, which would be max 
>bytea length * 8. If max bytea length covers whole int32, the second argument 
>needs to be wider, i.e. int64.
Yes, it makes sense; followed.

> Some more comments on the patch
> struct pg_encoding
> {
>- unsigned (*encode_len) (const char *data, unsigned dlen);
>+ int64 (*encode_len) (const char *data, unsigned dlen);
>  unsigned (*decode_len) (const char *data, unsigned dlen);
>  unsigned (*encode) (const char *data, unsigned dlen, char *res);
>  unsigned (*decode) (const char *data, unsigned dlen, char *res);
> Why not use return type of int64 for rest of the functions here as well?
>  res = enc->encode(VARDATA_ANY(data), datalen, VARDATA(result));
>  /* Make this FATAL 'cause we've trodden on memory ... */
>- if (res > resultlen)
>+ if ((int64)res > resultlen)
>
>if we change return type of all those functions to int64, we won't need this 
>cast.
I changed the 'encode' function, as it needs an int64 return type, but kept
the other functions in 'pg_encoding' unchanged, because I see no necessity
for changing them.

>Right now we are using int64 because bytea can be 1GB, but what if we increase
>that limit tomorrow, will int64 be sufficient? That may be unlikely in the near
>future, but nevertheless a possibility. Should we then define a new datatype
>which resolves to int64 right now but really depends upon the bytea length. I
>am not suggesting that we have to do it right now, but may be something to
>think about.
I decided to use int64 because, if we really want to increase the limit, the
same change would be needed for 'text' and 'varchar', which have the same
limit, so it would need a more general approach anyway.  Hence I gave up on
that idea.

> hex_enc_len(const char *src, unsigned srclen)
> {
>- return srclen << 1;
>+ return (int64)(srclen << 1);
>
>why not to convert srclen also to int64. That will also change the pg_encoding
>member definitions. But if encoded length requires int64 to fit the possibly
>values, same would be true for the source lengths. Why can't the source length
>also be int64?
>If still we want the cast, I think it should be ((int64)srclen << 1) rather
>than casting the result.
I prefer the '((int64)srclen << 1)' way.

>  /* 3 bytes will be converted to 4, linefeed after 76 chars */
>- return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
>+ return (int64)((srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4));
>similar comments as above.
 Followed.


>It might help to add a test where we could pass the second argument something
>greater than 1G. But it may be difficult to write such a test case.
Added two test cases.




Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


long_bytea_string_bug_fix_ver3.patch
Description: Binary data


Re: ALTER tbl rewrite loses CLUSTER ON index

2020-03-17 Thread Michael Paquier
On Tue, Mar 17, 2020 at 11:20:44AM -0500, Justin Pryzby wrote:
> On Tue, Mar 17, 2020 at 02:33:32PM +0900, Michael Paquier wrote:
>> Patch 0002 from Justin does that, I would keep this refactoring as
>> HEAD-only material though, and I don't spot any other code paths in
>> need of patching.
>> 
>> The commit message of patch 0001 is not what you wanted I guess.
> 
> That's what git-am does, and I didn't find any option to make it less
> unreadable.  I guess I should just delete the email body it inserts.

Strange...

Anyway, Tom, Alvaro, are you planning to look at what is proposed on
this thread?  I don't want to step on your toes if that's the case and
it seems to me that the approach taken by the patch is sound, using as
basic fix the addition of an AT_ClusterOn sub-command to the list of
commands to execute when rebuilding the table, ensuring that any
follow-up CLUSTER command will use the correct index.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-17 Thread Andy Fan
Hi David:

Thanks for your time.

On Wed, Mar 18, 2020 at 9:56 AM David Rowley  wrote:

> On Mon, 16 Mar 2020 at 06:01, Andy Fan  wrote:
> >
> > Hi All:
> >
> > I have re-implemented the patch based on David's suggestion/code,  Looks
> it
> > works well.   The updated patch mainly includes:
> >
> > 1. Maintain the not_null_colno in RelOptInfo, which includes the not
> null from
> > catalog and the not null from vars.
>
> What about non-nullability that we can derive from means other than
> NOT NULL constraints. Where will you track that now that you've
> removed the UniqueKeySet type?
>

I tracked it in 'deconstruct_recurse',  just before
the distribute_qual_to_rels call.

+   ListCell   *lc;
+
+   foreach(lc, find_nonnullable_vars(qual))
+   {
+       Var        *var = lfirst_node(Var, lc);
+       RelOptInfo *rel = root->simple_rel_array[var->varno];
+
+       if (var->varattno > InvalidAttrNumber)
+           rel->not_null_cols = bms_add_member(rel->not_null_cols,
+                                               var->varattno);
+   }


> Traditionally we use attno or attnum rather than colno for variable
> names containing attribute numbers
>

Currently I use a list of Var for a UniqueKey; I guess that is OK?


>
> > 3. postpone the propagate_unique_keys_to_joinrel call to
> populate_joinrel_with_paths
> >   since we know jointype at that time. so we can handle the semi/anti
> join specially.
>
> ok, but the join type was known already where I was calling the
> function from. It just wasn't passed to the function.
>
> > 4. Add the rule I suggested above,  if both of the 2 relation yields the
> a unique result,
> >   the join result will be unique as well. the UK can be  ( (rel1_uk1,
> rel1_uk2).. )
>
> I see. So basically you're saying that the joinrel's uniquekeys should
> be the cartesian product of the unique rels from either side of the
> join.  I wonder if that's a special case we need to worry about too
> much. Surely it only applies for clauseless joins


There are some other cases where we may need this as well :), like select
m1.pk, m2.pk from m1, m2 where m1.b = m2.b;

The cartesian product of the unique keys would make the uniquekeys list too
long, so I maintain a UniqueKeyContext to keep it short.  The idea is that
if (UK1) is already unique, there is no need to add another UK (UK1, UK2),
which is just a superset of it.


>
>
> 5. If the unique key is impossible to be referenced by others,  we can
> safely ignore
> > it in order  to keep the (join)rel->unqiuekeys short.
>
> You could probably have an equivalent of has_useful_pathkeys() and
> pathkeys_useful_for_ordering()
>
>
Thanks for suggestion,  I will do so  in the v5-xx.patch.


> > 6. I only consider the not null check/opfamily check for the uniquekey
> which comes
> >from UniqueIndex.  I think that should be correct.
> > 7. I defined each uniquekey as List of Expr,  so I didn't introduce new
> node type.
>
> Where will you store the collation Oid? I left comments to mention
> that needed to be checked but just didn't wire it up.
>

This is embarrassing; I am not sure whether it is safe to ignore it.  I
removed it for the following reasons (sorry that I didn't explain them
carefully in the last email).

1.  When we check whether a UK is usable, we have the chance to compare the
collation info of a restrictinfo (where uk = 1) or a target list (select uk
from t) with the index's collation; the targetlist case is more troublesome
since we need to figure out its default collation.  However,
relation_has_unique_index_for is in the same situation as us, and it ignores
collations as well; see the comment /* XXX at some point we may need to
check collations here too. */  So I think there must be some reason we can
ignore them.

2. What we expect from a UK is:
a).  WHERE m1.uniquekey = m2.b:  m2 rows will not be duplicated by this
join clause.  Here, if m1.uk has a different collation, it will raise a
run-time error.
b).  WHERE m1.uniquekey COLLATE '...' = m2.b:  we may not be able to rely
on the run-time error this time.  But if we are sure that *if uk is unique
at the default collation, then (uk COLLATE 'other-collation') is unique as
well*, we may safely ignore it too.
c).  SELECT uniquekey FROM t / SELECT uniquekey COLLATE '...' FROM t:  this
has the same requirement as item b).

3).  Maintaining the collation for our case looks like a big effort
overall, and users rarely use it.  If my expectation for 2.b is not true, I
would prefer to detect such a case (the user using a different collation)
and just ignore the UK then.

But after all, I think this should remain an open question for now.

---
At last, I am so grateful for your suggestions/feedback; they are really
heuristic and constructive.  And thanks also to Tom for the quick review
and for suggesting adding new fields to RelOptInfo; without that I don't
think I could have added a new field to such an important struct.
And als

Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING

2020-03-17 Thread Tom Lane
Bruce Momjian  writes:
> I have implemented the ideas above in the attached patch.  I have
> synchronized the syntax to match SELECT, and synchronized the paragraphs
> describing the item.

I think that the DELETE synopsis should look like

[ USING from_item [, ...] ]

so that there's not any question which part of the SELECT syntax we're
talking about.  I also think that the running text in both cases should
say in exactly these words "from_item means the same thing as it does
in SELECT"; the wording you propose still seems to be dancing around
the point, leaving readers perhaps not quite sure about what is meant.

In the DELETE case you could alternatively say "using_item means the same
thing as from_item does in SELECT", but that doesn't really seem like an
improvement to me.

regards, tom lane




Re: adding partitioned tables to publications

2020-03-17 Thread Amit Langote
Hi Peter,

On Mon, Mar 16, 2020 at 9:49 PM Peter Eisentraut
 wrote:
>
> I was trying to extract some preparatory work from the remaining patches
> and came up with the attached.  This is part of your patch 0003, but
> also relevant for part 0004.  The problem was that COPY (SELECT *) is
> not sufficient when the table has generated columns, so we need to build
> the column list explicitly.
>
> Thoughts?

Thank you for that.

+   if (isnull || !remote_is_publishable)
+   ereport(ERROR,
+   (errmsg("table \"%s.%s\" on the publisher is not publishable",
+   nspname, relname)));

Maybe add a one-line comment above this to say it's a "not supposed
to happen" error, or am I missing something?  Wouldn't elog() suffice
for this?

Other than that, looks good.

--
Thank you,
Amit




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-17 Thread Amit Kapila
On Tue, Mar 17, 2020 at 6:32 PM Dilip Kumar  wrote:
>
> Your changes look fine to me.  I have also verified all the test and
> everything works fine.
>

I have pushed the first patch.  I will push the others in coming days.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Missing errcode() in ereport

2020-03-17 Thread Amit Kapila
On Tue, Mar 17, 2020 at 7:39 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Tue, Mar 17, 2020 at 3:28 PM Michael Paquier  wrote:
> >> Definitely an oversight.  All stable branches down to 9.5 have
> >> mistakes in the same area, with nothing extra by grepping around.
> >> Amit, I guess that you will take care of it?
>
> > Yes, I will unless I see any objections in a day or so.
>
> No need to wait, it's a pretty obvious thinko.
>

Okay, I will push in some time.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote:
> On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> >
> > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > I think Andres was thinking this would maybe be an optimization 
> > > > independent of
> > > > is_insert_only (?)
> > >
> > > I wasn't sure.
> >
> > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > will be ok.
> >
> > I was trying to say (in a later email) that I think it might be a good
> > compromise to opportunistically freeze if we're dirtying the page
> > anyway, but not optimize WAL emission etc. That's a pretty simple
> > change, and it'd address a lot of the potential performance regressions,
> > while still freezing for the "first" vacuum in insert only workloads.
> 
> If we have truly insert-only tables, then doesn't vacuuming with
> freezing every tuple actually decrease total vacuum cost (perhaps
> significantly) since otherwise every vacuum keeps having to scan the
> heap for dead tuples on pages where we know there are none? Those
> pages could conceptually be frozen and ignored, but are not frozen
> because of the default behavior, correct?

The essential part of this patch is to trigger vacuum *at all* on an
insert-only table.  Before today's updated patch, it also used FREEZE on any
table which hit the new insert threshold.  The concern I raised is for
insert-MOSTLY tables.  I thought it might be an issue if repeatedly freezing
updated tuples caused vacuum to be too slow, especially if they're distributed
in pages all across the table rather than clustered.

And I asked that the behavior (FREEZE) be configurable by a separate setting
than the one that triggers autovacuum to run.  FREEZE is already controlled by
the vacuum_freeze_table_age param.

I think you're right that VACUUM FREEZE on an insert-only table would be less
expensive than vacuum once without freeze and vacuum again later, which uses
freeze.  To me, that suggests setting vacuum_freeze_table_age to a low value on
those tables.

Regular vacuum avoids scanning all-visible pages, so for an insert-only table
pages should only be vacuumed once (if frozen the 1st time) or twice (if not).

 * Except when aggressive is set, we want to skip pages that are
 * all-visible according to the visibility map, but only when we can 
skip

postgres=# CREATE TABLE t (i int); INSERT INTO t SELECT generate_series(1,99); VACUUM VERBOSE t; VACUUM VERBOSE t;
...
INFO:  "t": found 0 removable, 99 nonremovable row versions in 4425 out of 4425 pages
...
VACUUM
Time: 106.038 ms
INFO:  "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 pages
VACUUM
Time: 1.828 ms

=> That's a not-very-clear way of saying that it only scanned 1 page the
2nd time around.

> We have tables that log each change to a business object (as I suspect
> many transactional workloads do), and I've often thought that
> immediately freeze every page as soon as it fills up would be a real
> win for us.
> 
> If that's all true, it seems to me that removing that part of the
> patch significantly lowers its value.

> If we opportunistically freeze only if we're already dirtying a page,
> would that help a truly insert-only workload? E.g., are there hint
> bits on the page that would need to change the first time we vacuum a
> full page with no dead tuples? I would have assumed the answer was
> "no" (since if so I think it would follow that _all_ pages need
> updated the first time they're vacuumed?).

You probably know that hint bits are written by the first process to access the
tuple after it was written.  I think you're asking if the first *vacuum*
requires additional writes beyond that.  And I think vacuum wouldn't touch the
page until it decides to freeze tuples.

I do have a patch to display the number of hint bits written and pages frozen.
https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com

> But if that's the case, then this kind of opportunistic freezing wouldn't
> help this kind of workload. Maybe there's something I'm misunderstanding
> about how vacuum works though.

I am reminding myself about vacuum with increasing frequency and usually still
learn something new.

-- 
Justin




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-17 Thread David Rowley
On Wed, 18 Mar 2020 at 15:57, Andy Fan  wrote:
> I'm now writing the code for partition index stuff, which
> is a bit of boring, since every partition may have different unique index.

Why is that case so different?

For a partitioned table to have a valid unique index, a unique index
must exist on each partition having columns that are a superset of the
partition key columns. An IndexOptInfo will exist on the partitioned
table's RelOptInfo, in this case.

At the leaf partition level, wouldn't you just add the uniquekeys the
same as we do for base rels? Maybe only do it if
enable_partitionwise_aggregation is on. Otherwise, I don't think we'll
currently have a need for them.  Currently, we don't do unique joins
for partition-wise joins. Perhaps uniquekeys will be a good way to fix
that omission in the future.




Re: Online checksums verification in the backend

2020-03-17 Thread Michael Paquier
On Mon, Mar 16, 2020 at 09:21:22AM +0100, Julien Rouhaud wrote:
> On Mon, Mar 16, 2020 at 12:29:28PM +0900, Michael Paquier wrote:
>> With a large amount of
>> shared buffer eviction you actually increase the risk of torn page
>> reads.  Instead of a logic relying on partition mapping locks, which
>> could be unwise on performance grounds, did you consider different
>> approaches?  For example a kind of pre-emptive lock on the page in
>> storage to prevent any shared buffer operation to happen while the
>> block is read from storage, that would act like a barrier.
> 
> Even with a workload having a large shared_buffers eviction pattern, I don't
> think that there's a high probability of hitting a torn page.  Unless I'm
> mistaken it can only happen if all those steps happen concurrently to doing 
> the
> block read just after releasing the LWLock:
> 
> - postgres read the same block in shared_buffers (including all the locking)
> - dirties it
> - writes part of the page
> 
> It's certainly possible, but it seems so unlikely that the optimistic 
> lock-less
> approach seems like a very good tradeoff.

Having false reports in this area could be very confusing for the
user.  That's for example possible now with checksum verification and
base backups.

>> I guess that this leads to the fact that this function may be better as
>> a contrib module, with the addition of some better-suited APIs in core
>> (see paragraph above).
> 
> Below?

Above.  This thought more precisely:
>> For example a kind of pre-emptive lock on the page in
>> storage to prevent any shared buffer operation to happen while the
>> block is read from storage, that would act like a barrier.

> For the record when I first tested that feature I did try to check dirty
> blocks, and it seemed that dirty blocks of shared relations were sometimes
> wrongly reported as corrupted.  I didn't try to investigate more though.

Hmm.  It would be good to look at that, correct verification of shared
relations matter.

>> + * - if a block is dirty in shared_buffers, it's ignored as it'll be 
>> flushed to
>> + *   disk either before the end of the next checkpoint or during recovery in
>> + *   case of unsafe shutdown
>> Not sure that the indentation is going to react well on that part of
>> the patch, perhaps it would be better to add some "/*---" at the
>> beginning and end of the comment block to tell pgindent to ignore this
>> part?
> 
> Ok.  Although I think only the beginning comment is needed?

From src/tools/pgindent/README:
"pgindent will reflow any comment block that's not at the left margin.
If this messes up manual formatting that ought to be preserved,
protect the comment block with some dashes:"

/*--
 * Text here will not be touched by pgindent.
 *--
 */

>> Based on the feedback gathered on this thread, I guess that you should
>> have a SRF returning the list of broken blocks, as well as NOTICE
>> messages.
> 
> The current patch has an SRF and a WARNING message, do you want an additional
> NOTICE message or downgrade the existing one?

Right, not sure which one is better, for zero_damaged_pages a WARNING
is used.
--
Michael


signature.asc
Description: PGP signature


RE: Multivariate MCV list vs. statistics target

2020-03-17 Thread Shinoda, Noriyoshi (PN Japan A&PS Delivery)
Hello,

I found a missing column description in the pg_statistic_ext catalog
documentation for this new feature.
The attached patch adds a description of the 'stxstattarget' column to the
pg_statistic_ext catalog's documentation.
If there is a better explanation, please correct it.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] 
Sent: Wednesday, September 11, 2019 7:28 AM
To: Alvaro Herrera 
Cc: Thomas Munro ; Jamison, Kirk 
; Dean Rasheed ; PostgreSQL 
Hackers 
Subject: Re: Multivariate MCV list vs. statistics target

On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote:
>On 2019-Aug-01, Tomas Vondra wrote:
>
>> I'll move it to the next CF. Aside from the issues pointed by 
>> Kyotaro-san in his review, I still haven't made my mind about whether 
>> to base the use statistics targets set for the attributes. That's 
>> what we're doing now, but I'm not sure it's a good idea after adding 
>> separate statistics target.
>> I wonder what Dean's opinion on this is, as he added the current logic.
>
>Latest patch no longer applies.  Please update.  And since you already 
>seem to have handled all review comments since it was Ready for 
>Committer, and you now know Dean's opinion on the remaining question, 
>please get it pushed.
>

OK, I've pushed the patch, after some minor polishing.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





pg_statistic_ext_doc_v1.diff
Description: pg_statistic_ext_doc_v1.diff


Re: Adding missing object access hook invocations

2020-03-17 Thread Michael Paquier
On Tue, Mar 17, 2020 at 12:39:35PM -0700, Mark Dilger wrote:
> I agree that this does not need to be back-patched.  I was debating
> whether it constitutes a bug for the purpose of putting the fix into
> v13 vs. punting the patch forward to the v14 cycle.  I don't have a
> strong opinion on that.

I don't see any strong argument against fixing this stuff in v13,
FWIW.
--
Michael




Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-17 Thread Andy Fan
Hi David:

On Wed, Mar 18, 2020 at 12:13 PM David Rowley  wrote:

> On Wed, 18 Mar 2020 at 15:57, Andy Fan  wrote:
> > I'm now writing the code for the partition index stuff, which
> > is a bit boring, since every partition may have a different unique
> > index.
>
> Why is that case so different?
>
> For a partitioned table to have a valid unique index, a unique index
> must exist on each partition having columns that are a superset of the
> partition key columns. An IndexOptInfo will exist on the partitioned
> table's RelOptInfo, in this case.
>
The main differences are:

1.  We can create a unique index on only some of the partitions.

create table q100 (a int, b int, c int) partition by range (b);
create table q100_1 partition of q100 for values from (1) to (10);
create table q100_2 partition of q100 for values from (11) to (20);
create unique index q100_1_c on q100_1(c);  -- user may create this index on q100_1 only

2.  The unique index may not contain the partition key, as above.

In the above case, even if the same index existed on all the partitions, we
still couldn't use it, since it is unique only within each local partition.

3.  So a unique index on a partitioned table can be used only if it
contains the partition key AND exists on all the partitions.

4.  When we apply uniquekey_is_useful_for_rel, I compare the information
between ind->indextlist and rel->reltarget, but the indextlist has the
wrong varno, so we have to rewrite the varno in the child rel's indextlist
with ChangeVarNodes, since the varno there refers to the child rel.

5.  When we detect the uk = 1 case, the uk is also expressed with
parentrel->relid information, so we may need to apply ChangeVarNodes to
childrel->indexinfo->indextlist as well.

Even though the rules look long, the run time should be very short, since
we usually don't have many unique indexes on a partitioned table.
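
For contrast, a sketch of an index that would satisfy rule 3 (a
hypothetical table with the same shape as q100 above):

```sql
create table q200 (a int, b int, c int) partition by range (b);
create table q200_1 partition of q200 for values from (1) to (10);
create table q200_2 partition of q200 for values from (11) to (20);
-- created on the partitioned table itself, so it contains the
-- partition key and cascades to every partition:
create unique index q200_b_c on q200 (b, c);
```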


> At the leaf partition level, wouldn't you just add the uniquekeys the
> same as we do for base rels?


Yes, but since the uk of a child rel may not be useful for the parent rel
(the uk might exist on only one partition), I think we can bypass this in
the child rel case?


Best Regards
Andy Fan


Re: Missing errcode() in ereport

2020-03-17 Thread Amit Kapila
On Wed, Mar 18, 2020 at 9:01 AM Amit Kapila  wrote:
>
> On Tue, Mar 17, 2020 at 7:39 PM Tom Lane  wrote:
> >
> > Amit Kapila  writes:
> > > On Tue, Mar 17, 2020 at 3:28 PM Michael Paquier  
> > > wrote:
> > >> Definitely an oversight.  All stable branches down to 9.5 have
> > >> mistakes in the same area, with nothing extra by grepping around.
> > >> Amit, I guess that you will take care of it?
> >
> > > Yes, I will unless I see any objections in a day or so.
> >
> > No need to wait, it's a pretty obvious thinko.
> >
>
> Okay, I will push in some time.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: control max length of parameter values logged

2020-03-17 Thread keisuke kuroda
Hi,

BIND parameter truncation looks good to me.
Logs can be very hard to read when very long
parameters are recorded.

Currently, parameters are truncated keeping the left part.
e.g. "----" to "--..."

Wouldn't truncation keeping the right part also be useful?
e.g. "----" to "...--"

It is sometimes nice to be able to check
the end of a string, for example when many
parameters differ only in their last
characters.
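
To illustrate the suggestion, a minimal sketch of right-anchored
truncation (a hypothetical helper for illustration only, not code from
the patch, which truncates from the left):

```c
#include <stdio.h>
#include <string.h>

/*
 * Keep only the last "maxlen" characters of "val", prefixing "..." to
 * show that the left-hand side was dropped.  Hypothetical illustration
 * of the suggestion above.
 */
void
truncate_keep_right(const char *val, size_t maxlen, char *dst, size_t dstlen)
{
	size_t		len = strlen(val);

	if (len <= maxlen)
		snprintf(dst, dstlen, "%s", val);
	else
		snprintf(dst, dstlen, "...%s", val + (len - maxlen));
}
```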

Best Regards,
Keisuke Kuroda




GSoC applicant proposal, Uday PB

2020-03-17 Thread p.b uday
Hi PostgreSQL team,
I am looking forward to participating in the GSoC with PostgreSQL this
summer. Below is my draft proposal for your review. Any feedback would be
greatly appreciated.

*PL/Java online documentation improvements*



*Project goal:*

The goal of the project is to improve the PL/JAVA online documentation
website in terms of appearance, usability, information showcase and
functionality.



*Project deliverables:*

1) A PL/JAVA documentation website having four major categories (Usage,
release information, modules and project documentation).

2) Auto-generated documentation pages for the above-mentioned sections.

3) A page showing the documentation tree for more than one release of
PL/JAVA.

4) A custom package which uses maven-site-plugin to generate new
webpages whenever required.

5) A style (CSS) framework class that has all the PostgreSQL style
configurations ready to use.

6) An additional template for an about-us page or any additional
information required, based on a Velocity template.



*Project implementation plan:*

The project can be divided into three phases as follows:

1) *CSS framework class and Velocity templates:*

Analyze the existing PostgreSQL documentation website and identify all the
style including typography, color, theme, headers and footers. Create a CSS
class file having the color scheme, footer and header styles, font styles
and text sizes. Use existing velocity engine code to accommodate new
release documentation.



2) *Maven-site-plugin package with custom skins:*

Since the PL/Java documentation should look like the original PostgreSQL
documentation, create a template file and a new skin using the
maven-site-plugin from apache. Document the template file configurations
and skin styles. Create a multi-module site menu (
https://maven.apache.org/plugins/maven-site-plugin/examples/multimodule.html)
that would accommodate multiple releases of PL/Java.  The sub-files will
inherit the site URL from parent POM. The new pages containing the newer
release information would inherit data from JavaDoc using Velocity.



3) *Documentation and Testing:*

Document the new documentation tree menu developed using maven-site-plugin
and the skin. Create documentation for the CSS style class and velocity
plugin changes.




*Timeline:*

This week-by-week timeline provides a rough guideline of how the project
will be done.



May 4 – May 16

Familiarize with website and velocity. Start with simple CSS framework
class to clone the original PostgreSQL documentation website styles and
themes.



May 17 – May 30

Start writing the CSS framework class and a new routine to import new
release data using maven-site-plugin and velocity. Maintain a configuration
file to track the progress.





May 31 – June 5

Make minor changes to the existing PL/JAVA documentation website template
to visualize the improvements. Start implementing multi-module menu changes
in the website. Add new skin using maven-site-plugin skins (which will use
styles from the CSS framework class).



June 10 – June 25

Create a dummy data file to test the documentation tree's multi-module menu.
This should include data for at least two dummy releases to test the page
aesthetics and user experience. Conduct a feedback survey to analyze the new
look of the website and the user experience.



July 1 – July 10

Improve minor typography issues and color schemes across the website and
its pages. Document the test routines and the website code base. Also
document the skin and style configuration used.







*About me :*

I am a computer science graduate student at Arizona State University
pursuing projects in Node.JS, MySQL and react-native. I am part of the
Google developers' club here, where I volunteer to contribute to several of
their projects, including some applications that help students find the
right professor using HTML, CSS, Velocity and GitHub pages.  I have
combined experience of 5 years developing and deploying websites using
several frameworks including Vue.JS, React.JS and site generators. I have
used PostgreSQL in some of my projects that involve analysis and querying
of geo-spatial data and hence I would like to contribute to this project.









Re: WIP: WAL prefetch (another approach)

2020-03-17 Thread Thomas Munro
On Wed, Mar 18, 2020 at 2:47 PM Alvaro Herrera  wrote:
> On 2020-Mar-17, Thomas Munro wrote:
> > I didn't want counters that wrap at ~4 billion, but I did want to be
> > able to read and write concurrently without tearing.  Instructions
> > like "lock xadd" would provide more guarantees that I don't need,
> > since only one thread is doing all the writing and there's no ordering
> > requirement.  It's basically just counter++, but some platforms need a
> > spinlock to perform atomic read and write of 64 bit wide numbers, so
> > more hoop jumping is required.
>
> Ah, I see, you don't want lock xadd ... That's non-obvious.  I suppose
> the function could use more commentary on *why* you're doing it that way
> then.

I updated the comment:

+/*
+ * On modern systems this is really just *counter++.  On some older systems
+ * there might be more to it, due to inability to read and write 64 bit values
+ * atomically.  The counters will only be written to by one process, and there
+ * is no ordering requirement, so there's no point in using higher overhead
+ * pg_atomic_fetch_add_u64().
+ */
+static inline void inc_counter(pg_atomic_uint64 *counter)
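
For what it's worth, the single-writer pattern that comment describes can
be sketched in portable C11, using stdatomic as a stand-in for the
pg_atomic_* API (an illustration, not the patch code):

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Unsynchronized counter bump: a relaxed atomic load plus a relaxed
 * atomic store, with no locked read-modify-write.  This is only safe
 * because a single thread ever writes the counter; readers just need
 * a tear-free 64-bit load.
 */
static inline void
inc_counter(_Atomic uint64_t *counter)
{
	uint64_t	v = atomic_load_explicit(counter, memory_order_relaxed);

	atomic_store_explicit(counter, v + 1, memory_order_relaxed);
}
```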

> > > Umm, I would keep the return values of both these functions in sync.
> > > It's really strange that PrefetchBuffer does not return
> > > PrefetchBufferResult, don't you think?
> >
> > Agreed, and changed.  I suspect that other users of the main
> > PrefetchBuffer() call will eventually want that, to do a better job of
> > keeping the request queue full, for example bitmap heap scan and
> > (hypothetical) btree scan with prefetch.
>
> LGTM.

Here's a new version that changes that part just a bit more, after a
brief chat with Andres about his async I/O plans.  It seems clear that
returning an enum isn't very extensible, so I decided to try making
PrefetchBufferResult a struct whose contents can be extended in the
future.  In this patch set it's still just used to distinguish 3 cases
(hit, miss, no file), but it's now expressed as a buffer and a flag to
indicate whether I/O was initiated.  You could imagine that the second
thing might be replaced by a pointer to an async I/O handle you can
wait on or some other magical thing from the future.
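
Concretely, the extensible struct sketched here might look something like
the following (the field names are my guesses, with a stub Buffer typedef
so the fragment compiles standalone):

```c
#include <stdbool.h>

/* Stub standing in for PostgreSQL's Buffer type. */
typedef int Buffer;
#define InvalidBuffer 0

/*
 * Extensible prefetch result: today just a possible buffer hit and a
 * flag saying whether I/O was started; later it could grow a handle
 * for an async I/O to wait on.  (Guessed layout, not from the patch.)
 */
typedef struct PrefetchBufferResult
{
	Buffer		recent_buffer;	/* InvalidBuffer means no cache hit */
	bool		initiated_io;	/* true if an I/O was started */
} PrefetchBufferResult;
```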

The concept here is that eventually we'll have just one XLogReader for
both read ahead and recovery, and we could attach the prefetch results
to the decoded records, and then recovery would try to use already
looked up buffers to avoid a bit of work (and then recheck).  In other
words, the WAL would be decoded only once, and the buffers would
hopefully be looked up only once, so you'd claw back all of the
overheads of this patch.  For now that's not done, and the buffer in
the result is only compared with InvalidBuffer to check if there was a
hit or not.

Similar things could be done for bitmap heap scan and btree prefetch
with this interface: their prefetch machinery could hold onto these
results in their block arrays and try to avoid a more expensive
ReadBuffer() call if they already have a buffer (though as before,
there's a small chance it turns out to be the wrong one and they need
to fall back to ReadBuffer()).

> As before, I didn't get to reading 0005 in depth.

Updated to account for the above-mentioned change, and with a couple
of elog() calls changed to ereport().
From 94df05846b155dfc68997f17899ddb34637d868a Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 17 Mar 2020 15:25:55 +1300
Subject: [PATCH 1/5] Allow PrefetchBuffer() to be called with a SMgrRelation.

Previously a Relation was required, but it's annoying to have to create
a "fake" one in recovery.  A new function PrefetchSharedBuffer() is
provided that works with SMgrRelation, and LocalPrefetchBuffer() is
renamed to PrefetchLocalBuffer() to fit with that more natural naming
scheme.

Reviewed-by: Alvaro Herrera 
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c   | 84 ---
 src/backend/storage/buffer/localbuf.c |  4 +-
 src/include/storage/buf_internals.h   |  2 +-
 src/include/storage/bufmgr.h  |  6 ++
 4 files changed, 59 insertions(+), 37 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e05e2b3456..d30aed6fd9 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -466,6 +466,53 @@ static int	ckpt_buforder_comparator(const void *pa, const void *pb);
 static int	ts_ckpt_progress_comparator(Datum a, Datum b, void *arg);
 
 
+/*
+ * Implementation of PrefetchBuffer() for shared buffers.
+ */
+void
+PrefetchSharedBuffer(struct SMgrRelationData *smgr_reln,
+	 ForkNumber forkNum,
+	 BlockNumber blockNum)
+{
+#ifdef USE_PREFETCH
+	BufferTag	newTag;		/* identity of requested block */
+	uint32		newHash;	/* hash value for newTag */
+	LWLock	   *newPartitionLock;	/* buffer partition lock for it */
+	int			buf_id;
+
+	Assert(BlockN
