Re: [PATCH] Add section headings to index types doc

2020-10-25 Thread Jürgen Purtz

On 23.10.20 18:08, David G. Johnston wrote:
On Fri, Oct 23, 2020 at 3:18 AM Jürgen Purtz wrote:



and add a link to the "CREATE INDEX" command from the chapter
preamble.

is the link necessary?


I suppose it would make more sense to add it to the previous section - 
the introduction page.  I do think having a link (or more than one) to 
CREATE INDEX from the Indexes chapter is reader-friendly.  Having 
links to SQL commands is never truly necessary - the reader knows a 
SQL command reference exists and the name of the command allows them 
to find the correct page.


David J.

I'm afraid I haven't grasped all of the intentions and 
suggestions in your last two mails.


- equal operator in standalone paragraph: ok, integrated.

- shift "create index ... using HASH" to a different place: Do you suggest 
shifting the statement, or a link to it, to the previous (sub-)chapter "11.1 
Introduction"? But there is already a "create index" example there. Please 
read my suggestion/modification in the first paragraph of the "11.2 
Index Types" page.


- "rewording hash": I don't know what is missing here. But I have added 
a few words about the nature of this index type.


Attached are two patches: a) a summary against master and b) a standalone 
patch of my current changes.


--

J. Purtz

diff --git a/doc/src/sgml/0001-Add-section-headers-to-index-types-doc_1.patch b/doc/src/sgml/0001-Add-section-headers-to-index-types-doc_1.patch
new file mode 100644
index 00..1919080bfc
--- /dev/null
+++ b/doc/src/sgml/0001-Add-section-headers-to-index-types-doc_1.patch
@@ -0,0 +1,183 @@
+>From 84522dc77afd1b8ce0bf111279302888d9d3edcb Mon Sep 17 00:00:00 2001
+From: Dagfinn Ilmari Mannsåker
+Date: Fri, 31 Jul 2020 10:20:48 +0100
+Subject: [PATCH] Add section headers to index types doc
+
+This makes it easier to compare the properties of different index
+types at a glance.
+
+In passing, make the index operator lists a single line each.
+---
+ doc/src/sgml/indices.sgml | 81 +++
+ 1 file changed, 39 insertions(+), 42 deletions(-)
+
+diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
+index 671299ff05..f22253f4c3 100644
+--- a/doc/src/sgml/indices.sgml
++++ b/doc/src/sgml/indices.sgml
+@@ -122,6 +122,9 @@
+B-tree indexes, which fit the most common situations.
+   
+ 
++  
++   B-tree
++
+   
+
+ index
+@@ -137,13 +140,9 @@
+will consider using a B-tree index whenever an indexed column is
+involved in a comparison using one of these operators:
+ 
+-   
+-<
+-<=
+-=
+->=
+->
+-   
++
++<   <=   =   >=   >
++
+ 
+Constructs equivalent to combinations of these operators, such as
+BETWEEN and IN, can also be implemented with
+@@ -172,6 +171,10 @@
+This is not always faster than a simple scan and sort, but it is
+often helpful.
+   
++  
++
++  
++   Hash
+ 
+   
+
+@@ -191,6 +194,10 @@
+ CREATE INDEX name ON table USING HASH (column);
+ 
+   
++  
++
++  
++   GiST
+ 
+   
+
+@@ -210,20 +217,9 @@
+for several two-dimensional geometric data types, which support indexed
+queries using these operators:
+ 
+-   
+-<<
+-&<
+-&>
+->>
+-<<|
+-&<|
+-|&>
+-|>>
+-@>
+-<@
+-~=
+-&&
+-   
++
++<<   &<   &>   >>   <<|   &<|   |&>   |>>   @>   <@   ~=   &&
++
+ 
+(See  for the meaning of
+these operators.)
+@@ -246,6 +242,10 @@
+In , operators that can be
+used in this way are listed in the column Ordering Operators.
+   
++  
++
++  
++   SP-GiST
+ 
+   
+
+@@ -264,14 +264,9 @@
+for two-dimensional points, which support indexed
+queries using these operators:
+ 
+-   
+-<<
+->>
+-~=
+-<@
+-<^
+->^
+-   
++
++<<   >>   ~=   <@   <^   >^
++
+ 
+(See  for the meaning of
+these operators.)
+@@ -286,6 +281,10 @@
+corresponding operator is specified in the Ordering Operators
+column in .
+   
++  
++
++  
++   GIN
+ 
+   
+
+@@ -312,12 +311,9 @@
+PostgreSQL includes a GIN operator class
+for arrays, which supports indexed queries using these operators:
+ 
+-   
+-<@
+-@>
+-=
+-&&
+-   
++
++<@   @>   =   &&
++
+ 
+(See  for the meaning of
+these operators.)
+@@ -327,6 +323,10 @@
+classes are available in the contrib collection or as separate
+projects.  For more information see .
+   
++  
++
++  
++   BRIN
+ 
+   
+
+@@ -348,18 +348,15 @@
+values in the column for each block range.  This supports indexed queries
+using these operators:
+ 
+-   
+-<
+-<=
+-=
+->=
+->
+-   
++
++<   <=   =   >=   >
++
+ 
+The BRIN operator classes included in the standard distribution are
+documented in .
+For more information see .
+   
++  
+  
+ 
+ 
+-- 
+2.27.0
+
diff --git a/doc/src/sgml/0002-Reindent-index-types-docs-after-previous-commit.patch b/doc/src/sgml/0002-Reindent-index-types-docs-after

Re: Fix typo in src/backend/utils/cache/lsyscache.c

2020-10-25 Thread David Rowley
On Sun, 25 Oct 2020 at 14:23, Hou, Zhijie  wrote:
> I found the comment of function get_attgenerated(Oid relid, AttrNumber 
> attnum) seems wrong.
> It seems the function is given the attribute number not the name.

Thanks. Pushed.

David




Re: Collation versioning

2020-10-25 Thread Julien Rouhaud
On Sun, Oct 25, 2020 at 10:36 AM Thomas Munro  wrote:
>
> On Fri, Oct 23, 2020 at 2:07 AM Julien Rouhaud  wrote:
> > While reviewing the changes, I found a couple of minor issues
> > (inherited from the previous versions).  It's missing a
> > free_objects_addresses in recordDependencyOnCollations, and there's a
> > small typo.  Inline diff:
>
> Thanks, fixed.
>
> I spent the past few days trying to simplify this patch further.
> Here's what I came up with:

Thanks!

>
> 1.  Drop the function dependencyExists() and related code, which in
> earlier versions tried to avoid creating duplicate pg_depends rows by
> merging with existing rows.  This was rather complicated, and there
> are not many duplicates anyway, and it's easier to suppress duplicate
> warnings at warning time (see do_collation_version_check()).  (I'm not
> against working harder to make pg_depend rows unique, but it's not
> required for this and I didn't like that way of doing it.)

I didn't review all the changes yet, so I'll probably post a deeper
review tomorrow.  I'm not opposed to this new approach, as it indeed
saves a lot of code.  However, looking at
do_collation_version_check(), it seems that you're saving the
collation in context->checked_calls even if it didn't raise a WARNING.
Since you can now have indexes with dependencies on the same collation
with both a tracked and an untracked version (see for instance
icuidx00_val_pattern_where in the regression tests), can't this hide
corruption warning reports if the untracked version is found first?
That can be easily fixed, so no objection to that approach of course.




Supporting = operator in gin/gist_trgm_ops

2020-10-25 Thread Julien Rouhaud
Hello,

A French user recently complained that with an index created using
gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
like

col LIKE 'something'

but not

col = 'something'

even though both clauses are technically identical.  That's clearly
not a high-priority thing to support, but looking at the code it seems
to me that this could be achieved quite simply: just add a new
operator = to the opclass, with an operator strategy number that falls
back to doing exactly what LikeStrategyNumber does, and that's it.
There shouldn't be any wrong results, even with wildcards in the value,
since the recheck will remove any incorrect ones.
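
For concreteness, the gap looks like this (an illustrative session only:
the table and index names are made up, and '=' using the trigram index is
what this mail proposes, not current behavior):

```sql
-- Illustrative sketch only; names are invented.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE t (col text);
CREATE INDEX t_col_trgm_idx ON t USING gin (col gin_trgm_ops);

-- This clause can use the trigram index:
EXPLAIN SELECT * FROM t WHERE col LIKE 'something';

-- This semantically identical clause currently cannot:
EXPLAIN SELECT * FROM t WHERE col = 'something';
```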

Did I miss something? And if not would such a patch be welcome?




Re: pgstat_report_activity() and parallel CREATE INDEX (was: Parallel index creation & pg_stat_activity)

2020-10-25 Thread Tom Lane
Noah Misch  writes:
> On Sat, Oct 17, 2020 at 08:39:35AM -0700, Peter Geoghegan wrote:
>> I also prefer 2.

> Thanks.  Here is the pair of patches I plan to use.  The second is equivalent
> to v0; I just added a log message.

The change in worker_spi_main seems a little weird/lazy.  I'd
be inclined to set and clear debug_query_string each time through
the loop, so that those operations are adjacent to the other
status-reporting operations, as they are in initialize_worker_spi.

I don't really like modifying a StringInfo while debug_query_string
is pointing at it, either.  Yeah, you'll *probably* get away with
that because the buffer is unlikely to move, but since the whole
exercise can only benefit crash-debugging scenarios, it'd be better
to be more paranoid.

One idea is to set debug_query_string immediately before each SPI_execute
and clear it just afterwards, rather than trying to associate it with
pgstat_report_activity calls.

regards, tom lane




Re: deferred primary key and logical replication

2020-10-25 Thread Euler Taveira
On Mon, 5 Oct 2020 at 08:34, Amit Kapila  wrote:

> On Mon, May 11, 2020 at 2:41 AM Euler Taveira
>  wrote:
> >
> > Hi,
> >
> > While looking at an old wal2json issue, I stumbled on a scenario that a
> table
> > with a deferred primary key is not updatable in logical replication.
> AFAICS it
> > has been like that since the beginning of logical decoding and seems to
> be an
> > oversight while designing logical decoding.
> >
>
> I am not sure if it is an oversight because we document that the index
> must be non-deferrable, see "USING INDEX records the old values of the
> columns covered by the named index, which must be unique, not partial,
> not deferrable, and include only columns marked NOT NULL." in docs
> [1].
>
>
Inspecting this patch again, I forgot to consider that
RelationGetIndexList() is called by other backend modules. Since logical
decoding deals with finished transactions, it is ok to use a deferrable
primary key. However, this patch is probably wrong because it does not
consider the other modules.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgstat_report_activity() and parallel CREATE INDEX (was: Parallel index creation & pg_stat_activity)

2020-10-25 Thread Noah Misch
On Sun, Oct 25, 2020 at 10:52:50AM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Sat, Oct 17, 2020 at 08:39:35AM -0700, Peter Geoghegan wrote:
> >> I also prefer 2.
> 
> > Thanks.  Here is the pair of patches I plan to use.  The second is 
> > equivalent
> > to v0; I just added a log message.
> 
> The change in worker_spi_main seems a little weird/lazy.  I'd
> be inclined to set and clear debug_query_string each time through
> the loop, so that those operations are adjacent to the other
> status-reporting operations, as they are in initialize_worker_spi.

True.  Emitting STATEMENT message fields during ProcessConfigFile(PGC_SIGHUP)
would be wrong.

> I don't really like modifying a StringInfo while debug_query_string
> is pointing at it, either.  Yeah, you'll *probably* get away with
> that because the buffer is unlikely to move, but since the whole
> exercise can only benefit crash-debugging scenarios, it'd be better
> to be more paranoid.

Good point.  This is supposed to be example code, so it shouldn't cut corners.

> One idea is to set debug_query_string immediately before each SPI_execute
> and clear it just afterwards, rather than trying to associate it with
> pgstat_report_activity calls.

Each elog(FATAL) check deserves a STATEMENT field if it fires, so I think that
would be too early to clear.  Here's a version fixing the defects.  In
worker_spi_main(), the timing now mirrors postgres.c.  initialize_worker_spi()
is doing something not directly possible from SQL, so I improvised there.
Author: Noah Misch 
Commit: Noah Misch 

Set debug_query_string in worker_spi.

This makes elog.c emit the string, which is good practice for a
background worker that executes SQL strings.

Reviewed by Tom Lane.

Discussion: https://postgr.es/m/20201014022636.ga1962...@rfd.leadboat.com

diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 1c7b17c..258237f 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -114,53 +114,59 @@ initialize_worker_spi(worktable *table)
PushActiveSnapshot(GetTransactionSnapshot());
pgstat_report_activity(STATE_RUNNING, "initializing worker_spi schema");
 
/* XXX could we use CREATE SCHEMA IF NOT EXISTS? */
initStringInfo(&buf);
	appendStringInfo(&buf, "select count(*) from pg_namespace where nspname = '%s'",
					 table->schema);
 
+   debug_query_string = buf.data;
ret = SPI_execute(buf.data, true, 0);
if (ret != SPI_OK_SELECT)
elog(FATAL, "SPI_execute failed: error code %d", ret);
 
if (SPI_processed != 1)
elog(FATAL, "not a singleton result");
 
	ntup = DatumGetInt64(SPI_getbinval(SPI_tuptable->vals[0],
									   SPI_tuptable->tupdesc,
									   1, &isnull));
if (isnull)
elog(FATAL, "null result");
 
if (ntup == 0)
{
+   debug_query_string = NULL;
resetStringInfo(&buf);
		appendStringInfo(&buf,
						 "CREATE SCHEMA \"%s\" "
						 "CREATE TABLE \"%s\" ("
						 "  type text CHECK (type IN ('total', 'delta')), "
						 "  value   integer)"
						 "CREATE UNIQUE INDEX \"%s_unique_total\" ON \"%s\" (type) "
						 "WHERE type = 'total'",
						 table->schema, table->name, table->name, table->name);
 
/* set statement start time */
SetCurrentStatementStartTimestamp();
 
+   debug_query_string = buf.data;
ret = SPI_execute(buf.data, false, 0);
 
if (ret != SPI_OK_UTILITY)
elog(FATAL, "failed to create my schema");
+
		debug_query_string = NULL;	/* rest is not statement-specific */
}
 
SPI_finish();
PopActiveSnapshot();
CommitTransactionCommand();
+   debug_query_string = NULL;
pgstat_report_activity(STATE_IDLE, NULL);
 }
 
 void
 worker_spi_main(Datum main_arg)
 {
int index = DatumGetInt32(main_arg);
worktable  *table;
@@ -257,16 +263,17 @@ worker_spi_main(Datum main_arg)
 *
 * The pgstat_report_activity() call makes our activity visible
 * through the pgstat views.
 */
SetCurrentStatementStartTimestamp();
StartTransactionCommand();
SPI_connect();
PushActi

Re: pgstat_report_activity() and parallel CREATE INDEX (was: Parallel index creation & pg_stat_activity)

2020-10-25 Thread Tom Lane
Noah Misch  writes:
> Each elog(FATAL) check deserves a STATEMENT field if it fires, so I think that
> would be too early to clear.  Here's a version fixing the defects.  In
> worker_spi_main(), the timing now mirrors postgres.c.  initialize_worker_spi()
> is doing something not directly possible from SQL, so I improvised there.

I'm good with this version.  Thanks!

regards, tom lane




Re: Commitfest manager 2020-11

2020-10-25 Thread Anastasia Lubennikova

On 20.10.2020 10:30, gkokola...@pm.me wrote:

‐‐‐ Original Message ‐‐‐

[snip]

Wow, that was well in advance) I am willing to assist if you need any help.


Indeed it was a bit early. I left for vacation after that. For the record, I am 
newly active in the community. In our PUG in Stockholm, we held a meetup 
during which a contributor presented ways to contribute to the community, one 
of which is becoming a CFM. So, I thought of picking up the recommendation.

I have taken little part in CFs as reviewer/author and I have no idea how a CF 
is actually run. A contributor from Stockholm has been willing to mentor me for 
the role.

Since you have both the knowledge and specific ideas on improving the CF, how 
about me assisting you? I could shadow you and you could groom me for the role, 
so that I can take the lead on a future CF more effectively.

This is just a suggestion of course. I am happy with anything that can help the 
community as a whole.

Even though I've worked a lot with the community, I have never been a CFM 
before either. So, I think we can just follow these articles:


https://www.2ndquadrant.com/en/blog/managing-a-postgresql-commitfest/
https://wiki.postgresql.org/wiki/CommitFest_Checklist

Some parts are a bit outdated, but in general the checklist is clear. 
I've already requested CFM privileges in pgsql-www and I'm going to 
spend next week sending pings and updates to the patches at commitfest.


There are already 219 patches, so I would appreciate it if you join me in 
this task.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: bulk typos

2020-10-25 Thread Justin Pryzby
On Sat, Mar 31, 2018 at 05:56:40AM -0500, Justin Pryzby wrote:
> I needed another distraction so bulk-checked for typos, limited to comments in
> *.[ch].
> 
> I'm not passionate about this, but it serves the purpose of reducing the
> overhead of fixing them individually.

I happened across this old patch, so ran this again to find new typos.

There's a few that I don't know how best to fix.

Heikki, do you remember what this means?
+++ b/src/backend/catalog/storage.c
+ * NOTE: the list is kept in TopMemoryContext to be sure it won't disappear
+ * unbetimes.  It'd probably be OK to keep it in TopTransactionContext,
+ * but I'm being paranoid.

Pavel, I can't understand this one.
I guess s/producement/producing/ is too simple of a fix.
Please check my proposed language.
+XmlTableFetchRow(TableFuncScanState *state)
+* XmlTable returns table - set of composite values. The error context, 
is
+* used for producement more values, between two calls, there can be
+* created and used another libxml2 error context. ...

Surafel, this typo existed twice in the original commit (357889eb1).
One instance was removed by Tom in 35cb574aa.  Should we simply fix the typo,
or borrow Julien/Tom's language?
+   * Tuple at limit is needed for comparation in subsequent
+   * execution to detect ties.
>From fba47da46875c6c89a52225ac39f40e1993a9b56 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 25 Oct 2020 13:56:19 -0500
Subject: [PATCH 1/6] bulk typos

Using same strategy as here
https://www.postgresql.org/message-id/20180331105640.GK28454%40telsasoft.com
---
 contrib/amcheck/verify_heapam.c  | 2 +-
 src/backend/access/heap/pruneheap.c  | 2 +-
 src/backend/catalog/namespace.c  | 2 +-
 src/backend/catalog/pg_namespace.c   | 2 +-
 src/backend/executor/execExpr.c  | 2 +-
 src/backend/executor/nodeIncrementalSort.c   | 2 +-
 src/backend/optimizer/path/allpaths.c| 2 +-
 src/backend/optimizer/plan/analyzejoins.c| 2 +-
 src/backend/partitioning/partbounds.c| 2 +-
 src/backend/postmaster/interrupt.c   | 2 +-
 src/backend/statistics/dependencies.c| 2 +-
 src/backend/statistics/extended_stats.c  | 2 +-
 src/backend/storage/buffer/bufmgr.c  | 2 +-
 src/backend/utils/adt/varlena.c  | 2 +-
 src/bin/pgbench/pgbench.c| 2 +-
 src/common/pg_lzcompress.c   | 2 +-
 src/interfaces/libpq/fe-connect.c| 2 +-
 src/test/modules/dummy_index_am/dummy_index_am.c | 2 +-
 18 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 8bb890438a..dbe3134b6b 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1342,7 +1342,7 @@ fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
 }
 
 /*
- * Checks wheter a multitransaction ID is in the cached valid range, returning
+ * Checks whether a multitransaction ID is in the cached valid range, returning
  * the nature of the range violation, if any.
  */
 static XidBoundsViolation
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bc510e2e9b..9e04bc712c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -385,7 +385,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 
 
 /*
- * Perform visiblity checks for heap pruning.
+ * Perform visibility checks for heap pruning.
  *
  * This is more complicated than just using GlobalVisTestIsRemovableXid()
  * because of old_snapshot_threshold. We only want to increase the threshold
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 391a9b225d..740570c566 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3853,7 +3853,7 @@ recomputeNamespacePath(void)
 	/*
 	 * We want to detect the case where the effective value of the base search
 	 * path variables didn't change.  As long as we're doing so, we can avoid
-	 * copying the OID list unncessarily.
+	 * copying the OID list unnecessarily.
 	 */
 	if (baseCreationNamespace == firstNS &&
 		baseTempCreationPending == temp_missing &&
diff --git a/src/backend/catalog/pg_namespace.c b/src/backend/catalog/pg_namespace.c
index ed85276070..7d2e26fd35 100644
--- a/src/backend/catalog/pg_namespace.c
+++ b/src/backend/catalog/pg_namespace.c
@@ -106,7 +106,7 @@ NamespaceCreate(const char *nspName, Oid ownerId, bool isTemp)
 	/* dependency on owner */
 	recordDependencyOnOwner(NamespaceRelationId, nspoid, ownerId);
 
-	/* dependences on roles mentioned in default ACL */
+	/* dependencies on roles mentioned in default ACL */
 	recordDependencyOnNewAcl(NamespaceRelationId, nspoid, 0, ownerId, nspacl);
 
 	/* dependency on extension ... but not for magic temp schemas */
diff --git a/src/backend/executo

Re: bulk typos

2020-10-25 Thread Heikki Linnakangas

On 25/10/2020 21:48, Justin Pryzby wrote:

On Sat, Mar 31, 2018 at 05:56:40AM -0500, Justin Pryzby wrote:

I needed another distraction so bulk-checked for typos, limited to comments in
*.[ch].

I'm not passionate about this, but it serves the purpose of reducing the
overhead of fixing them individually.


I happened across this old patch, so ran this again to find new typos.


Nice script.


There's a few that I don't know how best to fix.

Heikki, do you remember what this means ?
+++ b/src/backend/catalog/storage.c
+ * NOTE: the list is kept in TopMemoryContext to be sure it won't disappear
+ * unbetimes.  It'd probably be OK to keep it in TopTransactionContext,
+ * but I'm being paranoid.


Heh, even though I was the last one to touch that line according to git 
blame, I just moved it from smgr.c. Looks like Tom wrote it in 2004. I 
guess it means "too early". That would make sense from the context. 
Google only gives a few hits but they seem to agree with "too early". A 
very rare word, for sure. Might be good to change it to "too early", but 
it's not wrong, and the poet in me kind of likes "unbetimes" :-).


- Heikki




Re: Supporting = operator in gin/gist_trgm_ops

2020-10-25 Thread Tom Lane
Julien Rouhaud  writes:
> A french user recently complained that with an index created using
> gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
> like
> col LIKE 'something'
> but not
> col = 'something'

Huh, I'd supposed we did that already.

> even though both clauses are technically identical.  That's clearly
> not a high priority thing to support, but looking at the code it seems
> to me that this could be achieved quite simply: just adding a new
> operator = in the opclass, with an operator strategy number that falls
> back doing exactly what LikeStrategyNumber is doing and that's it.
> There shouldn't be any wrong results, even using wildcards as the
> recheck will remove any incorrect one.

I think you may be overoptimistic about being able to use the identical
code path without regard for LIKE wildcards; but certainly it should be
possible to do this with not a lot of new code.  +1.
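
To make the wildcard caveat concrete (an illustrative example, not taken
from the thread): under equality, pattern metacharacters must be treated
as literal text, so the two strategies cannot blindly share the same
pattern handling:

```sql
-- '%' is a wildcard for LIKE but a literal character for '='.
SELECT 'aXXb' LIKE 'a%b';  -- true: '%' matches 'XX'
SELECT 'aXXb' = 'a%b';     -- false: the strings differ
-- So an '=' strategy must extract trigrams from the literal value
-- rather than interpreting it as a LIKE pattern.
```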

regards, tom lane




Re: bulk typos

2020-10-25 Thread Tom Lane
Heikki Linnakangas  writes:
> On 25/10/2020 21:48, Justin Pryzby wrote:
>> Heikki, do you remember what this means ?
>> +++ b/src/backend/catalog/storage.c
>> + * NOTE: the list is kept in TopMemoryContext to be sure it won't disappear
>> + * unbetimes.  It'd probably be OK to keep it in TopTransactionContext,
>> + * but I'm being paranoid.

> Heh, even though I was the last one to touch that line according to git 
> blame, I just moved it from smgr.c. Looks like Tom wrote it in 2004. I 
> guess it means "too early". That would make sense from the context. 

Sorry about that, I was being a little whimsical I suppose.

Doing a bit of research, I see the OED defines "betimes" as

[archaic] Before the usual or expected time; early.
‘next morning I was up betimes’

"unbetimes" doesn't seem to be a recognized word ... and now that I look
at this, if it did exist it would mean "too late" which is backwards.

I'd suggest "untimely" if you want to wax poetic, or "too soon" if
you prefer prose that plods.

regards, tom lane




Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour

2020-10-25 Thread Justin Pryzby
On Fri, Oct 23, 2020 at 11:09:26PM +0300, Heikki Linnakangas wrote:
> Findings in detail follow.

Are you working on a patch for these ?

Otherwise, since I started something similar in April, I could put something
together based on comments you've gotten here.

-- 
Justin




Re: POC: contrib/unaccent as IMMUTABLE

2020-10-25 Thread Thomas Munro
On Sun, Oct 4, 2020 at 4:19 AM Tom Lane  wrote:
> Tomas Vondra  writes:
> > As for the patch, I wonder if we want to make this change. I'm not very
> > familiar with how unaccent works, but if changes to unicode rules would
> > really silently break indexes, it's kinda similar to the collation
> > issues caused by glibc updates. And we've generally considered that a
> > case of data corruption, I think, so it'd be strange to allow that here.
>
> Yeah.  The fact that we have a problem with collation updates doesn't
> mean that it's okay to introduce the same problem elsewhere.
>
> Note the extremely large amount of work that's ongoing to try to
> track collation changes so we can know whether such an update has
> invalidated indexes.  Unless this patch comes with equivalent
> infrastructure to detect when the unaccent mapping has changed,
> I think it should be rejected.
>
> The problem here is somewhat different than for collations, in
> that collation changes are hidden behind more-or-less-obscure
> library APIs.  Here I think we'd "just" have to track whether
> the user has changed the associated unaccent mapping file.
> However, detecting which indexes are invalidated by such a
> change still involves a lot of infrastructure that's not there.
> And it'd be qualitatively different, because a config file
> change could happen at any time --- we could not relegate
> the checks to pg_upgrade or the like.

As a potential next use for refobjversion (the system I'm hoping to
commit soon for collation version tracking), I have wondered about
declarative function versions.  For those not following that thread,
the basic idea is that you get WARNING messages telling you to REINDEX
if certain things change, and the warnings only stop for each
dependent index once you've actually done it.  Once the collation
stuff is in the tree, it wouldn't be too hard to do a naive function
version tracking system, but there are a couple of tricky problems to
make it really useful:  (1) plpgsql function f1() calls f2(), f2()
changed, so an index on ((f1(x))) needs to be rebuilt, (2) unaccent()
and ts_parse() access other complicated objects, perhaps even
depending on arguments passed in to them.  You'd probably need to
design some kind of dependency analyser handler that PLs could
provide, and likewise, individual functions could perhaps provide
their own analyser, or maybe you could redesign the functions so that
a naive single version scheme could work.  Or something like
that.




Re: Allow deleting enum value

2020-10-25 Thread Justin Pryzby
On Wed, Oct 07, 2020 at 11:47:07PM +0300, Maksim Kita wrote:
> I like the idea, during prototype I added additional column and modified
> enum_in method. But enum_in is called in contexts that can be important
> for user (like comparisons).
...
> postgres=# ALTER TYPE test_enum DELETE VALUE '2';
> ALTER TYPE

I think it should be called "DROP VALUE"

> postgres=# SELECT * FROM test_table WHERE value = '2';
> ERROR:  enum value is dropped test_enum: "2"
> LINE 1: SELECT * FROM test_table WHERE value = '2';

This fails if the value was never added, so why wouldn't it also fail if the
value is added and then removed?

Maybe you'd want to rename old enum names to something "unlikely", like what we
do for dropped attributes in RemoveAttributeById.

How do you want to handle "adding a value back" ?

I think you should determine/propose/discuss the desired behaviors before
implementing it.

I think you'd also need to update these:
src/bin/pg_dump/pg_dump.c
src/bin/psql/describe.c
src/bin/psql/tab-complete.c
doc/src/sgml/catalogs.sgml
src/test/regress/sql/enum.sql
src/test/regress/expected/enum.out

-- 
Justin




Re: Mop-up around psql's \connect behavior

2020-10-25 Thread Kyotaro Horiguchi
At Fri, 23 Oct 2020 17:12:44 -0400, Tom Lane  wrote in 
> Kyotaro Horiguchi  writes:
> > At Thu, 22 Oct 2020 15:23:04 -0400, Tom Lane  wrote in 
> >> ... The only real objection I can see is that it could
> >> hold a server connection open when the user thinks there is none;
> >> but that could only happen in a non-interactive script, and it does
> >> not seem like a big problem in that case.  We could alternatively
> >> not stash the "dead" connection after a non-interactive \connect
> >> failure, but I doubt that's better.
> 
> > Agreed. Thanks!
> 
> After further thought I decided we *must* do it as per my "alternative"
> idea.  Consider a script containing
>   \c db1 user1 live_server
>   \c db2 user2 dead_server
>   \c db3
> The script would be expecting to connect to db3 at dead_server, but
> if we re-use parameters from the first connection then it might
> successfully connect to db3 at live_server.  This'd defeat the goal
> of not letting a script accidentally execute commands against the
> wrong database.

Hmm. True.

> So we have to not save the connection after a failed script \connect.

Yes, we shouldn't save connection parameters that haven't made a
connection.

> However, it seems OK to save after a connection loss whether we're
> in a script or not; that is,
> 
>   \c db1 user1 server1
>   ...
>   (connection dies here)
>   ...  --- these commands will fail
>   \c db2
> 
> The script will be expecting the second \c to re-use parameters
> from the first one, and that will still work as expected.

Agreed.

> I went ahead and pushed it after adjusting that.

Thanks!

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: MultiXact\SLRU buffers configuration

2020-10-25 Thread Alexander Korotkov
Hi!

On Mon, Sep 28, 2020 at 5:41 PM Andrey M. Borodin  wrote:
> > On 28 Aug 2020, at 23:08, Anastasia Lubennikova wrote:
> >
> > 1) The first patch is sensible and harmless, so I think it is ready for 
> > committer. I haven't tested the performance impact, though.
> >
> > 2) I like the initial proposal to make various SLRU buffers configurable, 
> > however, I doubt if it really solves the problem, or just moves it to 
> > another place?
> >
> > The previous patch you sent was based on some version that contained 
> > changes for other slru buffers numbers: 'multixact_offsets_slru_buffers' 
> > and 'multixact_members_slru_buffers'. Have you just forgot to attach them? 
> > The patch message "[PATCH v2 2/4]" hints that you had 4 patches)
> > Meanwhile, I attach the rebased patch to calm down the CFbot. The changes 
> > are trivial.
> >
> > 2.1) I think that both min and max values for this parameter are too 
> > extreme. Have you tested them?
> >
> > +   &multixact_local_cache_entries,
> > +   256, 2, INT_MAX / 2,
> >
> > 2.2) MAX_CACHE_ENTRIES is not used anymore, so it can be deleted.
> >
> > 3) No changes for third patch. I just renamed it for consistency.
>
> Thank you for your review.
>
> Indeed, I had a 4th patch with tests, but these tests didn't work well: I still 
> did not manage to stress SLRUs enough to reproduce the problem from production...
>
> You are absolutely correct in point 2: I only ran tests with sane values, and 
> observed extreme performance degradation with values of ~64 megabytes. I do not 
> know what the highest value should be. 1 GB? Or the highest possible 
> functioning value?
>
> I greatly appreciate your review, sorry for so long delay. Thanks!

I took a look at this patchset.

The 1st and 3rd patches look good to me.  I made just minor improvements.
1) There is still a case where SimpleLruReadPage_ReadOnly() relocks the
SLRU lock, which is already taken in exclusive mode.  I've avoided this
by passing the lock mode as a parameter to
SimpleLruReadPage_ReadOnly().
3) CHECK_FOR_INTERRUPTS() is not needed anymore, because it's called
inside ConditionVariableSleep() if needed.  Also, no current wait
events use slashes, and I don't think we should introduce slashes
here.  Even if we should, then we should also rename existing wait
events to be consistent with a new one.  So, I've renamed the new wait
event to remove the slash.

Regarding patch 2: I see that the documentation in the patch doesn't
explain to the user how to set the new parameter.  I think we should
give users an idea of which workloads need high values of the
multixact_local_cache_entries parameter and which don't.  We should
also explain the downsides of high values of
multixact_local_cache_entries.  Ideally, we would get the advantage
without burdening users with a new nontrivial parameter, but I don't
have a particular idea here.

I'd like to propose committing 1 and 3, but leaving 2 for further review.

--
Regards,
Alexander Korotkov


v4-0001-Use-shared-lock-in-GetMultiXactIdMembers-for-offsets.patch
Description: Binary data


v4-0002-Make-MultiXact-local-cache-size-configurable.patch
Description: Binary data


v4-0003-Add-conditional-variable-to-wait-for-next-MultXact-o.patch
Description: Binary data


Re: [patch] Fix checksum verification in base backups for zero page headers

2020-10-25 Thread Michael Paquier
On Fri, Oct 23, 2020 at 08:00:08AM +0900, Michael Paquier wrote:
> Yeah, we could try to make the logic a bit more complicated like
> that.  However, for any code path relying on a page read without any
> locking assurance, we cannot really have a lot of trust in any of the
> fields assigned to the page, as this could just be random corruption
> garbage.  The only thing I am ready to trust here is a checksum
> mismatch check, because that's the only field on the page that's
> linked to the full contents of the 8k page.  This also keeps the code
> simpler.

A small update here.  I have extracted the refactored part for
PageIsVerified() and committed it as that's independently useful.
This makes the patch proposed here simpler on HEAD, leading to the
attached.
--
Michael
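The bounded-retry logic the attached patch adds to sendFile() can be sketched in isolation like this (a hedged illustration only: read_block(), verify_page(), and check_block() are stand-ins for the real basebackup_read_file()/PageIsVerifiedExtended() machinery, not PostgreSQL APIs):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the retry loop: a page read without locks may be torn by a
 * concurrent write, so a verification failure is retried a bounded
 * number of times before it is counted as a genuine checksum failure.
 * An EOF on reread means a concurrent truncation, which is not an error.
 */
#define PAGE_RETRY_THRESHOLD 5

typedef bool (*read_fn) (void *ctx, char *page);	/* false on EOF */
typedef bool (*verify_fn) (const char *page);

static bool
check_block(read_fn read_block, verify_fn verify_page, void *ctx)
{
	char		page[8192];

	for (int attempt = 0; attempt <= PAGE_RETRY_THRESHOLD; attempt++)
	{
		if (!read_block(ctx, page))
			return true;		/* EOF: concurrent truncation */
		if (verify_page(page))
			return true;		/* page verified, move on */
	}
	return false;				/* a real verification failure */
}

/* Demonstration stubs: the page "heals" on the third read. */
static int	reads_done = 0;

static bool
demo_read(void *ctx, char *page)
{
	(void) ctx;
	page[0] = (char) ('0' + reads_done++);
	return true;
}

static bool
demo_verify(const char *page)
{
	return page[0] >= '2';
}

static bool
always_fail(const char *page)
{
	(void) page;
	return false;
}
```

With these stubs, check_block(demo_read, demo_verify, NULL) succeeds on the third read, while check_block(demo_read, always_fail, NULL) exhausts the retry budget and reports a failure.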
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b89df01fa7..37b60437a6 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1569,21 +1569,24 @@ sendFile(const char *readfilename, const char *tarfilename,
 {
 	int			fd;
 	BlockNumber blkno = 0;
-	bool		block_retry = false;
+	int			block_attempts = 0;
 	char		buf[TAR_SEND_SIZE];
-	uint16		checksum;
 	int			checksum_failures = 0;
 	off_t		cnt;
 	int			i;
 	pgoff_t		len = 0;
 	char	   *page;
 	size_t		pad;
-	PageHeader	phdr;
 	int			segmentno = 0;
 	char	   *segmentpath;
 	bool		verify_checksum = false;
 	pg_checksum_context checksum_ctx;
 
+	/* Maximum number of checksum failures reported for this file */
+#define CHECKSUM_REPORT_THRESHOLD		5
+	/* maximum number of retries done for a single page */
+#define PAGE_RETRY_THRESHOLD			5
+
 	pg_checksum_init(&checksum_ctx, manifest->checksum_type);
 
 	fd = OpenTransientFile(readfilename, O_RDONLY | PG_BINARY);
@@ -1661,9 +1664,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 		if (verify_checksum && (cnt % BLCKSZ != 0))
 		{
 			ereport(WARNING,
-	(errmsg("could not verify checksum in file \"%s\", block "
-			"%d: read buffer size %d and page size %d "
-			"differ",
+	(errmsg("could not verify checksum in file \"%s\", block %u: read buffer size %d and page size %d differ",
 			readfilename, blkno, (int) cnt, BLCKSZ)));
 			verify_checksum = false;
 		}
@@ -1675,78 +1676,75 @@ sendFile(const char *readfilename, const char *tarfilename,
 page = buf + BLCKSZ * i;
 
 /*
- * Only check pages which have not been modified since the
- * start of the base backup. Otherwise, they might have been
- * written only halfway and the checksum would not be valid.
- * However, replaying WAL would reinstate the correct page in
- * this case. We also skip completely new pages, since they
- * don't have a checksum yet.
+ * Check on-disk pages with the same set of verification
+ * conditions used before loading pages into shared buffers.
+ * Note that PageIsVerifiedExtended() considers pages filled
+ * only with zeros as valid, since they don't have a checksum
+ * yet.  Failures are not reported to pgstat yet, as these are
+ * reported in a single batch once a file is completed.  It is
+ * possible that a page is written halfway through while doing
+ * this check, causing a false positive.  If that happens, the
+ * page is retried multiple times, with an error reported only
+ * if the last attempt also fails.
  */
-if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+if (PageIsVerifiedExtended(page, blkno, 0))
 {
-	checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
-	phdr = (PageHeader) page;
-	if (phdr->pd_checksum != checksum)
+	/* success, move on to the next block */
+	blkno++;
+	block_attempts = 0;
+	continue;
+}
+
+/* The verification of a page has failed, retry again */
+if (block_attempts < PAGE_RETRY_THRESHOLD)
+{
+	int			reread_cnt;
+
+	/* Reread the failed block */
+	reread_cnt =
+		basebackup_read_file(fd, buf + BLCKSZ * i,
+			 BLCKSZ, len + BLCKSZ * i,
+			 readfilename,
+			 false);
+	if (reread_cnt == 0)
 	{
 		/*
-		 * Retry the block on the first failure.  It's
-		 * possible that we read the first 4K page of the
-		 * block just before postgres updated the entire block
-		 * so it ends up looking torn to us.  We only need to
-		 * retry once because the LSN should be updated to
-		 * something we can ignore on the next pass.  If the
-		 * error happens again then it is a true validation
-		 * failure.
+		 * If we hit end-of-file, a concurrent truncation must
+		 * have occurred, so break out of this loop just as if
+		 * the initial fread() returned 0. We'll drop through
+		 * to the same code that handles that case. (We must
+		 * fix up cnt first, though.)
 		 */
-		if (block_retry == false)
-		{
-			int			reread_cnt;
-
-			/* Reread the failed block */

A couple questions about ordered-set aggregates

2020-10-25 Thread Chapman Flack
I find I am allowed to create an ordered-set aggregate with a non-empty
direct argument list and no finisher function. Am I right in thinking
that's kind of nonsensical, as nothing will ever look at the direct args?

Also, the syntax summary shows PARALLEL = { SAFE | RESTRICTED | UNSAFE }
in the ordered-set syntax variant, since 9.6, though that variant
accepts no combine/serial/deserial functions, and there's also
a note saying "Partial (including parallel) aggregation is currently
not supported for ordered-set aggregates."

Does PARALLEL = { SAFE | RESTRICTED | UNSAFE } on an ordered-set
aggregate affect anything?
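For concreteness, a minimal definition of the kind being asked about might look like this (a contrived sketch: my_osa is a hypothetical name, and float8pl is just a stand-in transition function):

```sql
-- Accepted today, even though with no FINALFUNC nothing can ever
-- consume the direct argument (the first float8):
CREATE AGGREGATE my_osa (float8 ORDER BY float8) (
    SFUNC    = float8pl,   -- state += ordered value
    STYPE    = float8,
    PARALLEL = SAFE        -- the flag in question: does it do anything here?
);
```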

Regards,
-Chap




Re: Supporting = operator in gin/gist_trgm_ops

2020-10-25 Thread Julien Rouhaud
On Mon, Oct 26, 2020 at 5:03 AM Tom Lane  wrote:
>
> Julien Rouhaud  writes:
> > A french user recently complained that with an index created using
> > gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
> > like
> > col LIKE 'something'
> > but not
> > col = 'something'
>
> Huh, I'd supposed we did that already.
>
> > even though both clauses are technically identical.  That's clearly
> > not a high priority thing to support, but looking at the code it seems
> > to me that this could be achieved quite simply: just adding a new
> > operator = in the opclass, with an operator strategy number that falls
> > back doing exactly what LikeStrategyNumber is doing and that's it.
> > There shouldn't be any wrong results, even using wildcards as the
> > recheck will remove any incorrect one.
>
> I think you may be overoptimistic about being able to use the identical
> code path without regard for LIKE wildcards; but certainly it should be
> possible to do this with not a lot of new code.  +1.

Well, that's what I was thinking too, but I tried all the possible
wildcard combinations I could think of and I couldn't find any case
yielding wrong results.  As far as I can see the index scans return at
least all the required rows, and all extraneous rows are correctly
removed either by heap recheck or index recheck.

I'm attaching a POC patch with regression tests covering those
combinations.  I also found a typo in the 1.4--1.5 pg_trgm upgrade
script, so I'm also attaching a patch for that.
From 3539eb386b5f15dc3c454cef4fee210d014bd91e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 26 Oct 2020 11:29:45 +0800
Subject: [PATCH v1 1/2] Fix typo in 1.4--1.5 pg_trgm upgrade script

---
 contrib/pg_trgm/pg_trgm--1.4--1.5.sql | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
index 284f88d325..db122fce0f 100644
--- a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
+++ b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
@@ -1,4 +1,4 @@
-/* contrib/pg_trgm/pg_trgm--1.5--1.5.sql */
+/* contrib/pg_trgm/pg_trgm--1.4--1.5.sql */
 
 -- complain if script is sourced in psql, rather than via ALTER EXTENSION
 \echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.5'" to load this file. \quit
-- 
2.28.0

From cbcd983a350208dc7b9b583d2d5b1476d7acd1c2 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 26 Oct 2020 11:28:36 +0800
Subject: [PATCH v1 2/2] Handle = operator in pg_trgm.

---
 contrib/pg_trgm/Makefile |   2 +-
 contrib/pg_trgm/expected/pg_trgm.out | 204 ++-
 contrib/pg_trgm/pg_trgm.control  |   2 +-
 contrib/pg_trgm/sql/pg_trgm.sql  |  40 ++
 contrib/pg_trgm/trgm.h   |   1 +
 contrib/pg_trgm/trgm_gin.c   |   3 +
 contrib/pg_trgm/trgm_gist.c  |   2 +
 doc/src/sgml/pgtrgm.sgml |   7 +-
 8 files changed, 252 insertions(+), 9 deletions(-)

diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index d75e9ada2e..1fbdc9ec1e 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -9,7 +9,7 @@ OBJS = \
 	trgm_regexp.o
 
 EXTENSION = pg_trgm
-DATA = pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
+DATA = pg_trgm--1.5--1.6.sql pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
 	pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
 	pg_trgm--1.0--1.1.sql
 PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 923c326c7b..20141ce7f3 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -4761,6 +4761,12 @@ insert into test2 values ('abcdef');
 insert into test2 values ('quark');
 insert into test2 values ('  z foo bar');
 insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
 create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
 set enable_seqscan=off;
 explain (costs off)
@@ -4863,7 +4869,13 @@ select * from test2 where t ~ '(abc)*$';
  quark
z foo bar
  /123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
 
 select * from test2 where t ~* 'DEF';
t
@@ -4918,7 +4930,11 @@ select * from test2 where t ~ '[a-z]{3}';
  abcdef
  quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
 
 select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
  t 
@@ -4961,6 +4977,93 @@ select * from test2 where t ~ '/\d+/-\d';
  /123/-45/
 (1 row)
 
+-- test = operator
+explain (costs off)
+  select * from test2 where t = 'abcdef';
+QUERY PLAN
+--
+ Bitmap Heap Scan on test2
+   Recheck Cond: (t = 'abcdef'::text)
+   ->  Bitmap Index Scan on test2_idx_g

Re: Supporting = operator in gin/gist_trgm_ops

2020-10-25 Thread Julien Rouhaud
On Mon, Oct 26, 2020 at 12:02 PM Julien Rouhaud  wrote:
>
> On Mon, Oct 26, 2020 at 5:03 AM Tom Lane  wrote:
> >
> > Julien Rouhaud  writes:
> > > A French user recently complained that with an index created using
> > > gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
> > > like
> > > col LIKE 'something'
> > > but not
> > > col = 'something'
> >
> > Huh, I'd supposed we did that already.
> >
> > > even though both clauses are technically identical.  That's clearly
> > > not a high priority thing to support, but looking at the code it seems
> > > to me that this could be achieved quite simply: just adding a new
> > > operator = in the opclass, with an operator strategy number that falls
> > > back doing exactly what LikeStrategyNumber is doing and that's it.
> > > There shouldn't be any wrong results, even using wildcards as the
> > > recheck will remove any incorrect one.
> >
> > I think you may be overoptimistic about being able to use the identical
> > code path without regard for LIKE wildcards; but certainly it should be
> > possible to do this with not a lot of new code.  +1.
>
> Well, that's what I was thinking too, but I tried all the possible
> wildcard combinations I could think of and I couldn't find any case
> yielding wrong results.  As far as I can see the index scans return at
> least all the required rows, and all extraneous rows are correctly
> removed either by heap recheck or index recheck.
>
> I'm attaching a POC patch with regression tests covering those
> combinations.  I also found a typo in the 1.4--1.5 pg_trgm upgrade
> script, so I'm also attaching a patch for that.

Oops, I forgot to git-add the 1.5--1.6.sql upgrade script in the previous patch.
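For readers following along, an upgrade script of this kind essentially registers the new = operator with both operator classes.  A rough, non-authoritative approximation (the strategy number and guard line here are assumptions; the attached v2 patch contains the real pg_trgm--1.5--1.6.sql):

```sql
-- Illustrative shape of pg_trgm--1.5--1.6.sql, not the committed text.
-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.6'" to load this file. \quit

ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
    OPERATOR 11 pg_catalog.= (text, text);
ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
    OPERATOR 11 pg_catalog.= (text, text);
```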
From 3539eb386b5f15dc3c454cef4fee210d014bd91e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 26 Oct 2020 11:29:45 +0800
Subject: [PATCH v2 1/2] Fix typo in 1.4--1.5 pg_trgm upgrade script

---
 contrib/pg_trgm/pg_trgm--1.4--1.5.sql | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
index 284f88d325..db122fce0f 100644
--- a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
+++ b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
@@ -1,4 +1,4 @@
-/* contrib/pg_trgm/pg_trgm--1.5--1.5.sql */
+/* contrib/pg_trgm/pg_trgm--1.4--1.5.sql */
 
 -- complain if script is sourced in psql, rather than via ALTER EXTENSION
 \echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.5'" to load this file. \quit
-- 
2.28.0

From 38c6727c16814640fe9e3be0bc019a67642f7c56 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Mon, 26 Oct 2020 11:28:36 +0800
Subject: [PATCH v2 2/2] Handle = operator in pg_trgm.

---
 contrib/pg_trgm/Makefile  |   2 +-
 contrib/pg_trgm/expected/pg_trgm.out  | 204 +-
 contrib/pg_trgm/pg_trgm--1.5--1.6.sql |  11 ++
 contrib/pg_trgm/pg_trgm.control   |   2 +-
 contrib/pg_trgm/sql/pg_trgm.sql   |  40 +
 contrib/pg_trgm/trgm.h|   1 +
 contrib/pg_trgm/trgm_gin.c|   3 +
 contrib/pg_trgm/trgm_gist.c   |   2 +
 doc/src/sgml/pgtrgm.sgml  |   7 +-
 9 files changed, 263 insertions(+), 9 deletions(-)
 create mode 100644 contrib/pg_trgm/pg_trgm--1.5--1.6.sql

diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index d75e9ada2e..1fbdc9ec1e 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -9,7 +9,7 @@ OBJS = \
 	trgm_regexp.o
 
 EXTENSION = pg_trgm
-DATA = pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
+DATA = pg_trgm--1.5--1.6.sql pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
 	pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
 	pg_trgm--1.0--1.1.sql
 PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 923c326c7b..20141ce7f3 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -4761,6 +4761,12 @@ insert into test2 values ('abcdef');
 insert into test2 values ('quark');
 insert into test2 values ('  z foo bar');
 insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
 create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
 set enable_seqscan=off;
 explain (costs off)
@@ -4863,7 +4869,13 @@ select * from test2 where t ~ '(abc)*$';
  quark
z foo bar
  /123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
 
 select * from test2 where t ~* 'DEF';
t
@@ -4918,7 +4930,11 @@ select * from test2 where t ~ '[a-z]{3}';
  abcdef
  quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
 
 select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
  t 
@@ -4961,6 +4977,93 @@ select * from test2 where t 

Re: The ultimate extension hook.

2020-10-25 Thread Daniel Wood
> On 10/23/2020 9:31 AM Jehan-Guillaume de Rorthais  wrote:
> [...]
> * useless with encrypted traffic
> 
> So, +1 for such hooks.
> 
> Regards,

Ultimately, PostgreSQL is supposed to be extensible.
I don't see an API hook as being some crazy idea, even if some may not
like what I might want to use it for.  It can be useful for a number of
things.

- Dan




Re: Supporting = operator in gin/gist_trgm_ops

2020-10-25 Thread Tom Lane
Julien Rouhaud  writes:
> On Mon, Oct 26, 2020 at 5:03 AM Tom Lane  wrote:
>> I think you may be overoptimistic about being able to use the identical
>> code path without regard for LIKE wildcards; but certainly it should be
>> possible to do this with not a lot of new code.  +1.

> Well, that's what I was thinking too, but I tried all the possible
> wildcard combinations I could think of and I couldn't find any case
> yielding wrong results.  As far as I can see the index scans return at
> least all the required rows, and all extraneous rows are correctly
> removed either by heap recheck or index recheck.

But "does it get the right answers" isn't the only figure of merit.
If the index scan visits far more rows than necessary, that's bad.
Maybe it's OK given that we only make trigrams from alphanumerics,
but I'm not quite sure.

regards, tom lane




Re: Supporting = operator in gin/gist_trgm_ops

2020-10-25 Thread Julien Rouhaud
On Mon, Oct 26, 2020 at 12:19 PM Tom Lane  wrote:
>
> Julien Rouhaud  writes:
> > On Mon, Oct 26, 2020 at 5:03 AM Tom Lane  wrote:
> >> I think you may be overoptimistic about being able to use the identical
> >> code path without regard for LIKE wildcards; but certainly it should be
> >> possible to do this with not a lot of new code.  +1.
>
> > Well, that's what I was thinking too, but I tried all the possible
> > wildcard combinations I could think of and I couldn't find any case
> > yielding wrong results.  As far as I can see the index scans return at
> > least all the required rows, and all extraneous rows are correctly
> > removed either by heap recheck or index recheck.
>
> But "does it get the right answers" isn't the only figure of merit.
> If the index scan visits far more rows than necessary, that's bad.
> Maybe it's OK given that we only make trigrams from alphanumerics,
> but I'm not quite sure.

Ah, yes, this might lead to bad performance if the "fake wildcard"
matches too many rows, but that shouldn't be a very common use case,
and the only alternative would be to create trigrams for
non-alphanumeric characters.  I didn't try that, because it would mean
meaningful overhead for mainstream usage of pg_trgm and would also
break the on-disk format.  In my opinion, supporting = should be
best-effort, especially for such corner cases.