Re: Disallow cancellation of waiting for synchronous replication

2020-01-12 Thread Andrey Borodin



> On 11 Jan 2020, at 7:34, Bruce Momjian  wrote:
> 
> Actually, it might be worse than that.  In my reading of
> RecordTransactionCommit(), we do this:
> 
>   write to WAL
>   flush WAL (durable)
>   make visible to other backends
>   replicate
>   communicate to the client
> 
> I think this means we make the transaction commit visible to all
> backends _before_ we replicate it, and potentially wait until we get a
> replication reply to return SUCCESS to the client.
No. Data is not visible to other backends while we await sync rep. It's easy to
check.
In one psql session you can start waiting for sync rep:
postgres=# \d+ x
 Table "public.x"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 key    | integer |           | not null |         | plain    |              | 
 data   | text    |           |          |         | extended |              | 
Indexes:
"x_pkey" PRIMARY KEY, btree (key)
Access method: heap

postgres=# alter system set synchronous_standby_names to 'nonexistent';
ALTER SYSTEM
postgres=# select pg_reload_conf();
2020-01-12 16:09:58.167 +05 [45677] LOG:  received SIGHUP, reloading 
configuration files
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# insert into x values (7, '7');


In another session, try to see the inserted (already locally committed) data:

postgres=# select * from x where key = 7;
 key | data 
-----+------
(0 rows)


Try to insert the same data and the backend will hang on locks:

postgres=# insert into x values (7,'7') on conflict do nothing;

 ProcessQuery  (in postgres) + 189  [0x1014b05bd]
 standard_ExecutorRun  (in postgres) + 301  [0x101339fcd]
 ExecModifyTable  (in postgres) + 1106  [0x101362b62]
 ExecInsert  (in postgres) + 494  [0x10136344e]
 ExecCheckIndexConstraints  (in postgres) + 570  [0x10133910a]
 check_exclusion_or_unique_constraint  (in postgres) + 977  [0x101338db1]
 XactLockTableWait  (in postgres) + 176  [0x101492770]
 LockAcquireExtended  (in postgres) + 1274  [0x101493aaa]



Thanks!

Best regards, Andrey Borodin.



Re: [Logical Replication] TRAP: FailedAssertion("rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT || rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL || rel->rd_rel->relreplident == REPLICA_IDE

2020-01-12 Thread Michael Paquier
On Sat, Jan 11, 2020 at 10:34:20AM +0900, Michael Paquier wrote:
> Thanks for the lookup.  I'll look at that again in a couple of days
> and hopefully wrap it by then.

And done.
--
Michael




Re: our checks for read-only queries are not great

2020-01-12 Thread Laurenz Albe
On Fri, 2020-01-10 at 09:29 -0500, Tom Lane wrote:
> > ALTER SYSTEM is read only in my mind.
> 
> I'm still having trouble with this conclusion.  I think it can only
> be justified by a very narrow reading of "reflected in pg_dump"
> that relies on the specific factorization we've chosen for upgrade
> operations, ie that postgresql.conf mods have to be carried across
> by hand.  But that's mostly historical baggage, rather than a sane
> basis for defining "read only".  If somebody comes up with a patch
> that causes "pg_dumpall -g" to include ALTER SYSTEM SET commands for
> whatever is in postgresql.auto.conf (not an unreasonable idea BTW),
> will you then decide that ALTER SYSTEM SET is no longer read-only?

I think that having ALTER SYSTEM commands in pg_dumpall output
would be a problem.  It would cause all kinds of problems whenever
parameters change.  Thinking of the transition "checkpoint_segments"
-> "max_wal_size", you'd have to build some translation magic into pg_dump.
Besides, such a feature would make it harder to restore a dump taken
with version x into version x + n for n > 0.

> Or, perhaps, reject such a patch on the grounds that it breaks this
> arbitrary definition of read-only-ness?

I agree with Robert that such a patch should be rejected on other
grounds.

Concerning the topic of the thread, I personally have come to think
that changing GUCs is *not* writing to the database.  But that is based
on the fact that you can change GUCs on streaming replication standbys,
and it may be surprising to a newcomer.

Perhaps it would be good to consider this question:
Do we call something "read-only" if it changes nothing, or do we call it
"read-only" if it is allowed on a streaming replication standby?
The first would be more correct, but the second may be more convenient.

Yours,
Laurenz Albe





Re: our checks for read-only queries are not great

2020-01-12 Thread Tom Lane
Laurenz Albe  writes:
> Perhaps it would be good to consider this question:
> Do we call something "read-only" if it changes nothing, or do we call it
> "read-only" if it is allowed on a streaming replication standby?
> The first would be more correct, but the second may be more convenient.

Yeah, this is really the larger point at stake.  I'm not sure that
"read-only" and "allowed on standby" should be identical, nor even
that one should be an exact subset of the other.  They're certainly
by-and-large the same sets of operations, but there might be
exceptions that belong to only one set.  "read-only" is driven by
(some reading of) the SQL standard, while "allowed on standby" is
driven by implementation limitations, so I think it'd be dangerous
to commit ourselves to those being identical.

regards, tom lane




Re: [Proposal] Add accumulated statistics for wait event

2020-01-12 Thread Pavel Stehule
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I like this patch, because I used similar functionality some years ago very 
successfully. The implementation is fairly simple, and the result should be 
valid given the method used.

The potential problem is the performance impact. Very early tests show an 
impact of circa 3% in the worst case, but I'll try to repeat these tests.

There are some trailing whitespaces and unnecessary tabs.

The new status of this patch is: Waiting on Author


Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-12 Thread Peter
On Fri, Jan 10, 2020 at 10:51:50PM -0500, Tom Lane wrote:
! Here's a draft patch that cleans up all the logic errors I could find.

Okiee, thank You!
Let's see (was a bit busy yesterday trying to upgrade pgadmin3 -
difficult matter), now let's sort this out:

With the first patch applied (as from Friday - applied only on the
client side), the application did appear to work well.

But then, when engaging bandwidth-limiting to some modem-speed, it did
not work: psql would receive all (or most of) the data from a SELECT,
but then somehow not recognize the end of it and sit there and wait for
whatever:

> flowmdev=> select * from flows;
> ^CCancel request sent
> ^CCancel request sent


Now with the new patches 0001+0003 applied, on both server & client,
all now running 12.1 release, on a first run I did not perceive
a malfunction, bandwidth limited or not.
I'll leave them applied, but this here will not experience serious
loads; You'll need somebody else to test for that...


rgds,
PMc




Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-12 Thread Tom Lane
Peter  writes:
> With the first patch applied (as from Friday - applied only on the
> client side), the application did appear to work well.
> But then, when engaging bandwidth-limiting to some modem-speed, it did
> not work: psql would receive all (or most of) the data from a SELECT,
> but then somehow not recognize the end of it and sit there and wait for
> whatever:

Yeah, that's just the behavior I'd expect (and was able to reproduce
here) because of the additional design problem.

> Now with the new patches 0001+0003 applied, on both server & client,
> all now running 12.1 release, on a first run I did not perceive
> a malfunction, bandwidth limited or not.
> I'll leave them applied, but this here will not experience serious
> loads; You'll need somebody else to test for that...

Cool, let us know if you do see any problems.

regards, tom lane




Re: Consolidate 'unique array values' logic into a reusable function?

2020-01-12 Thread Noah Misch
On Wed, Jan 08, 2020 at 02:49:48PM +1300, Thomas Munro wrote:
> On Sun, Dec 29, 2019 at 8:02 PM Noah Misch  wrote:
> > ==00:00:00:28.322 1527557== Source and destination overlap in 
> > memcpy(0x1000104, 0x1000104, 4)
> > ==00:00:00:28.322 1527557==at 0x4C2E74D: memcpy@@GLIBC_2.14 
> > (vg_replace_strmem.c:1035)
> > ==00:00:00:28.322 1527557==by 0xA9A57B: qunique (qunique.h:34)
> > ==00:00:00:28.322 1527557==by 0xA9A843: InitCatalogCache 
> > (syscache.c:1056)
> > ==00:00:00:28.322 1527557==by 0xAB6B18: InitPostgres (postinit.c:682)
> > ==00:00:00:28.322 1527557==by 0x91F98E: PostgresMain (postgres.c:3909)
> > ==00:00:00:28.322 1527557==by 0x872DE9: BackendRun (postmaster.c:4498)
> > ==00:00:00:28.322 1527557==by 0x8725B3: BackendStartup 
> > (postmaster.c:4189)
> > ==00:00:00:28.322 1527557==by 0x86E7F4: ServerLoop (postmaster.c:1727)
> > ==00:00:00:28.322 1527557==by 0x86E0AA: PostmasterMain 
> > (postmaster.c:1400)
> > ==00:00:00:28.322 1527557==by 0x77CB56: main (main.c:210)
> > ==00:00:00:28.322 1527557==
> > {
> >
> >Memcheck:Overlap
> >fun:memcpy@@GLIBC_2.14
> >fun:qunique
> >fun:InitCatalogCache
> >fun:InitPostgres
> >fun:PostgresMain
> >fun:BackendRun
> >fun:BackendStartup
> >fun:ServerLoop
> >fun:PostmasterMain
> >fun:main
> > }
> >
> > This is like the problem fixed in 9a9473f; the precedent from there would be
> > to test src!=dst before calling memcpy(), e.g. as attached.  I suppose the
> > alternative would be to add a suppression like the one 9a9473f removed.
> 
> Thanks for fixing that.

Glad to help.
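
For reference, the shape of the guard is roughly as below.  This is only an
illustrative sketch of the idea (element type fixed to int for brevity), not
the committed qunique() code:

#include <stddef.h>
#include <string.h>

/*
 * Deduplicate a sorted array, skipping the copy when source and destination
 * are the same element; memcpy() with identical pointers is what Valgrind
 * flags as an overlap.
 */
static size_t
unique_ints(int *v, size_t n)
{
	size_t		i,
				j;

	if (n <= 1)
		return n;

	for (i = 1, j = 0; i < n; i++)
	{
		if (v[i] != v[j])
		{
			j++;
			if (&v[j] != &v[i])	/* avoid memcpy(p, p, size) */
				memcpy(&v[j], &v[i], sizeof(int));
		}
	}
	return j + 1;
}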

> > I do wonder why the Valgrind buildfarm animals haven't noticed.
> 
> Optimisation levels?  For example, I see that skink is using -Og, at
> which level my local GCC inlines qunique() and memcpy() so that in the
> case you quoted there's a MOV instruction and valgrind has nothing to
> complain about.

That explains it.  I would have been using -O0.  (It would be a nice bonus to
have both an -O0 Valgrind buildfarm animal and an optimized Valgrind animal,
since neither catches everything.)




Re: refactoring - standardize integer parsing in front-end utilities

2020-01-12 Thread Joe Nelson
Joe Nelson wrote:
> > I see this patch has been moved to the next commitfest. What's the next
> > step, does it need another review?

Peter Eisentraut wrote:
> I think you need to promote your patch better.

Thanks for taking the time to revive this thread.

Quick sales pitch for the patch:

* Standardizes bounds checking and error message format in utilities
  pg_standby, pg_basebackup, pg_receivewal, pg_recvlogical, pg_ctl,
  pg_dump, pg_restore, pg_upgrade, pgbench, reindexdb, and vacuumdb
* Removes Undefined Behavior caused by atoi() as described
  in the C99 standard, section 7.20.1
* Unifies integer parsing between the front- and back-end using
  functions provided in https://commitfest.postgresql.org/25/2272/

In reality I doubt my patch is fixing any pressing problem, it's just a
small refactor.
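
To give a flavor of what that means in practice, here is an illustrative
sketch (not the helper the patch actually adds) of the kind of checking
involved, as opposed to a bare atoi():

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Illustrative only: a strtol-based parse with explicit error and range
 * checks, unlike atoi(), whose behavior on out-of-range input is undefined.
 */
static bool
parse_int_option(const char *arg, int min, int max, int *result)
{
	char	   *endptr;
	long		val;

	errno = 0;
	val = strtol(arg, &endptr, 10);

	if (errno != 0 || endptr == arg || *endptr != '\0')
		return false;			/* not a (complete) integer */
	if (val < min || val > max)
		return false;			/* outside the allowed range */

	*result = (int) val;
	return true;
}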

> The thread subject and the commit fest entry title are somewhat
> nonsensical and no longer match what the patch actually does.

I thought changing the subject line might be discouraged, but since you
suggest doing it, I just did. Updated the title of the commitfest entry
https://commitfest.postgresql.org/26/2197/ as well.

> The patch contains no commit message

Does this list not accept plain patches, compatible with git-apply?
(Maybe your point is that I should make it as easy for committers as
possible, and asking them to invent a commit message is just extra
work.)

> and no documentation or test changes

The interfaces of the utilities remain the same. Same flags. The only
change would be the error messages produced for incorrect values.

The tests I ran passed successfully, but perhaps there were others I
didn't try running and should have.

> Moreover, a lot of this error message tweaking is opinion-based, so
> it's more burdensome to do an objective review.  This patch is
> competing for attention against more than 200 other patches that have
> more going for them in these aspects.

True. I think the code looks nicer and the errors are more informative,
but I'll leave it in the community's hands to determine if this is
something they want.

Once again, I appreciate your taking the time to help me with this
process.




Question regarding heap_multi_insert documentation

2020-01-12 Thread Daniel Gustafsson
While reading the code for heapam.c:heap_multi_insert I happened upon this
comment which I'm either too thick for, or it lacks a word or two:

 * ..
 * A check here does not definitively prevent a serialization anomaly;
 * that check MUST be done at least past the point of acquiring an
 * exclusive buffer content lock on every buffer that will be affected,
 * and MAY be done after all inserts are reflected in the buffers and
 * those locks are released; otherwise there race condition.  Since
 * multiple buffers can be locked and unlocked in the loop below, and it
 * would not be feasible to identify and lock all of those buffers before
 * the loop, we must do a final check at the end.
 * ..

The part I don't understand is "otherwise there race condition", it doesn't
sound complete to me as a non-native english speaker.  Should that really be
"otherwise there *is a (potential)* race condition" or something similar?

cheers ./daniel




Re: Question regarding heap_multi_insert documentation

2020-01-12 Thread Tom Lane
Daniel Gustafsson  writes:
> The part I don't understand is "otherwise there race condition", it doesn't
> sound complete to me as a non-native english speaker.  Should that really be
> "otherwise there *is a (potential)* race condition" or something similar?

I agree, it's missing "is a".

regards, tom lane




Re: Question regarding heap_multi_insert documentation

2020-01-12 Thread Daniel Gustafsson
> On 13 Jan 2020, at 00:25, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> The part I don't understand is "otherwise there race condition", it doesn't
>> sound complete to me as a non-native english speaker.  Should that really be
>> "otherwise there *is a (potential)* race condition" or something similar?
> 
> I agree, it's missing "is a".

Thanks for clarifying. PFA tiny patch for this.

cheers ./daniel



multiinsert_comment.diff
Description: Binary data


Re: benchmarking Flex practices

2020-01-12 Thread Tom Lane
John Naylor  writes:
>> I no longer use state variables to track scanner state, and in fact I
>> removed the existing "state_before" variable in ECPG. Instead, I used
>> the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state().
>> These have been a feature for a long time, it seems, so I think we're
>> okay as far as portability. I think it's cleaner this way, and
>> possibly faster.

Hmm ... after a bit of research I agree that these functions are not
a portability hazard.  They are present at least as far back as flex
2.5.33 which is as old as we've got in the buildfarm.

However, I'm less excited about them from a performance standpoint.
The BEGIN() macro expands to (ordinarily)

yyg->yy_start = integer-constant

which is surely pretty cheap.  However, yy_push_state is substantially
more expensive than that, not least because the first invocation in
a parse cycle will involve a malloc() or palloc().  Likewise yy_pop_state
is multiple times more expensive than plain BEGIN().

Now, I agree that this is negligible for ECPG's usage, so if
pushing/popping state is helpful there, let's go for it.  But I am
not convinced it's negligible for the backend, and I also don't
see that we actually need to track any nested scanner states there.
So I'd rather stick to using BEGIN in the backend.  Not sure about
psql.

BTW, while looking through the latest patch it struck me that
"UCONST" is an underspecified and potentially confusing name.
It doesn't indicate what kind of constant we're talking about,
for instance a C programmer could be forgiven for thinking
it means something like "123U".  What do you think of "USCONST",
following UIDENT's lead of prefixing U onto whatever the
underlying token type is?

regards, tom lane




Re: Using multiple extended statistics for estimates

2020-01-12 Thread Tomas Vondra

Hi,

I've pushed these two improvements after some minor adjustments, mostly
to comments. I ended up not using the extra tests, as it wasn't clear to
me that they were worth the extra duration.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: vacuum verbose detail logs are unclear; log at *start* of each stage; show allvisible/frozen/hintbits

2020-01-12 Thread Justin Pryzby
On Sun, Dec 29, 2019 at 01:15:24PM -0500, Jeff Janes wrote:
> On Fri, Dec 20, 2019 at 12:11 PM Justin Pryzby  wrote:
> 
> > This is a usability complaint.  If one knows enough about vacuum and/or
> > logging, I'm sure there's no issue.
> 
> > | 11  DEBUG:  "t": found 999 removable, 999 nonremovable row versions in 9 
> > out of 9 pages
> 
> I agree the mixture of pre-action and after-action reporting is rather
> confusing sometimes.  I'm more concerned about what the user sees in their
> terminal, though, rather than the server's log file.

Sorry, I ran vacuum (not verbose) with client_min_messages=debug, which was 
confusing.

> Also, the above quoted line is confusing.  It makes it sound like it found
> removable items, but didn't actually remove them.  I think that that is
> taking grammatical parallelism too far.  How about something like:
> 
> DEBUG:  "t": removed 999 row versions, found 999 nonremovable row versions in 
> 9 out of 9 pages

Since da4ed8bf, lazy_vacuum_heap() actually says: "removed %d [row versions] in
%d pages".  Strangely, the "found .. removable, .. nonremovable" in
lazy_scan_heap() is also from da4ed8bf.  Should we change them to match ?

> Also, I'd appreciate a report on how many hint-bits were set
> and how many pages were marked all-visible and/or frozen.

Possibly should fork this part to a different thread, but..
hint bits are being set by heap_prune_chain():

|#0  HeapTupleSatisfiesVacuum (htup=htup@entry=0x7fffabf0, 
OldestXmin=OldestXmin@entry=536, buffer=buffer@entry=167) at 
heapam_visibility.c:1245
|#1  0x7fb6eb3eb848 in heap_prune_chain (prstate=0x7fffabfccf30, 
OldestXmin=536, rootoffnum=1, buffer=167, relation=0x7fb6eb1e6858) at 
pruneheap.c:488
|#2  heap_page_prune (relation=relation@entry=0x7fb6eb1e6858, 
buffer=buffer@entry=167, OldestXmin=536, report_stats=report_stats@entry=false, 
latestRemovedXid=latestRemovedXid@entry=0x7fb6ed84a13c) at pruneheap.c:223
|#3  0x7fb6eb3f02a2 in lazy_scan_heap (aggressive=false, nindexes=0, 
Irel=0x0, vacrelstats=0x7fb6ed84a0c0, params=0x7fffabfcdfd0, 
onerel=0x7fb6eb1e6858) at vacuumlazy.c:970
|#4  heap_vacuum_rel (onerel=0x7fb6eb1e6858, params=0x7fffabfcdfd0, 
bstrategy=) at vacuumlazy.c:302

In the attached, I moved heap_page_prune to avoid a second loop over items.
Then, initdb crashed until I avoided calling heap_prepare_freeze_tuple() for
HEAPTUPLE_DEAD.  I'm not sure whether that's OK, or whether it's exposing an issue.
I'm also not sure if t_infomask!=oldt_infomask is the right test.

One of my usability complaints was that the DETAIL includes newlines, which
makes it not apparent that it's detail, or that it's associated with the
preceding INFO.  Should those all be separate DETAIL messages (currently, only
the first errdetail is used, but maybe they should be catted together
usefully)?  Should errdetail do something with newlines, like change them to
\n\t for output to the client (but not the logfile)?  Should vacuum itself do
something (but probably no change to logfiles)?

I remembered that log_statement_stats looks like this:

2020-01-01 11:28:33.758 CST [3916] LOG:  EXECUTOR STATISTICS
2020-01-01 11:28:33.758 CST [3916] DETAIL:  ! system usage stats:
!   0.050185 s user, 0.000217 s system, 0.050555 s elapsed
!   [2.292346 s user, 0.215656 s system total]
[...]


It calls errdetail_internal("%s", str.data), same as vacuum, but the multi-line
detail messages are written like this:
|appendStringInfo(&str, "!\t...")
|...
|ereport(LOG,
|   (errmsg_internal("%s", title),
|   errdetail_internal("%s", str.data)));

Since they can run multiple times, each including rusage, and there's not
currently any message shown before their action, I propose that
lazy_vacuum_index/heap should write VACUUM VERBOSE logs at DEBUG level.  Or
otherwise show a log before starting each action, at least those for which it
logs completion.

I'm not sure why this one doesn't use ngettext()?  Missed at a8d585c0?
|appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),
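
For comparison, a hypothetical ngettext() form of the above would look
something like this (a sketch only, not a proposed hunk; it assumes the
surrounding vacuumlazy.c context, i.e. buf is the StringInfoData and nunused
the accumulated count):

	appendStringInfo(&buf,
					 ngettext("There was %.0f unused item identifier.\n",
							  "There were %.0f unused item identifiers.\n",
							  (unsigned long) nunused),
					 nunused);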

Or why this one uses _()/gettext()?  (580ddcec suggests that I'm missing
something?)
|appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));

Anyway, now it looks like this:
postgres=# VACUUM VERBOSE t;
INFO:  vacuuming "pg_temp_3.t"
INFO:  "t": removed 1998 row versions in 5 pages
INFO:  "t": removed 1998, found 999 nonremovable row versions in 9 out of 9 
pages
DETAIL:  ! 0 dead row versions cannot be removed yet, oldest xmin: 4505
!   There were 0 unused item identifiers.
!   Skipped 0 pages due to buffer pins, 0 frozen pages.
!   0 pages are entirely empty.
!   Marked 9 pages all visible, 4 pages frozen.
!   Wrote 1998 hint bits.
!   CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Thanks for your input.

Justin
>From d27f2cb5262ec3d3511de44ec7c15d1b1235e5ee Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 27 Nov 2019 20:07:10 -0600
Subject: [PATCH v1 1/6] Rename buf to avoid sh

Re: bitmaps and correlation

2020-01-12 Thread Justin Pryzby
On Mon, Jan 06, 2020 at 11:26:06PM -0600, Justin Pryzby wrote:
> As Jeff has pointed out, high correlation has two effects in cost_index():
> 1) the number of pages read will be less;
> 2) the pages will be read more sequentially;
> 
> cost_index reuses the pages_fetched variable, so (1) isn't particularly clear,

I tried to make this more clear in 0001

> +   cost_per_page_corr = spc_random_page_cost -
> +   (spc_random_page_cost - spc_seq_page_cost)
> +   * (1-correlation*correlation);

And fixed bug: this should be c*c not 1-c*c.
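
Spelled out, the corrected interpolation is roughly the following (a sketch
for illustration, not the actual hunk); correlation 1 reduces it to the
sequential page cost and correlation 0 to the random page cost:

/* sketch only: per-page cost interpolated by the squared correlation */
static double
interpolated_page_cost(double random_page_cost, double seq_page_cost,
					   double correlation)
{
	double		csquared = correlation * correlation;

	return random_page_cost - (random_page_cost - seq_page_cost) * csquared;
}
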
>From d0819177ef1c6f86a588e3d2700ecff638f83b4a Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 8 Jan 2020 19:23:51 -0600
Subject: [PATCH v4 1/2] Make more clear the computation of min/max IO..

..and specifically the double use and effect of correlation.

Avoid re-use of the "pages_fetched" variable
---
 src/backend/optimizer/path/costsize.c | 47 +++
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index b5a0033..bdc23a0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -491,12 +491,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 csquared;
 	double		spc_seq_page_cost,
 spc_random_page_cost;
-	Cost		min_IO_cost,
+	double		min_pages_fetched,	/* The min and max page count based on index correlation */
+max_pages_fetched;
+	Cost		min_IO_cost,	/* The min and max cost based on index correlation */
 max_IO_cost;
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
 	double		tuples_fetched;
-	double		pages_fetched;
 	double		rand_heap_pages;
 	double		index_pages;
 
@@ -579,7 +580,8 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 	 * (just after a CLUSTER, for example), the number of pages fetched should
 	 * be exactly selectivity * table_size.  What's more, all but the first
 	 * will be sequential fetches, not the random fetches that occur in the
-	 * uncorrelated case.  So if the number of pages is more than 1, we
+	 * uncorrelated case (the index is expected to read fewer pages, *and* each
+	 * page read is cheaper).  So if the number of pages is more than 1, we
 	 * ought to charge
 	 *		spc_random_page_cost + (pages_fetched - 1) * spc_seq_page_cost
 	 * For partially-correlated indexes, we ought to charge somewhere between
@@ -604,17 +606,17 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		 * pro-rate the costs for one scan.  In this case we assume all the
 		 * fetches are random accesses.
 		 */
-		pages_fetched = index_pages_fetched(tuples_fetched * loop_count,
+		max_pages_fetched = index_pages_fetched(tuples_fetched * loop_count,
 			baserel->pages,
 			(double) index->pages,
 			root);
 
 		if (indexonly)
-			pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
+			max_pages_fetched = ceil(max_pages_fetched * (1.0 - baserel->allvisfrac));
 
-		rand_heap_pages = pages_fetched;
+		rand_heap_pages = max_pages_fetched;
 
-		max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
+		max_IO_cost = (max_pages_fetched * spc_random_page_cost) / loop_count;
 
 		/*
 		 * In the perfectly correlated case, the number of pages touched by
@@ -626,17 +628,17 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		 * where such a plan is actually interesting, only one page would get
 		 * fetched per scan anyway, so it shouldn't matter much.)
 		 */
-		pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
+		min_pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
 
-		pages_fetched = index_pages_fetched(pages_fetched * loop_count,
+		min_pages_fetched = index_pages_fetched(min_pages_fetched * loop_count,
 			baserel->pages,
 			(double) index->pages,
 			root);
 
 		if (indexonly)
-			pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
+			min_pages_fetched = ceil(min_pages_fetched * (1.0 - baserel->allvisfrac));
 
-		min_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
+		min_IO_cost = (min_pages_fetched * spc_random_page_cost) / loop_count;
 	}
 	else
 	{
@@ -644,30 +646,31 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		 * Normal case: apply the Mackert and Lohman formula, and then
 		 * interpolate between that and the correlation-derived result.
 		 */
-		pages_fetched = index_pages_fetched(tuples_fetched,
+
+		/* For the perfectly uncorrelated case (csquared=0) */
+		max_pages_fetched = index_pages_fetched(tuples_fetched,
 			baserel->pages,
 			(double) index->pages,
 			root);
 
 		if (indexonly)
-			pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));
+			max_pages_fetched = ceil(max_pages_fetched * (1.0 - baserel->allvisfrac));
 
-		rand_heap_pages = pages_fetched;
+	

Re: Questions/Observations related to Gist vacuum

2020-01-12 Thread Dilip Kumar
On Thu, Jan 9, 2020 at 4:41 PM Mahendra Singh Thalor  wrote:
>
> On Mon, 9 Dec 2019 at 14:37, Amit Kapila  wrote:
> >
> > On Mon, Dec 9, 2019 at 2:27 PM Amit Kapila  wrote:
> > >
> > > I have modified the patch for the above points and additionally ran
> > > pgindent.  Let me know what you think about the attached patch?
> > >
> >
> > A new version with a slightly modified commit message.
>
> I reviewed v4 patch and below is the one review comment:
>
> + * These are used to memorize all internal and empty leaf pages. They are
> + * used for deleting all the empty pages.
>   */
> After a dot, there should be 2 spaces. Earlier, there were 2 spaces.
>
> Other than that patch looks fine to me.
>
Thanks for the comment. Amit has already taken care of this before pushing it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2020-01-12 Thread Amit Kapila
On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
 wrote:
>
> On Sat, 11 Jan 2020 at 13:18, Amit Kapila  wrote:
> >
> > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> >  wrote:
> > >
> > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor  
> > > wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
> > > > >
> > > > > Hi
> > > > > Thank you for update! I looked again
> > > > >
> > > > > (vacuum_indexes_leader)
> > > > > +   /* Skip the indexes that can be processed by parallel 
> > > > > workers */
> > > > > +   if (!skip_index)
> > > > > +   continue;
> > > > >
> > > > > Does the variable name skip_index not confuse here? Maybe rename to 
> > > > > something like can_parallel?
> > > >
> > > > I also agree with your point.
> > >
> > > I don't think the change is a good idea.
> > >
> > > -   boolskip_index = (get_indstats(lps->lvshared, 
> > > i) == NULL ||
> > > - 
> > > skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > +   boolcan_parallel = 
> > > (get_indstats(lps->lvshared, i) == NULL ||
> > > +   
> > > skip_parallel_vacuum_index(Irel[i],
> > > + 
> > >  lps->lvshared));
> > >
> > > The above condition is true when the index can *not* do parallel index 
> > > vacuum. How about changing it to skipped_index and change the comment to 
> > > something like “We are interested in only index skipped parallel vacuum”?
> > >
> >
> > Hmm, I find the current code and comment better than what you or
> > Sergei are proposing.  I am not sure what is the point of confusion in
> > the current code?
>
> Yeah the current code is also good. I just thought they were concerned
> that the variable name skip_index might be confusing because we skip
> if skip_index is NOT true.
>

Okay, would it be better if we get rid of this variable and have code like below?

/* Skip the indexes that can be processed by parallel workers */
if ( !(get_indstats(lps->lvshared, i) == NULL ||
skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
continue;
...

> >
> > > >
> > > > >
> > > > > Another question about behavior on temporary tables. Use case: the 
> > > > > user commands just "vacuum;" to vacuum entire database (and has 
> > > > > enough maintenance workers). Vacuum starts fine in parallel, but on 
> > > > > first temporary table we hit:
> > > > >
> > > > > +   if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 0)
> > > > > +   {
> > > > > +   ereport(WARNING,
> > > > > +   (errmsg("disabling parallel option of 
> > > > > vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > +   
> > > > > RelationGetRelationName(onerel;
> > > > > +   params->nworkers = -1;
> > > > > +   }
> > > > >
> > > > > And therefore we turn off the parallel vacuum for the remaining 
> > > > > tables... Can we improve this case?
> > > >
> > > > Good point.
> > > > Yes, we should improve this. I tried to fix this.
> > >
> > > +1
> > >
> >
> > Yeah, we can improve the situation here.  I think we don't need to
> > change the value of params->nworkers at first place if allow
> > lazy_scan_heap to take care of this.  Also, I think we shouldn't
> > display warning unless the user has explicitly asked for parallel
> > option.  See the fix in the attached patch.
>
> Agreed. But with the updated patch the PARALLEL option without the
> parallel degree doesn't display warning because params->nworkers = 0
> in that case. So how about restoring params->nworkers at the end of
> vacuum_rel()?
>

I had also thought along those lines, but I was not entirely sure about
this resetting of workers.  Today, thinking about it again, the idea
Mahendra is suggesting, namely giving an error if the parallel degree
is not specified, seems reasonable to me.  This means Vacuum (parallel),
Vacuum (parallel) , etc. will give an error "parallel degree must be
specified".  This idea has merit now that we are supporting parallel
vacuum by default, so a 'parallel' option without a parallel degree
doesn't have any meaning.  If we do that, then we don't need to do
anything additional about the handling of temp tables (other than what
the patch is already doing) either.  What do you think?
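
To be concrete, what I have in mind is roughly the following (an untested
sketch; the exact placement in the options loop and the wording are open):

/*
 * untested sketch: while walking the VACUUM options (DefElem list), reject
 * PARALLEL given without a degree instead of silently falling back
 */
if (strcmp(opt->defname, "parallel") == 0 && opt->arg == NULL)
	ereport(ERROR,
			(errcode(ERRCODE_SYNTAX_ERROR),
			 errmsg("parallel degree must be specified")));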



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2020-01-12 Thread Dilip Kumar
On Mon, Jan 13, 2020 at 9:20 AM Amit Kapila  wrote:
>
> On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
>  wrote:
> >
> > On Sat, 11 Jan 2020 at 13:18, Amit Kapila  wrote:
> > >
> > > On Sat, Jan 11, 2020 at 9:23 AM Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 20:54, Mahendra Singh Thalor 
> > > >  wrote:
> > > > >
> > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
> > > > > >
> > > > > > Hi
> > > > > > Thank you for update! I looked again
> > > > > >
> > > > > > (vacuum_indexes_leader)
> > > > > > +   /* Skip the indexes that can be processed by 
> > > > > > parallel workers */
> > > > > > +   if (!skip_index)
> > > > > > +   continue;
> > > > > >
> > > > > > Does the variable name skip_index not confuse here? Maybe rename to 
> > > > > > something like can_parallel?
> > > > >
> > > > > I also agree with your point.
> > > >
> > > > I don't think the change is a good idea.
> > > >
> > > > -   boolskip_index = 
> > > > (get_indstats(lps->lvshared, i) == NULL ||
> > > > - 
> > > > skip_parallel_vacuum_index(Irel[i], lps->lvshared));
> > > > +   boolcan_parallel = 
> > > > (get_indstats(lps->lvshared, i) == NULL ||
> > > > +   
> > > > skip_parallel_vacuum_index(Irel[i],
> > > > +   
> > > >lps->lvshared));
> > > >
> > > > The above condition is true when the index can *not* do parallel index 
> > > > vacuum. How about changing it to skipped_index and change the comment 
> > > > to something like “We are interested in only index skipped parallel 
> > > > vacuum”?
> > > >
> > >
> > > Hmm, I find the current code and comment better than what you or
> > > Sergei are proposing.  I am not sure what is the point of confusion in
> > > the current code?
> >
> > Yeah the current code is also good. I just thought they were concerned
> > that the variable name skip_index might be confusing because we skip
> > if skip_index is NOT true.
> >
>
> Okay, would it be better if we get rid of this variable and have code like below?
>
> /* Skip the indexes that can be processed by parallel workers */
> if ( !(get_indstats(lps->lvshared, i) == NULL ||
> skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
> continue;
> ...
>
> > >
> > > > >
> > > > > >
> > > > > > Another question about behavior on temporary tables. Use case: the 
> > > > > > user commands just "vacuum;" to vacuum entire database (and has 
> > > > > > enough maintenance workers). Vacuum starts fine in parallel, but on 
> > > > > > first temporary table we hit:
> > > > > >
> > > > > > +   if (RelationUsesLocalBuffers(onerel) && params->nworkers >= 
> > > > > > 0)
> > > > > > +   {
> > > > > > +   ereport(WARNING,
> > > > > > +   (errmsg("disabling parallel option 
> > > > > > of vacuum on \"%s\" --- cannot vacuum temporary tables in parallel",
> > > > > > +   
> > > > > > RelationGetRelationName(onerel;
> > > > > > +   params->nworkers = -1;
> > > > > > +   }
> > > > > >
> > > > > > And therefore we turn off the parallel vacuum for the remaining 
> > > > > > tables... Can we improve this case?
> > > > >
> > > > > Good point.
> > > > > Yes, we should improve this. I tried to fix this.
> > > >
> > > > +1
> > > >
> > >
> > > Yeah, we can improve the situation here.  I think we don't need to
> > > change the value of params->nworkers at first place if allow
> > > lazy_scan_heap to take care of this.  Also, I think we shouldn't
> > > display warning unless the user has explicitly asked for parallel
> > > option.  See the fix in the attached patch.
> >
> > Agreed. But with the updated patch the PARALLEL option without the
> > parallel degree doesn't display warning because params->nworkers = 0
> > in that case. So how about restoring params->nworkers at the end of
> > vacuum_rel()?
> >
>
> I had also thought along those lines, but I was not entirely sure about
> this resetting of workers.  Today, thinking about it again, the idea
> Mahendra is suggesting, namely giving an error if the parallel degree
> is not specified, seems reasonable to me.  This means Vacuum (parallel),
> Vacuum (parallel) , etc. will give an error "parallel degree must be
> specified".  This idea has merit now that we are supporting parallel
> vacuum by default, so a 'parallel' option without a parallel degree
> doesn't have any meaning.  If we do that, then we don't need to do
> anything additional about the handling of temp tables (other than what
> the patch is already doing) either.  What do you think?
>
This idea makes sense to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Comment fix in session.h

2020-01-12 Thread Antonin Houska
This diff fixes what I consider a typo.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

diff --git a/src/include/access/session.h b/src/include/access/session.h
index ac552105b9..4c1f6ffd40 100644
--- a/src/include/access/session.h
+++ b/src/include/access/session.h
@@ -19,7 +19,7 @@ struct SharedRecordTypmodRegistry;
 
 /*
  * A struct encapsulating some elements of a user's session.  For now this
- * manages state that applies to parallel query, but it principle it could
+ * manages state that applies to parallel query, but in principle it could
  * include other things that are currently global variables.
  */
 typedef struct Session


How to make a OpExpr check compatible among different versions

2020-01-12 Thread Andy Fan
While working on a logical rewrite, I want to check whether an expr is
a given Expr.

So the simplest way is:
if (expr->opno == 418 && nodeTag(linitial(expr->args)) == T_xxx &&
    nodeTag(lsecond(expr->args)) == T_ )
{
    ..
}

If we write code like the above, we may have issues if the OID changes
in a future version.
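
One alternative I can think of is to resolve the OID by name and operand
types at run time rather than hard-coding it, roughly like below (just a
sketch; the headers, helper names, and the example operator are from memory
and may need adjusting):

#include "postgres.h"

#include "catalog/namespace.h"
#include "catalog/pg_type.h"
#include "nodes/primnodes.h"
#include "nodes/value.h"

/*
 * sketch: look the operator up once by name and operand types instead of
 * comparing expr->opno against a literal OID such as 418
 */
static bool
is_wanted_opexpr(OpExpr *expr)
{
	static Oid	wanted_opno = InvalidOid;

	if (!OidIsValid(wanted_opno))
		wanted_opno = OpernameGetOprid(list_make1(makeString("=")),
									   INT4OID, INT4OID);

	return OidIsValid(wanted_opno) &&
		expr->opno == wanted_opno &&
		list_length(expr->args) == 2 &&
		IsA(linitial(expr->args), Var) &&
		IsA(lsecond(expr->args), Const);
}
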
So what would be your suggestion?

Thanks