RE: pl/perl extension fails on Windows
* Noah Misch wrote: > On Wed, Nov 29, 2017 at 11:45:35PM -0500, Tom Lane wrote: >> Oh, OK. In that case, we need to get some representatives of these >> more modern builds into the buildfarm while we're at it. > > Yep. Among machines already in the buildfarm, the one running member > woodlouse is the best candidate for this. Its owner could install > http://strawberryperl.com/download/5.26.1.1/strawberry-perl-5.26.1.1-32bit.msi > and setup another animal on the same machine that builds 32-bit and enables > Perl. Christian, are you interested in doing this? Ready to go, waiting for animal assignment. For now, I can confirm that it works, that is, the buildfarm --test run is successful. Although I have to admit, I fail to see the need for Windows x86 builds, too. Who in their right mind would want them today? -- Christian
Re: [HACKERS] Uninterruptible slow geo_ops.c
[Replying to an old thread...] > A customer of ours hit some very slow code while running the > @>(polygon, polygon) operator with some big polygons. I'm not familiar > with this stuff but I think the problem is that the algorithm converges > too slowly to a solution and also has some pretty expensive calls > somewhere. (Perhaps there is also a problem that the algorithm *never* > converges for some inputs ...) I believe there is a simple flaw in the algorithm that causes it to loop endlessly. It should stop looking at the other segments of the polygon after finding the touching one. The attached patch fixes the issue for the query posted to this thread. I suggest backpatching the fix. There are many code quality issues and bugs like this still unresolved in geo_ops.c. I am trying to improve it; any help is appreciated: https://commitfest.postgresql.org/16/1160/ 0001-polygon-contains-lseg-loop-fix-v00.patch Description: Binary data
Re: Rethinking MemoryContext creation
Peter Geoghegan writes: > On Sat, Dec 9, 2017 at 5:53 PM, Tom Lane wrote: >> Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario, >> although that number is a bit shaky since the run-to-run variation >> is a few percent anyway. > Is that with "-M prepared", too? No, I didn't use that. regards, tom lane
Re: Postgres with pthread
On 06/12/2017 17:26, Andreas Karlsson wrote: An additional issue is that this could break a lot of extensions, and in a way that is not apparent at compile time. This means we may need to break all extensions to force extension authors to check whether they are thread safe. I do not like making life hard for our extension community, but if the gains are big enough it might be worth it. It seems to me that the counter-argument is that extensions that naturally support threading will benefit. For example, it may become a lot more practical to have CLR or JVM extensions.
Inconsistency in plpgsql's error context reports
I noticed that since I put in commit 390d58135, buildfarm members that use CLOBBER_CACHE_ALWAYS are failing the added test case with diffs like *** 5051,5057 NOTICE: y = 2 ERROR: record "r" is not assigned yet DETAIL: The tuple structure of a not-yet-assigned record is indeterminate. ! CONTEXT: PL/pgSQL function inline_code_block line 15 at RAISE -- Check handling of conflicts between plpgsql vars and table columns. set plpgsql.variable_conflict = error; create function conflict_test() returns setof int8_tbl as $$ --- 5051,5058 NOTICE: y = 2 ERROR: record "r" is not assigned yet DETAIL: The tuple structure of a not-yet-assigned record is indeterminate. ! CONTEXT: SQL statement "SELECT r" ! PL/pgSQL function inline_code_block line 15 at RAISE -- Check handling of conflicts between plpgsql vars and table columns. set plpgsql.variable_conflict = error; create function conflict_test() returns setof int8_tbl as $$ The reason for the difference turns out to be that in a CCA build, we are forcing recreation of the cached plan for the simple expression "r", so the error gets detected during parse analysis of "SELECT r" rather than during execution of the expression. And because this all mostly relies on SPI, we've gone through SPI_plan_get_cached_plan, which inserts _SPI_error_callback into the error context stack. That's where the first CONTEXT line is coming from. There seem to be two ways we could look at this. One is that the new test case just needs to be rejiggered to avoid unstable output ("\set VERBOSITY terse" would be the easiest way). But there is also room to argue that it's bad that plpgsql produces error reports that vary depending on the phase of the moon, which is pretty much what this would look like in the field --- cache flushes will occur unpredictably in most application environments. In that view, where exec_eval_simple_expr() bypasses SPI but nonetheless "has to do some of the things SPI_execute_plan would do", one of the things it ought to be doing is setting up an error context stack entry to duplicate the one that SPI_execute_plan would push. I'm of mixed mind about whether this is a good idea. Adding more overhead to exec_eval_simple_expr() isn't great, even though it'd be just a few instructions. And in most cases the statement-level context line that you get anyway ought to be sufficient to localize the problem. But it's not too hard to imagine that a CONTEXT line that only shows up some of the time could break somebody's error handling code. It's also arguable that it's weird that you get different CONTEXT reports depending on whether exec_simple_check_plan thinks the expression is simple or not, eg regression=# do $$ declare x int := 1; begin raise notice '%', x/0; end $$; ERROR: division by zero CONTEXT: PL/pgSQL function inline_code_block line 2 at RAISE regression=# do $$ declare x int := 1; begin raise notice '%', (select x/0); end $$; ERROR: division by zero CONTEXT: SQL statement "SELECT (select x/0)" PL/pgSQL function inline_code_block line 2 at RAISE Thoughts? regards, tom lane
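For illustration, a minimal sketch of the context entry exec_eval_simple_expr() would have to push to duplicate SPI's behavior, assuming _SPI_error_callback (currently static in spi.c) were made reachable from plpgsql; the variable names are illustrative only:

    ErrorContextCallback spierrcontext;

    /* mimic the context entry SPI_execute_plan would push */
    spierrcontext.callback = _SPI_error_callback;
    spierrcontext.arg = (void *) expr->query;   /* text of the simple expression */
    spierrcontext.previous = error_context_stack;
    error_context_stack = &spierrcontext;

    /* ... evaluate the cached plan for the simple expression ... */

    /* pop the context entry again on the way out */
    error_context_stack = spierrcontext.previous;

This is the "few instructions" of overhead weighed above: two pointer assignments on entry and one on exit, paid on every simple-expression evaluation.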
Re: pl/perl extension fails on Windows
On Sun, Dec 10, 2017 at 12:36:13PM +, Christian Ullrich wrote: > * Noah Misch wrote: > > On Wed, Nov 29, 2017 at 11:45:35PM -0500, Tom Lane wrote: > >> Oh, OK. In that case, we need to get some representatives of these > >> more modern builds into the buildfarm while we're at it. > > > > Yep. Among machines already in the buildfarm, the one running member > > woodlouse is the best candidate for this. Its owner could install > > http://strawberryperl.com/download/5.26.1.1/strawberry-perl-5.26.1.1-32bit.msi > > and setup another animal on the same machine that builds 32-bit and enables > > Perl. Christian, are you interested in doing this? > > Ready to go, waiting for animal assignment. For now, I can confirm that it > works, that is, the buildfarm --test run is successful. Thanks! > Although I have to admit, I fail to see the need for Windows x86 builds, too. > Who in their right mind would want them today? I can't see installing a 32-bit Windows postmaster in 2018. The 32-bit libpq might be useful. Download statistics for the binary distributions would be informative. On the other hand, removing 32-bit Windows support would eliminate three veteran buildfarm animals, and maintenance was cheap in the few years before this thread. I don't think today is the day to remove support, but it's coming one of these years.
Re: [HACKERS] Uninterruptible slow geo_ops.c
Emre Hasegeli writes: > I believe there is a simple flaw on the algorithm that causes it loop > endlessly. It should stop looking at the other segments of the > polygon after finding the touching one. The attached patch fixes the > issue for the query posted to this thread. I suggest backpacking the > fix. I think this fix is wrong, because it assumes that the answer of touched_lseg_inside_poly is authoritative, which it clearly isn't; the blind "return true" for noncollinear segments breaks it. That is, if "a" is an interior point on the polygon's first segment, but "b" is outside the polygon, then lseg_inside_poly's first on_ps_internal call will succeed, its second will not, then it will call touched_lseg_inside_poly. All four of the latter's tests will fail and it'll return true. Your patch would have us break out of the loop without looking at any subsequent segments, which is wrong. It's a bit difficult to demonstrate the error, because lseg_inside_poly isn't exposed directly, only through poly_contain (maybe we should add an operator that does expose it?). But after experimenting awhile I found a counterexample: select '(0,0),(10,0),(0,10)'::polygon @> '(6,0),(7,0),(6,6)'::polygon; This surely should return false, and it does in HEAD, but your patch makes it return true. There are other things to be unhappy about. As written, lseg_inside_poly and touched_lseg_inside_poly can mutually recurse to a depth equal to the number of points in the polygon, which opens a stack-overflow hazard. I'd like to get rid of the recursion; it seems unnecessary. Also, while it's sadly underdocumented what touched_lseg_inside_poly is meant to accomplish at all, I think what it's there for is to deal with cases involving collinear polygon segments. That is, if we inject some useless points into the polygon, say '(0,0),(5,0),(10,0),(0,10)'::polygon then it should still be true that the line segment '(1,0),(9,0)' is "inside" this polygon, but we won't find that out if we just consider the (0,0),(5,0) and (5,0),(10,0) polygon segments independently. I am not, however, convinced that this coding fixes that problem if there are multiple polygon segments we need to match up against. What concerns me is that if the first collinear poly segment we come to is an interior subset of the segment-under-test, say we're considering line segment '(1,0),(9,0)' and we find a poly segment '(3,0),(5,0)', then we need to separately detect whether each of '(1,0),(3,0)' and '(5,0),(9,0)' are subsets of the rest of the polygon ... and I don't see where this code can handle that. I've not managed to build a weaponized test case though. regards, tom lane
Re: Out of date comment in cached_plan_cost
On 9 December 2017 at 06:05, Robert Haas wrote: > On Thu, Dec 7, 2017 at 3:14 AM, David Rowley > wrote: >> The attached is my attempt at putting this right. > > I don't feel entirely right about the way this seems to treat > inheritance and partitioning as two entirely separate features; that's > true from a user perspective, more or less, but not internally to the > code. Originally I had it typed out in a way that mentioned something about "unless using partition-wise join with partitioned tables", but the partition planning code is in such a state of flux at the moment that I feared the comment would get outdated again in the near future. I've attached another patch which just loosens the claim that the join planning situation is never made worse by inheritance children, saying instead that it is not "generally" made any worse. > Of course, this also begs the question of whether we ought to be > changing the formula somehow. Perhaps, but not for this patch. The comment already mentions that the code is far from perfect. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services cached_plan_cost_comment_fix_v2.patch Description: Binary data
Re: ScalarArrayOpExpr and multi-dimensional arrays
On 2017/12/08 23:34, Tom Lane wrote: > Amit Langote writes: >> I wonder if ScalarArrayOpExpr is not really meant for multi-dimensional >> arrays appearing on the right hand side? Because: >> # select array[1] = any (array[array[1], array[2]]); > >> ERROR: operator does not exist: integer[] = integer > > You are falling into the misimpression that a 2-D array is an array of > 1-D arrays. It is not, even if the syntax makes it look like that. > > ScalarArrayOpExpr just iterates over the array elements without regard > to dimensionality; so the LHS must be of the element type. Yeah, I can now see that. Although, I wonder if there is any room for improvement here. Instead of waiting for make_scalar_array_op() to emit the error as it does today, would it be better if we error'd out earlier saying "ERROR: ANY/ALL leftarg must be scalar, not array"? Attached a patch for that, if it's worth going for at all. Thanks, Amit diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 29f9da796f..c179d297af 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -958,6 +958,12 @@ transformAExprOpAny(ParseState *pstate, A_Expr *a) a->location); lexpr = transformExprRecurse(pstate, lexpr); + if (type_is_array(exprType(lexpr))) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), +errmsg("ANY/ALL leftarg must be be scalar, not array"), +parser_errposition(pstate, exprLocation(lexpr; + rexpr = transformExprRecurse(pstate, rexpr); return (Node *) make_scalar_array_op(pstate, @@ -981,6 +987,12 @@ transformAExprOpAll(ParseState *pstate, A_Expr *a) a->location); lexpr = transformExprRecurse(pstate, lexpr); + if (type_is_array(exprType(lexpr))) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), +errmsg("ANY/ALL leftarg must be scalar, not array"), +parser_errposition(pstate, exprLocation(lexpr; + rexpr = transformExprRecurse(pstate, rexpr); return (Node *) make_scalar_array_op(pstate, diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out index c730563f03..c127ffaac0 100644 --- a/src/test/regress/expected/arrays.out +++ b/src/test/regress/expected/arrays.out @@ -1295,6 +1295,13 @@ SELECT -1 != ALL(ARRAY(SELECT NULLIF(g.i, 900) FROM generate_series(1,1000) g(i) (1 row) +-- leftarg must always be scalar +select '{1}'::int[] = any ('{{1}, {2}}'); +ERROR: ANY/ALL leftarg must be be scalar, not array +LINE 1: select '{1}'::int[] = any ('{{1}, {2}}'); + ^ +create table arr_tbl_check (a int[] check (a = any ('{{1}, {2}}'))); +ERROR: ANY/ALL leftarg must be be scalar, not array -- test indexes on arrays create temp table arr_tbl (f1 int[] unique); insert into arr_tbl values ('{1,2,3}'); diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql index 25dd4e2c6d..d5ced0f695 100644 --- a/src/test/regress/sql/arrays.sql +++ b/src/test/regress/sql/arrays.sql @@ -374,6 +374,9 @@ select 33 = all ('{1,null,3}'); select 33 = all ('{33,null,33}'); -- nulls later in the bitmap SELECT -1 != ALL(ARRAY(SELECT NULLIF(g.i, 900) FROM generate_series(1,1000) g(i))); +-- leftarg must always be scalar +select '{1}'::int[] = any ('{{1}, {2}}'); +create table arr_tbl_check (a int[] check (a = any ('{{1}, {2}}'))); -- test indexes on arrays create temp table arr_tbl (f1 int[] unique);
Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
Hi hackers, I heard a report of a 10.1 cluster hanging with several 'BtreePage' wait_events showing in pg_stat_activity. The query plan involved Parallel Index Only Scan, and the table is concurrently updated quite heavily. I tried and failed to make a reproducer, but from the clues available it seemed clear that somehow *all* participants in a Parallel Index Scan must be waiting for someone else to advance the scan. The report came with a back trace[1] that was the same in all 3 backends (leader + 2 workers), which I'll summarise here: ConditionVariableSleep _bt_parallel_seize _bt_readnextpage _bt_steppage _bt_next btgettuple index_getnext_tid IndexOnlyNext I think _bt_steppage() called _bt_parallel_seize(), then it called _bt_readnextpage() which I guess must have encountered a BTP_DELETED or BTP_HALF_DEAD-marked page so didn't take this early break out of the loop: /* check for deleted page */ if (!P_IGNORE(opaque)) { PredicateLockPage(rel, blkno, scan->xs_snapshot); /* see if there are any matches on this page */ /* note that this will clear moreRight if we can stop */ if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque))) break; } ... and then it called _bt_parallel_seize() itself, in violation of the rule (by my reading of the code) that you must call _bt_parallel_release() (via _bt_readpage()) or _bt_parallel_done() after seizing the scan. If you call _bt_parallel_seize() again without doing that first, you'll finish up waiting for yourself forever. Does this theory make sense? [1] http://dpaste.com/05PGJ4E -- Thomas Munro http://www.enterprisedb.com
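For illustration, a rough sketch of the kind of handling that appears to be missing from _bt_readnextpage()'s loop (forward-scan case only), assuming the existing _bt_parallel_release() signature; this is only meant to show the idea, not to be the actual fix:

    /* check for deleted page */
    if (!P_IGNORE(opaque))
    {
        PredicateLockPage(rel, blkno, scan->xs_snapshot);
        /* see if there are any matches on this page */
        /* note that this will clear moreRight if we can stop */
        if (_bt_readpage(scan, dir, P_FIRSTDATAKEY(opaque)))
            break;
    }
    else if (scan->parallel_scan != NULL)
    {
        /*
         * The page is deleted or half-dead, so hand the next page to
         * another participant before looping; otherwise we would call
         * _bt_parallel_seize() again without ever releasing the scan.
         */
        _bt_parallel_release(scan, opaque->btpo_next);
    }

The backward-scan path would presumably need the same treatment using btpo_prev.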
Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
On Mon, Dec 11, 2017 at 3:51 PM, Thomas Munro wrote: > I heard a report of a 10.1 cluster hanging with several 'BtreePage' > wait_events showing in pg_stat_activity. I forgot to add, for bug reporter credit purposes: thanks to Patrick Hemmer for the off-list report and backtrace. He's able to work around this recurring problem for now by disabling parallelism. -- Thomas Munro http://www.enterprisedb.com
Added PostgreSQL internals learning materials in Developer FAQ
Hello, FYI I collected Web resources to learn PostgreSQL internals in the Developer FAQ page. I did this because I need to provide self-education materials for new developers. Specifically, I modified "How is the source code organized?", and added "What information is available to learn PostgreSQL internals?". Developer FAQ https://wiki.postgresql.org/wiki/Developer_FAQ#Development_Tools_and_Help I borrowed most of the information from the following discussion. I'd appreciate it if you could add other useful information. On How To Shorten the Steep Learning Curve Towards PG Hacking... https://www.postgresql.org/message-id/flat/29ce9fa2-63d1-001b-74f4-a26e23500acb%40lab.ntt.co.jp#29ce9fa2-63d1-001b-74f4-a26e23500...@lab.ntt.co.jp Regards Takayuki Tsunakawa
Re: [HACKERS] parallel.c oblivion of worker-startup failures
On Fri, Dec 8, 2017 at 9:45 PM, Robert Haas wrote: > On Fri, Dec 8, 2017 at 4:56 AM, Amit Kapila wrote: >> I think the optimization you are suggesting has a merit over what I >> have done in the patch. However, one point to note is that we are >> anyway going to call that function down in the path( >> WaitForParallelWorkersToExit->WaitForBackgroundWorkerShutdown->GetBackgroundWorkerPid), >> so we might want to take the advantage of calling it first time as >> well. We can actually cache the status of workers that have returned >> BGWH_STOPPED and use it later so that we don't need to make this >> function call again for workers which are already stopped. We can >> even do what Tom is suggesting to avoid calling it for workers which >> are known to be launched, but I am really not sure if that function >> call is costly enough that we need to maintain one extra state to >> avoid that. > > Well, let's do what optimization we can without making it too complicated. > Okay, see the attached and let me know if that suffices the need? >> While looking into this code path, I wonder how much we are gaining by >> having two separate calls (WaitForParallelWorkersToFinish and >> WaitForParallelWorkersToExit) to check the status of workers after >> finishing the parallel execution? They are used together in rescan >> code path, so apparently there is no benefit calling them separately >> there. OTOH, during shutdown of nodes, it will often be the case that >> they will be called in a short duration after each other except for >> the case where we need to shut down the node for the early exit like >> when there is a limit clause or such. > > I'm not 100% sure either, but if we're going to do anything about > that, it seems like a topic for another patch. > makes sense. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_worker_startup_failures_v1.patch Description: Binary data
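For illustration, a minimal sketch of the caching idea discussed above, in which workers already seen as BGWH_STOPPED are remembered so GetBackgroundWorkerPid() is not called for them again; known_stopped is a hypothetical per-context array and the surrounding loop is simplified:

    int             i;
    pid_t           pid;
    BgwHandleStatus status;

    for (i = 0; i < pcxt->nworkers_launched; ++i)
    {
        /* skip workers we have already observed to have exited */
        if (known_stopped[i])
            continue;

        status = GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid);
        if (status == BGWH_STOPPED)
            known_stopped[i] = true;    /* cache it; no need to ask again */
    }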
Re: BUG #14941: Vacuum crashes
On Sat, Dec 2, 2017 at 6:16 AM, Bossart, Nathan wrote: > On 12/1/17, 3:04 PM, "Robert Haas" wrote: >> Seems entirely reasonable to me, provided that we only update the >> extensible-options syntax: >> >> VACUUM [ ( option [, ...] ) ] [ table_and_columns [, ...] ] >> >> I don't want to add any more options to the older syntax: >> >> VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ table_and_columns [, >> ...] ] > > Right. This seems like the right path forward to me, as the > VACUUM documentation states that the unparenthesized syntax is > deprecated, and the DISABLE_PAGE_SKIPPING option was not added > to the old syntax, either. Yeah, the exact same discussion has happened when what has become DISABLE_PAGE_SKIPPING was discussed for the freeze map. -- Michael
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Thanks, Tels, for reviewing the patch. On Fri, Dec 8, 2017 at 2:54 PM, Tels wrote: > Hello Rushabh, > > On Fri, December 8, 2017 2:28 am, Rushabh Lathia wrote: > > Thanks for review. > > > > On Fri, Dec 8, 2017 at 6:27 AM, Peter Geoghegan wrote: > > > >> On Thu, Dec 7, 2017 at 12:25 AM, Rushabh Lathia > >> wrote: > >> > 0001-Add-parallel-B-tree-index-build-sorting_v14.patch > > I've looked only at patch 0002, here are some comments. > > > + * leaderasworker indicates whether leader going to participate as > worker or > > + * not. > > The grammar is a bit off, and the "or not" seems obvious. IMHO this could > be: > > + * leaderasworker indicates whether the leader is going to participate as > worker > > Sure. > The argument leaderasworker is only used once and for one temp. variable > that is only used once, too. So the temp. variable could maybe go. > > And not sure what the verdict was from the const-discussion threads, I did > not follow it through. If "const" is what should be done generally, then > the argument could be consted, as to not create more "unconsted" code. > > E.g. so: > > +plan_create_index_workers(Oid tableOid, Oid indexOid, const bool > leaderasworker) > > Makes sense. > and later: > > - sort_mem_blocks / (parallel_workers + 1) < > min_sort_mem_blocks) > + sort_mem_blocks / (parallel_workers + (leaderasworker > ? 1 : 0)) < min_sort_mem_blocks) > > I also didn't like adding an extra variable, but without it the code looked a bit unreadable, so rather than make the code harder to read I chose to add the new variable. > Thank you for working on this patch! > > I will address the review comments in the next set of patches. Regards, Rushabh Lathia www.EnterpriseDB.com
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
On Sat, Dec 9, 2017 at 2:00 AM, Robert Haas wrote: > On Fri, Dec 8, 2017 at 3:20 AM, Masahiko Sawada wrote: >>> The first hunk in monitoring.sgml looks unnecessary. >> >> You meant the following hunk? >> >> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml >> index 8d461c8..7aa7981 100644 >> --- a/doc/src/sgml/monitoring.sgml >> +++ b/doc/src/sgml/monitoring.sgml >> @@ -669,8 +669,8 @@ postgres 27093 0.0 0.0 30096 2752 ? >> Ss 11:34 0:00 postgres: ser >>Heavyweight locks, also known as lock manager locks or simply >> locks, >>primarily protect SQL-visible objects such as tables. However, >>they are also used to ensure mutual exclusion for certain internal >> - operations such as relation extension. >> wait_event will >> - identify the type of lock awaited. >> + operations such as waiting for a transaction to finish. >> + wait_event will identify the type of lock >> awaited. >> >> >> >> >> I think that since the extension locks are no longer a part of >> heavyweight locks we should change the explanation. > > Yes, you are right. > >>> +RelationExtensionLockWaiterCount(Relation relation) >>> >>> Hmm. This is sort of problematic, because with then new design we >>> have no guarantee that the return value is actually accurate. I don't >>> think that's a functional problem, but the optics aren't great. >> >> Yeah, with this patch we could overestimate it and then add extra >> blocks to the relation. Since the number of extra blocks is capped at >> 512 I think it would not become serious problem. > > How about renaming it EstimateNumberOfExtensionLockWaiters? Agreed, fixed. > >>> +/* If force releasing, release all locks we're holding */ >>> +if (force) >>> +held_relextlock.nLocks = 0; >>> +else >>> +held_relextlock.nLocks--; >>> + >>> +Assert(held_relextlock.nLocks >= 0); >>> + >>> +/* Return if we're still holding the lock even after computation */ >>> +if (held_relextlock.nLocks > 0) >>> +return; >>> >>> I thought you were going to have the caller adjust nLocks? >> >> Yeah, I was supposed to change so but since we always release either >> one lock or all relext locks I thought it'd better to pass a bool >> rather than an int. > > I don't see why you need to pass either one. The caller can set > held_relextlock.nLocks either with -- or = 0, and then call > RelExtLockRelease() only if the resulting value is 0. Fixed. > >>> When I took a break from sitting at the computer, I realized that I >>> think this has a more serious problem: won't it permanently leak >>> reference counts if someone hits ^C or an error occurs while the lock >>> is held? I think it will -- it probably needs to do cleanup at the >>> places where we do LWLockReleaseAll() that includes decrementing the >>> shared refcount if necessary, rather than doing cleanup at the places >>> we release heavyweight locks. >>> I might be wrong about the details here -- this is off the top of my head. >> >> Good catch. It can leak reference counts if someone hits ^C or an >> error occurs while waiting. Fixed in the latest patch. But since >> RelExtLockReleaseAll() is called even when such situations I think we >> don't need to change the place where releasing the all relext lock. We >> just moved it from heavyweight locks. Am I missing something? > > Hmm, that might be an OK way to handle it. I don't see a problem off > the top of my head. It might be clearer to rename it to > RelExtLockCleanup() though, since it is not just releasing the lock > but also any wait count we hold. Yeah, it seems better. Fixed. 
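For illustration, a sketch of the caller-side adjustment being suggested here; the argument list of RelExtLockRelease() is hypothetical and only the shape of the logic is intended:

    /* hypothetical calling site: adjust the count here, not inside the release function */
    if (force)
        held_relextlock.nLocks = 0;
    else
        held_relextlock.nLocks--;

    Assert(held_relextlock.nLocks >= 0);

    /* release only when we no longer hold the lock at all */
    if (held_relextlock.nLocks == 0)
        RelExtLockRelease(relid, lockmode);     /* arguments are placeholders */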
> +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */ > +#define RELEXT_WAIT_COUNT_MASK((uint32) ((1 << 24) - 1)) > > Let's drop the comment here and instead add a StaticAssertStmt() that > checks this. Fixed. I added StaticAssertStmt() to InitRelExtLocks(). > > I am slightly puzzled, though. If I read this correctly, bits 0-23 > are used for the waiter count, bit 24 is always 0, bit 25 indicates > the presence or absence of an exclusive lock, and bits 26+ are always > 0. That seems slightly odd. Shouldn't we either use the highest > available bit for the locker (bit 31) or the lowest one (bit 24)? The > former seems better, in case MAX_BACKENDS changes later. We could > make RELEXT_WAIT_COUNT_MASK bigger too, just in case. I agree with the former. Fixed. > +/* Make a lock tag */ > +tag.dbid = MyDatabaseId; > +tag.relid = relid; > > What about shared relations? I bet we need to use 0 in that case. > Otherwise, if backends in two different databases try to extend the > same shared relation at the same time, we'll (probably) fail to notice > that they conflict. > You're right. I changed it so that we set tag.dbid to InvalidOid if the relation is a shared relation. > + * To avoid unnecessary recomput
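For illustration, a sketch of the lock-word layout settled on above, with the exclusive-lock bit moved to the highest bit and the comment replaced by a compile-time check inside InitRelExtLocks(); the macro names are placeholders, not necessarily those used in the patch:

    /* bits 0-23 hold the waiter count; bit 31 marks the exclusive lock */
    #define RELEXT_WAIT_COUNT_MASK  ((uint32) ((1 << 24) - 1))
    #define RELEXT_LOCK_BIT         ((uint32) 1 << 31)

    /* inside InitRelExtLocks(): */
    StaticAssertStmt(MAX_BACKENDS <= RELEXT_WAIT_COUNT_MASK,
                     "MAX_BACKENDS does not fit in the relation extension lock waiter count");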
Re: [HACKERS] What does it mean by XLOG_BACKUP_RECORD?
On Sat, Dec 9, 2017 at 1:25 AM, Peter Eisentraut wrote: > On 6/29/17 06:09, Masahiko Sawada wrote: >> Thanks, I agree to use XLOG_BACKUP_END instead. >> >>> Worse, the current comment implies that >>> minRecoveryPoint is incorrectly set if it is true. Bleh. >> >> Looking at the condition, we use minRecoveryPoint only when >> ControlFile->backupEndRequired is *false*. So I guess that it means >> that minRecoveryPoint is incorrectly set if >> ControlFile->backupEndReuired is true. Am I missing something? > > I agree with you that the logic in the comment is correct. I've > committed just the symbol change. > Thank you for picking up an old thread and committing it! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas wrote: > On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier > wrote: >> On Fri, Dec 8, 2017 at 5:38 PM, Masahiko Sawada >> wrote: >>> After off-discussion with Fujii-san, I've updated the comment of why >>> we should disallow interrupts before setting/cleanup the session-level >>> lock. Please review it. >> >> + /* >> +* Set session-level lock. If we allow interrupts before setting >> +* session-level lock, we could call callbacks with an inconsistent >> +* state. To avoid calling CHECK_FOR_INTERRUPTS by >> LWLockReleaseClearVar >> +* which is called by WALInsertLockRelease before changing the backup >> +* state we change it while holding the WAL insert lock. >> +*/ >> So you are just adding the reference to WALInsertLockRelease.. Instead >> of writing the function names for LWLocks, I also added a sentence "If we allow interrupts before cleanup session-level lock, we could call do_pg_abort_backup with an inconsistent state" at two places: setting and cleanup session-level lock. >> I would just write "To >> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a >> LWLock" and be done with it. There is no point to list a full function >> dependency list, which could change in the future with static routines >> of lwlock.c. Agreed. Updated the comment. > > I think it's actually good to be explicit here. I looked at this > patch a bit last week and had great difficulty understanding how the > CHECK_FOR_INTERRUPTS() could happen. > Attached the updated version patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center fix_do_pg_abort_backup_v11.patch Description: Binary data
Re: BUG #14941: Vacuum crashes
On Wed, Dec 6, 2017 at 1:52 AM, Bossart, Nathan wrote: > 0001_fix_unparenthesized_vacuum_grammar_v1.patch > > One VacuumStmt grammar rule allows users to specify the VACOPT_ANALYZE > option by including an AnalyzeStmt at the end of the command. This > appears to have been added as part of the introduction of the ANALYZE > command (f905d65e). However, this means that users can call VACUUM > commands like > > VACUUM VERBOSE ANALYZE VERBOSE pg_class; > > Besides being disallowed according to the documentation, I think this > may give users the idea that there is a difference between the VERBOSE > options for VACUUM versus ANALYZE. In fact, they are the same option, > and specifying it twice is redundant. Hmm. Yeah. If adding a parenthesis grammar to ANALYZE this is a good move for long purposes. If we don't do that now, or at least for the purpose of this patch set, then a AnalyzeStmt node embedded directly in the grammar of VACUUM is likely to bite in the future. > This change fixes the unparenthesized VACUUM syntax to disallow the out- > of-order VACUUM options as described above. This is a prerequisite > change for opening up VACOPT_NOWAIT to users, as adding it to the > grammar as-is would cause statements like > > VACUUM VERBOSE NOWAIT ANALYZE VERBOSE NOWAIT pg_class; > > to be allowed. Since I am also looking to introduce a parenthesized > syntax for ANALYZE, this patch would prevent statements like If you add only a parenthesized grammar of ANALYZE, it seems to me that this would not be an allowed query anyway. > VACUUM VERBOSE ANALYZE (VERBOSE, NOWAIT) pg_class; > > from being accepted as well. This could also be equivalent to ANALYZE(VERBOSE, NOWAIT). I can see pros and cons by keeping the complete extension of AnalyzeStmt in the VACUUM grammar, but as the documentation does not mention VACUUM VERBOSE ANALYZE VERBOSE as a legit query I guess that ripping it out is not going to annoy many users. Still let's see opinions from other folks on -hackers as another approach to the feature you want to add here would be to just implement the grammar for VACUUM but *not* ANALYZE, but I'd like to think that we still want ANALYZE to be supported, and also we want it to be extensible as a different thing than what VACUUM is. > 0002_add_parenthesized_analyze_syntax_v1.patch > > As a prerequisite to giving users access to the VACOPT_NOWAIT option, I > am introducing a parenthesized syntax for ANALYZE that is similar to > VACUUM's. Following VACUUM's example, any new options will be added to > this syntax, and the old style will become deprecated. > > Adding a parenthesized syntax for ANALYZE is not strictly necessary for > providing the NOWAIT option, as NOWAIT is already a keyword in gram.y. > However, I thought it would be good to match VACUUM (since the command > structure is so similar), and this work would be required at some point > anyway if ANALYZE ever accepts options that we do not want to make > reserved keywords. Yep, agreed with the contents of this patch. > 0003_add_nowait_vacuum_option_v1.patch > > This change provides the existing VACOPT_NOWAIT option to users. This > option was previously only used by autovacuum in special cases, but it > seems like a potentially valuable option in other cases as well. For > example, perhaps a user wants to run a nightly VACUUM job that skips > tables that cannot be locked due to autovacuum (or something else) > already working on it. > > I chose to use NOWAIT as the option name because it is already a keyword > for the LOCK command. 
Makes sense to me. > This patch follows the existing logging precedent added by 11d8d72c and > ab6eaee8: if a relation is explicitly specified in the command, a log > statement will be emitted if it is skipped. If the relation is not > specified (e.g. "VACUUM FULL;"), no such log statements will be emitted. +WARNING: skipping analyze of "test1" --- lock not available +step analyze_specified: ANALYZE (NOWAIT) test1, test2; +step commit: + COMMIT; A WARNING is OK with me in this case. We already do that for the other entries. Let's see what other folks think first about the ANALYZE grammar in VACUUM, as well as having ANALYZE use a parenthesized grammar. FWIW, I like the contents of this patch set, and it is non-intrusive. I think that NOWAIT gains more value now that multiple relations can be vacuumed or analyzed with a single manual command. -- Michael
Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada wrote: > On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas wrote: >> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier >> wrote: >>> I would just write "To >>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a >>> LWLock" and be done with it. There is no point to list a full function >>> dependency list, which could change in the future with static routines >>> of lwlock.c. > > Agreed. Updated the comment. Robert actually liked adding the complete routine list. Let's see what Fujii-san thinks at the end, there is still some time until the next round of minor releases. Robert, if Fujii-san does not show up in time, would you look at this patch? I won't fight if you rework the comments the way you think is better :) -- Michael
Re: no partition pruning when partitioning using array type
On 2017/12/09 3:46, Robert Haas wrote: > On Fri, Dec 8, 2017 at 5:40 AM, Amit Langote > wrote: >> I noticed that if you partition using an array type column, partition >> pruning using constraint exclusion fails to work due to a minor problem. >> >> Example: >> >> create table p (a int[]) partition by list (a); >> create table p1 partition of p for values in ('{1}'); >> create table p2 partition of p for values in ('{2, 3}', '{4, 5}'); >> >> explain select a from p where a = '{1}'; >> QUERY PLAN >> |- >> Append (cost=0.00..54.00 rows=14 width=32) >>-> Seq Scan on p1 (cost=0.00..27.00 rows=7 width=32) >> Filter: (a = '{1}'::integer[]) >>-> Seq Scan on p2 (cost=0.00..27.00 rows=7 width=32) >> Filter: (a = '{1}'::integer[]) >> >> explain select a from p where a = '{2, 3}'; >> QUERY PLAN >> |- >> Append (cost=0.00..54.00 rows=14 width=32) >>-> Seq Scan on p1 (cost=0.00..27.00 rows=7 width=32) >> Filter: (a = '{2,3}'::integer[]) >>-> Seq Scan on p2 (cost=0.00..27.00 rows=7 width=32) >> Filter: (a = '{2,3}'::integer[]) >> (5 rows) >> >> In the case of an array type partition key, make_partition_op_expr() will >> have to put a RelabelType node on top of the partition key Var, after >> having selected an = operator from the array_ops family. The RelabelType >> causes operator_predicate_proof() to fail to consider predicate leftop and >> clause leftop as equal, because only one of them ends up having the >> RelabelType attached to it. >> >> As a simple measure, the attached patch teaches operator_predicate_proof() >> to strip RelabelType nodes from all the nodes it compares using equal(). >> I also added a relevant test in partition_prune.sql. > > I guess the question is whether that's guaranteed to be safe. I spent > a little bit of time thinking about it and I don't see a problem. The > function is careful to check that the opclasses and collations of the > OpExprs are compatible, and it is the behavior of the operator that is > in question here, not the column type, so your change seems OK to me. > But I hope somebody else will also study this, because this stuff is > fairly subtle and I would not like to be the one who breaks it. Thanks for taking a look at it. I will try to say a little more on why it seems safe. A RelabelType node exists (if any) on top of a given expression node only to denote that the operator for which the node is an input will interpret its result as of the type RelabelType.resulttype, instead of the node's original type. No conversion of values actually occurs before making any decisions that this function is in charge of making, because the mismatching types in question are known to be binary coercible. Or more to the point, the operator that will be used in the proof will give correct answers for the values without having to do any conversion of values. IOW, it's okay if we simply drop the RelabelType, because it doesn't alter in any way the result of the proof that operator_predicate_proof() performs. That said, I've come to think that in this particular case the partitioning code that generates the predicate expression should be a bit smarter about the various types it manipulates, such that the RelabelType won't be added in the first place. In contrast, make_op(), which generates an OpExpr from the parser representation of a = '{1}' appearing in the query's WHERE clause, won't add the RelabelType because the underlying type machinery that it invokes is able to conclude that that's unnecessary. The original patch may still be worth considering as a solution though.
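To make the proposed change a bit more concrete, here is a minimal sketch of the kind of RelabelType stripping operator_predicate_proof() could apply before its equal() comparisons; the helper name is made up for illustration:

    /* descend through any binary-compatibility relabelings before comparing */
    static Node *
    strip_relabels(Node *node)
    {
        while (node != NULL && IsA(node, RelabelType))
            node = (Node *) ((RelabelType *) node)->arg;
        return node;
    }

operator_predicate_proof() could then run its existing equal() checks on strip_relabels(pred_leftop) and strip_relabels(clause_leftop), which is roughly the effect the attached patch aims for.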
I hope someone else chimes in as well. :) Thanks, Amit
Re: Added PostgreSQL internals learning materials in Developer FAQ
On Mon, Dec 11, 2017 at 8:43 AM, Tsunakawa, Takayuki wrote: > Hello, > > FYI > I collected Web resources to learn PostgreSQL internals in the Developer FAQ > page. I did this because I need to provide self-education materials for new > developers. Specifically, I modified "How is the source code organized?", > and added "What information is available to learn PostgreSQL internals?". > > Developer FAQ > https://wiki.postgresql.org/wiki/Developer_FAQ#Development_Tools_and_Help > > > I borrowed most of the information from the following discussion. > Thanks, this looks like quite useful information for new people who want to learn about PostgreSQL, especially its internals. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD
On Mon, Dec 11, 2017 at 8:21 AM, Thomas Munro wrote: > Hi hackers, > > > ... and then it called _bt_parallel_seize() itself, in violation of > the rule (by my reading of the code) that you must call > _bt_parallel_release() (via _bt_readpage()) or _bt_parallel_done() > after seizing the scan. If you call _bt_parallel_seize() again > without doing that first, you'll finish up waiting for yourself > forever. Does this theory make sense? > Yes, I think if the current page is half-dead or deleted, we need to set the next page to be scanned and release the parallel scan. This has to be done for both forward and backward scans. Thanks for looking into it. I will see if we can write some test. In the meantime if possible, can you please request Patrick Hemmer to verify the attached patch? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_parallel_index_scan_v1.patch Description: Binary data
Re: SIGPIPE in TAP tests
On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch wrote: > Two buildfarm runs[1][2] from the last 90 days have failed in > src/test/authentication, like this: > > t/001_password.pl .. > Failed 3/8 subtests > t/002_saslprep.pl .. ok 1815 ms ( 0.00 usr 0.00 sys + 0.89 cusr 0.26 > csys = 1.15 CPU) > > Test Summary Report > --- > t/001_password.pl (Wstat: 13 Tests: 5 Failed: 0) > Non-zero wait status: 13 > Parse errors: Bad plan. You planned 8 tests but ran 5. > Files=2, Tests=17, 3 wallclock secs ( 0.04 usr 0.00 sys + 1.67 cusr > 0.50 csys = 2.21 CPU) > Result: FAIL > > Compared to a good run, the other logs just end suddenly after the expected > "FATAL: password authentication failed". "Wstat: 13" means the Perl process > died to signal 13 (SIGPIPE). This test invokes psql in a way that fails > authentication, and it writes "SELECT 1" to psql's stdin. The SIGPIPE happens > if the psql process exits before that write. Nice investigation. An interesting coincidence is that I have looked yesterday at an off-list reported some folks have sent me which is basically what you have here. + # Return EPIPE instead of killing the process with SIGPIPE. An affected + # test may still fail, but it's more likely to report useful facts. + $SIG{PIPE} = 'IGNORE'; With this portion changed I can indeed see something which is more helpful: psql: FATAL: password authentication failed for user "saslpreptest1_role" ack Broken pipe: write( 11, 'SELECT 1' ) at /Library/Perl/5.18/IPC/Run/IO.pm line 558. If SIGPIPE is ignored then test output just stops after generating the FATAL message. Oops. > The two src/test/authentication tests then fail, but nothing else fails. > Let's ignore SIGPIPE in all TAP tests, which leaves some evidence in > regress_log_001_password: > > ack Broken pipe: write( 13, 'SELECT 1' ) at > /home/nm/sw/cpan/lib/perl5/IPC/Run/IO.pm line 549. > > To fix the actual failures, we can cease sending "SELECT 1"; it's enough to > disconnect immediately. Patch attached. Perhaps you could use an empty string instead? I feel a bit uneasy about passing an undefined object to IPC::Run::run. -- Michael
Re: [HACKERS] postgres_fdw bug in 9.6
(2017/12/09 5:53), Robert Haas wrote: On Sun, Dec 3, 2017 at 2:56 PM, Tom Lane wrote: Not sure where that leaves us for the patch at hand. It feels to me like it's a temporary band-aid for code that we want to get rid of in the not-too-long run. As such, it'd be okay if it were smaller, but it seems bigger and more invasive than I could wish for a band-aid. Well, the bug report is a year old today, so we should try to do something. The patch needs a rebase, but reading through it, I'm not convinced that it's particularly risky. I mean, it's going to break FDWs that are calling GetExistingLocalJoinPath(), but there are probably very few third-party FDWs that do that. Any that do are precisely the ones that need this fix, so I think there's a good chance they'll forgive of us for requiring them to make a code change. I think we can contain the risk of anything else getting broken to an acceptable level because there's not really very much other code changing. The changes to joinpath.c need to be carefully audited to make sure that they are not changing the semantics, but that seems like it shouldn't take more than an hour of code-reading to check. The new fields in JoinPathExtraData can be moved to the end of the struct to minimize ABI breakage. I don't see what else there is that could break apart from the EPQ rechecks themselves, and if that weren't broken already, this patch wouldn't exist. Now, if you're still super-concerned about this breaking something, we could commit it only to master, where it will have 9 months to bake before it gets released. I think that's overly conservative, but I think it's still better than waiting for the rewrite you'd like to see happen. We don't know when or if anyone is going to undertake that, and if we wait, we may easing release a v11 that's got the same defect as v9.6 and now v10. And I don't see that we lose much by committing this now even if that rewrite does happen in time for v11. Ripping out CreateLocalJoinPath() won't be any harder than ripping out GetExistingLocalJoinPath(). Agreed. Attached is an rebased version which moved the new fields in JoinPathExtraData to the end of that struct. Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 1701,1722 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 !-> Merge Join Output: t1.c1, t1.c3, t1.*, t2.c1, t2.* ! Merge Cond: (t1.c1 = t2.c1) ! -> Sort Output: t1.c1, t1.c3, t1.* !Sort Key: t1.c1 !-> Foreign Scan on public.ft1 t1 ! Output: t1.c1, t1.c3, t1.* ! Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE ! -> Sort Output: t2.c1, t2.* !Sort Key: t2.c1 !-> Foreign Scan on public.ft2 t2 ! Output: t2.c1, t2.* ! Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ! 
(23 rows) SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1; c1 | c1 --- 1701,1716 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 !-> Nested Loop Output: t1.c1, t1.c3, t1.*, t2.c1, t2.* ! Join Filter: (t1.c1 = t2.c1) ! -> Foreign Scan on public.ft1 t1 Output: t1.c1, t1.c3, t1.* !Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE ! -> For
Re: SIGPIPE in TAP tests
On Mon, Dec 11, 2017 at 04:19:52PM +0900, Michael Paquier wrote: > On Sun, Dec 10, 2017 at 6:02 AM, Noah Misch wrote: > If SIGPIPE is ignored then test output just stops after generating the > FATAL message. Oops. You mean "If SIGPIPE is not ignored ...", right? > > To fix the actual failures, we can cease sending "SELECT 1"; it's enough to > > disconnect immediately. Patch attached. > > Perhaps you could use an empty string instead? I feel a bit uneasy > about passing an undefined object to IPC::Run::run. IPC::Run documents the equivalence of undef and '' in this context; search for "close a child processes stdin" in http://search.cpan.org/~rbs/IPC-Run-0.78/lib/IPC/Run.pm. Thus, I expect both spellings to work reliably, and I find "undef" slightly more evocative.