Re: How do we support FULL JOIN on PostGIS types?
Hi, Thanks a lot to RhodiumToad on IRC for the suggestion of setting HASHES and MERGES on OPERATOR =. Now we have another problem: how do we set these flags when upgrading to a new version of the extension? Dropping OPERATOR = would drop all indexes and views depending on it, so that isn't really an option. Also, if someone could sneak the "ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions" keywords into https://www.postgresql.org/docs/current/xoper-optimization.html#id-1.8.3.17.8 it would greatly help future extension writers - right now it's not possible to find that page by googling the error message. On Thu, May 16, 2019 at 7:05 PM Darafei "Komяpa" Praliaskouski <m...@komzpa.net> wrote: > Hi! > > Greetings from the OSGeo Code Sprint in Minneapolis :) > > We're trying to make FULL JOIN on equality of geometry work and can't figure > out why it doesn't. > > Here's a reproducer; it works on bytea but not on PostGIS geometry, throwing > > ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable > join conditions > > https://trac.osgeo.org/postgis/ticket/4394 > > We already have a btree opclass with equality: > > https://github.com/postgis/postgis/blob/svn-trunk/postgis/postgis.sql.in#L420 > > > We also have a hash equality opclass: > > https://github.com/postgis/postgis/blob/svn-trunk/postgis/postgis.sql.in#L440 > > > Reading through the Postgres documentation I can't figure out what else we should do for this join to work. How do we make it work? > > -- > Darafei Praliaskouski > Support me: http://patreon.com/komzpa > -- Darafei Praliaskouski Support me: http://patreon.com/komzpa
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Jun 2, 2019 at 5:18 PM Tomas Vondra wrote: > Thanks for the rebase! I think the patch is in pretty good shape - I'm > sure we'll find ways to make it more efficient etc. but IMO that's fine > and should not prevent getting it committed. Thank you for the in-depth review! > Currently, the costing logic (cost_incremental_sort) assumes all groups > have the same size, which is fine for uniform distributions. But for > skewed distributions, that may be an issue. > > Consider for example a table with 1M rows, two columns, 100 groups in each > column, and an index on the first one. > > CREATE table t (a INT, b INT); > > INSERT INTO t SELECT 100*random(), 100*random() > FROM generate_series(1,1000000); > > Now, let's do a simple limit query to find the first row: > > SELECT * FROM t ORDER BY a, b LIMIT 1; > > In this case the current costing logic is fine - the groups are close to > average size, and we only need to sort the first group, i.e. 1% of data. > > Now, let's say the first group is much larger: > > > INSERT INTO t SELECT 0, 100*random() > FROM generate_series(1,900000) s(i); > > INSERT INTO t SELECT 100*random(), 100*random() > FROM generate_series(1,100000); > > That is, the first group is roughly 90% of data, but the number of groups > is still the same, so we need to sort 90% of the data. But the average group > size is still the same, so the current cost model is oblivious to this. Thinking out loud here: the current implementation doesn't guarantee that sort groups always have the same prefix column values because (from code comments) "Sorting many small groups with tuplesort is inefficient." While this seems like a reasonable optimization, I think it's possible that this thinking steered us away from an optimization in the opposite direction. Perhaps we should always track whether or not all prefix tuples are the same (currently we only do that after reaching DEFAULT_MIN_GROUP_SIZE tuples) and use that information to be able to have tuplesort only care about the non-prefix columns (where currently it has to sort on all pathkey columns even though for a large group the prefix columns are guaranteed to be equal). Essentially I'm trying to think of ways that would get us to comparable performance with regular sort in the case of large batch sizes. One other thing about the DEFAULT_MIN_GROUP_SIZE logic is that in the case where you have a very small group and then a very large batch, we'd lose the ability to optimize in the above way. That makes me wonder if we shouldn't intentionally optimize for the possibility of large batch sizes, since a little extra expense per group/tuple is more likely to be a non-concern with small groups anyway when there are large numbers of input tuples but a relatively small limit. Thoughts? > So I think we should look at the MCV list, and use that information when > computing the startup/total cost. I think using the first/largest group to > compute the startup cost, and the average group size for total cost would > do the trick. I think this sounds very reasonable. > I don't think we can do much better than this during planning. There will > inevitably be cases where the costing model will push us to do the wrong > thing, in either direction. The question is how serious an issue that is, and > whether we could remedy that during execution somehow. > > When we "incorrectly" pick full sort (when the incremental sort would be > faster), that's obviously sad but I think it's mostly fine because it's > not a regression. 
> > The opposite direction (picking incremental sort while full sort would be > faster) is probably more serious, because it's a regression between > releases. > > I don't think we can fully fix that by refining the cost model. We have > two basic options: > > 1) Argue/show that this is not an issue in practice, because (a) such > cases are very rare, and/or (b) the regression is limited. In short, the > benefits of the patch outweigh the losses. My comments above go in this direction. If we can improve performance in the worst case, I think it's plausible this concern becomes a non-issue. > 2) Provide some fallback at execution time. For example, we might watch > the size of the group, and if we run into an unexpectedly large one we > might abandon the incremental sort and switch to a "full sort" mode. Are there good examples of us doing this in other types of nodes (whether the fallback is an entirely different algorithm/node type)? I like this idea in theory, but I also think it's likely it would add a very significant amount of complexity. The other problem is knowing where to draw the line: you end up creating these kinds of cliffs where pulling one more tuple through the incremental sort would give you your batch and result in not having to pull many more tuples in a regular sort node, but the fallback logic kicks in anyway. Unrelated to all of the above: if I read the patch properly it intentionally excludes backwa
Re: How do we support FULL JOIN on PostGIS types?
Darafei "Komяpa" Praliaskouski writes: > Thanks a lot to RhodiumToad on IRC for the suggestion of setting HASHES and MERGES on > OPERATOR =. > Now we have another problem: how do we set these flags when upgrading to a new > version of the extension? Dropping OPERATOR = would drop all indexes and views > depending on it, so that isn't really an option. I think you're going to have to use a direct UPDATE on pg_operator in the extension update script :-(. Perhaps ALTER OPERATOR should be able to handle changing these flags, but for now it can't. regards, tom lane
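For reference, a minimal sketch of what that catalog tweak might look like in an extension update script, assuming the PostGIS geometry = operator (adjust names as needed for your operator):

UPDATE pg_catalog.pg_operator
   SET oprcanhash = true,
       oprcanmerge = true
 WHERE oprname = '='
   AND oprleft = 'geometry'::regtype
   AND oprright = 'geometry'::regtype;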
Binary support for pgoutput plugin
Is there a reason why pgoutput sends data in text format? Seems to me that sending data in binary would provide a considerable performance improvement. Dave Cramer
Re: Dead stores in src/common/sha2.c
Heikki Linnakangas writes: > On 29/05/2019 18:47, Hamlin, Garick L wrote: >> On Wed, May 29, 2019 at 11:01:05AM -0400, Tom Lane wrote: >>> At the same time, I'm not sure if we should just write this off as an >>> ignorable warning. If the C compiler concludes these are dead stores >>> it'll probably optimize them away, leading to not accomplishing the >>> goal of wiping the values. >> Yeah, I mean it's odd to put code in to zero/hide state knowing it's >> probably optimized out. >> We could also take it out, but maybe it does help somewhere? >> ... or put in a comment that says: This probably gets optimized away, but >> we don't consider it much of a risk. > There's a function called explicit_bzero() in glibc, for this purpose. > See > https://www.gnu.org/software/libc/manual/html_node/Erasing-Sensitive-Data.html. > > It's not totally portable, but it's also available in some BSDs, at > least. That documentation mentions the possibility that it might force > variables to be stored in memory that would've otherwise been kept only > in registers, but says that in most situations it's nevertheless better > to use explicit_bzero() than not. So I guess we should use that, if it's > available. Meh. After looking at this closer, I'm convinced that doing anything that might force the variables into memory would be utterly stupid. Aside from any performance penalty we'd take, that would make their values more observable, not less so. In any case, though, it's very hard to see why we should care in the least. Somebody who can observe the contents of server memory (or application memory, in the case of frontend usage) can surely extract the input or output of the SHA2 transform, which is going to be way more useful than a few intermediate values. So I think we should either do nothing, or suppress this warning by commenting out the variable-zeroing. (Note that an eyeball scan finds several more dead zeroings besides the ones in Garick's patch. Some of them are ifdef'd out ...) regards, tom lane
WITH NOT MATERIALIZED and DML CTEs
Currently, WITH a AS NOT MATERIALIZED (INSERT ...) would silently disregard the "NOT MATERIALIZED" instruction and execute the data-modifying CTE to completion (as per the long-standing DML CTE rule). This seems like an omission to me. Ideally, the presence of an explicit "NOT MATERIALIZED" clause on a data-modifying CTE should disable the "run to completion" logic. It is understandably late in the 12 cycle, so maybe prohibit NOT MATERIALIZED with DML altogether and revisit this in 13? Thoughts? Elvis
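For concreteness, a minimal illustration of the current behavior (audit_log is a hypothetical table):

WITH a AS NOT MATERIALIZED (
    -- Still executed exactly once and run to completion, even though
    -- NOT MATERIALIZED was requested and the outer query never reads the CTE.
    INSERT INTO audit_log (note) VALUES ('hello') RETURNING id
)
SELECT 1 WHERE false;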
Re: WITH NOT MATERIALIZED and DML CTEs
Hi, On 2019-06-03 11:45:51 -0400, Elvis Pranskevichus wrote: > Currently, WITH a AS NOT MATERIALIZED (INSERT ...) would silently > disregard the "NOT MATERIALIZED" instruction and execute the > data-modifying CTE to completion (as per the long-standing DML CTE rule). > > This seems like an omission to me. Ideally, the presence of an explicit > "NOT MATERIALIZED" clause on a data-modifying CTE should disable the > "run to completion" logic. I don't see us ever doing that. The result of minor costing and other planner changes would yield different updated data. That'll just create endless bug reports. > It is understandably late in the 12 cycle, so maybe prohibit NOT > MATERIALIZED with DML altogether and revisit this in 13? I could see us adding an error, or just continuing to silently ignore it. Greetings, Andres Freund
undo: zedstore vs. zheap
Hi, On Saturday, we had a nice in-person conversation about the requirements that zedstore has for an undo facility vs. the requirements that zheap has vs. the current design of the undo patch set. The people present were: Heikki Linnakangas, Amit Kapila, Thomas Munro, Kuntal Ghosh, Andres Freund, and me. The remainder of this email consists of the notes that I took during that conversation, with some subsequent editing that I did to try to make them more broadly understandable. zedstore needs a very fast way to determine whether an undo pointer is old enough that it doesn't matter. Perhaps we should keep a backend-local cache of discard pointers so that we can test an undo pointer against the discard horizon without needing to touch shared memory. zedstore needs a very fast way of obtaining the XID when it looks up an undo pointer. It seems inefficient to store the full XID (especially an 8-byte FullTransactionId) in every undo record, but storing it only in the transaction header makes getting the XID prohibitively expensive. Heikki had the idea of storing a flag in each undo record saying 'same X as the first tuple on the page', where X might be XID, CID, relation OID, block number, etc. That seems like a good idea. We'd need to decide exactly how many such flags to have and which fields they cover; and it's probably best not to depend on an undo record for another transaction that might be independently discarded. Another option is to put a "default" for each of these values in the page header and then store a value in individual undo records only if it differs from the value in the page header. We don't have a way to do a similar optimization for whatever individual clients of the undo machinery choose to store in the undo record's payload, which might be nice if we had a good idea how to do it. zedstore intends to store an undo pointer per tuple, whereas the current zheap code stores an undo pointer per transaction slot. Therefore, zheap wants an undo record to point to the previous undo record for that transaction slot; whereas zedstore wants an undo record to point to the previous undo record for the same tuple (which might not belong to the same transaction). The undo code that assumes per-block chaining (uur_blkprev) needs to be made more generic. For either zedstore or zheap, newly-inserted tuples could potentially point to an XID/CID rather than an undo record pointer, because visibility checks don't need to look at the actual undo record. An undo record still needs to be written in case we abort, but we don't need to be able to refer to it for visibility purposes. zedstore and zheap have different batching requirements. Both want to process undo records in TID order, but zedstore doesn't care about pages. The current undo patch set calls the RMGR-specific callback once per page; that needs to be generalized. Is it possible that some other table AM might want to sort by something other than TID? Right now, zedstore stores a separate btree for each column, and an extra btree for the visibility information, but it could have column groups instead. If you put all of the columns and the visibility information into a single column group, you'd have a row store. How would the performance of that solution compare with zheap? Maybe zheap should forget about having transaction slots, and just store an undo pointer in each tuple. That would be worse in cases such as bulk loading, where all the tuples on the page are modified by the same transaction ID. 
We could optimize that case by using something like a transaction slot, but then there's a problem if, say, every tuple on the page is updated or deleted by a separate transaction: the tuples need to get bigger to accommodate separate undo pointers, and the page might overflow. Perhaps we should design a solution that allows for the temporary use of overflow space in such cases. Tuple locking is complicated for both zheap and zedstore. Storing the locker XIDs in the tuple or page is tempting, but you can run out of space; where do you put the extras? Writing an undo record per lock is also tempting, but that could generate a very large amount of undo if there are many transactions that are each locking a whole table, whereas the heap doesn't have that problem, because it uses MultiXacts. On the other hand, in the current heap, it's easily possible for N transactions to burn through O(N^2) MultiXactIds, which is worse than an undo record per lock. A perfect system would avoid permanently bloating the table when many transactions each take many locks, but it's hard to design a system that has that property AND ALSO is maximally compact for a table that is bulk-loaded and then never updated again. (Perhaps it's OK for the table to expand a little bit if we find that we need to make space for a MultiXactId pointer or similar in each tuple? Thomas later dubbed this doubtless-uncontroversial design non-in-place select, and we're all
Re: WITH NOT MATERIALIZED and DML CTEs
On Monday, June 3, 2019 11:50:15 A.M. EDT Andres Freund wrote: > > This seems like an omission to me. Ideally, the presence of an > > explicit "NOT MATERIALIZED" clause on a data-modifying CTE should > > disable the "run to completion" logic. > > I don't see us ever doing that. The result of minor costing and other > planner changes would yield different updated data. That'll just > create endless bug reports. I understand why the rule exists in the first place, but I think that an explicit opt-in signals the assumption of responsibility and opens the possibility of using this in a well-defined evaluation context, such as CASE WHEN. Elvis
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
On Fri, May 31, 2019 at 2:59 AM Antonin Houska wrote: > > Sounds good. I'm not quite sure how this is going to work. Ideally > > you'd like the encryption key command to fetch the key from something > > like ssh-agent, or maybe pop up a window on the user's terminal with a > > key prompt. Just reading from stdin and writing to stdout is not > > going to be very convenient... > > When I mentioned writing to stdout in my previous email, I viewed it from the > perspective of postmaster: whichever way the command gets the password from > the DBA, it'll always write it to stdout and postmaster will read it from > there. This was to address your objection that the executables do not actually > "return" strings. So the part about returning strings is really just a wording issue: the documentation needs to talk about data sent to stdout or wherever, because that's what really happens, and somebody writing such a command from scratch needs to know what it must do. What I'm talking about here is that it also has to be reasonably possible to write an encryption key command that does something useful. I don't have a really clear vision for how that's going to work. Nobody wants the server, which is probably being launched by pg_ctl or systemd or both, to prompt using its own stdin/stderr, but then we need to think about how the prompting is actually going to work. One idea is to do it via the FEBE protocol: connect to the server using libpq, issue a new command that causes the server to enter COPY mode, and then send the encryption key as COPY data. However, that idea doesn't really work, because we've got to be able to get the key before we run recovery or reach consistency, so the server won't be listening for connections at that point. Also, we've got to have a way for this to work in single-user mode, which also can't listen for connections. It's probably all fine if your encryption key command is something like 'curl https://my.secret.keyserver.me/sekcret.key' because then there's an external server which you can just go access - but I don't quite understand how you'd do interactive prompting from here. Sorry to get hung up on what may seem like a minor point, but I think it's actually fairly fundamental: we've got to have a clear vision of how end-users will really be able to make use of this. > The header comment in pg_upgrade.c indicates that this is because of TOAST. So > I think that dbnode and relfilenode are not preserved just because there was > no strong reason to do so until now. Maybe you want to propose an independent patch making that change? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WITH NOT MATERIALIZED and DML CTEs
Elvis Pranskevichus writes: > On Monday, June 3, 2019 11:50:15 A.M. EDT Andres Freund wrote: >>> This seems like an omission to me. Ideally, the presence of an >>> explicit "NOT MATERIALIZED" clause on a data-modifying CTE should >>> disable the "run to completion" logic. >> I don't see us ever doing that. The result of minor costing and other >> planner changes would yield different updated data. That'll just >> create endless bug reports. > I understand why the rule exists in the first place, but I think that an > explicit opt-in signals the assumption of responsibility and opens the > possibility of using this in a well-defined evaluation context, such as > CASE WHEN. TBH, if you think it's well-defined, you're wrong. I concur with Andres that turning off run-to-completion for DMLs would be disastrous. For just one obvious point, what about firing AFTER triggers? It's already the case that the planner will silently ignore NOT MATERIALIZED for other cases where it can't inline the CTE for semantic or implementation reasons -- see comments in SS_process_ctes(). I see no good reason to treat the DML exception much differently from other exceptions, such as presence of volatile functions or recursion. regards, tom lane
Re: WITH NOT MATERIALIZED and DML CTEs
On Monday, June 3, 2019 12:09:46 P.M. EDT Tom Lane wrote: > > I understand why the rule exists in the first place, but I think > > that an explicit opt-in signals the assumption of responsibility > > and opens the possibility of using this in a well-defined > > evaluation context, such as CASE WHEN. > > TBH, if you think it's well-defined, you're wrong. The documentation seems to strongly suggest otherwise: "When it is essential to force evaluation order, a CASE construct (see Section 9.17) can be used. ... CASE construct used in this fashion will defeat optimization attempts" Are there cases where this is not true outside of the documented exceptions (i.e. immutable early-eval and aggregates)? Elvis
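For reference, the kind of usage that passage describes is along these lines (schematic, adapted from that section of the docs):

SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;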
Re: WITH NOT MATERIALIZED and DML CTEs
Elvis Pranskevichus writes: > On Monday, June 3, 2019 12:09:46 P.M. EDT Tom Lane wrote: >>> I understand why the rule exists in the first place, but I think >>> that an explicit opt-in signals the assumption of responsibility >>> and opens the possibility of using this in a well-defined >>> evaluation context, such as CASE WHEN. >> TBH, if you think it's well-defined, you're wrong. > The documentation seems to strongly suggest otherwise: > "When it is essential to force evaluation order, a CASE construct (see > Section 9.17) can be used. ... CASE construct used in this fashion will > defeat optimization attempts" > Are there cases where this is not true outside of the documented > exceptions (i.e. immutable early-eval and aggregates)? CASE is a scalar-expression construct. It's got little to do with the timing of scan/join operations such as row fetches. We offer users essentially no control over when those happen ... other than the guarantees about CTE materialization, which are exactly what you say you want to give up. regards, tom lane
Re: Fix inconsistencies for v12
Amit Langote writes: > On 2019/05/30 18:51, Amit Kapila wrote: >> I think it will be better to include postgres_fdw in the comment in >> some way so that if someone wants a concrete example, there is >> something to refer to. > Maybe a good idea, but this will be the first time to mention postgres_fdw > in the core source code. If you think that's OK, how about the attached? This wording seems fine to me. Now that we've beat that item into the ground ... there were a bunch of other tweaks suggested in Alexander's initial email. Amit (K), were you going to review/commit those? regards, tom lane
Re: WITH NOT MATERIALIZED and DML CTEs
On Monday, June 3, 2019 1:03:44 P.M. EDT Tom Lane wrote: > CASE is a scalar-expression construct. It's got little to do with > the timing of scan/join operations such as row fetches. We offer > users essentially no control over when those happen ... other than > the guarantees about CTE materialization, which are exactly what > you say you want to give up. In the general case, yes, but I *can* use a scalar-returning INSERT CTE in a THEN clause as a subquery. Useful for a conditional INSERT, when you can't use ON CONFLICT. Anyway, I understand that the complications are probably not worth it. Thanks, Elvis
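A sketch of the pattern (all names hypothetical); note that under the current run-to-completion rule the INSERT still executes no matter which branch the CASE takes, which is exactly what an explicit NOT MATERIALIZED opt-in could change:

WITH ins AS (
    INSERT INTO events (kind) VALUES ('signup') RETURNING id
)
SELECT CASE WHEN u.wants_signup THEN (SELECT id FROM ins) END
FROM users u
WHERE u.id = 42;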
Re: Index Skip Scan
Hi Floris, On 6/1/19 12:10 AM, Floris Van Nee wrote: Given a table definition of (market text, feedcode text, updated_at timestamptz, value float8) and an index on (market, feedcode, updated_at desc) (note that this table slightly deviates from what I described in my previous mail) and filling it with data, the following query uses an index skip scan and returns just 1 row (incorrect!): select distinct on (market, feedcode) market, feedcode from streams.base_price where market='TEST' The following query still uses the regular index scan and returns many more rows (correct): select distinct on (market, feedcode) * from streams.base_price where market='TEST' It seems that partially filtering on one of the distinct columns triggers incorrect behavior where too many rows in the index are skipped. Thanks for taking a look at the patch, and your feedback on it. I'll definitely look into this once I'm back from my travels. Best regards, Jesper
Re: Index Skip Scan
Hi Rafia, On 6/1/19 6:03 AM, Rafia Sabih wrote: Here is my repeatable test case,

create table t (market text, feedcode text, updated_at timestamptz, value float8);
create index on t (market, feedcode, updated_at desc);
insert into t values('TEST', 'abcdef', (select timestamp '2019-01-10 20:00:00' + random() * (timestamp '2014-01-20 20:00:00' - timestamp '2019-01-20 20:00:00')), generate_series(1,100)*9.88);
insert into t values('TEST', 'jsgfhdfjd', (select timestamp '2019-01-10 20:00:00' + random() * (timestamp '2014-01-20 20:00:00' - timestamp '2019-01-20 20:00:00')), generate_series(1,100)*9.88);

Now, without the patch,

select distinct on (market, feedcode) market, feedcode from t where market='TEST';
 market | feedcode
--------+-----------
 TEST   | abcdef
 TEST   | jsgfhdfjd
(2 rows)

explain select distinct on (market, feedcode) market, feedcode from t where market='TEST';
                            QUERY PLAN
------------------------------------------------------------------
 Unique  (cost=12.20..13.21 rows=2 width=13)
   ->  Sort  (cost=12.20..12.70 rows=201 width=13)
         Sort Key: feedcode
         ->  Seq Scan on t  (cost=0.00..4.51 rows=201 width=13)
               Filter: (market = 'TEST'::text)
(5 rows)

And with the patch,

select distinct on (market, feedcode) market, feedcode from t where market='TEST';
 market | feedcode
--------+----------
 TEST   | abcdef
(1 row)

explain select distinct on (market, feedcode) market, feedcode from t where market='TEST';
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Only Scan using t_market_feedcode_updated_at_idx on t  (cost=0.14..0.29 rows=2 width=13)
   Scan mode: Skip scan
   Index Cond: (market = 'TEST'::text)
(3 rows)

Notice that in the explain statement it shows correct number of rows to be skipped.

Thanks for your test case; this is very helpful. For now, I would like to highlight that SET enable_indexskipscan = OFF can be used for testing with the patch applied. Dmitry and I will look at the feedback provided. Best regards, Jesper
Re: Pinned files at Windows
On Thu, May 30, 2019 at 3:25 AM Konstantin Knizhnik wrote: > If the call of stat() succeeds, then my assumption is that the only reason > for GetFileAttributesEx > failure is that the file has been deleted, and returning the ENOENT error code in this > case is correct behavior. In my experience, the assumption "the only possible cause of an error during X is Y" turns out to be wrong nearly 100% of the time. Our job is to report the errors the OS gives us, not guess what they mean. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: BUG #15821: Parallel Workers with functions and auto_explain: ERROR: could not find key 3 in shm TOC
PG Bug reporting form writes: > We have enabled auto_explain and see errors on PostgreSQL 11.3 when > SELECTing from a user defined function. No such crashes have been > observed on 10.7. I think that you didn't give a complete dump of relevant settings, but after some fooling around I was able to reproduce this error, and the cause is this: auto_explain hasn't a single clue about parallel query. 1. In the parent process, we have a parallelizable hash join being executed in a statement inside a function. Since auto_explain.log_nested_statements is not enabled, auto_explain does not deem that it should trace the statement, so the query starts up with estate->es_instrument = 0, and therefore ExecHashInitializeDSM chooses not to create any shared SharedHashInfo area. 2. In the worker processes, auto_explain manages to grab execution control when ParallelQueryMain calls ExecutorStart, thanks to being in ExecutorStart_hook. Having no clue what's going on, it decides that this is a new top-level query that it should trace, and it sets some bits in queryDesc->instrument_options. 3. When the workers get to ExecHashInitializeWorker, they see that instrumentation is active so they try to look up the SharedHashInfo. Kaboom. I'm inclined to think that explain_ExecutorStart should simply keep its hands off of everything when in a parallel worker; if instrumentation is required, that'll be indicated by options passed down from the parent process. It looks like this could conveniently be merged with the rate-sampling logic by forcing current_query_sampled to false when IsParallelWorker(). Likely this should be back-patched all the way to 9.6. I'm not sure how we managed to avoid noticing it before now, but there are probably ways to cause visible trouble in any release that has any parallel query support. regards, tom lane
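For anyone wanting to poke at this, a hypothetical recipe consistent with the above analysis (the function name is a placeholder for any function that runs a parallelizable query):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;             -- workers end up requesting instrumentation
SET auto_explain.log_nested_statements = off;  -- so the parent query itself is not traced
SELECT my_function_with_a_parallel_hash_join();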
Fix runtime errors from -fsanitize=undefined
After many years of trying, it seems the -fsanitize=undefined checking in gcc is now working somewhat reliably. Attached is a patch that fixes all errors of the kind runtime error: null pointer passed as argument N, which is declared to never be null Most of the cases are calls to memcpy(), memcmp(), etc. with a length of zero, so one appears to get away with passing a null pointer. Note that these are runtime errors, not static analysis, so the code in question is actually reached. To reproduce, configure normally and then set COPT=-fsanitize=undefined -fno-sanitize=alignment -fno-sanitize-recover=all and build and run make check-world. Unpatched, this will core dump in various places. (-fno-sanitize=alignment should also be fixed but I took it out here to deal with it separately.) See https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html for further documentation. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 4b657d1af4a8518218be207024b123cfa6d799d8 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 3 Jun 2019 20:55:35 +0200 Subject: [PATCH] Fix runtime errors from -fsanitize-undefined These are of the form runtime error: null pointer passed as argument N, which is declared to never be null --- contrib/pgcrypto/px.c | 2 +- src/backend/access/heap/heapam.c| 2 +- src/backend/access/heap/heapam_visibility.c | 2 ++ src/backend/access/transam/clog.c | 1 + src/backend/access/transam/xact.c | 5 +++-- src/backend/commands/indexcmds.c| 3 ++- src/backend/utils/cache/relcache.c | 8 ++-- src/backend/utils/misc/guc.c| 2 +- src/backend/utils/time/snapmgr.c| 10 ++ src/fe_utils/print.c| 3 ++- 10 files changed, 25 insertions(+), 13 deletions(-) diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c index 0f02fb56c4..ec9cf99086 100644 --- a/contrib/pgcrypto/px.c +++ b/contrib/pgcrypto/px.c @@ -200,7 +200,7 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen, memset(ivbuf, 0, ivs); if (ivlen > ivs) memcpy(ivbuf, iv, ivs); - else + else if (ivlen > 0) memcpy(ivbuf, iv, ivlen); } diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 419da8784a..52e3782282 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -307,7 +307,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) /* * copy the scan key, if appropriate */ - if (key != NULL) + if (key != NULL && scan->rs_base.rs_nkeys) memcpy(scan->rs_base.rs_key, key, scan->rs_base.rs_nkeys * sizeof(ScanKeyData)); /* diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c index 537e681b23..9c291cd1ab 100644 --- a/src/backend/access/heap/heapam_visibility.c +++ b/src/backend/access/heap/heapam_visibility.c @@ -1520,6 +1520,8 @@ HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple) static bool TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num) { + if (!xip) + return false; return bsearch(&xid, xip, num, sizeof(TransactionId), xidComparator) != NULL; } diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 47db7a8a88..4324229a61 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -294,6 +294,7 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids, * on the same page. Check those conditions, too. 
*/ if (all_xact_same_page && xid == MyPgXact->xid && + nsubxids > 0 && nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT && nsubxids == MyPgXact->nxids && memcmp(subxids, MyProc->subxids.xids, diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index f1108ccc8b..976ab5a950 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -5201,8 +5201,9 @@ SerializeTransactionState(Size maxsize, char *start_address) { if (FullTransactionIdIsValid(s->fullTransactionId)) workspace[i++] = XidFromFullTransactionId(s->fullTransactionId); - memcpy(&workspace[i], s->childXids, - s->nChildXids * sizeof(TransactionId)); + if (s->nChildXids > 0) + memcpy(&workspace[i], s->childXids, + s->nChildXids * sizeof(TransactionId)); i += s->nChildXids; } Assert(i == nxids); diff --git a/src/backend/commands/indexcmds.c b/src/backend/co
Re: Question about some changes in 11.3
Mat Arye writes: > On Fri, May 24, 2019 at 5:05 PM Mat Arye wrote: >> 11.3 included some changes to partitioned table planning. Namely >> commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is >> known empty.") seems to redo all paths for partitioned tables >> in apply_scanjoin_target_to_paths. It clears the paths in: >> >> ``` >> if (rel_is_partitioned) >> rel->pathlist = NIL >> ``` >> >> Then the code rebuilds the paths. However, the rebuilt append path never >> gets the >> set_rel_pathlist_hook called. Thus, the work that hook did previously gets >> thrown away and the rebuilt append path can never be influenced by this >> hook. Is this intended behavior? Am I missing something? Hm. I'd say this was already broken by the invention of apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to still work for you, but it's not hard to envision applications of set_rel_pathlist_hook for which it would not have worked. The contract for set_rel_pathlist_hook is supposed to be that it gets to editorialize on all normal (non-Gather) paths created by the core code, and that's no longer the case now that apply_scanjoin_target_to_paths can add more. > I've attached a small patch to address this discrepancy for when the > set_rel_pathlist_hook is called so that it is called for the actual paths > used for partitioned rels. Please let me know if I am misunderstanding how > this should be handled. I'm not very happy with this patch either, as it makes the situation even more confused, not less so. The best-case scenario is that the set_rel_pathlist_hook runs twice and does useless work; the worst case is that it gets confused completely by being called twice for the same rel. I think we need to maintain the invariant that that hook is called exactly once per baserel. I wonder whether we could fix matters by postponing the set_rel_pathlist_hook call till later in the same cases where we postpone generate_gather_paths, ie, it's the only baserel. That would make its name pretty misleading, though. Maybe we should leave it alone and invent a separate hook to be called by/after apply_scanjoin_target_to_paths? Although I don't know if it'd be useful to add a new hook to v11 at this point. Extensions would have a hard time knowing if they could use it. regards, tom lane
Re: Index Skip Scan
> On Sat, Jun 1, 2019 at 6:57 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Sat, Jun 1, 2019 at 5:34 PM Floris Van Nee > > wrote: > > > > I did a little bit of investigation and it seems to occur because in > > pathkeys.c the function pathkey_is_redundant considers pathkeys redundant if > > there is an equality condition with a constant in the corresponding WHERE > > clause. > > ... > > However, the index skip scan interprets this as meaning that it has to skip over > > just > > the first column. > > Right, passing the correct number of columns fixes this particular problem. But > while debugging I've also discovered another related issue: the current > implementation seems to make a few assumptions that are not correct if we have > an index condition and a distinct column is not the first in the index. I'll > try to address these in the next version of the patch in the near future. So, as mentioned above, there were a few problems, namely the number of distinct_pathkeys with and without redundancy, and using _bt_search when the order of distinct columns doesn't match the index. As far as I can see the problem in the latter case (when we have an index condition) is that it's still possible to find a value, but the lastItem value after the search is always zero (due to _bt_checkkeys filtering) and _bt_next stops right away. To address this, probably we can do something like in the attached patch. Alongside distinct_pathkeys, uniq_distinct_pathkeys is now stored, which is the same but without the constant elimination. It is then used to get the real number of distinct keys, and to check the order of the columns so that index skip scan is not considered if it's different. Hope it doesn't look too hacky. Also I've noticed that the current implementation wouldn't work, e.g., for: select distinct a, a from table; because in this case an IndexPath is hidden behind a ProjectionPath. For now I guess it's fine, but probably it's possible here to apply skip scan too. v17-0001-Index-skip-scan.patch Description: Binary data
Re: Pinned files at Windows
On 03.06.2019 22:15, Robert Haas wrote: On Thu, May 30, 2019 at 3:25 AM Konstantin Knizhnik wrote: If the call of stat() succeeds, then my assumption is that the only reason for GetFileAttributesEx failure is that the file has been deleted, and returning the ENOENT error code in this case is correct behavior. In my experience, the assumption "the only possible cause of an error during X is Y" turns out to be wrong nearly 100% of the time. Our job is to report the errors the OS gives us, not guess what they mean. This is what we try to do now:

    r = stat(path, buf);
    if (r < 0)
    {
        if (GetLastError() == ERROR_DELETE_PENDING)
        {
            /*
             * File has been deleted, but is not gone from the filesystem yet.
             * This can happen when some process with FILE_SHARE_DELETE has it
             * open and it will be fully removed once that handle is closed.
             * Meanwhile, we can't open it, so indicate that the file just
             * doesn't exist.
             */
            errno = ENOENT;
            return -1;
        }
        return r;
    }

but without success, because ERROR_DELETE_PENDING is never returned by Win32, and moreover stat() never returns an error in this case.
Re: Sort support for macaddr8
On 6/3/19 3:23 PM, Melanie Plageman wrote: > Peter and I implemented this small (attached) patch to extend > abbreviated key compare sort to macaddr8 datatype (currently supported > for macaddr). Am I going cross-eyed, or would the memset be serving more of a purpose if it were in the SIZEOF_DATUM != 8 branch? Regards, -Chap
Re: Avoiding hash join batch explosions with extreme skew and weird stats
On Sun, May 19, 2019 at 4:07 PM Thomas Munro wrote: > On Sat, May 18, 2019 at 12:15 PM Melanie Plageman > wrote: > > On Thu, May 16, 2019 at 3:22 PM Thomas Munro > wrote: > >> Admittedly I don't have a patch, just a bunch of handwaving. One > >> reason I haven't attempted to write it is because although I know how > >> to do the non-parallel version using a BufFile full of match bits in > >> sync with the tuples for outer joins, I haven't figured out how to do > >> it for parallel-aware hash join, because then each loop over the outer > >> batch could see different tuples in each participant. You could use > >> the match bit in HashJoinTuple header, but then you'd have to write > >> all the tuples out again, which is more IO than I want to do. I'll > >> probably start another thread about that. > > > > Could you explain more about the implementation you are suggesting? > > > > Specifically, what do you mean "BufFile full of match bits in sync with > the > > tuples for outer joins?" > > First let me restate the PostgreSQL terminology for this stuff so I > don't get confused while talking about it: > > * The inner side of the join = the right side = the side we use to > build a hash table. Right and full joins emit inner tuples when there > is no matching tuple on the outer side. > > * The outer side of the join = the left side = the side we use to > probe the hash table. Left and full joins emit outer tuples when > there is no matching tuple on the inner side. > > * Semi and anti joins emit exactly one instance of each outer tuple if > there is/isn't at least one match on the inner side. > > We have a couple of relatively easy cases: > > * Inner joins: for every outer tuple, we try to find a match in the > hash table, and if we find one we emit a tuple. To add looping > support, if we run out of memory when loading the hash table we can > just proceed to probe the fragment we've managed to load so far, and > then rewind the outer batch, clear the hash table and load in the next > work_mem-sized fragment and do it again... rinse and repeat until > we've eventually processed the whole inner batch. After we've > finished looping, we move on to the next batch. > > * For right and full joins ("HJ_FILL_INNER"), we also need to emit an > inner tuple for every tuple that was loaded into the hash table but > never matched. That's done using a flag HEAP_TUPLE_HAS_MATCH in the > header of the tuples of the hash table, and a scan through the whole > hash table at the end of each batch to look for unmatched tuples > (ExecScanHashTableForUnmatched()). To add looping support, that just > has to be done at the end of every inner batch fragment, that is, > after every loop. > > And now for the cases that need a new kind of match bit, as far as I can > see: > > * For left and full joins ("HJ_FILL_OUTER"), we also need to emit an > outer tuple for every tuple that didn't find a match in the hash > table. Normally that is done while probing, without any need for > memory or match flags: if we don't find a match, we just spit out an > outer tuple immediately. But that simple strategy won't work if the > hash table holds only part of the inner batch. Since we'll be > rewinding and looping over the outer batch again for the next inner > batch fragment, we can't yet say if there will be a match in a later > loop. But the later loops don't know on their own either. So we need > some kind of cumulative memory between loops, and we only know which > outer tuples have a match after we've finished all loops. 
So there > would need to be a new function ExecScanOuterBatchForUnmatched(). > > * For semi joins, we need to emit exactly one outer tuple whenever > there is one or more match on the inner side. To add looping support, > we need to make sure that we don't emit an extra copy of the outer > tuple if there is a second match in another inner batch fragment. > Again, this implies some kind of memory between loops, so we can > suppress later matches. > > * For anti joins, we need to emit an outer tuple whenever there is no > match. To add looping support, we need to wait until we've seen all > the inner batch fragments before we know that a given outer tuple has > no match, perhaps with the same new function > ExecScanOuterBatchForUnmatched(). > > So, we need some kind of inter-loop memory, but we obviously don't > want to create another source of unmetered RAM gobbling. So one idea > is a BufFile that has one bit per outer tuple in the batch. In the > first loop, we just stream out the match results as we go, and then > somehow we OR the bitmap with the match results in subsequent loops. > After the last loop, we have a list of unmatched tuples -- just scan > it in lock-step with the outer batch and look for 0 bits. > > Unfortunately that bits-in-order scheme doesn't work for parallel > hash, where the SharedTuplestore tuples seen by each worker are > non-deterministic. So perhaps in that case we could use the >
Re: initdb recommendations
Greetings, * Noah Misch (n...@leadboat.com) wrote: > On Tue, May 28, 2019 at 12:15:35PM -0400, Magnus Hagander wrote: > > On Fri, May 24, 2019 at 11:24 AM Noah Misch wrote: > > > On Thu, May 23, 2019 at 06:56:49PM +0200, Magnus Hagander wrote: > > > > On Thu, May 23, 2019, 18:54 Peter Eisentraut > > > > wrote: > > > > > To recap, the idea here was to change the default authentication > > > > > methods > > > > > that initdb sets up, in place of "trust". > > > > > > > > > > I think the ideal scenario would be to use "peer" for local and some > > > > > appropriate password method (being discussed elsewhere) for host. > > > > > > > > > > Looking through the buildfarm, I gather that the only platforms that > > > > > don't support peer are Windows, AIX, and HP-UX. I think we can > > > > > probably > > > > > figure out some fallback or alternative default for the latter two > > > > > platforms without anyone noticing. But what should the defaults be on > > > > > Windows? It doesn't have local sockets, so the lack of peer wouldn't > > > > > matter. But is it OK to default to a password method, or would that > > > > > upset people particularly? > > > > > > > > I'm sure password would be fine there. It's what "everybody else" does > > > > (well, sqlserver also does integrated security, but people are used to > > > > it). > > > > > > Our sspi auth is a more-general version of peer auth, and it works over > > > TCP. > > > It would be a simple matter of programming to support "peer" on Windows, > > > consisting of sspi auth with an implicit pg_ident map. Nonetheless, I > > > agree > > > password would be fine. > > > > I hope you don't mean "make peer use sspi on windows". I think that's a > > really bad idea from a confusion perspective. > > I don't mean "make peer an alias for SSPI", but I do mean "implement peer on > Windows as a special case of sspi, using the same Windows APIs". To the > client, "peer" would look like "sspi". If that's confusion-prone, what's > confusing about it? I tend to agree with Magnus here. It's confusing because 'peer' in our existing parlance refers to connections over a unix socket, which certainly isn't what's happening on Windows. I do agree with the general idea of making SSPI work by default on Windows. > > However, what we could do there is have the default pg_hba.conf file contain > > a "reasonable setup using sspi", that's a different story. > > That's another way to do it. Currently, to behave like "peer" behaves, one > hard-codes the machine's SSPI realm into pg_ident.conf. If we introduced > pg_ident.conf syntax to remove that need (e.g. %MACHINE_REALM%), that approach > would work. I would be in favor of something like this, provided the variables are defined in such a way that we could avoid conflicting with real values (and remember that you'd need a regexp in pg_ident.conf for this to work...). %xyz%, while supporting %% to mean a literal percent, seems likely to work. Not sure if that's what you were thinking though. > > But I wonder if that isn't better implemented at the installer level. I > > think we're better off doing something like scram as the config when you > > build from source, and then encourage installers to do other things based on > > the fact that they know more information about the setup (such as usernames > > actually used). > > If initdb has the information needed to configure the recommended > authentication, that's the best place to do it, since there's one initdb and > many installers. 
> So far, I haven't seen a default auth configuration proposal > involving knowledge of OS usernames or other information initdb lacks. I agree with doing it at initdb time. Note that the current default auth configuration (to some extent) does depend on the OS username, but that's also something that initdb knows, and therefore it isn't an issue here. I don't see a reason that we wouldn't be able to have initdb handle this. Thanks! Stephen signature.asc Description: PGP signature
Re: Sort support for macaddr8
On Mon, Jun 3, 2019 at 1:17 PM Melanie Plageman wrote: > I tried checking to see if there is a performance difference using the > attached DDL based on src/test/regress/sql/macaddr8.sql. I found > that the sort function is only exercised when creating an index (not, > for example, when doing some type of aggregation). As you know, it's a bit weird that we're proposing adding sort support with abbreviated keys for a type that is 8 bytes, since you'd expect it to also be pass-by-value on most platforms, which largely defeats the purpose of having abbreviated keys (though sort support could still make sense, for the same reason it makes sense to have it for int8). However, macaddr8 isn't actually pass-by-value, and it seems too late to do anything about that now, so abbreviated keys actually make sense. > With the patch applied to current master and using the DDL attached, > the timing for creating the index hovered around 20 ms for master and > 15 ms for the patched version. I would expect a sufficiently large sort to execute in about half the time with abbreviation, based on previous experience. However, I think that this patch can be justified in a relatively straightforward way. It extends sort support for macaddr to macaddr8, since these two types are almost identical in every other way. We should still validate the performance out of an abundance of caution, but I would be very surprised if there was much difference between the macaddr and macaddr8 cases. In short, users should not be surprised by the big gap in performance between macaddr and macaddr8. It's worth being consistent there. > I think that that seems like an improvement. I was thinking of > registering this patch for the next commitfest. Is that okay? Definitely, yes. > I was just wondering what the accepted way to test and share > performance numbers is for a small patch like this. Is sharing DDL > enough? Do I need to use pg_bench? I've always used custom microbenchmarks for stuff like this. Repeatedly executing a particular query and taking the median execution time as representative seems to be the best approach. -- Peter Geoghegan
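A rough sketch of that kind of microbenchmark (the table name, row count, and the random-data trick are all arbitrary choices):

-- Generate random macaddr8 values: 16 hex digits with no separator is an
-- accepted macaddr8 input format.
CREATE TABLE macaddr8_bench AS
SELECT substr(md5(random()::text), 1, 16)::macaddr8 AS addr
FROM generate_series(1, 5000000);

\timing on
-- Repeat a few times, take the median, and compare patched vs. unpatched;
-- index builds exercise the sort support path.
CREATE INDEX macaddr8_bench_idx ON macaddr8_bench (addr);
DROP INDEX macaddr8_bench_idx;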
[PATCH] ruleutils: Fix subqueries with shadowed aliases
Discovered while looking into issue here: https://github.com/citusdata/citus/pull/2733 For completeness I'll quote the example code to demonstrate the issue:

postgres=# create table events_table (id integer primary key, user_id integer);
CREATE TABLE
postgres=# create table users_table_ref (id integer primary key, value_2 integer);
CREATE TABLE
postgres=# create view asdf as
SELECT r
FROM (SELECT user_id_deep, random() as r -- prevent pulling up the subquery
      FROM (events_table INNER JOIN users_table_ref
            ON (events_table.user_id = users_table_ref.value_2)) AS join_alias(user_id_deep)) AS bar,
     (events_table INNER JOIN users_table_ref
      ON (events_table.user_id = users_table_ref.value_2)) AS join_alias(user_id_deep)
WHERE (bar.user_id_deep = join_alias.user_id_deep);
CREATE VIEW
postgres=# \d+ asdf
                              View "public.asdf"
 Column |       Type       | Collation | Nullable | Default | Storage | Description
--------+------------------+-----------+----------+---------+---------+-------------
 r      | double precision |           |          |         | plain   |
View definition:
 SELECT bar.r
   FROM ( SELECT join_alias_1.user_id_deep,
            random() AS r
           FROM (events_table events_table_1
             JOIN users_table_ref users_table_ref_1 ON events_table_1.user_id = users_table_ref_1.value_2) join_alias(user_id_deep, user_id, id, value_2)) bar,
    (events_table
     JOIN users_table_ref ON events_table.user_id = users_table_ref.value_2) join_alias(user_id_deep, user_id, id, value_2)
  WHERE bar.user_id_deep = join_alias.user_id_deep;

Where the 2nd join_alias should be renamed to join_alias_1

0001-ruleutils-Fix-subqueries-with-shadowed-aliases.patch Description: 0001-ruleutils-Fix-subqueries-with-shadowed-aliases.patch
Re: Sort support for macaddr8
On Mon, Jun 3, 2019 at 2:03 PM Chapman Flack wrote: > Am I going cross-eyed, or would the memset be serving more of a purpose > if it were in the SIZEOF_DATUM != 8 branch? No, it wouldn't -- that's the correct place for it with the macaddr type. However, it isn't actually necessary to memset() at the equivalent point for macaddr8, since we cannot "run out of bytes from the authoritative representation" that go in the Datum/abbreviated key. I suppose that the memset() should simply be removed, since it is superfluous here. -- Peter Geoghegan
Use of multi-column gin index
Hello, we have some confusion over the planner's use of an index. Suppose we have a table "parades" with columns:

  "city_id" of type integer
  "description" of type text
  "start_time" of type timestamp without time zone

Suppose also we have indexes:

  "parades_city_id_description_tsv_index" gin (city_id, to_tsvector('simple'::regconfig, description)) WHERE description IS NOT NULL
  "parades_city_id_start_time_index" btree (city_id, start_time)

When we EXPLAIN the query

  SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN ()));

We get

  Bitmap Heap Scan on parades  (cost=12691.97..18831.21 rows=2559 width=886)
    Recheck Cond: ((to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL) AND (city_id = ANY ('{}'::integer[])))
    ->  BitmapAnd  (cost=12691.97..12691.97 rows=2559 width=0)
          ->  Bitmap Index Scan on parades_city_id_description_tsv_index  (cost=0.00..2902.97 rows=229463 width=0)
                Index Cond: (to_tsvector('simple'::regconfig, title) @@ '''fun'''::tsquery)
          ->  Bitmap Index Scan on parades_city_id_start_time_index  (cost=0.00..9787.47 rows=565483 width=0)
                Index Cond: (city_id = ANY ('{}'::integer[]))

When we EXPLAIN the same query but with one city_id

  SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN (1)));

We get

  Bitmap Heap Scan on parades  (cost=36.20..81.45 rows=20 width=886)
    Recheck Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL))
    ->  Bitmap Index Scan on parades_city_id_description_tsv_index  (cost=0.00..36.20 rows=20 width=0)
          Index Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery))

This leaves us with two questions:

1. How is postgres able to use parades_city_id_description_tsv_index in the first explain result without any filter on "city_id"?
2. Why does the planner in the first query decide not to simply use parades_city_id_description_tsv_index (as in the second explain result) when the cardinality of the set of "city_id"s is high?

Thanks, Jared
Re: Sort support for macaddr8
On 6/3/19 5:03 PM, Chapman Flack wrote: > On 6/3/19 3:23 PM, Melanie Plageman wrote: >> Peter and I implemented this small (attached) patch to extend >> abbreviated key compare sort to macaddr8 datatype (currently supported >> for macaddr). > > Am I going cross-eyed, or would the memset be serving more of a purpose > if it were in the SIZEOF_DATUM != 8 branch? It looks like a copy-pasto coming from mac.c, where the size of the thing to be compared isn't itself 8 bytes. With sizeof(macaddr) being 6, that original code may have had these cases in mind: - SIZEOF_DATUM is something smaller than 6 (likely 4). The whole key doesn't fit, but that's ok, because abbreviated "equality" just means to recheck with the authoritative routine. - SIZEOF_DATUM is exactly 6. Probably not a thing. - SIZEOF_DATUM is anything larger than 6 (likely 8). Needs the memset. Also, in this case, abbreviated "equality" could be taken as true equality, never needing the authoritative fallback. For macaddr8, the cases morph into these: - SIZEOF_DATUM is something smaller than 8 (likely 4). Ok; it's just an abbreviation. - SIZEOF_DATUM is exactly 8. Now an actual thing, even likely. - SIZEOF_DATUM is larger than 8. Our flying cars run postgres, and we need the memset to make sure they don't crash. This leaves me with a couple of questions: 1. (This one seems like a bug.) In the little-endian case, if SIZEOF_DATUM is smaller than the type, I'm not convinced by doing the DatumBigEndianToNative() after the memcpy(). Seems like that's too late to make sure the most-significant bytes got copied. 2. (This one seems like an API opportunity.) If it becomes common to add abbreviation support for smallish types such that (as here, when SIZEOF_DATUM >= 8) an abbreviated "equality" result is in fact authoritative, would it be worthwhile to have some way for the sort support routine to announce that fact to the caller? That could spare the caller the effort of re-checking with the authoritative routine. It could also (by making the equality case less costly) end up changing the weight assigned to the cardinality estimate in deciding whether to abbreviate. Regards, -Chap
Re: Sort support for macaddr8
On 6/3/19 5:59 PM, Chapman Flack wrote:
> On 6/3/19 5:03 PM, Chapman Flack wrote:
> 1. (This one seems like a bug.) In the little-endian case, if
>    SIZEOF_DATUM is smaller than the type, I'm not convinced by doing
>    the DatumBigEndianToNative() after the memcpy(). Seems like that's
>    too late to make sure the most-significant bytes got copied.

Wait, I definitely was cross-eyed for that one. It's the abbreviated copy
whose endianness varies. Never mind.

Regards,
-Chap
Re: Sort support for macaddr8
On Mon, Jun 3, 2019 at 2:59 PM Chapman Flack wrote:
> 1. (This one seems like a bug.) In the little-endian case, if
>    SIZEOF_DATUM is smaller than the type, I'm not convinced by doing
>    the DatumBigEndianToNative() after the memcpy(). Seems like that's
>    too late to make sure the most-significant bytes got copied.

Uh, when else would you do it? Before the memcpy()?

> 2. (This one seems like an API opportunity.) If it becomes common to
>    add abbreviation support for smallish types such that (as here,
>    when SIZEOF_DATUM >= 8), an abbreviated "equality" result is in fact
>    authoritative, would it be worthwhile to have some way for the sort
>    support routine to announce that fact to the caller? That could
>    spare the caller the effort of re-checking with the authoritative
>    routine.

It's possible that that would make sense, but I don't think that this
patch needs to do that. There is at least one pre-existing case that does
this -- macaddr.

-- 
Peter Geoghegan
Re: Use of multi-column gin index
Jared Rulison writes:
> Hello, we have some confusion over the planner's use of an index.
> ...
> 1. How is postgres able to use parades_city_id_description_tsv_index in the
> first explain result without any filter on "city_id"?

GIN indexes don't have any particular bias towards earlier or later
columns (unlike btrees). So this isn't any harder than if you'd put the
index columns in the other order.

> 2. Why does the planner in the first query decide not to simply use
> parades_city_id_description_tsv_index (as in the second explain result)
> when the cardinality of the set of "city_id"s is high?

[ shrug... ] It thinks it's cheaper. Whether it's correct is impossible
to say from the given data, but there is a moderately complex cost model
in there. The comments for gincost_scalararrayopexpr note

 * A ScalarArrayOpExpr will give rise to N separate indexscans at runtime,
 * each of which involves one value from the RHS array, plus all the
 * non-array quals (if any).

I haven't checked the actual execution code, but this seems to be saying
that the GIN indexscan executor always does ANDs before ORs. That means
that doing everything in the same GIN indexscan would require executing
the to_tsvector part 50 times, so I can definitely believe that shoving
the IN part to a different index and AND'ing afterwards is a better idea.

(Whether the GIN executor should be made smarter to avoid that is a
separate question ;-))

regards, tom lane
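Whether the BitmapAnd plan really is cheaper can also be checked empirically, since DROP INDEX is transactional. The following is only a sketch: the id list is a placeholder for the real set of city_ids, and the DROP INDEX holds an ACCESS EXCLUSIVE lock until ROLLBACK, so it is only suitable for a test copy of the data.

    BEGIN;

    -- Plan as currently chosen (BitmapAnd of the two indexes).
    EXPLAIN ANALYZE
    SELECT * FROM parades
    WHERE description IS NOT NULL
      AND to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')
      AND city_id IN (1, 2, 3);          -- placeholder list

    -- Hide the btree index; the drop is undone by the ROLLBACK below.
    DROP INDEX parades_city_id_start_time_index;

    -- Now the planner must choose between the GIN index alone and a seq scan.
    EXPLAIN ANALYZE
    SELECT * FROM parades
    WHERE description IS NOT NULL
      AND to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')
      AND city_id IN (1, 2, 3);          -- placeholder list

    ROLLBACK;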
Re: WITH NOT MATERIALIZED and DML CTEs
On Mon, Jun 03, 2019 at 11:45:51AM -0400, Elvis Pranskevichus wrote:
> Currently, WITH a AS NOT MATERIALIZED (INSERT ...) would silently
> disregard the "NOT MATERIALIZED" instruction and execute the data-
> modifying CTE to completion (as per the long-standing DML CTE rule).
>
> This seems like an omission to me. Ideally, the presence of an explicit
> "NOT MATERIALIZED" clause on a data-modifying CTE should disable the
> "run to completion" logic.

It might be worth documenting the fact that NOT MATERIALIZED doesn't
affect DML CTEs, just as it doesn't affect statements with volatile
functions and recursive CTEs.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WITH NOT MATERIALIZED and DML CTEs
David Fetter writes:
> It might be worth documenting the fact that NOT MATERIALIZED doesn't
> affect DML CTEs, just as it doesn't affect statements with volatile
> functions and recursive CTEs.

We already do:

    However, if a WITH query is non-recursive and side-effect-free (that
    is, it is a SELECT containing no volatile functions) then it can be
    folded into the parent query, allowing joint optimization of the two
    query levels. By default, this happens if the parent query references
    the WITH query just once, but not if it references the WITH query more
    than once. You can override that decision by specifying MATERIALIZED
    to force separate calculation of the WITH query, or by specifying NOT
    MATERIALIZED to force it to be merged into the parent query. The
    latter choice risks duplicate computation of the WITH query, but it
    can still give a net savings if each usage of the WITH query needs
    only a small part of the WITH query's full output.

regards, tom lane
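To make the quoted paragraph concrete, here is a sketch of the three behaviours discussed in this thread; the table and CTE names are invented for illustration.

    CREATE TABLE events (id int PRIMARY KEY, payload text);

    -- Referenced once and side-effect-free: folded into the parent query by
    -- default, so the WHERE clause can use the primary key index.
    EXPLAIN
    WITH recent AS (SELECT * FROM events)
    SELECT * FROM recent WHERE id = 42;

    -- Force the pre-v12 behaviour: compute the CTE separately, then filter it.
    EXPLAIN
    WITH recent AS MATERIALIZED (SELECT * FROM events)
    SELECT * FROM recent WHERE id = 42;

    -- Data-modifying CTE: per this thread, NOT MATERIALIZED is accepted but
    -- silently ignored, and the INSERT still runs to completion exactly once.
    WITH ins AS NOT MATERIALIZED (
        INSERT INTO events VALUES (1, 'x') RETURNING id
    )
    SELECT count(*) FROM ins;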
Re: Confusing error message for REINDEX TABLE CONCURRENTLY
On Tue, May 28, 2019 at 12:05 PM David Rowley wrote:
> On Tue, 28 May 2019 at 01:23, Ashwin Agrawal wrote:
> > I think we will need to separate out the NOTICE message for concurrent
> > and regular case.
> >
> > For example this doesn't sound correct
> > WARNING: cannot reindex exclusion constraint index "public.circles_c_excl" concurrently, skipping
> > NOTICE: table "circles" has no indexes to reindex
> >
> > As no indexes can't be reindexed *concurrently* but there are still
> > indexes which can be reindexed, invalid indexes I think fall in same
> > category.
>
> Swap "can't" for "can" and, yeah. I think it would be good to make the
> error messages differ for these two cases. This would serve as a hint
> to the user that they might have better luck trying without the
> "concurrently" option.

Please check if the attached patch addresses and satisfies all the points
discussed so far in this thread.

I was thinking of adding an explicit errhint for the concurrent-case
NOTICE, to convey that either the table has no indexes or they can only be
reindexed without CONCURRENTLY. But I thought that may be obvious; feel
free to add one if it would be helpful.

From 2e9eceff07d9bdadef82a54c17f399263436bf9d Mon Sep 17 00:00:00 2001
From: Ashwin Agrawal
Date: Mon, 3 Jun 2019 16:30:39 -0700
Subject: [PATCH v2] Avoid confusing error message for REINDEX.

---
 src/backend/commands/indexcmds.c           | 26 +++---
 src/test/regress/expected/create_index.out | 10 -
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 40ea629ffe7..bab20028090 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2438,17 +2438,25 @@ ReindexTable(RangeVar *relation, int options, bool concurrent)
 									RangeVarCallbackOwnsTable, NULL);
 
 	if (concurrent)
+	{
 		result = ReindexRelationConcurrently(heapOid, options);
+
+		if (!result)
+			ereport(NOTICE,
+					(errmsg("table \"%s\" has no indexes that can be concurrently reindexed",
+							relation->relname)));
+	}
 	else
+	{
 		result = reindex_relation(heapOid,
 								  REINDEX_REL_PROCESS_TOAST |
 								  REINDEX_REL_CHECK_CONSTRAINTS,
 								  options);
-
-	if (!result)
-		ereport(NOTICE,
-				(errmsg("table \"%s\" has no indexes",
-						relation->relname)));
+		if (!result)
+			ereport(NOTICE,
+					(errmsg("table \"%s\" has no indexes to reindex",
+							relation->relname)));
+	}
 
 	return heapOid;
 }
@@ -2630,7 +2638,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 	foreach(l, relids)
 	{
 		Oid			relid = lfirst_oid(l);
-		bool		result;
 
 		StartTransactionCommand();
 		/* functions in indexes may want a snapshot set */
@@ -2638,11 +2645,12 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 
 		if (concurrent)
 		{
-			result = ReindexRelationConcurrently(relid, options);
+			ReindexRelationConcurrently(relid, options);
 			/* ReindexRelationConcurrently() does the verbose output */
 		}
 		else
 		{
+			bool		result;
 			result = reindex_relation(relid,
 									  REINDEX_REL_PROCESS_TOAST |
 									  REINDEX_REL_CHECK_CONSTRAINTS,
@@ -2656,7 +2664,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 
 			PopActiveSnapshot();
 		}
-
 		CommitTransactionCommand();
 	}
 	StartTransactionCommand();
@@ -2676,6 +2683,9 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
  * The locks taken on parent tables and involved indexes are kept until the
  * transaction is committed, at which point a session lock is taken on each
  * relation.  Both of these protect against concurrent schema changes.
+ *
+ * Returns true if any indexes have been rebuilt (including toast table's
+ * indexes when relevant).
  */
 static bool
 ReindexRelationConcurrently(Oid relationOid, int options)
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index c8bc6be0613..72a8fca48e4 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1944,9 +1944,9 @@ DROP TABLE reindex_verbose;
 CREATE TABLE concur_reindex_tab (c1 int);
 -- REINDEX
 REINDEX TABLE concur_reindex_tab; -- notice
-NOTICE: table "concur_reindex_tab" has no indexes
+NOTICE: table "concur_reindex_tab" has no indexes to reindex
 REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
-NOTICE: table "concur_reindex_tab" has no indexes
+NOTICE: table "concur_reindex_tab" has no indexes that can be concurrently reindexed
 ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
 -- Normal index with integer column
 CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
@@ -2043,10 +2043,10 @@ ERROR: REINDEX is not yet implemented for partitioned indexes
 -- REINDEX is a no-op for partitioned tables
 REINDEX TABLE concur_reindex_part_10;
 WARNING: REINDEX of partition
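For readers skimming the diff, the user-visible effect of this v2 patch on a table with no indexes would look roughly like the following sketch, taken from the regression output above; the table name is invented, and the exact NOTICE wording is still open to review comments later in the thread.

    CREATE TABLE concur_reindex_demo (c1 int);

    REINDEX TABLE concur_reindex_demo;
    -- NOTICE: table "concur_reindex_demo" has no indexes to reindex

    REINDEX TABLE CONCURRENTLY concur_reindex_demo;
    -- NOTICE: table "concur_reindex_demo" has no indexes that can be concurrently reindexed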
Re: Pinned files at Windows
On Mon, Jun 03, 2019 at 11:37:30PM +0300, Konstantin Knizhnik wrote:
> but without success because ERROR_DELETE_PENDING is never returned by Win32.
> And moreover, stat() doesn't ever return error in this case.

Could it be possible to find a reliable way to detect that? Clobbering
errno with an incorrect value is not something we can rely on, and I am
ready to buy that GetFileAttributesEx() can also return EACCES for some
legit cases, like a file it has no access to. What if for example
something is done on a file between the stat() call and the
GetFileAttributesEx() call in pgwin32_safestat() so that EACCES is a
legit error?
--
Michael

signature.asc
Description: PGP signature
Re: Comment typo in tableam.h
There were a few more minor typos I had collected for table AM; passing
them along here. Some of the required callback functions are missing
Assert checking (minor thing); adding them in a separate patch.

From f32bdf5d0d3af5fd6ee6bf6430905f9c4bf5fefa Mon Sep 17 00:00:00 2001
From: Ashwin Agrawal
Date: Fri, 24 May 2019 16:30:38 -0700
Subject: [PATCH v1 1/2] Fix typos in few tableam comments.

---
 src/backend/access/table/tableamapi.c |  2 +-
 src/include/access/tableam.h          | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 32877e7674f..0df5ba35803 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -91,7 +91,7 @@ GetTableAmRoutine(Oid amhandler)
 
 	Assert(routine->relation_estimate_size != NULL);
 
-	/* optional, but one callback implies presence of hte other */
+	/* optional, but one callback implies presence of the other */
 	Assert((routine->scan_bitmap_next_block == NULL) ==
 		   (routine->scan_bitmap_next_tuple == NULL));
 	Assert(routine->scan_sample_next_block != NULL);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 0b6ac15d316..2d06d52d71f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -231,7 +231,7 @@ typedef struct TableAmRoutine
 
 	/*
 	 * Estimate the size of shared memory needed for a parallel scan of this
-	 * relation. The snapshot does not need to be accounted for.
+	 * relation.
 	 */
 	Size		(*parallelscan_estimate) (Relation rel);
 
@@ -434,8 +434,8 @@ typedef struct TableAmRoutine
 	 *
 	 * Note that only the subset of the relcache filled by
 	 * RelationBuildLocalRelation() can be relied upon and that the relation's
-	 * catalog entries either will either not yet exist (new relation), or
-	 * will still reference the old relfilenode.
+	 * catalog entries will either not yet exist (new relation), or will still
+	 * reference the old relfilenode.
 	 *
 	 * As output *freezeXid, *minmulti must be set to the values appropriate
 	 * for pg_class.{relfrozenxid, relminmxid}. For AMs that don't need those
@@ -591,7 +591,7 @@ typedef struct TableAmRoutine
 	 * See table_relation_estimate_size().
 	 *
 	 * While block oriented, it shouldn't be too hard for an AM that doesn't
-	 * doesn't internally use blocks to convert into a usable representation.
+	 * internally use blocks to convert into a usable representation.
 	 *
 	 * This differs from the relation_size callback by returning size
 	 * estimates (both relation size and tuple count) for planning purposes,
@@ -967,7 +967,7 @@ table_index_fetch_end(struct IndexFetchTableData *scan)
 *
 * *all_dead, if all_dead is not NULL, will be set to true by
 * table_index_fetch_tuple() iff it is guaranteed that no backend needs to see
- * that tuple. Index AMs can use that do avoid returning that tid in future
+ * that tuple. Index AMs can use that to avoid returning that tid in future
 * searches.
 *
 * The difference between this function and table_fetch_row_version is that
@@ -1014,8 +1014,8 @@ extern bool table_index_fetch_tuple_check(Relation rel,
 * true, false otherwise.
 *
 * See table_index_fetch_tuple's comment about what the difference between
- * these functions is. This function is the correct to use outside of
- * index entry->table tuple lookups.
+ * these functions is. This function is correct to use outside of index
+ * entry->table tuple lookups.
 */
 static inline bool
 table_tuple_fetch_row_version(Relation rel,
-- 
2.19.1

From 022569d249918da60d145d7877dc0f8df4ccc6cd Mon Sep 17 00:00:00 2001
From: Ashwin Agrawal
Date: Mon, 3 Jun 2019 17:07:05 -0700
Subject: [PATCH v1 2/2] Add assertions for required table am callbacks.

---
 src/backend/access/table/tableamapi.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 0df5ba35803..55bd1eea3db 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -50,6 +50,7 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_begin != NULL);
 	Assert(routine->scan_end != NULL);
 	Assert(routine->scan_rescan != NULL);
+	Assert(routine->scan_getnextslot != NULL);
 
 	Assert(routine->parallelscan_estimate != NULL);
 	Assert(routine->parallelscan_initialize != NULL);
@@ -61,8 +62,12 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->index_fetch_tuple != NULL);
 
 	Assert(routine->tuple_fetch_row_version != NULL);
+	Assert(routine->tuple_tid_valid != NULL);
+	Assert(routine->tuple_get_latest_tid != NULL);
 	Assert(routine->tuple_satisfies_snapshot != NULL);
+	Assert(routine->compute_xid_horizon_for_tuples != NULL);
+
 	Assert(routine->tuple_insert != NULL);
 
 	/*
@@ -88,6 +93,7 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->index_validate_scan != NULL);
 
 	Assert(routine->relation_size != NULL);
+	Assert(routine->r
Re: Comment typo in tableam.h
Hi, Thanks for these! On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote: > /* >* Estimate the size of shared memory needed for a parallel scan of this > - * relation. The snapshot does not need to be accounted for. > + * relation. >*/ > Size(*parallelscan_estimate) (Relation rel); That's not a typo? Greetings, Andres Freund
Re: Fix inconsistencies for v12
On Mon, Jun 3, 2019 at 10:56 PM Tom Lane wrote: > > Amit Langote writes: > > On 2019/05/30 18:51, Amit Kapila wrote: > >> I think it will be better to include postgres_fdw in the comment in > >> some way so that if someone wants a concrete example, there is > >> something to refer to. > > > Maybe a good idea, but this will be the first time to mention postgres_fdw > > in the core source code. If you think that's OK, how about the attached? > > This wording seems fine to me. > > Now that we've beat that item into the ground ... there were a bunch of > other tweaks suggested in Alexander's initial email. Amit (K), were you > going to review/commit those? > Yes, I am already reviewing those. I will post my comments today. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WITH NOT MATERIALIZED and DML CTEs
On Mon, Jun 03, 2019 at 07:33:35PM -0400, Tom Lane wrote: > David Fetter writes: > > It might be worth documenting the fact that NOT MATERIALIZED doesn't > > affect DML CTEs, just as it doesn't affect statements with volatile > > functions and recursive CTEs. > > We already do: > > However, if a WITH query is non-recursive and side-effect-free (that > is, it is a SELECT containing no volatile functions) then it can be I guess this part makes it pretty clear that DML isn't part of the party just yet. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Table AM callbacks referring to heap in declarations (+typos)
On Sat, Jun 01, 2019 at 12:43:11PM -0700, Andres Freund wrote: > I don't mind at all (although it's imo not a fix for the s/heap/table)! Thanks, committed what I had. -- Michael signature.asc Description: PGP signature
Re: Binary support for pgoutput plugin
On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote: > Is there a reason why pgoutput sends data in text format? Seems to > me that sending data in binary would provide a considerable > performance improvement. Are you seeing something that suggests that the text output is taking a lot of time or other resources? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
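One way to see what pgoutput emits today is to peek at a slot's changes from SQL; this is only a sketch (it requires wal_level = logical, and the slot and publication names below are invented). The data column returned is bytea, but the column values carried inside the change messages are rendered through each type's text output function, which is the behaviour the question above is about.

    CREATE PUBLICATION binfmt_test_pub FOR ALL TABLES;

    SELECT pg_create_logical_replication_slot('binfmt_test_slot', 'pgoutput');

    -- proto_version and publication_names are the options pgoutput expects.
    SELECT lsn, xid, data
    FROM pg_logical_slot_peek_binary_changes('binfmt_test_slot', NULL, NULL,
                                             'proto_version', '1',
                                             'publication_names', 'binfmt_test_pub');

    SELECT pg_drop_replication_slot('binfmt_test_slot');
    DROP PUBLICATION binfmt_test_pub;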
Re: Comment typo in tableam.h
On Mon, Jun 3, 2019 at 5:26 PM Andres Freund wrote:
> Hi,
>
> Thanks for these!
>
> On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote:
> > 	/*
> > 	 * Estimate the size of shared memory needed for a parallel scan of this
> > -	 * relation. The snapshot does not need to be accounted for.
> > +	 * relation.
> > 	 */
> > 	Size		(*parallelscan_estimate) (Relation rel);
>
> That's not a typo?

The snapshot is not passed as an argument to that function, hence it seems
weird to refer to the snapshot in the comment, as the callback function
can't account for it anyway. It seems like a stale piece of comment, and
hence that piece of text should be removed. I should have referred to the
changes as general comment fixes instead of explicit typo fixes :-)
Re: Comment typo in tableam.h
Hi, On 2019-06-03 18:21:56 -0700, Ashwin Agrawal wrote: > On Mon, Jun 3, 2019 at 5:26 PM Andres Freund wrote: > > > Hi, > > > > Thanks for these! > > > > On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote: > > > /* > > >* Estimate the size of shared memory needed for a parallel scan > > of this > > > - * relation. The snapshot does not need to be accounted for. > > > + * relation. > > >*/ > > > Size(*parallelscan_estimate) (Relation rel); > > > > That's not a typo? > > > > The snapshot is not passed as argument to that function hence seems weird > to refer to snapshot in the comment, as anyways callback function can't > account for it. It's part of the parallel scan struct, and it used to be accounted for by pre tableam function... Greetings, Andres Freund
Re: Confusing error message for REINDEX TABLE CONCURRENTLY
On Mon, Jun 03, 2019 at 04:53:48PM -0700, Ashwin Agrawal wrote:
> Please check if the attached patch addresses and satisfies all the points
> discussed so far in this thread.

It looks to be so, please see below for some comments.

> +	{
> 		result = ReindexRelationConcurrently(heapOid, options);
> +
> +		if (!result)
> +			ereport(NOTICE,
> +					(errmsg("table \"%s\" has no indexes that can be concurrently reindexed",
> +							relation->relname)));

"concurrently" should be at the end of this string. I have had the exact
same argument with Tom for 508300e.

> @@ -2630,7 +2638,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
> 	foreach(l, relids)
> 	{
> 		Oid			relid = lfirst_oid(l);
> -		bool		result;
>
> 		StartTransactionCommand();
> 		/* functions in indexes may want a snapshot set */
> @@ -2638,11 +2645,12 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
>
> 		if (concurrent)
> 		{
> -			result = ReindexRelationConcurrently(relid, options);
> +			ReindexRelationConcurrently(relid, options);
> 			/* ReindexRelationConcurrently() does the verbose output */

Indeed this variable is not used. So we could just get rid of it
completely.

> +			bool		result;
> 			result = reindex_relation(relid,
> 									  REINDEX_REL_PROCESS_TOAST |
> 									  REINDEX_REL_CHECK_CONSTRAINTS,
> @@ -2656,7 +2664,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
>
> 			PopActiveSnapshot();
> 		}

The table has been considered for reindexing even if nothing has been
reindexed, so perhaps we'd want to keep this part as-is? We have the same
level of reporting for a couple of releases for this part.

> -
> 		CommitTransactionCommand();

Useless noise diff.
--
Michael

signature.asc
Description: PGP signature
Re: Print baserestrictinfo for varchar fields
On Mon, Jun 03, 2019 at 01:37:22AM -0400, Tom Lane wrote: > It's hard to muster much enthusiasm for extending print_expr(), > considering how incomplete and little-used it is. I'd rather > spend effort on ripping it out in favor of using the far more > complete, and better-tested, code in ruleutils.c. If it is possible to get the same amount of coverage when debugging the planner, count me in. Now it seems to me that we'd still require some work to get the same level of information as for range table entry kinds.. -- Michael signature.asc Description: PGP signature
Re: Comment typo in tableam.h
On Mon, Jun 3, 2019 at 6:24 PM Andres Freund wrote:
> Hi,
>
> On 2019-06-03 18:21:56 -0700, Ashwin Agrawal wrote:
> > On Mon, Jun 3, 2019 at 5:26 PM Andres Freund wrote:
> > > Hi,
> > >
> > > Thanks for these!
> > >
> > > On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote:
> > > > 	/*
> > > > 	 * Estimate the size of shared memory needed for a parallel scan of this
> > > > -	 * relation. The snapshot does not need to be accounted for.
> > > > +	 * relation.
> > > > 	 */
> > > > 	Size		(*parallelscan_estimate) (Relation rel);
> > >
> > > That's not a typo?
> >
> > The snapshot is not passed as argument to that function hence seems weird
> > to refer to snapshot in the comment, as anyways callback function can't
> > account for it.
>
> It's part of the parallel scan struct, and it used to be accounted for
> by pre tableam function...

It reads like the comment was written from that past context, then, and
doesn't have much value now. It's more confusing than helpful to state
that the snapshot, and not any other field, need not be accounted for.
table_parallelscan_estimate() has a snapshot argument and it accounts for
it, but the callback doesn't. I am not sure how a callback would
explicitly use that comment and avoid accounting for the snapshot if it's
using the generic ParallelTableScanDescData. But if you feel it is
helpful, please feel free to keep that text.
Re: coverage additions
On Sat, Jun 01, 2019 at 12:55:47AM -0400, Alvaro Herrera wrote: > Ah, now I remember that I tried this before, but it requires some extra > packages installed in the machine I think, and those create running > services. Did you note that src/backend/libpq does not even list the > gssapi file? Do you mean the header file be-gssapi-common.h? It is stored in src/backend/libpq/ which is obviously incorrect. I think that it should be moved to src/include/libpq/be-gssapi-common.h. Its identification marker even says that. Perhaps that's because of MSVC? Stephen? -- Michael signature.asc Description: PGP signature
Re: Fix inconsistencies for v12
Hi Andres,

I have added you here as some of these are related to commits done by you,
so I need your opinion on the same. I have mentioned where your feedback
will be helpful.

On Sun, May 26, 2019 at 2:20 AM Alexander Lakhin wrote:
>
> 6. ExecGetResultSlot - remove (not used since introduction in 1a0586de)
>

Yeah, I also think this is not required. Andres, this API is not defined.
Is it intended for some purpose?

> 8. heap_parallelscan_initialize -> remove the sentence (changed in c2fe139c)
>

The same check has been moved to table_block_parallelscan_initialize. So,
I think instead of removing the sentence you need to change the function
name in the comment.

> 10. HeapTupleSatisfiesSnapshot -> HeapTupleSatisfiesVisibility (an
> internal inconsistency)
>

  * This is an interface to HeapTupleSatisfiesVacuum that's callable via
- * HeapTupleSatisfiesSnapshot, so it can be used through a Snapshot.
+ * HeapTupleSatisfiesVisibility, so it can be used through a Snapshot.

I think now we don't need to write the second half of the comment ("so it
can be used through a Snapshot"). It made more sense with the previous
style of API.

Another related point:

 * HeapTupleSatisfiesNonVacuumable
 *
 * True if tuple might be visible to some transaction; false if it's
 * surely dead to everyone, ie, vacuumable.
 *
 * See SNAPSHOT_TOAST's definition for the intended behaviour.

Here, I think instead of SNAPSHOT_TOAST, we should mention
SNAPSHOT_NON_VACUUMABLE. Andres, do you have any comments on the proposed
changes?

> 14. latestRemovedxids -> latestRemovedXids (an inconsistent case)

  * Conjecture: if hitemid is dead then it had xids before the xids
  * marked on LP_NORMAL items. So we just ignore this item and move
  * onto the next, for the purposes of calculating
- * latestRemovedxids.
+ * latestRemovedXids.

I think it should be latestRemovedXid.

> 16. NextSampletuple -> NextSampleTuple (an inconsistent case)

I think this doesn't make much difference, but we can fix it so that
NextSampleTuple's occurrence can be found during grep.

> 20. RelationGetOidIndex -> just to remove the paragraph (orphaned after
> 578b2297)

- * This is exported separately because there are cases where we want to use
- * an index that will not be recognized by RelationGetOidIndex: TOAST tables
- * have indexes that are usable, but have multiple columns and are on
- * ordinary columns rather than a true OID column. This code will work
- * anyway, so long as the OID is the index's first column. The caller must
- * pass in the actual heap attnum of the OID column, however.
- *

Instead of removing the entire paragraph, how about changing it like "This
also handles the special cases where TOAST tables have indexes that are
usable, but have multiple columns and are on ordinary columns rather than
a true OID column. This code will work anyway, so long as the OID is the
index's first column. The caller must pass in the actual heap attnum of
the OID column, however."

Andres, any suggestions?

> 27. XactTopTransactionId -> XactTopFullTransactionId (an internal
> inconsistency)
>

- * XactTopTransactionId stores the XID of our toplevel transaction, which
+ * XactTopFullTransactionId stores the XID of our toplevel transaction, which
  * will be the same as TopTransactionState.transactionId in an ordinary

I think in the above sentence, now we need to use
TopTransactionState.fullTransactionId.

Note that I agree with your changes for the points where I have not
responded anything.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
PG 11 JIT deform failure
Hi,

JIT slot_compile_deform assumes the TupleDesc has at least 'natts'
attributes, e.g.

	/*
	 * Iterate over each attribute that needs to be deformed, build code to
	 * deform it.
	 */
	for (attnum = 0; attnum < natts; attnum++)
	{
		Form_pg_attribute att = TupleDescAttr(desc, attnum);

but a new TupleDesc has no attributes, and the caller only tests that the
TupleDesc is not null.

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 0da318218f..3831ed2f5a 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -336,7 +336,7 @@ llvm_compile_expr(ExprState *state)
 					 * function specific to tupledesc and the exact number of
 					 * to-be-extracted attributes.
 					 */
-					if (desc && (context->base.flags & PGJIT_DEFORM))
+					if (desc && desc->natts >= op->d.fetch.last_var && (context->base.flags & PGJIT_DEFORM))
 					{
 						LLVMValueRef params[1];
 						LLVMValueRef l_jit_deform;