Re: [HACKERS] pow support for pgbench
The fact that the return type is not consistently of one type bothers me. I'm not sure pgbench's expression language is a good place for runtime polymorphism -- SQL doesn't work that way. Sure. Pg has a NUMERIC adaptive-precision version, which is cheating, because it can return kind of an "int" or a "float", depending on whether there are digits after the decimal point or not. Pgbench does not have support for NUMERIC, just INT & DOUBLE, so the current version is an approximation of that. Now it is always possible to do just a DOUBLE version, but this won't match SQL behavior either. Another point I forgot: pgbench functions and operators are notably useful to generate/transform keys in tables, which are usually integers, so having int functions where possible/appropriate is desirable, which explains why I pushed for having an int version for POW. Also, pgbench does not have a static typing model because variable types are not declared in "\set i ...", so the type is somehow "guessed" based on the string values, although if in doubt it is always possible to convert explicitly (with the "int" & "double" functions). So for me the philosophy is to have expressions match SQL behavior when possible, as closely as possible, but it is not an exact match. -- Fabien.
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Fri, Oct 13, 2017 at 7:59 AM, Ashutosh Bapat wrote: > On Thu, Oct 12, 2017 at 9:46 PM, Robert Haas wrote: >> On Wed, Oct 11, 2017 at 7:08 AM, Ashutosh Bapat >> wrote: >>> Here's updated patch set based on the basic partition-wise join >>> committed. The patchset applies on top of the patch to optimize the >>> case of dummy partitioned tables [1]. >>> >>> Right now, the advanced partition matching algorithm bails out when >>> either of the joining relations has a default partition. >> >> So is that something you are going to fix? >> > > Yes, if time permits. I had left the patch unattended while basic > partition-wise join was getting committed. Now that it's committed, I > rebased it. It still has TODOs and some work is required to improve > it. But for the patch to be really complete, we have to deal with the > problem of missing partitions described before. I am fine > collaborating if someone else wants to pick it up. > Here's a patchset which supports advanced partition matching for partition bounds with a default partition. The patchset is rebased on the latest head. When a list value is present in one of the joining relations and not the other, and the other relation has a default partition, match (join) the partition containing that list value with the default partition, since the default partition may contain rows with that list value. If the default partition happens to be on the outer side of the join, the resulting join partition acts as a default partition, as it will contain all the values from the default partition. If the partition containing the list value happens to be on the outer side of the join, the resulting join partition is associated with the list value, since no other partition key value from the default partition makes it to the join result. When a range is present (completely or partly) in one of the joining relations and not the other, and the other relation has a default partition, match (join) the partition corresponding to that range with the default partition. If the default partition happens to be on the outer side of the join, the resulting join partition acts as a default partition, as it will contain all the values from the default partition. If the non-default partition corresponding to the range happens to be on the outer side of the join, the resulting join partition is associated with that range, since partition key values from the default partition outside that range won't make it to the join result. If both the relations have default partitions, match (join) the default partitions with each other and deem the resulting join partition the default partition. If one of the relations has a default partition but not the other, and the default partition happens to be on the outer side of the join, all its rows will make it to the join. Such a default partition may get joined to a non-default partition from the inner side, if the inner side has a range missing in the outer side. If any of the above causes multiple partitions from one side to match with one or more partitions on the other side, we won't use partition-wise join, as discussed in the first mail of this thread. I have tested the patches for two-way joins, but haven't added any tests involving default partitions to the patch itself. It needs to be tested for N-way joins as well. So, for now I have kept the two patches supporting default partitions for range and list partitioning, respectively, separate. Also, some of the code duplication in partition matching functions can be avoided using macros.
I will merge those patches into the main patch and add macros once they are tested appropriately. For hash partitioned tables, we haven't implemented the advanced partition matching, since it would be rare for somebody to have hash partitioned tables with holes (even if they are allowed). -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company pg_adv_dp_join_patches_v2.tar.gz Description: GNU Zip compressed data
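To make the list case described above concrete, here is a minimal hypothetical sketch (all table, partition, and column names are made up, not taken from the patch set):

-- t1 has a partition for list value 3; t2 does not, but t2 has a default
-- partition, so any t2 rows with a = 3 can only live there.  Per the scheme
-- described above, t1_p2 would be matched (joined) with t2_def.
CREATE TABLE t1 (a int, v text) PARTITION BY LIST (a);
CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES IN (1, 2);
CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES IN (3);

CREATE TABLE t2 (a int, v text) PARTITION BY LIST (a);
CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES IN (1, 2);
CREATE TABLE t2_def PARTITION OF t2 DEFAULT;

-- With partition-wise join enabled, the join below could then be performed
-- partition by partition, including the t1_p2/t2_def pairing.
SELECT * FROM t1 JOIN t2 USING (a);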
Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
From: Henry Would this require the new pluggable storage which is currently in development, or would the existing storage engine be sufficient? I am just wondering if there are any rough designs/plans for this... I'm sorry for the long interval. The graph model can be implemented on top of the relational storage engine, the way Oracle and SQL Server store graph data in relational tables. But to maximize performance for graph traversal, an optimal storage engine for the graph model is required, one that introduces direct pointers between graph nodes. So I think a new level of pluggable storage is necessary. Regards MauMau
Re: [HACKERS] <> join selectivity estimate question
Thomas Munro writes: > So, in that plan we saw the anti-join estimated at 1 row but really there were > 13462. If you remove most of Q21 and keep just the anti-join between > l1 and l3, and then try removing different quals, you can see that the > problem is not the <> qual: > select count(*) > from lineitem l1 > where not exists ( > select * > from lineitem l3 > where l3.l_orderkey = l1.l_orderkey > and l3.l_suppkey <> l1.l_suppkey > and l3.l_receiptdate > l3.l_commitdate > ) > => estimate=1 actual=8998304 ISTM this is basically another variant of ye olde column correlation problem. That is, we know there's always going to be an antijoin match for the l_orderkey equality condition, and that there are always going to be matches for the l_suppkey inequality, but what we don't know is that l_suppkey is correlated with l_orderkey such that the two conditions aren't satisfied at the same time. The same thing is happening on a smaller scale with the receiptdate/commitdate comparison. I wonder whether the extended stats machinery could be brought to bear on this problem. regards, tom lane
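For reference, a minimal sketch of what defining such extended statistics with the v10 facility would look like on the correlated columns; whether the join selectivity code could actually be taught to consume them for this anti-join is exactly the open question, so this only shows the DDL side:

-- Speculative: functional-dependency statistics on the columns whose
-- correlation the planner currently has no knowledge of.
CREATE STATISTICS lineitem_corr_stats (dependencies)
    ON l_orderkey, l_suppkey FROM lineitem;
ANALYZE lineitem;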
Re: [HACKERS] postgres_fdw bug in 9.6
Robert Haas writes: > But have a look at this: > http://postgr.es/m/561e12d4.7040...@lab.ntt.co.jp > That shows a case where, before > 5fc4c26db5120bd90348b6ee3101fcddfdf54800, a query that required the > foreign table to do an EPQ recheck produced an unambiguously wrong > answer; the query stipulates that inttab.a = ft1.a but the row > returned lacks that property. I think this clearly shows that EPQ > rechecks are needed at least for FDW paths that are parameterized. Certainly; that's what I said before. > However, I guess that doesn't actually refute your point, which IIUC > is that maybe we don't need these checks for FDW join paths, because > we don't build parameterized paths for those yet. Now I think it would > still be a good idea to have a way of handling that case, because > somebody's likely going want to fix that /* XXX Consider parameterized > paths for the join relation */ eventually, but I guess for the > back-branches just setting epq_path = NULL instead of calling > GetExistingLocalJoinPath() might be the way to go... assuming that > your contention that this won't break anything is indeed correct. I thought some more about this. While it seems clear that we don't actually need to recheck anything within a postgres_fdw foreign join, there's still a problem: we have to be able to regurgitate the join row during postgresRecheckForeignScan. Since, in the current system design, the only available data is scan-level rowmarks (that is, whole-row values from each base foreign relation), there isn't any very good way to produce the join row except to insert the rowmark values into dummy scan nodes and then re-execute the join locally. So really, that is what the outerPlan infrastructure is doing for us. We could imagine that, instead of the scan-level rowmarks, the foreign join node could produce a ROW() expression containing the values it actually has to output, and then use that as a "join-level rowmark" to reconstitute the join row without any of the outerPlan infrastructure. This would have some pretty significant advantages, notably that we could (in principle) avoid fetching all the columns of remote tables we no longer needed scan-level rowmarks for. However, to do that, we'd have to dynamically decide which rowmark expressions actually need to wind up getting computed in the final join plan, based on which joins we choose to do remotely. The current scheme of statically adding rowmarks to the query during base relation setup doesn't cut it. So that's sounding like a nontrivial rewrite (and one that would break the related FDW APIs, although the patch at hand does that too). As for the question of a future extension to parameterized paths, I think that it would likely work to recheck the parameterized qual at the join node, making it basically just like rechecks for scan-level nodes. The objection to this is that it might not produce the same answer if the parameterized qual references relations that are on the insides of outer joins ... but I think that in practice we could just blow off that case (ie, reject producing parameterized paths involving such quals). The reason is that I think such cases arise only very rarely. It could only happen for non-strict qual expressions, because a strict qual would have resulted in the outer join getting strength reduced to a regular join. Not sure where that leaves us for the patch at hand. It feels to me like it's a temporary band-aid for code that we want to get rid of in the not-too-long run. 
As such, it'd be okay if it were smaller, but it seems bigger and more invasive than I could wish for a band-aid. regards, tom lane
Re: [HACKERS] Small improvement to compactify_tuples
Hi, On Wed, Nov 29, 2017 at 8:00 AM, Peter Geoghegan wrote: > On Tue, Nov 28, 2017 at 2:41 PM, Andres Freund wrote: >> Maybe it's a stupid question. But would we still want to have this after >> the change? These should be just specializations of the template version >> imo. The "generic" version operates on bytes, and it would be a bit hard to combine it with the templated version. Not impossible, but it would look ugly. > I also wonder why regression test output has changed. Wasn't this > supposed to be a mechanical change in how the templating is > implemented? Why would the behavior of the algorithm change, even if > the change is only a change in the output order among equal elements? I had made some changes to the algorithm at that point. But I have reverted those changes, and the test fixes are no longer needed. > Also, is that one last raw CHECK_FOR_INTERRUPTS() in the template > definition supposed to be there? That was an error; it is fixed now. Attached are the fixed qsort_template version, and a version of compactify_tuples with bucket sort and the templated qsort. With regards, Sokolov Yura. 0001-Header-for-customized-qsort.patch.gz Description: GNU Zip compressed data 0002-Bucket-sort-for-compactify_tuples.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] postgres_fdw super user checks
Robert, all, * Robert Haas (robertmh...@gmail.com) wrote: > On Fri, Dec 1, 2017 at 12:31 AM, Michael Paquier > wrote: > > I am moving this patch to next CF 2018-01. > > There now seems to be a consensus for superuser -> superuser_arg > rather than what Jeff did originally; that approach has 4 votes and > nothing else has more than 1. So, here's a patch that does it that > way. I've taken a quick look and this looks good to me. > I tried to see if some documentation update was needed, but I think > the documentation already reflects the proposed new behavior. It > says: > > > Only superusers may connect to foreign servers without password > authentication, so always specify the password option > for user mappings belonging to non-superusers. > > > Currently, however, that's not accurate. Right now you need to > specify the password option for user mappings that will be *used by* > non-superusers, not user mappings *belonging to* non-superusers. So > this patch is, I think, just making the actual behavior match the > documented behavior. Not sure if anyone has any other suggestions > here. I think this is definitely a master-only change; should we try > to insert some kind of warning into the back-branch docs? I > definitely think this should be called out in the v11 release notes. I'm not a fan of having *only* a warning in the back-branches. What I would think we'd do here is fix the back-branch documentation to be correct, and then add a warning that the behavior changes in v11. You didn't suggest an actual change wrt the back-branch warning, but it seems to me like it'd end up being morally equivalent to "ok, forget what we just said, what really happens is X, but we fix it in v11." Thanks! Stephen signature.asc Description: Digital signature
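To illustrate the documented rule in concrete form (the server, role, and option values below are made up): a mapping that will be *used by* a non-superuser needs the password option, regardless of who the mapping nominally belongs to.

-- Hypothetical postgres_fdw setup; under the behavior described above, what
-- matters is which local role uses the mapping at query time.
CREATE SERVER foreign_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'appdb');
CREATE USER MAPPING FOR app_user SERVER foreign_pg
    OPTIONS (user 'remote_app_user', password 'secret');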
Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
From: Tom Lane It sounds like what you want is to replace all of Postgres except the name. I'm not clear on the point. The point is to make PostgreSQL a versatile database suitable even for niche use cases. I want more people to love and rely on PostgreSQL. Ideally, I want to see various data models incorporated in core from the beginning, but that would be difficult. So, this pluggable data model facility is necessary to foster data model development both inside and outside the PostgreSQL community. With that said, I hope PostgreSQL will someday absorb those data models, just like PL/pgSQL is now in core. But for now, I think the relational data model will continue to play a central role. I don't think it's worth making the current relational data model implementation a pluggable module; whether to do that will depend on the software design and source code cleanliness. I don't yet understand how painful it would be to support other data models as first-class citizens, and I may be a reckless man who doesn't know fear. So I wish you could help pave this difficult way together. Regards MauMau
Re: pg_dumpall -r -c try to drop user postgres
On Sun, Dec 3, 2017 at 3:21 PM, Pavel Stehule wrote: > I am not sure if user postgres should be removed, so it is probably a bug > > pg_dumpall -r -c | grep postgres > > DROP ROLE postgres; > CREATE ROLE postgres; You are looking for this bit of code:

/*
 * If asked to --clean, do that first.  We can avoid detailed
 * dependency analysis because databases never depend on each other,
 * and tablespaces never depend on each other.  Roles could have
 * grants to each other, but DROP ROLE will clean those up silently.
 */
if (output_clean)
{
    if (!globals_only && !roles_only && !tablespaces_only)
        dropDBs(conn);

    if (!roles_only && !no_tablespaces)
        dropTablespaces(conn);

    if (!tablespaces_only)
        dropRoles(conn);
}

Could you clarify what you think is wrong here? -- Michael
Re: Bitmap scan is undercosted?
I wrote: > I tried creating multiple-column statistics using the v10 facility for > that: > regression=# create statistics s1 on num, flag from aaa; > CREATE STATISTICS > regression=# analyze aaa; > ANALYZE > but that changed the estimate not at all, which surprised me because > dependency statistics are supposed to fix exactly this type of problem. > I suspect there may be something in the extended-stats code that causes it > not to work right for boolean columns --- this wouldn't be excessively > surprising because of the way the planner messes around with converting > "flag = true" to just "flag" and sometimes back again. But I've not > looked closer yet. After looking, I found that indeed dependency_is_compatible_clause() rejects expressions like "flag" or "NOT flag", which it needn't since those are fully equivalent to "flag = true" or "flag = false" respectively. Moreover there's nothing else in dependencies_clauselist_selectivity that depends on the exact form of the clause under test, only on the semantics that it's an equality condition on some Var. Hence I propose the attached patch, which fixes the rowcount estimate for the example discussed in this thread:

create table aaa as select (id%100)::int num, (id%10=1)::bool flag from generate_series(1, 1000) id;
create index i1 on aaa (num);
create index i2 on aaa (flag);
create statistics s1 on num, flag from aaa;
analyze aaa;
explain analyze select count(*) from aaa where num = 1 and flag = true;

Without patch:

Aggregate (cost=43236.73..43236.74 rows=1 width=8) (actual time=349.365..349.365 rows=1 loops=1)
  -> Bitmap Heap Scan on aaa (cost=20086.40..43212.94 rows=9514 width=0) (actual time=101.308..337.985 rows=10 loops=1)
       Recheck Cond: (num = 1)
       Filter: flag
       Heap Blocks: exact=44248
       -> BitmapAnd (cost=20086.40..20086.40 rows=9514 width=0) (actual time=92.214..92.214 rows=0 loops=1)
            -> Bitmap Index Scan on i1 (cost=0.00..1776.43 rows=96000 width=0) (actual time=17.236..17.236 rows=10 loops=1)
                 Index Cond: (num = 1)
            -> Bitmap Index Scan on i2 (cost=0.00..18304.96 rows=991003 width=0) (actual time=72.168..72.168 rows=100 loops=1)
                 Index Cond: (flag = true)
Planning time: 0.254 ms
Execution time: 350.796 ms

With patch:

Aggregate (cost=43496.19..43496.20 rows=1 width=8) (actual time=359.195..359.195 rows=1 loops=1)
  -> Bitmap Heap Scan on aaa (cost=20129.64..43256.19 rows=96000 width=0) (actual time=99.750..347.353 rows=10 loops=1)
       Recheck Cond: (num = 1)
       Filter: flag
       Heap Blocks: exact=44248
       -> BitmapAnd (cost=20129.64..20129.64 rows=9514 width=0) (actual time=90.671..90.671 rows=0 loops=1)
            -> Bitmap Index Scan on i1 (cost=0.00..1776.43 rows=96000 width=0) (actual time=16.946..16.946 rows=10 loops=1)
                 Index Cond: (num = 1)
            -> Bitmap Index Scan on i2 (cost=0.00..18304.96 rows=991003 width=0) (actual time=70.898..70.898 rows=100 loops=1)
                 Index Cond: (flag = true)
Planning time: 0.218 ms
Execution time: 360.608 ms

That's the right overall rowcount estimate for the scan, given the stats it's working from. There's apparently still something wrong with bitmap costing, since it's still estimating this as cheaper than the single-index case --- noting the bogus rowcount estimate for the BitmapAnd, I suspect that bitmap costing is taking shortcuts rather than using clauselist_selectivity to estimate the overall selectivity of the bitmap conditions. But whatever is causing that, it's independent of this deficiency.
In addition to the bugfix proper, I improved some comments, got rid of a NumRelids() test that's redundant with the preceding bms_membership() test, and fixed dependencies_clauselist_selectivity so that estimatedclauses actually is a pure output argument as stated by its API contract. regards, tom lane diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c index 9756fb8..ae0f304 100644 *** a/src/backend/statistics/dependencies.c --- b/src/backend/statistics/dependencies.c *** pg_dependencies_send(PG_FUNCTION_ARGS) *** 736,826 * dependency_is_compatible_clause * Determines if the clause is compatible with functional dependencies * ! * Only OpExprs with two arguments using an equality operator are supported. ! * When returning True attnum is set to the attribute number of the Var within ! * the supported clause. ! * ! * Currently we only support Var = Const, or Const = Var. It may be possible ! * to expand on this later. */ static bool dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum) { RestrictInfo *rinfo = (RestrictInfo *) clause; if (!IsA(rinfo, RestrictInfo)) return false; ! /* Pseudoconstants are not really inte
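As a quick sanity check of the behavioral change itself, both spellings of the boolean restriction should now be treated alike by the dependency machinery, so the following two queries would be expected to get the same (corrected) estimate:

-- With the patch, a bare boolean Var is accepted as equivalent to
-- "flag = true" in dependency_is_compatible_clause().
EXPLAIN SELECT count(*) FROM aaa WHERE num = 1 AND flag = true;
EXPLAIN SELECT count(*) FROM aaa WHERE num = 1 AND flag;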
Re: [HACKERS] Constraint exclusion for partitioned tables
On Sat, Dec 2, 2017 at 1:11 AM, Robert Haas wrote: > On Fri, Dec 1, 2017 at 12:21 AM, Michael Paquier > wrote: >> On Wed, Sep 13, 2017 at 4:07 PM, Ashutosh Bapat >> wrote: >>> For a partitioned table, this patch saves the time to run constraint >>> exclusion on all the partitions if constraint exclusion succeeds on >>> the partitioned table. If constraint exclusion fails, we have wasted >>> CPU cycles on one run of constraint exclusion. The difference between >>> the time spent in the two scenarios increases with the number of >>> partitions. Practically, users will have a handful of partitions rather >>> than a couple, and thus the overhead of running constraint >>> exclusion on the partitioned table would be justified given the time it >>> will save when CE succeeds. >> >> Moved patch to next CF. > > Committed after adding a comment. Generally, code changes should be > accompanied by comment updates. Thanks for committing the patch. Sorry for not including the comments. Your comment looks good. > > I tested this and found out that this is quite useful for cases where > multiple levels of partitioning are in use. Consider creating 100 > partitions like this: > > #!/usr/bin/perl > > use strict; > use warnings; > > print "create table foo (a int, b int, c text) partition by list (a);\n"; > for $a (1..10) > { > print "create table foo$a partition of foo for values in ($a) > partition by list (b);\n"; > for $b (1..10) > { > print "create table foo${a}_$b partition of foo$a for values > in ($b);\n"; > } > } > > Then consider this query: select * from foo where a = 5; > > Without this patch, we have to reject 90 leaf partitions individually, > but with the patch, we can reject the intermediate partitioned tables; > each time we do, it substitutes for rejecting 10 children > individually. This seems to me to be a case that is quite likely to > come up in the real world. > Right. Thanks for the testing. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Do we actually need an ItemId array on nbtree pages containing fixed-width tuples?
I've been working on adding a new feature to pg_hexedit [1] to allow it to work with GIN indexes. This led to an idea about how we could do better within nbtree, which I will now describe. I noticed that GIN's internal posting tree pages (B-Trees over TIDs) do away with the need for an ItemId array, despite more or less working in the same way as classic B-Tree pages (posting tree *leaf* pages are a bit weird OTOH, but that's not important). This is possible because internal posting list pages store fixed-width PostingItem structs instead of IndexTuples. A posting tree only needs to work with TID key values -- not values of some arbitrary data type that could be a variable-width type. This raises the question: since we really just use ItemIds to deal with variable-width tuples, can we do away with the need for an ItemId array at the start of nbtree pages in some important, common cases?

Applicability
=============

When index tuples have only a single attribute, and we know they will always be 16 bytes (on 64-bit systems), ISTM that it should be possible to not have any ItemId array. Simple pointer arithmetic on the page, based on a tuple width that is known at compile time, can be used instead -- no need to go off of an lp_off value -- see GinDataPageGetPostingItem() for an example of this. All ItemId metadata becomes redundant, and we add IndexTuples from the start of the page, just after the page header, and continue to add them until the special area at the end of the page is reached (again, see the GIN posting tree code). Note that ItemId is already quasi-redundant within indexes. In general, nbtree indexes only actually need the LP_DEAD bit, as well as lp_off. (lp_len is simply redundant, even for variable-width index tuples [2].)

Advantages
==========

Before I go into more depth on implementation issues, let me first list off the ways in which I'd expect this to improve performance:

* We currently fit 367 IndexTuples onto a leaf page of a single-int4-or-int8-attribute nbtree index (with the default leaf fillfactor and BLCKSZ) after many SERIAL/BIGSERIAL style sequential inserts (workloads where all leaf page splits are 10:90 splits of the rightmost page). If we removed the ItemId array, we'd be able to fit 458 IndexTuples on the left half post-split (while still leaving the same amount of freespace on the page as before). IOW, we'd make indexes that avail of the optimization about 20% smaller, thereby reducing random I/O and so on.

The benefits for internal pages would also be significant, because we win in at least 3 places:

* The first win comes from the fact that we can store more IndexTuples (downlinks) now, because the optimization works just as well for internal pages as it does for leaf pages. This much is obvious.

* The second win comes from the fact that there are now about 20% fewer downlinks/internal tuples needed in the first place, because that directly corresponds to the number of pages one level down. The advantages *compound*; our savings in the total number of internal pages are close to 40% (it's less than a full 40% because of the 30:70 splits we do on internal pages).

* The third win comes from the fact that we will make more efficient use of CPU cache within internal pages, by packing IndexTuples together. Binary searches with many cache misses are something that has been identified as a bottleneck for some workloads [3].

I haven't bothered to apply the classic "2/3 of a page" utilization rule for my free space accounting in this example, just to keep things simple.
(That rule applies to workloads with random insertions.) The general approach here should be to keep things simple by changing the physical layout of B-Tree pages (where the optimization applies) without changing their logical structure. So, very few changes ought to be required within amcheck to make it work with this, and they should all be fairly mechanical. Similarly, nothing would change about the interface of the nbtree pageinspect functions. We never make any IndexTuple bigger, so we can mix the old and new formats when pg_upgrade is used without needing to be very defensive all over the place.

Issues
======

I'm not planning to work on this myself, so I encourage other contributors to think about it. These tricky issues will need to be worked through:

* In the case of GIN, routines like GinDataPageAddPostingItem() are used instead of the generic PageAddItem() routine. This optimization requires similar new code that performs page modifications that must care about page space management (code that must memmove() ranges of tuples to make space for new tuples, and so on). The implementation should probably generalize its approach, so that this optimization could in principle be used by other index AMs, such as the hash AM. The PageAddItem() code (and similar bufpage.c code) that is now used by nbtree would only be used in pages that still need an ItemId array. The new fixed-width-aware cod
Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE
Hi, I have rebased the patch on the latest version. Because CURRENT_DATABASE can be used not only in the COMMENT ON statement but also in the other statements listed below, the patch has been renamed to "support_CURRENT_DATABASE_keyword_vxx.patch".

1. COMMENT ON DATABASE CURRENT_DATABASE IS ...
2. ALTER DATABASE CURRENT_DATABASE OWNER TO ...
3. ALTER DATABASE CURRENT_DATABASE SET parameter ...
4. ALTER DATABASE CURRENT_DATABASE RESET parameter ...
5. ALTER DATABASE CURRENT_DATABASE RESET ALL
6. SELECT CURRENT_DATABASE
7. SECURITY LABEL ON DATABASE CURRENT_DATABASE
8. ALTER ROLE ... IN DATABASE CURRENT_DATABASE SET/RESET configuration_parameter
9. GRANT ... ON DATABASE CURRENT_DATABASE TO role_specification ...
10. REVOKE ... ON DATABASE CURRENT_DATABASE FROM ...

Regards, Jing Wang Fujitsu Australia support_CURRENT_DATABASE_keyword_v4.5.patch Description: Binary data
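For example, with the proposed patch applied, statements like the following become possible (corresponding to items 1, 3 and 9 above; the role and setting values are made up):

-- Proposed syntax; not available without the patch.
COMMENT ON DATABASE CURRENT_DATABASE IS 'primary application database';
ALTER DATABASE CURRENT_DATABASE SET work_mem = '64MB';
GRANT CONNECT ON DATABASE CURRENT_DATABASE TO app_user;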
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Sat, Dec 2, 2017 at 2:13 AM, Robert Haas wrote: > On Fri, Dec 1, 2017 at 1:36 AM, Rajkumar Raghuwanshi > wrote: >> On Tue, Oct 31, 2017 at 2:45 PM, Robert Haas wrote: OK, committed. This is a good example of how having good code >>> coverage doesn't necessarily mean you've found all the bugs. :-) >>> >> As of now partition_join.sql does not have test cases covering cases >> where the partitioned table has a default partition; attaching a small test >> case patch to cover those. > > That's not that small, and to me it looks like overkill. > I agree, the patch looks longer than expected. I think it's important to have some test cases for partition-wise join with default partitions. I think we need at least one test for range default partitions, one test for list partitioning, one for multi-level partitioning, and one negative test case with the default partition missing from one side of the join. Maybe we could reduce the number of SQL commands and queries in the patch by adding a default partition to every table that participates in partition-wise join (leaving the tables participating in negative tests aside). But that's going to increase the size of the EXPLAIN outputs and query results. The negative test may simply drop the default partition from one of the tables. For every table being tested, the patch adds two ALTER TABLE commands, one to detach an existing partition and another to attach it again as the default partition. An alternative to that is to just add a new default partition without detaching an existing partition. But then the default partition needs to be populated with some data, which requires at least one INSERT statement. That doesn't reduce the size of the patch, but it increases the query output and EXPLAIN plans. Maybe in the multi-level partitioning test we don't need to add a DEFAULT partition to every partitioned relation; adding it to one of them would be enough. Maybe add it to the parent, but that too can be avoided. That would reduce the size of the patch a bit. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
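A minimal sketch of the two alternatives discussed above, using made-up table and column layouts rather than the actual partition_join.sql tables:

-- Alternative 1 (what the posted patch does): turn an existing partition
-- into the default partition with two ALTER TABLE commands.
ALTER TABLE t1 DETACH PARTITION t1_p3;
ALTER TABLE t1 ATTACH PARTITION t1_p3 DEFAULT;

-- Alternative 2: add a brand-new default partition instead, which then needs
-- at least one INSERT so that it shows up in the join results.
CREATE TABLE t1_def PARTITION OF t1 DEFAULT;
INSERT INTO t1 VALUES (999, 999, 'default row');  -- hypothetical row routed to t1_def
ANALYZE t1;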
Re: [HACKERS] Runtime Partition Pruning
On Sat, Dec 2, 2017 at 3:33 AM, Beena Emerson wrote: > Append (cost=0.00..395.10 rows=9 width=8) (actual time=0.119..0.119 > rows=0 loops=1) (run-time partition pruning: on) If we can, it would be better to show something a bit more precise, like the table being used for run-time pruning, or the expression being used for pruning. Also, we shouldn't use an ad-hoc format like "(run-time partition-pruning: on)"; rather, we should display something using one of the ExplainPropertyBlah functions in explain.c. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Partition-wise aggregation/grouping
On Sat, Dec 2, 2017 at 4:08 AM, Robert Haas wrote: > On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat > wrote: >> This code creates plans where there are multiple Gather nodes under an Append >> node. > > We should avoid that. Starting and stopping workers is inefficient, > and precludes things like turning the Append into a Parallel Append. Ah, I didn't think about it. Thanks for bringing it up. > >> AFAIU, the workers assigned to one Gather node can be reused until that >> Gather node finishes. Having multiple Gather nodes under an Append means that >> every worker will be idle from the time that worker finishes the work till >> the >> last worker finishes the work. > > No, workers will exit as soon as they finish. They don't hang around idle. Sorry, I think I used the wrong word, "idle". I meant that if a worker finishes and exits, the query can't use that worker slot until the next Gather node starts. But as you pointed out, starting and stopping a worker is costlier than the cost of not using the slot. So we should avoid such plans. > >> index b422050..1941468 100644 >> --- a/src/tools/pgindent/typedefs.list >> +++ b/src/tools/pgindent/typedefs.list >> @@ -2345,6 +2345,7 @@ UnlistenStmt >> UnresolvedTup >> UnresolvedTupData >> UpdateStmt >> +UpperPathExtraData >> UpperRelationKind >> UpperUniquePath >> UserAuth >> >> Do we commit this file as part of the feature? > > Andres and I regularly commit such changes; Tom rejects them. > We will leave it to the committer to decide what to do with this hunk. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: pl/perl extension fails on Windows
On Wed, Nov 29, 2017 at 08:14:41PM -0800, Noah Misch wrote: > 1. If $Config{gccversion} is nonempty, add _USE_32BIT_TIME_T. This will do >the wrong thing if MinGW changes its default to match modern MSVC. It will >do the wrong thing for a Perl built with "gcc -D__MINGW_USE_VC2005_COMPAT". > > 2. When configuring the build, determine whether to add _USE_32BIT_TIME_T by >running a test program built with and without that symbol. Perhaps have >the test program store and retrieve a PL_modglobal value. (PL_modglobal >maps to a PerlInterpreter field that follows the fields sensitive to >_USE_32BIT_TIME_T, and perlapi documents it since 5.8.0 or earlier.) This >is more principled than (1), but it will be more code and may have weird >interactions with rare Perl build options. > > I am inclined toward (2) if it takes no more than roughly a hundred lines of > code, else (1). Opinions? I regret investing in 32-bit Windows. If there's > any OS where a 32-bit PostgreSQL server makes sense today, it's not Windows. Here's an implementation of (2). This is more intricate than I hoped. One could argue for (1), but I estimate (2) wins by a nose. I successfully tested http://strawberryperl.com/download/5.14.4.1/strawberry-perl-5.14.4.1-32bit.msi (Perl 5.14.4; MinGW-built; must have -D_USE_32BIT_TIME_T) and http://get.enterprisedb.com/languagepacks/edb-languagepack-10-3-windows.exe (Perl 5.24.0; MSVC-built; must not have -D_USE_32BIT_TIME_T). I also tried http://strawberryperl.com/download/5.8.9/strawberry-perl-5.8.9.5.msi, which experienced a StackHash_0a9e crash during PERL_SYS_INIT3(), with or without -D_USE_32BIT_TIME_T. I expect that breakage is orthogonal. I didn't have ready access to obsolete MSVC-built Perl, so it will be interesting to see how the buildfarm likes this. MSVC: Test whether 32-bit Perl needs -D_USE_32BIT_TIME_T. Commits 5a5c2feca3fd858e70ea348822595547e6fa6c15 and b5178c5d08ca59e30f9d9428fa6fdb2741794e65 introduced support for modern MSVC-built, 32-bit Perl, but they broke use of MinGW-built, 32-bit Perl distributions like Strawberry Perl and modern ActivePerl. Perl has no robust means to report whether it expects a -D_USE_32BIT_TIME_T ABI, so test this. Back-patch to 9.3 (all supported versions). The chief alternative was a heuristic of adding -D_USE_32BIT_TIME_T when $Config{gccversion} is nonempty. That banks on every gcc-built Perl using the same ABI. gcc could change its default ABI the way MSVC once did, and one could build Perl with gcc and the non-default ABI. The GNU make build system could benefit from a similar test, without which it does not support MSVC-built Perl. For now, just add a comment. Most users taking the special step of building Perl with MSVC probably build PostgreSQL with MSVC. Discussion: https://postgr.es/m/20171130041441.ga3161...@rfd.leadboat.com diff --git a/config/perl.m4 b/config/perl.m4 index 76b1a92..caefb07 100644 --- a/config/perl.m4 +++ b/config/perl.m4 @@ -48,19 +48,23 @@ AC_DEFUN([PGAC_CHECK_PERL_CONFIGS], # PGAC_CHECK_PERL_EMBED_CCFLAGS # - -# We selectively extract stuff from $Config{ccflags}. We don't really need -# anything except -D switches, and other sorts of compiler switches can -# actively break things if Perl was compiled with a different compiler. -# Moreover, although Perl likes to put stuff like -D_LARGEFILE_SOURCE and -# -D_FILE_OFFSET_BITS=64 here, it would be fatal to try to compile PL/Perl -# to a different libc ABI than core Postgres uses. 
The available information -# says that all the symbols that affect Perl's own ABI begin with letters, -# so it should be sufficient to adopt -D switches for symbols not beginning -# with underscore. An exception is that we need to let through -# -D_USE_32BIT_TIME_T if it's present. (We probably could restrict that to -# only get through on Windows, but for the moment we let it through always.) -# For debugging purposes, let's have the configure output report the raw -# ccflags value as well as the set of flags we chose to adopt. +# We selectively extract stuff from $Config{ccflags}. For debugging purposes, +# let's have the configure output report the raw ccflags value as well as the +# set of flags we chose to adopt. We don't really need anything except -D +# switches, and other sorts of compiler switches can actively break things if +# Perl was compiled with a different compiler. Moreover, although Perl likes +# to put stuff like -D_LARGEFILE_SOURCE and -D_FILE_OFFSET_BITS=64 here, it +# would be fatal to try to compile PL/Perl to a different libc ABI than core +# Postgres uses. The available information says that most symbols that affect +# Perl's own ABI begin with letters, so it's almost sufficient to adopt -D +# switches for symbols not beginning with underscore. Some exceptions are the +# Windows-specific -D_USE_32BIT_TIME_T and -D__MINGW_USE_VC2005_COMPAT; see +# Mkvcbuild.pm for details. We absorb t
Re: Partition pruning for Star Schema
On Sun, Dec 3, 2017 at 5:56 AM, legrand legrand wrote: > Hello, > > I have a typical star schema, having dimension tables "product", "calendar" > and "country" and a fact table "sales". > This fact table is partitioned by time (range by month) and country (list). > > Will a query like: > > select product.name, calendar.month, sum(sales.net_price) > from sales > inner join product on (product.id = sales.cust_id) > inner join country on (country.id = sales.country_id) > inner join calendar on (calendar.id = sales.calendar_id) > where > country.name = 'HERE' > and calendar.year = '2017' > group by product.name,calendar.month > > be able to identify the needed partitions? > AFAIU partition pruning, it works only with the partition key columns. So, if country.name and calendar.year are the partition keys, partition pruning would identify the needed partitions from those tables. But the planner doesn't know that calendar.year is somehow related to calendar.id, and it cannot transfer that knowledge so that partitions of sales can be identified. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
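In other words, pruning on sales currently requires restrictions on the fact table's own partition key columns. A hedged rewrite along those lines, assuming hypothetical partition key columns sales.sale_date (range) and sales.country_id (list), and assuming the id for 'HERE' and the date range for 2017 are known up front, might look like:

SELECT product.name, calendar.month, sum(sales.net_price)
FROM sales
JOIN product  ON product.id  = sales.cust_id
JOIN calendar ON calendar.id = sales.calendar_id
WHERE sales.country_id = 42                    -- list partition key
  AND sales.sale_date >= DATE '2017-01-01'     -- range partition key
  AND sales.sale_date <  DATE '2018-01-01'
GROUP BY product.name, calendar.month;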
Re: [HACKERS] [POC] Faster processing at Gather node
On Fri, Dec 1, 2017 at 8:04 PM, Robert Haas wrote: > On Sun, Nov 26, 2017 at 3:15 AM, Amit Kapila wrote: >> Yeah and I think something like that can happen after your patch >> because now the memory for tuples returned via TupleQueueReaderNext >> will be allocated in ExecutorState and that can last for long. I >> think it is better to free memory, but we can leave it as well if you >> don't feel it important. In any case, I have written a patch, see if >> you think it makes sense. > > Well, I don't really know. My intuition is that in most cases after > ExecShutdownGatherMergeWorkers() we will very shortly thereafter call > ExecutorEnd() and everything will go away. > I thought there are some cases (though fewer) where we want to shut down the nodes (ExecShutdownNode) earlier and release the resources sooner. However, if you are not completely sure about this change, then we can leave it as is. Thanks for sharing your thoughts. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Partition pruning for Star Schema
On 04/12/17 16:08, Ashutosh Bapat wrote: > AFAIU partition pruning, it works only with the partition key columns. [...] > But the planner doesn't know that calendar.year is somehow related to > calendar.id, and it cannot transfer that knowledge so that partitions of > sales can be identified. If you can get your code to perform a star transformation on this type of query, then you might see some partition pruning. Cheers Mark
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Tue, Nov 28, 2017 at 10:37 PM, Michael Paquier wrote: > On Mon, Oct 2, 2017 at 11:02 PM, Claudio Freire > wrote: >> Rebased version of the patches attached > > The status of the patch is misleading: > https://commitfest.postgresql.org/15/844/. This was marked as waiting > on author but a new version has been published. Let's be careful. > > The last patches I am aware of, aka those from > https://www.postgresql.org/message-id/CAGTBQpZHTf2JtShC=ijc9wzeipo3xokwqhx+8wip7zjpc3f...@mail.gmail.com, > do not apply. I am moving the patch to the next commit fest with a > waiting on author status, as this should be reviewed, but those need a > rebase. They did apply at the time, but I think major work on vacuum was pushed since then, and also I was traveling and out of reach. It may take some time to rebase them again. Should I move the patch to needs review myself after that?
Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
On Mon, Dec 4, 2017 at 2:38 PM, Claudio Freire wrote: > They did apply at the time, but I think major work on vacuum was > pushed since then, and also I was traveling and out of reach. > > It may take some time to rebase them again. Should I move the patch to > needs review myself after that? Sure, if you can get it into this state, please feel free to update the status of the patch yourself. -- Michael
Re: Do we actually need an ItemId array on nbtree pages containing fixed-width tuples?
Hi, Peter! > On 4 Dec 2017, at 4:55, Peter Geoghegan wrote: > Thoughts? I like the idea of a more compact B-tree. Chances are that I didn't understand all your ideas. But ItemIds let you insert a tuple between two existing tuples without data movement. The new tuple is placed wherever free space starts; you just shift bytes in the ItemId array. And you always have to insert the tuple at a specific position, since the B-tree relies on tuple order. Best regards, Andrey Borodin.
Re: Would a BGW need shmem_access or database_connection to enumerate databases?
On Sat, Dec 2, 2017 at 12:41 AM, Robert Haas wrote: > On Fri, Dec 1, 2017 at 10:04 AM, Chapman Flack wrote: >> Can I call RegisterDynamicBackgroundWorker when not in the postmaster, >> but also not in a "regular backend", but rather another BGW? > > I believe that doing it from another BGW works fine. > I think we are already doing it that way in autoprewarm (src/contrib/pg_prewarm/autoprewarm.c). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Errands around AllocateDir()
Hi all, In the recent thread below, problems around the use of AllocateDir() in the backend code have been diagnosed: https://www.postgresql.org/message-id/20171127093107.1473.70...@wrigleys.postgresql.org I had a close look at all the callers of AllocateDir() and noticed a couple of unwelcome things (Tom noticed some of those in the thread mentioned above, and I found others):

- Some callers use elog() instead of ereport(), which is bad for the error code and translation.
- Some use ereport(LOG) and could take advantage of ReadDirExtended, which could get exposed instead of being contained in fd.c. Those elog(LOG) calls can be removed when there is no further operation in the routine where the directory read is happening.
- perform_base_backup() makes the mistake of not saving errno before CheckXLogRemoved() when AllocateDir returns NULL, which can lead to an incorrect error message.
- restoreTwoPhaseData() calls ReadDir() with LWLockAcquire() in between, which can lead to an incorrect errno being used.
- Some code paths can simply rely on the error generated by ReadDir(), so some code is useless.
- Some code paths use non-generic error messages, like RemoveOldXlogFiles(). Here a more consistent error string would, I think, make sense.

Some issues are real bugs, like what's in perform_base_backup() and restoreTwoPhaseData(), and some incorrect error codes. Missing translation of some messages is also wrong IMO. Making error messages more consistent is nice and cosmetic. Working through all those issues led me to the attached patch for HEAD. Thoughts? -- Michael allocatedir-errands.patch Description: Binary data