Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote: On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra wrote: On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote: > >Unrelated: if you or someone else you know that's more familiar with >the parallel code, I'd be interested in their looking at the patch at >some point, because I have a suspicion it might not be operating in ... So I've looked into that, and the reason seems fairly simple - when generating the Gather Merge paths, we only look at paths that are in partial_pathlist. See generate_gather_paths(). And we only have sequential + index paths in partial_pathlist, not incremental sort paths. IMHO we can do two things: 1) modify generate_gather_paths to also consider incremental sort for each sorted path, similarly to what create_ordered_paths does 2) modify build_index_paths to also generate an incremental sort path for each index path IMHO (1) is the right choice here, because it automatically does the trick for all other types of ordered paths, not just index scans. So, something like the attached patch, which gives me plans like this: ... But I'm not going to claim those are total fixes, it's the minimum I needed to do to make this particular type of plan work. Thanks for looking into this! I intended to apply this to my most recent version of the patch (just sent a few minutes ago), but when I apply it I noticed that the partition_aggregate regression tests have several of these failures: ERROR: could not find pathkey item to sort I haven't had time to look into the cause yet, so I decided to wait until the next patch revision. I wanted to investigate this today, but I can't reproduce it. How are you building and running the regression tests? Attached is a patch adding the incremental sort below gather merge, and also tweaking the costing. But that's mostly for better planning decisions; I don't get any pathkey errors even with the first patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3efc807164..d7bf33f64d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2719,6 +2719,8 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
 		GatherMergePath *path;
+		bool		is_sorted;
+		int			presorted_keys;

 		if (subpath->pathkeys == NIL)
 			continue;
@@ -2727,6 +2729,26 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
 										subpath->pathkeys, NULL, rowsp);
 		add_path(rel, &path->path);
+
+		/* consider incremental sort */
+		is_sorted = pathkeys_common_contained_in(root->sort_pathkeys,
+												 subpath->pathkeys,
+												 &presorted_keys);
+
+		if (!is_sorted && (presorted_keys > 0))
+		{
+			/* Also consider incremental sort. */
+			subpath = (Path *) create_incremental_sort_path(root,
+															rel,
+															subpath,
+															root->sort_pathkeys,
+															presorted_keys,
+															-1);
+
+			path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
+											subpath->pathkeys, NULL, rowsp);
+
+			add_path(rel, &path->path);
+		}
 	}
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7f820e7351..c6aa17ba67 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1875,16 +1875,8 @@ cost_incremental_sort(Path *path,
 										limit_tuples);

 	/* If we have a LIMIT, adjust the number of groups we'll have to return. */
-	if (limit_tuples > 0 && limit_tuples < input_tuples)
-	{
-
(select query)/relation as first class citizen
Hello, Just a bit of background - I currently work as a full-time db developer, mostly with MS SQL Server, but I like Postgres a lot, especially because I really program in SQL all the time, and the type system / plpgsql language of Postgres seems to me more suitable for actual programming than T-SQL. Here's the problem - the current structure of the language doesn't allow decomposing the code well and splitting calculations and data into different modules. For example, suppose I have a table employee and a function like this (I'll skip the definition of return types for the sake of simplicity): create function departments_salary () returns table (...) as return $$ select department, sum(salary) as salary from employee group by department; $$; so that's fine, but what if I want to run this function on filtered employee? I can adjust the function of course, but it implies I can predict all possible filters I'm going to need in the future. And logically, the function itself doesn't have to run on the employee table; anything with department and salary columns will fit. So it'd be nice to be able to define the function like this: create function departments_salary(_employee query) returns table (...) as return $$ select department, sum(salary) as salary from _employee group by department; $$; and then call it like this: declare _employee query; ... _poor_employee = (select salary, department from employee where salary < 1000); select * from departments_salary(_poor_employee); And just to be clear, the query is not really invoked until the last line, so re-assigning the _employee variable is more like building a query expression. As far as I understand, the closest way to do this now is to put the data into a temporary table and use that temporary table inside of the function. It's not exactly the same of course, because with a temporary table the data has to be transferred into it up front, while in the proposed scheme it might be filtered later. So it's something like array vs generator in Python, or List vs IQueryable in C#. Adding this functionality would allow much better decomposition of the program's logic. What do you think about the idea itself? And if you think the idea is worthy, is it even possible to implement? Regards, Roman Pekar
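For concreteness, the temporary-table workaround mentioned above looks roughly like the sketch below, reusing the example's names. The aggregate is inlined rather than wrapped in a function, since the function would have to be written against the fixed temp-table name:

    -- materialize the filtered rows up front
    CREATE TEMPORARY TABLE _poor_employee AS
        SELECT salary, department
        FROM employee
        WHERE salary < 1000;

    -- the "function body" then runs against the temp table
    SELECT department, sum(salary) AS salary
    FROM _poor_employee
    GROUP BY department;

This illustrates the cost described above: the filtered rows are written out and read back, instead of being pipelined into the aggregate.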
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Jul 7, 2019 at 8:34 AM Tomas Vondra wrote: > > On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote: > >On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra > > wrote: > >> > >> On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote: > >> > > >> >Unrelated: if you or someone else you know that's more familiar with > >> >the parallel code, I'd be interested in their looking at the patch at > >> >some point, because I have a suspicion it might not be operating in > >... > >> So I've looked into that, and the reason seems fairly simple - when > >> generating the Gather Merge paths, we only look at paths that are in > >> partial_pathlist. See generate_gather_paths(). > >> > >> And we only have sequential + index paths in partial_pathlist, not > >> incremental sort paths. > >> > >> IMHO we can do two things: > >> > >> 1) modify generate_gather_paths to also consider incremental sort for > >> each sorted path, similarly to what create_ordered_paths does > >> > >> 2) modify build_index_paths to also generate an incremental sort path > >> for each index path > >> > >> IMHO (1) is the right choice here, because it automatically does the > >> trick for all other types of ordered paths, not just index scans. So, > >> something like the attached patch, which gives me plans like this: > >... > >> But I'm not going to claim those are total fixes, it's the minimum I > >> needed to do to make this particular type of plan work. > > > >Thanks for looking into this! > > > >I intended to apply this to my most recent version of the patch (just > >sent a few minutes ago), but when I apply it I noticed that the > >partition_aggregate regression tests have several of these failures: > > > >ERROR: could not find pathkey item to sort > > > >I haven't had time to look into the cause yet, so I decided to wait > >until the next patch revision. > > > > I wanted to investigate this today, but I can't reproduce it. How are > you building and running the regression tests? > > Attached is a patch adding the incremental sort below gather merge, and > also tweaking the costing. But that's mostly for better planning > decisions; I don't get any pathkey errors even with the first patch. On 12be7f7f997debe4e05e84b69c03ecf7051b1d79 (the last patch I sent, which is based on top of 5683b34956b4e8da9dccadc2e3a53b86104ebb33), I did this: patch -p1 < ~/Downloads/parallel-incremental-sort.patch (FWIW I configure with ./configure --prefix=$HOME/postgresql-test --enable-cassert --enable-debug --enable-depend CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer -DOPTIMIZER_DEBUG") make check-world And I get the attached regression failures. James Coleman regression.diffs Description: Binary data regression.out Description: Binary data
Re: (select query)/relation as first class citizen
Hi On Sun, Jul 7, 2019 at 14:54, Roman Pekar wrote: > Hello, > > Just a bit of background - I currently work as a full-time db developer, > mostly with MS SQL Server, but I like Postgres a lot, especially because I > really program in SQL all the time, and the type system / plpgsql language of > Postgres seems to me more suitable for actual programming than T-SQL. > > Here's the problem - the current structure of the language doesn't allow > decomposing the code well and splitting calculations and data into different > modules. > > For example, suppose I have a table employee and a function like > this (I'll skip the definition of return types for the sake of simplicity): > > create function departments_salary () > returns table (...) > as > return $$ > select department, sum(salary) as salary from employee group by > department; > $$; > > so that's fine, but what if I want to run this function on filtered > employee? I can adjust the function of course, but it implies I can predict > all possible filters I'm going to need in the future. > And logically, the function itself doesn't have to run on the employee table; > anything with department and salary columns will fit. > So it'd be nice to be able to define the function like this: > > create function departments_salary(_employee query) > returns table (...) > as > return $$ > select department, sum(salary) as salary from _employee group by > department; > $$; > > and then call it like this: > > declare _employee query; > ... > _poor_employee = (select salary, department from employee where salary < > 1000); > select * from departments_salary(_poor_employee); > > And just to be clear, the query is not really invoked until the last line, > so re-assigning the _employee variable is more like building a query expression. > > As far as I understand, the closest way to do this now is to put the data into > a temporary table and use that temporary table inside of the function. It's > not exactly the same of course, because with a temporary table the data > has to be transferred into it up front, while in the proposed scheme it might be filtered > later. So it's something like array vs generator in Python, or List vs > IQueryable in C#. > > Adding this functionality would allow much better decomposition of the > program's logic. > What do you think about the idea itself? And if you think the idea is worthy, > is it even possible to implement? > If we are talking about plpgsql, then I am afraid this idea could disallow plan caching - or significantly increase the cost of the plan cache. There are two possibilities for implementation - a) query like a cursor - unfortunately it effectively disables any optimization and brings ORM-style performance to procedures; this usage is a known performance antipattern; b) query like a view - it should not have performance problems with late optimization, but I am not sure about the possibility of reusing execution plans. Currently PLpgSQL is a compromise between performance and dynamism (PLpgSQL is really a static language). Your proposal allows much more dynamic behaviour, but performance can be much worse. Moreover - with this behaviour it is not possible to do static checks - so you would find bugs only at runtime. I am afraid about the performance of this solution. Regards Pavel > Regards, > Roman Pekar > > >
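Pavel's option (b) roughly corresponds to what a view already gives you today, minus the ability to pass the query around as a value; a minimal sketch using the thread's example names:

    CREATE VIEW poor_employee AS
        SELECT salary, department FROM employee WHERE salary < 1000;

    -- the aggregate composes with the filter at planning time
    SELECT department, sum(salary) AS salary
    FROM poor_employee
    GROUP BY department;

The difference from the proposal is that the composition here is fixed at CREATE VIEW time rather than built up dynamically in plpgsql variables.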
Re: Use relative rpath if possible
I wrote: > Peter Eisentraut writes: >> rebased patch attached, no functionality changes > I poked at this a bit, and soon found that it fails check-world, > because the isolationtester binary is built with an rpath that > only works if it's part of the temp install tree, which it ain't. Oh ... just thought of another issue in the same vein: what about modules being built out-of-tree with pgxs? (I'm imagining something with a libpq.so dependency, like postgres_fdw.) We probably really have to keep using the absolute rpath for that, because not only would such modules certainly fail "make check" with a relative rpath, but it's not really certain that they're intended to get installed into the same installdir as the core libraries. regards, tom lane
Re: (select query)/relation as first class citizen
Hi, Yes, I'm thinking about 'query like a view'; 'query like a cursor' is probably possible even now in MS SQL Server (not sure about PostgreSQL), but it requires a paradigm shift from set-based thinking to row-by-row thinking which I'd not want to do. I completely agree with your points about plan caching and static checks. With static checks, though, it might be possible if the query were defined as typed, so that all the column types are known in advance. In certain cases, having the possibility of much better decomposition might be more important than having a cached plan. Not sure how often these cases appear in general, but personally for me it'd be awesome to have this possibility. Regards, Roman Pekar On Sun, 7 Jul 2019 at 15:39, Pavel Stehule wrote: > Hi > > On Sun, Jul 7, 2019 at 14:54, Roman Pekar > wrote: > >> Hello, >> >> Just a bit of background - I currently work as a full-time db developer, >> mostly with MS SQL Server, but I like Postgres a lot, especially because I >> really program in SQL all the time, and the type system / plpgsql language of >> Postgres seems to me more suitable for actual programming than T-SQL. >> >> Here's the problem - the current structure of the language doesn't allow >> decomposing the code well and splitting calculations and data into different >> modules. >> >> For example, suppose I have a table employee and a function like >> this (I'll skip the definition of return types for the sake of simplicity): >> >> create function departments_salary () >> returns table (...) >> as >> return $$ >> select department, sum(salary) as salary from employee group by >> department; >> $$; >> >> so that's fine, but what if I want to run this function on filtered >> employee? I can adjust the function of course, but it implies I can predict >> all possible filters I'm going to need in the future. >> And logically, the function itself doesn't have to run on the employee table; >> anything with department and salary columns will fit. >> So it'd be nice to be able to define the function like this: >> >> create function departments_salary(_employee query) >> returns table (...) >> as >> return $$ >> select department, sum(salary) as salary from _employee group by >> department; >> $$; >> >> and then call it like this: >> >> declare _employee query; >> ... >> _poor_employee = (select salary, department from employee where salary < >> 1000); >> select * from departments_salary(_poor_employee); >> >> And just to be clear, the query is not really invoked until the last >> line, so re-assigning the _employee variable is more like building a query >> expression. >> >> As far as I understand, the closest way to do this now is to put the data into >> a temporary table and use that temporary table inside of the function. It's >> not exactly the same of course, because with a temporary table the data >> has to be transferred into it up front, while in the proposed scheme it might be filtered >> later. So it's something like array vs generator in Python, or List vs >> IQueryable in C#. >> >> Adding this functionality would allow much better decomposition of the >> program's logic. >> What do you think about the idea itself? And if you think the idea is worthy, >> is it even possible to implement? >> > > If we are talking about plpgsql, then I am afraid this idea could disallow plan > caching - or significantly increase the cost of the plan cache. > > There are two possibilities for implementation - a) query like a cursor - > unfortunately it effectively disables any optimization and brings ORM-style > performance to procedures; this usage is a known performance antipattern; b) > query like a view - it should not have performance problems with late > optimization, but I am not sure about the possibility of reusing execution plans. > > Currently PLpgSQL is a compromise between performance and dynamism (PLpgSQL > is really a static language). Your proposal allows much more dynamic > behaviour, but performance can be much worse. > > Moreover - with this behaviour it is not possible to do static checks - so you > would find bugs only at runtime. I am afraid about the performance of this > solution. > > Regards > > Pavel > > > >> Regards, >> Roman Pekar >> >> >>
Re: Switching PL/Python to Python 3 by default in PostgreSQL 12
On 2019-07-07 00:34, Steven Pousty wrote: > Why would it be a 13 or later issue? Because PostgreSQL 12 is feature frozen and in beta, and this issue is not a regression. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Switching PL/Python to Python 3 by default in PostgreSQL 12
Peter Eisentraut writes: > On 2019-07-07 00:34, Steven Pousty wrote: >> Why would it be a 13 or later issue? > Because PostgreSQL 12 is feature frozen and in beta, and this issue is > not a regression. More to the point: it does not seem to me that we should change what "plpythonu" means until Python 2 is effectively extinct in the wild. Which is surely some years away yet. If we change it sooner than that, the number of people complaining that we broke perfectly good installations will vastly outweigh the number of people who are happy because we saved them one keystroke per function definition. As a possibly relevant comparison, I get the impression that most packagers of Python are removing the versionless "python" executable name and putting *nothing* in its place. You have to write python2 or python3 nowadays. Individuals might still be setting up symlinks so that "python" does what they want, but it's not happening at the packaging/distro level. (This comparison suggests that maybe what we should be thinking about is a way to make it easier to change what "plpythonu" means at the local-opt-in level.) regards, tom lane
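For illustration, the explicit, version-pinned spelling that already works today looks like this (assuming the server was built with Python 3 support; the function itself is just an example):

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    CREATE FUNCTION pymax(a integer, b integer) RETURNS integer AS $$
        # runs under Python 3 regardless of what "plpythonu" defaults to
        return max(a, b)
    $$ LANGUAGE plpython3u;

A local-opt-in mechanism along the lines Tom suggests would presumably let "plpythonu" resolve to this language per database or per installation.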
Re: [RFC] Removing "magic" oids
On Tue, Nov 20, 2018 at 01:20:04AM -0800, Andres Freund wrote: > On 2018-11-14 21:02:41 -0800, Andres Freund wrote: > > On 2018-11-15 04:57:28 +0000, Noah Misch wrote: > > > On Wed, Nov 14, 2018 at 12:01:52AM -0800, Andres Freund wrote: > > > > - one pgbench test tested concurrent insertions into a table with > > > > oids, as some sort of stress test for lwlocks and spinlocks. I *think* > > > > this doesn't really have to be a system oid column, and this was just > > > > because that's how we triggered a bug on some machine. Noah, do I get > > > > this right? > > > > > > The point of the test is to exercise OidGenLock by issuing many parallel > > > GetNewOidWithIndex() and verifying absence of duplicates. There's nothing > > > special about OidGenLock, but it is important to use an operation that > > > takes a > > > particular LWLock many times, quickly. If the test query spends too much > > > time > > > on things other than taking locks, it will catch locking races too rarely. > > > > Sequences ought to do that, too. And if it's borked, we'd hopefully see > > unique violations. But it's definitely not a 1:1 replacement. > I've tested this on ppc. Neither the old version nor the new version > stress test spinlocks sufficiently to error out with weakened spinlocks > (not that surprising, there are no spinlocks in any hot path of either > workload). Both versions very reliably trigger on weakened lwlocks. So I > think we're comparatively good on that front. I tested this on xlc, the compiler that motivated the OID test, and the v12+ version of the test didn't catch the bug[1] with xlc 13.1.3. CREATE TYPE ... AS ENUM generates an OID for each label, so the attached patch makes the v12+ test have locking behavior similar to its v11 ancestor. [1] https://postgr.es/m/flat/a72cfcb0-37d0-de2f-b3ec-f38ad8d6a...@postgrespro.ru

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index dc2c72f..3b097a9 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -58,27 +58,20 @@ sub pgbench
 	return;
 }

-# Test concurrent insertion into table with serial column. This
-# indirectly exercises LWLock and spinlock concurrency. This test
-# makes a 5-MiB table.
-
-$node->safe_psql('postgres',
-	'CREATE UNLOGGED TABLE insert_tbl (id serial primary key); ');
-
+# Test concurrent OID generation via pg_enum_oid_index. This indirectly
+# exercises LWLock and spinlock concurrency.
+my $labels = join ',', map { "'l$_'" } 1 .. 1000;
 pgbench(
 	'--no-vacuum --client=5 --protocol=prepared --transactions=25',
 	0,
 	[qr{processed: 125/125}],
 	[qr{^$}],
-	'concurrent insert workload',
+	'concurrent OID generation',
 	{
 		'001_pgbench_concurrent_insert' =>
-			'INSERT INTO insert_tbl SELECT FROM generate_series(1,1000);'
+			"CREATE TYPE pg_temp.e AS ENUM ($labels); DROP TYPE pg_temp.e;"
 	});

-# cleanup
-$node->safe_psql('postgres', 'DROP TABLE insert_tbl;');
-
 # Trigger various connection errors
 pgbench(
 	'no-such-database',
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On 2019-07-05 22:24, Tomas Vondra wrote: > What if the granular encryption (not the "whole cluster with a single > key") case does not encrypt whole blocks, but just tuple data? Would > that allow at least the most critical WAL use cases (recovery, physical > replication) to work without having to know all the encryption keys? Finding the exact point where you divide up sensitive and non-sensitive data would be difficult. For example, say, you encrypt the tuple payload but not the tuple header, so that vacuum would still work. Then, someone who has access to the raw data directory could infer, in combination with commit timestamps for example, that on Friday between 5pm and 6pm, 1000 records were updated, 500 were inserted, and 200 were deleted, that the table has about this size, and that this happens every Friday, and so on. That seems way too much information to reveal for an allegedly encrypted data directory. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Broken defenses against dropping a partitioning column
(Moved from pgsql-bugs thread at [1]) Consider regression=# create domain d1 as int; CREATE DOMAIN regression=# create table t1 (f1 d1) partition by range(f1); CREATE TABLE regression=# alter table t1 drop column f1; ERROR: cannot drop column named in partition key So far so good, but that defense has more holes than a hunk of Swiss cheese: regression=# drop domain d1 cascade; psql: NOTICE: drop cascades to column f1 of table t1 DROP DOMAIN Of course, the table is now utterly broken, e.g. regression=# \d t1 psql: ERROR: cache lookup failed for type 0 (More-likely variants of this include dropping an extension that defines the type of a partitioning column, or dropping the schema containing such a type.) The fix I was speculating about in the pgsql-bugs thread was to add explicit pg_depend entries making the table's partitioning columns internally dependent on the whole table (or maybe the other way around; haven't experimented). That fix has a couple of problems though: 1. In the example, "drop domain d1 cascade" would automatically cascade to the whole partitioned table, including child partitions of course. This might leave a user sad, if a few terabytes of valuable data went away; though one could argue that they'd better have paid more attention to what the cascade cascaded to. 2. It doesn't fix anything for pre-existing tables in pre-v12 branches. I thought of a different possible approach, which is to move the "cannot drop column named in partition key" error check from ATExecDropColumn(), where it is now, to RemoveAttributeById(). That would be back-patchable, but the implication would be that dropping anything that a partitioning column depends on would be impossible, even with CASCADE; you'd have to manually drop the partitioned table first. Good for data safety, but a horrible violation of expectations, and likely of the SQL spec as well. I'm not sure we could avoid order-of-traversal problems, either. Ideally, perhaps, a DROP CASCADE like this would not cascade to the whole table but only to the table's partitioned-ness property, leaving you with a non-partitioned table with most of its data intact. It would take a lot of work to make that happen though, and it certainly wouldn't be back-patchable, and I'm not really sure it's worth it. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CA%2Bu7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg%40mail.gmail.com
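For readers wanting to see the dependency machinery involved: the cascade above is driven by the normal ('n') pg_depend entry linking the column to its type, and the fix under discussion would add an internal entry alongside it. An illustrative query using the example objects:

    SELECT classid::regclass, objid::regclass, objsubid,
           refclassid::regclass, refobjid, deptype
    FROM pg_depend
    WHERE objid = 't1'::regclass
      AND objsubid > 0;   -- column-level dependencies of t1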
Re: Switching PL/Python to Python 3 by default in PostgreSQL 12
The point of the links I sent from the Python community is that they wanted Python 2 extinct in the wild as of Jan 1 next year. They are never fixing it, even for a security vulnerability. It seems to me we roll out breaking changes with major versions. So yes, if the user chooses to upgrade to 12 and they haven't migrated their code to Python 3, it might not work. I don't have a good answer to the rule of no changes except regressions. I do hope, given how much our users expect us to be secure, that we weigh the consequences of making our default Python a version which is dead to the community a month or so after PostgreSQL 12's release. We can certainly take the stance of leaving the Python version be, but it seems that we should then come up with a plan if there is a security vulnerability found in Python 2 after Jan 1st 2020. If Python 2 wasn't our *default* choice then I would be much more comfortable letting this just pass without mention. All that aside, I think allowing the admin to set the default version of plpythonu is an excellent idea. Thanks Steve On Sun, Jul 7, 2019, 8:26 AM Tom Lane wrote: > Peter Eisentraut writes: > > On 2019-07-07 00:34, Steven Pousty wrote: > >> Why would it be a 13 or later issue? > > > Because PostgreSQL 12 is feature frozen and in beta, and this issue is > > not a regression. > > More to the point: it does not seem to me that we should change what > "plpythonu" means until Python 2 is effectively extinct in the wild. > Which is surely some years away yet. If we change it sooner than > that, the number of people complaining that we broke perfectly good > installations will vastly outweigh the number of people who are > happy because we saved them one keystroke per function definition. > > As a possibly relevant comparison, I get the impression that most > packagers of Python are removing the versionless "python" executable > name and putting *nothing* in its place. You have to write python2 > or python3 nowadays. Individuals might still be setting up symlinks > so that "python" does what they want, but it's not happening at the > packaging/distro level. > > (This comparison suggests that maybe what we should be thinking > about is a way to make it easier to change what "plpythonu" means > at the local-opt-in level.) > > regards, tom lane >
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Jul 07, 2019 at 09:01:43AM -0400, James Coleman wrote: On Sun, Jul 7, 2019 at 8:34 AM Tomas Vondra wrote: On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote: >On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra > wrote: >> >> On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote: >> > >> >Unrelated: if you or someone else you know that's more familiar with >> >the parallel code, I'd be interested in their looking at the patch at >> >some point, because I have a suspicion it might not be operating in >... >> So I've looked into that, and the reason seems fairly simple - when >> generating the Gather Merge paths, we only look at paths that are in >> partial_pathlist. See generate_gather_paths(). >> >> And we only have sequential + index paths in partial_pathlist, not >> incremental sort paths. >> >> IMHO we can do two things: >> >> 1) modify generate_gather_paths to also consider incremental sort for >> each sorted path, similarly to what create_ordered_paths does >> >> 2) modify build_index_paths to also generate an incremental sort path >> for each index path >> >> IMHO (1) is the right choice here, because it automatically does the >> trick for all other types of ordered paths, not just index scans. So, >> something like the attached patch, which gives me plans like this: >... >> But I'm not going to claim those are total fixes, it's the minimum I >> needed to do to make this particular type of plan work. > >Thanks for looking into this! > >I intended to apply this to my most recent version of the patch (just >sent a few minutes ago), but when I apply it I noticed that the >partition_aggregate regression tests have several of these failures: > >ERROR: could not find pathkey item to sort > >I haven't had time to look into the cause yet, so I decided to wait >until the next patch revision. > I wanted to investigate this today, but I can't reproduce it. How are you building and running the regression tests? Attached is a patch adding the incremental sort below gather merge, and also tweaking the costing. But that's mostly for better planning decisions; I don't get any pathkey errors even with the first patch. On 12be7f7f997debe4e05e84b69c03ecf7051b1d79 (the last patch I sent, which is based on top of 5683b34956b4e8da9dccadc2e3a53b86104ebb33), I did this: patch -p1 < ~/Downloads/parallel-incremental-sort.patch (FWIW I configure with ./configure --prefix=$HOME/postgresql-test --enable-cassert --enable-debug --enable-depend CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer -DOPTIMIZER_DEBUG") make check-world And I get the attached regression failures. OK, thanks. Apparently it's the costing changes that make it go away; if I try just the patch that tweaks generate_gather_paths() I see the same failures. The failure happens during plan construction, so I think the costing changes simply mean the path with incremental sort ends up not being the cheapest one (for the problematic queries), but that's just pure luck - it's definitely an issue that needs fixing.
That error message is triggered in two places in createplan.c, and after changing them to Assert(false) I get a core dump with this backtrace:

#0  0x702b3328857f in raise () from /lib64/libc.so.6
#1  0x702b33272895 in abort () from /lib64/libc.so.6
#2  0x00a59a9d in ExceptionalCondition (conditionName=0xc52e84 "!(0)", errorType=0xc51f96 "FailedAssertion", fileName=0xc51fe6 "createplan.c", lineNumber=5937) at assert.c:54
#3  0x007d4ab5 in prepare_sort_from_pathkeys (lefttree=0x2bbbce0, pathkeys=0x2b7a130, relids=0x0, reqColIdx=0x0, adjust_tlist_in_place=false, p_numsortkeys=0x7ffe1abcfd6c, p_sortColIdx=0x7ffe1abcfd60, p_sortOperators=0x7ffe1abcfd58, p_collations=0x7ffe1abcfd50, p_nullsFirst=0x7ffe1abcfd48) at createplan.c:5937
#4  0x007d4e7f in make_incrementalsort_from_pathkeys (lefttree=0x2bbbce0, pathkeys=0x2b7a130, relids=0x0, presortedCols=1) at createplan.c:6101
#5  0x007cdd3f in create_incrementalsort_plan (root=0x2b787c0, best_path=0x2bb92b0, flags=1) at createplan.c:2019
#6  0x007cb7ad in create_plan_recurse (root=0x2b787c0, best_path=0x2bb92b0, flags=1) at createplan.c:469
#7  0x007cd778 in create_gather_merge_plan (root=0x2b787c0, best_path=0x2bb94a0) at createplan.c:1764
#8  0x007cb8fb in create_plan_recurse (root=0x2b787c0, best_path=0x2bb94a0, flags=4) at createplan.c:516
#9  0x007cdf10 in create_agg_plan (root=0x2b787c0, best_path=0x2bb9b28) at createplan.c:2115
#10 0x007cb834 in create_plan_recurse (root=0x2b787c0, best_path=0x2bb9b28, flags=3) at createplan.c:484
#11 0x007cdc16 in create_sort_plan (root=0x2b787c0, best_path=0x2bba1e8, flags=1) at createplan.c:1986
#12 0x007cb78e in create_plan_recurse (root=0x2b787c0, best_path=0x2bba1e8, flags=1) at createplan.c:464
#13 0x007cb4ae in create_plan (root=0x2b787c0, best_path=0x2bba1e8) at createplan.c:330
#14 0x007db63c in standard
Re: SQL/JSON path issues/questions
On Thu, Jul 4, 2019 at 4:38 PM Liudmila Mantrova wrote: > Thank you! > > I think we can make this sentence even shorter, the fix is attached: > > "To refer to a JSON element stored at a lower nesting level, add one or > more accessor operators after @." Thanks, looks good to me. The attached revision of the patch contains a commit message. I'm going to commit this barring objections. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0004-clarify-jsonpath-docs-5.patch Description: Binary data
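For readers following along without the docs open, the sentence being reworded describes the @ variable inside jsonpath filter expressions; a small illustrative example (the JSON values are invented for the example):

    -- @ is the value currently being filtered; @.price drills one level down
    SELECT jsonb_path_query(
        '{"items": [{"price": 5}, {"price": 15}]}',
        '$.items[*] ? (@.price > 10)');
    -- returns {"price": 15}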
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
On Sun, Jun 30, 2019 at 7:30 PM Goel, Dhruv wrote: > > On Jun 10, 2019, at 1:20 PM, Goel, Dhruv wrote: > >> On Jun 9, 2019, at 5:33 PM, Tom Lane wrote: > >> Andres Freund writes: > >>> On June 9, 2019 8:36:37 AM PDT, Tom Lane wrote: > I think you are mistaken that doing transactional updates in pg_index > is OK. If memory serves, we rely on xmin of the pg_index row for > purposes such as detecting whether a concurrently-created index is safe > to use yet. > > > > I took a deeper look regarding this use case but was unable to find more > > evidence. As part of this patch, we essentially make a concurrently-created > > index safe to use only if the transaction started after the xmin of Phase 3. > > Even today concurrent indexes cannot be used for transactions before this > > xmin because of the wait (which I am trying to get rid of in this patch); > > is there any other denial of service you are talking about? Both of the other > > states, indislive and indisready, can be transactional updates as far as I > > understand. Is there anything more I am missing here? > > I did some more concurrency testing here through some python scripts which > compare the end state of the concurrently created indexes. I also back-ported > this patch to PG 9.6 and ran some custom concurrency tests (Inserts, Deletes, > and Create Index Concurrently) which seem to succeed. The intermediate states > unfortunately are not easy to test in an automated manner, but to be fair > concurrent indexes could never be used for older transactions. Do you have > more inputs/ideas on this patch? I noticed that check-world passed several times with this patch applied, but the most recent CI run failed in multiple-cic: +error in steps s2i s1i: ERROR: cache lookup failed for index 26303 https://travis-ci.org/postgresql-cfbot/postgresql/builds/555472214 -- Thomas Munro https://enterprisedb.com
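For context, the pg_index state (and the row xmin the discussion hinges on) can be inspected directly; an illustrative query, assuming an index named idx:

    SELECT xmin, indisvalid, indisready, indislive
    FROM pg_index
    WHERE indexrelid = 'idx'::regclass;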
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
On Wed, Jun 12, 2019 at 1:21 AM Daniel Gustafsson wrote: > The patch is still rough around the edges (TODO’s left to mark some areas), > but > I prefer to get some feedback early rather than working too far in potentially > the wrong direction, so parking this in the CF for now. Hi Daniel, Given the above disclaimers the following may be entirely expected, but just in case you weren't aware: t/010_logical_decoding_timelines.pl fails with this patch applied. https://travis-ci.org/postgresql-cfbot/postgresql/builds/555205042 -- Thomas Munro https://enterprisedb.com
Re: dropdb --force
On Thu, Jun 27, 2019 at 7:15 AM Pavel Stehule wrote: > fixed Hi Pavel, FYI t/050_dropdb.pl fails consistently with this patch applied: https://travis-ci.org/postgresql-cfbot/postgresql/builds/555234838 -- Thomas Munro https://enterprisedb.com
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
On Mon, Jul 8, 2019 at 9:51 AM Thomas Munro wrote: > I noticed that check-world passed several times with this patch > applied, but the most recent CI run failed in multiple-cic: > > +error in steps s2i s1i: ERROR: cache lookup failed for index 26303 > > https://travis-ci.org/postgresql-cfbot/postgresql/builds/555472214 And in another run, this time on Windows, create_index failed: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46455 -- Thomas Munro https://enterprisedb.com
Re: Control your disk usage in PG: Introduction to Disk Quota Extension
On Mon, Feb 18, 2019 at 7:39 PM Hubert Zhang wrote: > Based on the assumption we use smgr as hook position, hook API option1 or > option2 which is better? > Or we could find some balanced API between option1 and option2? > > Again comments on other better hook positions are also appreciated! Hi Hubert, The July Commitfest is now running, and this entry is in "needs review" state. Could you please post a rebased patch? I have questions about how disk quotas should work and I think we'll probably eventually want more hooks than these, but simply adding these hooks so extensions can do whatever they want doesn't seem very risky for core. I think it's highly likely that the hook signatures will have to change in future releases too, but that seems OK for such detailed internal hooks. As for your question, my first reaction was that I preferred your option 1, because SMgrRelation seems quite private and there are no existing examples of that object being exposed to extensions. But on reflection, other callbacks don't take such a mollycoddling approach. So my vote is for option 2 "just pass all the arguments to the callback", which I understand to be the approach of the patch you have posted.

+	if (smgrcreate_hook)
+	{
+		(*smgrcreate_hook)(reln, forknum, isRedo);
+	}

Usually we don't use curlies for single line if branches. -- Thomas Munro https://enterprisedb.com
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development
On 06/07/2019 00:06, Tomas Vondra wrote: > First of all, it's pretty difficult to follow the discussion when it's > not clear what's the original message and what's the response. E-mail > clients generally indent the original message with '>' or something like > that, but your client does not do that (which is pretty silly). And > copying the message at the top does not really help. Please do something > about that. I would like to apologise. I did not realize that my client was doing that and now I have changed the client. I hope it's fine now. > > On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote: >>> I might be missing something, but it seems like you intend to replace >>> the SQL grammar we have with something else. It's not clear to me what >>> would be the point of doing that, and it definitely looks like a huge >>> amount of work - e.g. we don't have any support for switching between >>> two distinct grammars the way you envision, and just that alone seems >>> like a multi-year project. And if you don't have that capability, all >>> external tools kinda stop working. Good luck with running such database. >> >> I was considering having two distinct grammars as an option - thanks >> for indicating the effort involved. At the end of the day I want both >> my DSL and the PostgreSQL grammars to coexist. Is extending >> PostgreSQL's grammar with my own through the PostgreSQL extension >> infrastructure worth consideration or is it also difficult to develop? >> Could you suggest any reference material on this topic? >> > > Well, I'm not an expert in that area, but we currently don't have any > infrastructure to support that. It's a topic that was discussed in the > past (perhaps you can find some references in the archives) and it > generally boils down to: > > 1) We're using bison as parser generator. > 2) Bison does not allow adding rules on the fly. > > So you have to modify the in-core src/backend/parser/gram.y and rebuild > postgres. See for example this discussion: > > https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com > > > > When two of the smartest people on the list say it's a hard problem, it > probably is. Particularly for someone who does not know the internals. You are right. Thanks for bringing it to my attention! I didn't design my language for interaction with triggers and whatnot, but I think that it would be very interesting to support those as well, so looking at CREATE LANGUAGE functionality is actually exciting and appropriate once I make some changes in design. Thanks again for this point! I hope this is not off topic, but I was wondering if you know what the intrinsic differences between the HANDLER and INLINE parameters of CREATE LANGUAGE are? I know that they are functions which are invoked at different instances of time (e.g. one is for handling anonymous code blocks), but at the end of the day they seem to have the same purpose? >>> What I'd look at first is implementing the grammar as a procedural >>> language (think PL/pgSQL, pl/perl etc.) implementing whatever you >>> expect from your DSL. And it's not like you'd have to wrap everything >>> in functions, because we have anonymous DO blocks. >> >> Thanks for pointing out this direction! I think I will indeed adopt >> this approach especially if directly extending PostgreSQL grammar would >> be difficult. > > Well, it's the only way to deal with it at the moment. > > > regards >
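To make the HANDLER/INLINE distinction concrete, a hypothetical registration might look like this (all names here are placeholders; the C functions would live in a shared library):

    -- the call handler executes every function written in the language
    -- and must produce that function's result
    CREATE FUNCTION mydsl_call_handler() RETURNS language_handler
        AS '$libdir/mydsl' LANGUAGE C;

    -- the inline handler executes only anonymous DO blocks; it receives
    -- the block (an InlineCodeBlock, passed as internal) and returns nothing
    CREATE FUNCTION mydsl_inline_handler(internal) RETURNS void
        AS '$libdir/mydsl' LANGUAGE C;

    CREATE LANGUAGE mydsl
        HANDLER mydsl_call_handler
        INLINE mydsl_inline_handler;

So while both execute code in the language, the call handler works with a function's declared signature (arguments, return type, the cached compiled body), whereas the inline handler gets a one-shot block with no parameters and no result.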
Re: [HACKERS] Cached plans and statement generalization
On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik wrote: > Attached please find rebased version of the patch. > Also this version can be found in autoprepare branch of this repository > https://github.com/postgrespro/postgresql.builtin_pool.git > on github. Thanks. I haven't looked at the code but this seems like interesting work and I hope it will get some review. I guess this is bound to use a lot of memory. I guess we'd eventually want to figure out how to share the autoprepared plan cache between sessions, which is obviously a whole can of worms. A couple of trivial comments with my CF manager hat on: 1. Can you please fix the documentation? It doesn't build. Obviously reviewing the goals, design and implementation are more important than the documentation at this point, but if that is fixed then the CF bot will be able to run check-world every day and we might learn something about the code. 2. Accidental editor junk included: src/include/catalog/pg_proc.dat.~1~ -- Thomas Munro https://enterprisedb.com
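For readers less familiar with the feature being automated: the patch aims to give the effect of explicit prepared statements without the client asking for them. The manual equivalent looks like this (table and values are illustrative):

    PREPARE get_user (integer) AS
        SELECT * FROM users WHERE id = $1;

    EXECUTE get_user(42);   -- repeated executions can reuse a cached plan

Autoprepare would do this transparently for query texts that arrive repeatedly, which is where the memory-use question above comes from.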
Re: Bloom index cost model seems to be wrong
On Fri, Mar 1, 2019 at 7:11 AM Jeff Janes wrote: > I'm adding it to the commitfest targeting v13. I'm more interested in > feedback on the conceptual issues rather than stylistic ones, as I would > probably merge the two functions together before proposing something to > actually be committed. From the trivialities department, this patch shows up as a CI failure with -Werror, because there is no declaration for genericcostestimate2(). I realise that's just a temporary name in a WIP patch anyway so this isn't useful feedback, but for the benefit of anyone going through CI failures in bulk looking for things to complain about: this isn't a real one. -- Thomas Munro https://enterprisedb.com
Re: Built-in connection pooler
On Tue, Jul 2, 2019 at 3:11 AM Konstantin Knizhnik wrote: > On 01.07.2019 12:57, Thomas Munro wrote: > > Interesting work. No longer applies -- please rebase. > > > Rebased version of the patch is attached. > Also this version of the built-in proxy can be found in the conn_proxy > branch of https://github.com/postgrespro/postgresql.builtin_pool.git Thanks Konstantin. I haven't looked at the code, but I can't help noticing that this CF entry and the autoprepare one are both features that come up again and again on feature request lists I've seen. That's very cool. They also both need architectural-level review. With my Commitfest manager hat on: reviewing other stuff would help with that; if you're looking for something of similar complexity and also the same level of everyone-knows-we-need-to-fix-this-!@#$-we-just-don't-know-exactly-how-yet factor, I hope you get time to provide some more feedback on Takeshi Ideriha's work on shared caches, which doesn't seem a million miles from some of the things you're working on. Could you please fix these compiler warnings so we can see this running check-world on CI? https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46324 https://travis-ci.org/postgresql-cfbot/postgresql/builds/555180678 -- Thomas Munro https://enterprisedb.com
Re: Duplicated LSN in ReorderBuffer
On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin wrote: > Attached is a simple patch that uses subxid instead of top-level xid > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but > i'm not sure that this is a valid change. Can someone please verify it > or maybe suggest a better solution for the issue? Hello Ildar, I hope someone more familiar with this code than me can comment, but while going through the Commitfest CI results I saw this segfault with your patch: https://travis-ci.org/postgresql-cfbot/postgresql/builds/555184304 At a glance, HistoricSnapshotGetTupleCids() returned NULL in HeapTupleSatisfiesHistoricMVCC(), so ResolveCminCmaxDuringDecoding() blew up. -- Thomas Munro https://enterprisedb.com
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
On Sat, Jul 6, 2019 at 2:42 AM Antonin Houska wrote: > I've reworked the way key is passed to postmaster (v04-0003-...) so it's > easier to call the encryption key command interactively, adjusted pg_upgrade > so that it preserves database, tablespace and relfilenode OIDs (v04-0014-...), > reorganized the code a bit and split the code into more diffs. Hi Antonin, Some robotic feedback: 1. On Linux, an assertion failed: ExceptionalCondition (conditionName=conditionName@entry=0x973891 "!(decrypt_p == p)", errorType=errorType@entry=0x928d7d "FailedAssertion", fileName=fileName@entry=0x973827 "xlogutils.c", lineNumber=lineNumber@entry=815) at assert.c:54 See full stack here: https://travis-ci.org/postgresql-cfbot/postgresql/builds/50833 2. On Windows the build failed: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46469 -- Thomas Munro https://enterprisedb.com
Re: Optimze usage of immutable functions as relation
On Thu, Mar 21, 2019 at 5:58 AM Alexander Kuzmenkov wrote: > On 11/16/18 22:03, Tom Lane wrote: > > A possible fix for this is to do eval_const_expressions() on > > function RTE expressions at this stage (and then not need to > > do it later), and then pull up only when we find that the > > RTE expression has been reduced to a single Const. > > > Attached is a patch that does this, and transforms an RTE_FUNCTION that was > reduced to a single Const into an RTE_RESULT. Hi Alexander, The July Commitfest is here. Could we please have a rebase of this patch? Thanks, -- Thomas Munro https://enterprisedb.com
Re: FETCH FIRST clause PERCENT option
On Thu, Jun 27, 2019 at 9:06 PM Surafel Temesgen wrote: > The attached patch include the fix for all the comment given Hi Surafel, There's a call to adjust_limit_rows_costs() hiding under contrib/postgres_fdw, so this fails check-world. -- Thomas Munro https://enterprisedb.com
Re: [RFC] [PATCH] Flexible "partition pruning" hook
On Sat, Apr 6, 2019 at 3:06 PM Andres Freund wrote: > I've moved this to the next CF, and marked it as targeting v13. Hi Mike, Commitfest 1 for PostgreSQL 13 is here. I was just going through the automated build results for the 'fest and noticed that your patch causes a segfault in the regression tests (possibly due to other changes that have been made in master since February). You can see the complete backtrace on the second link below, but it looks like this is happening every time, so hopefully not hard to track down locally. https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46412 https://travis-ci.org/postgresql-cfbot/postgresql/builds/555380617 Thanks, -- Thomas Munro https://enterprisedb.com
Re: Another way to fix inherited UPDATE/DELETE
On Wed, Jul 3, 2019 at 10:50 PM Tom Lane wrote: > > Amit Langote writes: > > On Wed, Feb 20, 2019 at 6:49 AM Tom Lane wrote: > >> Obviously this'd be a major rewrite with no chance of making it into v12, > >> but it doesn't sound too big to get done during v13. > > > Are you planning to work on this? > > It's on my list, but so are a lot of other things. If you'd like to > work on it, feel free. Thanks for the reply. Let me see if I can get something done for the September CF. Regards, Amit
Re: Run-time pruning for ModifyTable
Kato-san, On Thu, Jul 4, 2019 at 1:40 PM Kato, Sho wrote: > > If I understand the details of [1] correctly, ModifyTable will no longer > > have N subplans for N result relations as there are today. So, it doesn't > > make sense for ModifyTable to contain PartitionedRelPruneInfos and for > > ExecInitModifyTable/ExecModifyTable > > to have to perform initial and execution-time pruning, respectively. > > Does this mean that the generic plan will not have N subplans for N result > relations? > I thought [1] would make creating generic plans faster, but is this correct? Yeah, making a generic plan for UPDATE of inheritance tables will certainly become faster, because we will no longer plan the same query N times for N child tables. There will still be N result relations but only one sub-plan to fetch the rows from. Also, planning will still cost O(N), but with a much smaller constant factor. By the way, let's keep any further discussion on this particular topic in the other thread. Thanks, Amit
Re: FETCH FIRST clause PERCENT option
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

The basic functionality works as I expect. In the following example I would have guessed it would return 4 rows instead of 5. I don't mind that it uses ceil here, but I think that deserves a mention in the documentation.

CREATE TABLE r100 (id INT);
INSERT INTO r100 SELECT generate_series(1, 100);
SELECT * FROM r100 FETCH FIRST 4.01 PERCENT ROWS ONLY;

 id
----
  1
  2
  3
  4
  5
(5 rows)

There's a missing space between the period and the following sentence in src\backend\executor\nodeLimit.c: "previous time we got a different result.In PERCENTAGE option there are"

There's a missing space, and the beginning "w" should be capitalized, in doc\src\sgml\ref\select.sgml: "with PERCENT count specifies the maximum number of rows to return in percentage.ROW"

Another missing space after the period: "previous time we got a different result.In PERCENTAGE option there are"

Ryan Lambert

The new status of this patch is: Waiting on Author
Re: Improve search for missing parent downlinks in amcheck
On Wed, May 1, 2019 at 12:58 PM Peter Geoghegan wrote: > On Sun, Apr 28, 2019 at 10:15 AM Alexander Korotkov > wrote: > > I think this definitely not bug fix. Bloom filter was designed to be > > lossy, no way blaming it for that :) > > I will think about a simple fix, but after the upcoming point release. > There is no hurry. A bureaucratic question: What should the status be for this CF entry? -- Thomas Munro https://enterprisedb.com
Add test case for sslinfo
Hi Hackers, I see there is no test case for sslinfo. I have added a test case for it in my project. Do you mind if I apply this test case to PostgreSQL? Best regards, Hao Wu 0001-Add-certificates-keys-and-test-cases-for-contrib-ssl.patch Description: Binary data
Re: warning to publication created and wal_level is not set to logical
On Wed, Mar 27, 2019 at 1:36 AM Lucas Viecelli wrote: >> Oh, OK, then this seems like it's basically covered already. I think >> the original suggestion to add a WARNING during CREATE PUBLICATION >> isn't unreasonable. But we don't need to do more than that (and it >> shouldn't be higher than WARNING). > > Okay, I think it will improve understanding of new users. > > Since everything is fine, thank you all for the comments Hi Lucas, The July Commitfest has started. This patch is in "Needs review" status, but it doesn't apply. If I read the above discussion correctly, it seems there is agreement that a warning here is a good idea and that we should commit this patch. Could you please post a rebased patch? A note on the message:

WARNING: PUBLICATION created but wal_level is not set to logical, you need to change it before creating any SUBSCRIPTION

I wonder if it would be more typical project style to put the clue on what to do into an "errhint" message, something like this:

WARNING: insufficient wal_level to publish logical changes
HINT: Set wal_level to logical before creating subscriptions.

-- Thomas Munro https://enterprisedb.com
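For reference, the remedy the proposed hint points at is a one-line configuration change (wal_level only takes effect after a server restart):

    ALTER SYSTEM SET wal_level = logical;
    -- restart the server, then verify:
    SHOW wal_level;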
Re: tableam vs. TOAST
On Wed, Jun 12, 2019 at 4:17 AM Robert Haas wrote: > On Tue, May 21, 2019 at 2:10 PM Robert Haas wrote: > > Updated and rebased patches attached. > > And again. Hi Robert, Thus spake GCC:

detoast.c: In function ‘toast_fetch_datum’:
detoast.c:308:12: error: variable ‘toasttupDesc’ set but not used [-Werror=unused-but-set-variable]
  TupleDesc toasttupDesc;
            ^

-- Thomas Munro https://enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Wed, Jun 19, 2019 at 7:22 PM Paul Guo wrote: > I updated the patch to v3. In this version, we skip the error if copydir > fails due to missing src/dst directory, > but to make sure the ignoring is legal, I add a simple log/forget mechanism > (Using List) similar to the xlog invalid page > checking mechanism. Two tap tests are included. One is actually from a > previous patch by Kyotaro in this > email thread and another is added by me. In addition, dbase_desc() is fixed > to make the message accurate. Hello Paul, FYI t/011_crash_recovery.pl is failing consistently on Travis CI with this patch applied: https://travis-ci.org/postgresql-cfbot/postgresql/builds/555368907 -- Thomas Munro https://enterprisedb.com
Re: refactoring - share str2*int64 functions
On Fri, May 24, 2019 at 3:23 AM Fabien COELHO wrote: > >> Although I agree it is not worth a lot of trouble, and even if I don't do > >> Windows, I think it valuable that the behavior is the same on all platform. > >> The attached match shares pg_str2*int64 functions between frontend and > >> backend by moving them to "common/", which avoids some code duplication. > >> > >> This is more refactoring, and it fixes the behavior change on 32 bit > >> architectures. > > V2 is a rebase. Hi Fabien, Here's some semi-automated feedback, noted while going through failures on cfbot.cputube.org. You have a stray editor file src/backend/parser/parse_node.c.~1~. Something is failing to compile while doing the temp-install in make check-world, which probably indicates that some test or contrib module is using the interface you changed? -- Thomas Munro https://enterprisedb.com
Re: allow_system_table_mods stuff
On Mon, Jun 24, 2019 at 11:20:51AM -0400, Tom Lane wrote: > I do see value in two switches not one, but it's what I said above, > to not need to give people *more* chance-to-break-things than they > had before when doing manual catalog fixes. That is, we need a > setting that corresponds more or less to current default behavior. > > There's an aesthetic argument to be had about whether to have two > bools or one three-way switch, but I prefer the former; there's > no backward-compatibility issue here since allow_system_table_mods > couldn't be set by applications anyway. I like a single three-way switch, since if you are allowing DDL, you probably don't care whether DML is restricted. log_statement already has a similar distinction with values of none, ddl, mod, all. I assume allow_system_table_mods could have values of false, dml, and true. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
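Under that scheme, usage might look like the sketch below. Note that these values are hypothetical at this point - only the boolean form exists today, and only at server start - the sketch just mirrors the log_statement-style graduation Bruce describes:

    SET allow_system_table_mods = 'dml';   -- allow manual catalog UPDATE/DELETE
    SET allow_system_table_mods = 'true';  -- additionally allow DDL on catalogs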
RE: Run-time pruning for ModifyTable
On Monday, July 8, 2019 11:34 AM, Amit Langote wrote: > By the way, let's keep any further discussion on this particular topic > in the other thread. Thanks for the details. I got it. Regards, Kato sho > -Original Message- > From: Amit Langote [mailto:amitlangot...@gmail.com] > Sent: Monday, July 8, 2019 11:34 AM > To: Kato, Sho/加藤 翔 > Cc: David Rowley ; PostgreSQL Hackers > > Subject: Re: Run-time pruning for ModifyTable > > Kato-san, > > On Thu, Jul 4, 2019 at 1:40 PM Kato, Sho wrote: > > > If I understand the details of [1] correctly, ModifyTable will no > > > longer have N subplans for N result relations as there are today. > > > So, it doesn't make sense for ModifyTable to contain > > > PartitionedRelPruneInfos and for > ExecInitModifyTable/ExecModifyTable > > > to have to perform initial and execution-time pruning, respectively. > > > > Does this mean that the generic plan will not have N subplans for N > result relations? > > I thought [1] would make creating generic plans faster, but is this > correct? > > Yeah, making a generic plan for UPDATE of inheritance tables will > certainly become faster, because we will no longer plan the same query > N times for N child tables. There will still be N result relations but > only one sub-plan to fetch the rows from. Also, planning will still cost > O(N), but with a much smaller constant factor. > > By the way, let's keep any further discussion on this particular topic > in the other thread. > > Thanks, > Amit
Re: Add test case for sslinfo
On Mon, Jul 8, 2019 at 2:59 PM Hao Wu wrote: > I see there is no test case for sslinfo. I have added a test case for it in > my project. Hi Hao Wu, Thanks! I see that you created a CF entry https://commitfest.postgresql.org/24/2203/. While I was scanning through the current CF looking for trouble, this one popped in front of my eyes, so here's some quick feedback even though it's in the next CF: +#!/bin/bash I don't think we can require that script interpreter. This failed[1] with permissions errors: +cp: cannot create regular file '/server.crt': Permission denied It looks like that's because the script assumes that PGDATA is set. I wonder if we want to include more SSL certificates, or if we want to use the same set of fixed certificates (currently under src/test/ssl/ssl) for all tests like this. I don't have a strong opinion on that, but I wanted to mention that policy decision. (There is also a test somewhere that creates a new one on the fly.) [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/76601 -- Thomas Munro https://enterprisedb.com
Re: proposal - patch: psql - sort_by_size
On Sun, Jun 30, 2019 at 8:48 PM Pavel Stehule wrote: > I used this text in today patch Hi Pavel, Could you please post a rebased patch? Thanks, -- Thomas Munro https://enterprisedb.com
Re: Fix typos and inconsistencies for HEAD (take 5)
On Sun, Jul 07, 2019 at 08:03:01AM +0300, Alexander Lakhin wrote: > 5.8. dictlexize -> thesaurus_lexize There could be other dictionaries. > 5.9. regression.diffsregression.planregress/inh -> regression.diffs > planregress/diffs.inh I am wondering if we should not just nuke that... For now I have included your change as the mistake is obvious, but I am starting a new thread. The history around this script does not play in favor of it: commit: 2fc80e8e8304913c8dd1090bb2976632c0f4a8c3 author: Bruce Momjian date: Wed, 12 Feb 2014 17:29:19 -0500 Rename 'gmake' to 'make' in docs and recommended commands This simplifies the docs and makes it easier to cut/paste command lines. commit: c77e2e42fb4cf5c90a7562b9df289165ff164df1 author: Tom Lane date: Mon, 18 Dec 2000 02:45:47 +0000 Tweak regressplans.sh to use any already-set PGOPTIONS. And looking closer it seems that there are other issues linked to it... > 5.27. equivalentOpersAfterPromotion -> remove (irrelevant since > 8536c962, but the whole comments is too old to be informational too) It seems to me that this could be reworked much more extensively. So discarded for now. > 5.29. ExclusiveRowLock -> RowExclusiveLock Grammar mistake here. > 5.31. ExecBitmapHeapNext -> BitmapHeapNext Well, BitmapHeapRecheck is not listed in the interface routines either... > 5.37. ExecSeqNext -> SeqNext > 5.40. ExecSubqueryNext -> SubqueryNext > 5.41. ExecValuesNext -> ValuesNext Here as well these sets are incomplete. Instead, for those series, I think it would be better to do a larger cleanup and just remove all of these from the executor "INTERFACE ROUTINES" lists. Your proposed patches don't make things better either, as the interfaces are listed in alphabetical order. > 5.39. exec_subplan_get_plan -> remove (not used since 1cc29fe7) This could be used by extensions. So let's not remove it. And committed most of the rest. Thanks. -- Michael signature.asc Description: PGP signature
Re: fix for BUG #3720: wrong results at using ltree
On Sun, Apr 7, 2019 at 3:46 AM Tom Lane wrote: > Filip Rembiałkowski writes: > > Here is my attempt to fix a 12-year-old ltree bug (which is a todo item). > > I see it's not backward-compatible, but in my understanding that's > > what is documented. Previous behavior was inconsistent with > > documentation (where single asterisk should match zero or more > > labels). > > http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php [...] > In short, I'm wondering if we should treat this as a documentation > bug not a code bug. But to do that, we'd need a more accurate > description of what the code is supposed to do, because the statement > quoted above is certainly not a match to the actual behavior. This patch doesn't apply. More importantly, it seems like we don't have a consensus on whether we want it. Teodor, Oleg, would you like to offer an opinion here? If I understand correctly, the choices are doc change, code/comment change or WONT_FIX. This seems to be an entry that we can bring to a conclusion in this CF with some input from the ltree experts. -- Thomas Munro https://enterprisedb.com
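For anyone skimming the thread, a minimal illustration of the disputed semantics (the label values here are invented for the example):

    -- The documentation says '*' matches zero or more labels, so per the
    -- docs both of these should be true; the report is that actual results
    -- can disagree with that reading when '*' has to match zero labels.
    SELECT 'a.b.c'::ltree ~ '*.b.*'::lquery;
    SELECT 'a.b.c'::ltree ~ '*.a.b.c'::lquery;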
Re: [HACKERS] [PATCH] Generic type subscripting
On Fri, Jun 7, 2019 at 6:22 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > >> Rebase after pg_indent. Besides, off the list there was a suggestion > > > >> that this > > > >> could be useful to accept more than one data type as a key for > > > >> subscripting. > > > >> E.g. for jsonb it probably makes sense to understand both a simple key > > > >> name and > > > >> jsonpath: > > > > And one more rebase. > > Oh, looks like I was just confused and it wasn't necessary - for some reason > starting from v22 cfbot tries to apply v6 instead of the latest one. Hi Dmitry, Sorry about that. It looks like I broke the cfbot code that picks which thread to pull patches from when there are several registered in the CF app, the last time the HTML format changed. Now it's back to picking whichever thread has the most recent message on it. Such are the joys of web scraping (obviously we need better integration and that will happen, I just haven't had time yet). Anyway, I fixed that. But now you really do need to rebase :-) -- Thomas Munro https://enterprisedb.com
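For context, a sketch of the syntax under discussion; the jsonpath variant is only the off-list suggestion quoted above, not settled behavior:

    -- simple key-name subscripting, as the patch provides for jsonb:
    SELECT ('{"a": {"b": 1}}'::jsonb)['a'];
    -- hypothetical jsonpath subscript per the quoted suggestion:
    -- SELECT ('{"a": {"b": 1}}'::jsonb)['$.a.b'];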
Re: Declared but no defined functions
On Sun, Jul 7, 2019 at 10:04 AM Michael Paquier wrote: > > On Sun, Jul 07, 2019 at 07:31:12AM +0800, Masahiko Sawada wrote: > > Attached patch removes these functions. > > Thanks, applied. Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: proposal - patch: psql - sort_by_size
Hi po 8. 7. 2019 v 6:12 odesílatel Thomas Munro napsal: > On Sun, Jun 30, 2019 at 8:48 PM Pavel Stehule > wrote: > > I used this text in today patch > > Hi Pavel, > > Could you please post a rebased patch? > rebased patch attached Regards Pavel > > Thanks, > > -- > Thomas Munro > https://enterprisedb.com > diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7789fc6177..96fc4f489a 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -3973,6 +3973,17 @@ bar + +SORT_BY_SIZE + + +Setting this variable to on, sorts +\d*+, \db+, \l+ +and \dP*+ outputs by size (when size is displayed). + + + + SQLSTATE diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 8b4cd53631..ae79cbb312 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -223,6 +223,7 @@ describeTablespaces(const char *pattern, bool verbose) PQExpBufferData buf; PGresult *res; printQueryOpt myopt = pset.popt; + const char *sizefunc = NULL; if (pset.sversion < 8) { @@ -265,9 +266,12 @@ describeTablespaces(const char *pattern, bool verbose) gettext_noop("Options")); if (verbose && pset.sversion >= 90200) + { appendPQExpBuffer(&buf, ",\n pg_catalog.pg_size_pretty(pg_catalog.pg_tablespace_size(oid)) AS \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_tablespace_size(oid)"; + } if (verbose && pset.sversion >= 80200) appendPQExpBuffer(&buf, @@ -281,7 +285,10 @@ describeTablespaces(const char *pattern, bool verbose) NULL, "spcname", NULL, NULL); - appendPQExpBufferStr(&buf, "ORDER BY 1;"); + if (pset.sort_by_size && sizefunc) + appendPQExpBuffer(&buf, "ORDER BY %s DESC;", sizefunc); + else + appendPQExpBufferStr(&buf, "ORDER BY 1;"); res = PSQLexec(buf.data); termPQExpBuffer(&buf); @@ -863,6 +870,7 @@ listAllDbs(const char *pattern, bool verbose) PGresult *res; PQExpBufferData buf; printQueryOpt myopt = pset.popt; + const char *sizefunc = NULL; initPQExpBuffer(&buf); @@ -882,12 +890,15 @@ listAllDbs(const char *pattern, bool verbose) appendPQExpBufferStr(&buf, " "); printACLColumn(&buf, "d.datacl"); if (verbose && pset.sversion >= 80200) + { appendPQExpBuffer(&buf, ",\n CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n" "THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n" "ELSE 'No Access'\n" " END as \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_database_size(d.datname)"; + } if (verbose && pset.sversion >= 8) appendPQExpBuffer(&buf, ",\n t.spcname as \"%s\"", @@ -906,7 +917,10 @@ listAllDbs(const char *pattern, bool verbose) processSQLNamePattern(pset.db, &buf, pattern, false, false, NULL, "d.datname", NULL, NULL); - appendPQExpBufferStr(&buf, "ORDER BY 1;"); + if (pset.sort_by_size && sizefunc) + appendPQExpBuffer(&buf, "ORDER BY %s DESC;", sizefunc); + else + appendPQExpBufferStr(&buf, "ORDER BY 1;"); res = PSQLexec(buf.data); termPQExpBuffer(&buf); if (!res) @@ -3628,6 +3642,7 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys bool showMatViews = strchr(tabtypes, 'm') != NULL; bool showSeq = strchr(tabtypes, 's') != NULL; bool showForeign = strchr(tabtypes, 'E') != NULL; + const char *sizefunc = NULL; PQExpBufferData buf; PGresult *res; @@ -3711,13 +3726,19 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys * size of a table, including FSM, VM and TOAST tables. 
*/ if (pset.sversion >= 9) + { appendPQExpBuffer(&buf, ",\n pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) as \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_table_size(c.oid)"; + } else if (pset.sversion >= 80100) + { appendPQExpBuffer(&buf, ",\n pg_catalog.pg_size_pretty(pg_catalog.pg_relation_size(c.oid)) as \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_relation_size(c.oid)"; + } appendPQExpBuffer(&buf, ",\n pg_catalog.obj_description(c.oid, 'pg_class') as \"%s\"", @@ -3770,7 +3791,10 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys "n.nspname", "c.relname", NULL, "pg_catalog.pg_table_is_visible(c.oid)"); - appendPQExpBufferStr(&buf, "ORDER BY 1,2;"); + if (pset.sort_by_size && sizefunc) + appendPQExpBuffer(&buf, "ORDER BY %s DESC;", sizefunc); + else + appendPQExpBufferStr(&buf, "ORDER BY 1,2;"); res = PSQLexec(buf.data); termPQExpBuffer(&buf); @@ -3946,6 +3970,7 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose) " JOIN d ON i.inhparen
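Based on the documentation hunk above, usage would look something like this from psql; only the output ordering changes, the column sets are untouched:

    \set SORT_BY_SIZE on
    \dt+   -- tables now ordered by pg_table_size() descending
    \l+    -- databases ordered by pg_database_size() descending
    \db+   -- tablespaces ordered by pg_tablespace_size() descending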
PGOPTIONS="-fh" make check gets stuck since Postgres 11
Hi all, I have begun playing with regressplans.sh, which enforces various combinations of "-f s|i|n|m|h" when running the regression tests, and I have noticed that -fh can cause the server to become stuck in the test join_hash.sql with this query (not sure which portion of the SET LOCAL parameters are involved): select count(*) from simple r join extremely_skewed s using (id); This does not happen with REL_10_STABLE, where the test executes immediately, so we visibly have an issue introduced in v11 here. Any thoughts? -- Michael signature.asc Description: PGP signature
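For anyone reproducing this without the wrapper script: -f h forbids hash-join plans server-wide, which (assuming it maps onto the usual planner switch) is roughly:

    -- approximates PGOPTIONS="-fh"; join_hash.sql builds these tables
    -- specifically to stress hash joins, so forcing another join method is
    -- expensive by design; the question is why v11+ never finishes:
    set enable_hashjoin = off;
    select count(*) from simple r join extremely_skewed s using (id);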
Re: Add test case for sslinfo
Hi Thomas, Thank you for your quick response! I work on Greenplum, and I didn't see this folder (src/test/ssl/ssl) before. I will add more certificates to the test and resend it. Do you have any suggestion about the missing PGDATA? Since the test needs to configure postgresql.conf, maybe there are other ways to determine this environment. Thank you very much! On Mon, Jul 8, 2019 at 12:05 PM Thomas Munro wrote: > On Mon, Jul 8, 2019 at 2:59 PM Hao Wu wrote: > > I see there is no test case for sslinfo. I have added a test case for it > in my project. > > Hi Hao Wu, > > Thanks! I see that you created a CF entry > https://commitfest.postgresql.org/24/2203/. While I was scanning > through the current CF looking for trouble, this one popped in front > of my eyes, so here's some quick feedback even though it's in the next > CF: > > +#!/bin/bash > > I don't think we can require that script interpreter. > > This failed[1] with permissions errors: > > +cp: cannot create regular file '/server.crt': Permission denied > > It looks like that's because the script assumes that PGDATA is set. > > I wonder if we want to include more SSL certificates, or if we want to > use the same set of fixed certificates (currently under > src/test/ssl/ssl) for all tests like this. I don't have a strong > opinion on that, but I wanted to mention that policy decision. (There > is also a test somewhere that creates a new one on the fly.) > > [1] > https://travis-ci.org/postgresql-cfbot/postgresql/builds/76601 > > -- > Thomas Munro > > https://enterprisedb.com >
Re: Implementing Incremental View Maintenance
On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata wrote: > Attached is a WIP patch of IVM which supports some aggregate functions. Hi Nagata-san and Hoshiai-san, Thank you for working on this. I enjoyed your talk at PGCon. I've added Kevin Grittner just in case he missed this thread; he has talked often about implementing the counting algorithm, and he wrote the "trigger transition tables" feature to support exactly this. While integrating trigger transition tables with the new partition features, we had to make a number of decisions about how that should work, and we tried to come up with answers that would work for IVM, and I hope we made the right choices! I am quite interested to learn how IVM interacts with SERIALIZABLE. A couple of superficial review comments: +const char *aggname = get_func_name(aggref->aggfnoid); ... +else if (!strcmp(aggname, "sum")) I guess you need a more robust way to detect the supported aggregates than their name, or I guess some way for aggregates themselves to specify that they support this and somehow supply the extra logic. Perhaps I just said what Greg Stark already said, except not as well. +elog(ERROR, "Aggrege function %s is not supported", aggname); s/Aggrege/aggregate/ Of course it is not helpful to comment on typos at this early stage, it's just that this one appears many times in the test output :-) +static bool +isIvmColumn(const char *s) +{ +char pre[7]; + + strlcpy(pre, s, sizeof(pre)); +return (strcmp(pre, "__ivm_") == 0); +} What about strncmp(s, "__ivm_", 6) == 0? As for the question of how to reserve a namespace for system columns that won't clash with user columns, according to our manual the SQL standard doesn't allow $ in identifier names, and according to my copy of SQL92 "intermediate SQL" doesn't allow identifiers that end in an underscore. I don't know what the best answer is but we should probably decide on something based on the standard. As for how to make internal columns invisible to SELECT *, previously there have been discussions about doing that using a new flag in pg_attribute: https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com +"WITH t AS (" +" SELECT diff.__ivm_count__, (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" +", %s" +" FROM %s AS mv, %s AS diff WHERE (%s) = (%s)" +"), updt AS (" +" UPDATE %s AS mv SET __ivm_count__ = mv.__ivm_count__ - t.__ivm_count__" +", %s " +" FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt" +") DELETE FROM %s AS mv USING t WHERE mv.ctid = t.ctid AND for_dlt;", I fully understand that this is POC code, but I am curious about one thing. These queries that are executed by apply_delta() would need to be converted to C, or at least use reusable plans, right? Hmm, creating and dropping temporary tables every time is a clue that the ultimate form of this should be tuplestores and C code, I think, right? > Moreover, some regression test are added for aggregate functions support. > This is Hoshiai-san's work. Great. Next time you post a WIP patch, could you please fix this small compiler warning? describe.c: In function ‘describeOneTableDetails’: describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used uninitialized in this function [-Werror=maybe-uninitialized] if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm) ^ describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here } tableinfo; ^ Then our unofficial automatic CI system[1] will run these tests every day, which sometimes finds problems. 
[1] cfbot.cputube.org -- Thomas Munro https://enterprisedb.com
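For readers new to the thread, a rough sketch of the intended usage, as I understand the WIP patch; the syntax and hidden column name reflect my reading of the patch and may change, and base/grp/val are made-up names:

    -- the patch extends CREATE MATERIALIZED VIEW with an INCREMENTAL option;
    -- the view is then maintained by AFTER triggers using transition tables
    -- rather than requiring a full REFRESH:
    CREATE INCREMENTAL MATERIALIZED VIEW mv AS
      SELECT grp, sum(val) AS s FROM base GROUP BY grp;
    -- internally, a hidden __ivm_count__ column tracks per-group tuple
    -- multiplicities so apply_delta() can decide between UPDATE and DELETE.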
Re: Broken defenses against dropping a partitioning column
On Mon, Jul 8, 2019 at 4:11 AM Tom Lane wrote: > (Moved from pgsql-bugs thread at [1]) Thanks. > Consider > > regression=# create domain d1 as int; > CREATE DOMAIN > regression=# create table t1 (f1 d1) partition by range(f1); > CREATE TABLE > regression=# alter table t1 drop column f1; > ERROR: cannot drop column named in partition key > > So far so good, but that defense has more holes than a hunk of > Swiss cheese: Indeed. > regression=# drop domain d1 cascade; > psql: NOTICE: drop cascades to column f1 of table t1 > DROP DOMAIN > > Of course, the table is now utterly broken, e.g. > > regression=# \d t1 > psql: ERROR: cache lookup failed for type 0 Oops. > (More-likely variants of this include dropping an extension that > defines the type of a partitioning column, or dropping the schema > containing such a type.) Yeah. Actually, it's embarrassingly easy to fall through the holes. create type mytype as (a int); create table mytyptab (a mytype) partition by list (a); drop type mytype cascade; NOTICE: drop cascades to column a of table mytyptab DROP TYPE select * from mytyptab; ERROR: cache lookup failed for type 0 LINE 1: select * from mytyptab; ^ drop table mytyptab; ERROR: cache lookup failed for type 0 > The fix I was speculating about in the pgsql-bugs thread was to add > explicit pg_depend entries making the table's partitioning columns > internally dependent on the whole table (or maybe the other way around; > haven't experimented). That fix has a couple of problems though: > > 1. In the example, "drop domain d1 cascade" would automatically > cascade to the whole partitioned table, including child partitions > of course. This might leave a user sad, if a few terabytes of > valuable data went away; though one could argue that they'd better > have paid more attention to what the cascade cascaded to. > > 2. It doesn't fix anything for pre-existing tables in pre-v12 branches. > > > I thought of a different possible approach, which is to move the > "cannot drop column named in partition key" error check from > ATExecDropColumn(), where it is now, to RemoveAttributeById(). > That would be back-patchable, but the implication would be that > dropping anything that a partitioning column depends on would be > impossible, even with CASCADE; you'd have to manually drop the > partitioned table first. Good for data safety, but a horrible > violation of expectations, and likely of the SQL spec as well. I prefer this second solution as it works for both preexisting and new tables, although I also agree that it is not user-friendly. Would it help to document that one would be unable to drop anything that a partitioning column directly or indirectly depends on (type, domain, schema, extension, etc.)? > I'm not sure we could avoid order-of-traversal problems, either. > > Ideally, perhaps, a DROP CASCADE like this would not cascade to > the whole table but only to the table's partitioned-ness property, > leaving you with a non-partitioned table with most of its data > intact. Yeah, it would've been nice if the partitioned-ness property of a table could be dropped independently of the table. > It would take a lot of work to make that happen though, > and it certainly wouldn't be back-patchable, and I'm not really > sure it's worth it. Agreed that this sounds more like a new feature. Thanks, Amit
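One way to see the hole directly, using the composite-type repro above (the exact rows vary by version): the key column's only dependency points at its type, and nothing ties it internally to the table itself, which is what the first proposed fix would add:

    -- run before the DROP TYPE; expect a pg_depend entry from column a to
    -- mytype, but no internal entry binding the partition key column to
    -- mytyptab itself:
    SELECT classid::regclass, objid, objsubid,
           refclassid::regclass, refobjid, deptype
    FROM pg_depend
    WHERE objid = 'mytyptab'::regclass OR refobjid = 'mytyptab'::regclass;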