Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote: On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra wrote: On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote: > >Unrelated: if you or someone else you know that's more familiar with >the parallel code, I'd be interested in their looking at the patch at >some point, because I have a suspicion it might not be operating in ... So I've looked into that, and the reason seems fairly simple - when generating the Gather Merge paths, we only look at paths that are in partial_pathlist. See generate_gather_paths(). And we only have sequential + index paths in partial_pathlist, not incremental sort paths. IMHO we can do two things: 1) modify generate_gather_paths to also consider incremental sort for each sorted path, similarly to what create_ordered_paths does 2) modify build_index_paths to also generate an incremental sort path for each index path IMHO (1) is the right choice here, because it automatically does the trick for all other types of ordered paths, not just index scans. So, something like the attached patch, which gives me plans like this: ... But I'm not going to claim those are total fixes, it's the minimum I needed to do to make this particular type of plan work. Thanks for looking into this! I intended to apply this to my most recent version of the patch (just sent a few minutes ago), but when I apply it I noticed that the partition_aggregate regression tests have several of these failures: ERROR: could not find pathkey item to sort I haven't had time to look into the cause yet, so I decided to wait until the next patch revision. I wanted to investigate this today, but I can't reproduce it. How are you building and running the regression tests? Attached is a patch adding the incremental sort below gather merge, and also tweaking the costing. But that's mostly for better planning decisions; I don't get any pathkey errors even with the first patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3efc807164..d7bf33f64d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2719,6 +2719,8 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
 		GatherMergePath *path;
+		bool		is_sorted;
+		int			presorted_keys;

 		if (subpath->pathkeys == NIL)
 			continue;
@@ -2727,6 +2729,26 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
 										subpath->pathkeys, NULL, rowsp);
 		add_path(rel, &path->path);
+
+		/* consider incremental sort */
+		is_sorted = pathkeys_common_contained_in(root->sort_pathkeys,
+												 subpath->pathkeys,
+												 &presorted_keys);
+
+		if (!is_sorted && (presorted_keys > 0))
+		{
+			/* Also consider incremental sort. */
+			subpath = (Path *) create_incremental_sort_path(root,
+															rel,
+															subpath,
+															root->sort_pathkeys,
+															presorted_keys,
+															-1);
+
+			path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
+											subpath->pathkeys, NULL, rowsp);
+
+			add_path(rel, &path->path);
+		}
 	}
 }
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7f820e7351..c6aa17ba67 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1875,16 +1875,8 @@ cost_incremental_sort(Path *path,
 										limit_tuples);

 	/* If we have a LIMIT, adjust the number of groups we'll have to return. */
-	if (limit_tuples > 0 && limit_tuples < input_tuples)
-	{
-
(select query)/relation as first class citizen
Hello, Just a bit of background - I currently work as a full-time db developer, mostly with MS SQL Server, but I like Postgres a lot, especially because I really program in SQL all the time, and the type system / plpgsql language of Postgres seems to me more suitable for actual programming than T-SQL. Here's the problem - the current structure of the language doesn't allow decomposing the code well and splitting calculations and data into different modules. For example, suppose I have a table employee and a function like this (I'll skip the definition of return types for the sake of simplicity): create function departments_salary () returns table (...) as return $$ select department, sum(salary) as salary from employee group by department; $$; so that's fine, but what if I want to run this function on filtered employee? I can adjust the function of course, but it implies I can predict all possible filters I'm going to need in the future. And logically, the function itself doesn't have to run on the employee table; anything with department and salary columns will fit. So it'd be nice to be able to define the function like this: create function departments_salary(_employee query) returns table (...) as return $$ select department, sum(salary) as salary from _employee group by department; $$; and then call it like this: declare _employee query; ... _poor_employee = (select salary, department from employee where salary < 1000); select * from departments_salary(_poor_employee); And just to be clear, the query is not really invoked until the last line, so re-assigning the _employee variable is more like building a query expression. As far as I understand, the closest way to do this now is to put the data into a temporary table and use that temporary table inside of the function. It's not exactly the same of course, because with a temporary table the data has to be transferred into it up front, while in the proposed scheme it might be filtered later. So it's something like array vs generator in Python, or List vs IQueryable in C#. Adding this functionality would allow much better decomposition of the program's logic. What do you think about the idea itself? And if you think the idea is worthy, is it even possible to implement? Regards, Roman Pekar
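For concreteness, the temporary-table workaround mentioned above looks roughly like the sketch below, reusing the example's names. The aggregate is inlined rather than wrapped in a function, since the function would have to be written against the fixed temp-table name:

    -- materialize the filtered rows up front
    CREATE TEMPORARY TABLE _poor_employee AS
        SELECT salary, department
        FROM employee
        WHERE salary < 1000;

    -- the "function body" then runs against the temp table
    SELECT department, sum(salary) AS salary
    FROM _poor_employee
    GROUP BY department;

This illustrates the cost described above: the filtered rows are written out and read back, instead of being pipelined into the aggregate.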
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Jul 7, 2019 at 8:34 AM Tomas Vondra wrote: > > On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote: > >On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra > > wrote: > >> > >> On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote: > >> > > >> >Unrelated: if you or someone else you know that's more familiar with > >> >the parallel code, I'd be interested in their looking at the patch at > >> >some point, because I have a suspicion it might not be operating in > >... > >> So I've looked into that, and the reason seems fairly simple - when > >> generating the Gather Merge paths, we only look at paths that are in > >> partial_pathlist. See generate_gather_paths(). > >> > >> And we only have sequential + index paths in partial_pathlist, not > >> incremental sort paths. > >> > >> IMHO we can do two things: > >> > >> 1) modify generate_gather_paths to also consider incremental sort for > >> each sorted path, similarly to what create_ordered_paths does > >> > >> 2) modify build_index_paths to also generate an incremental sort path > >> for each index path > >> > >> IMHO (1) is the right choice here, because it automatically does the > >> trick for all other types of ordered paths, not just index scans. So, > >> something like the attached patch, which gives me plans like this: > >... > >> But I'm not going to claim those are total fixes, it's the minimum I > >> needed to do to make this particular type of plan work. > > > >Thanks for looking into this! > > > >I intended to apply this to my most recent version of the patch (just > >sent a few minutes ago), but when I apply it I noticed that the > >partition_aggregate regression tests have several of these failures: > > > >ERROR: could not find pathkey item to sort > > > >I haven't had time to look into the cause yet, so I decided to wait > >until the next patch revision. > > > > I wanted to investigate this today, but I can't reproduce it. How are > you building and running the regression tests? > > Attached is a patch adding the incremental sort below gather merge, and > also tweaking the costing. But that's mostly for better planning > decisions; I don't get any pathkey errors even with the first patch. On 12be7f7f997debe4e05e84b69c03ecf7051b1d79 (the last patch I sent, which is based on top of 5683b34956b4e8da9dccadc2e3a53b86104ebb33), I did this: patch -p1 < ~/Downloads/parallel-incremental-sort.patch (FWIW I configure with ./configure --prefix=$HOME/postgresql-test --enable-cassert --enable-debug --enable-depend CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer -DOPTIMIZER_DEBUG") make check-world And I get the attached regression failures. James Coleman regression.diffs Description: Binary data regression.out Description: Binary data
Re: (select query)/relation as first class citizen
Hi On Sun, Jul 7, 2019 at 14:54, Roman Pekar wrote: > Hello, > > Just a bit of background - I currently work as a full-time db developer, > mostly with MS SQL Server, but I like Postgres a lot, especially because I > really program in SQL all the time, and the type system / plpgsql language of > Postgres seems to me more suitable for actual programming than T-SQL. > > Here's the problem - the current structure of the language doesn't allow > decomposing the code well and splitting calculations and data into different > modules. > > For example, suppose I have a table employee and a function like > this (I'll skip the definition of return types for the sake of simplicity): > > create function departments_salary () > returns table (...) > as > return $$ > select department, sum(salary) as salary from employee group by > department; > $$; > > so that's fine, but what if I want to run this function on filtered > employee? I can adjust the function of course, but it implies I can predict > all possible filters I'm going to need in the future. > And logically, the function itself doesn't have to run on the employee table; > anything with department and salary columns will fit. > So it'd be nice to be able to define the function like this: > > create function departments_salary(_employee query) > returns table (...) > as > return $$ > select department, sum(salary) as salary from _employee group by > department; > $$; > > and then call it like this: > > declare _employee query; > ... > _poor_employee = (select salary, department from employee where salary < > 1000); > select * from departments_salary(_poor_employee); > > And just to be clear, the query is not really invoked until the last line, > so re-assigning the _employee variable is more like building a query expression. > > As far as I understand, the closest way to do this now is to put the data into > a temporary table and use that temporary table inside of the function. It's > not exactly the same of course, because with a temporary table the data > has to be transferred into it up front, while in the proposed scheme it might be filtered > later. So it's something like array vs generator in Python, or List vs > IQueryable in C#. > > Adding this functionality would allow much better decomposition of the > program's logic. > What do you think about the idea itself? And if you think the idea is worthy, > is it even possible to implement? > If we are talking about plpgsql, then I am afraid this idea could disallow plan caching - or significantly increase the cost of the plan cache. There are two possibilities for implementation - a) query like a cursor - unfortunately it effectively disables any optimization and brings ORM-style performance to procedures; this usage is a known performance antipattern; b) query like a view - it should not have performance problems with late optimization, but I am not sure about the possibility of reusing execution plans. Currently PLpgSQL is a compromise between performance and dynamism (PLpgSQL is really a static language). Your proposal allows much more dynamic behaviour, but performance can be much worse. Moreover - with this behaviour it is not possible to do static checks - so you would find bugs only at runtime. I am afraid about the performance of this solution. Regards Pavel > Regards, > Roman Pekar > > >
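Pavel's option (b) roughly corresponds to what a view already gives you today, minus the ability to pass the query around as a value; a minimal sketch using the thread's example names:

    CREATE VIEW poor_employee AS
        SELECT salary, department FROM employee WHERE salary < 1000;

    -- the aggregate composes with the filter at planning time
    SELECT department, sum(salary) AS salary
    FROM poor_employee
    GROUP BY department;

The difference from the proposal is that the composition here is fixed at CREATE VIEW time rather than built up dynamically in plpgsql variables.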
Re: Use relative rpath if possible
I wrote: > Peter Eisentraut writes: >> rebased patch attached, no functionality changes > I poked at this a bit, and soon found that it fails check-world, > because the isolationtester binary is built with an rpath that > only works if it's part of the temp install tree, which it ain't. Oh ... just thought of another issue in the same vein: what about modules being built out-of-tree with pgxs? (I'm imagining something with a libpq.so dependency, like postgres_fdw.) We probably really have to keep using the absolute rpath for that, because not only would such modules certainly fail "make check" with a relative rpath, but it's not really certain that they're intended to get installed into the same installdir as the core libraries. regards, tom lane
Re: (select query)/relation as first class citizen
Hi, Yes, I'm thinking about 'query like a view'; 'query like a cursor' is probably possible even now in MS SQL Server (not sure about PostgreSQL), but it requires a paradigm shift from set-based thinking to row-by-row thinking which I'd not want to do. I completely agree with your points about plan caching and static checks. With static checks, though, it might be possible if the query were defined as typed, so that all the column types are known in advance. In certain cases, having the possibility of much better decomposition might be more important than having a cached plan. Not sure how often these cases appear in general, but personally for me it'd be awesome to have this possibility. Regards, Roman Pekar On Sun, 7 Jul 2019 at 15:39, Pavel Stehule wrote: > Hi > > On Sun, Jul 7, 2019 at 14:54, Roman Pekar > wrote: > >> Hello, >> >> Just a bit of background - I currently work as a full-time db developer, >> mostly with MS SQL Server, but I like Postgres a lot, especially because I >> really program in SQL all the time, and the type system / plpgsql language of >> Postgres seems to me more suitable for actual programming than T-SQL. >> >> Here's the problem - the current structure of the language doesn't allow >> decomposing the code well and splitting calculations and data into different >> modules. >> >> For example, suppose I have a table employee and a function like >> this (I'll skip the definition of return types for the sake of simplicity): >> >> create function departments_salary () >> returns table (...) >> as >> return $$ >> select department, sum(salary) as salary from employee group by >> department; >> $$; >> >> so that's fine, but what if I want to run this function on filtered >> employee? I can adjust the function of course, but it implies I can predict >> all possible filters I'm going to need in the future. >> And logically, the function itself doesn't have to run on the employee table; >> anything with department and salary columns will fit. >> So it'd be nice to be able to define the function like this: >> >> create function departments_salary(_employee query) >> returns table (...) >> as >> return $$ >> select department, sum(salary) as salary from _employee group by >> department; >> $$; >> >> and then call it like this: >> >> declare _employee query; >> ... >> _poor_employee = (select salary, department from employee where salary < >> 1000); >> select * from departments_salary(_poor_employee); >> >> And just to be clear, the query is not really invoked until the last >> line, so re-assigning the _employee variable is more like building a query >> expression. >> >> As far as I understand, the closest way to do this now is to put the data into >> a temporary table and use that temporary table inside of the function. It's >> not exactly the same of course, because with a temporary table the data >> has to be transferred into it up front, while in the proposed scheme it might be filtered >> later. So it's something like array vs generator in Python, or List vs >> IQueryable in C#. >> >> Adding this functionality would allow much better decomposition of the >> program's logic. >> What do you think about the idea itself? And if you think the idea is worthy, >> is it even possible to implement? >> > > If we are talking about plpgsql, then I am afraid this idea could disallow plan > caching - or significantly increase the cost of the plan cache. > > There are two possibilities for implementation - a) query like a cursor - > unfortunately it effectively disables any optimization and brings ORM-style > performance to procedures; this usage is a known performance antipattern; b) > query like a view - it should not have performance problems with late > optimization, but I am not sure about the possibility of reusing execution plans. > > Currently PLpgSQL is a compromise between performance and dynamism (PLpgSQL > is really a static language). Your proposal allows much more dynamic > behaviour, but performance can be much worse. > > Moreover - with this behaviour it is not possible to do static checks - so you > would find bugs only at runtime. I am afraid about the performance of this > solution. > > Regards > > Pavel > > > >> Regards, >> Roman Pekar >> >> >>
Re: Switching PL/Python to Python 3 by default in PostgreSQL 12
On 2019-07-07 00:34, Steven Pousty wrote: > Why would it be a 13 or later issue? Because PostgreSQL 12 is feature frozen and in beta, and this issue is not a regression. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Switching PL/Python to Python 3 by default in PostgreSQL 12
Peter Eisentraut writes: > On 2019-07-07 00:34, Steven Pousty wrote: >> Why would it be a 13 or later issue? > Because PostgreSQL 12 is feature frozen and in beta, and this issue is > not a regression. More to the point: it does not seem to me that we should change what "plpythonu" means until Python 2 is effectively extinct in the wild. Which is surely some years away yet. If we change it sooner than that, the number of people complaining that we broke perfectly good installations will vastly outweigh the number of people who are happy because we saved them one keystroke per function definition. As a possibly relevant comparison, I get the impression that most packagers of Python are removing the versionless "python" executable name and putting *nothing* in its place. You have to write python2 or python3 nowadays. Individuals might still be setting up symlinks so that "python" does what they want, but it's not happening at the packaging/distro level. (This comparison suggests that maybe what we should be thinking about is a way to make it easier to change what "plpythonu" means at the local-opt-in level.) regards, tom lane
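For illustration, the explicit, version-pinned spelling that already works today looks like this (assuming the server was built with Python 3 support; the function itself is just an example):

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    CREATE FUNCTION pymax(a integer, b integer) RETURNS integer AS $$
        # runs under Python 3 regardless of what "plpythonu" defaults to
        return max(a, b)
    $$ LANGUAGE plpython3u;

A local-opt-in mechanism along the lines Tom suggests would presumably let "plpythonu" resolve to this language per database or per installation.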
Re: [RFC] Removing "magic" oids
On Tue, Nov 20, 2018 at 01:20:04AM -0800, Andres Freund wrote: > On 2018-11-14 21:02:41 -0800, Andres Freund wrote: > > On 2018-11-15 04:57:28 +0000, Noah Misch wrote: > > > On Wed, Nov 14, 2018 at 12:01:52AM -0800, Andres Freund wrote: > > > > - one pgbench test tested concurrent insertions into a table with > > > > oids, as some sort of stress test for lwlocks and spinlocks. I *think* > > > > this doesn't really have to be a system oid column, and this was just > > > > because that's how we triggered a bug on some machine. Noah, do I get > > > > this right? > > > > > > The point of the test is to exercise OidGenLock by issuing many parallel > > > GetNewOidWithIndex() and verifying absence of duplicates. There's nothing > > > special about OidGenLock, but it is important to use an operation that > > > takes a > > > particular LWLock many times, quickly. If the test query spends too much > > > time > > > on things other than taking locks, it will catch locking races too rarely. > > > > Sequences ought to do that, too. And if it's borked, we'd hopefully see > > unique violations. But it's definitely not a 1:1 replacement. > I've tested this on ppc. Neither the old version nor the new version > stress test spinlocks sufficiently to error out with weakened spinlocks > (not that surprising, there are no spinlocks in any hot path of either > workload). Both versions very reliably trigger on weakened lwlocks. So I > think we're comparatively good on that front. I tested this on xlc, the compiler that motivated the OID test, and the v12+ version of the test didn't catch the bug[1] with xlc 13.1.3. CREATE TYPE ... AS ENUM generates an OID for each label, so the attached patch makes the v12+ test have locking behavior similar to its v11 ancestor. [1] https://postgr.es/m/flat/a72cfcb0-37d0-de2f-b3ec-f38ad8d6a...@postgrespro.ru

diff --git a/src/bin/pgbench/t/001_pgbench_with_server.pl b/src/bin/pgbench/t/001_pgbench_with_server.pl
index dc2c72f..3b097a9 100644
--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -58,27 +58,20 @@ sub pgbench
 	return;
 }

-# Test concurrent insertion into table with serial column. This
-# indirectly exercises LWLock and spinlock concurrency. This test
-# makes a 5-MiB table.
-
-$node->safe_psql('postgres',
-	'CREATE UNLOGGED TABLE insert_tbl (id serial primary key); ');
-
+# Test concurrent OID generation via pg_enum_oid_index. This indirectly
+# exercises LWLock and spinlock concurrency.
+my $labels = join ',', map { "'l$_'" } 1 .. 1000;
 pgbench(
 	'--no-vacuum --client=5 --protocol=prepared --transactions=25',
 	0,
 	[qr{processed: 125/125}],
 	[qr{^$}],
-	'concurrent insert workload',
+	'concurrent OID generation',
 	{
 		'001_pgbench_concurrent_insert' =>
-			'INSERT INTO insert_tbl SELECT FROM generate_series(1,1000);'
+			"CREATE TYPE pg_temp.e AS ENUM ($labels); DROP TYPE pg_temp.e;"
 	});

-# cleanup
-$node->safe_psql('postgres', 'DROP TABLE insert_tbl;');
-
 # Trigger various connection errors
 pgbench(
 	'no-such-database',
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On 2019-07-05 22:24, Tomas Vondra wrote: > What if the granular encryption (not the "whole cluster with a single > key") case does not encrypt whole blocks, but just tuple data? Would > that allow at least the most critical WAL use cases (recovery, physical > replication) to work without having to know all the encryption keys? Finding the exact point where you divide up sensitive and non-sensitive data would be difficult. For example, say, you encrypt the tuple payload but not the tuple header, so that vacuum would still work. Then, someone who has access to the raw data directory could infer, in combination with commit timestamps for example, that on Friday between 5pm and 6pm, 1000 records were updated, 500 were inserted, and 200 were deleted, that the table has about this size, and that this happens every Friday, and so on. That seems way too much information to reveal for an allegedly encrypted data directory. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Broken defenses against dropping a partitioning column
(Moved from pgsql-bugs thread at [1]) Consider regression=# create domain d1 as int; CREATE DOMAIN regression=# create table t1 (f1 d1) partition by range(f1); CREATE TABLE regression=# alter table t1 drop column f1; ERROR: cannot drop column named in partition key So far so good, but that defense has more holes than a hunk of Swiss cheese: regression=# drop domain d1 cascade; psql: NOTICE: drop cascades to column f1 of table t1 DROP DOMAIN Of course, the table is now utterly broken, e.g. regression=# \d t1 psql: ERROR: cache lookup failed for type 0 (More-likely variants of this include dropping an extension that defines the type of a partitioning column, or dropping the schema containing such a type.) The fix I was speculating about in the pgsql-bugs thread was to add explicit pg_depend entries making the table's partitioning columns internally dependent on the whole table (or maybe the other way around; haven't experimented). That fix has a couple of problems though: 1. In the example, "drop domain d1 cascade" would automatically cascade to the whole partitioned table, including child partitions of course. This might leave a user sad, if a few terabytes of valuable data went away; though one could argue that they'd better have paid more attention to what the cascade cascaded to. 2. It doesn't fix anything for pre-existing tables in pre-v12 branches. I thought of a different possible approach, which is to move the "cannot drop column named in partition key" error check from ATExecDropColumn(), where it is now, to RemoveAttributeById(). That would be back-patchable, but the implication would be that dropping anything that a partitioning column depends on would be impossible, even with CASCADE; you'd have to manually drop the partitioned table first. Good for data safety, but a horrible violation of expectations, and likely of the SQL spec as well. I'm not sure we could avoid order-of-traversal problems, either. Ideally, perhaps, a DROP CASCADE like this would not cascade to the whole table but only to the table's partitioned-ness property, leaving you with a non-partitioned table with most of its data intact. It would take a lot of work to make that happen though, and it certainly wouldn't be back-patchable, and I'm not really sure it's worth it. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CA%2Bu7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg%40mail.gmail.com
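For readers wanting to see the dependency machinery involved: the cascade above is driven by the normal ('n') pg_depend entry linking the column to its type, and the fix under discussion would add an internal entry alongside it. An illustrative query using the example objects:

    SELECT classid::regclass, objid::regclass, objsubid,
           refclassid::regclass, refobjid, deptype
    FROM pg_depend
    WHERE objid = 't1'::regclass
      AND objsubid > 0;   -- column-level dependencies of t1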
Re: Switching PL/Python to Python 3 by default in PostgreSQL 12
The point of the links I sent from the Python community is that they wanted Python 2 extinct in the wild as of Jan 1 next year. They are never fixing it, even for a security vulnerability. It seems to me we roll out breaking changes with major versions. So yes, if the user chooses to upgrade to 12 and they haven't migrated their code to Python 3, it might not work. I don't have a good answer to the rule of no changes except regressions. I do hope, given how much our users expect us to be secure, that we weigh the consequences of making our default Python a version which is dead to the community a month or so after PostgreSQL 12's release. We can certainly take the stance of leaving the Python version be, but it seems that we should then come up with a plan if there is a security vulnerability found in Python 2 after Jan 1st 2020. If Python 2 wasn't our *default* choice then I would be much more comfortable letting this just pass without mention. All that aside, I think allowing the admin to set the default version of plpythonu is an excellent idea. Thanks Steve On Sun, Jul 7, 2019, 8:26 AM Tom Lane wrote: > Peter Eisentraut writes: > > On 2019-07-07 00:34, Steven Pousty wrote: > >> Why would it be a 13 or later issue? > > > Because PostgreSQL 12 is feature frozen and in beta, and this issue is > > not a regression. > > More to the point: it does not seem to me that we should change what > "plpythonu" means until Python 2 is effectively extinct in the wild. > Which is surely some years away yet. If we change it sooner than > that, the number of people complaining that we broke perfectly good > installations will vastly outweigh the number of people who are > happy because we saved them one keystroke per function definition. > > As a possibly relevant comparison, I get the impression that most > packagers of Python are removing the versionless "python" executable > name and putting *nothing* in its place. You have to write python2 > or python3 nowadays. Individuals might still be setting up symlinks > so that "python" does what they want, but it's not happening at the > packaging/distro level. > > (This comparison suggests that maybe what we should be thinking > about is a way to make it easier to change what "plpythonu" means > at the local-opt-in level.) > > regards, tom lane >
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Jul 07, 2019 at 09:01:43AM -0400, James Coleman wrote: On Sun, Jul 7, 2019 at 8:34 AM Tomas Vondra wrote: On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote: >On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra > wrote: >> >> On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote: >> > >> >Unrelated: if you or someone else you know that's more familiar with >> >the parallel code, I'd be interested in their looking at the patch at >> >some point, because I have a suspicion it might not be operating in >... >> So I've looked into that, and the reason seems fairly simple - when >> generating the Gather Merge paths, we only look at paths that are in >> partial_pathlist. See generate_gather_paths(). >> >> And we only have sequential + index paths in partial_pathlist, not >> incremental sort paths. >> >> IMHO we can do two things: >> >> 1) modify generate_gather_paths to also consider incremental sort for >> each sorted path, similarly to what create_ordered_paths does >> >> 2) modify build_index_paths to also generate an incremental sort path >> for each index path >> >> IMHO (1) is the right choice here, because it automatically does the >> trick for all other types of ordered paths, not just index scans. So, >> something like the attached patch, which gives me plans like this: >... >> But I'm not going to claim those are total fixes, it's the minimum I >> needed to do to make this particular type of plan work. > >Thanks for looking into this! > >I intended to apply this to my most recent version of the patch (just >sent a few minutes ago), but when I apply it I noticed that the >partition_aggregate regression tests have several of these failures: > >ERROR: could not find pathkey item to sort > >I haven't had time to look into the cause yet, so I decided to wait >until the next patch revision. > I wanted to investigate this today, but I can't reproduce it. How are you building and running the regression tests? Attached is a patch adding the incremental sort below gather merge, and also tweaking the costing. But that's mostly for better planning decisions; I don't get any pathkey errors even with the first patch. On 12be7f7f997debe4e05e84b69c03ecf7051b1d79 (the last patch I sent, which is based on top of 5683b34956b4e8da9dccadc2e3a53b86104ebb33), I did this: patch -p1 < ~/Downloads/parallel-incremental-sort.patch (FWIW I configure with ./configure --prefix=$HOME/postgresql-test --enable-cassert --enable-debug --enable-depend CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer -DOPTIMIZER_DEBUG") make check-world And I get the attached regression failures. OK, thanks. Apparently it's the costing changes that make it go away; if I try just the patch that tweaks generate_gather_paths() I see the same failures. The failure happens during plan construction, so I think the costing changes simply mean the path with incremental sort ends up not being the cheapest one (for the problematic queries), but that's just pure luck - it's definitely an issue that needs fixing.
That error message is triggered in two places in createplan.c, and after changing them to Assert(false) I get a core dump with this backtrace:

#0  0x702b3328857f in raise () from /lib64/libc.so.6
#1  0x702b33272895 in abort () from /lib64/libc.so.6
#2  0x00a59a9d in ExceptionalCondition (conditionName=0xc52e84 "!(0)", errorType=0xc51f96 "FailedAssertion", fileName=0xc51fe6 "createplan.c", lineNumber=5937) at assert.c:54
#3  0x007d4ab5 in prepare_sort_from_pathkeys (lefttree=0x2bbbce0, pathkeys=0x2b7a130, relids=0x0, reqColIdx=0x0, adjust_tlist_in_place=false, p_numsortkeys=0x7ffe1abcfd6c, p_sortColIdx=0x7ffe1abcfd60, p_sortOperators=0x7ffe1abcfd58, p_collations=0x7ffe1abcfd50, p_nullsFirst=0x7ffe1abcfd48) at createplan.c:5937
#4  0x007d4e7f in make_incrementalsort_from_pathkeys (lefttree=0x2bbbce0, pathkeys=0x2b7a130, relids=0x0, presortedCols=1) at createplan.c:6101
#5  0x007cdd3f in create_incrementalsort_plan (root=0x2b787c0, best_path=0x2bb92b0, flags=1) at createplan.c:2019
#6  0x007cb7ad in create_plan_recurse (root=0x2b787c0, best_path=0x2bb92b0, flags=1) at createplan.c:469
#7  0x007cd778 in create_gather_merge_plan (root=0x2b787c0, best_path=0x2bb94a0) at createplan.c:1764
#8  0x007cb8fb in create_plan_recurse (root=0x2b787c0, best_path=0x2bb94a0, flags=4) at createplan.c:516
#9  0x007cdf10 in create_agg_plan (root=0x2b787c0, best_path=0x2bb9b28) at createplan.c:2115
#10 0x007cb834 in create_plan_recurse (root=0x2b787c0, best_path=0x2bb9b28, flags=3) at createplan.c:484
#11 0x007cdc16 in create_sort_plan (root=0x2b787c0, best_path=0x2bba1e8, flags=1) at createplan.c:1986
#12 0x007cb78e in create_plan_recurse (root=0x2b787c0, best_path=0x2bba1e8, flags=1) at createplan.c:464
#13 0x007cb4ae in create_plan (root=0x2b787c0, best_path=0x2bba1e8) at createplan.c:330
#14 0x007db63c in standard
Re: SQL/JSON path issues/questions
On Thu, Jul 4, 2019 at 4:38 PM Liudmila Mantrova wrote: > Thank you! > > I think we can make this sentence even shorter, the fix is attached: > > "To refer to a JSON element stored at a lower nesting level, add one or > more accessor operators after @." Thanks, looks good to me. The attached revision of the patch contains a commit message. I'm going to commit this barring objections. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0004-clarify-jsonpath-docs-5.patch Description: Binary data
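For readers following along without the docs open, the sentence being reworded describes the @ variable inside jsonpath filter expressions; a small illustrative example (the JSON values are invented for the example):

    -- @ is the value currently being filtered; @.price drills one level down
    SELECT jsonb_path_query(
        '{"items": [{"price": 5}, {"price": 15}]}',
        '$.items[*] ? (@.price > 10)');
    -- returns {"price": 15}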
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
On Sun, Jun 30, 2019 at 7:30 PM Goel, Dhruv wrote: > > On Jun 10, 2019, at 1:20 PM, Goel, Dhruv wrote: > >> On Jun 9, 2019, at 5:33 PM, Tom Lane wrote: > >> Andres Freund writes: > >>> On June 9, 2019 8:36:37 AM PDT, Tom Lane wrote: > I think you are mistaken that doing transactional updates in pg_index > is OK. If memory serves, we rely on xmin of the pg_index row for > purposes such as detecting whether a concurrently-created index is safe > to use yet. > > > > I took a deeper look regarding this use case but was unable to find more > > evidence. As part of this patch, we essentially make a concurrently-created > > index safe to use only if the transaction started after the xmin of Phase 3. > > Even today concurrent indexes cannot be used for transactions before this > > xmin because of the wait (which I am trying to get rid of in this patch); > > is there any other denial of service you are talking about? Both of the other > > states, indislive and indisready, can be transactional updates as far as I > > understand. Is there anything more I am missing here? > > I did some more concurrency testing here through some python scripts which > compare the end state of the concurrently created indexes. I also back-ported > this patch to PG 9.6 and ran some custom concurrency tests (Inserts, Deletes, > and Create Index Concurrently) which seem to succeed. The intermediate states > unfortunately are not easy to test in an automated manner, but to be fair > concurrent indexes could never be used for older transactions. Do you have > more inputs/ideas on this patch? I noticed that check-world passed several times with this patch applied, but the most recent CI run failed in multiple-cic: +error in steps s2i s1i: ERROR: cache lookup failed for index 26303 https://travis-ci.org/postgresql-cfbot/postgresql/builds/555472214 -- Thomas Munro https://enterprisedb.com
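For context, the pg_index state (and the row xmin the discussion hinges on) can be inspected directly; an illustrative query, assuming an index named idx:

    SELECT xmin, indisvalid, indisready, indislive
    FROM pg_index
    WHERE indexrelid = 'idx'::regclass;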
Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
On Wed, Jun 12, 2019 at 1:21 AM Daniel Gustafsson wrote: > The patch is still rough around the edges (TODO’s left to mark some areas), > but > I prefer to get some feedback early rather than working too far in potentially > the wrong direction, so parking this in the CF for now. Hi Daniel, Given the above disclaimers the following may be entirely expected, but just in case you weren't aware: t/010_logical_decoding_timelines.pl fails with this patch applied. https://travis-ci.org/postgresql-cfbot/postgresql/builds/555205042 -- Thomas Munro https://enterprisedb.com
Re: dropdb --force
On Thu, Jun 27, 2019 at 7:15 AM Pavel Stehule wrote: > fixed Hi Pavel, FYI t/050_dropdb.pl fails consistently with this patch applied: https://travis-ci.org/postgresql-cfbot/postgresql/builds/555234838 -- Thomas Munro https://enterprisedb.com
Re: Avoiding deadlock errors in CREATE INDEX CONCURRENTLY
On Mon, Jul 8, 2019 at 9:51 AM Thomas Munro wrote: > I noticed that check-world passed several times with this patch > applied, but the most recent CI run failed in multiple-cic: > > +error in steps s2i s1i: ERROR: cache lookup failed for index 26303 > > https://travis-ci.org/postgresql-cfbot/postgresql/builds/555472214 And in another run, this time on Windows, create_index failed: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46455 -- Thomas Munro https://enterprisedb.com
Re: Control your disk usage in PG: Introduction to Disk Quota Extension
On Mon, Feb 18, 2019 at 7:39 PM Hubert Zhang wrote: > Based on the assumption we use smgr as hook position, hook API option1 or > option2 which is better? > Or we could find some balanced API between option1 and option2? > > Again comments on other better hook positions are also appreciated! Hi Hubert, The July Commitfest is now running, and this entry is in "needs review" state. Could you please post a rebased patch? I have questions about how disk quotas should work and I think we'll probably eventually want more hooks than these, but simply adding these hooks so extensions can do whatever they want doesn't seem very risky for core. I think it's highly likely that the hook signatures will have to change in future releases too, but that seems OK for such detailed internal hooks. As for your question, my first reaction was that I preferred your option 1, because SMgrRelation seems quite private and there are no existing examples of that object being exposed to extensions. But on reflection, other callbacks don't take such a mollycoddling approach. So my vote is for option 2 "just pass all the arguments to the callback", which I understand to be the approach of the patch you have posted.

+	if (smgrcreate_hook)
+	{
+		(*smgrcreate_hook)(reln, forknum, isRedo);
+	}

Usually we don't use curlies for single line if branches. -- Thomas Munro https://enterprisedb.com
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development
On 06/07/2019 00:06, Tomas Vondra wrote: > First of all, it's pretty difficult to follow the discussion when it's > not clear what's the original message and what's the response. E-mail > clients generally indent the original message with '>' or something like > that, but your client does not do that (which is pretty silly). And > copying the message at the top does not really help. Please do something > about that. I would like to apologise. I did not realize that my client was doing that and now I have changed the client. I hope it's fine now. > > On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote: >>> I might be missing something, but it seems like you intend to replace >>> the SQL grammar we have with something else. It's not clear to me what >>> would be the point of doing that, and it definitely looks like a huge >>> amount of work - e.g. we don't have any support for switching between >>> two distinct grammars the way you envision, and just that alone seems >>> like a multi-year project. And if you don't have that capability, all >>> external tools kinda stop working. Good luck with running such database. >> >> I was considering having two distinct grammars as an option - thanks >> for indicating the effort involved. At the end of the day I want both >> my DSL and the PostgreSQL grammars to coexist. Is extending >> PostgreSQL's grammar with my own through the PostgreSQL extension >> infrastructure worth consideration or is it also difficult to develop? >> Could you suggest any reference material on this topic? >> > > Well, I'm not an expert in that area, but we currently don't have any > infrastructure to support that. It's a topic that was discussed in the > past (perhaps you can find some references in the archives) and it > generally boils down to: > > 1) We're using bison as parser generator. > 2) Bison does not allow adding rules on the fly. > > So you have to modify the in-core src/backend/parser/gram.y and rebuild > postgres. See for example this discussion: > > https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com > > > > When two of the smartest people on the list say it's a hard problem, it > probably is. Particularly for someone who does not know the internals. You are right. Thanks for bringing it to my attention! I didn't design my language for interaction with triggers and whatnot, but I think that it would be very interesting to support those as well, so looking at CREATE LANGUAGE functionality is actually exciting and appropriate once I make some changes in design. Thanks again for this point! I hope this is not off topic, but I was wondering if you know what the intrinsic differences between the HANDLER and INLINE parameters of CREATE LANGUAGE are? I know that they are functions which are invoked at different instances of time (e.g. one is for handling anonymous code blocks), but at the end of the day they seem to have the same purpose? >>> What I'd look at first is implementing the grammar as a procedural >>> language (think PL/pgSQL, pl/perl etc.) implementing whatever you >>> expect from your DSL. And it's not like you'd have to wrap everything >>> in functions, because we have anonymous DO blocks. >> >> Thanks for pointing out this direction! I think I will indeed adopt >> this approach especially if directly extending PostgreSQL grammar would >> be difficult. > > Well, it's the only way to deal with it at the moment. > > > regards >
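To make the HANDLER/INLINE distinction concrete, a hypothetical registration might look like this (all names here are placeholders; the C functions would live in a shared library):

    -- the call handler executes every function written in the language
    -- and must produce that function's result
    CREATE FUNCTION mydsl_call_handler() RETURNS language_handler
        AS '$libdir/mydsl' LANGUAGE C;

    -- the inline handler executes only anonymous DO blocks; it receives
    -- the block (an InlineCodeBlock, passed as internal) and returns nothing
    CREATE FUNCTION mydsl_inline_handler(internal) RETURNS void
        AS '$libdir/mydsl' LANGUAGE C;

    CREATE LANGUAGE mydsl
        HANDLER mydsl_call_handler
        INLINE mydsl_inline_handler;

So while both execute code in the language, the call handler works with a function's declared signature (arguments, return type, the cached compiled body), whereas the inline handler gets a one-shot block with no parameters and no result.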
Re: [HACKERS] Cached plans and statement generalization
On Tue, Jul 2, 2019 at 3:29 AM Konstantin Knizhnik wrote: > Attached please find rebased version of the patch. > Also this version can be found in autoprepare branch of this repository > https://github.com/postgrespro/postgresql.builtin_pool.git > on github. Thanks. I haven't looked at the code but this seems like interesting work and I hope it will get some review. I guess this is bound to use a lot of memory. I guess we'd eventually want to figure out how to share the autoprepared plan cache between sessions, which is obviously a whole can of worms. A couple of trivial comments with my CF manager hat on: 1. Can you please fix the documentation? It doesn't build. Obviously reviewing the goals, design and implementation are more important than the documentation at this point, but if that is fixed then the CF bot will be able to run check-world every day and we might learn something about the code. 2. Accidental editor junk included: src/include/catalog/pg_proc.dat.~1~ -- Thomas Munro https://enterprisedb.com
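For readers less familiar with the feature being automated: the patch aims to give the effect of explicit prepared statements without the client asking for them. The manual equivalent looks like this (table and values are illustrative):

    PREPARE get_user (integer) AS
        SELECT * FROM users WHERE id = $1;

    EXECUTE get_user(42);   -- repeated executions can reuse a cached plan

Autoprepare would do this transparently for query texts that arrive repeatedly, which is where the memory-use question above comes from.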
Re: Bloom index cost model seems to be wrong
On Fri, Mar 1, 2019 at 7:11 AM Jeff Janes wrote: > I'm adding it to the commitfest targeting v13. I'm more interested in > feedback on the conceptual issues rather than stylistic ones, as I would > probably merge the two functions together before proposing something to > actually be committed. From the trivialities department, this patch shows up as a CI failure with -Werror, because there is no declaration for genericcostestimate2(). I realise that's just a temporary name in a WIP patch anyway so this isn't useful feedback, but for the benefit of anyone going through CI failures in bulk looking for things to complain about: this isn't a real one. -- Thomas Munro https://enterprisedb.com
Re: Built-in connection pooler
On Tue, Jul 2, 2019 at 3:11 AM Konstantin Knizhnik wrote: > On 01.07.2019 12:57, Thomas Munro wrote: > > Interesting work. No longer applies -- please rebase. > > > Rebased version of the patch is attached. > Also this version of the built-in proxy can be found in the conn_proxy > branch of https://github.com/postgrespro/postgresql.builtin_pool.git Thanks Konstantin. I haven't looked at the code, but I can't help noticing that this CF entry and the autoprepare one are both features that come up again and again on feature request lists I've seen. That's very cool. They also both need architectural-level review. With my Commitfest manager hat on: reviewing other stuff would help with that; if you're looking for something of similar complexity and also the same level of everyone-knows-we-need-to-fix-this-!@#$-we-just-don't-know-exactly-how-yet factor, I hope you get time to provide some more feedback on Takeshi Ideriha's work on shared caches, which doesn't seem a million miles from some of the things you're working on. Could you please fix these compiler warnings so we can see this running check-world on CI? https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46324 https://travis-ci.org/postgresql-cfbot/postgresql/builds/555180678 -- Thomas Munro https://enterprisedb.com
Re: Duplicated LSN in ReorderBuffer
On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin wrote: > Attached is a simple patch that uses subxid instead of top-level xid > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but > i'm not sure that this is a valid change. Can someone please verify it > or maybe suggest a better solution for the issue? Hello Ildar, I hope someone more familiar with this code than me can comment, but while going through the Commitfest CI results I saw this segfault with your patch: https://travis-ci.org/postgresql-cfbot/postgresql/builds/555184304 At a glance, HistoricSnapshotGetTupleCids() returned NULL in HeapTupleSatisfiesHistoricMVCC(), so ResolveCminCmaxDuringDecoding() blew up. -- Thomas Munro https://enterprisedb.com
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
On Sat, Jul 6, 2019 at 2:42 AM Antonin Houska wrote: > I've reworked the way key is passed to postmaster (v04-0003-...) so it's > easier to call the encryption key command interactively, adjusted pg_upgrade > so that it preserves database, tablespace and relfilenode OIDs (v04-0014-...), > reorganized the code a bit and split the code into more diffs. Hi Antonin, Some robotic feedback: 1. On Linux, an assertion failed: ExceptionalCondition (conditionName=conditionName@entry=0x973891 "!(decrypt_p == p)", errorType=errorType@entry=0x928d7d "FailedAssertion", fileName=fileName@entry=0x973827 "xlogutils.c", lineNumber=lineNumber@entry=815) at assert.c:54 See full stack here: https://travis-ci.org/postgresql-cfbot/postgresql/builds/50833 2. On Windows the build failed: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46469 -- Thomas Munro https://enterprisedb.com
Re: Optimze usage of immutable functions as relation
On Thu, Mar 21, 2019 at 5:58 AM Alexander Kuzmenkov wrote: > On 11/16/18 22:03, Tom Lane wrote: > > A possible fix for this is to do eval_const_expressions() on > > function RTE expressions at this stage (and then not need to > > do it later), and then pull up only when we find that the > > RTE expression has been reduced to a single Const. > > > Attached is a patch that does this, and transforms an RTE_FUNCTION that was > reduced to a single Const into an RTE_RESULT. Hi Alexander, The July Commitfest is here. Could we please have a rebase of this patch? Thanks, -- Thomas Munro https://enterprisedb.com
Re: FETCH FIRST clause PERCENT option
On Thu, Jun 27, 2019 at 9:06 PM Surafel Temesgen wrote: > The attached patch include the fix for all the comment given Hi Surafel, There's a call to adjust_limit_rows_costs() hiding under contrib/postgres_fdw, so this fails check-world. -- Thomas Munro https://enterprisedb.com
Re: [RFC] [PATCH] Flexible "partition pruning" hook
On Sat, Apr 6, 2019 at 3:06 PM Andres Freund wrote: > I've moved this to the next CF, and marked it as targeting v13. Hi Mike, Commitfest 1 for PostgreSQL 13 is here. I was just going through the automated build results for the 'fest and noticed that your patch causes a segfault in the regression tests (possibly due to other changes that have been made in master since February). You can see the complete backtrace on the second link below, but it looks like this is happening every time, so hopefully not hard to track down locally. https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46412 https://travis-ci.org/postgresql-cfbot/postgresql/builds/555380617 Thanks, -- Thomas Munro https://enterprisedb.com
Re: Another way to fix inherited UPDATE/DELETE
On Wed, Jul 3, 2019 at 10:50 PM Tom Lane wrote: > > Amit Langote writes: > > On Wed, Feb 20, 2019 at 6:49 AM Tom Lane wrote: > >> Obviously this'd be a major rewrite with no chance of making it into v12, > >> but it doesn't sound too big to get done during v13. > > > Are you planning to work on this? > > It's on my list, but so are a lot of other things. If you'd like to > work on it, feel free. Thanks for the reply. Let me see if I can get something done for the September CF. Regards, Amit
Re: Run-time pruning for ModifyTable
Kato-san, On Thu, Jul 4, 2019 at 1:40 PM Kato, Sho wrote: > > If I understand the details of [1] correctly, ModifyTable will no longer > > have N subplans for N result relations as there are today. So, it doesn't > > make sense for ModifyTable to contain PartitionedRelPruneInfos and for > > ExecInitModifyTable/ExecModifyTable > > to have to perform initial and execution-time pruning, respectively. > > Does this mean that the generic plan will not have N subplans for N result > relations? > I thought [1] would make creating generic plans faster, but is this correct? Yeah, making a generic plan for UPDATE of inheritance tables will certainly become faster, because we will no longer plan the same query N times for N child tables. There will still be N result relations but only one sub-plan to fetch the rows from. Also, planning will still cost O(N), but with a much smaller constant factor. By the way, let's keep any further discussion on this particular topic in the other thread. Thanks, Amit
Re: FETCH FIRST clause PERCENT option
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

The basic functionality works as I expect. In the following example I would have guessed it would return 4 rows instead of 5. I don't mind that it uses ceil here, but I think that deserves a mention in the documentation.

CREATE TABLE r100 (id INT);
INSERT INTO r100 SELECT generate_series(1, 100);
SELECT * FROM r100 FETCH FIRST 4.01 PERCENT ROWS ONLY;

 id
----
  1
  2
  3
  4
  5
(5 rows)

There's a missing space between the period and the following sentence in src\backend\executor\nodeLimit.c: "previous time we got a different result.In PERCENTAGE option there are"

There's a missing space, and the beginning "w" should be capitalized, in doc\src\sgml\ref\select.sgml: "with PERCENT count specifies the maximum number of rows to return in percentage.ROW"

Another missing space after the period: "previous time we got a different result.In PERCENTAGE option there are"

Ryan Lambert

The new status of this patch is: Waiting on Author
Re: Improve search for missing parent downlinks in amcheck
On Wed, May 1, 2019 at 12:58 PM Peter Geoghegan wrote: > On Sun, Apr 28, 2019 at 10:15 AM Alexander Korotkov > wrote: > > I think this definitely not bug fix. Bloom filter was designed to be > > lossy, no way blaming it for that :) > > I will think about a simple fix, but after the upcoming point release. > There is no hurry. A bureaucratic question: What should the status be for this CF entry? -- Thomas Munro https://enterprisedb.com
Add test case for sslinfo
Hi Hackers, I see there is no test case for sslinfo. I have added a test case for it in my project. Do you mind if I apply this test case to PostgreSQL? Best regards, Hao Wu 0001-Add-certificates-keys-and-test-cases-for-contrib-ssl.patch Description: Binary data
Re: warning to publication created and wal_level is not set to logical
On Wed, Mar 27, 2019 at 1:36 AM Lucas Viecelli wrote: >> Oh, OK, then this seems like it's basically covered already. I think >> the original suggestion to add a WARNING during CREATE PUBLICATION >> isn't unreasonable. But we don't need to do more than that (and it >> shouldn't be higher than WARNING). > > Okay, I think it will improve understanding of new users. > > Since everything is fine, thank you all for the comments Hi Lucas, The July Commitfest has started. This patch is in "Needs review" status, but it doesn't apply. If I read the above discussion correctly, it seems there is agreement that a warning here is a good idea and that we should commit this patch. Could you please post a rebased patch? A note on the message:

WARNING: PUBLICATION created but wal_level is not set to logical, you need to change it before creating any SUBSCRIPTION

I wonder if it would be more typical project style to put the clue on what to do into an "errhint" message, something like this:

WARNING: insufficient wal_level to publish logical changes
HINT: Set wal_level to logical before creating subscriptions.

-- Thomas Munro https://enterprisedb.com
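For reference, the remedy the proposed hint points at is a one-line configuration change (wal_level only takes effect after a server restart):

    ALTER SYSTEM SET wal_level = logical;
    -- restart the server, then verify:
    SHOW wal_level;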
Re: tableam vs. TOAST
On Wed, Jun 12, 2019 at 4:17 AM Robert Haas wrote: > On Tue, May 21, 2019 at 2:10 PM Robert Haas wrote: > > Updated and rebased patches attached. > > And again. Hi Robert, Thus spake GCC:

detoast.c: In function ‘toast_fetch_datum’:
detoast.c:308:12: error: variable ‘toasttupDesc’ set but not used [-Werror=unused-but-set-variable]
  TupleDesc toasttupDesc;
            ^

-- Thomas Munro https://enterprisedb.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
On Wed, Jun 19, 2019 at 7:22 PM Paul Guo wrote: > I updated the patch to v3. In this version, we skip the error if copydir > fails due to missing src/dst directory, > but to make sure the ignoring is legal, I add a simple log/forget mechanism > (Using List) similar to the xlog invalid page > checking mechanism. Two tap tests are included. One is actually from a > previous patch by Kyotaro in this > email thread and another is added by me. In addition, dbase_desc() is fixed > to make the message accurate. Hello Paul, FYI t/011_crash_recovery.pl is failing consistently on Travis CI with this patch applied: https://travis-ci.org/postgresql-cfbot/postgresql/builds/555368907 -- Thomas Munro https://enterprisedb.com
Re: refactoring - share str2*int64 functions
On Fri, May 24, 2019 at 3:23 AM Fabien COELHO wrote: > >> Although I agree it is not worth a lot of trouble, and even if I don't do > >> Windows, I think it valuable that the behavior is the same on all platform. > >> The attached match shares pg_str2*int64 functions between frontend and > >> backend by moving them to "common/", which avoids some code duplication. > >> > >> This is more refactoring, and it fixes the behavior change on 32 bit > >> architectures. > > V2 is a rebase. Hi Fabien, Here's some semi-automated feedback, noted while going through failures on cfbot.cputube.org. You have a stray editor file src/backend/parser/parse_node.c.~1~. Something is failing to compile while doing the temp-install in make check-world, which probably indicates that some test or contrib module is using the interface you changed? -- Thomas Munro https://enterprisedb.com
Re: allow_system_table_mods stuff
On Mon, Jun 24, 2019 at 11:20:51AM -0400, Tom Lane wrote: > I do see value in two switches not one, but it's what I said above, > to not need to give people *more* chance-to-break-things than they > had before when doing manual catalog fixes. That is, we need a > setting that corresponds more or less to current default behavior. > > There's an aesthetic argument to be had about whether to have two > bools or one three-way switch, but I prefer the former; there's > no backward-compatibility issue here since allow_system_table_mods > couldn't be set by applications anyway. I like a single three-way switch, since if you are allowing DDL, you probably don't care whether DML is restricted. log_statement already has a similar distinction with values of none, ddl, mod, all. I assume allow_system_table_mods could have values of false, dml, and true. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
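Under that scheme, usage might look like the sketch below. Note that these values are hypothetical at this point - only the boolean form exists today, and only at server start - the sketch just mirrors the log_statement-style graduation Bruce describes:

    SET allow_system_table_mods = 'dml';   -- allow manual catalog UPDATE/DELETE
    SET allow_system_table_mods = 'true';  -- additionally allow DDL on catalogs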
RE: Run-time pruning for ModifyTable
On Monday, July 8, 2019 11:34 AM, Amit Langote wrote: > By the way, let's keep any further discussion on this particular topic > in the other thread. Thanks for the details. I got it. Regards, Kato sho > -Original Message- > From: Amit Langote [mailto:amitlangot...@gmail.com] > Sent: Monday, July 8, 2019 11:34 AM > To: Kato, Sho/加藤 翔 > Cc: David Rowley ; PostgreSQL Hackers > > Subject: Re: Run-time pruning for ModifyTable > > Kato-san, > > On Thu, Jul 4, 2019 at 1:40 PM Kato, Sho wrote: > > > If I understand the details of [1] correctly, ModifyTable will no > > > longer have N subplans for N result relations as there are today. > > > So, it doesn't make sense for ModifyTable to contain > > > PartitionedRelPruneInfos and for > ExecInitModifyTable/ExecModifyTable > > > to have to perform initial and execution-time pruning, respectively. > > > > Does this mean that the generic plan will not have N subplans for N > result relations? > > I thought [1] would make creating generic plans faster, but is this > correct? > > Yeah, making a generic plan for UPDATE of inheritance tables will > certainly become faster, because we will no longer plan the same query > N times for N child tables. There will still be N result relations but > only one sub-plan to fetch the rows from. Also, planning will still cost > O(N), but with a much smaller constant factor. > > By the way, let's keep any further discussion on this particular topic > in the other thread. > > Thanks, > Amit
Re: Add test case for sslinfo
On Mon, Jul 8, 2019 at 2:59 PM Hao Wu wrote: > I see there is no test case for sslinfo. I have added a test case for it in > my project. Hi Hao Wu, Thanks! I see that you created a CF entry https://commitfest.postgresql.org/24/2203/. While I was scanning through the current CF looking for trouble, this one popped in front of my eyes, so here's some quick feedback even though it's in the next CF: +#!/bin/bash I don't think we can require that script interpreter. This failed[1] with permissions errors: +cp: cannot create regular file '/server.crt': Permission denied It looks like that's because the script assumes that PGDATA is set. I wonder if we want to include more SSL certificates, or if we want to use the same set of fixed certificates (currently under src/test/ssl/ssl) for all tests like this. I don't have a strong opinion on that, but I wanted to mention that policy decision. (There is also a test somewhere that creates a new one on the fly.) [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/76601 -- Thomas Munro https://enterprisedb.com
Re: proposal - patch: psql - sort_by_size
On Sun, Jun 30, 2019 at 8:48 PM Pavel Stehule wrote: > I used this text in today patch Hi Pavel, Could you please post a rebased patch? Thanks, -- Thomas Munro https://enterprisedb.com
Re: Fix typos and inconsistencies for HEAD (take 5)
On Sun, Jul 07, 2019 at 08:03:01AM +0300, Alexander Lakhin wrote: > 5.8. dictlexize -> thesaurus_lexize There could be other dictionaries. > 5.9. regression.diffsregression.planregress/inh -> regression.diffs > planregress/diffs.inh I am wondering if we should not just nuke that... For now I have included your change as the mistake is obvious, but I am starting a new thread. The history around this script does not play in favor of it: commit: 2fc80e8e8304913c8dd1090bb2976632c0f4a8c3 author: Bruce Momjian date: Wed, 12 Feb 2014 17:29:19 -0500 Rename 'gmake' to 'make' in docs and recommended commands This simplifies the docs and makes it easier to cut/paste command lines. commit: c77e2e42fb4cf5c90a7562b9df289165ff164df1 author: Tom Lane date: Mon, 18 Dec 2000 02:45:47 +0000 Tweak regressplans.sh to use any already-set PGOPTIONS. And looking closer it seems that there are other issues linked to it... > 5.27. equivalentOpersAfterPromotion -> remove (irrelevant since > 8536c962, but the whole comments is too old to be informational too) It seems to me that this could be reworked much more extensively. So discarded for now. > 5.29. ExclusiveRowLock -> RowExclusiveLock Grammar mistake here. > 5.31. ExecBitmapHeapNext -> BitmapHeapNext Well, BitmapHeapRecheck is not listed in the interface routines either... > 5.37. ExecSeqNext -> SeqNext > 5.40. ExecSubqueryNext -> SubqueryNext > 5.41. ExecValuesNext -> ValuesNext Here as well these sets are incomplete. Instead, for those series, I think it would be better to do a larger cleanup and just remove all of these from the executor "INTERFACE ROUTINES" lists. Your proposed patches don't make things better either, as the interfaces are listed in alphabetical order. > 5.39. exec_subplan_get_plan -> remove (not used since 1cc29fe7) This could be used by extensions. So let's not remove it. And committed most of the rest. Thanks. -- Michael signature.asc Description: PGP signature
Re: fix for BUG #3720: wrong results at using ltree
On Sun, Apr 7, 2019 at 3:46 AM Tom Lane wrote: > Filip Rembiałkowski writes: > > Here is my attempt to fix a 12-year-old ltree bug (which is a todo item). > > I see it's not backward-compatible, but in my understanding that's > > what is documented. Previous behavior was inconsistent with > > documentation (where single asterisk should match zero or more > > labels). > > http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php [...] > In short, I'm wondering if we should treat this as a documentation > bug not a code bug. But to do that, we'd need a more accurate > description of what the code is supposed to do, because the statement > quoted above is certainly not a match to the actual behavior. This patch doesn't apply. More importantly, it seems like we don't have a consensus on whether we want it. Teodor, Oleg, would you like to offer an opinion here? If I understand correctly, the choices are doc change, code/comment change or WONT_FIX. This seems to be an entry that we can bring to a conclusion in this CF with some input from the ltree experts. -- Thomas Munro https://enterprisedb.com
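For anyone skimming the thread, a minimal illustration of the disputed semantics (the label values here are invented for the example):

    -- The documentation says '*' matches zero or more labels, so per the
    -- docs both of these should be true; the report is that actual results
    -- can disagree with that reading when '*' has to match zero labels.
    SELECT 'a.b.c'::ltree ~ '*.b.*'::lquery;
    SELECT 'a.b.c'::ltree ~ '*.a.b.c'::lquery;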
Re: [HACKERS] [PATCH] Generic type subscripting
On Fri, Jun 7, 2019 at 6:22 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > >> Rebase after pg_indent. Besides, off the list there was a suggestion > > > >> that this > > > >> could be useful to accept more than one data type as a key for > > > >> subscripting. > > > >> E.g. for jsonb it probably makes sense to understand both a simple key > > > >> name and > > > >> jsonpath: > > > > And one more rebase. > > Oh, looks like I was just confused and it wasn't necessary - for some reason > starting from v22 cfbot tries to apply v6 instead of the latest one. Hi Dmitry, Sorry about that. It looks like I broke the cfbot code that picks which thread to pull patches from when there are several registered in the CF app, the last time the HTML format changed. Now it's back to picking whichever thread has the most recent message on it. Such are the joys of web scraping (obviously we need better integration and that will happen, I just haven't had time yet). Anyway, I fixed that. But now you really do need to rebase :-) -- Thomas Munro https://enterprisedb.com
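For context, a sketch of the syntax under discussion; the jsonpath variant is only the off-list suggestion quoted above, not settled behavior:

    -- simple key-name subscripting, as the patch provides for jsonb:
    SELECT ('{"a": {"b": 1}}'::jsonb)['a'];
    -- hypothetical jsonpath subscript per the quoted suggestion:
    -- SELECT ('{"a": {"b": 1}}'::jsonb)['$.a.b'];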
Re: Declared but no defined functions
On Sun, Jul 7, 2019 at 10:04 AM Michael Paquier wrote: > > On Sun, Jul 07, 2019 at 07:31:12AM +0800, Masahiko Sawada wrote: > > Attached patch removes these functions. > > Thanks, applied. Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: proposal - patch: psql - sort_by_size
Hi po 8. 7. 2019 v 6:12 odesílatel Thomas Munro napsal: > On Sun, Jun 30, 2019 at 8:48 PM Pavel Stehule > wrote: > > I used this text in today patch > > Hi Pavel, > > Could you please post a rebased patch? > rebased patch attached Regards Pavel > > Thanks, > > -- > Thomas Munro > https://enterprisedb.com > diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7789fc6177..96fc4f489a 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -3973,6 +3973,17 @@ bar + +SORT_BY_SIZE + + +Setting this variable to on, sorts +\d*+, \db+, \l+ +and \dP*+ outputs by size (when size is displayed). + + + + SQLSTATE diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 8b4cd53631..ae79cbb312 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -223,6 +223,7 @@ describeTablespaces(const char *pattern, bool verbose) PQExpBufferData buf; PGresult *res; printQueryOpt myopt = pset.popt; + const char *sizefunc = NULL; if (pset.sversion < 8) { @@ -265,9 +266,12 @@ describeTablespaces(const char *pattern, bool verbose) gettext_noop("Options")); if (verbose && pset.sversion >= 90200) + { appendPQExpBuffer(&buf, ",\n pg_catalog.pg_size_pretty(pg_catalog.pg_tablespace_size(oid)) AS \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_tablespace_size(oid)"; + } if (verbose && pset.sversion >= 80200) appendPQExpBuffer(&buf, @@ -281,7 +285,10 @@ describeTablespaces(const char *pattern, bool verbose) NULL, "spcname", NULL, NULL); - appendPQExpBufferStr(&buf, "ORDER BY 1;"); + if (pset.sort_by_size && sizefunc) + appendPQExpBuffer(&buf, "ORDER BY %s DESC;", sizefunc); + else + appendPQExpBufferStr(&buf, "ORDER BY 1;"); res = PSQLexec(buf.data); termPQExpBuffer(&buf); @@ -863,6 +870,7 @@ listAllDbs(const char *pattern, bool verbose) PGresult *res; PQExpBufferData buf; printQueryOpt myopt = pset.popt; + const char *sizefunc = NULL; initPQExpBuffer(&buf); @@ -882,12 +890,15 @@ listAllDbs(const char *pattern, bool verbose) appendPQExpBufferStr(&buf, " "); printACLColumn(&buf, "d.datacl"); if (verbose && pset.sversion >= 80200) + { appendPQExpBuffer(&buf, ",\n CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')\n" "THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))\n" "ELSE 'No Access'\n" " END as \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_database_size(d.datname)"; + } if (verbose && pset.sversion >= 8) appendPQExpBuffer(&buf, ",\n t.spcname as \"%s\"", @@ -906,7 +917,10 @@ listAllDbs(const char *pattern, bool verbose) processSQLNamePattern(pset.db, &buf, pattern, false, false, NULL, "d.datname", NULL, NULL); - appendPQExpBufferStr(&buf, "ORDER BY 1;"); + if (pset.sort_by_size && sizefunc) + appendPQExpBuffer(&buf, "ORDER BY %s DESC;", sizefunc); + else + appendPQExpBufferStr(&buf, "ORDER BY 1;"); res = PSQLexec(buf.data); termPQExpBuffer(&buf); if (!res) @@ -3628,6 +3642,7 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys bool showMatViews = strchr(tabtypes, 'm') != NULL; bool showSeq = strchr(tabtypes, 's') != NULL; bool showForeign = strchr(tabtypes, 'E') != NULL; + const char *sizefunc = NULL; PQExpBufferData buf; PGresult *res; @@ -3711,13 +3726,19 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys * size of a table, including FSM, VM and TOAST tables. 
*/ if (pset.sversion >= 9) + { appendPQExpBuffer(&buf, ",\n pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) as \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_table_size(c.oid)"; + } else if (pset.sversion >= 80100) + { appendPQExpBuffer(&buf, ",\n pg_catalog.pg_size_pretty(pg_catalog.pg_relation_size(c.oid)) as \"%s\"", gettext_noop("Size")); + sizefunc = "pg_catalog.pg_relation_size(c.oid)"; + } appendPQExpBuffer(&buf, ",\n pg_catalog.obj_description(c.oid, 'pg_class') as \"%s\"", @@ -3770,7 +3791,10 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys "n.nspname", "c.relname", NULL, "pg_catalog.pg_table_is_visible(c.oid)"); - appendPQExpBufferStr(&buf, "ORDER BY 1,2;"); + if (pset.sort_by_size && sizefunc) + appendPQExpBuffer(&buf, "ORDER BY %s DESC;", sizefunc); + else + appendPQExpBufferStr(&buf, "ORDER BY 1,2;"); res = PSQLexec(buf.data); termPQExpBuffer(&buf); @@ -3946,6 +3970,7 @@ listPartitionedTables(const char *reltypes, const char *pattern, bool verbose) " JOIN d ON i.inhparen
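Based on the documentation hunk above, usage would look something like this from psql; only the output ordering changes, the column sets are untouched:

    \set SORT_BY_SIZE on
    \dt+   -- tables now ordered by pg_table_size() descending
    \l+    -- databases ordered by pg_database_size() descending
    \db+   -- tablespaces ordered by pg_tablespace_size() descending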
PGOPTIONS="-fh" make check gets stuck since Postgres 11
Hi all, I have begun playing with regressplans.sh, which enforces various combinations of "-f s|i|n|m|h" when running the regression tests, and I have noticed that -fh can cause the server to become stuck in the test join_hash.sql with this query (not sure which portion of the SET LOCAL parameters are involved): select count(*) from simple r join extremely_skewed s using (id); This does not happen with REL_10_STABLE, where the test executes immediately, so we visibly have an issue introduced in v11 here. Any thoughts? -- Michael signature.asc Description: PGP signature
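For anyone reproducing this without the wrapper script: -f h forbids hash-join plans server-wide, which (assuming it maps onto the usual planner switch) is roughly:

    -- approximates PGOPTIONS="-fh"; join_hash.sql builds these tables
    -- specifically to stress hash joins, so forcing another join method is
    -- expensive by design; the question is why v11+ never finishes:
    set enable_hashjoin = off;
    select count(*) from simple r join extremely_skewed s using (id);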
Re: Add test case for sslinfo
Hi Thomas, Thank you for your quick response! I work on Greenplum, and I didn't see this folder (src/test/ssl/ssl) before. I will add more certificates to the test and resend it. Do you have any suggestion about the missing PGDATA? Since the test needs to configure postgresql.conf, maybe there are other ways to determine this environment. Thank you very much! On Mon, Jul 8, 2019 at 12:05 PM Thomas Munro wrote: > On Mon, Jul 8, 2019 at 2:59 PM Hao Wu wrote: > > I see there is no test case for sslinfo. I have added a test case for it > in my project. > > Hi Hao Wu, > > Thanks! I see that you created a CF entry > https://commitfest.postgresql.org/24/2203/. While I was scanning > through the current CF looking for trouble, this one popped in front > of my eyes, so here's some quick feedback even though it's in the next > CF: > > +#!/bin/bash > > I don't think we can require that script interpreter. > > This failed[1] with permissions errors: > > +cp: cannot create regular file '/server.crt': Permission denied > > It looks like that's because the script assumes that PGDATA is set. > > I wonder if we want to include more SSL certificates, or if we want to > use the same set of fixed certificates (currently under > src/test/ssl/ssl) for all tests like this. I don't have a strong > opinion on that, but I wanted to mention that policy decision. (There > is also a test somewhere that creates a new one on the fly.) > > [1] > https://travis-ci.org/postgresql-cfbot/postgresql/builds/76601 > > -- > Thomas Munro > > https://enterprisedb.com >
Re: Implementing Incremental View Maintenance
On Fri, Jun 28, 2019 at 10:56 PM Yugo Nagata wrote: > Attached is a WIP patch of IVM which supports some aggregate functions. Hi Nagata-san and Hoshiai-san, Thank you for working on this. I enjoyed your talk at PGCon. I've added Kevin Grittner just in case he missed this thread; he has talked often about implementing the counting algorithm, and he wrote the "trigger transition tables" feature to support exactly this. While integrating trigger transition tables with the new partition features, we had to make a number of decisions about how that should work, and we tried to come up with answers that would work for IVM, and I hope we made the right choices! I am quite interested to learn how IVM interacts with SERIALIZABLE. A couple of superficial review comments: +const char *aggname = get_func_name(aggref->aggfnoid); ... +else if (!strcmp(aggname, "sum")) I guess you need a more robust way to detect the supported aggregates than their name, or I guess some way for aggregates themselves to specify that they support this and somehow supply the extra logic. Perhaps I just said what Greg Stark already said, except not as well. +elog(ERROR, "Aggrege function %s is not supported", aggname); s/Aggrege/aggregate/ Of course it is not helpful to comment on typos at this early stage, it's just that this one appears many times in the test output :-) +static bool +isIvmColumn(const char *s) +{ +char pre[7]; + + strlcpy(pre, s, sizeof(pre)); +return (strcmp(pre, "__ivm_") == 0); +} What about strncmp(s, "__ivm_", 6) == 0? As for the question of how to reserve a namespace for system columns that won't clash with user columns, according to our manual the SQL standard doesn't allow $ in identifier names, and according to my copy of SQL92 "intermediate SQL" doesn't allow identifiers that end in an underscore. I don't know what the best answer is but we should probably decide on something based on the standard. As for how to make internal columns invisible to SELECT *, previously there have been discussions about doing that using a new flag in pg_attribute: https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com +"WITH t AS (" +" SELECT diff.__ivm_count__, (diff.__ivm_count__ = mv.__ivm_count__) AS for_dlt, mv.ctid" +", %s" +" FROM %s AS mv, %s AS diff WHERE (%s) = (%s)" +"), updt AS (" +" UPDATE %s AS mv SET __ivm_count__ = mv.__ivm_count__ - t.__ivm_count__" +", %s " +" FROM t WHERE mv.ctid = t.ctid AND NOT for_dlt" +") DELETE FROM %s AS mv USING t WHERE mv.ctid = t.ctid AND for_dlt;", I fully understand that this is POC code, but I am curious about one thing. These queries that are executed by apply_delta() would need to be converted to C, or at least use reusable plans, right? Hmm, creating and dropping temporary tables every time is a clue that the ultimate form of this should be tuplestores and C code, I think, right? > Moreover, some regression test are added for aggregate functions support. > This is Hoshiai-san's work. Great. Next time you post a WIP patch, could you please fix this small compiler warning? describe.c: In function ‘describeOneTableDetails’: describe.c:3270:55: error: ‘*((void *)&tableinfo+48)’ may be used uninitialized in this function [-Werror=maybe-uninitialized] if (verbose && tableinfo.relkind == RELKIND_MATVIEW && tableinfo.isivm) ^ describe.c:1495:4: note: ‘*((void *)&tableinfo+48)’ was declared here } tableinfo; ^ Then our unofficial automatic CI system[1] will run these tests every day, which sometimes finds problems. 
[1] cfbot.cputube.org -- Thomas Munro https://enterprisedb.com
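For readers new to the thread, a rough sketch of the intended usage, as I understand the WIP patch; the syntax and hidden column name reflect my reading of the patch and may change, and base/grp/val are made-up names:

    -- the patch extends CREATE MATERIALIZED VIEW with an INCREMENTAL option;
    -- the view is then maintained by AFTER triggers using transition tables
    -- rather than requiring a full REFRESH:
    CREATE INCREMENTAL MATERIALIZED VIEW mv AS
      SELECT grp, sum(val) AS s FROM base GROUP BY grp;
    -- internally, a hidden __ivm_count__ column tracks per-group tuple
    -- multiplicities so apply_delta() can decide between UPDATE and DELETE.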
Re: Broken defenses against dropping a partitioning column
On Mon, Jul 8, 2019 at 4:11 AM Tom Lane wrote: > (Moved from pgsql-bugs thread at [1]) Thanks. > Consider > > regression=# create domain d1 as int; > CREATE DOMAIN > regression=# create table t1 (f1 d1) partition by range(f1); > CREATE TABLE > regression=# alter table t1 drop column f1; > ERROR: cannot drop column named in partition key > > So far so good, but that defense has more holes than a hunk of > Swiss cheese: Indeed. > regression=# drop domain d1 cascade; > psql: NOTICE: drop cascades to column f1 of table t1 > DROP DOMAIN > > Of course, the table is now utterly broken, e.g. > > regression=# \d t1 > psql: ERROR: cache lookup failed for type 0 Oops. > (More-likely variants of this include dropping an extension that > defines the type of a partitioning column, or dropping the schema > containing such a type.) Yeah. Actually, it's embarrassingly easy to fall through the holes. create type mytype as (a int); create table mytyptab (a mytype) partition by list (a); drop type mytype cascade; NOTICE: drop cascades to column a of table mytyptab DROP TYPE select * from mytyptab; ERROR: cache lookup failed for type 0 LINE 1: select * from mytyptab; ^ drop table mytyptab; ERROR: cache lookup failed for type 0 > The fix I was speculating about in the pgsql-bugs thread was to add > explicit pg_depend entries making the table's partitioning columns > internally dependent on the whole table (or maybe the other way around; > haven't experimented). That fix has a couple of problems though: > > 1. In the example, "drop domain d1 cascade" would automatically > cascade to the whole partitioned table, including child partitions > of course. This might leave a user sad, if a few terabytes of > valuable data went away; though one could argue that they'd better > have paid more attention to what the cascade cascaded to. > > 2. It doesn't fix anything for pre-existing tables in pre-v12 branches. > > > I thought of a different possible approach, which is to move the > "cannot drop column named in partition key" error check from > ATExecDropColumn(), where it is now, to RemoveAttributeById(). > That would be back-patchable, but the implication would be that > dropping anything that a partitioning column depends on would be > impossible, even with CASCADE; you'd have to manually drop the > partitioned table first. Good for data safety, but a horrible > violation of expectations, and likely of the SQL spec as well. I prefer this second solution as it works for both preexisting and new tables, although I also agree that it is not user-friendly. Would it help to document that one would be unable to drop anything that a partitioning column directly or indirectly depends on (type, domain, schema, extension, etc.)? > I'm not sure we could avoid order-of-traversal problems, either. > > Ideally, perhaps, a DROP CASCADE like this would not cascade to > the whole table but only to the table's partitioned-ness property, > leaving you with a non-partitioned table with most of its data > intact. Yeah, it would've been nice if the partitioned-ness property of a table could be dropped independently of the table. > It would take a lot of work to make that happen though, > and it certainly wouldn't be back-patchable, and I'm not really > sure it's worth it. Agreed that this sounds more like a new feature. Thanks, Amit
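One way to see the hole directly, using the composite-type repro above (the exact rows vary by version): the key column's only dependency points at its type, and nothing ties it internally to the table itself, which is what the first proposed fix would add:

    -- run before the DROP TYPE; expect a pg_depend entry from column a to
    -- mytype, but no internal entry binding the partition key column to
    -- mytyptab itself:
    SELECT classid::regclass, objid, objsubid,
           refclassid::regclass, refobjid, deptype
    FROM pg_depend
    WHERE objid = 'mytyptab'::regclass OR refobjid = 'mytyptab'::regclass;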