Re: How do we support FULL JOIN on PostGIS types?
Hi, Thanks a lot to RhodiumToad on IRC for the suggestion of setting HASHES and MERGES on OPERATOR =. Now we have another problem: how do we set these flags when upgrading to a new version of the extension? Dropping OPERATOR = would drop all indexes and views depending on it, so that isn't really an option. Also, if someone could sneak the "ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions" keywords into https://www.postgresql.org/docs/current/xoper-optimization.html#id-1.8.3.17.8 it would greatly help future extension writers - right now it's not possible to find that page by googling the error message. On Thu, May 16, 2019 at 7:05 PM Darafei "Komяpa" Praliaskouski <m...@komzpa.net> wrote: > Hi! > > Greetings from the OSGeo Code Sprint in Minneapolis :) > > We're trying to make FULL JOIN on equality of geometry work and can't figure > out why it doesn't. > > Here's a reproducer; it works on bytea but not on PostGIS geometry, throwing > > ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable > join conditions > > https://trac.osgeo.org/postgis/ticket/4394 > > We already have a btree opclass with equality: > > https://github.com/postgis/postgis/blob/svn-trunk/postgis/postgis.sql.in#L420 > > > We also have a hash equality opclass: > > https://github.com/postgis/postgis/blob/svn-trunk/postgis/postgis.sql.in#L440 > > > Reading through the Postgres documentation I can't figure out what else we should do for this join to work. How do we make it work? > > -- > Darafei Praliaskouski > Support me: http://patreon.com/komzpa > -- Darafei Praliaskouski Support me: http://patreon.com/komzpa
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On Sun, Jun 2, 2019 at 5:18 PM Tomas Vondra wrote: > Thanks for the rebase! I think the patch is in pretty good shape - I'm > sure we'll find ways to make it more efficient etc. but IMO that's fine > and should not prevent getting it committed. Thank you for the in-depth review! > Currently, the costing logic (cost_incremental_sort) assumes all groups > have the same size, which is fine for uniform distributions. But for > skewed distributions, that may be an issue. > > Consider for example a table with 1M rows, two columns, 100 groups in each > column, and an index on the first one. > > CREATE table t (a INT, b INT); > > INSERT INTO t SELECT 100*random(), 100*random() > FROM generate_series(1,1000000); > > Now, let's do a simple limit query to find the first row: > > SELECT * FROM t ORDER BY a, b LIMIT 1; > > In this case the current costing logic is fine - the groups are close to > average size, and we only need to sort the first group, i.e. 1% of data. > > Now, let's say the first group is much larger: > > > INSERT INTO t SELECT 0, 100*random() > FROM generate_series(1,900000) s(i); > > INSERT INTO t SELECT 100*random(), 100*random() > FROM generate_series(1,100000); > > That is, the first group is roughly 90% of data, but the number of groups > is still the same, so we need to sort 90% of the data. But the average group > size is still the same, so the current cost model is oblivious to this. Thinking out loud here: the current implementation doesn't guarantee that sort groups always have the same prefix column values because (from code comments) "Sorting many small groups with tuplesort is inefficient." While this seems like a reasonable optimization, I think it's possible that this thinking steered us away from an optimization in the opposite direction. Perhaps we should always track whether or not all prefix tuples are the same (currently we only do that after reaching DEFAULT_MIN_GROUP_SIZE tuples) and use that information to be able to have tuplesort only care about the non-prefix columns (where currently it has to sort on all pathkey columns even though for a large group the prefix columns are guaranteed to be equal). Essentially I'm trying to think of ways that would get us to comparable performance with regular sort in the case of large batch sizes. One other thing about the DEFAULT_MIN_GROUP_SIZE logic is that in the case where you have a very small group and then a very large batch, we'd lose the ability to optimize in the above way. That makes me wonder if we shouldn't intentionally optimize for the possibility of large batch sizes, since a little extra expense per group/tuple is more likely to be a non-concern with small groups anyway when there are large numbers of input tuples but a relatively small limit. Thoughts? > So I think we should look at the MCV list, and use that information when > computing the startup/total cost. I think using the first/largest group to > compute the startup cost, and the average group size for total cost would > do the trick. I think this sounds very reasonable. > I don't think we can do much better than this during planning. There will > inevitably be cases where the costing model will push us to do the wrong > thing, in either direction. The question is how serious an issue that is, and > whether we could remedy that during execution somehow. > > When we "incorrectly" pick full sort (when the incremental sort would be > faster), that's obviously sad but I think it's mostly fine because it's > not a regression. 
> > The opposite direction (picking incremental sort while full sort would be > faster) is probably more serious, because it's a regression between > releases. > > I don't think we can fully fix that by refining the cost model. We have > two basic options: > > 1) Argue/show that this is not an issue in practice, because (a) such > cases are very rare, and/or (b) the regression is limited. In short, the > benefits of the patch outweigh the losses. My comments above go in this direction. If we can improve performance in the worst case, I think it's plausible this concern becomes a non-issue. > 2) Provide some fallback at execution time. For example, we might watch > the size of the group, and if we run into an unexpectedly large one we > might abandon the incremental sort and switch to a "full sort" mode. Are there good examples of us doing this in other types of nodes (whether the fallback is an entirely different algorithm/node type)? I like this idea in theory, but I also think it's likely it would add a very significant amount of complexity. The other problem is knowing where to draw the line: you end up creating these kinds of cliffs where pulling one more tuple through the incremental sort would give you your batch and result in not having to pull many more tuples in a regular sort node, but the fallback logic kicks in anyway. Unrelated to all of the above: if I read the patch properly it intentionally excludes backwa
Re: How do we support FULL JOIN on PostGIS types?
Darafei "Komяpa" Praliaskouski writes: > Thanks a lot to RhodiumToad on IRC for the suggestion of setting HASHES and MERGES on > OPERATOR =. > Now we have another problem: how do we set these flags when upgrading to a new > version of the extension? Dropping OPERATOR = would drop all indexes and views > depending on it, so that isn't really an option. I think you're going to have to use a direct UPDATE on pg_operator in the extension update script :-(. Perhaps ALTER OPERATOR should be able to handle changing these flags, but for now it can't. regards, tom lane
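For reference, a minimal sketch of what that catalog tweak might look like in an extension update script, assuming the PostGIS geometry = operator (adjust names as needed for your operator):

UPDATE pg_catalog.pg_operator
   SET oprcanhash = true,
       oprcanmerge = true
 WHERE oprname = '='
   AND oprleft = 'geometry'::regtype
   AND oprright = 'geometry'::regtype;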
Binary support for pgoutput plugin
Is there a reason why pgoutput sends data in text format? Seems to me that sending data in binary would provide a considerable performance improvement. Dave Cramer
Re: Dead stores in src/common/sha2.c
Heikki Linnakangas writes: > On 29/05/2019 18:47, Hamlin, Garick L wrote: >> On Wed, May 29, 2019 at 11:01:05AM -0400, Tom Lane wrote: >>> At the same time, I'm not sure if we should just write this off as an >>> ignorable warning. If the C compiler concludes these are dead stores >>> it'll probably optimize them away, leading to not accomplishing the >>> goal of wiping the values. >> Yeah, I mean it's odd to put code in to zero/hide state knowing it's >> probably optimized out. >> We could also take it out, but maybe it does help somewhere? >> ... or put in a comment that says: This probably gets optimized away, but >> we don't consider it much of a risk. > There's a function called explicit_bzero() in glibc, for this purpose. > See > https://www.gnu.org/software/libc/manual/html_node/Erasing-Sensitive-Data.html. > > It's not totally portable, but it's also available in some BSDs, at > least. That documentation mentions the possibility that it might force > variables to be stored in memory that would've otherwise been kept only > in registers, but says that in most situations it's nevertheless better > to use explicit_bzero() than not. So I guess we should use that, if it's > available. Meh. After looking at this closer, I'm convinced that doing anything that might force the variables into memory would be utterly stupid. Aside from any performance penalty we'd take, that would make their values more observable, not less so. In any case, though, it's very hard to see why we should care in the least. Somebody who can observe the contents of server memory (or application memory, in the case of frontend usage) can surely extract the input or output of the SHA2 transform, which is going to be way more useful than a few intermediate values. So I think we should either do nothing, or suppress this warning by commenting out the variable-zeroing. (Note that an eyeball scan finds several more dead zeroings besides the ones in Garick's patch. Some of them are ifdef'd out ...) regards, tom lane
WITH NOT MATERIALIZED and DML CTEs
Currently, WITH a AS NOT MATERIALIZED (INSERT ...) would silently disregard the "NOT MATERIALIZED" instruction and execute the data-modifying CTE to completion (as per the long-standing DML CTE rule). This seems like an omission to me. Ideally, the presence of an explicit "NOT MATERIALIZED" clause on a data-modifying CTE should disable the "run to completion" logic. It is understandably late in the 12 cycle, so maybe prohibit NOT MATERIALIZED with DML altogether and revisit this in 13? Thoughts? Elvis
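For concreteness, a minimal illustration of the current behavior (audit_log is a hypothetical table):

WITH a AS NOT MATERIALIZED (
    -- Still executed exactly once and run to completion, even though
    -- NOT MATERIALIZED was requested and the outer query never reads the CTE.
    INSERT INTO audit_log (note) VALUES ('hello') RETURNING id
)
SELECT 1 WHERE false;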
Re: WITH NOT MATERIALIZED and DML CTEs
Hi, On 2019-06-03 11:45:51 -0400, Elvis Pranskevichus wrote: > Currently, WITH a AS NOT MATERIALIZED (INSERT ...) would silently > disregard the "NOT MATERIALIZED" instruction and execute the > data-modifying CTE to completion (as per the long-standing DML CTE rule). > > This seems like an omission to me. Ideally, the presence of an explicit > "NOT MATERIALIZED" clause on a data-modifying CTE should disable the > "run to completion" logic. I don't see us ever doing that. The result of minor costing and other planner changes would yield different updated data. That'll just create endless bug reports. > It is understandably late in the 12 cycle, so maybe prohibit NOT > MATERIALIZED with DML altogether and revisit this in 13? I could see us adding an error, or just continuing to silently ignore it. Greetings, Andres Freund
undo: zedstore vs. zheap
Hi, On Saturday, we had a nice in-person conversation about the requirements that zedstore has for an undo facility vs. the requirements that zheap has vs. the current design of the undo patch set. The people present were: Heikki Linnakangas, Amit Kapila, Thomas Munro, Kuntal Ghosh, Andres Freund, and me. The remainder of this email consists of the notes that I took during that conversation, with some subsequent editing that I did to try to make them more broadly understandable. zedstore needs a very fast way to determine whether an undo pointer is old enough that it doesn't matter. Perhaps we should keep a backend-local cache of discard pointers so that we can test an undo pointer against the discard horizon without needing to touch shared memory. zedstore needs a very fast way of obtaining the XID when it looks up an undo pointer. It seems inefficient to store the full XID (especially an 8-byte FullTransactionId) in every undo record, but storing it only in the transaction header makes getting the XID prohibitively expensive. Heikki had the idea of storing a flag in each undo record saying 'same X as the first tuple on the page', where X might be XID, CID, relation OID, block number, etc. That seems like a good idea. We'd need to decide exactly how many such flags to have and which fields they cover; and it's probably best not to depend on an undo record for another transaction that might be independently discarded. Another option is to put a "default" for each of these values in the page header and then store a value in individual undo records only if it differs from the value in the page header. We don't have a way to do a similar optimization for whatever individual clients of the undo machinery choose to store in the undo record's payload, which might be nice if we had a good idea how to do it. zedstore intends to store an undo pointer per tuple, whereas the current zheap code stores an undo pointer per transaction slot. Therefore, zheap wants an undo record to point to the previous undo record for that transaction slot; whereas zedstore wants an undo record to point to the previous undo record for the same tuple (which might not belong to the same transaction). The undo code that assumes per-block chaining (uur_blkprev) needs to be made more generic. For either zedstore or zheap, newly-inserted tuples could potentially point to an XID/CID rather than an undo record pointer, because visibility checks don't need to look at the actual undo record. An undo record still needs to be written in case we abort, but we don't need to be able to refer to it for visibility purposes. zedstore and zheap have different batching requirements. Both want to process undo records in TID order, but zedstore doesn't care about pages. The current undo patch set calls the RMGR-specific callback once per page; that needs to be generalized. Is it possible that some other table AM might want to sort by something other than TID? Right now, zedstore stores a separate btree for each column, and an extra btree for the visibility information, but it could have column groups instead. If you put all of the columns and the visibility information into a single column group, you'd have a row store. How would the performance of that solution compare with zheap? Maybe zheap should forget about having transaction slots, and just store an undo pointer in each tuple. That would be worse in cases such as bulk loading, where all the tuples on the page are modified by the same transaction ID. 
We could optimize that case by using something like a transaction slot, but then there's a problem if, say, every tuple on the page is updated or deleted by a separate transaction: the tuples need to get bigger to accommodate separate undo pointers, and the page might overflow. Perhaps we should design a solution that allows for the temporary use of overflow space in such cases. Tuple locking is complicated for both zheap and zedstore. Storing the locker XIDs in the tuple or page is tempting, but you can run out of space; where do you put the extras? Writing an undo record per lock is also tempting, but that could generate a very large amount of undo if there are many transactions that are each locking a whole table, whereas the heap doesn't have that problem, because it uses MultiXacts. On the other hand, in the current heap, it's easily possible for N transactions to burn through O(N^2) MultiXactIds, which is worse than an undo record per lock. A perfect system would avoid permanently bloating the table when many transactions each take many locks, but it's hard to design a system that has that property AND ALSO is maximally compact for a table that is bulk-loaded and then never updated again. (Perhaps it's OK for the table to expand a little bit if we find that we need to make space for a MultiXactId pointer or similar in each tuple? Thomas later dubbed this doubtless-uncontroversial design non-in-place select, and we're all
Re: WITH NOT MATERIALIZED and DML CTEs
On Monday, June 3, 2019 11:50:15 A.M. EDT Andres Freund wrote: > > This seems like an omission to me. Ideally, the presence of an > > explicit "NOT MATERIALIZED" clause on a data-modifying CTE should > > disable the "run to completion" logic. > > I don't see us ever doing that. The result of minor costing and other > planner changes would yield different updated data. That'll just > create endless bug reports. I understand why the rule exists in the first place, but I think that an explicit opt-in signals the assumption of responsibility and opens the possibility of using this in a well-defined evaluation context, such as CASE WHEN. Elvis
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
On Fri, May 31, 2019 at 2:59 AM Antonin Houska wrote: > > Sounds good. I'm not quite sure how this is going to work. Ideally > > you'd like the encryption key command to fetch the key from something > > like ssh-agent, or maybe pop up a window on the user's terminal with a > > key prompt. Just reading from stdin and writing to stdout is not > > going to be very convenient... > > When I mentioned writing to stdout in my previous email, I viewed it from the > perspective of postmaster: whichever way the command gets the password from > the DBA, it'll always write it to stdout and postmaster will read it from > there. This was to address your objection that the executables do not actually > "return" strings. So the part about returning strings is really just a wording issue: the documentation needs to talk about data sent to stdout or wherever, because that's what really happens, and somebody writing such a command from scratch needs to know what it must do. What I'm talking about here is that it also has to be reasonably possible to write an encryption key command that does something useful. I don't have a really clear vision for how that's going to work. Nobody wants the server, which is probably being launched by pg_ctl or systemd or both, to prompt using its own stdin/stderr, but then we need to think about how the prompting is actually going to work. One idea is to do it via the FEBE protocol: connect to the server using libpq, issue a new command that causes the server to enter COPY mode, and then send the encryption key as COPY data. However, that idea doesn't really work, because we've got to be able to get the key before we run recovery or reach consistency, so the server won't be listening for connections at that point. Also, we've got to have a way for this to work in single-user mode, which also can't listen for connections. It's probably all fine if your encryption key command is something like 'curl https://my.secret.keyserver.me/sekcret.key' because then there's an external server which you can just go access - but I don't quite understand how you'd do interactive prompting from here. Sorry to get hung up on what may seem like a minor point, but I think it's actually fairly fundamental: we've got to have a clear vision of how end-users will really be able to make use of this. > The header comment in pg_upgrade.c indicates that this is because of TOAST. So > I think that dbnode and relfilenode are not preserved just because there was > no strong reason to do so until now. Maybe you want to propose an independent patch making that change? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WITH NOT MATERIALIZED and DML CTEs
Elvis Pranskevichus writes: > On Monday, June 3, 2019 11:50:15 A.M. EDT Andres Freund wrote: >>> This seems like an omission to me. Ideally, the presence of an >>> explicit "NOT MATERIALIZED" clause on a data-modifying CTE should >>> disable the "run to completion" logic. >> I don't see us ever doing that. The result of minor costing and other >> planner changes would yield different updated data. That'll just >> create endless bug reports. > I understand why the rule exists in the first place, but I think that an > explicit opt-in signals the assumption of responsibility and opens the > possibility of using this in a well-defined evaluation context, such as > CASE WHEN. TBH, if you think it's well-defined, you're wrong. I concur with Andres that turning off run-to-completion for DMLs would be disastrous. For just one obvious point, what about firing AFTER triggers? It's already the case that the planner will silently ignore NOT MATERIALIZED for other cases where it can't inline the CTE for semantic or implementation reasons -- see comments in SS_process_ctes(). I see no good reason to treat the DML exception much differently from other exceptions, such as presence of volatile functions or recursion. regards, tom lane
Re: WITH NOT MATERIALIZED and DML CTEs
On Monday, June 3, 2019 12:09:46 P.M. EDT Tom Lane wrote: > > I understand why the rule exists in the first place, but I think > > that an explicit opt-in signals the assumption of responsibility > > and opens the possibility of using this in a well-defined > > evaluation context, such as CASE WHEN. > > TBH, if you think it's well-defined, you're wrong. The documentation seems to strongly suggest otherwise: "When it is essential to force evaluation order, a CASE construct (see Section 9.17) can be used. ... CASE construct used in this fashion will defeat optimization attempts" Are there cases where this is not true outside of the documented exceptions (i.e. immutable early-eval and aggregates)? Elvis
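For reference, the kind of usage that passage describes is along these lines (schematic, adapted from that section of the docs):

SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;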
Re: WITH NOT MATERIALIZED and DML CTEs
Elvis Pranskevichus writes: > On Monday, June 3, 2019 12:09:46 P.M. EDT Tom Lane wrote: >>> I understand why the rule exists in the first place, but I think >>> that an explicit opt-in signals the assumption of responsibility >>> and opens the possibility of using this in a well-defined >>> evaluation context, such as CASE WHEN. >> TBH, if you think it's well-defined, you're wrong. > The documentation seems to strongly suggest otherwise: > "When it is essential to force evaluation order, a CASE construct (see > Section 9.17) can be used. ... CASE construct used in this fashion will > defeat optimization attempts" > Are there cases where this is not true outside of the documented > exceptions (i.e. immutable early-eval and aggregates)? CASE is a scalar-expression construct. It's got little to do with the timing of scan/join operations such as row fetches. We offer users essentially no control over when those happen ... other than the guarantees about CTE materialization, which are exactly what you say you want to give up. regards, tom lane
Re: Fix inconsistencies for v12
Amit Langote writes: > On 2019/05/30 18:51, Amit Kapila wrote: >> I think it will be better to include postgres_fdw in the comment in >> some way so that if someone wants a concrete example, there is >> something to refer to. > Maybe a good idea, but this will be the first time to mention postgres_fdw > in the core source code. If you think that's OK, how about the attached? This wording seems fine to me. Now that we've beat that item into the ground ... there were a bunch of other tweaks suggested in Alexander's initial email. Amit (K), were you going to review/commit those? regards, tom lane
Re: WITH NOT MATERIALIZED and DML CTEs
On Monday, June 3, 2019 1:03:44 P.M. EDT Tom Lane wrote: > CASE is a scalar-expression construct. It's got little to do with > the timing of scan/join operations such as row fetches. We offer > users essentially no control over when those happen ... other than > the guarantees about CTE materialization, which are exactly what > you say you want to give up. In the general case, yes, but I *can* use a scalar-returning INSERT CTE in a THEN clause as a subquery. Useful for a conditional INSERT, when you can't use ON CONFLICT. Anyway, I understand that the complications are probably not worth it. Thanks, Elvis
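A sketch of the pattern (all names hypothetical); note that under the current run-to-completion rule the INSERT still executes no matter which branch the CASE takes, which is exactly what an explicit NOT MATERIALIZED opt-in could change:

WITH ins AS (
    INSERT INTO events (kind) VALUES ('signup') RETURNING id
)
SELECT CASE WHEN u.wants_signup THEN (SELECT id FROM ins) END
FROM users u
WHERE u.id = 42;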
Re: Index Skip Scan
Hi Floris, On 6/1/19 12:10 AM, Floris Van Nee wrote: Given a table definition of (market text, feedcode text, updated_at timestamptz, value float8) and an index on (market, feedcode, updated_at desc) (note that this table slightly deviates from what I described in my previous mail) and filling it with data, the following query uses an index skip scan and returns just 1 row (incorrect!): select distinct on (market, feedcode) market, feedcode from streams.base_price where market='TEST' The following query still uses the regular index scan and returns many more rows (correct): select distinct on (market, feedcode) * from streams.base_price where market='TEST' It seems that partially filtering on one of the distinct columns triggers incorrect behavior where too many rows in the index are skipped. Thanks for taking a look at the patch, and your feedback on it. I'll definitely look into this once I'm back from my travels. Best regards, Jesper
Re: Index Skip Scan
Hi Rafia, On 6/1/19 6:03 AM, Rafia Sabih wrote: Here is my repeatable test case,

create table t (market text, feedcode text, updated_at timestamptz, value float8);
create index on t (market, feedcode, updated_at desc);
insert into t values('TEST', 'abcdef', (select timestamp '2019-01-10 20:00:00' + random() * (timestamp '2014-01-20 20:00:00' - timestamp '2019-01-20 20:00:00')), generate_series(1,100)*9.88);
insert into t values('TEST', 'jsgfhdfjd', (select timestamp '2019-01-10 20:00:00' + random() * (timestamp '2014-01-20 20:00:00' - timestamp '2019-01-20 20:00:00')), generate_series(1,100)*9.88);

Now, without the patch,

select distinct on (market, feedcode) market, feedcode from t where market='TEST';
 market | feedcode
--------+-----------
 TEST   | abcdef
 TEST   | jsgfhdfjd
(2 rows)

explain select distinct on (market, feedcode) market, feedcode from t where market='TEST';
                            QUERY PLAN
------------------------------------------------------------------
 Unique  (cost=12.20..13.21 rows=2 width=13)
   ->  Sort  (cost=12.20..12.70 rows=201 width=13)
         Sort Key: feedcode
         ->  Seq Scan on t  (cost=0.00..4.51 rows=201 width=13)
               Filter: (market = 'TEST'::text)
(5 rows)

And with the patch,

select distinct on (market, feedcode) market, feedcode from t where market='TEST';
 market | feedcode
--------+----------
 TEST   | abcdef
(1 row)

explain select distinct on (market, feedcode) market, feedcode from t where market='TEST';
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Index Only Scan using t_market_feedcode_updated_at_idx on t  (cost=0.14..0.29 rows=2 width=13)
   Scan mode: Skip scan
   Index Cond: (market = 'TEST'::text)
(3 rows)

Notice that in the explain statement it shows correct number of rows to be skipped.

Thanks for your test case; this is very helpful. For now, I would like to highlight that SET enable_indexskipscan = OFF can be used for testing with the patch applied. Dmitry and I will look at the feedback provided. Best regards, Jesper
Re: Pinned files at Windows
On Thu, May 30, 2019 at 3:25 AM Konstantin Knizhnik wrote: > If the call of stat() succeeds, then my assumption is that the only reason > for GetFileAttributesEx > failure is that the file has been deleted, and returning the ENOENT error code in this > case is correct behavior. In my experience, the assumption "the only possible cause of an error during X is Y" turns out to be wrong nearly 100% of the time. Our job is to report the errors the OS gives us, not guess what they mean. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: BUG #15821: Parallel Workers with functions and auto_explain: ERROR: could not find key 3 in shm TOC
PG Bug reporting form writes: > We have enabled auto_explain and see errors on PostgreSQL 11.3 when > SELECTing from a user defined function. No such crashes have been > observed on 10.7. I think that you didn't give a complete dump of relevant settings, but after some fooling around I was able to reproduce this error, and the cause is this: auto_explain hasn't a single clue about parallel query. 1. In the parent process, we have a parallelizable hash join being executed in a statement inside a function. Since auto_explain.log_nested_statements is not enabled, auto_explain does not deem that it should trace the statement, so the query starts up with estate->es_instrument = 0, and therefore ExecHashInitializeDSM chooses not to create any shared SharedHashInfo area. 2. In the worker processes, auto_explain manages to grab execution control when ParallelQueryMain calls ExecutorStart, thanks to being in ExecutorStart_hook. Having no clue what's going on, it decides that this is a new top-level query that it should trace, and it sets some bits in queryDesc->instrument_options. 3. When the workers get to ExecHashInitializeWorker, they see that instrumentation is active so they try to look up the SharedHashInfo. Kaboom. I'm inclined to think that explain_ExecutorStart should simply keep its hands off of everything when in a parallel worker; if instrumentation is required, that'll be indicated by options passed down from the parent process. It looks like this could conveniently be merged with the rate-sampling logic by forcing current_query_sampled to false when IsParallelWorker(). Likely this should be back-patched all the way to 9.6. I'm not sure how we managed to avoid noticing it before now, but there are probably ways to cause visible trouble in any release that has any parallel query support. regards, tom lane
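For anyone wanting to poke at this, a hypothetical recipe consistent with the above analysis (the function name is a placeholder for any function that runs a parallelizable query):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;             -- workers end up requesting instrumentation
SET auto_explain.log_nested_statements = off;  -- so the parent query itself is not traced
SELECT my_function_with_a_parallel_hash_join();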
Fix runtime errors from -fsanitize=undefined
After many years of trying, it seems the -fsanitize=undefined checking in gcc is now working somewhat reliably. Attached is a patch that fixes all errors of the kind runtime error: null pointer passed as argument N, which is declared to never be null Most of the cases are calls to memcpy(), memcmp(), etc. with a length of zero, so one appears to get away with passing a null pointer. Note that these are runtime errors, not static analysis, so the code in question is actually reached. To reproduce, configure normally and then set COPT=-fsanitize=undefined -fno-sanitize=alignment -fno-sanitize-recover=all and build and run make check-world. Unpatched, this will core dump in various places. (-fno-sanitize=alignment should also be fixed but I took it out here to deal with it separately.) See https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html for further documentation. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 4b657d1af4a8518218be207024b123cfa6d799d8 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 3 Jun 2019 20:55:35 +0200 Subject: [PATCH] Fix runtime errors from -fsanitize-undefined These are of the form runtime error: null pointer passed as argument N, which is declared to never be null --- contrib/pgcrypto/px.c | 2 +- src/backend/access/heap/heapam.c| 2 +- src/backend/access/heap/heapam_visibility.c | 2 ++ src/backend/access/transam/clog.c | 1 + src/backend/access/transam/xact.c | 5 +++-- src/backend/commands/indexcmds.c| 3 ++- src/backend/utils/cache/relcache.c | 8 ++-- src/backend/utils/misc/guc.c| 2 +- src/backend/utils/time/snapmgr.c| 10 ++ src/fe_utils/print.c| 3 ++- 10 files changed, 25 insertions(+), 13 deletions(-) diff --git a/contrib/pgcrypto/px.c b/contrib/pgcrypto/px.c index 0f02fb56c4..ec9cf99086 100644 --- a/contrib/pgcrypto/px.c +++ b/contrib/pgcrypto/px.c @@ -200,7 +200,7 @@ combo_init(PX_Combo *cx, const uint8 *key, unsigned klen, memset(ivbuf, 0, ivs); if (ivlen > ivs) memcpy(ivbuf, iv, ivs); - else + else if (ivlen > 0) memcpy(ivbuf, iv, ivlen); } diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 419da8784a..52e3782282 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -307,7 +307,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock) /* * copy the scan key, if appropriate */ - if (key != NULL) + if (key != NULL && scan->rs_base.rs_nkeys) memcpy(scan->rs_base.rs_key, key, scan->rs_base.rs_nkeys * sizeof(ScanKeyData)); /* diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c index 537e681b23..9c291cd1ab 100644 --- a/src/backend/access/heap/heapam_visibility.c +++ b/src/backend/access/heap/heapam_visibility.c @@ -1520,6 +1520,8 @@ HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple) static bool TransactionIdInArray(TransactionId xid, TransactionId *xip, Size num) { + if (!xip) + return false; return bsearch(&xid, xip, num, sizeof(TransactionId), xidComparator) != NULL; } diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 47db7a8a88..4324229a61 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -294,6 +294,7 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids, * on the same page. Check those conditions, too. 
*/ if (all_xact_same_page && xid == MyPgXact->xid && + nsubxids > 0 && nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT && nsubxids == MyPgXact->nxids && memcmp(subxids, MyProc->subxids.xids, diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index f1108ccc8b..976ab5a950 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -5201,8 +5201,9 @@ SerializeTransactionState(Size maxsize, char *start_address) { if (FullTransactionIdIsValid(s->fullTransactionId)) workspace[i++] = XidFromFullTransactionId(s->fullTransactionId); - memcpy(&workspace[i], s->childXids, - s->nChildXids * sizeof(TransactionId)); + if (s->nChildXids > 0) + memcpy(&workspace[i], s->childXids, + s->nChildXids * sizeof(TransactionId)); i += s->nChildXids; } Assert(i == nxids); diff --git a/src/backend/commands/indexcmds.c b/src/backend/co
Re: Question about some changes in 11.3
Mat Arye writes: > On Fri, May 24, 2019 at 5:05 PM Mat Arye wrote: >> 11.3 included some changes to partitioned table planning. Namely >> commit 925f46f ("Fix handling of targetlist SRFs when scan/join relation is >> known empty.") seems to redo all paths for partitioned tables >> in apply_scanjoin_target_to_paths. It clears the paths in: >> >> ``` >> if (rel_is_partitioned) >> rel->pathlist = NIL >> ``` >> >> Then the code rebuilds the paths. However, the rebuilt append path never >> gets the >> set_rel_pathlist_hook called. Thus, the work that hook did previously gets >> thrown away and the rebuilt append path can never be influenced by this >> hook. Is this intended behavior? Am I missing something? Hm. I'd say this was already broken by the invention of apply_scanjoin_target_to_paths; perhaps 11-before-11.3 managed to still work for you, but it's not hard to envision applications of set_rel_pathlist_hook for which it would not have worked. The contract for set_rel_pathlist_hook is supposed to be that it gets to editorialize on all normal (non-Gather) paths created by the core code, and that's no longer the case now that apply_scanjoin_target_to_paths can add more. > I've attached a small patch to address this discrepancy for when the > set_rel_pathlist_hook is called so that it is called for the actual paths > used for partitioned rels. Please let me know if I am misunderstanding how > this should be handled. I'm not very happy with this patch either, as it makes the situation even more confused, not less so. The best-case scenario is that the set_rel_pathlist_hook runs twice and does useless work; the worst case is that it gets confused completely by being called twice for the same rel. I think we need to maintain the invariant that that hook is called exactly once per baserel. I wonder whether we could fix matters by postponing the set_rel_pathlist_hook call till later in the same cases where we postpone generate_gather_paths, ie, it's the only baserel. That would make its name pretty misleading, though. Maybe we should leave it alone and invent a separate hook to be called by/after apply_scanjoin_target_to_paths? Although I don't know if it'd be useful to add a new hook to v11 at this point. Extensions would have a hard time knowing if they could use it. regards, tom lane
Re: Index Skip Scan
> On Sat, Jun 1, 2019 at 6:57 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Sat, Jun 1, 2019 at 5:34 PM Floris Van Nee > > wrote: > > > > I did a little bit of investigation and it seems to occur because in > > pathkeys.c the function pathkey_is_redundant considers pathkeys redundant if > > there is an equality condition with a constant in the corresponding WHERE > > clause. > > ... > > However, the index skip scan interprets this as meaning that it has to skip over > > just > > the first column. > > Right, passing the correct number of columns fixes this particular problem. But > while debugging I've also discovered another related issue: the current > implementation seems to make a few assumptions that are not correct if we have > an index condition and a distinct column is not the first in the index. I'll > try to address these in the next version of the patch in the near future. So, as mentioned above, there were a few problems, namely the number of distinct_pathkeys with and without redundancy, and using _bt_search when the order of distinct columns doesn't match the index. As far as I can see the problem in the latter case (when we have an index condition) is that it's still possible to find a value, but the lastItem value after the search is always zero (due to _bt_checkkeys filtering) and _bt_next stops right away. To address this, probably we can do something like in the attached patch. Alongside distinct_pathkeys, uniq_distinct_pathkeys is now stored, which is the same but without the constant elimination. It is then used to get the real number of distinct keys, and to check the order of the columns so that index skip scan is not considered if it's different. Hope it doesn't look too hacky. Also I've noticed that the current implementation wouldn't work, e.g., for: select distinct a, a from table; because in this case an IndexPath is hidden behind a ProjectionPath. For now I guess it's fine, but probably it's possible here to apply skip scan too. v17-0001-Index-skip-scan.patch Description: Binary data
Re: Pinned files at Windows
On 03.06.2019 22:15, Robert Haas wrote: On Thu, May 30, 2019 at 3:25 AM Konstantin Knizhnik wrote: If the call of stat() succeeds, then my assumption is that the only reason for GetFileAttributesEx failure is that the file has been deleted, and returning the ENOENT error code in this case is correct behavior. In my experience, the assumption "the only possible cause of an error during X is Y" turns out to be wrong nearly 100% of the time. Our job is to report the errors the OS gives us, not guess what they mean. This is what we try to do now:

    r = stat(path, buf);
    if (r < 0)
    {
        if (GetLastError() == ERROR_DELETE_PENDING)
        {
            /*
             * File has been deleted, but is not gone from the filesystem yet.
             * This can happen when some process with FILE_SHARE_DELETE has it
             * open and it will be fully removed once that handle is closed.
             * Meanwhile, we can't open it, so indicate that the file just
             * doesn't exist.
             */
            errno = ENOENT;
            return -1;
        }
        return r;
    }

but without success, because ERROR_DELETE_PENDING is never returned by Win32, and moreover stat() never returns an error in this case.
Re: Sort support for macaddr8
On 6/3/19 3:23 PM, Melanie Plageman wrote: > Peter and I implemented this small (attached) patch to extend > abbreviated key compare sort to macaddr8 datatype (currently supported > for macaddr). Am I going cross-eyed, or would the memset be serving more of a purpose if it were in the SIZEOF_DATUM != 8 branch? Regards, -Chap
Re: Avoiding hash join batch explosions with extreme skew and weird stats
On Sun, May 19, 2019 at 4:07 PM Thomas Munro wrote: > On Sat, May 18, 2019 at 12:15 PM Melanie Plageman > wrote: > > On Thu, May 16, 2019 at 3:22 PM Thomas Munro > wrote: > >> Admittedly I don't have a patch, just a bunch of handwaving. One > >> reason I haven't attempted to write it is because although I know how > >> to do the non-parallel version using a BufFile full of match bits in > >> sync with the tuples for outer joins, I haven't figured out how to do > >> it for parallel-aware hash join, because then each loop over the outer > >> batch could see different tuples in each participant. You could use > >> the match bit in HashJoinTuple header, but then you'd have to write > >> all the tuples out again, which is more IO than I want to do. I'll > >> probably start another thread about that. > > > > Could you explain more about the implementation you are suggesting? > > > > Specifically, what do you mean "BufFile full of match bits in sync with > the > > tuples for outer joins?" > > First let me restate the PostgreSQL terminology for this stuff so I > don't get confused while talking about it: > > * The inner side of the join = the right side = the side we use to > build a hash table. Right and full joins emit inner tuples when there > is no matching tuple on the outer side. > > * The outer side of the join = the left side = the side we use to > probe the hash table. Left and full joins emit outer tuples when > there is no matching tuple on the inner side. > > * Semi and anti joins emit exactly one instance of each outer tuple if > there is/isn't at least one match on the inner side. > > We have a couple of relatively easy cases: > > * Inner joins: for every outer tuple, we try to find a match in the > hash table, and if we find one we emit a tuple. To add looping > support, if we run out of memory when loading the hash table we can > just proceed to probe the fragment we've managed to load so far, and > then rewind the outer batch, clear the hash table and load in the next > work_mem-sized fragment and do it again... rinse and repeat until > we've eventually processed the whole inner batch. After we've > finished looping, we move on to the next batch. > > * For right and full joins ("HJ_FILL_INNER"), we also need to emit an > inner tuple for every tuple that was loaded into the hash table but > never matched. That's done using a flag HEAP_TUPLE_HAS_MATCH in the > header of the tuples of the hash table, and a scan through the whole > hash table at the end of each batch to look for unmatched tuples > (ExecScanHashTableForUnmatched()). To add looping support, that just > has to be done at the end of every inner batch fragment, that is, > after every loop. > > And now for the cases that need a new kind of match bit, as far as I can > see: > > * For left and full joins ("HJ_FILL_OUTER"), we also need to emit an > outer tuple for every tuple that didn't find a match in the hash > table. Normally that is done while probing, without any need for > memory or match flags: if we don't find a match, we just spit out an > outer tuple immediately. But that simple strategy won't work if the > hash table holds only part of the inner batch. Since we'll be > rewinding and looping over the outer batch again for the next inner > batch fragment, we can't yet say if there will be a match in a later > loop. But the later loops don't know on their own either. So we need > some kind of cumulative memory between loops, and we only know which > outer tuples have a match after we've finished all loops. 
So there > would need to be a new function ExecScanOuterBatchForUnmatched(). > > * For semi joins, we need to emit exactly one outer tuple whenever > there is one or more match on the inner side. To add looping support, > we need to make sure that we don't emit an extra copy of the outer > tuple if there is a second match in another inner batch fragment. > Again, this implies some kind of memory between loops, so we can > suppress later matches. > > * For anti joins, we need to emit an outer tuple whenever there is no > match. To add looping support, we need to wait until we've seen all > the inner batch fragments before we know that a given outer tuple has > no match, perhaps with the same new function > ExecScanOuterBatchForUnmatched(). > > So, we need some kind of inter-loop memory, but we obviously don't > want to create another source of unmetered RAM gobbling. So one idea > is a BufFile that has one bit per outer tuple in the batch. In the > first loop, we just stream out the match results as we go, and then > somehow we OR the bitmap with the match results in subsequent loops. > After the last loop, we have a list of unmatched tuples -- just scan > it in lock-step with the outer batch and look for 0 bits. > > Unfortunately that bits-in-order scheme doesn't work for parallel > hash, where the SharedTuplestore tuples seen by each worker are > non-deterministic. So perhaps in that case we could use the >
Re: initdb recommendations
Greetings, * Noah Misch (n...@leadboat.com) wrote: > On Tue, May 28, 2019 at 12:15:35PM -0400, Magnus Hagander wrote: > > On Fri, May 24, 2019 at 11:24 AM Noah Misch wrote: > > > On Thu, May 23, 2019 at 06:56:49PM +0200, Magnus Hagander wrote: > > > > On Thu, May 23, 2019, 18:54 Peter Eisentraut > > > > wrote: > > > > > To recap, the idea here was to change the default authentication > > > > > methods > > > > > that initdb sets up, in place of "trust". > > > > > > > > > > I think the ideal scenario would be to use "peer" for local and some > > > > > appropriate password method (being discussed elsewhere) for host. > > > > > > > > > > Looking through the buildfarm, I gather that the only platforms that > > > > > don't support peer are Windows, AIX, and HP-UX. I think we can > > > > > probably > > > > > figure out some fallback or alternative default for the latter two > > > > > platforms without anyone noticing. But what should the defaults be on > > > > > Windows? It doesn't have local sockets, so the lack of peer wouldn't > > > > > matter. But is it OK to default to a password method, or would that > > > > > upset people particularly? > > > > > > > > I'm sure password would be fine there. It's what "everybody else" does > > > > (well, sqlserver also does integrated security, but people are used to > > > > it). > > > > > > Our sspi auth is a more-general version of peer auth, and it works over > > > TCP. > > > It would be a simple matter of programming to support "peer" on Windows, > > > consisting of sspi auth with an implicit pg_ident map. Nonetheless, I > > > agree > > > password would be fine. > > > > I hope you don't mean "make peer use sspi on windows". I think that's a > > really bad idea from a confusion perspective. > > I don't mean "make peer an alias for SSPI", but I do mean "implement peer on > Windows as a special case of sspi, using the same Windows APIs". To the > client, "peer" would look like "sspi". If that's confusion-prone, what's > confusing about it? I tend to agree with Magnus here. It's confusing because 'peer' in our existing parlance refers to connections over a unix socket, which certainly isn't what's happening on Windows. I do agree with the general idea of making SSPI work by default on Windows. > > However, what we could do there is have the default pg_hba.conf file contain > > a "reasonable setup using sspi", that's a different story. > > That's another way to do it. Currently, to behave like "peer" behaves, one > hard-codes the machine's SSPI realm into pg_ident.conf. If we introduced > pg_ident.conf syntax to remove that need (e.g. %MACHINE_REALM%), that approach > would work. I would be in favor of something like this, provided the variables are defined in such a way that we could avoid conflicting with real values (and remember that you'd need a regexp in pg_ident.conf for this to work...). %xyz%, while supporting %% to mean a literal percent, seems likely to work. Not sure if that's what you were thinking though. > > But I wonder if that isn't better implemented at the installer level. I > > think we're better off doing something like scram as the config when you > > build from source, and then encourage installers to do other things based on > > the fact that they know more information about the setup (such as usernames > > actually used). > > If initdb has the information needed to configure the recommended > authentication, that's the best place to do it, since there's one initdb and > many installers. 
> So far, I haven't seen a default auth configuration proposal > involving knowledge of OS usernames or other information initdb lacks. I agree with doing it at initdb time. Note that the current default auth configuration (to some extent) does depend on the OS username, but that's also something that initdb knows, and therefore it isn't an issue here. I don't see a reason that we wouldn't be able to have initdb handle this. Thanks! Stephen signature.asc Description: PGP signature
Re: Sort support for macaddr8
On Mon, Jun 3, 2019 at 1:17 PM Melanie Plageman wrote: > I tried checking to see if there is a performance difference using the > attached DDL based on src/test/regress/sql/macaddr8.sql. I found > that the sort function is only exercised when creating an index (not, > for example, when doing some type of aggregation). As you know, it's a bit weird that we're proposing adding sort support with abbreviated keys for a type that is 8 bytes, since you'd expect it to also be pass-by-value on most platforms, which largely defeats the purpose of having abbreviated keys (though sort support could still make sense, for the same reason it makes sense to have it for int8). However, macaddr8 isn't actually pass-by-value, and it seems too late to do anything about that now, so abbreviated keys actually make sense. > With the patch applied to current master and using the DDL attached, > the timing for creating the index hovered around 20 ms for master and > 15 ms for the patched version. I would expect a sufficiently large sort to execute in about half the time with abbreviation, based on previous experience. However, I think that this patch can be justified in a relatively straightforward way. It extends sort support for macaddr to macaddr8, since these two types are almost identical in every other way. We should still validate the performance out of an abundance of caution, but I would be very surprised if there was much difference between the macaddr and macaddr8 cases. In short, users should not be surprised by the big gap in performance between macaddr and macaddr8. It's worth being consistent there. > I think that that seems like an improvement. I was thinking of > registering this patch for the next commitfest. Is that okay? Definitely, yes. > I was just wondering what the accepted way to test and share > performance numbers is for a small patch like this. Is sharing DDL > enough? Do I need to use pg_bench? I've always used custom microbenchmarks for stuff like this. Repeatedly executing a particular query and taking the median execution time as representative seems to be the best approach. -- Peter Geoghegan
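A rough sketch of that kind of microbenchmark (the table name, row count, and the random-data trick are all arbitrary choices):

-- Generate random macaddr8 values: 16 hex digits with no separator is an
-- accepted macaddr8 input format.
CREATE TABLE macaddr8_bench AS
SELECT substr(md5(random()::text), 1, 16)::macaddr8 AS addr
FROM generate_series(1, 5000000);

\timing on
-- Repeat a few times, take the median, and compare patched vs. unpatched;
-- index builds exercise the sort support path.
CREATE INDEX macaddr8_bench_idx ON macaddr8_bench (addr);
DROP INDEX macaddr8_bench_idx;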
[PATCH] ruleutils: Fix subqueries with shadowed aliases
Discovered while looking into issue here: https://github.com/citusdata/citus/pull/2733 For completeness I'll quote the example code to demonstrate the issue:

postgres=# create table events_table (id integer primary key, user_id integer);
CREATE TABLE
postgres=# create table users_table_ref (id integer primary key, value_2 integer);
CREATE TABLE
postgres=# create view asdf as
SELECT r
FROM (SELECT user_id_deep, random() as r -- prevent pulling up the subquery
      FROM (events_table INNER JOIN users_table_ref
            ON (events_table.user_id = users_table_ref.value_2)) AS join_alias(user_id_deep)) AS bar,
     (events_table INNER JOIN users_table_ref
      ON (events_table.user_id = users_table_ref.value_2)) AS join_alias(user_id_deep)
WHERE (bar.user_id_deep = join_alias.user_id_deep);
CREATE VIEW
postgres=# \d+ asdf
                              View "public.asdf"
 Column |       Type       | Collation | Nullable | Default | Storage | Description
--------+------------------+-----------+----------+---------+---------+-------------
 r      | double precision |           |          |         | plain   |
View definition:
 SELECT bar.r
   FROM ( SELECT join_alias_1.user_id_deep,
            random() AS r
           FROM (events_table events_table_1
             JOIN users_table_ref users_table_ref_1 ON events_table_1.user_id = users_table_ref_1.value_2) join_alias(user_id_deep, user_id, id, value_2)) bar,
    (events_table
     JOIN users_table_ref ON events_table.user_id = users_table_ref.value_2) join_alias(user_id_deep, user_id, id, value_2)
  WHERE bar.user_id_deep = join_alias.user_id_deep;

Where the 2nd join_alias should be renamed to join_alias_1

0001-ruleutils-Fix-subqueries-with-shadowed-aliases.patch Description: 0001-ruleutils-Fix-subqueries-with-shadowed-aliases.patch
Re: Sort support for macaddr8
On Mon, Jun 3, 2019 at 2:03 PM Chapman Flack wrote: > Am I going cross-eyed, or would the memset be serving more of a purpose > if it were in the SIZEOF_DATUM != 8 branch? No, it wouldn't -- that's the correct place for it with the macaddr type. However, it isn't actually necessary to memset() at the equivalent point for macaddr8, since we cannot "run out of bytes from the authoritative representation" that go in the Datum/abbreviated key. I suppose that the memset() should simply be removed, since it is superfluous here. -- Peter Geoghegan
Use of multi-column gin index
Hello, we have some confusion over the planner's use of an index. Suppose we have a table "parades" with columns:

  "city_id" of type integer
  "description" of type text
  "start_time" of type timestamp without time zone

Suppose also we have indexes:

  "parades_city_id_description_tsv_index" gin (city_id, to_tsvector('simple'::regconfig, description)) WHERE description IS NOT NULL
  "parades_city_id_start_time_index" btree (city_id, start_time)

When we EXPLAIN the query

  SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN ()));

We get

  Bitmap Heap Scan on parades  (cost=12691.97..18831.21 rows=2559 width=886)
    Recheck Cond: ((to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL) AND (city_id = ANY ('{}'::integer[])))
    ->  BitmapAnd  (cost=12691.97..12691.97 rows=2559 width=0)
          ->  Bitmap Index Scan on parades_city_id_description_tsv_index  (cost=0.00..2902.97 rows=229463 width=0)
                Index Cond: (to_tsvector('simple'::regconfig, title) @@ '''fun'''::tsquery)
          ->  Bitmap Index Scan on parades_city_id_start_time_index  (cost=0.00..9787.47 rows=565483 width=0)
                Index Cond: (city_id = ANY ('{}'::integer[]))

When we EXPLAIN the same query but with one city_id

  SELECT * FROM "parades" WHERE ((description IS NOT NULL) AND (to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')) AND ("city_id" IN (1)));

We get

  Bitmap Heap Scan on parades  (cost=36.20..81.45 rows=20 width=886)
    Recheck Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery) AND (description IS NOT NULL))
    ->  Bitmap Index Scan on parades_city_id_description_tsv_index  (cost=0.00..36.20 rows=20 width=0)
          Index Cond: ((city_id = 1) AND (to_tsvector('simple'::regconfig, description) @@ '''fun'''::tsquery))

This leaves us with two questions:

1. How is postgres able to use parades_city_id_description_tsv_index in the first explain result without any filter on "city_id"?
2. Why does the planner in the first query decide not to simply use parades_city_id_description_tsv_index (as in the second explain result) when the cardinality of the set of "city_id"s is high?

Thanks, Jared
Re: Sort support for macaddr8
On 6/3/19 5:03 PM, Chapman Flack wrote: > On 6/3/19 3:23 PM, Melanie Plageman wrote: >> Peter and I implemented this small (attached) patch to extend >> abbreviated key compare sort to macaddr8 datatype (currently supported >> for macaddr). > > Am I going cross-eyed, or would the memset be serving more of a purpose > if it were in the SIZEOF_DATUM != 8 branch? It looks like a copy-pasto coming from mac.c, where the size of the thing to be compared isn't itself 8 bytes. With sizeof(macaddr) being 6, that original code may have had these cases in mind: - SIZEOF_DATUM is something smaller than 6 (likely 4). The whole key doesn't fit, but that's ok, because abbreviated "equality" just means to recheck with the authoritative routine. - SIZEOF_DATUM is exactly 6. Probably not a thing. - SIZEOF_DATUM is anything larger than 6 (likely 8). Needs the memset. Also, in this case, abbreviated "equality" could be taken as true equality, never needing the authoritative fallback. For macaddr8, the cases morph into these: - SIZEOF_DATUM is something smaller than 8 (likely 4). Ok; it's just an abbreviation. - SIZEOF_DATUM is exactly 8. Now an actual thing, even likely. - SIZEOF_DATUM is larger than 8. Our flying cars run postgres, and we need the memset to make sure they don't crash. This leaves me with a couple of questions: 1. (This one seems like a bug.) In the little-endian case, if SIZEOF_DATUM is smaller than the type, I'm not convinced by doing the DatumBigEndianToNative() after the memcpy(). Seems like that's too late to make sure the most-significant bytes got copied. 2. (This one seems like an API opportunity.) If it becomes common to add abbreviation support for smallish types such that (as here, when SIZEOF_DATUM >= 8) an abbreviated "equality" result is in fact authoritative, would it be worthwhile to have some way for the sort support routine to announce that fact to the caller? That could spare the caller the effort of re-checking with the authoritative routine. It could also (by making the equality case less costly) end up changing the weight assigned to the cardinality estimate in deciding whether to abbreviate. Regards, -Chap
Re: Sort support for macaddr8
On 6/3/19 5:59 PM, Chapman Flack wrote:
> On 6/3/19 5:03 PM, Chapman Flack wrote:
> 1. (This one seems like a bug.) In the little-endian case, if
>    SIZEOF_DATUM is smaller than the type, I'm not convinced by doing
>    the DatumBigEndianToNative() after the memcpy(). Seems like that's
>    too late to make sure the most-significant bytes got copied.

Wait, I definitely was cross-eyed for that one. It's the abbreviated copy
whose endianness varies. Never mind.

Regards,
-Chap
Re: Sort support for macaddr8
On Mon, Jun 3, 2019 at 2:59 PM Chapman Flack wrote:
> 1. (This one seems like a bug.) In the little-endian case, if
>    SIZEOF_DATUM is smaller than the type, I'm not convinced by doing
>    the DatumBigEndianToNative() after the memcpy(). Seems like that's
>    too late to make sure the most-significant bytes got copied.

Uh, when else would you do it? Before the memcpy()?

> 2. (This one seems like an API opportunity.) If it becomes common to
>    add abbreviation support for smallish types such that (as here,
>    when SIZEOF_DATUM >= 8), an abbreviated "equality" result is in fact
>    authoritative, would it be worthwhile to have some way for the sort
>    support routine to announce that fact to the caller? That could
>    spare the caller the effort of re-checking with the authoritative
>    routine.

It's possible that that would make sense, but I don't think that this
patch needs to do that. There is at least one pre-existing case that does
this -- macaddr.

-- 
Peter Geoghegan
Re: Use of multi-column gin index
Jared Rulison writes:
> Hello, we have some confusion over the planner's use of an index.
> ...
> 1. How is postgres able to use parades_city_id_description_tsv_index in the
> first explain result without any filter on "city_id"?

GIN indexes don't have any particular bias towards earlier or later
columns (unlike btrees). So this isn't any harder than if you'd put the
index columns in the other order.

> 2. Why does the planner in the first query decide not to simply use
> parades_city_id_description_tsv_index (as in the second explain result)
> when the cardinality of the set of "city_id"s is high?

[ shrug... ] It thinks it's cheaper. Whether it's correct is impossible
to say from the given data, but there is a moderately complex cost model
in there. The comments for gincost_scalararrayopexpr note

 * A ScalarArrayOpExpr will give rise to N separate indexscans at runtime,
 * each of which involves one value from the RHS array, plus all the
 * non-array quals (if any).

I haven't checked the actual execution code, but this seems to be saying
that the GIN indexscan executor always does ANDs before ORs. That means
that doing everything in the same GIN indexscan would require executing
the to_tsvector part 50 times, so I can definitely believe that shoving
the IN part to a different index and AND'ing afterwards is a better idea.

(Whether the GIN executor should be made smarter to avoid that is a
separate question ;-))

regards, tom lane
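Whether the BitmapAnd plan really is cheaper can also be checked empirically, since DROP INDEX is transactional. The following is only a sketch: the id list is a placeholder for the real set of city_ids, and the DROP INDEX holds an ACCESS EXCLUSIVE lock until ROLLBACK, so it is only suitable for a test copy of the data.

    BEGIN;

    -- Plan as currently chosen (BitmapAnd of the two indexes).
    EXPLAIN ANALYZE
    SELECT * FROM parades
    WHERE description IS NOT NULL
      AND to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')
      AND city_id IN (1, 2, 3);          -- placeholder list

    -- Hide the btree index; the drop is undone by the ROLLBACK below.
    DROP INDEX parades_city_id_start_time_index;

    -- Now the planner must choose between the GIN index alone and a seq scan.
    EXPLAIN ANALYZE
    SELECT * FROM parades
    WHERE description IS NOT NULL
      AND to_tsvector('simple', description) @@ to_tsquery('simple', 'fun')
      AND city_id IN (1, 2, 3);          -- placeholder list

    ROLLBACK;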
Re: WITH NOT MATERIALIZED and DML CTEs
On Mon, Jun 03, 2019 at 11:45:51AM -0400, Elvis Pranskevichus wrote:
> Currently, WITH a AS NOT MATERIALIZED (INSERT ...) would silently
> disregard the "NOT MATERIALIZED" instruction and execute the data-
> modifying CTE to completion (as per the long-standing DML CTE rule).
>
> This seems like an omission to me. Ideally, the presence of an explicit
> "NOT MATERIALIZED" clause on a data-modifying CTE should disable the
> "run to completion" logic.

It might be worth documenting the fact that NOT MATERIALIZED doesn't
affect DML CTEs, just as it doesn't affect statements with volatile
functions and recursive CTEs.

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: WITH NOT MATERIALIZED and DML CTEs
David Fetter writes:
> It might be worth documenting the fact that NOT MATERIALIZED doesn't
> affect DML CTEs, just as it doesn't affect statements with volatile
> functions and recursive CTEs.

We already do:

    However, if a WITH query is non-recursive and side-effect-free (that
    is, it is a SELECT containing no volatile functions) then it can be
    folded into the parent query, allowing joint optimization of the two
    query levels. By default, this happens if the parent query references
    the WITH query just once, but not if it references the WITH query more
    than once. You can override that decision by specifying MATERIALIZED
    to force separate calculation of the WITH query, or by specifying NOT
    MATERIALIZED to force it to be merged into the parent query. The
    latter choice risks duplicate computation of the WITH query, but it
    can still give a net savings if each usage of the WITH query needs
    only a small part of the WITH query's full output.

regards, tom lane
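To make the quoted paragraph concrete, here is a sketch of the three behaviours discussed in this thread; the table and CTE names are invented for illustration.

    CREATE TABLE events (id int PRIMARY KEY, payload text);

    -- Referenced once and side-effect-free: folded into the parent query by
    -- default, so the WHERE clause can use the primary key index.
    EXPLAIN
    WITH recent AS (SELECT * FROM events)
    SELECT * FROM recent WHERE id = 42;

    -- Force the pre-v12 behaviour: compute the CTE separately, then filter it.
    EXPLAIN
    WITH recent AS MATERIALIZED (SELECT * FROM events)
    SELECT * FROM recent WHERE id = 42;

    -- Data-modifying CTE: per this thread, NOT MATERIALIZED is accepted but
    -- silently ignored, and the INSERT still runs to completion exactly once.
    WITH ins AS NOT MATERIALIZED (
        INSERT INTO events VALUES (1, 'x') RETURNING id
    )
    SELECT count(*) FROM ins;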
Re: Confusing error message for REINDEX TABLE CONCURRENTLY
On Tue, May 28, 2019 at 12:05 PM David Rowley wrote:
> On Tue, 28 May 2019 at 01:23, Ashwin Agrawal wrote:
> > I think we will need to separate out the NOTICE message for concurrent
> > and regular case.
> >
> > For example this doesn't sound correct
> > WARNING: cannot reindex exclusion constraint index "public.circles_c_excl" concurrently, skipping
> > NOTICE: table "circles" has no indexes to reindex
> >
> > As no indexes can't be reindexed *concurrently* but there are still
> > indexes which can be reindexed, invalid indexes I think fall in same
> > category.
>
> Swap "can't" for "can" and, yeah. I think it would be good to make the
> error messages differ for these two cases. This would serve as a hint
> to the user that they might have better luck trying without the
> "concurrently" option.

Please check if the attached patch addresses and satisfies all the points
discussed so far in this thread.

I was thinking of adding an explicit errhint for the concurrent-case
NOTICE, to convey that either the table has no indexes or they can only be
reindexed without CONCURRENTLY. But I thought that may be obvious; feel
free to add one if it would be helpful.

From 2e9eceff07d9bdadef82a54c17f399263436bf9d Mon Sep 17 00:00:00 2001
From: Ashwin Agrawal
Date: Mon, 3 Jun 2019 16:30:39 -0700
Subject: [PATCH v2] Avoid confusing error message for REINDEX.

---
 src/backend/commands/indexcmds.c           | 26 +++---
 src/test/regress/expected/create_index.out | 10 -
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 40ea629ffe7..bab20028090 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2438,17 +2438,25 @@ ReindexTable(RangeVar *relation, int options, bool concurrent)
 									RangeVarCallbackOwnsTable, NULL);
 
 	if (concurrent)
+	{
 		result = ReindexRelationConcurrently(heapOid, options);
+
+		if (!result)
+			ereport(NOTICE,
+					(errmsg("table \"%s\" has no indexes that can be concurrently reindexed",
+							relation->relname)));
+	}
 	else
+	{
 		result = reindex_relation(heapOid,
 								  REINDEX_REL_PROCESS_TOAST |
 								  REINDEX_REL_CHECK_CONSTRAINTS,
 								  options);
-
-	if (!result)
-		ereport(NOTICE,
-				(errmsg("table \"%s\" has no indexes",
-						relation->relname)));
+		if (!result)
+			ereport(NOTICE,
+					(errmsg("table \"%s\" has no indexes to reindex",
+							relation->relname)));
+	}
 
 	return heapOid;
 }
@@ -2630,7 +2638,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 	foreach(l, relids)
 	{
 		Oid			relid = lfirst_oid(l);
-		bool		result;
 
 		StartTransactionCommand();
 		/* functions in indexes may want a snapshot set */
@@ -2638,11 +2645,12 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 
 		if (concurrent)
 		{
-			result = ReindexRelationConcurrently(relid, options);
+			ReindexRelationConcurrently(relid, options);
 			/* ReindexRelationConcurrently() does the verbose output */
 		}
 		else
 		{
+			bool		result;
 			result = reindex_relation(relid,
 									  REINDEX_REL_PROCESS_TOAST |
 									  REINDEX_REL_CHECK_CONSTRAINTS,
@@ -2656,7 +2664,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
 
 			PopActiveSnapshot();
 		}
-
 		CommitTransactionCommand();
 	}
 	StartTransactionCommand();
@@ -2676,6 +2683,9 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
  * The locks taken on parent tables and involved indexes are kept until the
  * transaction is committed, at which point a session lock is taken on each
  * relation.  Both of these protect against concurrent schema changes.
+ *
+ * Returns true if any indexes have been rebuilt (including toast table's
+ * indexes when relevant).
  */
 static bool
 ReindexRelationConcurrently(Oid relationOid, int options)
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index c8bc6be0613..72a8fca48e4 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1944,9 +1944,9 @@ DROP TABLE reindex_verbose;
 CREATE TABLE concur_reindex_tab (c1 int);
 -- REINDEX
 REINDEX TABLE concur_reindex_tab; -- notice
-NOTICE: table "concur_reindex_tab" has no indexes
+NOTICE: table "concur_reindex_tab" has no indexes to reindex
 REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
-NOTICE: table "concur_reindex_tab" has no indexes
+NOTICE: table "concur_reindex_tab" has no indexes that can be concurrently reindexed
 ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
 -- Normal index with integer column
 CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
@@ -2043,10 +2043,10 @@ ERROR: REINDEX is not yet implemented for partitioned indexes
 -- REINDEX is a no-op for partitioned tables
 REINDEX TABLE concur_reindex_part_10;
 WARNING: REINDEX of partition
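For readers skimming the diff, the user-visible effect of this v2 patch on a table with no indexes would look roughly like the following sketch, taken from the regression output above; the table name is invented, and the exact NOTICE wording is still open to review comments later in the thread.

    CREATE TABLE concur_reindex_demo (c1 int);

    REINDEX TABLE concur_reindex_demo;
    -- NOTICE: table "concur_reindex_demo" has no indexes to reindex

    REINDEX TABLE CONCURRENTLY concur_reindex_demo;
    -- NOTICE: table "concur_reindex_demo" has no indexes that can be concurrently reindexed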
Re: Pinned files at Windows
On Mon, Jun 03, 2019 at 11:37:30PM +0300, Konstantin Knizhnik wrote:
> but without success because ERROR_DELETE_PENDING is never returned by Win32.
> And moreover, stat() doesn't ever return error in this case.

Could it be possible to find a reliable way to detect that? Clobbering
errno with an incorrect value is not something we can rely on, and I am
ready to buy that GetFileAttributesEx() can also return EACCES for some
legit cases, like a file it has no access to. What if for example
something is done on a file between the stat() call and the
GetFileAttributesEx() call in pgwin32_safestat() so that EACCES is a
legit error?
--
Michael

signature.asc
Description: PGP signature
Re: Comment typo in tableam.h
There were a few more minor typos I had collected for table AM; passing
them along here. Some of the required callback functions are missing
Assert checking (minor thing); adding them in a separate patch.

From f32bdf5d0d3af5fd6ee6bf6430905f9c4bf5fefa Mon Sep 17 00:00:00 2001
From: Ashwin Agrawal
Date: Fri, 24 May 2019 16:30:38 -0700
Subject: [PATCH v1 1/2] Fix typos in few tableam comments.

---
 src/backend/access/table/tableamapi.c |  2 +-
 src/include/access/tableam.h          | 14 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 32877e7674f..0df5ba35803 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -91,7 +91,7 @@ GetTableAmRoutine(Oid amhandler)
 
 	Assert(routine->relation_estimate_size != NULL);
 
-	/* optional, but one callback implies presence of hte other */
+	/* optional, but one callback implies presence of the other */
 	Assert((routine->scan_bitmap_next_block == NULL) ==
 		   (routine->scan_bitmap_next_tuple == NULL));
 	Assert(routine->scan_sample_next_block != NULL);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 0b6ac15d316..2d06d52d71f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -231,7 +231,7 @@ typedef struct TableAmRoutine
 
 	/*
 	 * Estimate the size of shared memory needed for a parallel scan of this
-	 * relation. The snapshot does not need to be accounted for.
+	 * relation.
 	 */
 	Size		(*parallelscan_estimate) (Relation rel);
 
@@ -434,8 +434,8 @@ typedef struct TableAmRoutine
 	 *
 	 * Note that only the subset of the relcache filled by
 	 * RelationBuildLocalRelation() can be relied upon and that the relation's
-	 * catalog entries either will either not yet exist (new relation), or
-	 * will still reference the old relfilenode.
+	 * catalog entries will either not yet exist (new relation), or will still
+	 * reference the old relfilenode.
 	 *
 	 * As output *freezeXid, *minmulti must be set to the values appropriate
 	 * for pg_class.{relfrozenxid, relminmxid}. For AMs that don't need those
@@ -591,7 +591,7 @@ typedef struct TableAmRoutine
 	 * See table_relation_estimate_size().
 	 *
 	 * While block oriented, it shouldn't be too hard for an AM that doesn't
-	 * doesn't internally use blocks to convert into a usable representation.
+	 * internally use blocks to convert into a usable representation.
 	 *
 	 * This differs from the relation_size callback by returning size
 	 * estimates (both relation size and tuple count) for planning purposes,
@@ -967,7 +967,7 @@ table_index_fetch_end(struct IndexFetchTableData *scan)
 *
 * *all_dead, if all_dead is not NULL, will be set to true by
 * table_index_fetch_tuple() iff it is guaranteed that no backend needs to see
- * that tuple. Index AMs can use that do avoid returning that tid in future
+ * that tuple. Index AMs can use that to avoid returning that tid in future
 * searches.
 *
 * The difference between this function and table_fetch_row_version is that
@@ -1014,8 +1014,8 @@ extern bool table_index_fetch_tuple_check(Relation rel,
 * true, false otherwise.
 *
 * See table_index_fetch_tuple's comment about what the difference between
- * these functions is. This function is the correct to use outside of
- * index entry->table tuple lookups.
+ * these functions is. This function is correct to use outside of index
+ * entry->table tuple lookups.
 */
 static inline bool
 table_tuple_fetch_row_version(Relation rel,
-- 
2.19.1

From 022569d249918da60d145d7877dc0f8df4ccc6cd Mon Sep 17 00:00:00 2001
From: Ashwin Agrawal
Date: Mon, 3 Jun 2019 17:07:05 -0700
Subject: [PATCH v1 2/2] Add assertions for required table am callbacks.

---
 src/backend/access/table/tableamapi.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 0df5ba35803..55bd1eea3db 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -50,6 +50,7 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_begin != NULL);
 	Assert(routine->scan_end != NULL);
 	Assert(routine->scan_rescan != NULL);
+	Assert(routine->scan_getnextslot != NULL);
 
 	Assert(routine->parallelscan_estimate != NULL);
 	Assert(routine->parallelscan_initialize != NULL);
@@ -61,8 +62,12 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->index_fetch_tuple != NULL);
 
 	Assert(routine->tuple_fetch_row_version != NULL);
+	Assert(routine->tuple_tid_valid != NULL);
+	Assert(routine->tuple_get_latest_tid != NULL);
 	Assert(routine->tuple_satisfies_snapshot != NULL);
+	Assert(routine->compute_xid_horizon_for_tuples != NULL);
+
 	Assert(routine->tuple_insert != NULL);
 
 	/*
@@ -88,6 +93,7 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->index_validate_scan != NULL);
 
 	Assert(routine->relation_size != NULL);
+	Assert(routine->r
Re: Comment typo in tableam.h
Hi, Thanks for these! On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote: > /* >* Estimate the size of shared memory needed for a parallel scan of this > - * relation. The snapshot does not need to be accounted for. > + * relation. >*/ > Size(*parallelscan_estimate) (Relation rel); That's not a typo? Greetings, Andres Freund
Re: Fix inconsistencies for v12
On Mon, Jun 3, 2019 at 10:56 PM Tom Lane wrote: > > Amit Langote writes: > > On 2019/05/30 18:51, Amit Kapila wrote: > >> I think it will be better to include postgres_fdw in the comment in > >> some way so that if someone wants a concrete example, there is > >> something to refer to. > > > Maybe a good idea, but this will be the first time to mention postgres_fdw > > in the core source code. If you think that's OK, how about the attached? > > This wording seems fine to me. > > Now that we've beat that item into the ground ... there were a bunch of > other tweaks suggested in Alexander's initial email. Amit (K), were you > going to review/commit those? > Yes, I am already reviewing those. I will post my comments today. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WITH NOT MATERIALIZED and DML CTEs
On Mon, Jun 03, 2019 at 07:33:35PM -0400, Tom Lane wrote: > David Fetter writes: > > It might be worth documenting the fact that NOT MATERIALIZED doesn't > > affect DML CTEs, just as it doesn't affect statements with volatile > > functions and recursive CTEs. > > We already do: > > However, if a WITH query is non-recursive and side-effect-free (that > is, it is a SELECT containing no volatile functions) then it can be I guess this part makes it pretty clear that DML isn't part of the party just yet. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Table AM callbacks referring to heap in declarations (+typos)
On Sat, Jun 01, 2019 at 12:43:11PM -0700, Andres Freund wrote: > I don't mind at all (although it's imo not a fix for the s/heap/table)! Thanks, committed what I had. -- Michael signature.asc Description: PGP signature
Re: Binary support for pgoutput plugin
On Mon, Jun 03, 2019 at 10:49:54AM -0400, Dave Cramer wrote: > Is there a reason why pgoutput sends data in text format? Seems to > me that sending data in binary would provide a considerable > performance improvement. Are you seeing something that suggests that the text output is taking a lot of time or other resources? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
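One way to see what pgoutput emits today is to peek at a slot's changes from SQL; this is only a sketch (it requires wal_level = logical, and the slot and publication names below are invented). The data column returned is bytea, but the column values carried inside the change messages are rendered through each type's text output function, which is the behaviour the question above is about.

    CREATE PUBLICATION binfmt_test_pub FOR ALL TABLES;

    SELECT pg_create_logical_replication_slot('binfmt_test_slot', 'pgoutput');

    -- proto_version and publication_names are the options pgoutput expects.
    SELECT lsn, xid, data
    FROM pg_logical_slot_peek_binary_changes('binfmt_test_slot', NULL, NULL,
                                             'proto_version', '1',
                                             'publication_names', 'binfmt_test_pub');

    SELECT pg_drop_replication_slot('binfmt_test_slot');
    DROP PUBLICATION binfmt_test_pub;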
Re: Comment typo in tableam.h
On Mon, Jun 3, 2019 at 5:26 PM Andres Freund wrote:
> Hi,
>
> Thanks for these!
>
> On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote:
> > 	/*
> > 	 * Estimate the size of shared memory needed for a parallel scan of this
> > -	 * relation. The snapshot does not need to be accounted for.
> > +	 * relation.
> > 	 */
> > 	Size		(*parallelscan_estimate) (Relation rel);
>
> That's not a typo?

The snapshot is not passed as an argument to that function, hence it seems
weird to refer to the snapshot in the comment, as the callback function
can't account for it anyway. It seems like a stale piece of comment, and
hence that piece of text should be removed. I should have referred to the
changes as general comment fixes instead of explicit typo fixes :-)
Re: Comment typo in tableam.h
Hi, On 2019-06-03 18:21:56 -0700, Ashwin Agrawal wrote: > On Mon, Jun 3, 2019 at 5:26 PM Andres Freund wrote: > > > Hi, > > > > Thanks for these! > > > > On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote: > > > /* > > >* Estimate the size of shared memory needed for a parallel scan > > of this > > > - * relation. The snapshot does not need to be accounted for. > > > + * relation. > > >*/ > > > Size(*parallelscan_estimate) (Relation rel); > > > > That's not a typo? > > > > The snapshot is not passed as argument to that function hence seems weird > to refer to snapshot in the comment, as anyways callback function can't > account for it. It's part of the parallel scan struct, and it used to be accounted for by pre tableam function... Greetings, Andres Freund
Re: Confusing error message for REINDEX TABLE CONCURRENTLY
On Mon, Jun 03, 2019 at 04:53:48PM -0700, Ashwin Agrawal wrote:
> Please check if the attached patch addresses and satisfies all the points
> discussed so far in this thread.

It looks to be so, please see below for some comments.

> +	{
> 		result = ReindexRelationConcurrently(heapOid, options);
> +
> +		if (!result)
> +			ereport(NOTICE,
> +					(errmsg("table \"%s\" has no indexes that can be concurrently reindexed",
> +							relation->relname)));

"concurrently" should be at the end of this string. I have had the exact
same argument with Tom for 508300e.

> @@ -2630,7 +2638,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
> 	foreach(l, relids)
> 	{
> 		Oid			relid = lfirst_oid(l);
> -		bool		result;
>
> 		StartTransactionCommand();
> 		/* functions in indexes may want a snapshot set */
> @@ -2638,11 +2645,12 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
>
> 		if (concurrent)
> 		{
> -			result = ReindexRelationConcurrently(relid, options);
> +			ReindexRelationConcurrently(relid, options);
> 			/* ReindexRelationConcurrently() does the verbose output */

Indeed this variable is not used. So we could just get rid of it
completely.

> +			bool		result;
> 			result = reindex_relation(relid,
> 									  REINDEX_REL_PROCESS_TOAST |
> 									  REINDEX_REL_CHECK_CONSTRAINTS,
> @@ -2656,7 +2664,6 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
>
> 			PopActiveSnapshot();
> 		}

The table has been considered for reindexing even if nothing has been
reindexed, so perhaps we'd want to keep this part as-is? We have the same
level of reporting for a couple of releases for this part.

> -
> 		CommitTransactionCommand();

Useless noise diff.
--
Michael

signature.asc
Description: PGP signature
Re: Print baserestrictinfo for varchar fields
On Mon, Jun 03, 2019 at 01:37:22AM -0400, Tom Lane wrote: > It's hard to muster much enthusiasm for extending print_expr(), > considering how incomplete and little-used it is. I'd rather > spend effort on ripping it out in favor of using the far more > complete, and better-tested, code in ruleutils.c. If it is possible to get the same amount of coverage when debugging the planner, count me in. Now it seems to me that we'd still require some work to get the same level of information as for range table entry kinds.. -- Michael signature.asc Description: PGP signature
Re: Comment typo in tableam.h
On Mon, Jun 3, 2019 at 6:24 PM Andres Freund wrote:
> Hi,
>
> On 2019-06-03 18:21:56 -0700, Ashwin Agrawal wrote:
> > On Mon, Jun 3, 2019 at 5:26 PM Andres Freund wrote:
> > > Hi,
> > >
> > > Thanks for these!
> > >
> > > On 2019-06-03 17:24:15 -0700, Ashwin Agrawal wrote:
> > > > 	/*
> > > > 	 * Estimate the size of shared memory needed for a parallel scan of this
> > > > -	 * relation. The snapshot does not need to be accounted for.
> > > > +	 * relation.
> > > > 	 */
> > > > 	Size		(*parallelscan_estimate) (Relation rel);
> > >
> > > That's not a typo?
> >
> > The snapshot is not passed as argument to that function hence seems weird
> > to refer to snapshot in the comment, as anyways callback function can't
> > account for it.
>
> It's part of the parallel scan struct, and it used to be accounted for
> by pre tableam function...

It reads like the comment was written from that past context, then, and
doesn't have much value now. It's more confusing than helpful to state
that the snapshot, and not any other field, need not be accounted for.
table_parallelscan_estimate() has a snapshot argument and it accounts for
it, but the callback doesn't. I am not sure how a callback would
explicitly use that comment and avoid accounting for the snapshot if it's
using the generic ParallelTableScanDescData. But if you feel it is
helpful, please feel free to keep that text.
Re: coverage additions
On Sat, Jun 01, 2019 at 12:55:47AM -0400, Alvaro Herrera wrote: > Ah, now I remember that I tried this before, but it requires some extra > packages installed in the machine I think, and those create running > services. Did you note that src/backend/libpq does not even list the > gssapi file? Do you mean the header file be-gssapi-common.h? It is stored in src/backend/libpq/ which is obviously incorrect. I think that it should be moved to src/include/libpq/be-gssapi-common.h. Its identification marker even says that. Perhaps that's because of MSVC? Stephen? -- Michael signature.asc Description: PGP signature
Re: Fix inconsistencies for v12
Hi Andres,

I have added you here as some of these are related to commits done by you,
so I need your opinion on the same. I have mentioned where your feedback
will be helpful.

On Sun, May 26, 2019 at 2:20 AM Alexander Lakhin wrote:
>
> 6. ExecGetResultSlot - remove (not used since introduction in 1a0586de)
>

Yeah, I also think this is not required. Andres, this API is not defined.
Is it intended for some purpose?

> 8. heap_parallelscan_initialize -> remove the sentence (changed in c2fe139c)
>

The same check has been moved to table_block_parallelscan_initialize. So,
I think instead of removing the sentence you need to change the function
name in the comment.

> 10. HeapTupleSatisfiesSnapshot -> HeapTupleSatisfiesVisibility (an
> internal inconsistency)
>

  * This is an interface to HeapTupleSatisfiesVacuum that's callable via
- * HeapTupleSatisfiesSnapshot, so it can be used through a Snapshot.
+ * HeapTupleSatisfiesVisibility, so it can be used through a Snapshot.

I think now we don't need to write the second half of the comment ("so it
can be used through a Snapshot"). It made more sense with the previous
style of API.

Another related point:

 * HeapTupleSatisfiesNonVacuumable
 *
 * True if tuple might be visible to some transaction; false if it's
 * surely dead to everyone, ie, vacuumable.
 *
 * See SNAPSHOT_TOAST's definition for the intended behaviour.

Here, I think instead of SNAPSHOT_TOAST, we should mention
SNAPSHOT_NON_VACUUMABLE. Andres, do you have any comments on the proposed
changes?

> 14. latestRemovedxids -> latestRemovedXids (an inconsistent case)

  * Conjecture: if hitemid is dead then it had xids before the xids
  * marked on LP_NORMAL items. So we just ignore this item and move
  * onto the next, for the purposes of calculating
- * latestRemovedxids.
+ * latestRemovedXids.

I think it should be latestRemovedXid.

> 16. NextSampletuple -> NextSampleTuple (an inconsistent case)

I think this doesn't make much difference, but we can fix it so that
NextSampleTuple's occurrence can be found during grep.

> 20. RelationGetOidIndex -> just to remove the paragraph (orphaned after
> 578b2297)

- * This is exported separately because there are cases where we want to use
- * an index that will not be recognized by RelationGetOidIndex: TOAST tables
- * have indexes that are usable, but have multiple columns and are on
- * ordinary columns rather than a true OID column. This code will work
- * anyway, so long as the OID is the index's first column. The caller must
- * pass in the actual heap attnum of the OID column, however.
- *

Instead of removing the entire paragraph, how about changing it like "This
also handles the special cases where TOAST tables have indexes that are
usable, but have multiple columns and are on ordinary columns rather than
a true OID column. This code will work anyway, so long as the OID is the
index's first column. The caller must pass in the actual heap attnum of
the OID column, however."

Andres, any suggestions?

> 27. XactTopTransactionId -> XactTopFullTransactionId (an internal
> inconsistency)
>

- * XactTopTransactionId stores the XID of our toplevel transaction, which
+ * XactTopFullTransactionId stores the XID of our toplevel transaction, which
  * will be the same as TopTransactionState.transactionId in an ordinary

I think in the above sentence, now we need to use
TopTransactionState.fullTransactionId.

Note that I agree with your changes for the points where I have not
responded anything.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
PG 11 JIT deform failure
Hi,

JIT slot_compile_deform assumes the TupleDesc has at least 'natts'
attributes, e.g.

	/*
	 * Iterate over each attribute that needs to be deformed, build code to
	 * deform it.
	 */
	for (attnum = 0; attnum < natts; attnum++)
	{
		Form_pg_attribute att = TupleDescAttr(desc, attnum);

but a new TupleDesc has no attributes, and the caller only tests that the
TupleDesc is not null.

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 0da318218f..3831ed2f5a 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -336,7 +336,7 @@ llvm_compile_expr(ExprState *state)
 					 * function specific to tupledesc and the exact number of
 					 * to-be-extracted attributes.
 					 */
-					if (desc && (context->base.flags & PGJIT_DEFORM))
+					if (desc && desc->natts >= op->d.fetch.last_var && (context->base.flags & PGJIT_DEFORM))
 					{
 						LLVMValueRef params[1];
 						LLVMValueRef l_jit_deform;