[HACKERS] Showing parallel status in \df+

2016-07-08 Thread Michael Paquier
Hi all,

Fujii-san has reminded me that we do not show the parallel status of a
function in \df+. The output of \df+ is already very wide, so I guess
anyone sane already uses it with expanded display mode, and adding one
more piece of information may not matter much.
Thoughts about adding this piece of information?
-- 
Michael




Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Magnus Hagander
On Friday, July 8, 2016, Michael Paquier  wrote:

> Hi all,
>
> Fujii-san has reminded me that we do not show the parallel status of a
> function in \df+. The output of \df+ is already very wide, so I guess
> anyone sane already uses it with expanded display mode, and adding one
> more piece of information may not matter much.
> Thoughts about adding this piece of information?
>
>
Seems like a good idea to me. It's going to be useful in debugging.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Pavel Stehule
2016-07-08 9:00 GMT+02:00 Michael Paquier :

> Hi all,
>
> Fujii-san has reminded me that we do not show the parallel status of a
> function in \df+. The output of \df+ is already very wide, so I guess
> anyone sane already uses it with expanded display mode, and adding one
> more piece of information may not matter much.
> Thoughts about adding this piece of information?
>

It has 11 columns already. I don't see any problem with showing a few
more; that is better than missing important information.

Regards

Pavel

> --
> Michael


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Craig Ringer
On 8 July 2016 at 03:50, Pete Stevenson  wrote:

> Hi Simon -
>
> Thanks for the note. I think it's fair to say that I didn't provide enough
> context, so let me try and elaborate on my question.
>

Please reply in-line in posts to make it easier to follow conversations
with multiple people.


> It is the case that for the database to implement MVCC it must provide
> consistent read to multiple different versions of data, i.e. depending on
> the version used at transaction start.
>

Not necessarily transaction start; under READ COMMITTED isolation a
snapshot taken at statement start is sufficient, or even weaker than that.


> I'm not an expert on postgresql internals, but this must have some cost.
>

Sure it does: disk space, less efficient use of RAM for disk cache, the
CPU cost of scanning over not-visible tuples, etc.


> I think the cost related to MVCC guarantees can roughly be categorized as:
> creating new versions (linking them in)
>

The way PostgreSQL does that (see the manual) is pretty lightweight: you
will already have found the old tuple, so setting its xmax is cheap, and
writing the new tuple costs much the same as an insert.
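
If you want to watch this from SQL, the system columns xmin, xmax and
ctid expose the versioning directly; a quick sketch (table name mine):

CREATE TABLE demo (id int PRIMARY KEY, val text);
INSERT INTO demo VALUES (1, 'a');
SELECT xmin, xmax, ctid, * FROM demo;    -- xmin = inserting xid, xmax = 0
UPDATE demo SET val = 'b' WHERE id = 1;
SELECT xmin, xmax, ctid, * FROM demo;    -- new row version: new xmin, new ctid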


> version checking on read
>

Yep. In particular for index scans, because PostgreSQL doesn't maintain
visibility information in indexes. Read up on PostgreSQL's MVCC
implementation, index scans, index-only scans, the visibility map, etc.
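
The visibility map itself is easy to poke at with the pg_visibility
contrib module (new in 9.6); a sketch, reusing the demo table above:

CREATE EXTENSION pg_visibility;
-- per-page all-visible / all-frozen bits for a table
SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('demo');
-- summary counters kept in pg_class
SELECT relpages, relallvisible FROM pg_class WHERE relname = 'demo';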


> garbage collecting old versions
>

As implemented in PostgreSQL by VACUUM.


> and then there is an additional cost that I am interested in (again not
> claiming it is unnecessary in any sense) but there is a cost to generating
> the log.
>

The write-ahead log is orthogonal to MVCC. You can have MVCC without WAL
(or other write durability). You can have write durability without MVCC.
The two are almost entirely unrelated.


> Thanks, by the way, for the warning about lab vs. reality. That's why I'm
> asking this question here. I want to keep the hypothetical tagged as such,
> but find defensible and realistic metrics where those exist, i.e. in this
> instance, we do have a database that can use MVCC. It should be possible to
> figure out how much work goes into maintaining that property.
>

MVCC logic is woven deeply throughout PostgreSQL. I'm not sure how you'd
even begin to offload it in any meaningful way, nor if it'd be useful to do
so. Presumably you're thinking of some way to tell the storage layer "show
me the table as if it has only rows visible to [this xact]" so Pg doesn't
have to do any checking at all. But it's not always that simple. See:

- Logical decoding (time travel)
- VACUUM
- EvalPlanQual, re-checks of updated rows
- ...


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] A Modest Upgrade Proposal

2016-07-08 Thread Simon Riggs
On 8 July 2016 at 03:55, Tom Lane  wrote:


> > One of my examples was full text search and it does have
> > DDL, but that was an anti-example; all the feedback I have is that it was
> > much easier to use before it had DDL and that forcing it to use DDL pretty
> > much killed it for most users.
>
> That's just unsupported FUD.


No, it's me relaying opinions I have heard back to this list, for the
purpose of understanding them.

("Fear, Uncertainty and Doubt", or FUD, doesn't apply here, unless it's
meant in the same way as "that's rubbish, I disagree".)


> I would say that most of the problems we've
> had with text search DDL came from the fact that previously people had
> done things in other ways and transitioning was hard.  That experience
> doesn't make me want to repeat it; but building a feature that's supposed
> to be managed by direct catalog updates is precisely repeating that
> mistake.
>
> I'm okay with things like replication configuration being managed outside
> the system catalogs entirely (as indeed they are now).  But if a feature
> has a catalog, it should have DDL to manipulate the catalog.


It's a good rule. In this case all it does is move the discussion to
"should it have a catalog?".

Let me return to my end point from last night: it's becoming clear that
asking the question "DDL or not?" is too high-level a question and is
leading to argument. The most likely answer is "some", but I'm still not sure.
I am looking at this in more detail and will return in a few days with a
much more specific design that we can use to answer the question in detail.


> Direct SQL
> on a catalog should *never* become standard operating procedure.
>

Agreed, but it has always been regarded as something to consider when
making DDL work.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] Re: [BUGS] BUG #14230: Wrong timeline returned by pg_stop_backup on a standby

2016-07-08 Thread Marco Nenciarini
On 07/07/16 08:38, Michael Paquier wrote:
> On Thu, Jul 7, 2016 at 12:57 AM, Marco Nenciarini
>  wrote:
>> After further analysis, the issue is that we retrieve starttli from
>> the ControlFile structure, but ThisTimeLineID was being used when
>> writing the backup label.
>>
>> I've attached a very simple patch that fixes it.
> 
> ThisTimeLineID is always set to 0 on purpose on a standby, so we
> cannot rely on it (well, it is set temporarily when recycling old
> segments). At recovery, when parsing the backup_label file, there is no
> actual use of the start segment name, so that's only a cosmetic
> change. But surely it would be better to get that fixed, because
> it's useful for debugging.
> 
> While looking at your patch, I thought that it would have been
> tempting to use GetXLogReplayRecPtr() to get the timeline ID when in
> recovery, but what we really want to know here is the timeline of the
> last REDO pointer, which is starttli, and that's more consistent with
> the fact that we use startpoint when writing the backup_label file. In
> short, +1 for this fix.
> 
> I am adding that in the list of open items, adding Magnus in CC whose
> commit for non-exclusive backups is at the origin of this defect.
> 

While we were testing the patch we noticed another behavior that is not
strictly a bug, but can confuse backup tools:

To quickly produce some WAL files we were executing a series of
pg_switch_xlog+CHECKPOINT, and we noticed that doing a backup from a
standby after that results in a startpoint higher than the stoppoint.

Let me show it on a brand new master/replica cluster (master is port
5496, replica is 6496). The script is attached.

---
You are now connected to database "postgres" as user "postgres" via
socket in "/tmp" at port "5496".
SELECT pg_is_in_recovery();
-[ RECORD 1 ]-+--
pg_is_in_recovery | f

CHECKPOINT;
CHECKPOINT
SELECT pg_switch_xlog();
-[ RECORD 1 ]--+--
pg_switch_xlog | 0/3E8

CHECKPOINT;
CHECKPOINT
SELECT pg_switch_xlog();
-[ RECORD 1 ]--+--
pg_switch_xlog | 0/4E8

You are now connected to database "postgres" as user "postgres" via
socket in "/tmp" at port "6496".
SELECT pg_is_in_recovery();
-[ RECORD 1 ]-+--
pg_is_in_recovery | t

SELECT pg_start_backup('tst backup',TRUE,FALSE);
-[ RECORD 1 ]---+--
pg_start_backup | 0/428

SELECT * FROM pg_stop_backup(FALSE);
-[ RECORD 1 ]-
lsn| 0/2F8
labelfile  | START WAL LOCATION: 0/428 (file 0004)+
   | CHECKPOINT LOCATION: 0/460   +
   | BACKUP METHOD: streamed  +
   | BACKUP FROM: standby +
   | START TIME: 2016-07-08 10:46:55 CEST +
   | LABEL: tst backup+
   |
spcmapfile |

SELECT * FROM pg_control_checkpoint();
-[ RECORD 1 ]+-
checkpoint_location  | 0/460
prior_location   | 0/260
redo_location| 0/428
redo_wal_file| 00010004
timeline_id  | 1
prev_timeline_id | 1
full_page_writes | t
next_xid | 0:865
next_oid | 12670
next_multixact_id| 1
next_multi_offset| 0
oldest_xid   | 858
oldest_xid_dbid  | 1
oldest_active_xid| 865
oldest_multi_xid | 1
oldest_multi_dbid| 1
oldest_commit_ts_xid | 865
newest_commit_ts_xid | 865
checkpoint_time  | 2016-07-08 10:46:55+02

SELECT * FROM pg_control_recovery();
-[ RECORD 1 ]-+--
min_recovery_end_location | 0/2F8
min_recovery_end_timeline | 1
backup_start_location | 0/0
backup_end_location   | 0/0
end_of_backup_record_required | f

---

In particular, the pg_start_backup LSN is 0/428 and the
pg_stop_backup LSN is 0/2F8.


The same issue is present when you do a backup using pg_basebackup:

---
transaction log start point: 0/828 on timeline 1
pg_basebackup: starting background WAL receiver
22244/22244 kB (100%), 1/1 tablespace
transaction log end point: 0/2F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
---

The resulting backup works perfectly, because Postgres has no use for
the pg_stop_backup LSN, but this can confuse any tool that uses the stop
LSN to figure out which WAL files are needed by the backup (in this case
the only file needed is the one containing the start checkpoint).
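
For reference, the name of the WAL file that a given LSN falls in can be
computed with pg_xlogfile_name() (on the master only; it cannot be
executed during recovery), e.g. with the start LSN above:

SELECT pg_xlogfile_name('0/428'::pg_lsn);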

After some discussion with Álvaro, my proposal is to avoid that by
returning the stoppoint as the maximum between the startpoint and the
min_recovery_end_location, in case of backup from the standby.

Re: [HACKERS] A Modest Upgrade Proposal

2016-07-08 Thread Petr Jelinek

On 08/07/16 10:59, Simon Riggs wrote:

On 8 July 2016 at 03:55, Tom Lane  wrote:

> One of my examples was full text search and it does have
> DDL, but that was an anti-example; all the feedback I have is that it was
> much easier to use before it had DDL and that forcing it to use DDL pretty
> much killed it for most users.

That's just unsupported FUD.


No, it's me relaying opinions I have heard back to this list, for the
purpose of understanding them.

("Fear, Uncertainty and Doubt", or FUD, doesn't apply here, unless it's
meant in the same way as "that's rubbish, I disagree".)

I would say that most of the problems we've had with text search DDL
came from the fact that previously people had done things in other ways
and transitioning was hard.  That experience doesn't make me want to
repeat it; but building a feature that's supposed to be managed by
direct catalog updates is precisely repeating that mistake.

I'm okay with things like replication configuration being managed
outside the system catalogs entirely (as indeed they are now).  But if
a feature has a catalog, it should have DDL to manipulate the catalog.


It's a good rule. In this case all it does is move the discussion to
"should it have a catalog?".



I think it should have a catalog for most things. Otherwise it will be
really hard to track the mapping of tables to replication sets, etc. We'd
have to invent a mechanism for tracking dependencies outside of the
catalog, which is not an endeavor I want to go through; the feature is
complex enough without it. That means there should be DDL for those things
as well, and the work I am doing now is based on this assumption. From an
implementation point of view, DDL versus functions means tons of
additional boilerplate code, but it's not a complex thing to do.


One interesting thing will be making sure we can replicate from a physical
standby in the future, as you mentioned elsewhere in the thread, but I
think that should be possible as long as you define the catalogs on the
master (I'm not really sure yet whether there are any barriers).


About pg_dump support: while I don't think we'll necessarily want to
dump all the information related to logical replication (like
subscriptions), I definitely think we should dump replication sets and
their membership info, for example.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-08 Thread Simon Riggs
On 8 July 2016 at 11:09, Petr Jelinek  wrote:

>
> One interesting thing will be making sure we can replicate from a physical
> standby in the future, as you mentioned elsewhere in the thread, but I
> think that should be possible as long as you define the catalogs on the
> master (I'm not really sure yet whether there are any barriers).
>

Agreed, after having spent the morning working on the details.


> About pg_dump support: while I don't think we'll necessarily want to
> dump all the information related to logical replication (like
> subscriptions), I definitely think we should dump replication sets and
> their membership info, for example.


Agreed

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] doc: Incorrect return type of IsForeignScanParallelSafe in fdwhandler.sgml

2016-07-08 Thread Etsuro Fujita

Hi,

I noticed that the return type of IsForeignScanParallelSafe described in
fdwhandler.sgml isn't correct; it should be bool, not Size.  Please
find attached a small patch for that.


Best regards,
Etsuro Fujita
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 455eef6..4cd79f3 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -1201,7 +1201,7 @@ ImportForeignSchema (ImportForeignSchemaStmt *stmt, Oid serverOid);
 
 
 
-Size
+bool
 IsForeignScanParallelSafe(PlannerInfo *root, RelOptInfo *rel,
   RangeTblEntry *rte);
 



Re: [HACKERS] A Modest Upgrade Proposal

2016-07-08 Thread Craig Ringer
On 8 July 2016 at 09:41, Robert Haas  wrote:


>
> > Personally, I'm in the group of people that don't see the need for DDL.
> > There are already many successful features that don't utilize DDL, such
> > as backup and advisory locks, and some features that use DDL that don't
> > really need to, such as LISTEN/NOTIFY, full text search etc. Also note
> > that both Oracle and SQL Server have moved away from DDL in favour of
> > function APIs, and most NoSQL databases and almost all languages prefer
> > functional interfaces over parsed text languages, so I don't see a huge
> > industry revival for DDL as a means of specifying things.
>
> DDL is our standard way of getting things into the system catalogs.
> We have no system catalog metadata that is intended to be populated by
> any means other than DDL.


Replication slots? (Arguably not catalogs, I guess)

Replication origins?



> If you want to add a column to a table, you
> say ALTER TABLE .. ADD COLUMN.  If you want to add a column to an
> extension, you say ALTER EXTENSION .. ADD TABLE.   If you want to add
> an option to a foreign table, you say ALTER FOREIGN TABLE .. OPTIONS
> (ADD ..).  Therefore, I think it is entirely reasonable and obviously
> consistent with existing practice that if you want to add a table to a
> replication set, you should write ALTER REPLICATION SET .. ADD TABLE.
> I don't understand why logical replication should be the one feature
> that departs from the way that all of our other features work.


Because unlike all the other features, it can work usefully *across
versions*.

We have no extension points for DDL.

For function interfaces, we do.

That, alone, makes a function-based interface overwhelmingly compelling
unless there are specific things we *cannot reasonably do* without DDL.


> Really, where this jumped the shark for me is when
> you argued that this stuff didn't even need pg_dump support.  Come on.
> This feature doesn't get a pass from handling all of the things that
> every existing similar feature needs to deal with.
>

Well, replication slots and replication origins aren't handled by pg_dump
(or pg_basebackup). So not quite. Nor, for that matter, is streaming
physical replication handled by pg_dumpall. What makes this different?

That said, with pg_dump, the question to me isn't one of "support vs don't
support", it's how it should work and look.

In many cases it's actively undesirable to dump and restore logical
replication state. Most, I'd say. There probably are cases where it's
desirable to retain logical replication state such that restoring a dump
resumes replication, but I challenge you to come up with any sensible and
sane way that can actually be implemented. Especially since you must
obviously consider the possibility of both upstream and downstream being
restored from dumps.

IMO the problem mostly devolves to making sure dumps taken of different DBs
are consistent so new replication sessions can be established safely. And
really, I think it's a separate feature from logical replication itself.

To what extent are you approaching this from the PoV of wanting to use this
in FDW sharding? It's unclear what vision for users you have behind the
things you say must be done, and I'd like to move to more concrete
ground. You want DDL? OK, what should it look like? What does it add over a
function-based interface? What's cluster-wide and what's DB-local? Etc.

FWIW, Petr is working on some code in the area, but I don't know how far
along the work is.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] A Modest Upgrade Proposal

2016-07-08 Thread Craig Ringer
On 8 July 2016 at 11:18, Alvaro Herrera  wrote:

> Tom Lane wrote:
> > Simon Riggs  writes:
>
> > > pg_am has existed for decades without supporting DDL
> >
> > That argument has been obsoleted by events ;-) ... and in any case, the
> > reason we went without CREATE ACCESS METHOD for so long was not that we
> > encouraged "INSERT INTO pg_am" but that non-core index AMs were
> > effectively unsupported anyway, until we thought of a reasonable way to
> > let them generate WAL.  Without the WAL stumbling block, I'm sure we
> would
> > have built CREATE ACCESS METHOD long ago.
>
> Note that the alternative to DDL-based replication handling is not
> INSERT INTO pg_replication, but a function-based interface such as
> SELECT pg_replication_node_create(foo, bar); so there's no need to
> hardcode catalog definitions; nor is there a need to skip backup-ability
> of logical replication config: pg_dump support can be added by having it
> output function calls -- not catalog INSERTs!
>

Yeah. Direct SQL on the catalogs is crazy-talk; I can't imagine anyone
seriously suggesting that as an alternative. The only ways ahead are a
function-based interface or DDL.

Personally I strongly favour function-based for this. With named parameters
and default parameters it's nicely readable and self-documenting, so I
don't really buy the usability argument. You get slightly better output
from \h for DDL than from \df for a function, but only marginally, and
that's about it. Now, if we find that there are areas where a
function-based interface is actually limiting, sure, let's use DDL. But not
just for the sake of DDL.
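
To make that concrete, such a function-based interface could look roughly
like this (function and parameter names here are purely illustrative, not
an existing or proposed API):

SELECT pg_create_replication_set(
    set_name          := 'payments',
    replicate_inserts := true,
    replicate_updates := true,
    replicate_deletes := false);

SELECT pg_replication_set_add_table(
    set_name := 'payments',
    relation := 'public.accounts'::regclass);

which is arguably no harder to read than a hypothetical
ALTER REPLICATION SET payments ADD TABLE public.accounts.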

Note that you can implement a function-based version in extensions for
older versions. This matters for logical replication because one of the
major appeals of it is up-version migration. If we rely on a bunch of new
DDL there isn't going to be a sane way to implement the decoding upstream
side in a way that'll work for connecting to old versions where the output
plugin has been backported as an extension.

Take pg_dump. Can you imagine pg_dump not supporting dumping from older
versions? Well, why should we not try to make it easy and practical to
stream from older versions?

Now, if the consensus here is that "we" don't care about supporting
decoding from the versions of Pg people actually use in the wild and making
it easier for them to move up to newer ones, well, that's why pglogical was
done as an extension. It'll be hard to get enthusiastic about some
re-imagined logical replication in-core that does much less than pglogical
for fewer target versions and fewer use cases though. Especially since "we
should use DDL" seems to have stayed at the hand-waving stage so far, with
no concrete proposals for what that DDL should look like and why it's
better.

> The only difference between DDL and no DDL is that a function-based
> interface can be added with a few pg_proc.h entries, whereas the DDL
> stuff requires gram.y additions, new nodes, etc.
>

... and unlike DDL, a function-based interface can be exposed for older
versions by extensions.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [BUGS] BUG #14230: Wrong timeline returned by pg_stop_backup on a standby

2016-07-08 Thread Michael Paquier
On Fri, Jul 8, 2016 at 6:40 PM, Marco Nenciarini
 wrote:
> The resulting backup is working perfectly, because Postgres has no use
> for pg_stop_backup LSN, but this can confuse any tool that uses the stop
> LSN to figure out which WAL files are needed by the backup (in this case
> the only file needed is the one containing the start checkpoint).
>
> After some discussion with Álvaro, my proposal is to avoid that by
> returning the stoppoint as the maximum between the startpoint and the
> min_recovery_end_location, in case of backup from the standby.

You are facing a pattern similar to the problem reported already on
this thread by Horiguchi-san:
http://www.postgresql.org/message-id/20160609.215558.118976703.horiguchi.kyot...@lab.ntt.co.jp
And it seems to me that you are jumping to an incorrect conclusion;
what we'd want to do is update the minimum recovery point a bit more
aggressively on a node in recovery in the case where no buffers are
flushed by other backends.
-- 
Michael




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-08 Thread Petr Jelinek

On 08/07/16 12:47, Craig Ringer wrote:

On 8 July 2016 at 09:41, Robert Haas  wrote:

If you want to add a column to a table, you
say ALTER TABLE .. ADD COLUMN.  If you want to add a column to an
extension, you say ALTER EXTENSION .. ADD TABLE.   If you want to add
an option to a foreign table, you say ALTER FOREIGN TABLE .. OPTIONS
(ADD ..).  Therefore, I think it is entirely reasonable and obviously
consistent with existing practice that if you want to add a table to a
replication set, you should write ALTER REPLICATION SET .. ADD TABLE.
I don't understand why logical replication should be the one feature
that departs from the way that all of our other features work.


Because unlike all the other features, it can work usefully *across
versions*.


I don't see how that matters for definitions in catalogs though. It's
not like we want to do any kind of RPC to add a table to a replication
set on the remote node.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Michael Paquier
On Fri, Jul 8, 2016 at 4:04 PM, Magnus Hagander  wrote:
> On Friday, July 8, 2016, Michael Paquier  wrote:
>>
>> Hi all,
>>
>> Fujii-san has reminded me that we do not show the parallel status of a
>> function in \df+. The output of \df+ is already very wide, so I guess
>> anyone sane already uses it with expanded display mode, and adding one
>> more piece of information may not matter much.
>> Thoughts about adding this piece of information?
>>
>
> Seems like a good idea to me. It's going to be useful in debugging.

Okay. Here we go. I named the column for the parallel information "Parallelism".
-- 
Michael
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index aeffd63..e36efc3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1381,7 +1381,7 @@ testdb=>
 modifier to include system objects.
 If the form \df+ is used, additional information
 about each function is shown, including security classification,
-volatility, owner, language, source code and description.
+volatility, parallelism, owner, language, source code and description.
 
 
 
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2cdc5ac..8d0e655 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -298,7 +298,7 @@ describeFunctions(const char *functypes, const char *pattern, bool verbose, bool
 	PQExpBufferData buf;
 	PGresult   *res;
 	printQueryOpt myopt = pset.popt;
-	static const bool translate_columns[] = {false, false, false, false, true, true, true, false, false, false, false};
+	static const bool translate_columns[] = {false, false, false, false, true, true, true, true, false, false, false, false};
 
 	if (strlen(functypes) != strspn(functypes, "antwS+"))
 	{
@@ -417,6 +417,11 @@ describeFunctions(const char *functypes, const char *pattern, bool verbose, bool
 		  "  WHEN p.provolatile = 's' THEN '%s'\n"
 		  "  WHEN p.provolatile = 'v' THEN '%s'\n"
 		  " END as \"%s\""
+		  ",\n CASE\n"
+		  "  WHEN p.proparallel = 'r' THEN '%s'\n"
+		  "  WHEN p.proparallel = 's' THEN '%s'\n"
+		  "  WHEN p.proparallel = 'u' THEN '%s'\n"
+		  " END as \"%s\""
    ",\n  pg_catalog.pg_get_userbyid(p.proowner) as \"%s\",\n"
 		  "  l.lanname as \"%s\",\n"
 		  "  p.prosrc as \"%s\",\n"
@@ -428,6 +433,10 @@ describeFunctions(const char *functypes, const char *pattern, bool verbose, bool
 		  gettext_noop("stable"),
 		  gettext_noop("volatile"),
 		  gettext_noop("Volatility"),
+		  gettext_noop("restricted"),
+		  gettext_noop("safe"),
+		  gettext_noop("unsafe"),
+		  gettext_noop("Parallelism"),
 		  gettext_noop("Owner"),
 		  gettext_noop("Language"),
 		  gettext_noop("Source code"),

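
For reference, the same information can be pulled straight from the
catalogs with a hand-written query along the lines of what the patch
makes \df+ run (the function name below is a placeholder):

SELECT n.nspname AS "Schema",
       p.proname AS "Name",
       CASE p.proparallel
            WHEN 'r' THEN 'restricted'
            WHEN 's' THEN 'safe'
            WHEN 'u' THEN 'unsafe'
       END AS "Parallelism"
FROM pg_catalog.pg_proc p
     JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE p.proname = 'my_function';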


Re: [HACKERS] Header and comments describing routines in incorrect shape in visibilitymap.c

2016-07-08 Thread Michael Paquier
On Fri, Jul 8, 2016 at 8:31 AM, Michael Paquier
 wrote:
> On Fri, Jul 8, 2016 at 1:18 AM, Tom Lane  wrote:
>> Alvaro Herrera  writes:
>>> Regarding the first hunk, I don't like these INTERFACE sections too
>>> much; they get seriously outdated over the time and aren't all that
>>> helpful anyway.  See the one on heapam.c for example.  I'd rather get
>>> rid of that one instead of patching it.  The rest, of course, merit
>>> revision.
>>
>> +1, as long as we make sure that any useful info therein gets migrated
>> to the per-function header comments rather than dropped.  If there's
>> anything that doesn't seem to fit naturally in any per-function comment,
>> maybe move it into the file header comment?
>
> OK, that removes comment duplication. Also, what about replacing
> "bit(s)" with "one or more bits" in the comments where appropriate?
> That's bikeshedding, but that's what this patch is about.

Translating my thoughts into code, I get the attached.
-- 
Michael
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index b472d31..3d963c3 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -10,15 +10,6 @@
  * IDENTIFICATION
  *	  src/backend/access/heap/visibilitymap.c
  *
- * INTERFACE ROUTINES
- *		visibilitymap_clear  - clear a bit in the visibility map
- *		visibilitymap_pin	 - pin a map page for setting a bit
- *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set a bit in a previously pinned page
- *		visibilitymap_get_status - get status of bits
- *		visibilitymap_count  - count number of bits set in visibility map
- *		visibilitymap_truncate	- truncate the visibility map
- *
  * NOTES
  *
  * The visibility map is a bitmap with two bits (all-visible and all-frozen)
@@ -34,7 +25,7 @@
  * might not be true.
  *
  * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
+ * must make sure that whenever bits are cleared, the bits are cleared on WAL
  * replay of the updating operation as well.
  *
  * When we *set* a visibility map during VACUUM, we must write WAL.  This may
@@ -78,8 +69,8 @@
  * When a bit is set, the LSN of the visibility map page is updated to make
  * sure that the visibility map update doesn't get written to disk before the
  * WAL record of the changes that made it possible to set the bit is flushed.
- * But when a bit is cleared, we don't have to do that because it's always
- * safe to clear a bit in the map from correctness point of view.
+ * But when bits are cleared, we don't have to do that because it's always
+ * safe to clear bits in the map from correctness point of view.
  *
  *-
  */
@@ -195,13 +186,13 @@ visibilitymap_clear(Relation rel, BlockNumber heapBlk, Buffer buf)
 }
 
 /*
- *	visibilitymap_pin - pin a map page for setting a bit
+ *	visibilitymap_pin - pin a map page for setting one or more bits
  *
- * Setting a bit in the visibility map is a two-phase operation. First, call
- * visibilitymap_pin, to pin the visibility map page containing the bit for
+ * Setting bits in the visibility map is a two-phase operation. First, call
+ * visibilitymap_pin, to pin the visibility map page containing the bits for
  * the heap page. Because that can require I/O to read the map page, you
  * shouldn't hold a lock on the heap page while doing that. Then, call
- * visibilitymap_set to actually set the bit.
+ * visibilitymap_set to actually set the bits.
  *
  * On entry, *buf should be InvalidBuffer or a valid buffer returned by
  * an earlier call to visibilitymap_pin or visibilitymap_get_status on the same
@@ -243,7 +234,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer buf)
 }
 
 /*
- *	visibilitymap_set - set bit(s) on a previously pinned page
+ *	visibilitymap_set - set one or more bits on a previously pinned page
  *
  * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
  * or InvalidXLogRecPtr in normal running.  The page LSN is advanced to the
@@ -340,13 +331,13 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * On entry, *buf should be InvalidBuffer or a valid buffer returned by an
  * earlier call to visibilitymap_pin or visibilitymap_get_status on the same
  * relation. On return, *buf is a valid buffer with the map page containing
- * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
+ * the bits for heapBlk, or InvalidBuffer. The caller is responsible for
  * releasing *buf after it's done testing and setting bits.
  *
  * NOTE: This function is typically called without a lock on the heap page,
- * so somebody else could change the bit just after we look at it.  In fact,
+ * so somebody else could change the bits just after we look at it.  In fact,
  * since we don't lock the visibility map 

[HACKERS] Re: [BUGS] BUG #14230: Wrong timeline returned by pg_stop_backup on a standby

2016-07-08 Thread Marco Nenciarini
On 08/07/16 13:10, Michael Paquier wrote:
> On Fri, Jul 8, 2016 at 6:40 PM, Marco Nenciarini
>  wrote:
>> The resulting backup is working perfectly, because Postgres has no use
>> for pg_stop_backup LSN, but this can confuse any tool that uses the stop
>> LSN to figure out which WAL files are needed by the backup (in this case
>> the only file needed is the one containing the start checkpoint).
>>
>> After some discussion with Álvaro, my proposal is to avoid that by
>> returning the stoppoint as the maximum between the startpoint and the
>> min_recovery_end_location, in case of backup from the standby.
> 
> You are facing a pattern similar to the problem reported already on
> this thread by Horiguchi-san:
> http://www.postgresql.org/message-id/20160609.215558.118976703.horiguchi.kyot...@lab.ntt.co.jp
> And it seems to me that you are jumping to an incorrect conclusion;
> what we'd want to do is update the minimum recovery point a bit more
> aggressively on a node in recovery in the case where no buffers are
> flushed by other backends.
> 

Yes, it is exactly the same bug. My proposal was based on the assumption
that it was only a cosmetic issue, but given that it can trigger
errors, I agree that the right solution is to advance the minimum
recovery point in that case.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Amit Kapila
On Fri, Jul 8, 2016 at 5:27 PM, Michael Paquier
 wrote:
> On Fri, Jul 8, 2016 at 4:04 PM, Magnus Hagander  wrote:
>> On Friday, July 8, 2016, Michael Paquier  wrote:
>>>
>>> Hi all,
>>>
>>> Fujii-san has reminded me that we do not show the parallel status of a
>>> function in \df+. The output of \df+ is already very wide, so I guess
>>> anyone sane already uses it with expanded display mode, and adding one
>>> more piece of information may not matter much.
>>> Thoughts about adding this piece of information?
>>>
>>
>> Seems like a good idea to me. It's going to be useful in debugging.
>
> Okay. Here we go. I named the column for the parallel information 
> "Parallelism".
>

Another option could be to name it "Parallel Mode"; we use that term in
the description of PARALLEL in the CREATE FUNCTION documentation.
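
For context, these values come from the PARALLEL clause of CREATE
FUNCTION, e.g. (function name mine):

CREATE FUNCTION add_one(i integer) RETURNS integer
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
    AS 'SELECT i + 1';

so whatever the column is named should read naturally against those keywords.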


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] MVCC overheads

2016-07-08 Thread Pete Stevenson
Good info, thanks for the note. Agreed that it is difficult to pull things
apart to isolate these features for offload, so actually running experiments
with offload is not possible, as you point out (and for other reasons).

Maybe I could figure out the lines of code that add versions to a table and
then those that collect old versions (they do get collected, right?). Anyway,
the thought is that I could profile those paths while running TPC-C or
similar. I was hoping that someone might jump on this with a response that
they had already done something similar. I know that Stonebraker has done
some analysis along these lines, but I'm looking for an independent result
that confirms (or refutes) his work.

Thank you,
Pete Stevenson


> On Jul 7, 2016, at 3:43 PM, Simon Riggs  wrote:
> 
> On 7 July 2016 at 20:50, Pete Stevenson  wrote:
> Hi Simon -
> 
> Thanks for the note. I think it's fair to say that I didn't provide enough 
> context, so let me try and elaborate on my question.
> 
> I agree, MVCC is a benefit. The research angle here is about enabling MVCC 
> with hardware offload. Since I didn't explicitly mention it, the offload I 
> refer to will respect all consistency guarantees also.
> 
> It is the case that for the database to implement MVCC it must provide 
> consistent read to multiple different versions of data, i.e. depending on the 
> version used at transaction start. I'm not an expert on postgresql internals, 
> but this must have some cost. I think the cost related to MVCC guarantees can 
> roughly be categorized as: creating new versions (linking them in), version 
> checking on read, garbage collecting old versions, and then there is an 
> additional cost that I am interested in (again not claiming it is unnecessary 
> in any sense) but there is a cost to generating the log.
> 
> Thanks, by the way, for the warning about lab vs. reality. That's why I'm 
> asking this question here. I want to keep the hypothetical tagged as such, 
> but find defensible and realistic metrics where those exist, i.e. in this 
> instance, we do have a database that can use MVCC. It should be possible to 
> figure out how much work goes into maintaining that property.
> 
> PostgreSQL uses a no-overwrite storage mechanism, so any additional row
> versions are maintained in the same table alongside other rows. The MVCC 
> actions are mostly mixed in with other aspects of the storage, so not 
> isolated much for offload.
> 
> Oracle has a different mechanism that does isolate changed row versions into 
> a separate data structure, so would be much more amenable to offload than 
> PostgreSQL, in its current form.
> 
> Maybe look at SLRUs (clog etc) as a place to offload something?
> 
> -- 
> Simon Riggs    http://www.2ndQuadrant.com/
>
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Reviewing freeze map code

2016-07-08 Thread Amit Kapila
On Thu, Jul 7, 2016 at 12:34 PM, Masahiko Sawada  wrote:
> Thank you for reviewing!
>
> On Thu, Jul 7, 2016 at 7:06 AM, Andres Freund  wrote:
>> On 2016-07-05 23:37:59 +0900, Masahiko Sawada wrote:
>>> diff --git a/src/backend/access/heap/heapam.c 
>>> b/src/backend/access/heap/heapam.c
>>> index 57da57a..fd66527 100644
>>> --- a/src/backend/access/heap/heapam.c
>>> +++ b/src/backend/access/heap/heapam.c
>>> @@ -3923,6 +3923,17 @@ l2:
>>>
>>>   if (need_toast || newtupsize > pagefree)
>>>   {
>>> + /*
>>> +  * To prevent data corruption due to updating old tuple by
>>> +  * other backends after released buffer
>>
>> That's not really the reason, is it? The prime problem is crash safety /
>> replication. The row-lock we're faking (by setting xmax to our xid),
>> prevents concurrent updates until this backend died.
>
> Fixed.
>
>>> , we need to emit that
>>> +  * xmax of old tuple is set and clear visibility map bits if
>>> +  * needed before releasing buffer. We can reuse xl_heap_lock
>>> +  * for this purpose. It should be fine even if we crash midway
>>> +  * from this section and the actual updating one later, since
>>> +  * the xmax will appear to come from an aborted xid.
>>> +  */
>>> + START_CRIT_SECTION();
>>> +
>>>   /* Clear obsolete visibility flags ... */
>>>   oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
>>>   oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
>>> @@ -3936,6 +3947,46 @@ l2:
>>>   /* temporarily make it look not-updated */
>>>   oldtup.t_data->t_ctid = oldtup.t_self;
>>>   already_marked = true;
>>> +
>>> + /* Clear PD_ALL_VISIBLE flags */
>>> + if (PageIsAllVisible(BufferGetPage(buffer)))
>>> + {
>>> + all_visible_cleared = true;
>>> + PageClearAllVisible(BufferGetPage(buffer));
>>> + visibilitymap_clear(relation, 
>>> BufferGetBlockNumber(buffer),
>>> + vmbuffer);
>>> + }
>>> +
>>> + MarkBufferDirty(buffer);
>>> +
>>> + if (RelationNeedsWAL(relation))
>>> + {
>>> + xl_heap_lock xlrec;
>>> + XLogRecPtr recptr;
>>> +
>>> + /*
>>> +  * For logical decoding we need combocids to properly 
>>> decode the
>>> +  * catalog.
>>> +  */
>>> + if (RelationIsAccessibleInLogicalDecoding(relation))
>>> + log_heap_new_cid(relation, &oldtup);
>>
>> Hm, I don't see that being necessary here. Row locks aren't logically
>> decoded, so there's no need to emit this here.
>
> Fixed.
>
>>
>>> + /* Clear PD_ALL_VISIBLE flags */
>>> + if (PageIsAllVisible(page))
>>> + {
>>> + Buffer  vmbuffer = InvalidBuffer;
>>> + BlockNumber block = BufferGetBlockNumber(*buffer);
>>> +
>>> + all_visible_cleared = true;
>>> + PageClearAllVisible(page);
>>> + visibilitymap_pin(relation, block, &vmbuffer);
>>> + visibilitymap_clear(relation, block, vmbuffer);
>>> + }
>>> +
>>
>> That clears all-visible unnecessarily, we only need to clear all-frozen.
>>
>
> Fixed.
>
>>
>>> @@ -8694,6 +8761,23 @@ heap_xlog_lock(XLogReaderState *record)
>>>   }
>>>   HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
>>>   HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
>>> +
>>> + /* The visibility map need to be cleared */
>>> + if ((xlrec->infobits_set & XLHL_ALL_VISIBLE_CLEARED) != 0)
>>> + {
>>> + RelFileNode rnode;
>>> + Buffer  vmbuffer = InvalidBuffer;
>>> + BlockNumber blkno;
>>> + Relationreln;
>>> +
>>> + XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
>>> + reln = CreateFakeRelcacheEntry(rnode);
>>> +
>>> + visibilitymap_pin(reln, blkno, &vmbuffer);
>>> + visibilitymap_clear(reln, blkno, vmbuffer);
>>> + PageClearAllVisible(page);
>>> + }
>>> +
>>
>>
>>>   PageSetLSN(page, lsn);
>>>   MarkBufferDirty(buffer);
>>>   }
>>> diff --git a/src/include/access/heapam_xlog.h 
>>> b/src/include/access/heapam_xlog.h
>>> index a822d0b..41b3c54 100644
>>> --- a/src/include/access/heapam_xlog.h
>>> +++ b/src/include/access/heapam_xlog.h
>>> @@ -242,6 +242,7 @@ typedef struct xl_heap_cleanup_info
>>>  #define XLHL_XMAX_EXCL_LOCK  0x04
>>>  #define XLHL_XMAX_KEYSHR_LOCK0x08
>>>  #define XLHL_KEYS_UPDATED0x10
>>> +#d

Re: [HACKERS] can we optimize STACK_DEPTH_SLOP

2016-07-08 Thread Greg Stark
On Fri, Jul 8, 2016 at 4:46 AM, Tom Lane  wrote:
> Based on what I'm seeing so far, really 100K ought to be more than plenty
> of slop for most architectures, but I'm afraid to go there for IA64.

Searching for info on ia64 turned up this interesting thread:

https://www.postgresql.org/message-id/21563.1289064886%40sss.pgh.pa.us

From that discussion it seems we should probably run these tests with
-O0 because the stack usage can be substantially higher without
optimizations. And it doesn't sound like ia64 uses much more *normal*
stack, just that there's the additional register stack.

It might not be unreasonable to commit the pmap hack, gather the data
from the buildfarm, then later add an #ifdef around it (or just make
it #ifdef USE_ASSERT_CHECKING, which I assume most buildfarm members
are running with anyway).

Alternatively it wouldn't be very hard to use mincore(2) to implement
it natively. I believe mincore is nonstandard but present in Linux and
BSD.


-- 
greg




Re: [HACKERS] can we optimize STACK_DEPTH_SLOP

2016-07-08 Thread Tom Lane
Greg Stark  writes:
> Searching for info on ia64 turned up this interesting thread:
> https://www.postgresql.org/message-id/21563.1289064886%40sss.pgh.pa.us

Yeah, that's the same one I referenced upthread ;-)

> From that discussion it seems we should probably run these tests with
> -O0 because the stack usage can be substantially higher without
> optimizations. And it doesn't sound like ia64 uses much more *normal*
> stack, just that there's the additional register stack.

> It might not be unreasonable to commit the pmap hack, gather the data
> from the buildfarm, then later add an #ifdef around it (or just make
> it #ifdef USE_ASSERT_CHECKING, which I assume most buildfarm members
> are running with anyway).

Hmm.  The two IA64 critters in the farm are running HPUX, which means
they likely don't have pmap.  But I could clean up the hack I used to
gather stack size data on gaur's host and commit it temporarily.
On non-HPUX platforms we could just try system("pmap -x") and see what
happens; as long as we're ignoring the result it shouldn't cause anything
really bad.

I was going to object that this would probably not tell us anything
about the worst-case IA64 stack usage, but I see that neither of those
critters are selecting any optimization, so actually it would.

So, agreed, let's commit some temporary debug code and see what the
buildfarm can teach us.  Will go work on that in a bit.

> Alternatively it wouldn't be very hard to use mincore(2) to implement
> it natively. I believe mincore is nonstandard but present in Linux and
> BSD.

Hm, after reading the man page I don't quite see how that would help?
You'd have to already know the mapped stack address range in order to
call the function without getting ENOMEM.

regards, tom lane




Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Tom Lane
Magnus Hagander  writes:
> On Friday, July 8, 2016, Michael Paquier  wrote:
>> Fujii-san has reminded me that we do not show the parallel status of a
>> function in \df+. The output of \df+ is already very wide, so I guess
>> anyone sane already uses it with expanded display mode, and adding one
>> more piece of information may not matter much.
>> Thoughts about adding this piece of information?

> Seems like a good idea to me. It's going to be useful in debugging.

If we're going to change \df+ at all, could I lobby for putting the Owner
column next to Security?  They're logically related, and not related to
Volatility which somehow got crammed between.  So I'm imagining the column
order as

Schema   | Name | Result data type | Argument data types |  Type  | Security | 
Owner | Volatility | Parallel | Language | Source code | Description 

Or maybe Owner then Security.

regards, tom lane




Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Tom Lane
Amit Kapila  writes:
> On Fri, Jul 8, 2016 at 5:27 PM, Michael Paquier
>  wrote:
>> Okay. Here we go. I named the column for the parallel information 
>> "Parallelism".

> Another option could be to name it as Parallel Mode.

I'd go with just "Parallel", to keep it from being noticeably wider than
any of the possible column contents.  Just because you're arguing that
\df+ output is already unreadable in non-expanded mode doesn't mean it's
a good idea to throw away horizontal space for nothing.

regards, tom lane




Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Amit Kapila  writes:
> > On Fri, Jul 8, 2016 at 5:27 PM, Michael Paquier
> >  wrote:
> >> Okay. Here we go. I named the column for the parallel information 
> >> "Parallelism".
> 
> > Another option could be to name it as Parallel Mode.
> 
> I'd go with just "Parallel", to keep it from being noticeably wider than
> any of the possible column contents.  Just because you're arguing that
> \df+ output is already unreadable in non-expanded mode doesn't mean it's
> a good idea to throw away horizontal space for nothing.

Agreed.

Thanks!

Stephen




Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Magnus Hagander  writes:
> > On Friday, July 8, 2016, Michael Paquier  wrote:
> >> Fujii-san has reminded me that we do not show the parallel status of a
> >> function in \df+. The output of \df+ is already very wide, so I guess
> >> anyone sane already uses it with expanded display mode, and adding one
> >> more piece of information may not matter much.
> >> Thoughts about adding this piece of information?
> 
> > Seems like a good idea to me. It's going to be useful in debugging.
> 
> If we're going to change \df+ at all, could I lobby for putting the Owner
> column next to Security?  They're logically related, and not related to
> Volatility which somehow got crammed between.  So I'm imagining the column
> order as
> 
> Schema   | Name | Result data type | Argument data types |  Type  | Security 
> | Owner | Volatility | Parallel | Language | Source code | Description 
> 
> Or maybe Owner then Security.

I've always wondered why there isn't any way to see the ACL for the
function through \d commands.  I'd suggest including that in \df+ also.
Note that \dn+, \dL+ and \db+, for example, include access privs for
those object types.
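
(Until then, the ACL can at least be read straight out of the catalog,
e.g., with a placeholder function name:

SELECT proacl FROM pg_catalog.pg_proc WHERE proname = 'my_function';

which is presumably what such a \df+ column would display.)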

Thanks!

Stephen




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Kevin Grittner
On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:

> So I don't think that approach still allows old snapshot related
> cleanups for toast triggered vacuums?  Is that an acceptable
> restriction?

What I would rather see is that if the heap is vacuumed (whether or
not by autovacuum) then the related TOAST table is also vacuumed
(using the same horizon the heap used), but if the TOAST relation
is chosen for vacuum by itself that it does not attempt to adjust
the horizon based on old_snapshot_threshold.  I am looking to see
how to make that happen; expect a new patch Monday.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] doc: Incorrect return type of IsForeignScanParallelSafe in fdwhandler.sgml

2016-07-08 Thread Tom Lane
Etsuro Fujita  writes:
> I noticed that the return type of IsForeignScanParallelSafe described in
> fdwhandler.sgml isn't correct; it should be bool, not Size.  Please
> find attached a small patch for that.

Pushed, thanks!

regards, tom lane




Re: [HACKERS] minor plpgsql doc patch

2016-07-08 Thread Tom Lane
Fabien COELHO  writes:
> The very minor patch attached improves the PL/pgSQL documentation about 
> trigger functions. It moves the description common to both data change & 
> database event triggers out of the first section and into a common header. 
> It adds a link at the beginning of the sections to their corresponding 
> generic chapters.

Looks like a good idea.  Pushed with slight wording adjustments.

regards, tom lane




Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Alvaro Herrera
Tom Lane wrote:
> Magnus Hagander  writes:
> > On Friday, July 8, 2016, Michael Paquier  wrote:
> >> Fujii-san has reminded me that we do not show the parallel status of a
> >> function in \df+. The output of \df+ is already very wide, so I guess
> >> anyone sane already uses it with expanded display mode, and adding one
> >> more piece of information may not matter much.
> >> Thoughts about adding this piece of information?
> 
> > Seems like a good idea to me. It's going to be useful in debugging.
> 
> If we're going to change \df+ at all, could I lobby for putting the Owner
> column next to Security?  They're logically related, and not related to
> Volatility which somehow got crammed between.  So I'm imagining the column
> order as
> 
> Schema   | Name | Result data type | Argument data types |  Type  | Security 
> | Owner | Volatility | Parallel | Language | Source code | Description 
> 
> Or maybe Owner then Security.

Agreed.

As a separate concern, IMO having the source code in a \df+ column is
almost completely useless.  I propose to split that out to a separate
\df command (say \df% or \df/) that shows *only* the source code.
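
(In the meantime, pg_get_functiondef() already gives a source-only view
from SQL, e.g., with a placeholder function name:

SELECT pg_catalog.pg_get_functiondef('my_function'::regproc);

though unlike the proposed \df variant it prints the whole CREATE FUNCTION
statement, not just prosrc.)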

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] minor plpgsql doc patch

2016-07-08 Thread Fabien COELHO



>> The very minor patch attached improves the PL/pgSQL documentation about
>> trigger functions. It moves the description common to both data change &
>> database event triggers out of the first section and into a common header.
>> It adds a link at the beginning of the sections to their corresponding
>> generic chapters.

> Looks like a good idea.  Pushed with slight wording adjustments.


Fine!

--
Fabien.




Re: [HACKERS] MVCC overheads

2016-07-08 Thread Alvaro Herrera
Pete Stevenson wrote:

> Maybe I could figure out the lines of code that add versions into a
> table and then those that collect old versions (they do get collected,
> right?). Anyway, thought being I could profile while running TPC-C or
> similar. I was hoping that someone might be able to jump on this with
> a response that they already did something similar.

Old tuple versions are "collected" (removed) either by vacuum (see
vacuumlazy.c) or by heap_page_prune.  The latter is one thing that could
perhaps somehow be offloaded, as it's quite independent from the other
stuff.  You can prune removable tuples at no additional cost from an
unlocked dirty page, which is a useful optimization because then
client-connected backends don't need to prune them later.
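
If it helps to quantify that work, the pgstattuple contrib module reports
how much dead-tuple bloat is waiting to be pruned or vacuumed away; a
sketch (table name mine):

CREATE EXTENSION pgstattuple;
SELECT dead_tuple_count, dead_tuple_percent, free_percent
FROM pgstattuple('demo');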

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Andres Freund
On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote:
> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:
> 
> > So I don't think that approach still allows old snapshot related
> > cleanups for toast triggered vacuums?  Is that an acceptable
> > restriction?
> 
> What I would rather see is that if the heap is vacuumed (whether or
> not by autovacuum) then the related TOAST table is also vacuumed
> (using the same horizon the heap used), but if the TOAST relation
> is chosen for vacuum by itself that it does not attempt to adjust
> the horizon based on old_snapshot_threshold.

Uh, wouldn't that quite massively regress the autovacuum workload in
some cases? There's a reason they're considered separately after
all. And in many cases, even if there are lots of updates in the heap
table, the toast table doesn't get any updates. And the toast table is
often a lot larger than the data.

Regards,

Andres




[HACKERS] Logical decoding

2016-07-08 Thread Joshua Bay
Hi,


I have a question about logical decoding in Postgres.


Where are the entry points to logical decoding?

Specifically, we want to know whether logical decoding happens immediately
after commit, or whether there is a polling process that scans the
write-ahead log and then dumps to a special table.

And where is this code in the codebase?

Thanks,
Joshua


[HACKERS] Fix Error Message for allocate_recordbuf() Failure

2016-07-08 Thread Shoaib Lari
Hello,

Attached is a patch for xlogreader.c for a more informative error message
for allocate_recordbuf() failure.

The patch details are:

   - Project name: None

   - Uniquely identifiable file name, so we can tell the difference between
     your v1 and v24: src/backend/access/transam/xlogreader.c

   - What the patch does in a short paragraph:

     The patch enhances the error message when memory cannot be allocated to
     extend the readRecordBuf.  The previous error message implied an invalid
     (too long) record length and also did not specify how much memory we
     were trying to allocate when the memory allocation failure happened.

     We have observed failed allocations in production systems where the
     reclength was only 344 bytes, but the extended allocation of 5 *
     Max(BLCKSZ, XLOG_BLCKSZ) failed due to out of memory.

   - Whether the patch is for discussion or for application (see WIP notes
     below): for application.

   - Which branch the patch is against (ordinarily this will be *master*):
     master

   - Whether it compiles and tests successfully, so we know nothing obvious
     is broken: Yes.  We ran make check and all 166 tests passed.

   - Whether it contains any platform-specific items and if so, has it been
     tested on other platforms: Nothing platform specific.

   - Confirm that the patch includes regression tests to check the new
     feature actually works as described: None.

   - Include documentation on how to use the new feature, including
     examples: None needed.

   - Describe the effect your patch has on performance, if any: None.

   - Try to include a few lines about why you chose to do things particular
     ways, rather than let your reviewer guess what was happening. This can
     be done as code comments, but it might also be an additional reviewers'
     guide, or additions to a README file in one of the code directories:

     Besides making the error message more informative, we had to modify
     allocate_recordbuf() to return the actual number of bytes that were
     being allocated.

   - If your patch addresses a Todo item, please give the name of the Todo
     item in your email, so that reviewers will know the item needs to be
     marked as done if your patch is applied: None.



Thank you.

Shoaib Lari


fix_error_message_for_allocate_recordbuf_failure.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange explain in upstream - subplan 1 twice - is it bug?

2016-07-08 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jun 1, 2016 at 7:29 AM, Pavel Stehule  wrote:
>> When I tested some queries, I found strange plan
>> postgres=# explain analyze select s.nazev, o.nazev, o.pocet_obyvatel from
>> (select nazev, array(select id from obce_pocet_obyvatel where okresy.id =
>> okres_id order by pocet_obyvatel desc limit 3) as obceids from okresy) s
>> join obce_pocet_obyvatel o on o.id = ANY(obceids) order by 1, 3 desc;

> The EXPLAIN plan you posted certainly looks weird, since I wouldn't
> expect SubPlan 1 to be displayed twice, but I'm wondering if it's a
> display artifact rather than an actual defect in the plan.

It is an artifact.  The reason is that the same SubPlan appears in both
indexqual and indexqualorig of the IndexScan node.  (I'm not sure it's
physically the same SubPlan node both places, and indeed that might vary
depending on whether the plan tree had gotten copied; but they've got the
same plan_id and thus refer to the same sub-plan within the PlannedStmt's
subplans list.)  We run ExecInitExpr on both, so we end up with two
SubPlanState nodes that are both linked into the IndexScanState's subPlan
list, and that's what explain.c prints from.  They're pointing at the same
subplan state tree, which is why you always see identical stats.

The reason you don't see two copies without ANALYZE is that in
EXPLAIN_ONLY mode, ExecInitIndexScan quits before doing
ExecIndexBuildScanKeys, so the indexqual copy isn't ExecInitExpr'd.

A crude way to improve this would be to have ExplainSubPlans check for
duplicate plan_id values and not print the same plan_id more than once.
I think we might have to do that globally across the whole plan tree,
not just per invocation of ExplainSubPlans, because in bitmap scan
cases the "indexqualorig" equivalent is in the parent BitmapHeapScan
node while the "indexqual" equivalent is in the child BitmapIndexScan.
So the duplicate subplans might not be in the same plan node's subPlan
list.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Tom Lane
Alvaro Herrera  writes:
> Pete Stevenson wrote:
>> Maybe I could figure out the lines of code that add versions into a
>> table and then those that collect old versions (they do get collected,
>> right?). Anyway, thought being I could profile while running TPC-C or
>> similar. I was hoping that someone might be able to jump on this with
>> a response that they already did something similar.

> Old tuple versions are "collected" (removed) by either vacuum (see
> vacuumlazy.c) and heap_page_prune.  The latter is one thing that could
> perhaps somehow be offloaded, as it's quite independent from the other
> stuff.  You can prune removable tuples at no additional cost from an
> unlocked dirty page, which is a useful optimization because then
> client-connected backends don't need to prune them later.

VACUUM in itself is an offloading optimization; the whole point of it
is to do maintenance in a background process not foreground queries.
AFAIR, heap_page_prune is just a small subset of VACUUM work that
we decided we could afford to do in foreground.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Kevin Grittner
On Fri, Jul 8, 2016 at 12:53 PM, Andres Freund  wrote:
> On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote:
>> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:
>>
>> > So I don't think that approach still allows old snapshot related
>> > cleanups for toast triggered vacuums?  Is that an acceptable
>> > restriction?
>>
>> What I would rather see is that if the heap is vacuumed (whether or
>> not by autovacuum) then the related TOAST table is also vacuumed
>> (using the same horizon the heap used), but if the TOAST relation
>> is chosen for vacuum by itself that it does not attempt to adjust
>> the horizon based on old_snapshot_threshold.
>
> > Uh, wouldn't that quite massively regress the autovacuum workload in
> some cases? There's a reason they're considered separately after
> all. And in many cases, even if there's lots of updates in the heap
> table, the toast table doesn't get any updates. And the toast table is
> often a lot larger than the data.

Of course, the toast table has only one index, and it is narrow.
With the visibility map, it should visit only the needed pages in
the toast's heap area, so any regression would be in the case that:

(1)  old_snapshot_threshold >= 0
(2)  the "normal" heap met the conditions for vacuum, but the toast
     heap didn't
(3)  when passing the toast heap based on visibility map, *some*
 cleanup was done (otherwise the TID list would be empty, so no
 index pass is needed)

Any extra work would be at least partially offset by pushing back
the point where the next vacuum of toast data would be needed and
by removing index entries and keeping both the toast data and index
smaller.  I'm sure you could find cases where there was a net
performance loss, but I'm also sure that by containing toast size
when it would otherwise grow for weeks or months, it could be a
very large performance gain.
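
To make the shape of the problem concrete, a hedged catalog sketch
('my_table' is a placeholder) showing a table's TOAST relation and its
single index:

SELECT c.relname AS table_name,
       t.relname AS toast_table,
       i.relname AS toast_index
FROM   pg_class c
JOIN   pg_class t ON t.oid = c.reltoastrelid
JOIN   pg_index x ON x.indrelid = t.oid
JOIN   pg_class i ON i.oid = x.indexrelid
WHERE  c.relname = 'my_table';  -- placeholder name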

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Pete Stevenson wrote:
> >> Maybe I could figure out the lines of code that add versions into a
> >> table and then those that collect old versions (they do get collected,
> >> right?). Anyway, thought being I could profile while running TPC-C or
> >> similar. I was hoping that someone might be able to jump on this with
> >> a response that they already did something similar.
> 
> > Old tuple versions are "collected" (removed) by either vacuum (see
> > vacuumlazy.c) and heap_page_prune.  The latter is one thing that could
> > perhaps somehow be offloaded, as it's quite independent from the other
> > stuff.  You can prune removable tuples at no additional cost from an
> > unlocked dirty page, which is a useful optimization because then
> > client-connected backends don't need to prune them later.
> 
> VACUUM in itself is an offloading optimization; the whole point of it
> is to do maintenance in a background process not foreground queries.

Well, if VACUUM worked so great, we wouldn't get so many trouble reports
with it.  There's substantial improvement we could make in that area.

> AFAIR, heap_page_prune is just a small subset of VACUUM work that
> we decided we could afford to do in foreground.

Sure, but we could *also* do it separately, splitting VACUUM's tasks of
tuple freezing, page compaction, and index entry removal each into
separate tasks.

Currently, vacuuming a 4TB table can take weeks.  Meanwhile, dead tuples
accumulate in already-scanned pages, leading to further bloat, to Xid
wraparound danger later, and to emergency vacuuming that leaves
applications blocking on DDL.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Tom Lane
Alvaro Herrera  writes:
> As a separate concern, IMO having the source code in a \df+ column is
> almost completely useless.

Good point.  It works okay for C/internal functions, but in those cases
it's usually redundant with the proname.  For PL functions it's a disaster
formatting-wise, because they're often wide and/or multi-line.

> I propose to split that out to a separate
> \df command (say \df% or \df/) that shows *only* the source code.

As to those names, ick.  Also, what do you envision the output looking
like when multiple functions are selected?  Or would you ban wildcards?
If you do, it's not clear what this does that \sf doesn't do better.

Maybe, given the existence of \sf, we should just drop prosrc from \df+
altogether.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] pg_basebackup from disconnected standby fails

2016-07-08 Thread Alvaro Herrera
Kyotaro HORIGUCHI wrote:

> At Fri, 10 Jun 2016 17:39:59 +0900, Michael Paquier 
>  wrote in 
> 
> > On Thu, Jun 9, 2016 at 9:55 PM, Kyotaro HORIGUCHI
> >  wrote:

> > Indeed, and you could just do the following to reproduce the failure
> > with the recovery test suite, so I would suggest adding this test in
> > the patch:
> > --- a/src/test/recovery/t/001_stream_rep.pl
> > +++ b/src/test/recovery/t/001_stream_rep.pl
> > @@ -24,6 +24,11 @@ $node_standby_1->start;
> >  # pg_basebackup works on a standby).
> >  $node_standby_1->backup($backup_name);
> > 
> > +# Take a second backup of the standby while the master is offline.
> > +$node_master->stop;
> > +$node_standby_1->backup('my_backup_2');
> > +$node_master->start;
> 
> I'm not sure that adding the test case for a particular bug like
> this is appropriate. But it would be acceptable because it
> doesn't take long time and it is separate from standard checks.

The reason this test is appropriate is that it's testing a feature we
want to support, namely that taking a backup from a standby works even
when the master is stopped.  It's not a test for this particular bug,
even though the feature doesn't work because of this bug.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> VACUUM in itself is an offloading optimization; the whole point of it
>> is to do maintenance in a background process not foreground queries.

> Well, if VACUUM worked so great, we wouldn't get so many trouble reports
> with it.  There's substantial improvement we could make in that area.

Sure, and we've been chipping away at that problem over time, including
some significant improvement in 9.6.  My point is just that it's a good
idea to understand VACUUM as being some pre-existing work that's related
to this offloading idea.

> Sure, but we could *also* do it separately, splitting VACUUM's tasks of
> tuple freezing, page compaction, and index entry removal each into
> separate tasks.

Uh ... wouldn't that tend to make things worse?  The knocks on VACUUM are
too much I/O and too much latency for cleanup, and I can't see how
splitting it does anything good on either score.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Kevin Grittner
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson
 wrote:

> I would like to find some analysis (published work, blog posts)
> on the overheads affiliated with the guarantees provided by MVCC
> isolation.

There are three levels of isolation implemented[1]; the incremental
cost of SERIALIZABLE isolation over REPEATABLE READ for several
standard benchmarking loads is available in section 8 of a paper
presented at a VLDB conference[2].
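
For concreteness, a minimal sketch of selecting the level under test
(plain SQL; nothing here is specific to the benchmarks in [2]):

-- Per transaction:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... workload statements ...
COMMIT;

-- Or as a session default:
SET default_transaction_isolation = 'repeatable read';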

Hopefully that helps some.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]  PostgreSQL current online documentation.  Transaction Isolation.
 https://www.postgresql.org/docs/current/static/transaction-iso.html

[2]  Dan R. K. Ports and Kevin Grittner. 2012.
 Serializable Snapshot Isolation in PostgreSQL.
 Proceedings of the VLDB Endowment, Vol. 5, No. 12.
 The 38th International Conference on Very Large Data Bases,
 August 27th - 31st 2012, Istanbul, Turkey.
 http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Peter Geoghegan
On Fri, Jul 8, 2016 at 11:44 AM, Tom Lane  wrote:
>> Sure, but we could *also* do it separately, splitting VACUUM's tasks of
>> tuple freezing, page compaction, and index entry removal each into
>> separate tasks.
>
> Uh ... wouldn't that tend to make things worse?  The knocks on VACUUM are
> too much I/O and too much latency for cleanup, and I can't see how
> splitting it does anything good on either score.

Has anyone ever done any kind of write-up of the "TED" design that was
discussed during FOSDEM (I hope I recall the name it was given
correctly)? Apparently that's something that's been discussed a few
times among senior community members, and I think it has promise.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Andres Freund
On 2016-07-08 13:32:35 -0500, Kevin Grittner wrote:
> On Fri, Jul 8, 2016 at 12:53 PM, Andres Freund  wrote:
> > On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote:
> >> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:
> >>
> >> > So I don't think that approach still allows old snapshot related
> >> > cleanups for toast triggered vacuums?  Is that an acceptable
> >> > restriction?
> >>
> >> What I would rather see is that if the heap is vacuumed (whether or
> >> not by autovacuum) then the related TOAST table is also vacuumed
> >> (using the same horizon the heap used), but if the TOAST relation
> >> is chosen for vacuum by itself that it does not attempt to adjust
> >> the horizon based on old_snapshot_threshold.
> >
> > Uh, wouldn't that quite massively regress the autovacuum workload in
> > some cases? There's a reason they're considered separately after
> > all. And in many cases, even if there's lots of updates in the heap
> > table, the toast table doesn't get any updates. And the toast table is
> > often a lot larger than the data.
> 
> Of course, the toast table has only one index, and it is narrow.

But that index and the table are often large...

> With the visibility map, it should visit only the needed pages in
> the toast's heap area, so any regression would be in the case that:
> 
> (1)  old_snapshot_threshold >= 0
> (2)  the "normal" heap met the conditions for vacuum, but the toast
>      heap didn't
> (3)  when passing the toast heap based on visibility map, *some*
>  cleanup was done (otherwise the TID list would be empty, so no
>  index pass is needed)

Unfortunately btree performs an index scan even if there are no TIDs to
clean up. See the unconditional calls to
lazy_cleanup_index()->amvacuumcleanup(). Cf.
/*
 * If btbulkdelete was called, we need not do anything, just return the
 * stats from the latest btbulkdelete call.  If it wasn't called, we must
 * still do a pass over the index, to recycle any newly-recyclable pages
 * and to obtain index statistics.
 *
 * Since we aren't going to actually delete any leaf items, there's no
 * need to go through all the vacuum-cycle-ID pushups.
 */


> but I'm also sure that by containing toast size
> when it would otherwise grow for weeks or months, it could be a
> very large performance gain.

That's an argument for changing autovacuum heuristics, not for making
this change as a side-effect of a bugfix.


I'm a bit confused, why aren't we simply adding LSN interlock checks for
toast? Doesn't look that hard? Seems like a much more natural course of
fixing this issue?


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Merlin Moncure
On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson
 wrote:
> Hi postgresql hackers -
>
> I would like to find some analysis (published work, blog posts) on the 
> overheads affiliated with the guarantees provided by MVCC isolation. More 
> specifically, assuming the current workload is CPU bound (as opposed to IO) 
> what is the CPU overhead of generating the WAL, the overhead of version 
> checking and version creation, and of garbage collecting old and unnecessary 
> versions? For what it’s worth, I am working on a research project where it is 
> envisioned that some of this work can be offloaded.

That's going to be hard to measure.   First, what you didn't say is,
'with respect to what?'. You mention WAL for example.  WAL is more of
a crash safety mechanism than anything and it's not really fair to
include it in an analysis of 'MVCC overhead', or at least not
completely.  One thing that MVCC *does* objectively cause is bloat,
although you can still get bloat without MVCC if you (for example)
delete rows or rewrite rows such that they can't fit in their old
slot.

MVCC definitely incurs some runtime overhead to check visibility but
the amount of overhead is highly dependent on the specific workload.
Postgres 'hint bits' reduce the cost to near zero for many workloads
but in other workloads they are expensive to maintain and cause a lot
of extra traffic.   One nice feature about not having to worry about
visibility is that you can read data directly out of the index.  We
have some workarounds to deal with that ('all visible bit') but again
the amount of benefit from that strategy is going to be very situation
specific.
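
To make the version bookkeeping concrete, a hedged sketch using the
contrib pageinspect extension (table name is illustrative):

CREATE EXTENSION IF NOT EXISTS pageinspect;

CREATE TABLE mvcc_demo (id int);
INSERT INTO mvcc_demo VALUES (1);
UPDATE mvcc_demo SET id = 2;  -- creates a second version of the row

-- Per-tuple MVCC metadata on page 0: t_xmin/t_xmax stamp each version,
-- and t_infomask carries the hint bits mentioned above.
SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask
FROM   heap_page_items(get_raw_page('mvcc_demo', 0));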

Stepping back, the overhead of MVCC in postgres (and probably other
systems too) has been continually reduced over the years -- the really
nasty parts have been relegated to background cleanup processing.
That processing is pretty sequential, and the 'i/o bottleneck' is
finally getting solved by cheap storage, pushing things back into the
cpu space.

In summary, I think the future of MVCC and transactional systems is
very bright, and that of data management systems that discard
transactional safety in order to get some short-term performance gains
is, uh, not so bright.  Transactions are essential in systems where
data integrity matters.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Pavel Stehule
2016-07-08 20:39 GMT+02:00 Tom Lane :

> Alvaro Herrera  writes:
> > As a separate concern, IMO having the source code in a \df+ column is
> > almost completely useless.
>
> Good point.  It works okay for C/internal functions, but in those cases
> it's usually redundant with the proname.  For PL functions it's a disaster
> formatting-wise, because they're often wide and/or multi-line.
>
> > I propose to split that out to a separate
> > \df command (say \df% or \df/) that shows *only* the source code.
>
> As to those names, ick.  Also, what do you envision the output looking
> like when multiple functions are selected?  Or would you ban wildcards?
> If you do, it's not clear what this does that \sf doesn't do better.
>
> Maybe, given the existence of \sf, we should just drop prosrc from \df+
> altogether.
>

prosrc still has benefit for me (for C hacking). Can we show data there
only for internal or C functions? I agree it's useless for PL/pgSQL.
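
A hedged sketch of what that could look like at the catalog level (the
real change would live in psql's describe query; this only models the
CASE):

SELECT p.proname,
       CASE WHEN l.lanname IN ('internal', 'c') THEN p.prosrc
            ELSE ''
       END AS "Source code"
FROM   pg_proc p
JOIN   pg_language l ON l.oid = p.prolang
LIMIT  5;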

Pavel

>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] can we optimize STACK_DEPTH_SLOP

2016-07-08 Thread Greg Stark
On Fri, Jul 8, 2016 at 3:32 PM, Tom Lane  wrote:
> Hm, after reading the man page I don't quite see how that would help?
> You'd have to already know the mapped stack address range in order to
> call the function without getting ENOMEM.


I had assumed unmapped pages would just return a 0 in the bitmap. I
suppose you could still do it by just probing one page at a time until
you find an unmapped page. In a way that's better since you can count
stack pages even if they're paged out.


Fwiw here's the pmap info from burbot (Linux Sparc64):
 136   48   48 rw---   [ stack ]
 136   48   48 rw---   [ stack ]
 136   48   48 rw---   [ stack ]
 136   48   48 rw---   [ stack ]
 136   56   56 rw---   [ stack ]
 136   80   80 rw---   [ stack ]
 136   96   96 rw---   [ stack ]
 136  112  112 rw---   [ stack ]
 136  112  112 rw---   [ stack ]
 576  576  576 rw---   [ stack ]
2056 2056 2056 rw---   [ stack ]

I'm actually a bit confused how to interpret these numbers. This
appears to be an 8kB pagesize architecture so is that 576*8kB or over
5MB of stack for the regexp test? But we don't know if there are any
check_stack_depth calls in that call tree?

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Alvaro Herrera
Peter Geoghegan wrote:

> Has anyone ever done any kind of write-up of the "TED" design that was
> discussed during FOSDEM (I hope I recall the name it was given
> correctly)? Apparently that's something that's been discussed a few
> times among senior community members, and I think it has promise.

https://www.postgresql.org/message-id/55511d1f.7050...@iki.fi

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] can we optimize STACK_DEPTH_SLOP

2016-07-08 Thread Tom Lane
Greg Stark  writes:
> Fwiw here's the pmap info from burbot (Linux Sparc64):
>  136   48   48 rw---   [ stack ]
>  136   48   48 rw---   [ stack ]
>  136   48   48 rw---   [ stack ]
>  136   48   48 rw---   [ stack ]
>  136   56   56 rw---   [ stack ]
>  136   80   80 rw---   [ stack ]
>  136   96   96 rw---   [ stack ]
>  136  112  112 rw---   [ stack ]
>  136  112  112 rw---   [ stack ]
>  576  576  576 rw---   [ stack ]
> 2056 2056 2056 rw---   [ stack ]

> I'm actually a bit confused how to interpret these numbers. This
> appears to be an 8kB pagesize architecture so is that 576*8kB or over
> 5MB of stack for the regexp test?

No, pmap specifies that its outputs are measured in kilobytes.  So this
is by and large the same as what I'm seeing on x86_64, again with the
caveat that the recursive regex routines seem to vary all over the place
in terms of stack consumption.

> But we don't know if there are any
> check_stack_depth calls in that call tree?

The regex recursion definitely does have check_stack_depth calls in it
(since commit b63fc2877).  But what we're trying to measure here is the
worst-case stack depth regardless of any check_stack_depth calls.  That's
a ceiling on what we might need to set STACK_DEPTH_SLOP to --- probably a
very loose ceiling, but I don't want to err on the side of underestimating
it.  I wouldn't consider either the regex or errors tests as needing to
bound STACK_DEPTH_SLOP, since we know that most of their consumption is
from recursive code that contains check_stack_depth calls.  But it's
useful to look at those depths just as a sanity check that we're getting
valid numbers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MVCC overheads

2016-07-08 Thread Gavin Flower

Please see comment at the bottom of this post.

On 08/07/16 10:48, Pete Stevenson wrote:
Good info, thanks for the note. Agreed that it is difficult to pull 
things apart to isolate these features for offload — so actually 
running experiments with offload is not possible, as you point out 
(and for other reasons).


Maybe I could figure out the lines of code that add versions into a 
table and then those that collect old versions (they do get collected, 
right?). Anyway, thought being I could profile while running TPC-C or 
similar. I was hoping that someone might be able to jump on this with 
a response that they already did something similar. I know that 
Stonebraker has done some analysis along these lines, but I’m looking 
for an independent result that confirms (or not) his work.


Thank you,
Pete Stevenson


On Jul 7, 2016, at 3:43 PM, Simon Riggs wrote:


On 7 July 2016 at 20:50, Pete Stevenson wrote:


Hi Simon -

Thanks for the note. I think it's fair to say that I didn't
provide enough context, so let me try and elaborate on my question.

I agree, MVCC is a benefit. The research angle here is about
enabling MVCC with hardware offload. Since I didn't explicitly
mention it, the offload I refer to will respect all consistency
guarantees also.

It is the case that for the database to implement MVCC it must
provide consistent read to multiple different versions of data,
i.e. depending on the version used at transaction start. I'm not
an expert on postgresql internals, but this must have some cost.
I think the cost related to MVCC guarantees can roughly be
categorized as: creating new versions (linking them in), version
checking on read, garbage collecting old versions, and then there
is an additional cost that I am interested in (again not claiming
it is unnecessary in any sense) but there is a cost to generating
the log.

Thanks, by the way, for the warning about lab vs. reality. That's
why I'm asking this question here. I want to keep the
hypothetical tagged as such, but find defensible and realistic
metrics where those exist, i.e. in this instance, we do have a
database that can use MVCC. It should be possible to figure out
how much work goes into maintaining that property.


PostgreSQL uses a no overwrite storage mechanism, so any additional 
row versions are maintained in the same table alongside other rows. 
The MVCC actions are mostly mixed in with other aspects of the 
storage, so not isolated much for offload.


Oracle has a different mechanism that does isolate changed row 
versions into a separate data structure, so would be much more 
amenable to offload than PostgreSQL, in its current form.


Maybe look at SLRUs (clog etc) as a place to offload something?

--
Simon Riggs http://www.2ndQuadrant.com/ 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


In this list, the convention is to post replies at the end (with some 
rare exceptions), or interspersed when appropriate, and to omit parts no 
longer relevant.


The motivation for bottom-posting like this is that people get to see
the context before the reply, AND emails don't end up getting longer &
longer as people reply at the beginning, forgetting to trim the now
irrelevant stuff at the end.



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in batch tuplesort memory CLUSTER case (9.6 only)

2016-07-08 Thread Peter Geoghegan
On Thu, Jul 7, 2016 at 12:34 AM, Noah Misch  wrote:
>> How do you feel about adding testing to tuplesort.c not limited to
>> hitting this bug (when Valgrind memcheck is used)?
>
> Sounds great, but again, not in the patch fixing this bug.

Attached patch adds a CLUSTER external sort test case to the
regression tests, as discussed.

This makes a hardly noticeable difference on the run time of the
CLUSTER tests, at least for me. Consider the following:

$ time make installcheck-tests TESTS="create_function_1 create_type
create_table copy cluster"
*** SNIP ***
(using postmaster on Unix socket, default port)
== dropping database "regression" ==
DROP DATABASE
== creating database "regression" ==
CREATE DATABASE
ALTER DATABASE
== running regression test queries==
test create_function_1... ok
test create_type  ... ok
test create_table ... ok
test copy ... ok
test cluster  ... ok

=
 All 5 tests passed.
=

make[1]: Leaving directory '/home/pg/postgresql/src/test/regress'
make installcheck-tests   0.20s user 0.03s system 16% cpu 1.422 total

The added overhead is discernible from the noise for me, but not by
much. If I had to put a number on it, I'd say that these new additions
make the regression tests take approximately 120 milliseconds longer
to complete on a fast machine. And yet, the new tests add test
coverage for:

* Multiple pass external sorts. With only 1MB of maintenance_work_mem,
only the minimum number of tapes, 7, are used, so it's really easy to
see multiple merge passes with only 6 input tapes (the wide caller
tuples used by this CLUSTER case also help). The final on-the-fly
merge still uses batch memory, of course.

* This CLUSTER external sort bug -- I've verified that the new
movetup_cluster() function is hit (it's hit over 10 times, in fact),
and so the Valgrind animal ("Skink") would have caught the CLUSTER bug
had this testing been in place earlier.

* The best case for replacement selection, where only one run is
produced, and so no merge is required. (So, there are 2 CLUSTER
operations that perform external sort operations added in total.)

* Due to hitting that replacement selection best case,
TSS_SORTEDONTAPE tuple handling also gets some coverage, and so (since
we also test multiple passes) we perhaps manage to cover all code used
for the randomAccess/materialized tape as final output case, without
directly testing a randomAccess caller case separately (direct testing
would not be possible with CLUSTER, which never cares about getting
randomAccess to the final output).

Overall, significant test coverage is added, with only a small impact
on test runtime.
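
For anyone wanting to reproduce the external-sort cases by hand, a
hedged sketch (the names assume the regression database's cluster test
table; exact thresholds depend on row width):

-- Small memory forces an external sort with multiple merge passes:
SET maintenance_work_mem = '1MB';
CLUSTER clstr_tst USING clstr_tst_c;

-- Enough memory lets the same sort complete internally:
SET maintenance_work_mem = '64MB';
CLUSTER clstr_tst USING clstr_tst_c;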

It seemed to not make much sense to produce separate patches to do
this much. As discussed, I intend to produce more test coverage still,
including coverage of hash index build tuplesorts. I hope Noah does
not imagine that I disregarded his remarks on doing more than the
minimum in my first pass. I would have to increase
maintenance_work_mem to not get multiple passes in a CLUSTER of any
conveniently available regression test table. With 2MB of
maintenance_work_mem, a one pass sort of 3 runs is all that is
required (and certainly not multiple passes). With 6MB of
maintenance_work_mem, the sort completes in memory. There seems to be
little reason to not do this much at once.

Testing replacement selection in the second CLUSTER is made very
convenient by the fact that we just ran CLUSTER, so input should be
presorted.

--
Peter Geoghegan


0001-Add-test-coverage-for-CLUSTER-external-sorts.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Showing parallel status in \df+

2016-07-08 Thread Michael Paquier
On Sat, Jul 9, 2016 at 4:02 AM, Pavel Stehule  wrote:
>
>
> 2016-07-08 20:39 GMT+02:00 Tom Lane :
>>
>> Alvaro Herrera  writes:
>> > As a separate concern, IMO having the source code in a \df+ column is
>> > almost completely useless.
>>
>> Good point.  It works okay for C/internal functions, but in those cases
>> it's usually redundant with the proname.  For PL functions it's a disaster
>> formatting-wise, because they're often wide and/or multi-line.
>>
>> > I propose to split that out to a separate
>> > \df command (say \df% or \df/) that shows *only* the source code.
>>
>> As to those names, ick.  Also, what do you envision the output looking
>> like when multiple functions are selected?  Or would you ban wildcards?
>> If you do, it's not clear what this does that \sf doesn't do better.
>>
>> Maybe, given the existence of \sf, we should just drop prosrc from \df+
>> altogether.
>
> prosrc still has benefit for me (for C hacking). Can we show data there only
> for internal or C functions? I agree it's useless for PL/pgSQL.

So to sum up:
- Add "Parallel" column
- Add ACLs
- Reordering the columns, I'd suggest as follows:
-- Schema
-- Name
-- Result data type
-- Argument data types
-- Type
-- Language
-- Volatility
-- Parallel
-- Owner
-- Security
-- ACL
-- Source code
-- Description
Or by theme: 1) general info, 2) specifics (volatility, parallel,
type), 3) ownership.
And regarding "source code", I think that's useful for debugging.
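
For reference, the flag the new column would surface is
pg_proc.proparallel, added in 9.6 (a quick hedged look; any function
name works):

-- 's' = parallel safe, 'r' = parallel restricted, 'u' = parallel unsafe.
SELECT p.proname, p.provolatile, p.proparallel
FROM   pg_proc p
WHERE  p.proname = 'int4pl';  -- illustrative
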
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] BUG #14230: Wrong timeline returned by pg_stop_backup on a standby

2016-07-08 Thread Noah Misch
On Thu, Jul 07, 2016 at 03:38:26PM +0900, Michael Paquier wrote:
> On Thu, Jul 7, 2016 at 12:57 AM, Marco Nenciarini
>  wrote:
> > After further analysis, the issue is that we retrieve the starttli from
> > the ControlFile structure, but it was using ThisTimeLineID when writing
> > the backup label.
> >
> > I've attached a very simple patch that fixes it.
> 
> ThisTimeLineID is always set to 0 on purpose on a standby, so we
> cannot rely on it (well it is set temporarily when recycling old
> segments). At recovery when parsing the backup_label file there is no
> actual use of the start segment name, so that's only a cosmetic
> change. But surely it would be better to get that fixed, because
> that's useful for debugging.
> 
> While looking at your patch, I thought that it would have been
> tempting to use GetXLogReplayRecPtr() to get the timeline ID when in
> recovery, but what we really want to know here is the timeline of the
> last REDO pointer, which is starttli, and that's more consistent with
> the fact that we use startpoint when writing the backup_label file. In
> short, +1 for this fix.
> 
> I am adding that in the list of open items, adding Magnus in CC whose
> commit for non-exclusive backups is at the origin of this defect.

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Magnus,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dumping database privileges broken in 9.6

2016-07-08 Thread Noah Misch
On Wed, Jul 06, 2016 at 07:03:33PM -0400, Stephen Frost wrote:
> * Noah Misch (n...@leadboat.com) wrote:
> > On Wed, Jun 29, 2016 at 11:50:17AM -0400, Stephen Frost wrote:
> > > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> > > > Do this:
> > > > 
> > > > CREATE DATABASE test1;
> > > > REVOKE CONNECT ON DATABASE test1 FROM PUBLIC;
> > > > 
> > > > Run pg_dumpall.
> > > > 
> > > > In 9.5, this produces
> > > > 
> > > > CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = peter;
> > > > REVOKE ALL ON DATABASE test1 FROM PUBLIC;
> > > > REVOKE ALL ON DATABASE test1 FROM peter;
> > > > GRANT ALL ON DATABASE test1 TO peter;
> > > > GRANT TEMPORARY ON DATABASE test1 TO PUBLIC;
> > > > 
> > > > In 9.6, this produces only
> > > > 
> > > > CREATE DATABASE test1 WITH TEMPLATE = template0 OWNER = peter;
> > > > GRANT TEMPORARY ON DATABASE test1 TO PUBLIC;
> > > > GRANT ALL ON DATABASE test1 TO peter;
> > > > 
> > > > Note that the REVOKE statements are missing.  This does not
> > > > correctly recreate the original state.
> > > 
> > > I see what happened here, the query in dumpCreateDB() needs to be
> > > adjusted to pull the default information to then pass to
> > > buildACLCommands(), similar to how the objects handled by pg_dump work.
> > > The oversight was in thinking that databases didn't have any default
> > > rights granted, which clearly isn't correct.
> > > 
> > > I'll take care of that in the next day or so and add an appropriate
> > > regression test.
> > 
> > This PostgreSQL 9.6 open item is past due for your status update.  Kindly 
> > send
> > a status update within 24 hours, and include a date for your subsequent 
> > status
> > update.  Refer to the policy on open item ownership:
> > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> 
> I've not forgotten about this and have an initial patch, but I'm
> considering if I like the way template0/template1 are handled.
> Specifically, we don't currently record their initdb-set privileges into
> pg_init_privs (unlike all other objects with initial privileges).  This
> is complicated by the idea that template1 could be dropped/recreated
> (ending up with a different OID in the process).
> 
> More to come tomorrow.

This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in citext's upgrade script for parallel aggregates

2016-07-08 Thread David Rowley
On 30 June 2016 at 03:49, Robert Haas  wrote:
> On Sat, Jun 25, 2016 at 3:44 AM, Andreas Karlsson  wrote:
>> On 06/24/2016 01:31 PM, David Rowley wrote:
>>> Seems there's a small error in the upgrade script for citext for 1.1
>>> to 1.2 which will cause min(citext) not to be parallel enabled.
>>>
>>> max(citext)'s combinefunc is first set incorrectly, but then updated
>>> to the correct value. I assume it was meant to set the combine
>>> function for min(citext) instead.
>>>
>>> Fix attached. I've assumed that because we're still in beta that we
>>> can get away with this fix rather than making a 1.3 version to fix the
>>> issue.
>>
>> Yes, this is indeed a bug.
>
> Since we've already released beta2, I think we need to do a whole new
> extension version.  We treated beta1 as a sufficiently-significant
> event to mandate a version bump, so we should do the same here.

Ok, good point. Patch attached.
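
A quick way for reviewers to verify the fix after running ALTER
EXTENSION citext UPDATE (hedged; expected output is min -> citext_smaller
and max -> citext_larger):

SELECT aggfnoid::regprocedure AS aggregate, aggcombinefn
FROM   pg_aggregate
WHERE  aggfnoid = 'min(citext)'::pg_catalog.regprocedure
   OR  aggfnoid = 'max(citext)'::pg_catalog.regprocedure;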




-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From bb9e38894f928b47260f5fd9ba34535b47be3d88 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Sat, 9 Jul 2016 15:37:33 +1200
Subject: [PATCH] Add missing combine function for min(citext)

This would have been set correctly for new installs of 1.2, but the upgrade
step from 1.1 to 1.2 was incorrect: instead of setting the combine function
for min(citext), it set max(citext)'s combine function twice.
---
 contrib/citext/Makefile |   4 +-
 contrib/citext/citext--1.2--1.3.sql |   7 +
 contrib/citext/citext--1.2.sql  | 493 
 contrib/citext/citext--1.3.sql  | 493 
 contrib/citext/citext.control   |   2 +-
 5 files changed, 503 insertions(+), 496 deletions(-)
 create mode 100644 contrib/citext/citext--1.2--1.3.sql
 delete mode 100644 contrib/citext/citext--1.2.sql
 create mode 100644 contrib/citext/citext--1.3.sql

diff --git a/contrib/citext/Makefile b/contrib/citext/Makefile
index 363a11b..e39d3ee 100644
--- a/contrib/citext/Makefile
+++ b/contrib/citext/Makefile
@@ -3,8 +3,8 @@
 MODULES = citext
 
 EXTENSION = citext
-DATA = citext--1.2.sql citext--1.1--1.2.sql citext--1.0--1.1.sql \
-	citext--unpackaged--1.0.sql
+DATA = citext--1.3.sql citext--1.2--1.3.sql citext--1.1--1.2.sql \
+	citext--1.0--1.1.sql citext--unpackaged--1.0.sql
 PGFILEDESC = "citext - case-insensitive character string data type"
 
 REGRESS = citext
diff --git a/contrib/citext/citext--1.2--1.3.sql b/contrib/citext/citext--1.2--1.3.sql
new file mode 100644
index 000..4ab8679
--- /dev/null
+++ b/contrib/citext/citext--1.2--1.3.sql
@@ -0,0 +1,7 @@
+/* contrib/citext/citext--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION citext UPDATE TO '1.3'" to load this file. \quit
+
+UPDATE pg_aggregate SET aggcombinefn = 'citext_smaller'
+WHERE aggfnoid = 'min(citext)'::pg_catalog.regprocedure;
diff --git a/contrib/citext/citext--1.2.sql b/contrib/citext/citext--1.2.sql
deleted file mode 100644
index c2d0c0c..000
--- a/contrib/citext/citext--1.2.sql
+++ /dev/null
@@ -1,493 +0,0 @@
-/* contrib/citext/citext--1.2.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION citext" to load this file. \quit
-
---
---  PostgreSQL code for CITEXT.
---
--- Most I/O functions, and a few others, piggyback on the "text" type
--- functions via the implicit cast to text.
---
-
---
--- Shell type to keep things a bit quieter.
---
-
-CREATE TYPE citext;
-
---
---  Input and output functions.
---
-CREATE FUNCTION citextin(cstring)
-RETURNS citext
-AS 'textin'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citextout(citext)
-RETURNS cstring
-AS 'textout'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citextrecv(internal)
-RETURNS citext
-AS 'textrecv'
-LANGUAGE internal STABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citextsend(citext)
-RETURNS bytea
-AS 'textsend'
-LANGUAGE internal STABLE STRICT PARALLEL SAFE;
-
---
---  The type itself.
---
-
-CREATE TYPE citext (
-INPUT  = citextin,
-OUTPUT = citextout,
-RECEIVE= citextrecv,
-SEND   = citextsend,
-INTERNALLENGTH = VARIABLE,
-STORAGE= extended,
--- make it a non-preferred member of string type category
-CATEGORY   = 'S',
-PREFERRED  = false,
-COLLATABLE = true
-);
-
---
--- Type casting functions for those situations where the I/O casts don't
--- automatically kick in.
---
-
-CREATE FUNCTION citext(bpchar)
-RETURNS citext
-AS 'rtrim1'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext(boolean)
-RETURNS citext
-AS 'booltext'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
-CREATE FUNCTION citext(inet)
-RETURNS citext
-AS 'network_show'
-LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE;
-
---
---  Implicit and assignment t

Re: [HACKERS] dumping database privileges broken in 9.6

2016-07-08 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

Unfortunately, not going to make any further progress on this tonight or
over the weekend as I'm going to be out of town.  I believe I've
convinced myself that adding a template1 entry to pg_init_privs will be
both sufficient and produce the correct results, along with adjusting
the query in pg_dumpall to join through it.  Will provide an update on
Monday.
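
For anyone following along, a hedged peek at the catalog in question
(contents vary by installation):

-- Initial privileges captured at initdb / CREATE EXTENSION time;
-- privtype is 'i' for initdb and 'e' for extension.
SELECT classoid::regclass, objoid, objsubid, privtype, initprivs
FROM   pg_init_privs
LIMIT  5;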

Thanks!

Stephen




Re: [HACKERS] [BUGS] BUG #14230: Wrong timeline returned by pg_stop_backup on a standby

2016-07-08 Thread Amit Kapila
On Thu, Jul 7, 2016 at 12:08 PM, Michael Paquier
 wrote:
> On Thu, Jul 7, 2016 at 12:57 AM, Marco Nenciarini
>  wrote:
>> After further analysis, the issue is that we retrieve the starttli from
>> the ControlFile structure, but it was using ThisTimeLineID when writing
>> the backup label.
>>
>> I've attached a very simple patch that fixes it.
>
> ThisTimeLineID is always set to 0 on purpose on a standby, so we
> cannot rely on it (well it is set temporarily when recycling old
> segments). At recovery when parsing the backup_label file there is no
> actual use of the start segment name, so that's only a cosmetic
> change. But surely it would be better to get that fixed, because
> that's useful for debugging.
>
> While looking at your patch, I thought that it would have been
> tempting to use GetXLogReplayRecPtr() to get the timeline ID when in
> recovery, but what we really want to know here is the timeline of the
> last REDO pointer, which is starttli, and that's more consistent with
> the fact that we use startpoint when writing the backup_label file. In
> short, +1 for this fix.
>

+1, the fix looks right to me as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2016-07-08 Thread Fabien COELHO


Here is a simple patch which adds a bunch of operators (bitwise: & | ^ ~,
comparisons: =/== <>/!= < <= > >=, logical: and/&& or/|| xor/^^ not/!) and
functions (exp, ln, if) to pgbench. I've tried to be compatible with
PostgreSQL's SQL where appropriate.


Also attached is a simple test script.


Here is a slightly updated version.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index f3afedb..ea319f6 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -821,9 +821,17 @@ pgbench  options  dbname
   The expression may contain integer constants such as 5432,
   double constants such as 3.14159,
   references to variables :variablename,
-  unary operators (+, -) and binary operators
-  (+, -, *, /,
-  %) with their usual precedence and associativity,
+  arithmetic operators (unary: +, -; binary:
+  +, -, *, /, %),
+  bitwise operators (unary: ~; binary: &,
+  |, #/^, <<,
+  >>),
+  comparisons (=/==, <>/!=,
+  <=, <, >=, >),
+  logical operators (unary: not/!;
+  binary: and/&&,
+  or/||, xor/^^),
+  with their usual precedence and associativity,
   function calls, and
   parentheses.
  
@@ -955,6 +963,13 @@ pgbench  options  dbname
5432.0
   
   
+   exp(x)
+   double
+   exponential
+   exp(1.0)
+   2.718281828459045
+  
+  
greatest(a [, ... ] )
double if any a is double, else integer
largest value among arguments
@@ -962,6 +977,13 @@ pgbench  options  dbname
5
   
   
+   if(c,e1,e2)
+   same as e*
+   if c is not zero then e1 else e2
+   if(0,1.0,2.0)
+   2.0
+  
+  
int(x)
integer
cast to int
@@ -976,6 +998,13 @@ pgbench  options  dbname
2.1
   
   
+   ln(x)
+   double
+   natural logarithm
+   ln(2.718281828459045)
+   1.0
+  
+  
pi()
double
value of the constant PI
diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y
index 0cc665b..233de9c 100644
--- a/src/bin/pgbench/exprparse.y
+++ b/src/bin/pgbench/exprparse.y
@@ -52,8 +52,22 @@ static PgBenchExpr *make_func(yyscan_t yyscanner, int fnumber, PgBenchExprList *
 %type  VARIABLE FUNCTION
 
 %token INTEGER_CONST DOUBLE_CONST VARIABLE FUNCTION
+%token AND_OP OR_OP XOR_OP NE_OP LE_OP GE_OP LS_OP RS_OP
 
 /* Precedence: lowest to highest */
+
+/* logical */
+%left	XOR_OP
+%left	OR_OP
+%left	AND_OP
+%right  UNOT
+/* comparison */
+%left	'=' NE_OP
+%left	'<' '>' LE_OP GE_OP
+/* bitwise */
+%left	'^' '|' '&' LS_OP RS_OP
+%right	UINV
+/* arithmetic */
 %left	'+' '-'
 %left	'*' '/' '%'
 %right	UMINUS
@@ -71,11 +85,29 @@ expr: '(' expr ')'			{ $$ = $2; }
 	| '+' expr %prec UMINUS	{ $$ = $2; }
 	| '-' expr %prec UMINUS	{ $$ = make_op(yyscanner, "-",
 		   make_integer_constant(0), $2); }
+	| '~' expr %prec UINV	{ $$ = make_op(yyscanner, "^",
+		   make_integer_constant(-1), $2); }
+	| '!' expr %prec UNOT	{ $$ = make_op(yyscanner, "^^",
+		   make_integer_constant(1), $2); }
 	| expr '+' expr			{ $$ = make_op(yyscanner, "+", $1, $3); }
 	| expr '-' expr			{ $$ = make_op(yyscanner, "-", $1, $3); }
 	| expr '*' expr			{ $$ = make_op(yyscanner, "*", $1, $3); }
 	| expr '/' expr			{ $$ = make_op(yyscanner, "/", $1, $3); }
 	| expr '%' expr			{ $$ = make_op(yyscanner, "%", $1, $3); }
+	| expr '<' expr			{ $$ = make_op(yyscanner, "<", $1, $3); }
+	| expr LE_OP expr		{ $$ = make_op(yyscanner, "<=", $1, $3); }
+	| expr '>' expr			{ $$ = make_op(yyscanner, "<", $3, $1); }
+	| expr GE_OP expr		{ $$ = make_op(yyscanner, "<=", $3, $1); }
+	| expr '=' expr			{ $$ = make_op(yyscanner, "=", $1, $3); }
+	| expr NE_OP expr		{ $$ = make_op(yyscanner, "<>", $1, $3); }
+	| expr '&' expr			{ $$ = make_op(yyscanner, "&", $1, $3); }
+	| expr '|' expr			{ $$ = make_op(yyscanner, "|", $1, $3); }
+	| expr '^' expr			{ $$ = make_op(yyscanner, "^", $1, $3); }
+	| expr LS_OP expr		{ $$ = make_op(yyscanner, "<<", $1, $3); }
+	| expr RS_OP expr		{ $$ = make_op(yyscanner, ">>", $1, $3); }
+	| expr AND_OP expr		{ $$ = make_op(yyscanner, "&&", $1, $3); }
+	| expr OR_OP expr		{ $$ = make_op(yyscanner, "||", $1, $3); }
+	| expr XOR_OP expr		{ $$ = make_op(yyscanner, "^^", $1, $3); }
 	| INTEGER_CONST			{ $$ = make_integer_constant($1); }
 	| DOUBLE_CONST			{ $$ = make_double_constant($1); }
 	| VARIABLE { $$ = make_variable($1); }
@@ -177,6 +209,12 @@ static const struct
 		"sqrt", 1, PGBENCH_SQRT
 	},
 	{
+		"ln", 1, PGBENCH_LN
+	},
+	{
+		"exp", 1, PGBENCH_EXP
+	},
+	{
 		"int", 1, PGBENCH_INT
 	},
 	{
@@ -191,6 +229,45 @@ static const struct
 	{
 		"random_exponential", 3, PGBENCH_RANDOM_EXPONENTIAL
 	},
+	{
+		"&&", 2, PGBENCH_AND
+	},
+	{
+		"||", 2, PGBENCH_OR
+	},
+	{
+		"^^", 2, PGBENCH_XOR
+	},
+	{
+		"&", 2, PGBENCH_BITAND
+	},
+	{
+		"|", 2, PGBENCH_BITOR
+	},
+	{
+		"^", 2, PGBENCH_BITXOR
+	},
+	{
+		"<<", 2,