README.tuplock seems to have outdated information

2017-12-12 Thread Amit Kapila
"The minimum across all databases, in turn, is recorded in checkpoint
records, and CHECKPOINT removes pg_multixact/ segments older than that
value once the checkpoint record has been flushed."

The above part of the paragraph in README.tuplock seems to be
outdated.  I think we now truncate the multixact files only during
vacuum.  This file was last updated by commit
cdbdc4382743fcfabb3437ea7c4d103adaa01324, and it seems that
substantial changes have happened since then in this area, so it
might be a good idea to update this file.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT

2017-12-12 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 29 Nov 2017 14:04:01 +0900, Michael Paquier  
wrote in 
> On Tue, Sep 19, 2017 at 3:04 PM, Kyotaro HORIGUCHI
>  wrote:
> >> By the way, I will take a look at your patch when I come back from the
> >> vacation.  Meanwhile, I noticed that it needs another rebase after
> >> 0a480502b092 [1].
> 
> Moved to CF 2018-01.

Thank you. (I'll do that by myself from the next CF)

This is rebased patch and additional patch of isolation test for
this problem.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 4336b15d2c0d95e7044746aa5c3ae622712e41b3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 12 Dec 2017 17:06:53 +0900
Subject: [PATCH 1/2] Add isolation test

---
 src/test/isolation/expected/select-noinherit.out |  9 +
 src/test/isolation/isolation_schedule|  1 +
 src/test/isolation/specs/select-noinherit.spec   | 23 +++
 3 files changed, 33 insertions(+)
 create mode 100644 src/test/isolation/expected/select-noinherit.out
 create mode 100644 src/test/isolation/specs/select-noinherit.spec

diff --git a/src/test/isolation/expected/select-noinherit.out b/src/test/isolation/expected/select-noinherit.out
new file mode 100644
index 000..5885167
--- /dev/null
+++ b/src/test/isolation/expected/select-noinherit.out
@@ -0,0 +1,9 @@
+Parsed test spec with 2 sessions
+
+starting permutation: alt1 sel2 c1
+step alt1: ALTER TABLE c1 NO INHERIT p;
+step sel2: SELECT * FROM p; 
+step c1: COMMIT;
+step sel2: <... completed>
+a  
+
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e41b916..6e04ea4 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -63,3 +63,4 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+test: select-noinherit
diff --git a/src/test/isolation/specs/select-noinherit.spec b/src/test/isolation/specs/select-noinherit.spec
new file mode 100644
index 000..31662a9
--- /dev/null
+++ b/src/test/isolation/specs/select-noinherit.spec
@@ -0,0 +1,23 @@
+# SELECT and ALTER TABLE NO INHERIT test
+#
+
+setup
+{
+ CREATE TABLE p (a integer);
+ CREATE TABLE c1 () INHERITS (p);
+}
+
+teardown
+{
+ DROP TABLE p CASCADE;
+}
+
+session "s1"
+setup		{ BEGIN ISOLATION LEVEL READ COMMITTED; }
+step "alt1"	{ ALTER TABLE c1 NO INHERIT p; }
+step "c1"	{ COMMIT; }
+
+session "s2"
+step "sel2"	{ SELECT * FROM p; }
+
+permutation "alt1" "sel2" "c1"
-- 
2.9.2

>From c2793d9200948d693150a5bbeb3815e3b5404be2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 12 Dec 2017 17:38:12 +0900
Subject: [PATCH 2/2] Lock parent on ALTER TABLE NO INHERIT

NO INHERIT doesn't modify the parent at all, but a lock on the parent is
required to avoid an error when a concurrent access sees a false child.
---
 src/backend/commands/tablecmds.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d979ce2..a8d119f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13246,7 +13246,28 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 		reltype = ((AlterObjectSchemaStmt *) stmt)->objectType;
 
 	else if (IsA(stmt, AlterTableStmt))
-		reltype = ((AlterTableStmt *) stmt)->relkind;
+	{
+		AlterTableStmt *alterstmt = (AlterTableStmt *)stmt;
+		ListCell *lc;
+
+		reltype = alterstmt->relkind;
+
+		foreach (lc, alterstmt->cmds)
+		{
+			AlterTableCmd *cmd = lfirst_node(AlterTableCmd, lc);
+			Assert(IsA(cmd, AlterTableCmd));
+
+			/*
+			 * Though NO INHERIT doesn't modify the parent, a lock on the parent
+			 * is necessary so that no concurrent expansion of the inheritance
+			 * tree sees a false child and ends up in ERROR.  But there is no
+			 * need to ascend further.
+			 */
+			if (cmd->subtype == AT_DropInherit)
+				RangeVarGetRelid((RangeVar *) cmd->def,
+								 AccessExclusiveLock, false);
+		}
+	}
 	else
 	{
 		reltype = OBJECT_TABLE; /* placate compiler */
-- 
2.9.2



Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-12-12 Thread Kyotaro HORIGUCHI
At Wed, 29 Nov 2017 14:33:08 +0900, Michael Paquier  
wrote in 
> On Sat, Nov 25, 2017 at 2:32 AM, Pavel Stehule  
> wrote:
> > fixed regress test
> 
> The last patch still applies, but did not get any reviews.
> Horiguchi-san, you are marked as a reviewer of this patch. Could you
> look at it? For now, I am moving it to next CF.

Sorry for the absence. It is near completion. I'll look at this soon.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-12 Thread Maksim Milyutin

Hi, Fujita-san!


On 24.11.2017 16:03, Etsuro Fujita wrote:

(2017/10/27 20:00), Etsuro Fujita wrote:

Please find attached an updated version of the patch.


Amit rebased this patch and sent me the rebased version off list. 
Thanks for that, Amit!


One thing I noticed I had overlooked is this change I added to 
make_modifytable to build a valid-looking plan node to pass to 
PlanForeignModify when planning the remote insert into each foreign partition:


+   /*
+    * The column list of the child might have a different column
+    * order and/or a different set of dropped columns than that
+    * of its parent, so adjust the subplan's tlist as well.
+    */
+   tlist = preprocess_targetlist(root,
+                                 child_parse->targetList);

This is needed because the FDW might reference the tlist. Since 
preprocess_targetlist references root->parse, it's necessary to replace 
that with the child query before calling that function, but I forgot 
to do so.  That is now fixed.  Attached is an updated version of the 
patch.


Your patch no longer applies on master. Please rebase it.

--
Regards,
Maksim Milyutin




Re: Boolean partitions syntax

2017-12-12 Thread Ashutosh Bapat
On Tue, Dec 12, 2017 at 7:19 AM, Amit Langote
 wrote:
> Hi.
>
> Horiguchi-san pointed out [1] on a nearby thread that the partitioning
> syntax (the FOR VALUES clause) doesn't accept true and false as valid
> partition bound datums, which seems to me like an oversight.  Attached a
> patch to fix that.
>
> create table bools (a bool) partition by list (a);
>
> Before patch:
>
> create table bools_t partition of bools for values in (true);
> ERROR:  syntax error at or near "true"
> LINE 1: ...reate table bools_t partition of bools for values in (true);
>
> After:
>
> create table bools_t partition of bools for values in (true);
> CREATE TABLE
> \d bools_t
>   Table "public.bools_t"
>  Column |  Type   | Collation | Nullable | Default
> +-+---+--+-
>  a  | boolean |   |  |
> Partition of: bools FOR VALUES IN (true)
>
> Thanks,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/20171128.203915.26713586.horiguchi.kyotaro%40lab.ntt.co.jp

Maybe you should use opt_boolean_or_string instead of TRUE_P and
FALSE_P. It also supports ON and OFF, which would be a bonus.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Learned Index

2017-12-12 Thread Laurenz Albe
Deepak Balasubramanyam wrote:
> I came across this paper making a case for indices that use machine learning 
> to optimise search.
> 
> https://arxiv.org/pdf/1712.01208.pdf
> 
> The gist seems to be to use a linear regression model or feed a tensor flow 
> model when a more complicated distribution is needed for the data and allow 
> SIMD instructions working on top of GPUs / TPUs to speed up lookups. The 
> speedup observed is anywhere from 40-60%. 
> 
> That result looks impressive but I don't have enough context on say 
> rebuilding a neural net on every DML operation. The equivalent operation that 
> I can relate to on PG would be to rebalance the B-tree for DML operations.
> 
> In your opinion, would a ML model work for a table whose operations are both 
> write and read heavy? I'd love to hear your thoughts on the paper.

I have read through the paper.

This may be interesting or not, but the paper is very vague about
its concepts and algorithms, so it's hard to tell.

I'd say that the paper does not meet publication standards.

For example, they say that their results were generated by comparing
a B-tree implementation with "learned indexes using a 2-stage
RMI model and different second-stage sizes (i.e., 10k, 50k, 100k, and 200k)",
but they don't say exactly what the neural network in these stages is
(at least it is not obvious to me).  Their "Learning Index Framework" (LIF)
is described with a few vague sentences and a reference to the literature
saying that is where they got some ideas from.

There is also no clear concept of how these indexes should handle
data modifications, so I think that there are some loose ends to be
tied up before it is ready for implementation.

Finally, I don't see any clear statement as to the error guarantees
that the neural network prediction can give, and if it is possible that
it may degrade to scanning relevant parts of the table in some cases.

Yours,
Laurenz Albe



Re: [HACKERS] pg_stat_wal_write statistics view

2017-12-12 Thread Haribabu Kommi
On Mon, Nov 27, 2017 at 2:12 PM, Haribabu Kommi 
wrote:

>
> On Wed, Nov 8, 2017 at 8:46 AM, Robert Haas  wrote:
>
>> On Tue, Nov 7, 2017 at 4:31 AM, Haribabu Kommi 
>> wrote:
>> >> Updated patch attached.
>> > Patch rebased.
>>
>> I think the earlier concerns about the performance impact of this are
>> probably very valid concerns, and I don't see how the new version of
>> the patch gets us much closer to solving them.
>>
>
> I will check the performance with the changes of removing the stats
> collector
> usage and provide the details.
>

I ran pgbench after removing the stats collector usage and moving the
stats into shared memory. I find that this stats addition doesn't have
any overhead while collecting the stats. I ran these tests on a small
machine; they may need to be repeated on a high-end machine to confirm
that there is no impact.

Threads  Clients  HEAD         PATCH        Diff
1        1        88.72981533  88.527789    -0.23%
2        2        88.17999     88.823842     0.73%
4        4        169.430813   167.897937   -0.90%
8        8        311.790313   315.6659117   1.24%
16       16       558.6014777  562.5883583   0.71%
32       32       874.0996587  899.855634    2.95%

Attached is the updated patch accordingly to use the shared memory stats.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites-statistics-view_v11.patch
Description: Binary data


Re: Boolean partitions syntax

2017-12-12 Thread Amit Langote
On 2017/12/12 18:12, Ashutosh Bapat wrote:
> On Tue, Dec 12, 2017 at 7:19 AM, Amit Langote wrote:
>> Horiguchi-san pointed out [1] on a nearby thread that the partitioning
>> syntax (the FOR VALUES clause) doesn't accept true and false as valid
>> partition bound datums, which seems to me like an oversight.  Attached a
>> patch to fix that.
>
> May be you should use opt_boolean_or_string instead of TRUE_P and
> FALSE_P. It also supports ON and OFF, which will be bonus.

Thanks for the suggestion.  I tried that but NonReservedWord_or_Sconst
conflicts with Sconst that partbound_datum itself has a rule for,
resulting in the following error:

gram.y: conflicts: 6 reduce/reduce
gram.y: expected 0 reduce/reduce conflicts
gram.y:2769.25-81: warning: rule useless in parser due to conflicts:
partbound_datum: Sconst

Moreover, it seems like on/off are not being accepted as valid Boolean
values like true/false are.

insert into rp values (true);
INSERT 0 1
insert into rp values (on);
ERROR:  syntax error at or near "on"
LINE 1: insert into rp values (on);
   ^
What's going on with that?   Maybe on/off values work only with SET
statements?
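
(For reference, a minimal illustration of the difference, assuming nothing
beyond stock PostgreSQL: on/off are accepted where Boolean GUC values are
parsed, and by the boolean type's input function, but they are not Boolean
literals in ordinary SQL.)

-- on/off work as Boolean GUC values:
SET enable_seqscan = off;
-- TRUE/FALSE are literals; 'on' only works through the boolean input
-- function, e.g. via an explicit cast:
SELECT true AS literal, 'on'::boolean AS via_cast;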

Thanks,
Amit




Re: [HACKERS] Parallel Hash take II

2017-12-12 Thread Thomas Munro
On Sat, Dec 2, 2017 at 3:46 PM, Andres Freund  wrote:
> Looking at 0004-Add-shared-tuplestores.patch
>
> Comments:
> - I'd rename mutex to lock. Seems quite possible that we end up with shared
>   lockers too.

Done.

> - Expand "Simple mechanism for sharing tuples between backends." intro
>   comment - that doesn't really seem like a meaningful description of
>   the mechanism. Should probably mention that it's similar to
>   tuplestores etc...

Done.

> - I'm still concerned about the chunking mechanism. How about this
>   sketch of an alternative solution?
>
>   Chunks are always the same length. To avoid having to read the length
>   from disk while holding a lock, introduce continuation chunks which
>   store the amount of space needed by the overlarge tuple at the
>   start. The reading process stays largely the same, except that if a
>   backend reads a chunk that's a continuation, it reads the length of
>   the continuation and skips ahead. That could mean that more than one
>   backend read continuation chunks, but the window is small and there's
>   normally not going to be that many huge tuples (otherwise things are
>   slow anyway).

Done.

I've also included a simple test harness that can be used to drive
SharedTuplestore independently of Parallel Hash, but that patch is not
for commit.  See example of usage in the commit message.
(Incidentally I noticed that ParallelWorkerNumber is not marked
PGDLLIMPORT so that fails to build on Windows CI.)

> - why are we using a separate hardcoded 32 for sts names? Why not just
>   go for NAMEDATALEN or such?

Done.

> - I'd replace most of the "should's" in comments with "need".

Done.

Another problem I discovered is that v27's way of installing a
different function into ExecProcNode in ExecInitHashJoin() was broken:
it didn't allow for the possibility that there is no DSA area
available due to lack of resources.  ExecInitNode() is too soon to
decide.  My solution is to provide a way for executor nodes to change
their ExecProcNode functions at any later time, which requires a way
for execProcnode.c to redo any wrapper functions.

-- 
Thomas Munro
http://www.enterprisedb.com


parallel-hash-v28.patchset.tgz
Description: GNU Zip compressed data


Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-12 Thread Ashutosh Bapat
Here are review comments for 0009
Only full aggregation is pushed on the remote server.

I think we can live with that for a while, but we need to be able to push down
partial aggregates to the foreign server. I agree that it needs some
infrastructure to serialize and deserialize the partial aggregate values,
support unpartitioned aggregation first, and then work on partitioned
aggregates. That is definitely a separate piece of work.

+-- ===
+-- test partition-wise-aggregates
+-- ===
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (a int, b int, c text);
+CREATE TABLE pagg_tab_p2 (a int, b int, c text);
+CREATE TABLE pagg_tab_p3 (a int, b int, c text);

As in the partition-wise join test cases, please use LIKE so that it's easy to
change the table schema if required (a sketch follows below).
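
For instance, something along these lines (just a sketch reusing the table
names from the test):

CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);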

+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30,
'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30,
'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i %
30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30,
'FM') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i %
30) >= 20;

We have to do this because INSERT tuple routing to a foreign partition is not
supported right now. Somebody has to remember to change this to a single
statement once that's done.
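
That is, once routing to foreign partitions works, the three INSERTs above
could presumably collapse into a single statement against the parent, keeping
the same expressions as in the hunk above:

INSERT INTO pagg_tab
  SELECT i % 30, i % 50, to_char(i/30, 'FM') FROM generate_series(1, 3000) i;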

+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;

I thought this was not needed: when you ANALYZE the partitioned table, it would
analyze the partitions as well. But I see that partition-wise join also
ANALYZEs the foreign partitions separately. When I ran ANALYZE on a
partitioned table with foreign partitions, statistics for only the local tables
(partitioned table and partitions) were updated. Of course this is separate
work, but it probably needs to be fixed.

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan when partition-wise-agg is disabled

s/when/with/

+-- Plan when partition-wise-agg is enabled

s/when/with/

+   ->  Append

Just like a ForeignScan node's Relations field tells what kind of ForeignScan
this is, maybe we should annotate Append to tell whether the children are
joins, aggregates, or relation scans. That might be helpful. Of course, as
another patch.

+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING
avg(b) < 25 ORDER BY 1;
+ a  | sum  | min | count
++--+-+---
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
[ ... clipped ...]
+ 23 | 2300 |   3 |   100
+ 24 | 2400 |   4 |   100
+(15 rows)

Maybe we want to reduce the number of rows to a few by using a stricter HAVING
clause?

+
+-- When GROUP BY clause not matches with PARTITION KEY.

... clause does not match ...

+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
sum(a) < 800 ORDER BY 1;
+
QUERY PLAN
+-
+ Sort
+   Output: fpagg_tab_p1.b, (avg(fpagg_tab_p1.a)),
(max(fpagg_tab_p1.a)), (count(*))
+   ->  Partial HashAggregate
[ ... clipped ... ]
+ Output: fpagg_tab_p3.b, PARTIAL
avg(fpagg_tab_p3.a), PARTIAL max(fpagg_tab_p3.a), PARTIAL count(*),
PARTIAL sum(fpagg_tab_p3.a)
+ Group Key: fpagg_tab_p3.b
+ ->  Foreign Scan on public.fpagg_tab_p3
+   Output: fpagg_tab_p3.b, fpagg_tab_p3.a
+   Remote SQL: SELECT a, b FROM public.pagg_tab_p3
+(26 rows)

I think we are interested in the overall shape of the plan and not the details
of Remote SQL etc. So, maybe turn off VERBOSE. This comment also applies to an
earlier plan with enable_partition_wise_agg = false.

+
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
sum(a) < 800 ORDER BY 1;
+ b  | avg | max | count
++-+-+---
+  0 | 10. |  20 |60
+  1 | 11. |  21 |60
[... clipped ...]
+ 42 | 12. |  22 |60
+ 43 | 13. |  23 |60
+(20 rows)

Since the aggregates were not pushed down, I doubt we should be testing the
output. But this test is good for checking partial aggregates over foreign
partition scans, which I think we don't have in postgres_fdw.sql. So, maybe
add it as a separate patch?

Can you please add a test with a whole-row reference; that usually causes
trouble.

-if (root->hasHavingQual && query->havingQual)
+if (root->hasHavingQual && fpinfo->havingQual)

This is not exactly a problem with your patch, but why do we need to check both
the boolean and the actual clauses? If the boolean is true, query->havingQual
should be non-NULL and NULL otherwise.

 /* Grouping informatio

Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

2017-12-12 Thread Kuntal Ghosh
On Mon, Dec 11, 2017 at 2:26 PM, Thomas Munro
 wrote:
> On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila  wrote:
>
>> Thanks for looking into it.  I will see if we can write some test.  In
>> the meantime if possible, can you please request Patrick Hemmer to
>> verify the attached patch?
>
> Our discussion was on the #postgresql Freenode channel.  I pointed him
> at this thread, but I'm not sure if he'll see my message or be able to
> test.
After discussing with Amit, I'm able to reproduce the scenario in a
master-standby setup. The issue occurs when we perform a parallel
index(-only) scan on a BTP_HALF_DEAD-marked page. (If a page is
marked as BTP_DELETED, it's already unlinked from the index.)

When a btree page is deleted during vacuum, it's first marked as
BTP_HALF_DEAD in _bt_mark_page_halfdead and then marked as BTP_DELETED
in _bt_unlink_halfdead_page without releasing cleanup lock on the
buffer. Hence, any scan node cannot lock the same buffer. So, the
issue can't be reproduced on master.

However, after replaying the XLOG_BTREE_MARK_PAGE_HALFDEAD record, the
standby releases the lock on the same buffer. If we force parallelism, an
index scan on the same page will hang the standby server.
The following is an (unpleasant) way to reproduce the issue:

In master (with autovacuum = off):
1. create table t1(a int primary key);
2. insert into t1 select * from generate_series(1,1000); --generates 3
leaf nodes (block no 1,2,4) and 1 root node (block no 3)
3. delete from t1 where a>=367 and a<=735; --delete all tuples pointed by leaf 2
4. analyze t1; --update the stats
5. explain analyze select * from t1 where a>=1 and a<=1000; --ensures
that the next vacuum will consider leaf 2 for page deletion
Now, put a break point at _bt_unlink_halfdead_page, so that vacuum
can't unlink the page.

In standby,
1. force parallelism (one way to do this is sketched below).
2. explain analyze select * from t1 where a>=1 and a<=1000; and the
parallel workers hang at the above-discussed place!
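
For completeness, one way to coax the planner into a parallel index(-only)
scan on such a small table is the following; the exact settings are only an
example, not part of the original report:

SET max_parallel_workers_per_gather = 2;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_index_scan_size = 0;
SET enable_seqscan = off;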

The attached patch fixes the issue. One comment on the same:
+ else if (scan->parallel_scan != NULL)
+ {
+ /* allow next page be processed by parallel worker */
+ _bt_parallel_release(scan, opaque->btpo_next);
+ }

  /* nope, keep going */
  if (scan->parallel_scan != NULL)

IMHO, there is no need to add an extra if condition;
_bt_parallel_release can be included in the next one.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Runtime Partition Pruning

2017-12-12 Thread Jesper Pedersen

Hi Beena,

On 12/07/2017 02:22 AM, Beena Emerson wrote:

I have added the partition quals that are used for pruning.

PFA the updated patch. I have changed the names of variables to make
it more appropriate, along with adding more code comments and doing
some refactoring and other code cleanups.



As the current patch conflicts with [1], could you provide a rebased 
version?


Thanks in advance !

[1] 
https://www.postgresql.org/message-id/9b98fc47-34b8-0ab6-27fc-c8a0889f2e5b%40lab.ntt.co.jp


Best regards,
 Jesper



Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-12 Thread Teodor Sigaev
0002-pg-trgm-strict_word-similarity.patch – implementation of 
strict_word_similarity() with comments, docs and tests.


The patch looks committable, but sometimes I get
*** ...pgsql/contrib/pg_trgm/expected/pg_strict_word_trgm.out 2017-12-12 
14:16:55.190989000 +0300
--- .../pgsql/contrib/pg_trgm/results/pg_strict_word_trgm.out  2017-12-12 
14:17:27.645639000 +0300

***
*** 153,160 
  --+--
  0 | Kabankala
   0.25 | Kabankalan City Public Plaza
-  0.416667 | Kabakala
   0.416667 | Abankala
   0.538462 | Kabikala
  0.625 | Ntombankala School
   0.642857 | Nehalla Bankalah Reserved Forest
--- 153,160 
  --+--
  0 | Kabankala
   0.25 | Kabankalan City Public Plaza
   0.416667 | Abankala
+  0.416667 | Kabakala
   0.538462 | Kabikala
  0.625 | Ntombankala School
   0.642857 | Nehalla Bankalah Reserved Forest

==


It seems some ordering stability should be added to the tests; see the sketch below.
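
For example, a secondary sort key on the text column would make the order
deterministic when similarity values tie -- a rough sketch only, with a
hypothetical table/column name:

SELECT strict_word_similarity('Kabankala', t) AS sml, t
  FROM test_trgm_words              -- hypothetical table name
 ORDER BY sml, t;                   -- t breaks ties between equal similarities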

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Runtime Partition Pruning

2017-12-12 Thread Beena Emerson
Hello Robert,
Thank you for the comments. I have started working on it.

On Fri, Dec 8, 2017 at 9:27 PM, Robert Haas  wrote:
> On Thu, Dec 7, 2017 at 2:22 AM, Beena Emerson  wrote:
>> I have added the partition quals that are used for pruning.
>>
>> PFA the updated patch. I have changed the names of variables to make
>> it more appropriate, along with adding more code comments and doing
>> some refactoring and other code cleanups.
>
> - initClauses() seems to be a duplicate of the existing function
> ExecInitExprList(), except for the extra NULL test, which isn't
> necessary.

The initClauses has been removed and ExecInitExprList has been used.

> - The executor already has a system for making sure that relations get
> opened and locked, and it's different from the new one scheme which
> set_append_subplan_indexes() implements. Relations should be locked
> during the executor initialization phase (i.e. ExecInitAppend) and not
> when the first tuple is requested (i.e. ExecAppend).  Also, there's
> already code to lock both child relations (i.e. the scans of those
> relations, see InitScanRelation, ExecInitIndexScan) and non-leaf
> partitions (ExecLockNonLeafAppendTables).   The call to
> find_all_inheritors() will lock all of that same stuff again *plus*
> the leaf partitions that were pruned during planning - moreover, if
> the Append is rescanned, we'll walk the partitioning structure again
> for every rescan.  I think RelationGetPartitionDispatchInfo should be
> called directly from ExecInitAppend after the existing code to take
> locks has been called, and store a pointer to the PartitionDispatch
> object in the AppendState for future use.

I have moved the call to ExecInitAppend. This still uses the previous
locking method, I will work on it in the next version of the patch.


> - I am surprised that set_append_subplan_indexes() needs to worry
> about multi-level partitioning directly.  I would have thought that
> Amit's patch would take care of that, just returning a Bitmapset of
> indexes which this function could use directly.  It also doesn't seem
> like a very good idea to convert the Bitmapset (subplans) into a list
> of integers (node->subplan_indexes), as set_append_subplan_indexes()
> does at the bottom.  The Bitmapset will be a lot more efficient; we
> should be able to just iterate over that directly rather than
> converting it into a List.  Note that a Bitmapset can be created with
> a single palloc, but an integer list needs one per list element plus
> one for the list itself.

The function get_partitions_from_clauses returns the Bitmapset of
partitions for one level of partitioning. So when the Bitmapset
indicates a child partitioned table, set_append_subplan_indexes loops
through again until it gets the list of all leaf indexes.

I am working on the other comments and will post the patch along with
rebasing to v14 of Amit's patch soon.


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-12 Thread Beena Emerson
On Tue, Dec 12, 2017 at 4:57 PM, Beena Emerson  wrote:
> Hello Robert,
> Thank you for the comments. I have started working on it.
>
> On Fri, Dec 8, 2017 at 9:27 PM, Robert Haas  wrote:
>> On Thu, Dec 7, 2017 at 2:22 AM, Beena Emerson  
>> wrote:
>>> I have added the partition quals that are used for pruning.
>>>
>>> PFA the updated patch. I have changed the names of variables to make
>>> it more appropriate, along with adding more code comments and doing
>>> some refactoring and other code cleanups.
>>
>> - initClauses() seems to be a duplicate of the existing function
>> ExecInitExprList(), except for the extra NULL test, which isn't
>> necessary.
>
> The initClauses has been removed and ExecInitExprList has been used.
>
>> - The executor already has a system for making sure that relations get
>> opened and locked, and it's different from the new one scheme which
>> set_append_subplan_indexes() implements. Relations should be locked
>> during the executor initialization phase (i.e. ExecInitAppend) and not
>> when the first tuple is requested (i.e. ExecAppend).  Also, there's
>> already code to lock both child relations (i.e. the scans of those
>> relations, see InitScanRelation, ExecInitIndexScan) and non-leaf
>> partitions (ExecLockNonLeafAppendTables).   The call to
>> find_all_inheritors() will lock all of that same stuff again *plus*
>> the leaf partitions that were pruned during planning - moreover, if
>> the Append is rescanned, we'll walk the partitioning structure again
>> for every rescan.  I think RelationGetPartitionDispatchInfo should be
>> called directly from ExecInitAppend after the existing code to take
>> locks has been called, and store a pointer to the PartitionDispatch
>> object in the AppendState for future use.
>
> I have moved the call to ExecInitAppend. This still uses the previous
> locking method, I will work on it in the next version of the patch.
>
>
>> - I am surprised that set_append_subplan_indexes() needs to worry
>> about multi-level partitioning directly.  I would have thought that
>> Amit's patch would take care of that, just returning a Bitmapset of
>> indexes which this function could use directly.  It also doesn't seem
>> like a very good idea to convert the Bitmapset (subplans) into a list
>> of integers (node->subplan_indexes), as set_append_subplan_indexes()
>> does at the bottom.  The Bitmapset will be a lot more efficient; we
>> should be able to just iterate over that directly rather than
>> converting it into a List.  Note that a Bitmapset can be created with
>> a single palloc, but an integer list needs one per list element plus
>> one for the list itself.
>
> The function get_partitions_from_clauses returns the Bitmap set of
> partitions for a level of partition. So when the BitmapSet that
> indicates a child partitioned table, set_append_subplan_indexes loops
> throgh again till it gets the list of all leaf indexes.
>
> I am working on the other comments and will post the patch along with
> rebasing to v14 of Amit's patch soon.
>
>
> --
>


PFA the updated patch, this can be applied over the v13 patches [1]
over commit 487a0c1518af2f3ae2d05b7fd23d636d687f28f3

[1] 
https://www.postgresql.org/message-id/df609168-b7fd-4c0b-e9b2-6e398d411e27%40lab.ntt.co.jp


-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-Implement-runtime-partiton-pruning_v6.patch
Description: Binary data


Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-12 Thread Teodor Sigaev
0002-pg-trgm-strict_word-similarity.patch – implementation of 
strict_word_similarity() with comments, docs and tests.

After some looking in

1)
repeated piece of code:
+   if (strategy == SimilarityStrategyNumber)
+   nlimit = similarity_threshold;
+   else if (strategy == WordSimilarityStrategyNumber)
+   nlimit = word_similarity_threshold;
+   else /* strategy == StrictWordSimilarityStrategyNumber */
+   nlimit = strict_word_similarity_threshold;
Isn't it better to move that piece to a separate function?


2)
calc_word_similarity(char *str1, int slen1, char *str2, int slen2,
 bool check_only, bool word_bounds)

It seems the two bool args could be replaced with a bitwise-ORed flags
argument. That would simplify adding new options in the future.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Runtime Partition Pruning

2017-12-12 Thread Beena Emerson
Hello,

On Fri, Dec 8, 2017 at 3:37 AM, David Rowley
 wrote:
> On 7 December 2017 at 23:56, Beena Emerson  wrote:
>> Currently Amit's v13 patches do not apply on the HEAD and I was
>> working on 487a0c1518af2f3ae2d05b7fd23d636d687f28f3 which is the last
>> commit where all Amit's v13 patches applies cleanly.
>
> Thanks.
>
> I was just looking over this and was wondering about the following case:
>
> drop table if exists p;
> create table p (a int not null, b int not null) partition by range (a);
> create table p1 partition of p for values from (0) to (1000);
> create table p2 partition of p for values from (1000) to (2000);
> create table p3 partition of p for values from (2000) to (3000);
> create table p4 partition of p for values from (3000) to (4000);
>
> create index on p1 (a);
> create index on p2 (a);
> create index on p3 (a);
> create index on p4 (a);
>
>
> insert into p select x,x from generate_series(1,3999) x;
>
> drop table if exists t;
> create table t (a int not null);
>
> insert into t select generate_Series(1,10);
>
> analyze p;
>
> analyze t;
>
> set enable_mergejoin=0;
> set enable_hashjoin=0;
>
> explain analyze select * from p inner join t on p.a = t.a;
>
> The patch gives me:
>
>QUERY PLAN
> 
>  Nested Loop (actual time=0.032..0.159 rows=10 loops=1)
>->  Seq Scan on t (actual time=0.012..0.013 rows=10 loops=1)
>->  Append (actual time=0.004..0.013 rows=1 loops=10)
>  ->  Index Scan using p1_a_idx on p1 (actual time=0.004..0.004
> rows=1 loops=10)
>Index Cond: (a = t.a)
>  ->  Index Scan using p2_a_idx on p2 (actual time=0.003..0.003
> rows=0 loops=10)
>Index Cond: (a = t.a)
>  ->  Index Scan using p3_a_idx on p3 (actual time=0.002..0.002
> rows=0 loops=10)
>Index Cond: (a = t.a)
>  ->  Index Scan using p4_a_idx on p4 (actual time=0.003..0.003
> rows=0 loops=10)
>Index Cond: (a = t.a)
>  Planning time: 0.472 ms
>  Execution time: 0.241 ms
> (13 rows)
>
> but I expected to get (never executed) for p2, p3 and p4.
>
> The following code makes me think you intend this to work:
>
> @@ -280,6 +438,10 @@ ExecReScanAppend(AppendState *node)
>  {
>   int i;
>
> + /* Determine subplans to scan based on the new Params */
> + if (node->ps.chgParam != NULL && node->join_clauses)
> + set_append_subplan_indexes(node, node->join_clauses);
> +
>
> It just does not due to the node->join_clauses being NULL.
>
Thank you for your tests. I am working on this.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Runtime Partition Pruning

2017-12-12 Thread Beena Emerson
Hello Jesper,

On Tue, Dec 12, 2017 at 4:04 PM, Jesper Pedersen
 wrote:
> Hi Beena,
>
> On 12/07/2017 02:22 AM, Beena Emerson wrote:
>>
>> I have added the partition quals that are used for pruning.
>>
>> PFA the updated patch. I have changed the names of variables to make
>> it more appropriate, along with adding more code comments and doing
>> some refactoring and other code cleanups.
>>
>
> As the current patch conflicts with [1] could you provide a rebased version
> ?
>
> Thanks in advance !
>
> [1]
> https://www.postgresql.org/message-id/9b98fc47-34b8-0ab6-27fc-c8a0889f2e5b%40lab.ntt.co.jp
>

I am aware of this and will post a rebased version soon.


Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Learned Indexes in PostgreSQL?

2017-12-12 Thread Stefan Keller
Hi,

This is an awesome paper about a new index called a "Learned Index".
It's essentially a dense hash-like structure derived ("learned") from the
actual data. It looks very promising for certain properties [*].

Anyone already working on this in Postgres?
How could this be implemented in Postgres?

:Stefan

[*] "The Case for Learned Index Structures" Kraska et al. (Dec 2017)
https://arxiv.org/abs/1712.01208



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Ildus Kurbangaliev
On Mon, 11 Dec 2017 20:53:29 +0100
Tomas Vondra  wrote:

> But let me play the devil's advocate for a while and question the
> usefulness of this approach to compression. Some of the questions were
> mentioned in the thread before, but I don't think they got the
> attention they deserve.

Hi. I will try to explain why this approach could be better than others.

> 
> 
> Replacing the algorithm used to compress all varlena values (in a way
> that makes it transparent for the data type code).
> --
> 
> While pglz served us well over time, it was repeatedly mentioned that
> in some cases it becomes the bottleneck. So supporting other state of
> the art compression algorithms seems like a good idea, and this patch
> is one way to do that.
> 
> But perhaps we should simply make it an initdb option (in which case
> the whole cluster would simply use e.g. lz4 instead of pglz)?
> 
> That seems like a much simpler approach - it would only require some
> ./configure options to add --with-lz4 (and other compression
> libraries), an initdb option to pick compression algorithm, and
> probably noting the choice in cluster controldata.

Replacing pglz for all varlena values wasn't the goal of the patch, but
it's possible to do with it, and I think that's good. As Robert
mentioned, pglz could appear as a built-in, undroppable compression
method, so the others could be added too. And in the future it can open
the way to specifying compression for a specific database or cluster.

> 
> Custom datatype-aware compression (e.g. the tsvector)
> --
> 
> Exploiting knowledge of the internal data type structure is a
> promising way to improve compression ratio and/or performance.
> 
> The obvious question of course is why shouldn't this be done by the
> data type code directly, which would also allow additional benefits
> like operating directly on the compressed values.
> 
> Another thing is that if the datatype representation changes in some
> way, the compression method has to change too. So it's tightly coupled
> to the datatype anyway.
> 
> This does not really require any new infrastructure, all the pieces
> are already there.
> 
> In some cases that may not be quite possible - the datatype may not be
> flexible enough to support alternative (compressed) representation,
> e.g. because there are no bits available for "compressed" flag, etc.
> 
> Conclusion: IMHO if we want to exploit the knowledge of the data type
> internal structure, perhaps doing that in the datatype code directly
> would be a better choice.

It could be, but let's imagine there is internal compression for
tsvector. It means that tsvector now has two formats and loses one bit
somewhere in the header. After a while we find a better compression,
but we can't add it because there is already one, and it's not good to
have three different formats for one type. Or, the compression methods
are implemented and we decide to use dictionaries for tsvector (if the
user is going to store a limited number of words). But that will mean
that tsvector goes through two compression stages (one for its internal
format and one for the dictionaries).

> 
> 
> Custom datatype-aware compression with additional column-specific
> metadata (e.g. the jsonb with external dictionary).
> --
> 
> Exploiting redundancy in multiple values in the same column (instead
> of compressing them independently) is another attractive way to help
> the compression. It is inherently datatype-aware, but currently can't
> be implemented directly in datatype code as there's no concept of
> column-specific storage (e.g. to store dictionary shared by all values
> in a particular column).
> 
> I believe any patch addressing this use case would have to introduce
> such column-specific storage, and any solution doing that would
> probably need to introduce the same catalogs, etc.
> 
> The obvious disadvantage of course is that we need to decompress the
> varlena value before doing pretty much anything with it, because the
> datatype is not aware of the compression.
> 
> So I wonder if the patch should instead provide infrastructure for
> doing that in the datatype code directly.
> 
> The other question is if the patch should introduce some
> infrastructure for handling the column context (e.g. column
> dictionary). Right now, whoever implements the compression has to
> implement this bit too.

Column-specific storage sounds optional to me. For example, compressing
timestamp[] using some delta compression will not require it.

-- 

Regards,
Ildus Kurbangaliev



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-12 Thread Alvaro Herrera
David Rowley wrote:

> ATTACH/REPLACE sounds fine. My objection was more about the
> DETACH/ATTACH method to replace an index.

So what happens if you do ALTER INDEX .. ATTACH and you already have
another index in that partition that is attached to the same parent in
the index?  With my code, what happens is you have two indexes attached
to the same parent, and you can't drop any.  If we don't have DETACH,
how can you get out of that situation?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

2017-12-12 Thread Amit Kapila
On Tue, Dec 12, 2017 at 4:00 PM, Kuntal Ghosh
 wrote:
> On Mon, Dec 11, 2017 at 2:26 PM, Thomas Munro
>  wrote:
>> On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila  wrote:
>>
>>> Thanks for looking into it.  I will see if we can write some test.  In
>>> the meantime if possible, can you please request Patrick Hemmer to
>>> verify the attached patch?
>>
>> Our discussion was on the #postgresql Freenode channel.  I pointed him
>> at this thread, but I'm not sure if he'll see my message or be able to
>> test.
> After discussing with Amit, I'm able to reproduce the scenario in a
> master-standby setup. The issue occurs when we perform parallel
> index(-only) scan on a BTP_HALF_DEAD -marked page. (If a page is
> marked as BTP_DELETED, it's already unlinked from the index).
>
> When a btree page is deleted during vacuum, it's first marked as
> BTP_HALF_DEAD in _bt_mark_page_halfdead and then marked as BTP_DELETED
> in _bt_unlink_halfdead_page without releasing cleanup lock on the
> buffer. Hence, any scan node cannot lock the same buffer. So, the
> issue can't be reproduced on master.
>
> However, after replaying XLOG_BTREE_MARK_PAGE_HALFDEAD record, standby
> releases the lock on the same buffer. If we force parallelism, an
> index scan on the same page will cause hang the standby server.
> Following is a (unpleasant)way to reproduce the issue:
>
> In master (with autovacuum = off):
> 1. create table t1(a int primary key);
> 2. insert into t1 select * from generate_series(1,1000); --generates 3
> leaf nodes (block no 1,2,4) and 1 root node (block no 3)
> 3. delete from t1 where a>=367 and a<=735; --delete all tuples pointed by 
> leaf 2
> 4. analyze t1; --update the stats
> 5. explain analyze select * from t1 where a>=1 and a<=1000; --ensures
> that the next vacuum will consider leaf 2 for page deletion

What do you mean by the next vacuum?  Autovacuum is off here.  Are you
missing a step that manually performs the vacuum?

> Now, put a break point at _bt_unlink_halfdead_page, so that vacuum
> can't unlink the page.
>
> In standby,
> 1. force parallelism.
> 2. explain analyze select * from t1 where a>=1 and a<=1000; and the
> parallel workers hang at the above-discussed place!
>
> The attached patch fixes the issue. One comment on the same:
> + else if (scan->parallel_scan != NULL)
> + {
> + /* allow next page be processed by parallel worker */
> + _bt_parallel_release(scan, opaque->btpo_next);
> + }
>
>   /* nope, keep going */
>   if (scan->parallel_scan != NULL)
>
> IMHO, There is no need to add an extra if condition.
> _bt_parallel_release can be included in the next one.
>

I don't think so because it is quite possible that _bt_readpage has
already released it (consider the case where _bt_readpage returns
false).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] logical decoding of two-phase transactions

2017-12-12 Thread Nikhil Sontakke
Hi,

Thanks for the warning fix, I will also look at the cassert case soon.

I have been adding more test cases to this patch. I added a TAP test
which now allows us to do a concurrent ROLLBACK PREPARED when the
walsender is in the midst of decoding this very prepared transaction.

I have added a "decode-delay" parameter to test_decoding, via which each
apply call sleeps for a configurable number of seconds, allowing us
to have a deterministic rollback in parallel. This logic seems to work
OK.

However, I am battling an issue with invalidations now. Consider the
below test case:

CREATE TABLE test_prepared1(id integer primary key);
-- test prepared xact containing ddl
BEGIN; INSERT INTO test_prepared1 VALUES (5);
ALTER TABLE test_prepared1 ADD COLUMN data text;
INSERT INTO test_prepared1 VALUES (6, 'frakbar');
PREPARE TRANSACTION 'test_prepared#3';
COMMIT PREPARED 'test_prepared#3';
SELECT data FROM pg_logical_slot_get_changes(..) <-- this shows the
2PC being decoded appropriately
-- make sure stuff still works
INSERT INTO test_prepared1 VALUES (8);
SELECT data FROM pg_logical_slot_get_changes(..)

The last pg_logical_slot_get_changes call, shows:

table public.test_prepared1: INSERT: id[integer]:8

whereas since the 2PC committed, it should have shown:

table public.test_prepared1: INSERT: id[integer]:8 data[text]:null

This is an issue because of the way we are handling invalidations. We
don't allow ReorderBufferAddInvalidations() at COMMIT PREPARE time
since we assume that handling them at PREPARE time is enough.
Apparently, it's not enough. I am trying to allow invalidations at
COMMIT PREPARE time as well, but maybe calling
ReorderBufferAddInvalidations() blindly again is not a good idea.
Also, if I do that, I get some restart_lsn inconsistencies
which cause subsequent pg_logical_slot_get_changes() calls to
re-decode older records. I continue to investigate.

I am attaching the latest WIP patch. This contains the additional TAP
test changes.

Regards,
Nikhils

On 8 December 2017 at 01:15, Peter Eisentraut
 wrote:
> On 12/7/17 08:31, Peter Eisentraut wrote:
>> On 12/4/17 10:15, Nikhil Sontakke wrote:
>>> PFA, latest patch for this functionality.
>>
>> This probably needs documentation updates for the logical decoding chapter.
>
> You need the attached patch to be able to compile without warnings.
>
> Also, the regression tests crash randomly for me at
>
> frame #4: 0x00010a6febdb
> postgres`heap_prune_record_prunable(prstate=0x7ffee5578990, xid=0)
> at pruneheap.c:625
>622   * This should exactly match the PageSetPrunable macro.  We
> can't store
>623   * directly into the page header yet, so we update working 
> state.
>624   */
> -> 625  Assert(TransactionIdIsNormal(xid));
>626  if (!TransactionIdIsValid(prstate->new_prune_xid) ||
>627  TransactionIdPrecedes(xid, prstate->new_prune_xid))
>628  prstate->new_prune_xid = xid;
>
> Did you build with --enable-cassert?
>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


2pc_logical_12_12_17_wip.patch
Description: Binary data


Re: Boolean partitions syntax

2017-12-12 Thread Ashutosh Bapat
On Tue, Dec 12, 2017 at 3:13 PM, Amit Langote
 wrote:
> On 2017/12/12 18:12, Ashutosh Bapat wrote:
>> On Tue, Dec 12, 2017 at 7:19 AM, Amit Langote wrote:
>>> Horiguchi-san pointed out [1] on a nearby thread that the partitioning
>>> syntax (the FOR VALUES clause) doesn't accept true and false as valid
>>> partition bound datums, which seems to me like an oversight.  Attached a
>>> patch to fix that.
>>
>> May be you should use opt_boolean_or_string instead of TRUE_P and
>> FALSE_P. It also supports ON and OFF, which will be bonus.
>
> Thanks for the suggestion.  I tried that but NonReservedWord_or_Sconst
> conflicts with Sconst that partbound_datum itself has a rule for,
> resulting in the following error:
>
> gram.y: conflicts: 6 reduce/reduce
> gram.y: expected 0 reduce/reduce conflicts
> gram.y:2769.25-81: warning: rule useless in parser due to conflicts:
> partbound_datum: Sconst

Probably that would get fixed if you remove Sconst from
partbound_datum and leave NonReservedWord_or_Sconst.

>
> Moreover, it seems like on/off are not being accepted as valid Boolean
> values like true/false are.
>
> insert into rp values (true);
> INSERT 0 1
> insert into rp values (on);
> ERROR:  syntax error at or near "on"
> LINE 1: insert into rp values (on);
>^
> What's going on with that?   Maybe on/off values work only with SET
> statements?


But this is a more important observation. It looks like on/off work with
SET and EXPLAIN only, not in normal SQL. So probably my suggestion was
wrongheaded.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-12 Thread Etsuro Fujita

Hi Maksim,

(2017/12/12 17:57), Maksim Milyutin wrote:

Your patch already is not applied on master. Please rebase it.


Done.  Please find attached an updated version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***
*** 176,182  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
--- 176,188 
  SELECT tableoid::regclass, * FROM pt;
  SELECT tableoid::regclass, * FROM p1;
  SELECT tableoid::regclass, * FROM p2;
+ \t on
+ EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (1, 'xyzzy');
+ \t off
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
+ \t on
+ EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (2, 'xyzzy');
+ \t off
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
  SELECT tableoid::regclass, * FROM p1;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***
*** 315,321  SELECT tableoid::regclass, * FROM p2;
  (0 rows)
  
  COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
! ERROR:  cannot route inserted tuples to a foreign table
  CONTEXT:  COPY pt, line 2: "1,qux"
  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
  SELECT tableoid::regclass, * FROM pt;
--- 315,321 
  (0 rows)
  
  COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
! ERROR:  cannot route copied tuples to a foreign table
  CONTEXT:  COPY pt, line 2: "1,qux"
  COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
  SELECT tableoid::regclass, * FROM pt;
***
*** 341,348  SELECT tableoid::regclass, * FROM p2;
   p2   | 2 | qux
  (2 rows)
  
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
! ERROR:  cannot route inserted tuples to a foreign table
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
   tableoid | a |   b   
--- 341,366 
   p2   | 2 | qux
  (2 rows)
  
+ \t on
+ EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (1, 'xyzzy');
+  Insert on public.pt
+Foreign Insert on public.p1
+Insert on public.p2
+->  Result
+  Output: 1, 'xyzzy'::text
+ 
+ \t off
  INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
! ERROR:  cannot route inserted tuples to foreign table "p1"
! \t on
! EXPLAIN (VERBOSE, COSTS FALSE) INSERT INTO pt VALUES (2, 'xyzzy');
!  Insert on public.pt
!Foreign Insert on public.p1
!Insert on public.p2
!->  Result
!  Output: 2, 'xyzzy'::text
! 
! \t off
  INSERT INTO pt VALUES (2, 'xyzzy');
  SELECT tableoid::regclass, * FROM pt;
   tableoid | a |   b   
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 7088,7093  NOTICE:  drop cascades to foreign table bar2
--- 7088,7295 
  drop table loct1;
  drop table loct2;
  -- ===
+ -- test tuple routing for foreign-table partitions
+ -- ===
+ create table pt (a int, b int) partition by list (a);
+ create table locp partition of pt for values in (1);
+ create table locfoo (a int check (a in (2)), b int);
+ create foreign table remp partition of pt for values in (2) server loopback options (table_name 'locfoo');
+ explain (verbose, costs off)
+ insert into pt values (1, 1), (2, 1);
+QUERY PLAN
+ -
+  Insert on public.pt
+Insert on public.locp
+Foreign Insert on public.remp
+  Remote SQL: INSERT INTO public.locfoo(a, b) VALUES ($1, $2)
+->  Values Scan on "*VALUES*"
+  Output: "*VALUES*".column1, "*VALUES*".column2
+ (6 rows)
+ 
+ insert into pt values (1, 1), (2, 1);
+ select tableoid::regclass, * FROM pt;
+  tableoid | a | b 
+ --+---+---
+  locp | 1 | 1
+  remp | 2 | 1
+ (2 rows)
+ 
+ select tableoid::regclass, * FROM locp;
+  tableoid | a | b 
+ --+---+---
+  locp | 1 | 1
+ (1 row)
+ 
+ select tableoid::regclass, * FROM remp;
+  tableoid | a | b 
+ --+---+---
+  remp | 2 | 1
+ (1 row)
+ 
+ explain (verbose, costs off)
+ insert into pt values (1, 2), (2, 2) returning *;
+QUERY PLAN   
+ 
+  Insert on public.pt
+Output: pt.a, pt.b
+Insert on public.locp
+Foreign Insert on public.remp
+  Remote SQL: INSERT INTO public.locfoo(a, b) VALUES ($1, $2) RETURNING a, b
+->  Values Scan on "*VALUES*"
+  Output: "*VALUES*".column1, "*VALUES*".column2
+ (7 rows)
+ 
+ insert into pt values (1, 2), (2, 2) returning *;
+  a | b 
+ ---+---
+  1 | 2
+  2 | 2
+ (2 rows)
+ 
+ select tableoid::regcl

Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-12 Thread Teodor Sigaev
0001-pg-trgm-word-similarity-docs-improvement.patch – contains improvement to 
documentation of word_similarity() and related operators.  I decided to give 
formal definition first (what exactly it internally does), and then example and 
some more human-understandable description.  This patch also adjusts two 
comments where lower and upper bounds mess up.


I'm ready to commit that, but I'd like a native English speaker to 
check it first. Thank you.


And I suppose this patch should be backpatched to 9.6.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] logical decoding of two-phase transactions

2017-12-12 Thread Simon Riggs
On 12 December 2017 at 12:04, Nikhil Sontakke  wrote:

> This is an issue because of the way we are handling invalidations. We
> don't allow ReorderBufferAddInvalidations() at COMMIT PREPARE time
> since we assume that handling them at PREPARE time is enough.
> Apparently, it's not enough.

Not sure what that means.

I think we would need to fire invalidations at COMMIT PREPARED, yet
logically decode them at PREPARE.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: CUBE seems a bit confused about ORDER BY

2017-12-12 Thread Teodor Sigaev
Yes, the thing is that we change behavior of existing ~> operator.  In general, 
this is not good idea because it could affect existing users whose already use 
this operator.  Typically in such situation, we could leave existing operator as 
is, and invent new operator with new behavior.  However, in this particular 
case, I think there are reasons to make an exception to the rules.  The reasons 
are following:

1) The ~> operator was designed especially for knn gist.
2) Knn gist support for current behavior is broken by design and can't be 
fixed.  Most we can do to fix existing ~> operator behavior as is to drop knn 
gist support.  But then ~> operator would be left useless.
3) It doesn't seems that ~> operator have many users yet, because an error 
wasn't reported during whole release cycle.


I hope these reasons justify altering behavior of existing operator as an 
exception to the rules.  Another question is whether we should backpatch it.  
But I think we could leave this decision to committer.


I think that this patch is ready for committer.
I agree with changing the behavior of the existing ~> operator, but are there 
any objections here? The current implementation is not fixable, and I hope that 
users of this operator will understand why we changed it. Fortunately, the fix 
doesn't require changes to the system catalog.


The single remaining question here is about indexes over expressions using this 
operator: they will need a REINDEX, which should be noted in the release notes.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: CUBE seems a bit confused about ORDER BY

2017-12-12 Thread Alexander Korotkov
On Tue, Dec 12, 2017 at 3:49 PM, Teodor Sigaev  wrote:

> Yes, the thing is that we change the behavior of the existing ~> operator.  In
>> general, this is not a good idea because it could affect existing users who
>> already use this operator.  Typically in such a situation, we would leave the
>> existing operator as is and invent a new operator with the new behavior.
>> However, in this particular case, I think there are reasons to make an
>> exception to the rules.  The reasons are the following:
>> 1) The ~> operator was designed especially for knn gist.
>> 2) Knn gist support for the current behavior is broken by design and can't be
>> fixed.  The most we could do to keep the existing ~> operator behavior as is
>> would be to drop knn gist support.  But then the ~> operator would be left
>> useless.
>> 3) It doesn't seem that the ~> operator has many users yet, because no error
>> was reported during the whole release cycle.
>>
>> I hope these reasons justify altering the behavior of an existing operator as
>> an exception to the rules.  Another question is whether we should backpatch
>> it.  But I think we can leave that decision to the committer.
>>
>> I think that this patch is ready for committer.
>>
> I agree with changing the behavior of the existing ~> operator, but are there
> any objections here? The current implementation is not fixable, and I hope that
> users of this operator will understand why we change it. Fortunately, the fix
> doesn't require changes to the system catalog.
>
> The single remaining question is about indexes over expressions using this
> operator: they will need to be reindexed, which should be noted in the release notes.


Yes.  I bet only a few users, if any, have built indexes over the ~> operator.
Asking them to reindex in the release notes seems OK to me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2017-12-12 Thread Oleg Bartunov
On Mon, Dec 11, 2017 at 11:11 PM, Nikolay Samokhvalov
 wrote:
> Very interesting read: https://arxiv.org/abs/1712.01208
>
> HN discussion: https://news.ycombinator.com/item?id=15894896
>
> Some of the comments (from Twitter
> https://twitter.com/schrockn/status/940037656494317568): "Jeff Dean and co
> at GOOG just released a paper showing how machine-learned indexes can
> replace B-Trees, Hash Indexes, and Bloom Filters. Execute 3x faster than
> B-Trees, 10-100x less space. Executes on GPU, which are getting faster
> unlike CPU. Amazing."
>
> Can those ideas be applied to Postgres in its current state? Or it's not
> really down-to-earth?

Oleg made some analysis of the paper.



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 10:04 PM, Alvaro Herrera
 wrote:
> David Rowley wrote:
>
>> ATTACH/REPLACE sounds fine. My objection was more about the
>> DETACH/ATTACH method to replace an index.
>
> So what happens if you do ALTER INDEX .. ATTACH and you already have
> another index in that partition that is attached to the same parent in
> the index?

I think that should be an ERROR.  You can use REPLACE if you want to
switch which index is attached, but you shouldn't be able to attach
two indexes from the same partition at the same time.
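
A sketch of the rule being proposed, using the ATTACH spelling discussed
upthread (object names are hypothetical, and the exact syntax was still under
discussion at this point):

-- idx_parent is the index on the partitioned parent; idx_part_a and
-- idx_part_b are two matching indexes on the same partition
ALTER INDEX idx_parent ATTACH idx_part_a;   -- ok
ALTER INDEX idx_parent ATTACH idx_part_b;   -- ERROR under this proposal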

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Inconsistency in plpgsql's error context reports

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 3:52 PM, Tom Lane  wrote:
> Here's a quick hack at that.  I guess the main question that needs to be
> asked is whether we're happy with plpgsql getting so much chattier
> (as per all the regression test changes).
>
> If we're not, the alternative that could be considered is to make SPI
> expose some way to suppress its use of a context callback, and change
> plpgsql to invoke that when dealing with an expression.  That would
> be rather more invasive code-wise, but would largely eliminate the
> behavioral change as seen by users.
>
> Another angle, if we do keep it like this, is that maybe SPI should
> export _SPI_error_callback so that plpgsql can use it directly,
> rather than having a copy that needs to be kept in sync.

I confess to never having really grokked, even in the pre-patch state,
why we sometimes get an "SQL statement" context line and sometimes
not.  However, what strikes me about this is that the SQL statement is
a completely fabricated one that the user never entered.  Consider
this bit from the existing regression test output, unpatched:

create function namedparmcursor_test7() returns void as $$
declare
  c1 cursor (p1 int, p2 int) for
select * from tenk1 where thousand = p1 and tenthous = p2;
begin
  open c1 (p2 := 77, p1 := 42/0);
end $$ language plpgsql;
select namedparmcursor_test7();
ERROR:  division by zero
CONTEXT:  SQL statement "SELECT 42/0 AS p1, 77 AS p2;"
PL/pgSQL function namedparmcursor_test7() line 6 at OPEN

Quite obviously, nothing like "SELECT 42/0 AS p1, 77 AS p2;" is
present in there anywhere.  When people see an SQL statement in the
context, or at least when I see it, my inclination is to go grep for
where that SQL statement is to be found, and to be unhappy when the
answer is nowhere.  It would really be a lot better if we could say
something like

CONTEXT: SQL expression 42/0

...but I realize that's probably hard to do.  However, the current
situation is that plpgsql.out contains 5 "SQL statement" context
lines, of which only 1 is an SQL statement that actually appears in
the procedure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Out of date comment in cached_plan_cost

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 5:00 AM, David Rowley
 wrote:
> On 11 December 2017 at 21:39, Ashutosh Bapat
>  wrote:
>> I don't see much difference in the old and new wording. The word
>> "generally" confuses more than clarifying the cases when the planning
>> cost curves do not change with the number of relations i.e.
>> partitions.
>
> I added that to remove the false claim that inheritance children don't
> make the join problem more complex. This was only true before we had
> partition-wise joins.
>
> I've re-read my original patch and I don't really see the problem with
> it. The comment is talking about inheritance child relations, which
> you could either interpret to mean INHERITS (sometable), or some table
> listed in pg_inherits. The statement that I added forces me into
> thinking of the former rather than the latter, so I don't really see
> any issue.
>
> I'm going to leave it here. I don't want to spend too much effort
> rewording a comment. My vote is for the original patch I sent. I only
> changed it because Robert complained that technically an inheritance
> child could actually be a partition.

I basically feel like we're not really solving any problem by changing
this.  I mean, partition-wise join makes this statement less true, but
adding the word "generally" doesn't really help anybody understand the
situation better.  If we're going to add anything here, I think it
should be something like:

(This might need to be rethought in light of partition-wise join.)

If that's more specific than we want to get, then let's just leave it
alone.  Partition-wise join isn't even enabled by default at this
point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Gasper Zejn
On 12. 12. 2017 06:58, Tom Lane wrote:
> Craig Ringer  writes:
>> I doubt I've ever written just "exit" or "quit" without indentation. I
>> think if it requires them to be a bareword with no indentation, strictly
>> ^(exit|quit)\n when isatty, then that's probably a safe and helpful choice.
> FWIW, I think that the special behavior (whatever it winds up being
> exactly) ought to trigger on an input line matching
>
>   [whitespace]*help[whitespace]*(;[whitespace]*)?
>
> and similarly for exit/quit.  I think that novices might have internalized
> enough SQL syntax to think that they need to terminate the command with a
> semicolon --- in fact, we regularly see examples in which seasoned users
> think they need to terminate backslash commands with a semicolon, so
> that's hardly far-fetched.  And we might as well allow as much whitespace
> as we can, because nobody but Guido Rossum thinks that having whitespace
> be semantically significant is a good idea.
>
>   regards, tom lane
>
If tabs are considered whitespace, psql can sometimes treat them as
semantically significant, since
"create materializedtest11 as select 1;" will be autocompleted to
correct syntax if you paste the line into an interactive psql session.

I have seen psql error out with invalid syntax when a similar query was
pasted and psql autocompleted it.

kind regards, Gasper




Re: Testing Extension Upgrade Paths

2017-12-12 Thread Jeremy Finzel
On Mon, Dec 11, 2017 at 7:55 PM, Craig Ringer  wrote:

> On 12 December 2017 at 07:25, Michael Paquier 
> wrote:
>
>> On Tue, Dec 12, 2017 at 2:55 AM, Jeremy Finzel  wrote:
>> > I understand how to setup a regression suite for a postgres extension,
>> but
>> > what I'm not clear on from the docs is if there is a pattern that
>> exists for
>> > testing not only the latest version of an extension, but also an
>> upgraded
>> > previous version.
>> >
>> > Is there any straightforward way to do this that doesn't involve
>> manually
>> > copying tests?
>>
>> It is perfectly possible to do tests on a specific version and test
>> upgrade paths using CREATE EXTENSION and ALTER EXTENSION which support
>> versioning in a SQL regression script, say:
>> CREATE EXTENSION foo VERSION "0.1";
>> -- Do stuff
>> ALTER EXTENSION foo UPDATE TO "0.2";
>> -- Do other stuff
>>
>
> This works well when you want to test 1.1 binaries with a 1.0 SQL
> extension definition.
>
> It's not effective if you need to test 1.0 binaries+extension, then an
> upgrade to 1.1 binaries and extension. You have no way to install and run
> the 1.0 binaries without going outside pg_regress / TAP and using an
> external test harness/framework/script. I've been bitten by this multiple
> times before. For example, if 1.1 always assigns a value to some column
> that was nullable in 1.0, you're never going to exercise 1.1's handling of
> nulls in that column.
>
> It also doesn't lend its self to complex multi-path upgrades, where for
> example you could upgrade 1.0.0 -> 1.0.1 -> 1.1.0 or  upgrade directly from
> 1.0.0 -> 1.1.0. This rapidly becomes an issue when you release 1.0.0,
> release 1.1.0, then release a maintenance 1.0.1  version. Now you have
> to add the 1.0.1 -> 1.1.0 upgrade path, but you still have the 1.0.0 ->
> 1.1.0 path.
>
> You can handle that with TAP if you're willing to write something to do
> the various combinations of steps and exercise them. It's not practical
> with pg_regress.
>
> More thorough testing benefits from an external harness.
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Thanks both of you for the feedback.

I understood that I can do CREATE then ALTER EXTENSION UPDATE, but my goal
was to actually be able to run this workflow multiple ways without having
to hardcode that or duplicate steps.  To clarify, my intention was to only
test the binary for 1.1, but to then run the same test steps (without
having to duplicate things) under 2 separate scenarios:

   1. Create the extension right away at version 1.1, run the suite
   2. Create the extension at 1.0, populate extension config tables as if I
   have actually been using it, then upgrade to 1.1, and run the suite.

Here is how I ended up implementing this, and I am very open to feedback:

The first file is like this:

-- Allow running regression suite with upgrade paths
\set v `echo ${FROMVERSION:-1.1}`
SET client_min_messages = warning;
CREATE EXTENSION pglogical;
CREATE EXTENSION pgl_ddl_deploy VERSION :'v';

So if you run the suite without any environment variable, it will start at
1.1.  But if you add FROMVERSION=1.0, it will start at 1.0.

Then in test steps 2 and 3, I assume test results are identical between 1.0
and 1.1.  But in step 4, I run ALTER EXTENSION UPDATE, which is either a
no-op, or actually upgrades from 1.0.  The remaining tests in the whole
suite are designed to yield identical results in either path.
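
Concretely, step 4 boils down to something like this (a sketch, with 1.1 as
the target version the suite is built against):

-- a no-op when the suite created the extension at 1.1 already,
-- a real upgrade when it was created at 1.0 via FROMVERSION=1.0
ALTER EXTENSION pgl_ddl_deploy UPDATE TO '1.1';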

I am fairly happy with this because what I really wanted to test is
upgrading from 1.0 to 1.1 as opposed to a bare 1.1 install.

With later versions, I would of course need to modify this as necessary, and I
would want to test more of the upgrade paths as far as that is feasible.

Thanks,
Jeremy


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Geoff Winkless
I wouldn't exactly say -1 but it's a "meh" from me.

On 8 December 2017 at 13:57, Daniel Vérité wrote:
> "How to exit from PostgreSQL command line utility: psql"
>
> now at 430k views and 1368 upvotes.

There is also a stackexchange question with 51000 views that asks how to
start the postgresql client. Should we rename psql to mysql to help
new users too?

Ctrl-D works almost everywhere. It takes 2 seconds to find the answer
and there are going to be many other things you have to find out if
you're coming at PostgreSQL from a different DB.

As others have pointed out, there are _far_ more frustrating things
(like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
for anyone coming from mysql that I would suggest should be addressed
if you were serious about making life easier for those moving DBs.

Geoff



Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada  wrote:
>> The question I have is how would we deal with a foreign server that is
>> not available for longer duration due to crash, longer network outage
>> etc. Example is the foreign server crashed/got disconnected after
>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>> blocked for much longer duration without user having an idea of what's
>> going on. May be we should add some timeout.
>
> After more thought, I agree with adding some timeout. I can imagine
> there are users who want the timeout, for example users who cannot accept
> even a few seconds of latency. If the timeout occurs, the backend unlocks the
> foreign transactions and breaks the loop. The resolver process will
> keep trying to resolve foreign transactions at a certain interval.

I don't think a timeout is a very good idea.  There is no timeout for
synchronous replication and the issues here are similar.  I will not
try to block a patch adding a timeout, but I think it had better be
disabled by default and have very clear documentation explaining why
it's really dangerous.  And this is why: with no timeout, you can
count on being able to see the effects of your own previous
transactions, unless at some point you sent a query cancel or got
disconnected.  With a timeout, you may or may not see the effects of
your own previous transactions depending on whether or not you hit the
timeout, which you have no sure way of knowing.
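
A sketch of the read-your-writes hazard, assuming a hypothetical postgres_fdw
foreign table ft(a int):

BEGIN;
INSERT INTO ft VALUES (1);
COMMIT;  -- remote part was PREPAREd; if resolution hits the timeout, the
         -- prepared transaction may still be unresolved on the foreign server
SELECT count(*) FROM ft WHERE a = 1;  -- may or may not see the row just committed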

>>> transactions after the coordinator server recovered. On the other
>>> hand, to let subsequent reads see a consistent result in such a
>>> situation, for example, we can disallow backends from issuing SQL
>>> to the foreign server if a foreign transaction of that foreign server
>>> remains.
>>
>> +1 for the last sentence. If we do that, we don't need the backend to
>> be blocked by the resolver, since a subsequent read accessing that foreign
>> server would get an error and not inconsistent data.
>
> Yeah, however the disadvantage of this is that we manage foreign
> transactions per foreign server. If a transaction that modified even
> one table remains as an in-doubt transaction, we cannot issue any
> SQL that touches that foreign server. Could we raise an error at
> ExecInitForeignScan()?

I really feel strongly we shouldn't complicate the initial patch with
this kind of thing.  Let's make it enough for this patch to guarantee
that either all parts of the transaction commit eventually or they all
abort eventually.  Ensuring consistent visibility is a different and
hard project, and if we try to do that now, this patch is not going to
be done any time soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Jeremy Finzel
Someone mentioned needing to read the documentation of vim to learn how to
exit the program.  I personally think exactly the same applies here, and am a
bit surprised at the depth of discussion over this.

When I was first new to cli programs, the only "standard" way to exit a
program I found across the board was CTRL+D.  I never even thought
that psql, even though different, was odd in its choice of how to exit the
program.  And there are very good reasons, as has been discussed, for why \q
is used in a SQL cli program.

So what if that stack overflow question is the most-visited psql question?
How many of you have learned what you need within 5 seconds on google?  I
don't think this is a significant stumbling block for users adopting postgres.
psql is amazing, and anyone who likes cli programs will be fine.

FWIW I am +1 in favor of not overcomplicating the special psql commands and
keep to the "\" standard.

On Tue, Dec 12, 2017 at 8:35 AM, Gasper Zejn  wrote:

> On 12. 12. 2017 06:58, Tom Lane wrote:
> > Craig Ringer  writes:
> >> I doubt I've ever written just "exit" or "quit" without indentation. I
> >> think if it requires them to be a bareword with no indentation, strictly
> >> ^(exit|quit)\n when isatty, then that's probably a safe and helpful
> choice.
> > FWIW, I think that the special behavior (whatever it winds up being
> > exactly) ought to trigger on an input line matching
> >
> >   [whitespace]*help[whitespace]*(;[whitespace]*)?
> >
> > and similarly for exit/quit.  I think that novices might have
> internalized
> > enough SQL syntax to think that they need to terminate the command with a
> > semicolon --- in fact, we regularly see examples in which seasoned users
> > think they need to terminate backslash commands with a semicolon, so
> > that's hardly far-fetched.  And we might as well allow as much whitespace
> > as we can, because nobody but Guido Rossum thinks that having whitespace
> > be semantically significant is a good idea.
> >
> >   regards, tom lane
> >
> If tabs are considered whitespace, psql can sometimes treat them as
> semantically significant, since
> "create materializedtest11 as select 1;" will be autocompleted to
> correct syntax if you paste the line into an interactive psql session.
>
> I have seen psql error out with invalid syntax when a similar query was
> pasted and psql autocompleted it.
>
> kind regards, Gasper
>
>
>


Re: [HACKERS] Custom compression methods

2017-12-12 Thread Alexander Korotkov
Hi!

Let me add my two cents too.

On Tue, Dec 12, 2017 at 2:41 PM, Ildus Kurbangaliev <
i.kurbangal...@postgrespro.ru> wrote:

> On Mon, 11 Dec 2017 20:53:29 +0100 Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
> > Replacing the algorithm used to compress all varlena values (in a way
> > that makes it transparent for the data type code).
> > --
> >
> > While pglz served us well over time, it was repeatedly mentioned that
> > in some cases it becomes the bottleneck. So supporting other state of
> > the art compression algorithms seems like a good idea, and this patch
> > is one way to do that.
> >
> > But perhaps we should simply make it an initdb option (in which case
> > the whole cluster would simply use e.g. lz4 instead of pglz)?
> >
> > That seems like a much simpler approach - it would only require some
> > ./configure options to add --with-lz4 (and other compression
> > libraries), an initdb option to pick compression algorithm, and
> > probably noting the choice in cluster controldata.
>
> Replacing pglz for all varlena values wasn't the goal of the patch, but
> it's possible to do with it, and I think that's good. And as Robert
> mentioned, pglz could appear as a builtin undroppable compression method
> so that the others could be added too. And in the future it can open the
> way to specify compression for a specific database or cluster.
>

Yes, using custom compression methods to replace the generic,
non-type-specific compression method is really not the primary goal of this
patch.  However, I would consider that a useful side effect, and even in this
case I see some advantages of custom compression methods over an initdb
option.

1) In order to support alternative compression methods in initdb, we have
to provide builtin support for them.  Then we immediately run into the
dependencies/incompatible-licenses problem.  Also, we tie the appearance of new
compression methods to our release cycle.  In real life, flexibility means
a lot.  Giving users the freedom to experiment with various compression
libraries without having to recompile PostgreSQL core is great.
2) It's not a given that users would be satisfied with applying a single
compression method to the whole database cluster.  Various columns may have
different data distributions with different workloads.  The optimal compression
type for one column is not necessarily optimal for another column.
3) The possibility to change the compression method on the fly without
re-initdb is very good too.
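
To make point 2 concrete, the kind of per-column control being argued for
looks roughly like this (the syntax is purely illustrative, not the patch's
actual grammar, and the codec names assume such methods have been installed):

-- hypothetical: a hot jsonb column gets a fast codec, while a rarely-read
-- text column favors compression ratio
ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;
ALTER TABLE events ALTER COLUMN raw_log SET COMPRESSION zlib;
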

> Custom datatype-aware compression (e.g. the tsvector)
> > --
> >
> > Exploiting knowledge of the internal data type structure is a
> > promising way to improve compression ratio and/or performance.
> >
> > The obvious question of course is why shouldn't this be done by the
> > data type code directly, which would also allow additional benefits
> > like operating directly on the compressed values.
> >
> > Another thing is that if the datatype representation changes in some
> > way, the compression method has to change too. So it's tightly coupled
> > to the datatype anyway.
> >
> > This does not really require any new infrastructure, all the pieces
> > are already there.
> >
> > In some cases that may not be quite possible - the datatype may not be
> > flexible enough to support alternative (compressed) representation,
> > e.g. because there are no bits available for "compressed" flag, etc.
> >
> > Conclusion: IMHO if we want to exploit the knowledge of the data type
> > internal structure, perhaps doing that in the datatype code directly
> > would be a better choice.
>
> It could be, but let's imagine there is internal compression for
> tsvector. It means that tsvector now has two formats and one bit fewer
> somewhere in the header. After a while we find a better compression,
> but we can't add it because there is already one, and it's not good to
> have three different formats for one type. Or, the compression methods
> were implemented and we decided to use dictionaries for tsvector (if
> the user is going to store a limited number of words). But it would mean
> that tsvector goes through two compression stages (one for its internal
> format and one for dictionaries).


I would like to add that even for a single datatype, various compression
methods may have different tradeoffs.  For instance, one compression method
can have a better compression ratio, while another one has faster
decompression.  And it's OK for a user to choose different compression
methods for different columns.

Making extensions depend on a datatype's internal representation doesn't seem
evil to me.  We already have a bunch of extensions depending on much deeper
guts of PostgreSQL.  On a major release of PostgreSQL, extensions must adapt to
the changes; that is the rule.  And note, the datatype internal
representation changes relatively rarely in comparison with other internals,
because it's related to the on-disk format

Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread David G. Johnston
On Tue, Dec 12, 2017 at 8:04 AM, Jeremy Finzel  wrote:

> I don't think this is a significant stumbling block for users adopting
> postgres.
>

The OP complained, and I tend to agree, that 90% of the world uses the
command "exit" to end a program and that it would be nice if we did as
well.  This is not just a new-user consideration - I'd say that making
help/quit/exit/\q on a line of their own be helpfully reported to the
user as ineffective when in the middle of a string or
statement continuation would benefit more than just new users as well.

David J.


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Avinash Kumar
It takes a few hours to learn PostgreSQL and 1 second to remember how to
quit from psql.
I'm surprised to see the depth of discussion on this.

It takes some time for a person to explore or learn about PostgreSQL and
install it.
Likewise, it takes some more time to understand that there exists a
"psql" to connect to it.
They know about psql because they searched for it somewhere. Just a psql --help
(does anybody use help?) should tell them much more about psql.
I think most people don't find it difficult to quit using \q.

I think it's better to talk about or help with more features that help a
production database.

Hope this discussion gets to a final conclusion soon :)


On Tue, Dec 12, 2017 at 11:19 AM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Dec 12, 2017 at 8:04 AM, Jeremy Finzel  wrote:
>
>> I don't think this is a significant stumbling block for users adopting
>> postgres.
>>
>
> The OP complained, and I tend to agree, that 90% of the world uses the
> command "exit" to end a program and that it would be nice if we did as
> well.  This is not just a new-user consideration - I'd say that making
> help/quit/exit/\q on a line of their own be helpfully reported to the
> user as ineffective when in the middle of a string or
> statement continuation would benefit more than just new users as well.
>
> David J.
>
>


-- 
9000799060


portal pinning

2017-12-12 Thread Peter Eisentraut
While working on transaction control in procedures, I noticed some
inconsistencies in how portal pinning is used.

This mechanism was introduced in
eb81b6509f4c9109ecf8839d8c482cc597270687 to prevent user code from
closing cursors that PL/pgSQL has created internally, mainly for FOR
loops.  Otherwise, user code could just write CLOSE ''
to mess with the language internals.

It seems to me that PL/Perl and PL/Python should also use that
mechanism, because the same problem could happen there.  (PL/Tcl does
not expose any cursor functionality AFAICT.)  Currently, in PL/Perl, if
an internally generated cursor is closed, PL/Perl just thinks the cursor
has been exhausted and silently does nothing.  PL/Python comes back with
a slightly bizarre error message "closing a cursor in an aborted
subtransaction", which might apply in some situations but not in all.

Attached is a sample patch that adds portal pinning to PL/Perl and
PL/Python.

But I also wonder whether we shouldn't automatically pin/unpin portals
in SPI_cursor_open() and SPI_cursor_close().  This makes sense if you
consider "pinned" to mean "internally generated".  I don't think there
is a scenario in which user code should directly operate on a portal
created by SPI.

Comments?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ee052fc03d51d35617935679c30041127b635e21 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 12 Dec 2017 10:26:47 -0500
Subject: [PATCH] Use portal pinning in PL/Perl and PL/Python

---
 src/backend/utils/mmgr/portalmem.c  | 14 ++
 src/pl/plperl/plperl.c  |  8 
 src/pl/plpython/plpy_cursorobject.c |  8 
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/mmgr/portalmem.c 
b/src/backend/utils/mmgr/portalmem.c
index d03b779407..e1872b8fda 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -464,11 +464,17 @@ PortalDrop(Portal portal, bool isTopCommit)
 
/*
 * Don't allow dropping a pinned portal, it's still needed by whoever
-* pinned it. Not sure if the PORTAL_ACTIVE case can validly happen or
-* not...
+* pinned it.
 */
-   if (portal->portalPinned ||
-   portal->status == PORTAL_ACTIVE)
+   if (portal->portalPinned)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_CURSOR_STATE),
+errmsg("cannot drop pinned portal \"%s\"", 
portal->name)));
+
+   /*
+* Not sure if the PORTAL_ACTIVE case can validly happen or not...
+*/
+   if (portal->status == PORTAL_ACTIVE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_CURSOR_STATE),
 errmsg("cannot drop active portal \"%s\"", 
portal->name)));
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 9f5313235f..5dd3026dc2 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -3405,6 +3405,8 @@ plperl_spi_query(char *query)
 SPI_result_code_string(SPI_result));
cursor = cstr2sv(portal->name);
 
+   PinPortal(portal);
+
/* Commit the inner transaction, return to outer xact context */
ReleaseCurrentSubTransaction();
MemoryContextSwitchTo(oldcontext);
@@ -3468,6 +3470,7 @@ plperl_spi_fetchrow(char *cursor)
SPI_cursor_fetch(p, true, 1);
if (SPI_processed == 0)
{
+   UnpinPortal(p);
SPI_cursor_close(p);
row = &PL_sv_undef;
}
@@ -3519,7 +3522,10 @@ plperl_spi_cursor_close(char *cursor)
p = SPI_cursor_find(cursor);
 
if (p)
+   {
+   UnpinPortal(p);
SPI_cursor_close(p);
+   }
 }
 
 SV *
@@ -3883,6 +3889,8 @@ plperl_spi_query_prepared(char *query, int argc, SV 
**argv)
 
cursor = cstr2sv(portal->name);
 
+   PinPortal(portal);
+
/* Commit the inner transaction, return to outer xact context */
ReleaseCurrentSubTransaction();
MemoryContextSwitchTo(oldcontext);
diff --git a/src/pl/plpython/plpy_cursorobject.c 
b/src/pl/plpython/plpy_cursorobject.c
index 9467f64808..a527585b81 100644
--- a/src/pl/plpython/plpy_cursorobject.c
+++ b/src/pl/plpython/plpy_cursorobject.c
@@ -151,6 +151,8 @@ PLy_cursor_query(const char *query)
 
cursor->portalname = MemoryContextStrdup(cursor->mcxt, 
portal->name);
 
+   PinPortal(portal);
+
PLy_spi_subtransaction_commit(oldcontext, oldowner);
}
PG_CATCH();
@@ -266,6 +268,8 @@ PLy_cursor_plan(PyObject *ob, PyObject *args)
 
cursor

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 4:10 PM, Masahiko Sawada  wrote:
> Thank you for updating the patch. Here are two minor comments.
>
> + * we acquire the same relation extension lock repeatedly.  nLocks is 0 is 
> the
> + * number of times we've acquired that lock;
>
> Should it be "nLocks is the number of times we've acquired that lock:"?

Yes.

> +/* Remember lock held by this backend */
> +held_relextlock.relid = relid;
> +held_relextlock.lock = relextlock;
> +held_relextlock.nLocks = 1;
>
> We set held_relextlock.relid and held_relextlock.lock again. Can we remove 
> them?

Yes.

Can you also try the experiment Andres mentions: "Measure two COPYs to
relations on different filesystems, reduce N_RELEXTLOCK_ENTS to 1, and
measure performance. Then increase the concurrency of the copies to
each relation."  We want to see whether and how much this regresses
performance in that case.  It simulates the case of a hash collision.
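
For reference, one way to set that experiment up (a sketch; tablespace
locations and the input file are placeholders):

CREATE TABLESPACE ts1 LOCATION '/mnt/disk1/pgts';
CREATE TABLESPACE ts2 LOCATION '/mnt/disk2/pgts';
CREATE TABLE t1 (a int, b text) TABLESPACE ts1;
CREATE TABLE t2 (a int, b text) TABLESPACE ts2;
-- with N_RELEXTLOCK_ENTS reduced to 1, run these concurrently in two sessions:
COPY t1 FROM '/path/to/data.csv' WITH (FORMAT csv);
COPY t2 FROM '/path/to/data.csv' WITH (FORMAT csv);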

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 5:50 PM, Andres Freund  wrote:
> This leads to fun like:
>
> postgres[2002][1]=# SELECT '92233720368547758.07'::money+'0.1';
> ┌─┐
> │  ?column?   │
> ├─┤
> │ -$92,233,720,368,547,757.99 │
> └─┘

It seems like I could have a lot MORE fun if I did it the other way --
i.e. spent myself so deeply into debt that I went positive again.

Seriously, though, this same issue was noted in a discussion back in 2011:

https://www.postgresql.org/message-id/flat/AANLkTi%3Dzbyy2%3Dcq8Wa3K3%2B%3Dn2ynkR1kdTHECnoruWS_G%40mail.gmail.com#AANLkTi=zbyy2=cq8Wa3K3+=n2ynkr1kdthecnoruw...@mail.gmail.com

Long story short, I don't think anyone cares about this enough to
spend effort fixing it.  I suspect the money data type has very few
users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Incorrect debug info printed in generate_partition_wise_join_paths

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 9:05 PM, Etsuro Fujita
 wrote:
> (2017/12/11 17:24), Ashutosh Bapat wrote:
>> Yes, that's the correct fix. We should be printing debug information
>> about the child and not the parent.
>
> Thanks for reviewing!

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Tom Lane
Robert Haas  writes:
> Long story short, I don't think anyone cares about this enough to
> spend effort fixing it.  I suspect the money data type has very few
> users.

Somebody did contribute the effort not too long ago to install that
overflow check into cash_in.  So maybe somebody will step up and fix
the other money functions.  I can't get too excited about it in the
meantime.
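
For the record, the asymmetry looks roughly like this (a sketch; the literal
just past the maximum is rejected on input, while in-range arithmetic can
still wrap as in the example upthread):

SELECT '92233720368547758.08'::money;          -- rejected by cash_in's check
SELECT '92233720368547758.07'::money + '0.1';  -- silently wraps negative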

regards, tom lane



Re: plpgsql test layout

2017-12-12 Thread Peter Eisentraut
On 12/11/17 19:29, Michael Paquier wrote:
> If I read vcregress.pl correctly, it seems to me that you need to do
> more with MSVC (see plcheck). The tests would kick in if sql/ and
> expected/ are found, and the test list is fetched by looking at
> REGRESSION in the test files. However, the plpgsql code has an additional
> src/ folder, which would cause the tests to not execute. If the plpgsql
> code were moved one folder down, then the tests would execute properly.

OK, I hacked something up for MSVC.  How about this?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 90383a92d457ac0cc23faf72b68e87a12f000def Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 7 Dec 2017 14:03:29 -0500
Subject: [PATCH v2] Start a separate test suite for plpgsql

The plpgsql.sql test file in the main regression tests is now by far the
largest after numeric_big, making editing and managing the test cases
very cumbersome.  The other PLs have their own test suites split up into
smaller files by topic.  It would be nice to have that for plpgsql as
well.  So, to get that started, set up test infrastructure in
src/pl/plpgsql/src/ and split out the recently added procedure test
cases into a new file there.  That file now mirrors the test cases added
to the other PLs, making managing those matching tests a bit easier too.
---
 src/pl/plpgsql/src/.gitignore|  3 ++
 src/pl/plpgsql/src/Makefile  | 14 
 src/pl/plpgsql/src/expected/plpgsql_call.out | 41 +++
 src/pl/plpgsql/src/sql/plpgsql_call.sql  | 47 ++
 src/test/regress/expected/plpgsql.out| 41 ---
 src/test/regress/sql/plpgsql.sql | 49 
 src/tools/msvc/vcregress.pl  | 18 +++---
 7 files changed, 119 insertions(+), 94 deletions(-)
 create mode 100644 src/pl/plpgsql/src/expected/plpgsql_call.out
 create mode 100644 src/pl/plpgsql/src/sql/plpgsql_call.sql

diff --git a/src/pl/plpgsql/src/.gitignore b/src/pl/plpgsql/src/.gitignore
index 92387fa3cb..ff6ac965fd 100644
--- a/src/pl/plpgsql/src/.gitignore
+++ b/src/pl/plpgsql/src/.gitignore
@@ -1,3 +1,6 @@
 /pl_gram.c
 /pl_gram.h
 /plerrcodes.h
+/log/
+/results/
+/tmp_check/
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 95348179ac..64991c3115 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -24,6 +24,8 @@ OBJS = pl_gram.o pl_handler.o pl_comp.o pl_exec.o \
 
 DATA = plpgsql.control plpgsql--1.0.sql plpgsql--unpackaged--1.0.sql
 
+REGRESS = plpgsql_call
+
 all: all-lib
 
 # Shared library stuff
@@ -65,6 +67,18 @@ pl_gram.c: BISONFLAGS += -d
 plerrcodes.h: $(top_srcdir)/src/backend/utils/errcodes.txt 
generate-plerrcodes.pl
$(PERL) $(srcdir)/generate-plerrcodes.pl $< > $@
 
+
+check: submake
+   $(pg_regress_check) $(REGRESS_OPTS) $(REGRESS)
+
+installcheck: submake
+   $(pg_regress_installcheck) $(REGRESS_OPTS) $(REGRESS)
+
+.PHONY: submake
+submake:
+   $(MAKE) -C $(top_builddir)/src/test/regress pg_regress$(X)
+
+
 distprep: pl_gram.h pl_gram.c plerrcodes.h
 
 # pl_gram.c, pl_gram.h and plerrcodes.h are in the distribution tarball,
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out 
b/src/pl/plpgsql/src/expected/plpgsql_call.out
new file mode 100644
index 00..d0f35163bc
--- /dev/null
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -0,0 +1,41 @@
+--
+-- Tests for procedures / CALL syntax
+--
+CREATE PROCEDURE test_proc1()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+NULL;
+END;
+$$;
+CALL test_proc1();
+-- error: can't return non-NULL
+CREATE PROCEDURE test_proc2()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+RETURN 5;
+END;
+$$;
+CALL test_proc2();
+ERROR:  cannot return a value from a procedure
+CONTEXT:  PL/pgSQL function test_proc2() while casting return value to 
function's return type
+CREATE TABLE test1 (a int);
+CREATE PROCEDURE test_proc3(x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+INSERT INTO test1 VALUES (x);
+END;
+$$;
+CALL test_proc3(55);
+SELECT * FROM test1;
+ a  
+
+ 55
+(1 row)
+
+DROP PROCEDURE test_proc1;
+DROP PROCEDURE test_proc2;
+DROP PROCEDURE test_proc3;
+DROP TABLE test1;
diff --git a/src/pl/plpgsql/src/sql/plpgsql_call.sql 
b/src/pl/plpgsql/src/sql/plpgsql_call.sql
new file mode 100644
index 00..38fd220e8f
--- /dev/null
+++ b/src/pl/plpgsql/src/sql/plpgsql_call.sql
@@ -0,0 +1,47 @@
+--
+-- Tests for procedures / CALL syntax
+--
+
+CREATE PROCEDURE test_proc1()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+NULL;
+END;
+$$;
+
+CALL test_proc1();
+
+
+-- error: can't return non-NULL
+CREATE PROCEDURE test_proc2()
+LANGUAGE plpgsql
+AS $$
+BEGIN
+RETURN 5;
+END;
+$$;
+
+CALL test_proc2();
+
+
+CREATE TABLE test1 (a int);
+
+CREATE PROCEDURE test_proc3(x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+INSERT INTO test1 VALUES (x);
+END;
+$$;
+
+CALL test_proc3(55);
+
+SELECT * FROM 

Re: Boolean partitions syntax

2017-12-12 Thread Peter Eisentraut
On 12/12/17 04:12, Ashutosh Bapat wrote:
> Maybe you should use opt_boolean_or_string instead of TRUE_P and
> FALSE_P. It also supports ON and OFF, which would be a bonus.

But ON and OFF are not valid boolean literals in SQL.
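
For context, the kind of statement the grammar change aims to accept is
(a sketch, with throwaway table names):

CREATE TABLE flags (b boolean, v text) PARTITION BY LIST (b);
CREATE TABLE flags_true  PARTITION OF flags FOR VALUES IN (TRUE);
CREATE TABLE flags_false PARTITION OF flags FOR VALUES IN (FALSE);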

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Arthur Nascimento
On 12 December 2017 at 13:04, Jeremy Finzel  wrote:
> FWIW I am +1 in favor of not overcomplicating the special psql commands and
> keep to the "\" standard.

I find the general rule that \ starts a client command and anything else
starts a server command to be very good to begin with. The 'help' is
an ugly exception to the rule, and I wouldn't like to see it expanded.
If it were up to me, I'd rename 'help' to '\help' and add '\q' to the
initial welcome message.

Of course, that wouldn't be accepted, and it wouldn't solve the SO
statistics or the OP's complaint. It's already been established that
we're now trying to cater to a special kind of user - not the one who
doesn't want/have time to read the docs - but the one who didn't even
read the start message, or who read it and forgot what it was after a
screen full of content passed, then went to SO instead of the
docs. (I'm not criticizing - just trying to paint a clear picture of
the goal here.) And since I view the lonely exit|quit as an even uglier
solution, how about this idea:

Nano keeps a cheatsheet of commands on the bottom; and it's regarded
as a fairly good editor for newcomers. Some shells have a right-hand
side status bar. And even vim these days shows several lines of
explanation of :q if you open it alone. If psql had a right-hand side
status bar that defaulted to a short cheatsheet-like message like the
initial "type 'help' for help and '\q' to quit" (perhaps empty if
started with quiet), then it would follow new users around on every
new command, so they would always know how to quit or ask for help. I
think the message mentioning '\q' instead of 'quit' is important
anyway since it works even on a dirty buffer. And experienced users
already set their status lines anyway, so they would have an extra
status space on the right for general things; perhaps RPROMPT, as zsh
calls it. I'd rather have SO fill with "how to change the right prompt
in psql" than "how to quit psql" since then we're talking about
different user profiles.


Tureba - Arthur Nascimento



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Everaldo Canuto
On Tue, Dec 12, 2017 at 1:02 PM, Geoff Winkless  wrote:
>
> There is also stackexchange question with 51000 views that asks how to
> start the postgresql client. Should we rename psql to mysql to help
> new users too?
>

No, we shouldn't, but it would help a lot if the postgres client had a name
like "postgres", "postgres-client" or "postgres-sql". Even pgsql is better than
psql. With mysql you normally type mysql then tab, and bash shows all the mysql
tools.


>
> Ctrl-D works almost everywhere. It takes 2 seconds to find the answer
> and there are going to be many other things you have to find out if
> you're coming at PostgreSQL from a different DB.
>

Not everybody knows CTRL-D.


>
> As others have pointed out, there are _far_ more frustrating things
> (like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
> for anyone coming from mysql that I would suggest should be addressed
> if you were serious about making life easier for those moving DBs.
>

Well, there are lots of hungry people in the world; maybe we should stop
improving software tools until everyone has food.


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Geoff Winkless
On 12 December 2017 at 16:39, Everaldo Canuto  wrote:
> On Tue, Dec 12, 2017 at 1:02 PM, Geoff Winkless  wrote:
>> There is also stackexchange question with 51000 views that asks how to
>> start the postgresql client. Should we rename psql to mysql to help
>> new users too?

> No, we shouldn't, but it would help a lot if the postgres client had a name
> like "postgres", "postgres-client" or "postgres-sql". Even pgsql is better than
> psql. With mysql you normally type mysql then tab, and bash shows all the mysql
> tools.

If you're going down _that_ route I'd vote for pg_sql since we have
pg_dump and pg_ctl etc.

But in order to know how to call those things you would still need to
find out what they are. You think that typing mysql is "natural"
but you only know how to do it because someone told you to type
"mysql" to run the client.

> Not everybody knows CTRL-D.

And it takes two seconds to find the answer if you don't. Adding
"quit" and/or "exit" would save each person a maximum of one web
search per lifetime and would just confuse the paradigm that
backslashed commands are handled by the client.

> Well, there are lots of hungry people in the world; maybe we should stop
> improving software tools until everyone has food.

That's a nonsense metaphor. If you said "there's lots of hungry people
in the world, maybe we should stop bakers making croissants until
everyone has bread" it might be slightly more appropriate.

Geoff



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Everaldo Canuto
On Tue, Dec 12, 2017 at 2:39 PM, Everaldo Canuto 
wrote:

>
> As others have pointed out, there are _far_ more frustrating things
>> (like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
>> for anyone coming from mysql that I would suggest should be addressed
>> if you were serious about making life easier for those moving DBs.
>>
>
> Well, there are lots of hungry people in the world; maybe we should stop
> improving software tools until everyone has food.
>

Sorry to be so sarcastic in my last comment, but I am a little frustrated. It
is not like this patch is big and needs a lot of review; it is already done,
it took me 15 minutes, and it was the first time I looked at the Postgres
source code. Really sorry.


Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Geoff Winkless
On 12 December 2017 at 16:36, Arthur Nascimento  wrote:
> Nano keeps a cheatsheet of commands on the bottom; and it's regarded
> as a fairly good editor for newcomers. Some shells have a right-hand
> side status bar.

I quite like this idea, although I would rather have a line printed after
each command is run: something like

mydbname=# select ;
--
(1 row)

\h for help, \q for quit, \expert to turn off this message
osprey=#

This has the advantage that it doesn't need curses support, is pretty
simple and actually has the "off" information inline.

When you type \expert you could print a line that says

put \expert in ~/.psqlrc to make this change permanent

or something.

Geoff



Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2017-12-12 Thread Oleg Ivanov

On 12/12/2017 04:33 PM, Oleg Bartunov wrote:

On Mon, Dec 11, 2017 at 11:11 PM, Nikolay Samokhvalov
 wrote:

Very interesting read: https://arxiv.org/abs/1712.01208

HN discussion: https://news.ycombinator.com/item?id=15894896

Some of the comments (from Twitter
https://twitter.com/schrockn/status/940037656494317568): "Jeff Dean and co
at GOOG just released a paper showing how machine-learned indexes can
replace B-Trees, Hash Indexes, and Bloom Filters. Execute 3x faster than
B-Trees, 10-100x less space. Executes on GPU, which are getting faster
unlike CPU. Amazing."

Can those ideas be applied to Postgres in its current state? Or it's not
really down-to-earth?

Oleg made some analysis of the paper.
If the keys are comparable, the data distribution is simple enough 
(which seems to happen rather often), and the data does not change, we 
can learn the distribution, and so perform the search faster and also save 
memory. That is what the authors wrote, and that is what should work, 
in my opinion.
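
Roughly, the paper's range-index model predicts a record's position from a
learned approximation F of the key CDF over N keys and then searches only a
bounded neighborhood:

$\hat{p} = F(key) \cdot N$, with the lookup confined to
$[\hat{p} - err_{min},\ \hat{p} + err_{max}]$, where the error bounds are
measured during training.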


The limitations of the approach, which the authors mention in their work:
+ The data does not change. The proposed method can be extended more or 
less straightforwardly to nearly append-only workloads and to workloads in 
which the data distribution does not change or changes very slowly (the 
data itself or its amount may change, nevertheless). Otherwise it is 
challenging, because the key positions cannot be considered as 
independent values anymore, but it still looks possible. The other 
option proposed by the authors is to store new objects in a buffer and 
retrain the model in the background, but that does not look like a nice solution.
+ GPUs are fast, but the latency of doing operations on the GPU almost 
certainly negates all speed benefits for such small models. The only case 
in which it is reasonable to use a GPU is training models (i.e. building 
such indexes).


The following was left unclear to me from the paper:
+ How effectively the approach can be implemented taking into account 
the limitations above.
+ For hash indexes, the authors propose to replace the hash function with the 
learned CDF, which can decrease the number of collisions, which decreases 
the memory consumption. That is reasonable, but this hash function 
implicitly uses the information about the order on the keys. I suspect 
that using the ordering on the data, with access to the data 
histogram, one could achieve a similar decrease in memory consumption.
+ The proposed hierarchical learning looks natural for the problem, but 
it is not clear that it is the best option. For example, one might use 
an end-to-end learning procedure with weight-sparsity regularizers as 
well. I wonder whether the latter approach can obtain results competitive 
with the first one, and if not, then why. Anyway, I have a 
feeling that the proposed method has room for improvement.


In general, the approach still needs some additional research (mostly 
about changing data), and there are some points to think through about how 
to implement it properly (paging, for example), but it looks promising 
enough nevertheless.




Re: Learned Index

2017-12-12 Thread Oleg Ivanov

On 12/12/2017 12:16 PM, Laurenz Albe wrote:

I have read into the paper.

This may be interesting or not, but the paper is very vague about
its concepts and algorithms, so it's hard to tell.

I'd say that the paper does not meet publication standards.

For example, they say that their results were generated by comparing
a B-tree implementation with "learned indexes using a 2-stage
RMI model and different second-stage sizes (i.e., 10k, 50k, 100k, and 
200k)",

but they don't say exactly what the neural network in these stages is
(at least it is not obvious to me).  Their "Learning Index Framework" 
(LIF)

is described with a few vague sentences and a reference to the literature
saying that is where they got some ideas from.
That is not a complete answer, but it gives us an idea of which kind of neural 
networks was used: "For this paper, we only focused on 2 types of 
models, simple neural nets with zero to two fully-connected hidden 
layers and ReLU activation functions and a layer width of up to 32 
neurons and B-Trees (a.k.a. decision trees)".

There is also no clear concept of how these indexes should handle
data modifications, so I think that there are some loose ends to be
tied up before it is ready for implementation.

Finally, I don't see any clear statement as to the error guarantees
that the neural network prediction can give, and if it is possible that
it may degrade to scanning relevant parts of the table in some cases.
No guarantees are provided (I don't think it is even possible), besides 
the guarantee that if the error of the neural network prediction is more 
than the error of B-tree prediction, B-tree will be used: "Note, that 
hybrid indexes allow us to bound the worst case performance of learned 
indexes to the performance of B-Trees".


Oleg Ivanov
Postgres Professional
The Russian PostgreSQL Company



Backfill bgworker Extension?

2017-12-12 Thread Jeremy Finzel
One of the challenges we have is that our engineers have written frameworks
to backfill data in several different DSLs, and every time they adopt a new
language, they may need to write another one.

To be clear, what I mean is batch-updating a large set of data in small
pieces so as to avoid things like lock contention and replication lag.
Sometimes these have a driving table that holds the source data to update in
a destination table based on a key column, but sometimes it is something
like setting just a single specific value for a huge table.
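
As a sketch of one such small batch (table and column names are hypothetical;
the caller advances :last_id, commits, and pauses between batches):

UPDATE big_table AS d
SET    flag = true
WHERE  d.id >  :last_id
  AND  d.id <= :last_id + 10000;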

I would love instead to have a Postgres extension that uses postgres
background workers to accomplish this, especially if it were part of core.
Before I venture into writing something like this as an extension, would it
ever be considered appropriate for inclusion in Postgres core?

Thanks,
Jeremy


Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Andres Freund
Hi,

On 2017-12-12 11:01:04 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > Long story short, I don't think anyone cares about this enough to
> > spend effort fixing it.  I suspect the money data type has very few
> > users.

I'm unfortunately not so sure :(.


> Somebody did contribute the effort not too long ago to install that
> overflow check into cash_in.  So maybe somebody will step up and fix
> the other money functions.  I can't get too excited about it in the
> meantime.

I can't really either. But I think that kinda suggests we ought to rip
that code out in the not-too-distant future.

The background is that I was working on committing the faster (&
correct) overflow checks, and wanted to compile postgres with -ftrapv.
Some rudimentary cash (as well as pgbench, rel/abstime) fixes were
required to make the tests succeed...

Greetings,

Andres Freund



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Chapman Flack
On 12/12/2017 12:00 PM, Geoff Winkless wrote:

> I quite like this idea, although I would rather a line printed after
> each command is run: something like
> 
> mydbname=# select ;
> --
> (1 row)
> 
> \h for help, \q for quit, \expert to turn off this message
> osprey=#

It seems to me that this is very close to the Robert Haas suggestion
already discussed upthread, except that suggestion only prints the
hint message when you might need it, which I'd find less intrusive.

-Chap



Re: pgbench's expression parsing & negative numbers

2017-12-12 Thread Andres Freund
On 2017-12-12 07:21:21 +0100, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> > working on overflow correctness in pg I noticed that pgbench isn't quite
> > there.  I assume it won't matter terribly often, but the way it parses
> > integers makes it incorrect for, at least, the negativemost number.
> > 
> > It directly parses the integer input using:
> > {digit}+{
> > yylval->ival = strtoint64(yytext);
> > return INTEGER_CONST;
> > }
> > 
> > but that unfortunately means that the sign is no included in the call to
> > strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,
> 
> Indeed. For a benchmarking script which generates a pseudo-random load I do
> not see it as a big issue, but maybe I'm missing some use case.

IDK, behaving correctly seems like a good idea. Also far from impossible
that that code gets propagated further.  Doesn't seem like an entirely
crazy idea that somebody actually wants to benchmark boundary cases,
which might be slower and such.

> > because that's not representable as a positive number on a two's
> > complement machine (i.e. everywhere).  Preliminary testing shows that
> > that can relatively easily fixed by just prefixing [-+]? to the relevant
> > exprscan.l rules.
> 
> Beware that it must handle the unary/binary plus/minus case consistently:
> 
>   "-123" vs "a -123" vs "a + -123"

Which is a lot easier to solve on the parser side of things...


> > But it might also not be a bad idea to simply defer parsing of the
> > number until exprparse.y has its hand on the number?
> 
> It is usually simpler to let the lexer handle constants.


> > There's plenty other things wrong with overflow handling too, especially
> > evalFunc() basically just doesn't care.
> 
> Sure.
> 
> There is some overflow checking for div and the double-to-int cast, which was
> added because of previous complaints, but which is not very useful to me.

I think handling it inconsistently is the worst of all worlds.


> ISTM that it is a voluntary feature, which handles int64 operations with a
> modulo overflow like C. Generating exceptions on such overflows does not
> look very attractive to me.

No, it's not really voluntary. Signed overflow isn't actually defined
behaviour in C. We kinda make it so on some platforms, but that's not
really a good idea.


> >  I'll come up with a patch for that sometime soon.
> 
> I wish you could provide some motivation: why does it matter much to a
> benchmarking script to handle overflows with more care?

a) I want to be able to compile without -fwrapv in the near
   future. Which means you can't have signed overflows, because they're
   undefined behaviour in C.
b) I want to be able to compile with -ftrapv /
   -fsanitize=signed-integer-overflow, to be sure code is actually
   correct. Currently pgbench crashes with that.
c) Randomly handling things in some but not all places is a bad idea.


> > A third complaint I have is that the tap tests are pretty hard to parse
> > for humans.
> 
> Probably.
> 
> Perl is pretty hard for humans to start with :-)

Meh.


> There is a lot of code factorization to cram many tests together, so that
> one test is more or less one line and there are hundreds of them. Not sure
> it would help if it was expanded.

I don't think expanding it is really a problem; I think they're just
largely not well documented or formatted. With some effort the tests could
be a lot easier to read.


Greetings,

Andres Freund



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:25:48 +0800, Craig Ringer wrote:
> If we were willing to wrap variadic calls in a callback and rely on
> vfprintf, we could:

I don't think there are problems with relying on variadic calls; we do
that in plenty of places.


> and let the ProcSignal based caller pass an elog wrapper instead, or form a
> StringInfo with appendStringInfoVA then elog it in one go after the
> MemoryContextStatsDetail call returns.

I don't think we want a simple elog wrapper - outputting the context
stats as hundreds of log messages doesn't seem right. So at the least it
seems we should bunch it up in a stringinfo - which seems to require
expanding the API to pass around a void *callback_data or such.

I do wonder if the right thing here wouldn't be to put the result into a
dsm segment, and then return that to the UDF on the requesting
side. Logging to the server log and then having the requestor dig that out
doesn't seem particularly user-friendly.

Greetings,

Andres Freund



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 10:41 PM, Amit Kapila  wrote:
> In particular, if the worker crashed (say somebody forcefully killed
> the worker), then I think it will lead to a restart of the server.  So I'm
> not sure adding anything about that in the comment will make much
> sense.

I think it's dangerous to rely on that assumption.  If somebody stuck
proc_exit(1) into the very beginning of the worker startup sequence,
the worker would exit but the server would not restart.

As I started experimenting with this, I discovered that your patch is
actually unsafe.   Here's a test case that shows the breakage:

set force_parallel_mode = on;
select 1;

The expected outcome is:

 ?column?
--
1
(1 row)

If I hack do_start_bgworker() to make worker startup fail, like this:

--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5701,7 +5701,7 @@ do_start_bgworker(RegisteredBgWorker *rw)
 #ifdef EXEC_BACKEND
 switch ((worker_pid = bgworker_forkexec(rw->rw_shmem_slot)))
 #else
-switch ((worker_pid = fork_process()))
+switch ((worker_pid = -1))
 #endif
 {
 case -1:

...then it hangs forever.  With your patch it no longer hangs forever,
which is good, but it also returns the wrong answer, which is bad:

 ?column?
--
(0 rows)

The problem is that Gather sees nworkers_launched > 0 and assumes that
it is therefore entitled not to perform a local scan.  I'm pretty sure
that there could also be cases where this patch makes us miss an error
reported by a worker; suppose the worker error is reported and the
worker exits after WaitForParallelWorkersToFinish does
CHECK_FOR_INTERRUPTS() but before we reach GetBackgroundWorkerPid().
The code will simply conclude that the worker is dead and stop
waiting, never mind the fact that there's a pending error in the
queue.  This is exactly why I made parallel workers send a Terminate
message to the leader instead of just disappearing -- the leader needs
to see that message to know that it's reached the end of what the
worker is going to send.

So, I think we need to regard fork failure and related failure modes
as ERROR conditions.  It's not equivalent to failing to register the
background worker.  When we fail to register the background worker,
the parallel.c machinery knows at LaunchParallelWorkers() time that
the worker will never show up and lets the calling code by setting
nworkers_launched, and the calling code can then choose to proceed
with that number of workers or ERROR out as it chooses.  But once we
decide on the value of nworkers_launched to report to callers, it's
really not OK for workers to just silently not ever start, as the test
case above shows.  We could make it the job of code that uses the
ParallelContext machinery to cope with that situation, but I think
that's a bad plan, because every user would have to worry about this
obscure case.  Right now, the only in-core use of the ParallelContext
machinery is Gather/Gather Merge, but there are pending patches for
parallel index scan and parallel VACUUM, so we'll just end up adding
this handling to more and more places.  And for what?  If you've got
max_connections and/or max_parallel_workers set so high that your
system can't fork(), you have a big problem, and forcing things that
would have run using parallel query to run serially instead of
erroring out is not going to fix it.  I think that deciding we're
going to handle fork() failure by reporting an ERROR is just fine.

So, here's a patch.  This patch keeps track of whether we've ever
received a message from each worker via the error queue.  If there are
no remaining workers which have neither exited cleanly nor sent us at
least one message, then it calls GetBackgroundWorkerPid() for each
one.  If any worker reports that it is BGWH_STOPPED, then it
double-checks whether the worker has attached to the error queue.  If
it did, then it assumes that we'll detect clean shutdown or an error
on the next iteration and ignores the problem.  If not, then the
worker died without ever attaching the error queue, so it throws an
ERROR.  I tested that this causes the test case above to throw a
proper ERROR when fork_process() returns -1.  Please review...
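
For readers following along, the shape of that check is roughly as
follows.  This is a simplified sketch rather than the patch itself; in
particular the known_started flag is an invented stand-in for the real
bookkeeping around the error queues:

static void
check_for_failed_workers(ParallelContext *pcxt)
{
	int			i;

	for (i = 0; i < pcxt->nworkers_launched; ++i)
	{
		pid_t		pid;
		BgwHandleStatus status;

		/* Skip workers we have already heard from via the error queue. */
		if (pcxt->worker[i].known_started)
			continue;

		status = GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid);
		if (status == BGWH_STOPPED &&
			shm_mq_get_sender(shm_mq_get_queue(pcxt->worker[i].error_mqh)) == NULL)
			ereport(ERROR,
					(errcode(ERRCODE_INSUFFICIENT_RESOURCES),
					 errmsg("parallel worker failed to initialize")));
	}
}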

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


parallel-worker-fork-failure.patch
Description: Binary data


Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund  wrote:
> I do wonder if the right thing here wouldn't be to put the result into a
> dsm segment, and then return that to the UDF on the requesting
> side. Logging to the server log and then have the requestor dig that out
> doesn't seem particularly user friendly.

I think that dumping it to the server log will be fine for most
people, and it's *significantly* safer.  Creating a DSM segment could
fail, and the last thing we want is for interrogating a long-running
backend to make it crap out.

+1 for this whole concept, just BTW.  As I've said before, I grow
weary of asking customers to run gdb.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:04:55 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 1:50 PM, Andres Freund  wrote:
> > I do wonder if the right thing here wouldn't be to put the result into a
> > dsm segment, and then return that to the UDF on the requesting
> > side. Logging to the server log and then have the requestor dig that out
> > doesn't seem particularly user friendly.
> 
> I think that dumping it to the server log will be fine for most
> people, and it's *significantly* safer.

I agree that it's more reliable - I hope there's no meaningful safety
difference.  I think you overestimate users a bit however - far from
most of them are going to be able to extract a very long log entry from
a busy log file. There's no generally available easy way to copy a few
pages of text from a logfile that's a few gigabytes large...

I'd suggest adding two functions.


> Creating a DSM segment could fail, and the last thing we want is for
> interrogating a long-running backend to make it crap out.

It'd need to handle memory alloc failures gracefully, I agree. That
doesn't seem impossible.  Also don't think that's meaningfully different
in the non-dsm case, it'd be good to handle malloc failures gracefully
in that case as well.

The requesting side should create the dsm segment and preallocate a
reasonable amount of memory, if we go for it.
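
A minimal sketch of what the requesting side might do -- the 64kB figure
and the layout are assumptions, not a worked-out design:

#define MEMSTATS_DUMP_SIZE	(64 * 1024)

/*
 * The requestor creates and pre-sizes the segment, so the signalled
 * backend never has to allocate anything in order to report its stats.
 */
static dsm_segment *
create_memstats_segment(void)
{
	dsm_segment *seg = dsm_create(MEMSTATS_DUMP_SIZE, 0);
	char	   *buf = (char *) dsm_segment_address(seg);

	buf[0] = '\0';				/* target backend appends text here */
	return seg;
}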


> +1 for this whole concept, just BTW.  As I've said before, I grow
> weary of asking customers to run gdb.

Indeed.

Greetings,

Andres Freund



Re: ALTER TABLE ADD COLUMN fast default

2017-12-12 Thread Andrew Dunstan


On 12/06/2017 11:26 AM, Andrew Dunstan wrote:
>> In general, you need to be thinking about this in terms of making sure
>> that it's impossible to accidentally not update code that needs to be
>> updated.  Another example is that I noticed that some places the patch
>> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
>> the former will fail to copy the missing-attribute support data.
>> I think that's an astonishingly bad idea.  There is basically no situation
>> in which CreateTupleDescCopy defined in that way will ever be safe to use.
>> The missing-attribute info is now a fundamental part of a tupdesc and it
>> has to be copied always, just as much as e.g. atttypid.
>>
>> I would opine that it's a mistake to design TupleDesc as though the
>> missing-attribute data had anything to do with default values.  It may
>> have originated from the same place but it's a distinct thing.  Better
>> to store it in a separate sub-structure.
>
> OK, will work on all that. Thanks again.
>



Looking closer at this. First, there is exactly one place where the
patch does  s/CreateTupleDescCopy/CreateTupleDescCopyConstr/.

Making this like atttypid suggests that it belongs right outside the
TupleConstr structure. But then it seems to me that the change could
well end up being a darn sight more invasive. For example, should we be
passing the number of missing values to CreateTemplateTupleDesc and
CreateTupleDesc? I'm happy to do whatever work is required, but I want
to have a firm understanding of the design before I spend lots of time
cutting code. Once I understand how tupdesc.h should look I should be good.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund  wrote:
> I agree that it's more reliable - I hope there's no meaningful safety
> difference.  I think you overestimate users a bit however - far from
> most of them are going to be able to extract a very long log entry from
> a busy log file. There's no generally available easy way to copy a few
> pages of text from a logfile that's a few gigabytes large...

Well, a lot of users will send us the whole logfile rather than just
the relevant bits, but that doesn't bother me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Using ProcSignal to get memory context stats from a running backend

2017-12-12 Thread Andres Freund
On 2017-12-12 14:16:59 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 2:12 PM, Andres Freund  wrote:
> > I agree that it's more reliable - I hope there's no meaningful safety
> > difference.  I think you overestimate users a bit however - far from
> > most of them are going to be able to extract a very long log entry from
> > a busy log file. There's no generally available easy way to copy a few
> > pages of text from a logfile that's a few gigabytes large...
> 
> Well, a lot of users will send us the whole logfile rather than just
> the relevant bits, but that doesn't bother me.

That doesn't really work on some busy servers, and also often in cases
where the log potentially contains sensitive (e.g. HIPAA) data.

I think a function returning the dump would be great, a function "just"
dumping to the server log still pretty good.

Greetings,

Andres Freund



Re: Rethinking MemoryContext creation

2017-12-12 Thread Tom Lane
I wrote:
> I've not done any benchmarking on this yet, just confirmed that it
> compiles and passes check-world.

So once I'd done some benchmarking, I was a bit disappointed: I could
not measure any difference anymore in "pgbench -S", and poking into
other scenarios found cases that were actually slower, like

do $$
begin
  for i in 1..1000 loop
declare x int;
begin
  x := 42;
exception when others then null;
end;
  end loop;
end$$;

which went from about 12.4 seconds in HEAD to about 13.6 with my
v2 patch.  When I looked into the reason, I found that my earlier
blithe pronouncement that "optimizing for the unused-context case
seems like the wrong thing" was too simple.  In this example we
create and delete two memory contexts per loop (a subtransaction
context and an ExprContext) and neither of them receives any
palloc requests.  Some of the contexts created in a "pgbench -S"
scenario are the same way.  So in these examples, we were on the
losing side of the replace-a-palloc-with-a-malloc trade.

However, if we can predict which contexts are more likely not to
receive traffic, we can fix this by doing things the old way for
those contexts.  I instrumented AllocSetDelete to log whether
the target context had received any requests in its lifespan,
and aggregated the reports over a run of the core regression tests.
I found these cases where there were significantly more reports
of "context was never used" than "context was used":

    379 CurTransactionContext was never used
     24 CurTransactionContext was used
  66978 ExprContext was never used
  17364 ExprContext was used
    993 HashTableContext was never used
     25 HashTableContext was used
     88 Logical Replication Launcher sublist was never used
      2 Logical Replication Launcher sublist was used
  11139 SPI Exec was never used
   2421 SPI Exec was used
     36 SetOp was never used
      2 SetOp was used
    185 Statistics snapshot was never used
    104 Subplan HashTable Context was never used
     44 Subplan HashTable Context was used
    148 Subplan HashTable Temp Context was never used
   1481 Table function arguments was never used
     45 Table function arguments was used
     22 TableFunc per value context was never used
     52 Unique was never used
      4 Unique was used
    137 WindowAgg Aggregates was never used
      2 WindowAgg Aggregates was used
    127 WindowAgg Partition was never used
     12 WindowAgg Partition was used
    288 _bt_pagedel was never used
     14 _bt_pagedel was used
     35 event trigger context was never used
      3 event trigger context was used
  38386 ginPlaceToPage temporary context was never used
   3348 ginPlaceToPage temporary context was used
    454 ident parser context was never used
    229 tSRF function arguments was never used
     46 tSRF function arguments was used

A lot of these probably aren't bothering with optimizing because they just
don't occur often enough to move the needle.  (And in workloads where that
wasn't true, maybe the usage statistics would be different anyway.)  But
it looked to me like CurTransactionContext, ExprContext, HashTableContext,
SPI Exec, and ginPlaceToPage temporary context would be worth doing it
the old way for.
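
(As an aside, that instrumentation can be approximated with something
like the following in AllocSetDelete -- a guess at the approach, using
the existing isReset flag as a proxy for "never received any request":

	elog(LOG, "%s was %s",
		 context->name,
		 context->isReset ? "never used" : "used");

and then aggregating the log lines with grep/sort/uniq -c.)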

Accordingly, attached is a v3 patch that adds another flag to
AllocSetContextCreateExtended telling it to do things the old way
with the context header in TopMemoryContext.  (We can still optimize
context name copying as before.)  This fixes the subtransaction case
I showed above, bringing it to 10.7 sec which is noticeably better
than HEAD.  I now see maybe a 1 or 2 percent improvement in "pgbench -S",
which isn't much, but it turns out that that test case only involves
half a dozen context creations/deletions per transaction.  So probably
we aren't going to get any noticeable movement on that benchmark, and
it'd be brighter to look for test cases where more contexts are involved.

regards, tom lane

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 868c14e..adbbc44 100644
*** a/contrib/amcheck/verify_nbtree.c
--- b/contrib/amcheck/verify_nbtree.c
*** bt_check_every_level(Relation rel, bool 
*** 295,303 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_MINSIZE,
!  ALLOCSET_DEFAULT_INITSIZE,
!  ALLOCSET_DEFAULT_MAXSIZE);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
--- 295,301 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_SIZES);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 1b92

Re: Rethinking MemoryContext creation

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 2:30 PM, Tom Lane  wrote:
>     379 CurTransactionContext was never used
>      24 CurTransactionContext was used
>   66978 ExprContext was never used
>   17364 ExprContext was used
>   11139 SPI Exec was never used
>    2421 SPI Exec was used
>   38386 ginPlaceToPage temporary context was never used
>    3348 ginPlaceToPage temporary context was used

It strikes me that a way to optimize these cases even more would be to
postpone creating the context until it's actually needed.  That might
not always be a reasonable plan -- in particular, it occurs to me to
think that CurTransactionContext is probably so widely used that
changing anything about how it works would probably be really painful
-- but it might be possible in some cases.

Another idea I have is that perhaps we could arrange to reuse contexts
instead of destroying them and recreating them.  For example, when
asked to delete a context, we could instead push it on a linked list
of old contexts, or only if the list isn't too long already, and when
asked to create one, we could pop from the list.  Or we could keep
around an array of, say, 1024 contexts that are never freed and only
allocated dynamically when we run out.
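
A generic sketch of that freelist idea -- deliberately not using the real
MemoryContext structs, just showing the shape of the reuse scheme:

#define CONTEXT_FREELIST_MAX 16

typedef struct ContextFreeList
{
	int			num_free;
	void	   *contexts[CONTEXT_FREELIST_MAX];
} ContextFreeList;

static ContextFreeList freelist;

/* On "create": reuse a recycled context if one is available. */
static void *
context_get(void *(*create_fn) (void))
{
	if (freelist.num_free > 0)
		return freelist.contexts[--freelist.num_free];
	return create_fn();
}

/* On "delete": reset and recycle instead of freeing, up to a cap. */
static void
context_put(void *cxt, void (*reset_fn) (void *), void (*destroy_fn) (void *))
{
	if (freelist.num_free < CONTEXT_FREELIST_MAX)
	{
		reset_fn(cxt);			/* release all blocks, keep the header */
		freelist.contexts[freelist.num_free++] = cxt;
	}
	else
		destroy_fn(cxt);
}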

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rethinking MemoryContext creation

2017-12-12 Thread Andres Freund
On 2017-12-12 14:50:37 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 2:30 PM, Tom Lane  wrote:
> >     379 CurTransactionContext was never used
> >      24 CurTransactionContext was used
> >   66978 ExprContext was never used
> >   17364 ExprContext was used
> >   11139 SPI Exec was never used
> >    2421 SPI Exec was used
> >   38386 ginPlaceToPage temporary context was never used
> >    3348 ginPlaceToPage temporary context was used
> 
> It strikes me that a way to optimize these cases even more would be to
> postpone creating the context until it's actually needed.  That might
> not always be a reasonable plan -- in particular, it occurs to me to
> think that CurTransactionContext is probably so widely used that
> changing anything about how it works would probably be really painful
> -- but it might be possible in some cases.

That's not generally easy without slowing things down though - e.g. we
don't want to check for ExprContext's existence before every use, there
can be billions of usages in a single analytics query. The branches (yea
yea ;)) would show up as being noticeable.

There are a few places where could probably reliably *detect* that
they're not needed however. E.g. plenty executor nodes only need a
ExprContext when either a qual or projection is needed, both are pretty
cheap to detect at ExecInitNode() time.
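
For illustration, the kind of test that implies at node-init time -- a
sketch only, with a simplified notion of "needs projection" passed in by
the caller rather than the real executor logic:

static void
maybe_assign_expr_context(EState *estate, PlanState *planstate,
						  Plan *plan, bool needs_projection)
{
	/* Only build an ExprContext when the node will actually use one. */
	if (plan->qual != NIL || needs_projection)
		ExecAssignExprContext(estate, planstate);
	/* otherwise planstate->ps_ExprContext stays NULL */
}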


> Another idea I have is that perhaps we could arrange to reuse contexts
> instead of destroying them and recreating them.  For example, when
> asked to delete a context, we could instead push it on a linked list
> of old contexts, or only if the list isn't too long already, and when
> asked to create one, we could pop from the list.  Or we could keep
> around an array of, say, 1024 contexts that are never freed and only
> allocated dynamically when we run out.

I'm a bit doubtful that's going to help, maintaining that list isn't
going to be free, and the lifetime and number of those contexts aren't
always going to match up.

I think you're somewhat on to something however: I do think that
especially executor startup would have a good chance of avoiding
noticeable overhead by batching the allocation of all these tiny
contexts together - I just don't quite know how given how our current
initialization works.  The best thing to do would probably to do two
walks during executor initialization, one to compute memory sizes, and a
second to initialize pre-requested memory that's laid out
serially... But obviously that's not a small change.

Greetings,

Andres Freund



Error generating coverage report

2017-12-12 Thread David Steele
I'm working on improving coverage and would like to generate some
reports (other than the text versions) to help me find uncovered code.
However, my source path and build path are not the same and I'm running
into problems.

This works fine and produces gcov output:

$ make -C test/build/src/bin/pg_basebackup check
$ ls -1 test/build/src/bin/pg_basebackup/*.gcno

test/build/src/bin/pg_basebackup/pg_basebackup.gcno
test/build/src/bin/pg_basebackup/pg_receivewal.gcno
test/build/src/bin/pg_basebackup/pg_recvlogical.gcno
test/build/src/bin/pg_basebackup/receivelog.gcno
test/build/src/bin/pg_basebackup/streamutil.gcno
test/build/src/bin/pg_basebackup/walmethods.gcno

But, when I try to generate the coverage report I get:

$ make -C test/build/src/bin/pg_basebackup coverage-html

make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d
/postgres/src/bin/pg_basebackup -o lcov_base.info
geninfo: ERROR: no .gcno files found in /postgres/src/bin/pg_basebackup!
make: *** [lcov_base.info] Error 255
make: *** Deleting file `lcov_base.info'
make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

It's looking at my source directory rather than my build directory.
Here's how I'm calling configure:

$ cd /home/vagrant/test/build
$ /postgres/configure --prefix=/home/vagrant/test/pg \
--docdir=/postgres-doc --enable-cassert --enable-tap-tests \
--enable-coverage --with-openssl --enable-depend

I read docs/10/static/regress-coverage.html but didn't see anything that
helps.  Hopefully I'm missing something?

Thanks,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Oleg Bartunov
On Tue, Dec 12, 2017 at 6:07 PM, Alexander Korotkov
 wrote:
> Hi!
>
> Let me add my two cents too.
>
> On Tue, Dec 12, 2017 at 2:41 PM, Ildus Kurbangaliev
>  wrote:
>>
>> On Mon, 11 Dec 2017 20:53:29 +0100 Tomas Vondra
>>  wrote:
>> > Replacing the algorithm used to compress all varlena values (in a way
>> > that makes it transparent for the data type code).
>> > --
>> >
>> > While pglz served us well over time, it was repeatedly mentioned that
>> > in some cases it becomes the bottleneck. So supporting other state of
>> > the art compression algorithms seems like a good idea, and this patch
>> > is one way to do that.
>> >
>> > But perhaps we should simply make it an initdb option (in which case
>> > the whole cluster would simply use e.g. lz4 instead of pglz)?
>> >
>> > That seems like a much simpler approach - it would only require some
>> > ./configure options to add --with-lz4 (and other compression
>> > libraries), an initdb option to pick compression algorithm, and
>> > probably noting the choice in cluster controldata.
>>
>> Replacing pglz for all varlena values wasn't the goal of the patch, but
>> it's possible to do with it and I think that's good. And as Robert
>> mentioned pglz could appear as builtin undroppable compresssion method
>> so the others could be added too. And in the future it can open the
>> ways to specify compression for specific database or cluster.
>
>
> Yes, usage of custom compression methods to replace generic non
> type-specific compression method is really not the primary goal of this
> patch.  However, I would consider that as useful side effect.  However, even
> in this case I see some advantages of custom compression methods over initdb
> option.
>
> 1) In order to support alternative compression methods in initdb, we have to
> provide builtin support for them.  Then we immediately run into
> dependencies/incompatible-licenses problem.  Also, we tie appearance of new
> compression methods to our release cycle.  In real life, flexibility means a
> lot.  Giving users freedom to experiment with various compression libraries
> without having to recompile PostgreSQL core is great.
> 2) It's not necessary that users would be satisfied with applying single
> compression method to the whole database cluster.  Various columns may have
> different data distributions with different workloads.  Optimal compression
> type for one column is not necessary optimal for another column.
> 3) Possibility to change compression method on the fly without re-initdb is
> very good too.

I consider custom compression as the way to custom TOAST. For example,
to access parts of a very long document optimally, we need to compress
slices of the document. Currently, a long jsonb document gets compressed
and then sliced, which kills all the benefits of binary jsonb.  Also, we
are thinking about "lazy" access to parts of jsonb from PLs, which is
currently awfully inefficient.

>
>> > Custom datatype-aware compression (e.g. the tsvector)
>> > --
>> >
>> > Exploiting knowledge of the internal data type structure is a
>> > promising way to improve compression ratio and/or performance.
>> >
>> > The obvious question of course is why shouldn't this be done by the
>> > data type code directly, which would also allow additional benefits
>> > like operating directly on the compressed values.
>> >
>> > Another thing is that if the datatype representation changes in some
>> > way, the compression method has to change too. So it's tightly coupled
>> > to the datatype anyway.
>> >
>> > This does not really require any new infrastructure, all the pieces
>> > are already there.
>> >
>> > In some cases that may not be quite possible - the datatype may not be
>> > flexible enough to support alternative (compressed) representation,
>> > e.g. because there are no bits available for "compressed" flag, etc.
>> >
>> > Conclusion: IMHO if we want to exploit the knowledge of the data type
>> > internal structure, perhaps doing that in the datatype code directly
>> > would be a better choice.
>>
>> It could be, but let's imagine there will be internal compression for
>> tsvector. It means that tsvector has two formats now and minus one bit
>> somewhere in the header. After a while we found a better compression
>> but we can't add it because there is already one and it's not good to
>> have three different formats for one type. Or, the compression methods
>> were implemented and we decided to use dictionaries for tsvector (if
>> the user is going to store a limited number of words). But it will mean that
>> tsvector will go through two compression stages (for its internal and for
>> dictionaries).
>
>
> I would like to add that even for single datatype various compression
> methods may have different tradeoffs.  For instance, one compression method
> can have better compression ratio, but another one have faster

Re: [HACKERS] pg_stat_wal_write statistics view

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 4:18 AM, Haribabu Kommi
 wrote:
> I ran the pgbench performance tests after removing the stats collector usage
> and moving the stats into shared memory. I find that this stats addition
> doesn't have any overhead while collecting the stats. I ran these tests on a
> small machine; they may need to be retested on a high-end machine to confirm
> that there is no impact.
>
> Threads  Clients  HEAD         PATCH        Diff
> 1        1        88.72981533  88.527789    -0.23%
> 2        2        88.17999     88.823842     0.73%
> 4        4        169.430813   167.897937   -0.90%
> 8        8        311.790313   315.6659117   1.24%
> 16       16       558.6014777  562.5883583   0.71%
> 32       32       874.0996587  899.855634    2.95%
>
> Attached is the updated patch accordingly to use the shared memory stats.

I think the concerns have more to do with systems that have a huge
number of tables.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Error generating coverage report

2017-12-12 Thread Peter Eisentraut
On 12/12/17 15:02, David Steele wrote:
> I'm working on improving coverage and would like to generate some
> reports (other than the text versions) to help me find uncovered code.
> However, my source path and build path are not the same and I'm running
> into problems.

VPATH builds were one of the things that were specifically tested for the
coverage tooling changes that went into PG11, so it should work in
principle, unless something has changed in the meantime.

> $ make -C test/build/src/bin/pg_basebackup coverage-html
> 
> make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
> /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d
> /postgres/src/bin/pg_basebackup -o lcov_base.info
> geninfo: ERROR: no .gcno files found in /postgres/src/bin/pg_basebackup!
> make: *** [lcov_base.info] Error 255
> make: *** Deleting file `lcov_base.info'
> make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

The -d options should accumulate, so that it should look in both places.

Wild guess: If the /postgres directory is some kind of vboxsf mount,
there might be problems if symlinks are involved.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Backfill bgworker Extension?

2017-12-12 Thread Peter Eisentraut
On 12/12/17 13:03, Jeremy Finzel wrote:
> To be clear, what I mean is batch updating a large set of data in small
> pieces so as to avoid things like lock contention and replication lags. 
> Sometimes these have a driving table that has the source data to update
> in a destination table based on a key column, but sometimes it is
> something like setting just a single specific value for a huge table.
> 
> I would love instead to have a Postgres extension that uses postgres
> background workers to accomplish this, especially if it were part of
> core.  Before I venture into exploring writing something like this as an
> extension, would this ever be considered something appropriate as an
> extension in Postgres core?  Would that be appropriate?

I don't see what the common ground between different variants of this
use case would be.  Aren't you basically just looking to execute a
use-case-specific stored procedure in the background?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Learned Indexes in PostgreSQL?

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 6:38 AM, Stefan Keller  wrote:
> This is an awesome paper about a new index called "Learned Index".
> it's aka dense hash structure derived ("learned") from actual data.
> Looks very promising for certain properties [*].
>
> Anyone already working on this in Postgres?
> How could this be implemented in Postgres?

You're the third person to start a thread on this topic in the last few days.

Here are links to the other two threads:

http://postgr.es/m/CANNMO+J1KeTSx5q5SYuwHf1v-gPRLrOZw1s7qOpqWx=3umm...@mail.gmail.com
http://postgr.es/m/caaerrx8bhiw3rgaqpolqjhyhk7ghordtke5uc9dasv1w5fz...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Geoff Winkless
On 12 December 2017 at 18:32, Chapman Flack  wrote:
> It seems to me that this is very close to the Robert Haas suggestion
> already discussed upthread, except that suggestion only prints the
> hint message when you might need it, which I'd find less intrusive.

Well "close" yes, but I like "simpler", and I'm not sure I like the
idea of waiting for someone to flail at the quit command before giving
them a hint.

Also, to be frank, if someone's typed "quit" and you're going to
perform an action based on it, it feels like you might as well quit -
a bit like google saying "did you mean \q" in its sneering
supercilious way.

Geoff



Re: Error generating coverage report

2017-12-12 Thread David Steele

Hi Peter,

On 12/12/17 3:20 PM, Peter Eisentraut wrote:


make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d
/postgres/src/bin/pg_basebackup -o lcov_base.info
geninfo: ERROR: no .gcno files found in /postgres/src/bin/pg_basebackup!
make: *** [lcov_base.info] Error 255
make: *** Deleting file `lcov_base.info'
make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'


The -d options should accumulate, so that it should look in both places.

Wild guess: If the /postgres directory is some kind of vboxsf mount,
there might be problems if symlinks are involved.


/postgres is a vboxsf mount.  However, I copied /postgres to 
/home/vagrant/src and still see the same error:


$ make -C test/build/src/bin/pg_basebackup coverage-html
make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
/usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d 
/home/vagrant/src/src/bin/pg_basebackup -o lcov_base.info
geninfo: ERROR: no .gcno files found in 
/home/vagrant/src/src/bin/pg_basebackup!

make: *** [lcov_base.info] Error 255
make: *** Deleting file `lcov_base.info'
make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

Thoughts?

--
-David
da...@pgmasters.net



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 11:58 AM, Everaldo Canuto
 wrote:
> Sorry to be so sarcastic on last comment but, I am a little frustrated, it
> is not like this patch is big and need a lots of review, it is already done
> and it takes 15 minutes from me and it was the first time I look at Postgres
> source code. Really sorry.

It's not really done in any useful sense, because it has a -1 vote
from a prominent committer.  There is however an alternative proposal
which has several +1 votes, including from that same committer and
from me.  If you would like to produce a patch that implements that
counter-proposal, it might go somewhere.  Very little of significance
gets done around here without some give and take; I know that's
sometimes frustrating, but it's how it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 10:02 AM, Geoff Winkless  wrote:
> There is also stackexchange question with 51000 views that asks how to
> start the postgresql client. Should we rename psql to mysql to help
> new users too?

That's not a serious question, and I don't know what your point is in
suggesting it, unless it's that you want to just troll everyone, which
I don't find especially constructive.

> Ctrl-D works almost everywhere. It takes 2 seconds to find the answer
> and there are going to be many other things you have to find out if
> you're coming at PostgreSQL from a different DB.

If it really took everyone 2 seconds, this question would not be as
common as it is.  So, it probably takes some people quite a bit
longer.

> As others have pointed out, there are _far_ more frustrating things
> (like the lack of SHOW CREATE [TABLE|etc]... support in the backend)
> for anyone coming from mysql that I would suggest should be addressed
> if you were serious about making life easier for those moving DBs.

That is neither a reason to make a change along the lines discussed
here nor a reason not to do so.  You can start separate threads about
those other topics if you wish.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 3:38 PM, Geoff Winkless  wrote:
> Well "close" yes, but I like "simpler", and I'm not sure I like the
> idea of waiting for someone to flail at the quit command before giving
> them a hint.

I think a patch that adds a line to every prompt for everyone all the
time and has to be explicitly turned off is not something that most
people here are going to be willing to consider.  -1 from me, for a
start.

> Also, to be frank, if someone's typed "quit" and you're going to
> perform an action based on it, it feels like you might as well quit -
> a bit like google saying "did you mean \q" in its sneering
> supercilious way.

Well, I don't agree; the reasons why have already been discussed
upthread, so I won't repeat them here.

Also, I think you're attributing a lot of emotion to Google's
auto-correct.  While it is not without its flaws, it is right about
what I meant more often than it is wrong, and there are a lot of
people who are worse at spelling than I am (not that I'm perfect or
anything).  Accuracy aside, I am fairly sure that it is an emotionless
software artifact incapable either of sneering or superciliousness.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 1:06 PM, Alexander Korotkov
 wrote:
> OK, but NOTICE that presumably unexpected table rewrite takes place could be
> still useful.

I'm not going to complain too much about that, but I think that's
mostly a failure of expectation rather than a real problem.  If the
documentation says what the user should expect, and they expect
something else, tough luck for them.

> Also we probably should add some view that will expose compression methods
> whose are currently preserved for columns.  So that user can correctly
> construct SET COMPRESSION query that doesn't rewrites table without digging
> into internals (like directly querying pg_depend).

Yes.  I wonder if \d or \d+ can show it somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Error generating coverage report

2017-12-12 Thread Peter Eisentraut
On 12/12/17 15:39, David Steele wrote:
> $ make -C test/build/src/bin/pg_basebackup coverage-html
> make: Entering directory `/home/vagrant/test/build/src/bin/pg_basebackup'
> /usr/bin/lcov --gcov-tool /usr/bin/gcov -q --no-external -c -i -d . -d 
> /home/vagrant/src/src/bin/pg_basebackup -o lcov_base.info
> geninfo: ERROR: no .gcno files found in 
> /home/vagrant/src/src/bin/pg_basebackup!
> make: *** [lcov_base.info] Error 255
> make: *** Deleting file `lcov_base.info'
> make: Leaving directory `/home/vagrant/test/build/src/bin/pg_basebackup'

I tried a setup like this locally and it works fine.  So we might have
to be more precise.  Can you tell me the vagrant box you are using if
it's public, or the OS version, and the exact commands, then I can try
to reproduce it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rethinking MemoryContext creation

2017-12-12 Thread Tom Lane
Andres Freund  writes:
> On 2017-12-12 14:50:37 -0500, Robert Haas wrote:
>> It strikes me that a way to optimize these cases even more would be to
>> postpone creating the context until it's actually needed.  That might
>> not always be a reasonable plan -- in particular, it occurs to me to
>> think that CurTransactionContext is probably so widely used that
>> changing anything about how it works would probably be really painful
>> -- but it might be possible in some cases.

> That's not generally easy without slowing things down though - e.g. we
> don't want to check for ExprContext's existence before every use, there
> can be billions of usages in a single analytics query. The branches (yea
> yea ;)) would show up as being noticeable.

Yeah.  Also, in most of these cases what we're doing with the context
is installing it as CurrentMemoryContext while we execute some random
code that might or might not need to palloc anything.  We can't just
set CurrentMemoryContext to null - for one thing, that would leave no
trace of how the context should get set up if it's needed.  You could
imagine installing some sort of placeholder, but really that's the
mechanism we've already got, in the case where we just make a context
header and no blocks.

>> Another idea I have is that perhaps we could arrange to reuse contexts
>> instead of destroying them and recreating them.

> I'm a bit doubtful that's going to help, maintaining that list isn't
> going to be free, and the lifetime and number of those contexts aren't
> always going to match up.

Actually, this seems like a really promising idea to me.  To the extent
that an empty context has standard parameters, they're all
interchangeable, so you could imagine that AllocSetDelete shoves it
onto a freelist after resetting it, instead of just giving it back to
libc.  I'm slightly worried about creating allocation islands that
way, but that problem is probably surmountable, if it's real at all.

However, that seems like a different patch from what I'm working on
here.

regards, tom lane



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Robert Haas
On Mon, Dec 11, 2017 at 2:53 PM, Tomas Vondra
 wrote:
> But let me play the devil's advocate for a while and question the
> usefulness of this approach to compression. Some of the questions were
> mentioned in the thread before, but I don't think they got the attention
> they deserve.

Sure, thanks for chiming in.  I think it is good to make sure we are
discussing this stuff.

> But perhaps we should simply make it an initdb option (in which case the
> whole cluster would simply use e.g. lz4 instead of pglz)?
>
> That seems like a much simpler approach - it would only require some
> ./configure options to add --with-lz4 (and other compression libraries),
> an initdb option to pick compression algorithm, and probably noting the
> choice in cluster controldata.
>
> No dependencies tracking, no ALTER TABLE issues, etc.
>
> Of course, it would not allow using different compression algorithms for
> different columns (although it might perhaps allow different compression
> level, to some extent).
>
> Conclusion: If we want to offer a simple cluster-wide pglz alternative,
> perhaps this patch is not the right way to do that.

I actually disagree with your conclusion here.   I mean, if you do it
that way, then it has the same problem as checksums: changing
compression algorithms requires a full dump-and-reload of the
database, which makes it more or less a non-starter for large
databases.  On the other hand, with the infrastructure provided by
this patch, we can have a default_compression_method GUC that will be
set to 'pglz' initially.  If the user changes it to 'lz4', or we ship
a new release where the new default is 'lz4', then new tables created
will use that new setting, but the existing stuff keeps working.  If
you want to upgrade your existing tables to use lz4 rather than pglz,
you can change the compression option for those columns to COMPRESS
lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS
lz4 to force a rewrite of an individual table.  That's really
powerful, and I think users will like it a lot.

In short, your approach, while perhaps a little simpler to code, seems
like it is fraught with operational problems which this design avoids.

> Custom datatype-aware compression (e.g. the tsvector)
> --
>
> Exploiting knowledge of the internal data type structure is a promising
> way to improve compression ratio and/or performance.
>
> The obvious question of course is why shouldn't this be done by the data
> type code directly, which would also allow additional benefits like
> operating directly on the compressed values.
>
> Another thing is that if the datatype representation changes in some
> way, the compression method has to change too. So it's tightly coupled
> to the datatype anyway.
>
> This does not really require any new infrastructure, all the pieces are
> already there.
>
> In some cases that may not be quite possible - the datatype may not be
> flexible enough to support alternative (compressed) representation, e.g.
> because there are no bits available for "compressed" flag, etc.
>
> Conclusion: IMHO if we want to exploit the knowledge of the data type
> internal structure, perhaps doing that in the datatype code directly
> would be a better choice.

I definitely think there's a place for compression built right into
the data type.  I'm still happy about commit
145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more
needs to be done there.  But that type of improvement and what is
proposed here are basically orthogonal.  Having either one is good;
having both is better.

I think there may also be a place for declaring that a particular data
type has a "privileged" type of TOAST compression; if you use that
kind of compression for that data type, the data type will do smart
things, and if not, it will have to decompress in more cases.  But I
think this infrastructure makes that kind of thing easier, not harder.

> Custom datatype-aware compression with additional column-specific
> metadata (e.g. the jsonb with external dictionary).
> --
>
> Exploiting redundancy in multiple values in the same column (instead of
> compressing them independently) is another attractive way to help the
> compression. It is inherently datatype-aware, but currently can't be
> implemented directly in datatype code as there's no concept of
> column-specific storage (e.g. to store dictionary shared by all values
> in a particular column).
>
> I believe any patch addressing this use case would have to introduce
> such column-specific storage, and any solution doing that would probably
> need to introduce the same catalogs, etc.
>
> The obvious disadvantage of course is that we need to decompress the
> varlena value before doing pretty much anything with it, because the
> datatype is not aware of the compression.
>
> So I wonder if the patch should instead pr

Re: CUBE seems a bit confused about ORDER BY

2017-12-12 Thread Tomas Vondra


On 12/12/2017 01:52 PM, Alexander Korotkov wrote:
> On Tue, Dec 12, 2017 at 3:49 PM, Teodor Sigaev  wrote:
> 
> Yes, the thing is that we change behavior of existing ~>
> operator.  In general, this is not good idea because it could
> affect existing users who already use this operator.
> Typically in such situation, we could leave existing operator as
> is, and invent new operator with new behavior.  However, in this
> particular case, I think there are reasons to make an exception
> to the rules.  The reasons are following:
> 1) The ~> operator was designed especially for knn gist.
> 2) Knn gist support for current behavior is broken by design and
> can't be fixed.  Most we can do to fix existing ~> operator
> behavior as is to drop knn gist support.  But then ~> operator
> would be left useless.
> 3) It doesn't seems that ~> operator have many users yet,
> because an error wasn't reported during whole release cycle.
> 
> I hope these reasons justify altering behavior of existing
> operator as an exception to the rules.  Another question is
> whether we should backpatch it.  But I think we could leave this
> decision to committer.
> 
>     I think that this patch is ready for committer.
> 
> I agree with changing the behavior of the existing ~> operator, but are
> there any objections here? The current implementation is not fixable, and
> I hope that users who use this operator will understand why we changed it.
> Fortunately, the fix doesn't require changes to the system catalog.
> 
> The single question here is about indexes over expressions with this
> operator; they will need a reindex, which should be noted in the
> release notes.
> 
> 
> Yes.  I bet only a few users have built indexes over the ~> operator, if
> any.  Asking them to reindex in the release notes seems OK to me.
> 

Is there a good way to detect such cases? Either in pg_upgrade, so that
we can print warnings, or at least manually (which would be suitable for
release notes).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: plpgsql test layout

2017-12-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 12/11/17 19:29, Michael Paquier wrote:
>> If I read vcregress.pl correctly, it seems to me that you need to do
>> more with MSVC (see plcheck). The tests would kick if sql/ and
>> expected/ are found, and the test list is fetched by looking at
>> REGRESSION in the test files. However plpgsql code has an additional
>> src/ folder which would cause the tests to not execute. If plpgsql
>> code was moved on folder down then the tests would execute properly.

> OK, I hacked something up for MSVC.  How about this?

Looks ok to me, though I'm not in a position to actually test the
msvc changes.

regards, tom lane



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Tom Lane
Andres Freund  writes:
>> Robert Haas  writes:
>>> Long story short, I don't think anyone cares about this enough to
>>> spend effort fixing it.  I suspect the money data type has very few
>>> users.

> I'm unfortunately not so sure :(.

There seem to be enough of them that we get pushback anytime somebody
suggests removing the type.

> The background is that I was working on committing the faster (&
> correct) overflow checks, and wanted to compile postgres with -ftrapv.
> Some rudimentary cash (as well as pgbench, rel/abstime) fixes were
> required to make the tests succeed...

Really?  We've got test cases that intentionally exercise overflow
in the money code?  I think we could just drop such tests, until
such time as someone fixes the issue.

(OTOH, I bet we could drop reltime/abstime without too many complaints.
Y2038 is coming.)

regards, tom lane



Re: money type's overflow handling is woefully incomplete

2017-12-12 Thread Andres Freund
On 2017-12-12 16:47:17 -0500, Tom Lane wrote:
> Really?  We've got test cases that intentionally exercise overflow
> in the money code?  I think we could just drop such tests, until
> such time as someone fixes the issue.

Some parts at least (input & output), I think it's easy enough to fix
those up.


> (OTOH, I bet we could drop reltime/abstime without too many complaints.
> Y2038 is coming.)

I'm actually about to send a patch doing so, that code is one mess WRT
overflow handling.

Greetings,

Andres Freund



Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com

2017-12-12 Thread Robert Haas
On Sat, Dec 9, 2017 at 5:30 AM, Amit Kapila  wrote:
> On Sat, Dec 9, 2017 at 1:24 AM, Robert Haas  wrote:
>> On Fri, Dec 8, 2017 at 5:11 AM, Amit Kapila  wrote:
 Okay, I have adjusted the patch accordingly.  I have also added a
 regression test which should produce the same result across different
 runs, see if that looks okay to you, then it is better to add such a
 test as well.
>>>
>>> The regression test added by patch needs cleanup at the end which I
>>> have added in the attached patch.
>>
>> Hmm.  If we're going this way, then shouldn't we revert the changes
>> commit 2c09a5c12a66087218c7f8cba269cd3de51b9b82 made to
>> ExecParallelRetrieveInstrumentation?
>>
>
> Yeah, it is better to revert it as ideally that is not required after
> this patch and that is what I have tried to convey above ("Ideally, it
> would have obviated the need for my previous patch which
> got committed as 778e78ae." (The commit id is for branch 10,
> otherwise, it is same as what you mention.)).  I have locally reverted
> that patch and then rebased it on top of that.

Uh, should I just revert that commit entirely first, and then we can
commit the new fix afterward?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal: alternative psql commands quit and exit

2017-12-12 Thread Chapman Flack
On 12/12/2017 03:39 PM, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 10:02 AM, Geoff Winkless  
>> Ctrl-D works almost everywhere. It takes 2 seconds...
> 
> If it really took everyone 2 seconds, this question would not be as
> common as it is. ...

Some years ago, I took a job at a mostly-Unix shop, where the
default .profile supplied with every new account contained

  stty eof ^Z

... apparently, a sysadmin had liked the DOS convention better than
the Unix one, and then made it the default for everybody. It often
took people more than 2 seconds to learn to use ^Z where they
expected ^D to work (not to mention to learn why they couldn't
suspend).

I guess I only mention that to reinforce the point that some
environments have customizations you wouldn't expect, and assumptions
about how obvious something is can be thrown off by a variety of things.

-Chap



Re: [HACKERS] Custom compression methods

2017-12-12 Thread Tomas Vondra


On 12/12/2017 10:33 PM, Robert Haas wrote:
> On Mon, Dec 11, 2017 at 2:53 PM, Tomas Vondra
>  wrote:
>> But let me play the devil's advocate for a while and question the
>> usefulness of this approach to compression. Some of the questions were
>> mentioned in the thread before, but I don't think they got the attention
>> they deserve.
> 
> Sure, thanks for chiming in.  I think it is good to make sure we are
> discussing this stuff.
> 
>> But perhaps we should simply make it an initdb option (in which case the
>> whole cluster would simply use e.g. lz4 instead of pglz)?
>>
>> That seems like a much simpler approach - it would only require some
>> ./configure options to add --with-lz4 (and other compression libraries),
>> an initdb option to pick compression algorithm, and probably noting the
>> choice in cluster controldata.
>>
>> No dependencies tracking, no ALTER TABLE issues, etc.
>>
>> Of course, it would not allow using different compression algorithms for
>> different columns (although it might perhaps allow different compression
>> level, to some extent).
>>
>> Conclusion: If we want to offer a simple cluster-wide pglz alternative,
>> perhaps this patch is not the right way to do that.
> 
> I actually disagree with your conclusion here.   I mean, if you do it
> that way, then it has the same problem as checksums: changing
> compression algorithms requires a full dump-and-reload of the
> database, which makes it more or less a non-starter for large
> databases.  On the other hand, with the infrastructure provided by
> this patch, we can have a default_compression_method GUC that will be
> set to 'pglz' initially.  If the user changes it to 'lz4', or we ship
> a new release where the new default is 'lz4', then new tables created
> will use that new setting, but the existing stuff keeps working.  If
> you want to upgrade your existing tables to use lz4 rather than pglz,
> you can change the compression option for those columns to COMPRESS
> lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS
> lz4 to force a rewrite of an individual table.  That's really
> powerful, and I think users will like it a lot.
> 
> In short, your approach, while perhaps a little simpler to code, seems
> like it is fraught with operational problems which this design avoids.
> 

I agree the checksum-like limitations are annoying and make it
impossible to change the compression algorithm after the cluster is
initialized (although I recall a discussion about addressing that).

So yeah, if such flexibility is considered valuable/important, then the
patch is a better solution.

>> Custom datatype-aware compression (e.g. the tsvector)
>> --
>>
>> Exploiting knowledge of the internal data type structure is a promising
>> way to improve compression ratio and/or performance.
>>
>> The obvious question of course is why shouldn't this be done by the data
>> type code directly, which would also allow additional benefits like
>> operating directly on the compressed values.
>>
>> Another thing is that if the datatype representation changes in some
>> way, the compression method has to change too. So it's tightly coupled
>> to the datatype anyway.
>>
>> This does not really require any new infrastructure, all the pieces are
>> already there.
>>
>> In some cases that may not be quite possible - the datatype may not be
>> flexible enough to support alternative (compressed) representation, e.g.
>> because there are no bits available for "compressed" flag, etc.
>>
>> Conclusion: IMHO if we want to exploit the knowledge of the data type
>> internal structure, perhaps doing that in the datatype code directly
>> would be a better choice.
> 
> I definitely think there's a place for compression built right into
> the data type.  I'm still happy about commit
> 145343534c153d1e6c3cff1fa1855787684d9a38 -- although really, more
> needs to be done there.  But that type of improvement and what is
> proposed here are basically orthogonal.  Having either one is good;
> having both is better.
> 

Why orthogonal?

For example, why couldn't (or shouldn't) the tsvector compression be
done by tsvector code itself? Why should we be doing that at the varlena
level (so that the tsvector code does not even know about it)?

For example we could make the datatype EXTERNAL and do the compression
on our own, using a custom algorithm. Of course, that would require
datatype-specific implementation, but tsvector_compress does that too.

It seems to me the main reason is that tsvector actually does not allow
us to do that, as there's no good way to distinguish different
internal formats (e.g. by storing a flag or format version in some sort
of header, etc.).

> I think there may also be a place for declaring that a particular data
> type has a "privileged" type of TOAST compression; if you use that
> kind of compression for that data type, the data type will do smart
> things, and if no

Re: [HACKERS] Custom compression methods

2017-12-12 Thread Chapman Flack
On 12/12/2017 04:33 PM, Robert Haas wrote:

> you want to upgrade your existing tables to use lz4 rather than pglz,
> you can change the compression option for those columns to COMPRESS
> lz4 PRESERVE pglz if you want to do it incrementally or just COMPRESS

This is a thread I've only been following peripherally, so forgive
a question that's probably covered somewhere upthread: how will this
be done? Surely not with compression-type bits in each tuple? By
remembering a txid where the compression was changed, and the former
algorithm for older txids?

-Chap



PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele
Including unlogged relations in base backups takes up space and is 
wasteful since they are truncated during backup recovery.


The attached patches exclude unlogged relations from base backups except 
for the init fork, which is required to recreate the main fork during 
recovery.


* exclude-unlogged-v1-01.patch

Some refactoring of reinit.c was required to reduce code duplication but 
the coverage report showed that most of the interesting parts of 
reinit.c were not being tested.  This patch adds coverage for reinit.c.


* exclude-unlogged-v1-02.patch

Refactor reinit.c to allow other modules to identify and work with 
unlogged relation forks.


* exclude-unlogged-v1-03.patch

Exclude unlogged relation forks (except init) from pg_basebackup to save 
space (and time).


I decided not to try and document unlogged exclusions in the continuous 
backup documentation yet (they are noted in the protocol docs).  I would 
like to get some input on whether the community thinks this is a good 
idea.  It's a non-trivial procedure that would be easy to misunderstand 
and does not affect the quality of the backup other than using less 
space. Thoughts?


I'll add these patches to the next CF.

--
-David
da...@pgmasters.net
diff --git a/src/test/recovery/t/014_unlogged_reinit.pl 
b/src/test/recovery/t/014_unlogged_reinit.pl
new file mode 100644
index 00..35feba69a0
--- /dev/null
+++ b/src/test/recovery/t/014_unlogged_reinit.pl
@@ -0,0 +1,117 @@
+# Tests that unlogged tables are properly reinitialized after a crash.
+#
+# The behavior should be the same when restoring from a backup but that is not
+# tested here (yet).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Initialize node without replication settings
+my $node = get_new_node('main');
+
+$node->init;
+$node->start;
+my $pgdata = $node->data_dir;
+
+# Create an unlogged table to test that forks other than init are not copied
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+my $tablespaceDir = undef;
+my $ts1UnloggedPath = undef;
+
+SKIP:
+{
+   skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+# Create unlogged tables in a tablespace
+$tablespaceDir = TestLib::tempdir . "/ts1";
+
+mkdir($tablespaceDir)
+or die "unable to mkdir \"$tablespaceDir\"";
+
+$node->safe_psql('postgres',
+"CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'");
+$node->safe_psql('postgres',
+'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1');
+
+$ts1UnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('ts1_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+}
+
+# Crash the postmaster
+$node->stop('immediate');
+
+# Write forks to test that they are removed during recovery
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"],
+   'touch vm fork in base');
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"],
+   'touch fsm fork in base');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${baseUnloggedPath}")
+or die "unable to remove \"${baseUnloggedPath}\"";
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+   skip "symlinks not supported on Windows", 2 if ($windows_os);
+
+# Write forks to test that they are removed by recovery
+$node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_vm"],
+   'touch vm fork in tablespace');
+$node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_fsm"],
+   'touch fsm fork in tablespace');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${ts1UnloggedPath}")
+or die "unable to remove \"${ts1UnloggedPath}\"";
+}
+
+# Start the postmaster
+$node->start;
+
+# Check unlogged table in base
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base');
+
+# Drop unlogged table
+$node->safe_psql('postgres', 'DROP TABLE base_unlogged');
+
+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.
+SKIP:
+{
+   skip "symlinks not supported on Windows", 4 if ($windows_os);
+
+# Check unlogged table in tablespace
+ok(-f "$pgdata/${ts1UnloggedPath}_ini

Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread Andres Freund
Hi,

On 2017-12-12 17:49:54 -0500, David Steele wrote:
> Including unlogged relations in base backups takes up space and is wasteful
> since they are truncated during backup recovery.
> 
> The attached patches exclude unlogged relations from base backups except for
> the init fork, which is required to recreate the main fork during recovery.

How do you reliably identify unlogged relations while writes are going
on? Without locks that sounds, uh, nontrivial?


> I decided not to try and document unlogged exclusions in the continuous
> backup documentation yet (they are noted in the protocol docs).  I would
> like to get some input on whether the community thinks this is a good idea.
> It's a non-trivial procedure that would be easy to misunderstand and does
> not affect the quality of the backup other than using less space. Thoughts?

Think it's a good idea, I've serious concerns about practicability of a
correct implementation though.

- Andres



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele

Hi Andres,

On 12/12/17 5:52 PM, Andres Freund wrote:

> On 2017-12-12 17:49:54 -0500, David Steele wrote:
>> Including unlogged relations in base backups takes up space and is wasteful
>> since they are truncated during backup recovery.
>>
>> The attached patches exclude unlogged relations from base backups except for
>> the init fork, which is required to recreate the main fork during recovery.
>
> How do you reliably identify unlogged relations while writes are going
> on? Without locks that sounds, uh, nontrivial?


I don't think this is an issue.  If the init fork exists it should be OK 
if it is torn since it will be recreated from WAL.


If the forks are written out of order (i.e. main before init), which is 
definitely possible, then I think worst case is some files will be 
backed up that don't need to be.  The main fork is unlikely to be very 
large at that point so it doesn't seem like a big deal.


I don't see this as any different than what happens during recovery. 
The unlogged forks are cleaned / re-inited before replay starts which is 
the same thing we are doing here.



>> I decided not to try and document unlogged exclusions in the continuous
>> backup documentation yet (they are noted in the protocol docs).  I would
>> like to get some input on whether the community thinks this is a good idea.
>> It's a non-trivial procedure that would be easy to misunderstand and does
>> not affect the quality of the backup other than using less space. Thoughts?
>
> Think it's a good idea, I've serious concerns about practicability of a
> correct implementation though.


Well, I would be happy if you had a look!

Thanks.
--
-David
da...@pgmasters.net



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread Andres Freund
Hi,

On 2017-12-12 18:04:44 -0500, David Steele wrote:
> On 12/12/17 5:52 PM, Andres Freund wrote:
> > On 2017-12-12 17:49:54 -0500, David Steele wrote:
> > > Including unlogged relations in base backups takes up space and is 
> > > wasteful
> > > since they are truncated during backup recovery.
> > > 
> > > The attached patches exclude unlogged relations from base backups except 
> > > for
> > > the init fork, which is required to recreate the main fork during 
> > > recovery.
> > 
> > How do you reliably identify unlogged relations while writes are going
> > on? Without locks that sounds, uh, nontrivial?
> 
> I don't think this is an issue.  If the init fork exists it should be OK if
> it is torn since it will be recreated from WAL.

I'm not worried about torn pages.


> If the forks are written out of order (i.e. main before init), which is
> definitely possible, then I think worst case is some files will be backed up
> that don't need to be.  The main fork is unlikely to be very large at that
> point so it doesn't seem like a big deal.
> 
> I don't see this as any different than what happens during recovery. The
> unlogged forks are cleaned / re-inited before replay starts which is the
> same thing we are doing here.

It's quite different - in the recovery case there's no other write
activity going on. But on a normally running cluster the persistence of
existing tables can get changed, and oids can get recycled.  What
guarantees that between the time you checked for the init fork the table
hasn't been dropped, the oid reused and now a permanent relation is in
its place?

Greetings,

Andres Freund



Re: Learned Indexes in PostgreSQL?

2017-12-12 Thread Stefan Keller
2017-12-12 21:26 GMT+01:00 Robert Haas :
> Here are links to the other two threads:
>
> http://postgr.es/m/CANNMO+J1KeTSx5q5SYuwHf1v-gPRLrOZw1s7qOpqWx=3umm...@mail.gmail.com
> http://postgr.es/m/caaerrx8bhiw3rgaqpolqjhyhk7ghordtke5uc9dasv1w5fz...@mail.gmail.com

Thanks, Robert.
My fault for not having searched this list first - in fact I did, but not accurately enough.

:Stefan

2017-12-12 21:26 GMT+01:00 Robert Haas :
> On Tue, Dec 12, 2017 at 6:38 AM, Stefan Keller  wrote:
>> This is an awesome paper about a new index called "Learned Index".
>> it's aka dense hash structure derived ("learned") from actual data.
>> Looks very promising for certain properties [*].
>>
>> Anyone already working on this in Postgres?
>> How could this be implemented in Postgres?
>
> You're the third person to start a thread on this topic in the last few days.
>
> Here are links to the other two threads:
>
> http://postgr.es/m/CANNMO+J1KeTSx5q5SYuwHf1v-gPRLrOZw1s7qOpqWx=3umm...@mail.gmail.com
> http://postgr.es/m/caaerrx8bhiw3rgaqpolqjhyhk7ghordtke5uc9dasv1w5fz...@mail.gmail.com
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread Michael Paquier
On Wed, Dec 13, 2017 at 8:04 AM, David Steele  wrote:
> On 12/12/17 5:52 PM, Andres Freund wrote:
>> On 2017-12-12 17:49:54 -0500, David Steele wrote:
>>>
>>> Including unlogged relations in base backups takes up space and is
>>> wasteful
>>> since they are truncated during backup recovery.
>>>
>>> The attached patches exclude unlogged relations from base backups except
>>> for
>>> the init fork, which is required to recreate the main fork during
>>> recovery.
>>
>>
>> How do you reliably identify unlogged relations while writes are going
>> on? Without locks that sounds, uh, nontrivial?
>
>
> I don't think this is an issue.  If the init fork exists it should be OK if
> it is torn since it will be recreated from WAL.

Yeah, I was just typing that until I saw your message.

> If the forks are written out of order (i.e. main before init), which is
> definitely possible, then I think worst case is some files will be backed up
> that don't need to be.  The main fork is unlikely to be very large at that
> point so it doesn't seem like a big deal.

As far as I recall, the init forks are logged before the main forks. I
don't think we should rely on that assumption always being satisfied,
though.

>>> I decided not to try and document unlogged exclusions in the continuous
>>> backup documentation yet (they are noted in the protocol docs).  I would
>>> like to get some input on whether the community thinks this is a good
>>> idea.
>>> It's a non-trivial procedure that would be easy to misunderstand and does
>>> not affect the quality of the backup other than using less space.
>>> Thoughts?
>>
>>
>> Think it's a good idea, I've serious concerns about practicability of a
>> correct implementation though.
>
> Well, I would be happy if you had a look!

You can count me in. I think that this patch has value for some
dedicated workloads. It is a waste to back up stuff that will be
removed at recovery anyway.
-- 
Michael



Re: PATCH: Exclude unlogged tables from base backups

2017-12-12 Thread David Steele

On 12/12/17 6:07 PM, Andres Freund wrote:


>> I don't see this as any different than what happens during recovery. The
>> unlogged forks are cleaned / re-inited before replay starts which is the
>> same thing we are doing here.
>
> It's quite different - in the recovery case there's no other write
> activity going on. But on a normally running cluster the persistence of
> existing tables can get changed, and oids can get recycled.  What
> guarantees that between the time you checked for the init fork the table
> hasn't been dropped, the oid reused and now a permanent relation is in
> its place?


Well, that's a good point!

How about rechecking the presence of the init fork after a main/other 
fork has been found?  Is it possible for an init fork to still be lying 
around after an oid has been recycled? Seems like it could be...
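
To be a bit more concrete about the ordering I have in mind - this is
only a sketch with placeholder helpers (none of these names exist in the
patch), and it narrows the window rather than closing it:

#include <dirent.h>
#include <stdbool.h>

/* Placeholders for this sketch only. */
static bool looks_like_unlogged_fork(const char *dirpath, const char *name);
static bool init_fork_still_exists(const char *dirpath, const char *name);
static void send_file(const char *dirpath, const char *name);

static void
scan_database_dir(const char *dirpath, DIR *dir)
{
	struct dirent *de;

	while ((de = readdir(dir)) != NULL)
	{
		if (!looks_like_unlogged_fork(dirpath, de->d_name))
		{
			send_file(dirpath, de->d_name);		/* back it up normally */
			continue;
		}

		/*
		 * We believe this fork belongs to an unlogged relation.  Recheck
		 * the init fork right before excluding the file; if it has gone
		 * away in the meantime, err on the side of backing the file up,
		 * since an extra file in the backup is harmless.
		 */
		if (!init_fork_still_exists(dirpath, de->d_name))
			send_file(dirpath, de->d_name);
	}
}

That still leaves the leftover-init-fork question above, of course.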


--
-David
da...@pgmasters.net



Re: Leftover reference to replacement selection 1 run case

2017-12-12 Thread Robert Haas
On Fri, Dec 8, 2017 at 6:41 PM, Peter Geoghegan  wrote:
> While reviewing the parallel CREATE INDEX patch, I noticed that commit
> 8b304b8b omitted to remove a comment that it made obsolete.
>
> Attached patch removes the comment.

I had to think about this for a while.  I could see that the comment
is now wrong, but initially I wasn't sure why we didn't care about the
optimization any more.  Eventually I had a thought: it's now
impossible to end up with only one tape, because each tape now always
contains the output of one quicksort operation, and if we only did one
quicksort operation, it would have been an in-memory sort and there
would be no tapes anyway.  Once we create a first tape, we've overrun
work_mem, so there will always be a second one.

Does that sound right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


