Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-09 Thread Robert Haas
On Fri, Feb 8, 2019 at 11:59 PM Tom Lane  wrote:
> To the extent that this works at all, OIDs in the 9000 range ought
> to be enough of a flag already, I think.

A "flag" that isn't documented anywhere outside of a mailing list
discussion and that isn't checked by any code anywhere is not much of
a flag, IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: dsa_allocate() faliure

2019-02-09 Thread Robert Haas
On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
 wrote:
> Sometimes FreeManagerPutInternal() returns a
> number-of-contiguous-pages-created-by-this-insertion that is too large
> by one.  If this happens to be a new max-number-of-contiguous-pages,
> it causes trouble some arbitrary time later because the max is wrong
> and this FPM cannot satisfy a request that large, and it may not be
> recomputed for some time because the incorrect value prevents
> recomputation.  Not sure yet if this is due to the lazy computation
> logic or a plain old fence-post error in the btree consolidation code
> or something else.

I spent a long time thinking about this and staring at code this
afternoon, but I didn't really come up with much of anything useful.
It seems like a strange failure mode, because
FreePageManagerPutInternal() normally just returns its third argument
unmodified. The only cases where anything else happens are the ones
where we're able to consolidate the returned span with a preceding or
following span, and I'm scratching my head as to how that logic could
be wrong, especially since it also has some Assert() statements that
seem like they would detect the kinds of inconsistencies that would
lead to trouble.  For example, if we somehow ended up with two spans
that (improperly) overlapped, we'd trip an Assert().  And if that
didn't happen -- because we're not in an Assert-enabled build -- the
code is written so that it only relies on the npages value of the last
of the consolidated spans, so an error in the npages value of one of
the earlier spans would just get fixed up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: fast defaults in heap_getattr vs heap_deform_tuple

2019-02-09 Thread Andres Freund
Hi,

On 2019-02-05 22:44:38 -0300, Alvaro Herrera wrote:
> On 2019-Feb-05, Andres Freund wrote:
> 
> > @@ -82,7 +80,7 @@ static Datum getmissingattr(TupleDesc tupleDesc, int attnum, bool *isnull);
> >  /*
> >   * Return the missing value of an attribute, or NULL if there isn't one.
> >   */
> > -static Datum
> > +Datum
> >  getmissingattr(TupleDesc tupleDesc,
> >int attnum, bool *isnull)
> 
> This is a terrible name for an exported function -- let's change it
> before it gets exported.  Heck, even heap_getmissingattr() would be
> better.

I don't really agree. Note that the relevant data structure is named
AttrMissing, and that the function isn't specific to heap tuples. So I
don't really see what else we'd name it?  I think if we wanted to
improve this we should start with AttrMissing and not this function.


> I notice that with this patch, heap_getattr() obtains a new Assert()
> that the attr being fetched is no further than tupledesc->natts.
> It previously just returned null for that case.  Maybe we should change
> it so that it returns null if an attr beyond end-of-array is fetched?
> (I think in non-assert builds, it would dereference past the AttrMissing
> array.)

Hm, it seems like there are plenty of issues with accessing datums after
the end of the desc both before and after this change. Note that
slot_getattr() (in 11; it looks a bit different in master) already
calls getmissingattr(), and that's much more commonly used. Therefore
I'm disinclined to see this as a problem?

Greetings,

Andres Freund



Re: fast defaults in heap_getattr vs heap_deform_tuple

2019-02-09 Thread Andres Freund
Hi,

On 2019-02-06 03:39:19 -0800, Andres Freund wrote:
> On 2019-02-06 10:25:56 +, Andrew Gierth wrote:
> > > "Andres" == Andres Freund  writes:
> >
> >  >> Cool. Here's the patch that I, after some commit message polishing,
> >  >> plan to commit to 11 and master to fix the issue at hand.
> >
> >  Andres> And pushed.
> >
> > Sorry I didn't spot this earlier, but... in the backpatch to v11, is the
> > expansion of the trigger tuple really unnecessary now? Since
> > heap_getattr is a macro, C-language triggers that aren't recompiled
> > won't see the default? So perhaps the expansion should be left in?
> 
> There was some IRC conversation about this:
> 
> RhodiumToad | andres: ping?
> RhodiumToad | n/m, sent mail
>      andres | RhodiumToad: IDK, there's a lot of other heap_getattr calls, and most of them don't go
>             | through GetTupleForTrigger(). So extensions need to be recompiled either way.
>      andres | (I'll write that in a bit on list, in meeting now)
> RhodiumToad | right, but those other calls were already broken.
> RhodiumToad | whereas in a trigger, they'd have worked previously, but then break with your fix
>        Myon | the list of packages in Debian where heap_getattr appears is postgresql-11, pglogical,
>             | pgextwlist, citus, postgresql-plsh, wal2json, pg-cron, postgresql-rum
> RhodiumToad | it's not an encouraged interface
> 
I'm mildly disinclined to re-add the heap_expand_tuple, because it's not
the only case, and the extensions named above don't seem like they'd be
specifically affected by the trigger path, but I'm not sure.

I put the expansion back for 11. Seems harmless enough.  I'm planning to
send a reply to the minor release info on -packagers informing people
that extensions compiled after the minor release might not be compatible
with older servers (because they depend on getmissingattr(), which
previously wasn't an exposed function).
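
For reference, the relevant shape of the heap_getattr() macro is roughly this
(a simplified sketch, not the exact source); since it's a macro, the
getmissingattr() branch is baked into whatever binary expanded it:

```c
/*
 * Simplified sketch of the post-fix heap_getattr() macro (not the exact
 * source).  Because it's a macro, an extension compiled against an older
 * tree keeps its old expansion, and so won't call getmissingattr() until
 * it is rebuilt.
 */
#define heap_getattr(tup, attnum, tupleDesc, isnull) \
	(((attnum) > 0) ? \
	 (((attnum) > (int) HeapTupleHeaderGetNatts((tup)->t_data)) ? \
	  getmissingattr((tupleDesc), (attnum), (isnull)) : \
	  fastgetattr((tup), (attnum), (tupleDesc), (isnull))) : \
	 heap_getsysattr((tup), (attnum), (tupleDesc), (isnull)))
```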

Greetings,

Andres Freund



Re: XLogInsert() of dangling pointer while logging replica identity

2019-02-09 Thread Andres Freund
On 2019-02-01 10:38:49 +0900, Michael Paquier wrote:
> On Thu, Jan 31, 2019 at 11:51:36PM +0300, Stas Kelvich wrote:
> >   It seems that heapam.c:3082 calls XLogRegisterData() with an argument
> > allocated on stack, but following call to XLogInsert() happens after
> > end of context for that variable.
> >   Issue spotted by clang's AddressSanitizer. Fix attached.
> 
> Oh, good catch.  Committed and back-patched down to 9.4.

Thanks Stas and Michael!



Re: Documentation and code don't agree about partitioned table UPDATEs

2019-02-09 Thread Andres Freund
On 2019-02-07 09:16:09 +0530, Amit Kapila wrote:
> On Wed, Feb 6, 2019 at 4:57 PM David Rowley
>  wrote:
> >
> > On Wed, 6 Feb 2019 at 16:20, Amit Kapila  wrote:
> > > I agree that the docs need to be updated and this patch should be
> > > backpatched as well.  However, I think the older wording was more
> > > descriptive and clear, so I have modified your patch a bit to retain
> > > part of old wording, see the result as attached.
> >
> > I have to admit, I was quite fond of the original text, at least when
> > it was true.  Your alteration of it seems pretty good to me too.
> >
> 
> Thanks, pushed!

Thanks David and Amit!



Re: First-draft release notes for next week's releases

2019-02-09 Thread Alexander Kuzmenkov

On 9/2/19 at 4:19, Tom Lane wrote:

Please send comments/corrections by Sunday.


+  tuple deletion WAL record (Stas Kelvish)

-- a typo in his surname, should be Kelvich.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-02-09 Thread Andrey Borodin
Hi! Thanks for the next version!

> On 8 Feb 2019, at 18:30, Alexey Kondratov wrote:
> 
> On 21.01.2019 23:50, a.kondra...@postgrespro.ru wrote:
>> Thank you for the review! I have updated the patch according to your
>> comments and remarks. Please, find new version attached.
> 
> During self-review of the code and tests, I discovered some problems
> with the build on Windows. The new version of the patch is attached; it fixes
> this issue and includes some minor code revisions.

I've made one more pass through the code and found no important problems.

The patch moves code including these lines
* XXX this is an unmaintainable crock, because we have to know how to set
* (or at least what to call to set) every variable that could potentially
* have PGC_S_DYNAMIC_DEFAULT or PGC_S_ENV_VAR source. However, there's no
* time to redesign it for 9.1.
But I think it's not the point of this patch to refactor that code.

There's a typo ("postgreslq.conf") in this error message:
+   fprintf(stderr, _("%s: option -r/--use-postgresql-conf is specified, but postgreslq.conf is absent in the target directory\n"),

I'm still not sure the guc-file refactoring you made is architecturally correct;
I don't feel that my expertise is enough to judge, but everything works.

Besides this, I think you can switch patch to "Ready for committer".

check-world is passing on my MacBook, docs are included, and the feature is
implemented and tested.

Best regards, Andrey Borodin.


Re: WIP: Avoid creation of the free space map for small tables

2019-02-09 Thread Amit Kapila
On Tue, Feb 5, 2019 at 3:25 PM John Naylor  wrote:
>
> On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila  wrote:
> >
> > On Mon, Feb 4, 2019 at 2:27 PM John Naylor  
> > wrote:
> > >
> > > 1. Earlier, I had a test to ensure that free space towards the front
> > > of the relation was visible with no FSM. In [1], I rewrote it without
> > > using vacuum, so we can consider adding it back now if desired. I can
> > > prepare a patch for this.
> > >
> >
> > Yes, this is required.  It is generally a good practice to add a test (unless
> > it takes a lot of time) which covers new code/functionality.
> >
> > > 2. As a follow-on, since we don't rely on vacuum to remove dead rows,
> > > we could try putting the fsm.sql test in some existing group in the
> > > parallel schedule, rather than its own group is it is now.
> > >
> >
> > +1.
>
> This is done in 0001.
>

This is certainly a good test w.r.t. code coverage of the new code, but I
have a few comments:
1. The size of the records in the test still depends on alignment (MAXALIGN).
Though it doesn't seem to be a problematic case, I still suggest we
avoid using records whose size depends on alignment.  If you
change the schema to CREATE TABLE fsm_check_size (num1 int, num2 int,
str text);, then you can avoid alignment-related issues for the
records being used in the test.
2.
+-- Fill most of the last block
..
+-- Make sure records can go into any block but the last one
..
+-- Insert large record and make sure it does not cause the relation to extend

The comments in some parts of the test seem too focused on the
algorithm used for the in-memory map.  I think we can keep these if we
want, but we should also write a more generic comment stating the
actual motive of the additional tests (basically, we are testing the
functionality of the in-memory map (LSM) for the heap, so we should say
so).


> > > 3. While looking at 0d1fe9f74e, it occurred to me that I ignored this
> > > patch's effects on GetRecordedFreeSpace(), which will return zero for
> > > tables with no FSM.
> > >
> >
> > Right, but what exactly do we want to do about it?  Do you want to add a
> > comment atop this function?
>
> Hmm, the comment already says "according to the FSM", so maybe it's
> already obvious. I was thinking more about maybe commenting the
> callsite where it's helpful, as in 0002.
>
> > > The other callers are in:
> > > contrib/pg_freespacemap/pg_freespacemap.c
> > > contrib/pgstattuple/pgstatapprox.c
> > >
> > > For pg_freespacemap, this doesn't matter, since it's just reporting
> > > the facts. For pgstattuple_approx(), it might under-estimate the free
> > > space and over-estimate the number of live tuples.
> > >
> >
> > Sure, but it can do so even without the patch, if vacuum hasn't updated
> > the freespace map.
>
> Okay, then maybe we don't need to do anything else here.
>

Shall we add a note to the docs of pg_freespacemap and
pgstattuple_approx indicating that for small relations no FSM will be
created, so these functions won't give an accurate value?  Another
possibility could be to return an error if the block number is
less than the threshold value, but I am not sure that is a good
alternative, as the same thing can happen today if vacuum hasn't run on
the table.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Make drop database safer

2019-02-09 Thread Alexandra Wang
Current sequence of operations for drop database (dropdb())
1. Start Transaction
2. Make catalog changes
3. Drop database buffers
4. Forget database fsync requests
5. Checkpoint
6. Delete database directory
7. Commit Transaction

Problem
This sequence is unsafe on a couple of fronts. For example, if drop database
aborts (the system crashes or is shut down) right after the buffers are
dropped in step 3 or step 4, the database will still exist and be fully
accessible but will lose the data from the dirty buffers. This seems very
bad.

The operation can abort after step 5 as well, in which case the entries remain
in the catalog but the database is not accessible. That is bad as well, but not
as severe as the case above, where the database exists but some data goes
magically missing.

Repro:
```
CREATE DATABASE test;
\c test
CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
\c postgres
DROP DATABASE test; <<== kill the session after DropDatabaseBuffers()
(make sure to issue checkpoint before killing the session)
```

Proposed ways to fix
1. CommitTransactionCommand() right after step 2. This makes it fully safe,
as the catalog will have the database dropped. Files may still exist on
disk in some cases, which is okay. This also makes it consistent with the
approach used in movedb().

2. An alternative way to make it safer is to perform the checkpoint (step 5)
just before dropping the database buffers. Caveats of this solution:
- It performs I/O for data which, in the success case, will get deleted anyway
- It still doesn't cover the case where the catalog has the database entry but
the files have been removed from disk

3. A fancier approach is to use the pending-delete mechanism used by
relation drops to perform these non-catalog activities at commit.
The pending-delete structure could easily gain a boolean to convey
that a database directory, rather than a file, is being dropped. Given that
drop database can't be performed inside a transaction, it doesn't have to be
done this way, but it would make on-disk removal follow one consistent approach.
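
To illustrate what we mean by approach 3, here is a rough sketch (the struct
and field names below are invented for illustration, this is not the existing
pending-delete code):

```c
/*
 * Hypothetical sketch only: a pending-delete entry extended with a flag
 * saying "remove the whole database directory at commit" instead of a
 * single relation file.  Field names are invented for illustration.
 */
typedef struct PendingDbDelete
{
	Oid			dbid;			/* database whose files are affected */
	Oid			tablespaceid;	/* tablespace containing the directory */
	bool		whole_dbdir;	/* true => unlink the database directory */
	bool		atCommit;		/* perform the unlink at commit time */
	struct PendingDbDelete *next;	/* linked-list link */
} PendingDbDelete;
```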

We are passing along a patch with fix 1, as it seems the most promising to us,
but we would love to hear thoughts on the other solutions mentioned.
===
diff --git a/src/backend/commands/dbcommands.c
b/src/backend/commands/dbcommands.c
index d207cd899f..af1b1e0896 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -917,6 +917,9 @@ dropdb(const char *dbname, bool missing_ok)
 */
dropDatabaseDependencies(db_id);

+   CommitTransactionCommand();
+   StartTransactionCommand();
+
/*
 * Drop db-specific replication slots.
 */
===

Thanks,
Ashwin and Alex


Re: [HACKERS] Block level parallel vacuum

2019-02-09 Thread Masahiko Sawada
On Tue, Feb 5, 2019 at 12:14 PM Haribabu Kommi  wrote:
>
>
> On Fri, Feb 1, 2019 at 8:19 AM Masahiko Sawada  wrote:
>>
>>
>> Passing stats = NULL to amvacuumcleanup and ambulkdelete means
>> first-time execution. For example, btvacuumcleanup skips cleanup if
>> stats is not NULL. In a normal vacuum we pass NULL to ambulkdelete or
>> amvacuumcleanup on the first call, and they store the result
>> stats in locally allocated memory. Therefore, in a parallel
>> vacuum I think that both worker and leader need to move the stats to
>> shared memory and mark them as updated, since a different worker could
>> vacuum different indexes the next time.
>
>
> OK, understood the point. But btbulkdelete allocates the memory whenever
> the stats are NULL, so I don't see a problem with it.
>
> The only problem is with btvacuumcleanup: when there are no dead tuples
> present in the table, btbulkdelete is not called and btvacuumcleanup is
> called directly at the end of vacuum. In that scenario there is a code-flow
> difference based on the stats, so why can't we use the deadtuples number
> to differentiate instead of adding another flag?

I don't understand your suggestion. What do we compare the deadtuples
number to? Could you elaborate on that, please?

> Also, this scenario does not happen very often, so avoiding the
> memcpy for normal operations would be better. It may be a small gain; I just
> thought of it.
>

This scenario could happen periodically on an insert-only table.
The additional memcpy is executed once per index in a vacuum, but I
agree that avoiding the memcpy would be good.
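
To make the convention I described above more concrete, here is a rough
sketch (the shared struct and the function are invented for illustration;
this is not code from the patch):

```c
/*
 * Illustrative sketch: NULL stats tells the index AM "this is the first
 * call"; the AM then palloc's a result, which a worker must copy into
 * shared memory and flag, because a different worker may vacuum this
 * index next time.  SharedIndexStats is a hypothetical struct.
 */
typedef struct SharedIndexStats
{
	bool		updated;		/* has any worker filled this in yet? */
	IndexBulkDeleteResult stats;	/* flat copy kept in shared memory */
} SharedIndexStats;

static void
parallel_vacuum_one_index(IndexVacuumInfo *info, SharedIndexStats *shared,
						  IndexBulkDeleteCallback callback, void *state)
{
	/* pass NULL on the very first call, per the AM convention */
	IndexBulkDeleteResult *stats = shared->updated ? &shared->stats : NULL;

	stats = index_bulk_delete(info, stats, callback, state);

	if (stats != NULL && stats != &shared->stats)
	{
		/* the AM palloc'd a fresh result; publish it for other workers */
		memcpy(&shared->stats, stats, sizeof(IndexBulkDeleteResult));
		shared->updated = true;
		pfree(stats);
	}
	else if (stats != NULL)
		shared->updated = true;
}
```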

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Make drop database safer

2019-02-09 Thread Andres Freund
Hi,

On 2019-02-08 16:36:13 -0800, Alexandra Wang wrote:
> Current sequence of operations for drop database (dropdb())
> 1. Start Transaction
> 2. Make catalog changes
> 3. Drop database buffers
> 4. Forget database fsync requests
> 5. Checkpoint
> 6. Delete database directory
> 7. Commit Transaction
> 
> Problem
> This sequence is unsafe on a couple of fronts. For example, if drop database
> aborts (the system crashes or is shut down) right after the buffers are
> dropped in step 3 or step 4, the database will still exist and be fully
> accessible but will lose the data from the dirty buffers. This seems very
> bad.
> 
> The operation can abort after step 5 as well, in which case the entries remain
> in the catalog but the database is not accessible. That is bad as well, but not
> as severe as the case above, where the database exists but some data goes
> magically missing.
> 
> Repro:
> ```
> CREATE DATABASE test;
> \c test
> CREATE TABLE t1(a int); CREATE TABLE t2(a int); CREATE TABLE t3(a int);
> \c postgres
> DROP DATABASE test; <<== kill the session after DropDatabaseBuffers()
> (make sure to issue checkpoint before killing the session)
> ```
> 
> Proposed ways to fix
> 1. CommitTransactionCommand() right after step 2. This makes it fully safe,
> as the catalog will have the database dropped. Files may still exist on
> disk in some cases, which is okay. This also makes it consistent with the
> approach used in movedb().

To me this seems bad. The current failure mode obviously isn't good, but
the data obviously isn't valuable (it's being dropped), and just losing
track of an entire database's worth of files on disk seems worse.


> 2. An alternative way to make it safer is to perform the checkpoint (step 5)
> just before dropping the database buffers. Caveats of this solution:
> - It performs I/O for data which, in the success case, will get deleted anyway
> - It still doesn't cover the case where the catalog has the database entry but
> the files have been removed from disk

That seems like an unacceptable slowdown.


> 3. A fancier approach is to use the pending-delete mechanism used by
> relation drops to perform these non-catalog activities at commit.
> The pending-delete structure could easily gain a boolean to convey
> that a database directory, rather than a file, is being dropped. Given that
> drop database can't be performed inside a transaction, it doesn't have to be
> done this way, but it would make on-disk removal follow one consistent approach.

ISTM we'd need to do something like this.

Greetings,

Andres Freund



Re: add_partial_path() may remove dominated path but still in use

2019-02-09 Thread Amit Kapila
On Wed, Feb 6, 2019 at 10:35 AM Kohei KaiGai  wrote:
>
> Hello,
> Let me bring up this thread again.
> I've been waiting for the fix to get committed for a month...
>

It seems you would also like to see this back-patched.  I am not sure
if that is a good idea, as there is some risk of breaking existing
usage.  Tom, do you have any opinion on this patch?  It seems to me
you were thinking of having a separate hook for partial paths, but the
patch has solved the problem by moving the hook location.  Whatever the
case, I think we should try to reach some consensus and move forward
with this patch, as KaiGai-san has been waiting for quite some time.
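
For reference, the kind of extension code this enables looks roughly like the
following (the custom-path helpers here are hypothetical, not real APIs):

```c
/*
 * Illustration only: an extension adding a partial path from
 * set_rel_pathlist_hook, relying on the hook being invoked before the
 * Gather paths are built.  my_path_test() and make_my_custom_path() are
 * hypothetical helpers.
 */
static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;

static void
my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
					Index rti, RangeTblEntry *rte)
{
	if (prev_set_rel_pathlist_hook)
		prev_set_rel_pathlist_hook(root, rel, rti, rte);

	/* only offer a partial path where parallelism is permitted */
	if (rel->consider_parallel && my_path_test(root, rel, rte))
		add_partial_path(rel, (Path *) make_my_custom_path(root, rel));
}
```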

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: libpq compression

2019-02-09 Thread Konstantin Knizhnik




On 09.02.2019 1:38, Tomas Vondra wrote:

On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:


On 08.02.2019 21:57, Andres Freund wrote:

On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:

Frankly speaking, I do not think that such flexibility in choosing
compression algorithms is really needed.
I do not expect that there will be many situations where an old client
has to communicate with a new server or vice versa.
In most cases both client and server belong to the same Postgres
distribution and so implement the same compression algorithm.
As we are compressing only transient data (traffic), the problem of
providing backward compatibility seems to be not so important.

I think we should outright reject any patch without compression type
negotiation.

Does it mean that it is necessary to support multiple compression
algorithms and make it possible to perform switch between them at
runtime?

IMHO the negotiation should happen at connection time, i.e. the server
should support connections compressed by different algorithms. Not sure
if that's what you mean by runtime.

AFAICS this is quite close to how negotiation of encryption algorithms
works, in TLS and so on. Client specifies supported algorithms, server
compares that to list of supported algorithms, deduces the encryption
algorithm and notifies the client.

To allow fall-back to uncompressed connection, use "none" as algorithm.
If there's no common algorithm, fail.


The analogy with SSL is a good one.
Yes, the SSL protocol provides several ways of authentication, encryption, etc.,
and there are several different libraries implementing SSL.
But Postgres is using only one of them: OpenSSL.
If I want to use some other library (for example, to make it possible to
serialize and pass SSL session state to another process), then there is no
way to achieve it.

Actually, zstd also includes implementations of several compression
algorithms and chooses the one best fitting a particular data
stream. As in the case of SSL, the choice of algorithm is performed internally
inside zstd, not at the libpq level.


Sorry if my explanation about static versus dynamic (at runtime) choice
was not clear.

This is how compression is toggled now:

#if HAVE_LIBZSTD
ZpqStream*
zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
{
...
}
#endif

So if Postgres was configured with zstd, then this implementation is
included in the client and server Postgres libraries.

If Postgres is configured with zlib, then the zlib implementation will be used.
This is similar to how compression and most other configurable
features work in Postgres.


If we want to provide a dynamic choice at runtime, then we need to have an
array with the available compression algorithms:


#if HAVE_LIBZSTD
static ZpqStream*
zstd_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
{
...
}
#endif

ZpqCompressorImpl compressorImpl[] =
{
#if HAVE_LIBZSTD
	{zstd_create, zstd_read, zstd_write, ...},
#endif
#if HAVE_ZLIB
	{zlib_create, zlib_read, zlib_write, ...},
#endif
	...
};

And the most interesting case is if we load the library dynamically.
Each implementation is built as a separate library (for example,
libpztd.so).

In this case we need to somehow specify the available libraries,
for example by placing them in a separate directory, or by specifying a list
of libraries in postgresql.conf.
Then we try to load each library using dlopen.  Such a library has
external dependencies on the corresponding compressor library (for example,
-lz). The library can be successfully loaded only if the corresponding
compressor implementation is installed on the system.
This is the most flexible approach, allowing custom compressor
implementations to be provided.
A compression implementation can be organized as a Postgres extension whose
_PG_init function registers the implementation in some list.
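
A very rough sketch of that dynamic variant (the symbol name and the
registration hook below are invented for illustration):

```c
#include <dlfcn.h>
#include <stdio.h>

/* hypothetical registration entry point each compressor library exports */
typedef void (*zpq_register_fn) (void);

static void
load_compressor_library(const char *path)
{
	void	   *handle = dlopen(path, RTLD_NOW);
	zpq_register_fn reg;

	if (handle == NULL)
	{
		/* the compressor is simply unavailable on this system */
		fprintf(stderr, "could not load %s: %s\n", path, dlerror());
		return;
	}

	reg = (zpq_register_fn) dlsym(handle, "zpq_register_compressor");
	if (reg != NULL)
		reg();					/* library adds itself to the in-memory list */
}
```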


This is what I am asking about.
Right now approach 1) is implemented: the compression algorithm is defined
by configure.

It is not so difficult to extend it to support multiple algorithms.
The most flexible, but more sophisticated, approach is to load libraries
dynamically.





Right now the compression algorithm is linked statically.
Negotiation of the compression type is currently performed, but it only
checks that the server and client implement the same algorithm and
disables compression if that is not the case.


I don't think we should automatically fall back to disabled compression
when a client specifies a compression algorithm.


Compression is disabled only when the client and server were configured with
different compression algorithms (e.g., zstd vs. zlib).





If we are going to support multiple compression algorithms, do we need
dynamic loading of the corresponding compression libraries, or is static
linking OK? In case of dynamic linking we need to somehow specify information
about the available compression algorithms.
Some special subdirectory for them, so that I can traverse this directory
and try to load the corresponding libraries?

Or is it just me finding this too complicated for the problem being addressed?

Re: libpq compression

2019-02-09 Thread Andres Freund
Hi,

On 2019-02-08 23:38:12 +0100, Tomas Vondra wrote:
> On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:
> > Does it mean that it is necessary to support multiple compression
> > algorithms and make it possible to perform switch between them at
> > runtime?
> 
> IMHO the negotiation should happen at connection time, i.e. the server
> should support connections compressed by different algorithms.

Exactly. And support client libraries that support different
compression algorithms. We have broad forward and backward compatibility
with libpq, and even more so with the wire protocol, so we
shouldn't require exactly matching client/server versions or
compilation options.


> Not sure if that's what you mean by runtime.

Same.


> AFAICS this is quite close to how negotiation of encryption algorithms
> works, in TLS and so on. Client specifies supported algorithms, server
> compares that to list of supported algorithms, deduces the encryption
> algorithm and notifies the client.
> 
> To allow fall-back to uncompressed connection, use "none" as algorithm.
> If there's no common algorithm, fail.

I'm somewhat inclined to think that not compressing is
preferable. There's some reason to think that there are
cryptographic issues around compression, so it could make a ton
of sense for specific servers to disable compression centrally.


> > Right now compression algorithm is linked statically.
> > Negotiation of compression type is currently performed but it only
> > checks that server and client are implementing the same algorithm and
> > disables compression if it is not true.

> I don't think we should automatically fall back to disabled compression
> when a client specifies a compression algorithm.

Why?


> > If we are going to support multiple compression algorithms, do we need
> > dynamic loading of the corresponding compression libraries, or is static
> > linking OK? In case of dynamic linking we need to somehow specify information
> > about the available compression algorithms.
> > Some special subdirectory for them, so that I can traverse this directory
> > and try to load the corresponding libraries?
> > 
> > Or is it just me finding this too complicated for the problem being addressed?
> > 
> 
> I don't think we need dynamic algorithms in v1, but IMHO it'd be pretty
> simple to do - just add a shared_preload_library which registers it in a
> list in memory.

I personally don't think that's something we need to support. There are a
lot of issues around naming registries that need to be synchronized between
client and server.

Greetings,

Andres Freund



Re: propagating replica identity to partitions

2019-02-09 Thread Alvaro Herrera
On 2019-Feb-07, Dmitry Dolgov wrote:

> Could there be a race condition somewhere? The idea and the code look
> reasonable, but when testing ALTER TABLE ... REPLICA IDENTITY with this
> patch and concurrent partition creation, I got the following error on
> ATTACH PARTITION:
> 
> ERROR:  42P17: replica index does not exist in partition "test373"
> LOCATION:  MatchReplicaIdentity, tablecmds.c:15018
> 
> This error seems unstable; another time I got a deadlock. And I don't observe
> this behaviour on master.

Clearly there is a problem somewhere.  I'll investigate.  Thanks for
testing.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: add_partial_path() may remove dominated path but still in use

2019-02-09 Thread Tom Lane
Amit Kapila  writes:
> It seems you would also like to see this back-patched.  I am not sure
> if that is a good idea as there is some risk of breaking existing
> usage.  Tom, do you have any opinion on this patch?  It seems to me
> you were thinking to have a separate hook for partial paths, but the
> patch has solved the problem by moving the hook location.

I was expecting Haas to take point on this, but since he doesn't seem
to be doing so, I'll push it.  I don't think there's any material
risk of breaking things --- the only functionality lost is the ability to
remove or modify baserel Gather paths, which I doubt anybody is interested
in doing.  Certainly that's way less useful than the ability to add
partial paths and have them be included in Gather-building.

In a green field I'd rather have had a separate hook for adding partial
paths, but it's not clear that that really buys much of anything except
logical cleanliness ... against which it adds cost since the using
extension(s) have to figure out what's going on twice.

Also this way does have the advantage that it retroactively fixes things
for extensions that may be trying to make partial paths today.

regards, tom lane



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Alvaro Herrera
On 2019-Feb-09, Tom Lane wrote:

> Amit Langote  writes:
> > On Sat, Feb 9, 2019 at 9:41 AM Tom Lane  wrote:
> >> +1.  The best solution would presumably be to go through the normal
> >> object deletion mechanism; though possibly there's a reason that
> >> won't work given you're already inside some other DDL.
> 
> > Maybe:
> > - CatalogTupleDelete(trigrel, &trigtup->t_self);
> > + RemoveTriggerById(trgform->oid)?
> 
> No, that's still the back end of the deletion machinery, and in particular
> it would fail to clean pg_depend entries for the trigger.  Going in by the
> front door would use performDeletion().  (See deleteOneObject() to get
> an idea of what's being possibly missed out here.)

This patch I think does the right thing.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 7031ce2b1f79a2e03b29ac896f0a008b0621d654
Author: Alvaro Herrera 
AuthorDate: Sat Feb 9 12:31:25 2019 -0300
CommitDate: Sat Feb 9 12:53:00 2019 -0300

fix trigger drop

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 910e5deaa3f..923866b7e11 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8024,17 +8024,21 @@ CloneFkReferencing(Relation pg_constraint, Relation parentRel,
 			while ((trigtup = systable_getnext(scan)) != NULL)
 			{
 Form_pg_trigger	trgform = (Form_pg_trigger) GETSTRUCT(trigtup);
+ObjectAddress	trigger;
 
 if (trgform->tgconstrrelid != fk->conrelid)
 	continue;
 if (trgform->tgrelid != fk->confrelid)
 	continue;
 
-deleteDependencyRecordsForClass(TriggerRelationId,
-HeapTupleGetOid(trigtup),
-ConstraintRelationId,
-DEPENDENCY_INTERNAL);
-CatalogTupleDelete(trigrel, &trigtup->t_self);
+deleteDependencyRecordsFor(TriggerRelationId,
+		   HeapTupleGetOid(trigtup),
+		   false);
+/* make dependency deletion visible to performDeletion */
+CommandCounterIncrement();
+ObjectAddressSet(trigger, TriggerRelationId,
+ HeapTupleGetOid(trigtup));
+performDeletion(&trigger, DROP_RESTRICT, 0);
 			}
 
 			systable_endscan(scan);


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-09, Tom Lane wrote:
>> No, that's still the back end of the deletion machinery, and in particular
>> it would fail to clean pg_depend entries for the trigger.  Going in by the
>> front door would use performDeletion().  (See deleteOneObject() to get
>> an idea of what's being possibly missed out here.)

> This patch I think does the right thing.

(squint ...) Don't much like the undocumented deleteDependencyRecordsFor
call; that looks like it's redundant with what deleteOneObject will do.
I think you're doing it to get rid of the INTERNAL dependency so that
deletion won't recurse across that, but why is that a good idea?  Needs
a comment at least.

Also, I suspect you might need a second CCI after the performDeletion
call, in case the loop iterates?

regards, tom lane



Re: libpq compression

2019-02-09 Thread Tomas Vondra



On 2/9/19 3:14 PM, Andres Freund wrote:
> Hi,
> 
> On 2019-02-08 23:38:12 +0100, Tomas Vondra wrote:
>> On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:
>>> Does it mean that it is necessary to support multiple compression
>>> algorithms and make it possible to perform switch between them at
>>> runtime?
>>
>> IMHO the negotiation should happen at connection time, i.e. the server
>> should support connections compressed by different algorithms.
> 
> Exactly. And support client libraries that support different
> compression algorithms. We have broad forward and backward compatibility
> with libpq, and even more so with the wire protocol, so we
> shouldn't require exactly matching client/server versions or
> compilation options.
> 

IMHO the main reason to want/need this is not so much backward/forward
compatibility (in the sense of us adding/removing supported algorithms).
A much bigger problem (IMHO) is that different systems may not have some
of the libraries needed, or maybe the packager decided not to enable a
particular library, etc. That's likely far more dynamic.

> 
>> AFAICS this is quite close to how negotiation of encryption algorithms
>> works, in TLS and so on. Client specifies supported algorithms, server
>> compares that to list of supported algorithms, deduces the encryption
>> algorithm and notifies the client.
>>
>> To allow fall-back to uncompressed connection, use "none" as algorithm.
>> If there's no common algorithm, fail.
> 
> I'm somewhat inclined to think that not compressing is
> preferable. There's some reason to think that there are
> cryptographic issues around compression, so it could make a ton
> of sense for specific servers to disable compression centrally.
> 

I agree compression is not the same as crypto in this case, so falling back
to an uncompressed connection might be sensible in some cases. But in other
cases I probably want to be notified ASAP that the compression does not
work, before pushing 10x the amount of data through the network. That's
why I proposed to also have "none" as a compression algorithm,
to allow/disallow the fallback.
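
To be explicit about what I mean, here is a minimal sketch of the negotiation
rule (purely illustrative, not code from the patch):

```c
#include <string.h>

/*
 * Purely illustrative: pick the first algorithm offered by the client
 * that the server also supports.  "none" is listed like any other
 * algorithm, so falling back to an uncompressed connection is possible
 * only when both sides explicitly offer it; no overlap means failure.
 */
static const char *
choose_compression(const char **client, int nclient,
				   const char **server, int nserver)
{
	for (int i = 0; i < nclient; i++)
		for (int j = 0; j < nserver; j++)
			if (strcmp(client[i], server[j]) == 0)
				return client[i];	/* possibly "none" */

	return NULL;				/* no common algorithm: fail */
}
```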

> 
>>> Right now compression algorithm is linked statically.
>>> Negotiation of compression type is currently performed but it only
>>> checks that server and client are implementing the same algorithm and
>>> disables compression if it is not true.
> 
>> I don't think we should automatically fall back to disabled compression
>> when a client specifies a compression algorithm.
> 
> Why?
> 

Because it's pretty difficult to realize it happened, particularly for
non-interactive connections. Imagine you have machines where transfer
between them is not free - surely you want to know when you suddenly get
10x the traffic, right?

But as I said above, this should be configurable. IMHO having "none"
algorithm to explicitly enable this seems like a reasonable solution.

> 
>>> If we are going to support multiple compression algorithms, do we need
>>> dynamic loading of the corresponding compression libraries, or is static
>>> linking OK? In case of dynamic linking we need to somehow specify information
>>> about the available compression algorithms.
>>> Some special subdirectory for them, so that I can traverse this directory
>>> and try to load the corresponding libraries?
>>>
>>> Or is it just me finding this too complicated for the problem being addressed?
>>>
>>
>> I don't think we need dynamic algorithms in v1, but IMHO it'd be pretty
>> simple to do - just add a shared_preload_library which registers it in a
>> list in memory.
> 
> I personally don't think that's something we need to support. There are a
> lot of issues around naming registries that need to be synchronized between
> client and server.
> 

Don't we have that issue even without the dynamic registration? Let's
say you have a custom driver implementing the protocol (but not based on
libpq). Surely that assumes the algorithm names match what we have?

In the worst case the decompression fails (which may happen already
anyway) and the connection dies. Not a big deal, no?

But I agree v1 should not include this dynamic registration.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Alvaro Herrera
On 2019-Feb-09, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Feb-09, Tom Lane wrote:
> >> No, that's still the back end of the deletion machinery, and in particular
> >> it would fail to clean pg_depend entries for the trigger.  Going in by the
> >> front door would use performDeletion().  (See deleteOneObject() to get
> >> an idea of what's being possibly missed out here.)
> 
> > This patch I think does the right thing.
> 
> (squint ...) Don't much like the undocumented deleteDependencyRecordsFor
> call; that looks like it's redundant with what deleteOneObject will do.
> I think you're doing it to get rid of the INTERNAL dependency so that
> deletion won't recurse across that, but why is that a good idea?  Needs
> a comment at least.

Yeah, it's deleting the INTERNAL dependency, because otherwise the
trigger deletion is (correctly) forbidden, since the constraint depends
on it.  Perhaps it'd be good to make it more targeted: make sure it
only deletes that dependency row and not any others that the trigger
might have (though I don't think it should have any.  How could it?)  I'd do
that by adding a new function

long
deleteExactDependencyRecords(Oid classId, Oid objectId,
 Oid refclassId, Oid refobjectId)

and calling that instead of deleteDependencyRecordsFor.
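
Roughly like this, say (an untested sketch, modeled on the existing
pg_depend.c helpers; details unverified):

```c
/*
 * Untested sketch, modeled on deleteDependencyRecordsForClass(): delete
 * exactly the pg_depend rows linking one specific object to one specific
 * referenced object, and return how many were removed.
 */
long
deleteExactDependencyRecords(Oid classId, Oid objectId,
							 Oid refclassId, Oid refobjectId)
{
	long		count = 0;
	Relation	depRel;
	ScanKeyData key[2];
	SysScanDesc scan;
	HeapTuple	tup;

	depRel = heap_open(DependRelationId, RowExclusiveLock);

	ScanKeyInit(&key[0],
				Anum_pg_depend_classid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(classId));
	ScanKeyInit(&key[1],
				Anum_pg_depend_objid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(objectId));

	scan = systable_beginscan(depRel, DependDependerIndexId, true,
							  NULL, 2, key);

	while (HeapTupleIsValid(tup = systable_getnext(scan)))
	{
		Form_pg_depend depform = (Form_pg_depend) GETSTRUCT(tup);

		/* skip rows pointing at any other referenced object */
		if (depform->refclassid != refclassId ||
			depform->refobjid != refobjectId)
			continue;

		CatalogTupleDelete(depRel, &tup->t_self);
		count++;
	}

	systable_endscan(scan);
	heap_close(depRel, RowExclusiveLock);

	return count;
}
```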

> Also, I suspect you might need a second CCI after the performDeletion
> call, in case the loop iterates?

Can do.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Amit Langote
On Sun, Feb 10, 2019 at 1:50 AM Alvaro Herrera  wrote:
> On 2019-Feb-09, Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > On 2019-Feb-09, Tom Lane wrote:
> > >> No, that's still the back end of the deletion machinery, and in particular
> > >> it would fail to clean pg_depend entries for the trigger.  Going in by the
> > >> front door would use performDeletion().  (See deleteOneObject() to get
> > >> an idea of what's being possibly missed out here.)
> >
> > > This patch I think does the right thing.
> >
> > (squint ...) Don't much like the undocumented deleteDependencyRecordsFor
> > call; that looks like it's redundant with what deleteOneObject will do.
> > I think you're doing it to get rid of the INTERNAL dependency so that
> > deletion won't recurse across that, but why is that a good idea?  Needs
> > a comment at least.
>
> Yeah, it's deleting the INTERNAL dependency, because otherwise the
> trigger deletion is (correctly) forbidden, since the constraint depends
> on it.  Perhaps it'd be good to make it more targeted: make sure it
> only deletes that dependency row and not any others that the trigger
> might have (though I don't think it should have any.  How could it?)

Reading Tom's reply to my email, I wondered: would performDeletion do
much more than what the code is already doing (other than calling the
right trigger deletion function, which the current code doesn't), given
that the trigger in question is an internal trigger without any dependencies
(the functions it invokes are pinned by the system)?

Thanks,
Amit



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-09, Tom Lane wrote:
>> I think you're doing it to get rid of the INTERNAL dependency so that
>> deletion won't recurse across that, but why is that a good idea?  Needs
>> a comment at least.

> Yeah, it's deleting the INTERNAL dependency, because otherwise the
> trigger deletion is (correctly) forbidden, since the constraint depends
> on it.

Well, the question that's begged here is exactly why it's okay to remove
the trigger and dependency link despite the fact that the constraint needs
it.  I suppose the answer is that we'll subsequently insert a new trigger
implementing the same constraint (and internally-linked to it)?  That
information is what I'd like to have in the comment.

> Perhaps it'd be good to make it more targeted: make sure it
> only deletes that dependency row and not any others that the trigger
> might have (though I don't think it should have any.  How could it?)  I'd do
> that by adding a new function

I'm not sure that'd be an improvement, especially in light of the
hazard that the trigger might somehow have acquired extension and/or
partition dependencies that'd also cause issues.

regards, tom lane



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Tom Lane
Amit Langote  writes:
> Reading Tom's reply to my email, I wondered: would performDeletion do
> much more than what the code is already doing (other than calling the
> right trigger deletion function, which the current code doesn't), given
> that the trigger in question is an internal trigger without any dependencies
> (the functions it invokes are pinned by the system)?

A big part of the point here is to not have to have such assumptions
wired into the fk-cloning code.  But even if that internal dependency is
the only one the trigger is involved in, there are other steps in
deleteOneObject that shouldn't be ignored.  For example, somebody
could've attached a comment to it.

regards, tom lane



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Alvaro Herrera
On 2019-Feb-09, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Feb-09, Tom Lane wrote:
> >> I think you're doing it to get rid of the INTERNAL dependency so that
> >> deletion won't recurse across that, but why is that a good idea?  Needs
> >> a comment at least.
> 
> > Yeah, it's deleting the INTERNAL dependency, because otherwise the
> > trigger deletion is (correctly) forbidden, since the constraint depends
> > on it.
> 
> Well, the question that's begged here is exactly why it's okay to remove
> the trigger and dependency link despite the fact that the constraint needs
> it.  I suppose the answer is that we'll subsequently insert a new trigger
> implementing the same constraint (and internally-linked to it)?  That
> information is what I'd like to have in the comment.

Well, the answer is that the trigger is no longer needed.  This is an
action trigger, i.e. it's attached to the referenced relation; and the
action is making an independent table become a partition.  Since the
partition is reachable by the action trigger that goes through the
parent table, we no longer need the action trigger that goes directly to
the partition.

Conversely, when we detach a partition, we create an action trigger that
wasn't there before.

I'll put this in the comment.

> > Perhaps it'd be good to make it more targeted: make sure it
> > only deletes that dependency row and not any others that the trigger
> > might have (though I don't think it should have any.  How could it?)  I'd do
> > that by adding a new function
> 
> I'm not sure that'd be an improvement, especially in light of the
> hazard that the trigger might somehow have acquired extension and/or
> partition dependencies that'd also cause issues.

Good point.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Amit Langote
On Sun, Feb 10, 2019 at 2:13 AM Tom Lane  wrote:
>
> Amit Langote  writes:
> > Reading Tom's reply to my email, I wondered: would performDeletion do
> > much more than what the code is already doing (other than calling the
> > right trigger deletion function, which the current code doesn't), given
> > that the trigger in question is an internal trigger without any dependencies
> > (the functions it invokes are pinned by the system)?
>
> A big part of the point here is to not have to have such assumptions
> wired into the fk-cloning code.  But even if that internal dependency is
> the only one the trigger is involved in, there are other steps in
> deleteOneObject that shouldn't be ignored.  For example, somebody
> could've attached a comment to it.

Okay, I hadn't considered that far.  Thanks for explaining.

Regards,
Amit



Re: Protect syscache from bloating with negative cache entries

2019-02-09 Thread Tomas Vondra
On 2/7/19 1:18 PM, Kyotaro HORIGUCHI wrote:
> At Thu, 07 Feb 2019 15:24:18 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
>  wrote in 
> <20190207.152418.139132570.horiguchi.kyot...@lab.ntt.co.jp>
>> I'm going to retake numbers with search-only queries.
> 
> Yeah, I was stupid.
> 
> I reran the benchmark using "-S -T 30" on a server build
> with no assertions and -O2. The numbers are the best of three
> successive attempts.  The patched version is running with
> cache_target_memory = 0, cache_prune_min_age = 600 and
> cache_entry_limit = 0, but pruning doesn't happen with this workload.
> 
> master: 13393 tps
> v12   : 12625 tps (-6%)
> 
> A significant degradation is found.
> 
> Reducing the frequency of dlist_move_tail by requiring a 1ms interval
> between two successive updates of the same entry made the
> degradation disappear.
> 
> patched  : 13720 tps (+2%)
> 
> I think there's still no need for such frequency. It is 100ms in
> the attached patch.
> 
> # I'm not sure the name LRU_IGNORANCE_INTERVAL makes sense...
> 

Hi,

I've done a bunch of benchmarks on v13, and I don't see any serious
regression either. Each test creates a number of tables (100, 1k, 10k,
100k and 1M) and then runs SELECT queries on them. The tables are
accessed randomly - with either uniform or exponential distribution. For
each combination there are 5 runs, 60 seconds each (see the attached
shell scripts, it should be pretty obvious).

I've done the tests on two different machines - small one (i5 with 8GB
of RAM) and large one (e5-2620v4 with 64GB RAM), but the behavior is
almost exactly the same (with the exception of 1M tables, which does not
fit into RAM on the smaller one).

On the xeon, the results (throughput compared to master) look like this:


uniform          100     1000    10000   100000  1000000
---------------------------------------------------------
v13           105.04%  100.28%  102.96%  102.11%  101.54%
v13 (nodata)   97.05%   98.30%   97.42%   96.60%  107.55%


exponential      100     1000    10000   100000  1000000
---------------------------------------------------------
v13           100.04%  103.48%  101.70%   98.56%  103.20%
v13 (nodata)   97.12%   98.43%   98.86%   98.48%  104.94%

The "nodata" case means the tables were empty (so no files created),
while in the other case each table contained 1 row.

Per the results it's mostly break even, and in some cases there is
actually a measurable improvement.

That being said, the question is whether the patch actually reduces
memory usage in a useful way - that's not something this benchmark
validates. I plan to modify the tests to make pgbench script
time-dependent (i.e. to pick a subset of tables depending on time).

A couple of things I've happened to notice during a quick review:

1) The sgml docs in 0002 talk about "syscache_memory_target" and
"syscache_prune_min_age", but those options were renamed to just
"cache_memory_target" and "cache_prune_min_age".

2) "cache_entry_limit" is not mentioned in sgml docs at all, and it's
defined three times in guc.c for some reason.

3) I don't see why to define PRUNE_BY_AGE and PRUNE_BY_NUMBER, instead
of just using two bool variables prune_by_age and prune_by_number doing
the same thing.

4) I'm not entirely sure about using stmtStartTimestamp. Doesn't that
pretty much mean long-running statements will set the lastaccess to a very
old timestamp? Also, it means that long-running statements (like a PL
function accessing a bunch of tables) won't do any eviction at all, no?
AFAICS we'll set the timestamp only once, at the very beginning.

I wonder whether using some other timestamp source (like a timestamp
updated regularly from a timer, or something like that) would be better.

5) There are two fread() calls in 0003 triggering a compiler warning
about unused return value.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


run-data.sh
Description: application/shellscript


run-nodata.sh
Description: application/shellscript


syscache.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Alvaro Herrera
On 2019-Feb-09, Alvaro Herrera wrote:

> I'll put this in the comment.

Attached.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 910e5deaa3f..0d32262c4c3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8008,10 +8008,12 @@ CloneFkReferencing(Relation pg_constraint, Relation parentRel,
 			ReleaseSysCache(partcontup);
 
 			/*
-			 * Looks good!  Attach this constraint.  Note that the action
-			 * triggers are no longer needed, so remove them.  We identify
-			 * them because they have our constraint OID, as well as being
-			 * on the referenced rel.
+			 * Looks good!  Attach this constraint.  The action triggers are
+			 * no longer needed -- the parent table already has equivalent
+			 * ones, and those will be able to reach the partition, therefore
+			 * the ones in the partition are redundant, so remove them.  We
+			 * identify them because they have our constraint OID, as well as
+			 * being on the referenced rel.
 			 */
 			trigrel = heap_open(TriggerRelationId, RowExclusiveLock);
 			ScanKeyInit(&key,
@@ -8024,17 +8026,30 @@ CloneFkReferencing(Relation pg_constraint, Relation parentRel,
 			while ((trigtup = systable_getnext(scan)) != NULL)
 			{
 Form_pg_trigger	trgform = (Form_pg_trigger) GETSTRUCT(trigtup);
+ObjectAddress	trigger;
 
 if (trgform->tgconstrrelid != fk->conrelid)
 	continue;
 if (trgform->tgrelid != fk->confrelid)
 	continue;
 
-deleteDependencyRecordsForClass(TriggerRelationId,
-HeapTupleGetOid(trigtup),
-ConstraintRelationId,
-DEPENDENCY_INTERNAL);
-CatalogTupleDelete(trigrel, &trigtup->t_self);
+/*
+ * The constraint is originally set up to contain this trigger
+ * as an implementation object, so there's a dependency record
+ * that links the two; however, since the trigger is no longer
+ * needed, we remove the dependency link in order to be able
+ * to drop the trigger while keeping the constraint intact.
+ */
+deleteDependencyRecordsFor(TriggerRelationId,
+		   HeapTupleGetOid(trigtup),
+		   false);
+/* make dependency deletion visible to performDeletion */
+CommandCounterIncrement();
+ObjectAddressSet(trigger, TriggerRelationId,
+ HeapTupleGetOid(trigtup));
+performDeletion(&trigger, DROP_RESTRICT, 0);
+/* make trigger drop visible, in case the loop iterates */
+CommandCounterIncrement();
 			}
 
 			systable_endscan(scan);


Re: dsa_allocate() faliure

2019-02-09 Thread Thomas Munro
On Sat, Feb 9, 2019 at 9:21 PM Robert Haas  wrote:
> On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
>  wrote:
> > Sometimes FreeManagerPutInternal() returns a
> > number-of-contiguous-pages-created-by-this-insertion that is too large
> > by one. [...]
>
> I spent a long time thinking about this and staring at code this
> afternoon, but I didn't really come up with much of anything useful.
> It seems like a strange failure mode, because
> FreePageManagerPutInternal() normally just returns its third argument
> unmodified. [...]

Bleugh.  Yeah.  What I said before wasn't quite right.  The value
returned by FreePageManagerPutInternal() is actually correct at the
moment it is returned, but it ceases to be correct immediately
afterwards if the following call to FreePageBtreeCleanup() happens to
reduce the size of that particular span.  The problem is that we
clobber fpm->contiguous_pages with the earlier (and by now incorrect)
value that we were holding in a local variable.
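
In code terms, the shape of the problem is roughly this (a simplified
paraphrase of the sequence just described, not the literal freepage.c source):

```c
/*
 * Simplified paraphrase of the sequence described above.  The local value
 * is correct when FreePageManagerPutInternal() returns, but the cleanup
 * step may consolidate the btree and shrink that very span, so caching
 * the stale local value afterwards overstates the largest free run.
 */
void
FreePageManagerPut(FreePageManager *fpm, Size first_page, Size npages)
{
	Size		contiguous_pages;

	/* correct at this instant ... */
	contiguous_pages = FreePageManagerPutInternal(fpm, first_page, npages, false);

	/* ... but this may shrink the span measured above ... */
	FreePageBtreeCleanup(fpm);

	/* ... so this caches a value that may now be too large */
	if (contiguous_pages > fpm->contiguous_pages)
		fpm->contiguous_pages = contiguous_pages;
}
```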

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-09, Tom Lane wrote:
>> Well, the question that's begged here is exactly why it's okay to remove
>> the trigger and dependency link despite the fact that the constraint needs
>> it.  I suppose the answer is that we'll subsequently insert a new trigger
>> implementing the same constraint (and internally-linked to it)?  That
>> information is what I'd like to have in the comment.

> Well, the answer is that the trigger is no longer needed.  This is an
> action trigger, i.e. it's attached to the referenced relation; and the
> action is making an independent table become a partition.  Since the
> partition is reachable by the action trigger that goes through the
> parent table, we no longer need the action trigger that goes directly to
> the partition.

Oh ... then why don't we go ahead and get rid of the constraint entry,
too?

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-02-09 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> After further reflection I really don't like Andrew's suggestion
>  Tom> that we not document the rule that multiply-referenced CTEs won't
>  Tom> be inlined by default. That would be giving up the principle that
>  Tom> WITH calculations are not done multiple times by default, and I
>  Tom> draw the line at that. It's an often-useful behavior as well as
>  Tom> one that's been documented from day one, so I do not accept the
>  Tom> argument that we might someday override it on the basis of nothing
>  Tom> but planner cost estimates.

> The case that springs to mind is when a CTE with grouping is then joined
> multiple times in the main query with different conditions. If the
> planner is able to deduce (e.g. via ECs) that restrictions on grouped
> columns can be pushed into the CTE, then inlining the CTE multiple times
> might be a significant win. But if that isn't possible, then inlining
> multiple times might be a significant loss.

Sure, but this is exactly the sort of situation where we should offer
a way for the user to force either decision to be made.  I think it's
very unlikely that we'll ever be in a position to make a realistic
cost-based decision for that.  Actually planning it out both ways would
be horrendously expensive (and probably none too reliable anyway, given
how shaky ndistinct estimates tend to be); and we certainly don't have
enough info to make a smart choice without doing that.

regards, tom lane



Re: dsa_allocate() faliure

2019-02-09 Thread Thomas Munro
On Sun, Feb 10, 2019 at 7:24 AM Thomas Munro
 wrote:
> On Sat, Feb 9, 2019 at 9:21 PM Robert Haas  wrote:
> > On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
> >  wrote:
> > > Sometimes FreeManagerPutInternal() returns a
> > > number-of-contiguous-pages-created-by-this-insertion that is too large
> > > by one. [...]
> >
> > I spent a long time thinking about this and starting at code this
> > afternoon, but I didn't really come up with much of anything useful.
> > It seems like a strange failure mode, because
> > FreePageManagerPutInternal() normally just  returns its third argument
> > unmodified. [...]
>
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.

... but why would it do that?  I can reproduce cases where (for
example) FreePageManagerPutInternal() returns 179, and then
FreePageManagerLargestContiguous() returns 179, but then after
FreePageBtreeCleanup() it returns 178.  At that point FreePageDump()
says:

btree depth 1:
  77@0 l: 27(1) 78(178)
freelists:
  1: 27
  129: 78(178)

But at first glance it shouldn't be allocating pages, because it just
does consolidation to try to convert to singleton format, and then it
does recycle list cleanup using soft=true so that no allocation of
btree pages should occur.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: FETCH FIRST clause PERCENT option

2019-02-09 Thread Tomas Vondra



On 1/30/19 7:07 AM, Surafel Temesgen wrote:
> 
> 
> On Mon, Jan 28, 2019 at 1:28 AM Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
> 
> 
> 
> OK. Does that mean you agree the incremental approach is reasonable?
> 
> 
> there is no noticeable performance difference, but I prefer the previous
> approach for cursor operations: it fetches tuples forward and backward
> from the tuplestore only, while in the incremental approach we have to
> re-execute the outer node on every forward and backward fetch operation
> 

I'm not sure I understand - are you saying every time the user does a
FETCH, we have to run the outer plan from scratch? I don't see why that
would be necessary. And if it is, how come there's no noticeable
performance difference?

Can you share a patch implementing the incremental approach, and a query
demonstrating the issue?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: executor relation handling

2019-02-09 Thread Julien Rouhaud
Hi,

On Sun, Sep 30, 2018 at 7:18 PM Tom Lane  wrote:
>
> I think that the call sites should ultimately look like
>
> Assert(CheckRelationLockedByMe(...));
>
> but for hunting down the places where the assertion currently fails,
> it's more convenient if it's just an elog(WARNING).

I just hit one of the asserts (in relation_open()) added in
b04aeb0a053.  Here's a simple reproducer:

DROP TABLE IF EXISTS test;
CREATE TABLE test (id integer primary key);
PREPARE s AS DELETE FROM test WHERE id = 1;
EXECUTE s;
EXECUTE s;


This comes from ExecInitIndexScan() and ExecInitBitmapIndexScan(),
which open the index without lock if the parent table is a target
relation:

/*
 * Open the index relation.
 *
 * If the parent table is one of the target relations of the query, then
 * InitPlan already opened and write-locked the index, so we can avoid
 * taking another lock here.  Otherwise we need a normal reader's lock.
 */
relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
indexstate->iss_RelationDesc = index_open(node->indexid,
  relistarget ? NoLock :
AccessShareLock);

And digging into InitPlan() up to ExecInitModifyTable():

/*
 * If there are indices on the result relation, open them and save
 * descriptors in the result relation info, so that we can add new
 * index entries for the tuples we add/update.  We need not do this
 * for a DELETE, however, since deletion doesn't affect indexes. Also,
 * inside an EvalPlanQual operation, the indexes might be open
 * already, since we share the resultrel state with the original
 * query.
 */
if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
operation != CMD_DELETE &&
resultRelInfo->ri_IndexRelationDescs == NULL)
ExecOpenIndices(resultRelInfo,
node->onConflictAction != ONCONFLICT_NONE);


So, this is problematic with a cached plan on a DELETE query since the
lock on the target relation's index will never be acquired (the lock
is acquired on the first execution in get_relation_info()).  This
doesn't seem unsafe though, since DROP INDEX [CONCURRENTLY] will still
acquire lock on the index relation or query xid before dropping the
index.

I'm not sure of what's the best way to fix this problem.  I wanted to
modify ExecInitIndexScan() and ExecInitBitmapIndexScan() to acquire
the lock for a DELETE on the target relation, however I don't think
that we have that information at this point.  Maybe just
unconditionally acquire an AccessShareLock on the index in those
functions?
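Concretely, the unconditional-lock idea would amount to something like this
in ExecInitIndexScan() (just a sketch of the option floated above, not a
reviewed fix; ExecInitBitmapIndexScan() would get the analogous change):

	/* Always take a reader's lock, even when the parent table is a target
	 * relation, so that re-executing a cached plan still locks the index. */
	indexstate->iss_RelationDesc = index_open(node->indexid, AccessShareLock);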



Re: executor relation handling

2019-02-09 Thread Tom Lane
Julien Rouhaud  writes:
> I just hit one of the asserts (in relation_open()) added in
> b04aeb0a053.  Here's a simple reproducer:

Yeah, I think this is the same issue being discussed in

https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us

I imagine the patch David recently posted in that thread will fix this,
but can you check, and/or review the patch?

regards, tom lane



Re: FETCH FIRST clause PERCENT option

2019-02-09 Thread Tomas Vondra
On 2/1/19 12:10 PM, Surafel Temesgen wrote:
> here is a rebased version of  previous  patch with planner
> comment included
> 

It's really hard to say which comment that is ..

FWIW I find the changes in nodeLimit lacking sufficient comments. The
comments present are somewhat obvious - what we need are comments
explaining why things happen. For example, the LIMIT_INITIAL case now includes
this chunk of code:

case LIMIT_INITIAL:

if (node->limitOption == PERCENTAGE)
{

/*
 * Find all rows the plan will return.
 */
for (;;)
{
slot = ExecProcNode(outerPlan);
if (TupIsNull(slot))
{
break;
}
tuplestore_puttupleslot(node->totalTuple, slot);
}
}

Ignoring the fact that the comment is incorrectly indented, it states a
rather obvious fact - that we fetch all rows from a plan and stash them
into a tuplestore. What is missing is a comment explaining why we need to
do that.
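For instance, a comment along these lines would answer the "why" (wording is
mine, pieced together from the earlier discussion in this thread, not taken
from the patch):

	case LIMIT_INITIAL:

		if (node->limitOption == PERCENTAGE)
		{
			/*
			 * FETCH FIRST ... PERCENT cannot know how many rows to return
			 * until the total number of rows produced by the outer plan is
			 * known, so pull the whole outer result into a tuplestore
			 * first.  The tuplestore also lets cursors fetch forward and
			 * backward without re-executing the outer plan.
			 */
			for (;;)
			{
				slot = ExecProcNode(outerPlan);
				if (TupIsNull(slot))
					break;
				tuplestore_puttupleslot(node->totalTuple, slot);
			}
		}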

This applies to other changes in nodeLimit too, and elsewhere.

Another detail is that we generally leave a free line before "if", i.e.
instead of

}
if (node->limitOption == PERCENTAGE)
{

it should be

}

if (node->limitOption == PERCENTAGE)
{

because it's fairly easy to misread as "else if". Even better, there
should be a comment before the "if" explaining what the branch does.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: executor relation handling

2019-02-09 Thread Julien Rouhaud
On Sun, Feb 10, 2019 at 12:31 AM Tom Lane  wrote:
>
> Julien Rouhaud  writes:
> > I just hit one of the asserts (in relation_open()) added in
> > b04aeb0a053.  Here's a simple reproducer:
>
> Yeah, I think this is the same issue being discussed in
>
> https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us

Ah indeed, sorry I totally missed that thread.

>> I imagine the patch David recently posted in that thread will fix this,
> but can you check, and/or review the patch?

I tried quickly and it does fix my test case.  It's quite late here,
so I'll review the patch tomorrow.
Thanks.



Re: libpq compression

2019-02-09 Thread Tomas Vondra



On 2/9/19 3:02 PM, Konstantin Knizhnik wrote:
> 
> 
> On 09.02.2019 1:38, Tomas Vondra wrote:
>> On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:
>>>
>>> On 08.02.2019 21:57, Andres Freund wrote:
 On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:
> Frankly speaking, I do not think that such flexibility in choosing
> compression algorithms is really needed.
> I do not expect that there will be many situations where an old client
> has to communicate with a new server or vice versa.
> In most cases both client and server belong to the same Postgres
> distribution and so implement the same compression algorithm.
> As far as we are compressing only temporary data (traffic), the problem
> of providing backward compatibility seems not so important.
 I think we should outright reject any patch without compression type
 negotiation.
>>> Does it mean that it is necessary to support multiple compression
>>> algorithms and make it possible to perform switch between them at
>>> runtime?
>> IMHO the negotiation should happen at connection time, i.e. the server
>> should support connections compressed by different algorithms. Not sure
>> if that's what you mean by runtime.
>>
>> AFAICS this is quite close to how negotiation of encryption algorithms
>> works, in TLS and so on. Client specifies supported algorithms, server
>> compares that to list of supported algorithms, deduces the encryption
>> algorithm and notifies the client.
>>
>> To allow fall-back to uncompressed connection, use "none" as algorithm.
>> If there's no common algorithm, fail.
> 
> It is good analogue with SSL.
> Yes, SSL protocol provides several ways of authentication, encryption,...
> And there are several different libraries implementing SSL.
> But Postgres is using only one of them: OpenSSL.
> If I want to use some other library (for example to make it possible to
> serialize and pass SSL session state to other
> process), then there is no way to achieve it.
> 

That's rather misleading. Firstly, it's true we only support OpenSSL at
the moment, but I do remember we've been working on adding support to a
bunch of other TLS libraries.

But more importantly, it's not the TLS library that's negotiated. It's
the encryption algorithms that is negotiated. The server is oblivious
which TLS library is used by the client (and vice versa), because the
messages are the same - what matters is that they agree on keys,
ciphers, etc. And those can differ/change between libraries or even
versions of the same library.

For us, the situation is the same - we have the messages specified by
the FE/BE protocol, and it's the algorithms that are negotiated.

> Actually zstd also includes implementations of several compression
> algorithms and chooses the one that best fits a particular data stream.
> As in the case of SSL, the choice of algorithm is performed internally
> inside zstd - not at the libpq level.
> 

Really? I always thought zstd is a separate compression algorithm.
There's adaptive compression feature, but AFAIK that essentially tweaks
compression level based on network connection. Can you point me to the
sources or docs explaining this?

Anyway, this does not really change anything - it's internal zstd stuff.

> Sorry if my explanation about the static and dynamic (at runtime) choice
> was not correct.
> This is how compression is toggled now:
> 
> #if HAVE_LIBZSTD
> ZpqStream*
> zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
> {
> ...
> }
> #endif
> 
> So if Postgres was configured with zstd, then this implementation is
> included in the client and server Postgres libraries.
> If Postgres is configured with zlib, then the zlib implementation will
> be used.
> This is similar with using compression and most of other configurable
> features in Postgres.
> 
> If we want to provide dynamic choice at runtime, then we need to have
> array with available compression algorithms:
> 
> #if HAVE_LIBZSTD
> static ZpqStream*
> zstd_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
> {
> ...
> }
> #endif
> 
> ZpqCompressorImpl compressorImpl[] =
> {
> #if HAVE_LIBZSTD
> {zstd_create, zstd_read,zstd_write,...},
> #endif
> #if HAVE_ZLIB
> {zlib_create, zlib_read,zslib_write,...},
> #endif
> ...
> }
> 

Yes, that's mostly what I've been imagining, except that you also need
some sort of identifier for the algorithm - a cstring at the beginning
of the struct should be enough, I guess.
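To illustrate, a minimal sketch of what that could look like -- the struct
layout and the zpq_choose() helper below are my assumptions for illustration
(assuming <string.h>), not code from the patch; only the names already shown
above (ZpqStream, zpq_tx_func, zpq_rx_func) are taken from it:

	typedef struct ZpqCompressorImpl
	{
		const char *name;		/* algorithm identifier, e.g. "zstd", "zlib" */
		ZpqStream  *(*create) (zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg);
		/* ... read/write/free callbacks as in the array shown above ... */
	} ZpqCompressorImpl;

	/*
	 * Pick the first client-preferred algorithm the server also supports.
	 * client_list is assumed to be a comma-separated list of names in the
	 * client's order of preference; NULL means no common algorithm.
	 */
	static const ZpqCompressorImpl *
	zpq_choose(const char *client_list, const ZpqCompressorImpl *impls, int nimpls)
	{
		const char *p = client_list;

		while (*p != '\0')
		{
			size_t		len = strcspn(p, ",");
			int			i;

			for (i = 0; i < nimpls; i++)
			{
				if (strlen(impls[i].name) == len &&
					strncmp(impls[i].name, p, len) == 0)
					return &impls[i];
			}
			p += len;
			if (*p == ',')
				p++;
		}
		return NULL;			/* caller falls back to "none" or fails */
	}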

> And the most interesting case is when we load the library dynamically.
> Each implementation is built as a separate library (for example
> libpztd.so).
> In this case we need to somehow specify the available libraries,
> for example by placing them in a separate directory, or by specifying a
> list of libraries in postgresql.conf.
> Then we try to load this library using dlopen.  Such a library has
> external dependencies on the corresponding compressor library (for example
> -lz). The library can be successfully loaded if there c

Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Alvaro Herrera
On 2019-Feb-09, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Feb-09, Tom Lane wrote:
> >> Well, the question that's begged here is exactly why it's okay to
> >> remove the trigger and dependency link despite the fact that the
> >> constraint needs it.  I suppose the answer is that we'll
> >> subsequently insert a new trigger implementing the same constraint
> >> (and internally-linked to it)?  That information is what I'd like
> >> to have in the comment.
> 
> > Well, the answer is that the trigger is no longer needed.  This is
> > an action trigger, i.e. it's attached to the referenced relation;
> > and the action is making an independent table become a partition.
> > Since the partition is reachable by the action trigger that goes
> > through the parent table, we no longer need the action trigger that
> > goes directly to the partition.
> 
> Oh ... then why don't we go ahead and get rid of the constraint entry,
> too?

Because each partition has its own pg_constraint entry.  (Otherwise
there's no place to put the column numbers into -- they can differ from
partition to partition, remember.)  The only thing we do is mark it as
child of the parent's one.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-09, Tom Lane wrote:
>> Oh ... then why don't we go ahead and get rid of the constraint entry,
>> too?

> Because each partition has its own pg_constraint entry.  (Otherwise
> there's no place to put the column numbers into -- they can differ from
> partition to partition, remember.)  The only thing we do is mark it as
> child of the parent's one.

Uh-huh.  And what happens after DETACH PARTITION ... are you going to run
around and recreate these triggers?

regards, tom lane



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Alvaro Herrera
On 2019-Feb-09, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Feb-09, Tom Lane wrote:
> >> Oh ... then why don't we go ahead and get rid of the constraint entry,
> >> too?
> 
> > Because each partition has its own pg_constraint entry.  (Otherwise
> > there's no place to put the column numbers into -- they can differ from
> > partition to partition, remember.)  The only thing we do is mark it as
> > child of the parent's one.
> 
> Uh-huh.  And what happens after DETACH PARTITION ... are you going to run
> around and recreate these triggers?

Yep, that's there too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-09, Tom Lane wrote:
>> Uh-huh.  And what happens after DETACH PARTITION ... are you going to run
>> around and recreate these triggers?

> Yep, that's there too.

OK, then I guess it's fine.

regards, tom lane



Reporting script runtimes in pg_regress

2019-02-09 Thread Tom Lane
I've wondered for some time whether we couldn't make a useful
reduction in the run time of the PG regression tests by looking
for scripts that run significantly longer than others in their
parallel groups, and making an effort to trim the runtimes of
those particular scripts.

The first step in that of course is to get some data, so attached
is a patch to pg_regress to cause it to print the runtime of each
script.  This produces results like, say,

parallel group (17 tests):  circle line timetz path lseg point macaddr macaddr8 
time interval inet tstypes date box timestamp timestamptz polygon
 point... ok (35 ms)
 lseg ... ok (31 ms)
 line ... ok (23 ms)
 box  ... ok (135 ms)
 path ... ok (24 ms)
 polygon  ... ok (1256 ms)
 circle   ... ok (20 ms)
 date ... ok (69 ms)
 time ... ok (40 ms)
 timetz   ... ok (22 ms)
 timestamp... ok (378 ms)
 timestamptz  ... ok (378 ms)
 interval ... ok (50 ms)
 inet ... ok (56 ms)
 macaddr  ... ok (33 ms)
 macaddr8 ... ok (37 ms)
 tstypes  ... ok (62 ms)

or on a rather slower machine,

parallel group (8 tests):  hash_part reloptions partition_info identity 
partition_join partition_aggregate partition_prune indexing
 identity ... ok (3807 ms)
 partition_join   ... ok (10433 ms)
 partition_prune  ... ok (19370 ms)
 reloptions   ... ok (1166 ms)
 hash_part... ok (628 ms)
 indexing ... ok (22070 ms)
 partition_aggregate  ... ok (12731 ms)
 partition_info   ... ok (1373 ms)
test event_trigger... ok (1953 ms)
test fast_default ... ok (2689 ms)
test stats... ok (1173 ms)

Does anyone else feel that this is interesting/useful data?

regards, tom lane

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 4b742a1..a4caa22 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -36,6 +36,7 @@
 #include "getopt_long.h"
 #include "libpq/pqcomm.h"		/* needed for UNIXSOCK_PATH() */
 #include "pg_config_paths.h"
+#include "portability/instr_time.h"
 
 /* for resultmap we need a list of pairs of strings */
 typedef struct _resultmap
@@ -1473,14 +1474,15 @@ results_differ(const char *testname, const char *resultsfile, const char *defaul
 
 /*
  * Wait for specified subprocesses to finish, and return their exit
- * statuses into statuses[]
+ * statuses into statuses[] and stop times into stoptimes[]
  *
  * If names isn't NULL, print each subprocess's name as it finishes
  *
  * Note: it's OK to scribble on the pids array, but not on the names array
  */
 static void
-wait_for_tests(PID_TYPE * pids, int *statuses, char **names, int num_tests)
+wait_for_tests(PID_TYPE * pids, int *statuses, instr_time *stoptimes,
+			   char **names, int num_tests)
 {
 	int			tests_left;
 	int			i;
@@ -1533,6 +1535,7 @@ wait_for_tests(PID_TYPE * pids, int *statuses, char **names, int num_tests)
 #endif
 pids[i] = INVALID_PID;
 statuses[i] = (int) exit_status;
+INSTR_TIME_SET_CURRENT(stoptimes[i]);
 if (names)
 	status(" %s", names[i]);
 tests_left--;
@@ -1582,6 +1585,8 @@ run_schedule(const char *schedule, test_function tfunc)
 	_stringlist *expectfiles[MAX_PARALLEL_TESTS];
 	_stringlist *tags[MAX_PARALLEL_TESTS];
 	PID_TYPE	pids[MAX_PARALLEL_TESTS];
+	instr_time	starttimes[MAX_PARALLEL_TESTS];
+	instr_time	stoptimes[MAX_PARALLEL_TESTS];
 	int			statuses[MAX_PARALLEL_TESTS];
 	_stringlist *ignorelist = NULL;
 	char		scbuf[1024];
@@ -1687,7 +1692,8 @@ run_schedule(const char *schedule, test_function tfunc)
 		{
 			status(_("test %-28s ... "), tests[0]);
 			pids[0] = (tfunc) (tests[0], &resultfiles[0], &expectfiles[0], &tags[0]);
-			wait_for_tests(pids, statuses, NULL, 1);
+			INSTR_TIME_SET_CURRENT(starttimes[0]);
+			wait_for_tests(pids, statuses, stoptimes, NULL, 1);
 			/* status line is finished below */
 		}
 		else if (max_concurrent_tests > 0 && max_concurrent_tests < num_tests)
@@ -1707,12 +1713,15 @@ run_schedule(const char *schedule, test_function tfunc)
 if (i - oldest >= max_connections)
 {
 	wait_for_tests(pids + oldest, statuses + oldest,
+   stoptimes + oldest,
    tests + oldest, i - oldest);
 	oldest = i;
 }
 pids[i] = (tfunc) (tests[i], &resultfiles[i], &expectfiles[i], &tags[i]);
+INSTR_TIME_SET_CURRENT(starttimes[i]);
 			}
 			wait_for_tests(pids + oldest, statuses + oldest,
+		   stoptimes + oldest,
 	
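The quoted diff is cut off before the part that produces the "(NNN ms)"
suffix shown in the sample output; presumably it looks roughly like the
sketch below, using the existing instr_time macros (the helper name and its
placement are guesses, not visible in the truncated patch):

	/* Hypothetical helper, not visible in the truncated diff above: turn the
	 * recorded start/stop times into the "(NNN ms)" suffix on the status line. */
	static void
	report_test_duration(instr_time starttime, instr_time stoptime)
	{
		instr_time	diff = stoptime;

		INSTR_TIME_SUBTRACT(diff, starttime);
		status(_(" (%.0f ms)"), INSTR_TIME_GET_MILLISEC(diff));
	}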

Re: Reporting script runtimes in pg_regress

2019-02-09 Thread Peter Geoghegan
On Sat, Feb 9, 2019 at 7:50 PM Tom Lane  wrote:
> I've wondered for some time whether we couldn't make a useful
> reduction in the run time of the PG regression tests by looking
> for scripts that run significantly longer than others in their
> parallel groups, and making an effort to trim the runtimes of
> those particular scripts.

> Does anyone else feel that this is interesting/useful data?

It definitely seems useful to me.

-- 
Peter Geoghegan



Re: Reporting script runtimes in pg_regress

2019-02-09 Thread Andres Freund
Hi,

On February 10, 2019 9:20:18 AM GMT+05:30, Tom Lane  wrote:
>I've wondered for some time whether we couldn't make a useful
>reduction in the run time of the PG regression tests by looking
>for scripts that run significantly longer than others in their
>parallel groups, and making an effort to trim the runtimes of
>those particular scripts.
>
>The first step in that of course is to get some data, so attached
>is a patch to pg_regress to cause it to print the runtime of each
>script.  This produces results like, say,
>
>parallel group (17 tests):  circle line timetz path lseg point macaddr
>macaddr8 time interval inet tstypes date box timestamp timestamptz
>polygon
> point... ok (35 ms)
> lseg ... ok (31 ms)
> line ... ok (23 ms)
> box  ... ok (135 ms)
> path ... ok (24 ms)
> polygon  ... ok (1256 ms)
> circle   ... ok (20 ms)
> date ... ok (69 ms)
> time ... ok (40 ms)
> timetz   ... ok (22 ms)
> timestamp... ok (378 ms)
> timestamptz  ... ok (378 ms)
> interval ... ok (50 ms)
> inet ... ok (56 ms)
> macaddr  ... ok (33 ms)
> macaddr8 ... ok (37 ms)
> tstypes  ... ok (62 ms)
>
>or on a rather slower machine,
>
>parallel group (8 tests):  hash_part reloptions partition_info identity
>partition_join partition_aggregate partition_prune indexing
> identity ... ok (3807 ms)
> partition_join   ... ok (10433 ms)
> partition_prune  ... ok (19370 ms)
> reloptions   ... ok (1166 ms)
> hash_part... ok (628 ms)
> indexing ... ok (22070 ms)
> partition_aggregate  ... ok (12731 ms)
> partition_info   ... ok (1373 ms)
>test event_trigger... ok (1953 ms)
>test fast_default ... ok (2689 ms)
>test stats... ok (1173 ms)
>
>Does anyone else feel that this is interesting/useful data?

Yes, it does. I've locally been playing with parallelizing isolationtester's 
schedule, and it's quite useful for coming up with a schedule that's optimized.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-02-09 Thread Haribabu Kommi
On Sat, Feb 9, 2019 at 4:07 PM Amit Kapila  wrote:

> On Fri, Feb 8, 2019 at 6:55 AM Haribabu Kommi 
> wrote:
> >
> > On Thu, Feb 7, 2019 at 9:31 PM Haribabu Kommi 
> wrote:
> >>
> >>
> >> This is because of larger xact_commit value than default configuration.
> With the changed server configuration, that leads to generate more parallel
> workers and every parallel worker operation is treated as an extra commit,
> because of this reason, the total number of commits increased, but the
> overall query performance is decreased.
> >>
> >> Is there any relation of transaction commits to performance?
> >>
> >> Is there any specific reason to consider the parallel worker activity
> also as a transaction commit? Especially in my observation, if we didn't
> consider the parallel worker activity as separate commits, the test doesn't
> show an increase in transaction commits.
> >
> >
> > The following statements shows the increase in the xact_commit value with
> > parallel workers. I can understand that workers updating the seq_scan
> stats
> > as they performed the seq scan.
> >
>

Thanks for your opinion.


> Yeah, that seems okay, however, one can say that for the scan they
> want to consider it as a single scan even if part of the scan is
> accomplished by workers or may be a separate counter for parallel
> workers scan.
>

OK.


> > Is the same applied to parallel worker transaction
> > commits also?
> >
>
> I don't think so.  It seems to me that we should consider it as a
> single transaction.  Do you want to do the leg work for this and try
> to come up with a patch?  On a quick look, I think we might want to
> change AtEOXact_PgStat so that the commits for parallel workers are
> not considered.  I think the caller has that information.
>

I tried to fix it by adding a check for whether we are in a parallel worker,
and counting the transaction into the stats based on that. Patch attached.

With this patch, parallel worker transactions are no longer counted, while
the rest of the stats such as seq_scan still get counted. IMO those still
may need to be counted, as they represent the number of tuples returned
and so on.
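A rough sketch of the idea (the attached patch may differ in detail; this
assumes the existing pgStatXactCommit/pgStatXactRollback counters in
pgstat.c and the IsParallelWorker() macro):

	void
	AtEOXact_PgStat(bool isCommit)
	{
		/*
		 * Don't bump the commit/rollback counters when running inside a
		 * parallel worker; the leader counts the transaction once.
		 * Table-level counters (seq_scan, tuples returned, ...) are left
		 * untouched.
		 */
		if (!IsParallelWorker())
		{
			if (isCommit)
				pgStatXactCommit++;
			else
				pgStatXactRollback++;
		}

		/* ... rest of the function unchanged ... */
	}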

Comments?

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Avoid-counting-parallel-worker-transactions-stats.patch
Description: Binary data


Re: libpq compression

2019-02-09 Thread Konstantin Knizhnik



On 10.02.2019 3:25, Tomas Vondra wrote:


On 2/9/19 3:02 PM, Konstantin Knizhnik wrote:


On 09.02.2019 1:38, Tomas Vondra wrote:

On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:

On 08.02.2019 21:57, Andres Freund wrote:

On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:

Frankly speaking, I do not think that such flexibility in choosing
compression algorithms is really needed.
I do not expect that there will be many situations where an old client
has to communicate with a new server or vice versa.
In most cases both client and server belong to the same Postgres
distribution and so implement the same compression algorithm.
As far as we are compressing only temporary data (traffic), the problem
of providing backward compatibility seems not so important.

I think we should outright reject any patch without compression type
negotiation.

Does it mean that it is necessary to support multiple compression
algorithms and make it possible to perform switch between them at
runtime?

IMHO the negotiation should happen at connection time, i.e. the server
should support connections compressed by different algorithms. Not sure
if that's what you mean by runtime.

AFAICS this is quite close to how negotiation of encryption algorithms
works, in TLS and so on. Client specifies supported algorithms, server
compares that to list of supported algorithms, deduces the encryption
algorithm and notifies the client.

To allow fall-back to uncompressed connection, use "none" as algorithm.
If there's no common algorithm, fail.

It is good analogue with SSL.
Yes, SSL protocol provides several ways of authentication, encryption,...
And there are several different libraries implementing SSL.
But Postgres is using only one of them: OpenSSL.
If I want to use some other library (for example to make it possible to
serialize and pass SSL session state to other
process), then there is no way to achieve it.


That's rather misleading. Firstly, it's true we only support OpenSSL at
the moment, but I do remember we've been working on adding support to a
bunch of other TLS libraries.

But more importantly, it's not the TLS library that's negotiated. It's
the encryption algorithms that is negotiated. The server is oblivious
which TLS library is used by the client (and vice versa), because the
messages are the same - what matters is that they agree on keys,
ciphers, etc. And those can differ/change between libraries or even
versions of the same library.

For us, the situation is the same - we have the messages specified by
the FE/BE protocol, and it's the algorithms that are negotiated.


Actually zstd also includes implementations of several compression
algorithms and chooses the one that best fits a particular data stream.
As in the case of SSL, the choice of algorithm is performed internally
inside zstd - not at the libpq level.


Really? I always thought zstd is a separate compression algorithm.
There's adaptive compression feature, but AFAIK that essentially tweaks
compression level based on network connection. Can you point me to the
sources or docs explaining this?

Anyway, this does not really change anything - it's internal zstd stuff.


Sorry if my explanation about the static and dynamic (at runtime) choice
was not correct.
This is how compression is toggled now:

#if HAVE_LIBZSTD
ZpqStream*
zpq_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
{
...
}
#endif

So if Postgres was configured with zstd, then this implementation is
included in the client and server Postgres libraries.
If Postgres is configured with zlib, then the zlib implementation will
be used.
This is similar with using compression and most of other configurable
features in Postgres.

If we want to provide dynamic choice at runtime, then we need to have
array with available compression algorithms:

#if HAVE_LIBZSTD
static ZpqStream*
zstd_create(zpq_tx_func tx_func, zpq_rx_func rx_func, void *arg)
{
...
}
#endif

ZpqCompressorImpl compressorImpl[] =
{
#if HAVE_LIBZSTD
{zstd_create, zstd_read,zstd_write,...},
#endif
#if HAVE_ZLIB
{zlib_create, zlib_read,zslib_write,...},
#endif
...
}


Yes, that's mostly what I've been imagining, except that you also need
some sort of identifier for the algorithm - a cstring at the beginning
of the struct should be enough, I guess.


And the most interesting case is when we load the library dynamically.
Each implementation is built as a separate library (for example
libpztd.so).
In this case we need to somehow specify the available libraries,
for example by placing them in a separate directory, or by specifying a
list of libraries in postgresql.conf.
Then we try to load this library using dlopen.  Such a library has
external dependencies on the corresponding compressor library (for example
-lz). The library can be successfully loaded only if the corresponding
compressor implementation is installed on the system.
This is the most flexible approach, allowing custom compressor
implementations to be provided.
Compression implementation can be organized as Postgres ext

Re: dsa_allocate() faliure

2019-02-09 Thread Robert Haas
On Sun, Feb 10, 2019 at 1:55 AM Thomas Munro
 wrote:
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.  The problem is that we
> clobber fpm->contiguous_pages with the earlier (and by now incorrect)
> value that we were holding in a local variable.

Yeah, I had similar bugs to that during the initial development work I
did on freepage.c, and that's why I got rid of some lazy recomputation
thing that I had tried at some point.  The version that got committed
brought that back again, but possibly it's got the same kind of
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: dsa_allocate() faliure

2019-02-09 Thread Robert Haas
On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
 wrote:
> ... but why would it do that?  I can reproduce cases where (for
> example) FreePageManagerPutInternal() returns 179, and then
> FreePageManagerLargestContiguous() returns 179, but then after
> FreePageBtreeCleanup() it returns 178.  At that point FreePageDump()
> says:
>
> btree depth 1:
>   77@0 l: 27(1) 78(178)
> freelists:
>   1: 27
>   129: 78(178)
>
> But at first glance it shouldn't be allocating pages, because it just
> does consolidation to try to convert to singleton format, and then it
> does recycle list cleanup using soft=true so that no allocation of
> btree pages should occur.

I think I see what's happening.  At the moment the problem occurs,
there is no btree - there is only a singleton range.  So
FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
then ends up in the section with the comment /* Not contiguous; we
need to initialize the btree. */.  And that section, sadly, does not
respect the 'soft' flag, so kaboom.  Something like the attached might
fix it.
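The attachment itself isn't reproduced here; from the description above, the
shape of the change is presumably something like this (a guess, not the
actual patch):

	else
	{
		/* Not contiguous; we need to initialize the btree. */

		/*
		 * Guessed fix: a soft insertion must not allocate a page for the
		 * btree root, so bail out and report that nothing was inserted
		 * instead of ignoring the flag.
		 */
		if (soft)
			return 0;

		/* ... existing btree-initialization code ... */
	}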

Boy, I love FreePageManagerDump!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


dsa-allocate-btree-init-soft.patch
Description: Binary data