Re: BUG? Slave don't reconnect to the master

2020-09-29 Thread Jehan-Guillaume de Rorthais
On Thu, 24 Sep 2020 15:22:46 +0300
Олег Самойлов  wrote:

> Hi, Jehan.
> 
> > On 9 Sep 2020, at 18:19, Jehan-Guillaume de Rorthais 
> > wrote:
> > 
> > On Mon, 7 Sep 2020 23:46:17 +0300
> > Олег Самойлов  wrote:
> >   
> >>> [...]  
> >> 10:30:55.965 FATAL:  terminating walreceiver process dpue to
> >> administrator cmd 10:30:55.966 LOG:  redo done at 0/1600C4B0
> >> 10:30:55.966 LOG:  last completed transaction was at log time
> >> 10:25:38.76429 10:30:55.968 LOG:  selected new timeline ID: 4
> >> 10:30:56.001 LOG:  archive recovery complete
> >> 10:30:56.005 LOG:  database system is ready to accept connections  
> >   
> >> The slave with didn't reconnected replication, tuchanka3c. Also I
> >> separated logs copied from the old master by a blank line:
> >> 
> >> [...]
> >> 
> >> 10:20:25.168 LOG:  database system was interrupted; last known up at
> >> 10:20:19 10:20:25.180 LOG:  entering standby mode
> >> 10:20:25.181 LOG:  redo starts at 0/1198
> >> 10:20:25.183 LOG:  consistent recovery state reached at 0/11000A68
> >> 10:20:25.183 LOG:  database system is ready to accept read only
> >> connections 10:20:25.193 LOG:  started streaming WAL from primary at
> >> 0/1200 on tl 3 10:25:05.370 LOG:  could not send data to client:
> >> Connection reset by peer 10:26:38.655 FATAL:  terminating walreceiver
> >> due to timeout 10:26:38.655 LOG:  record with incorrect prev-link
> >> 0/1200C4B0 at 0/1600C4D8  
> > 
> > This message appear before the effective promotion of tuchanka3b. Do you
> > have logs about what happen *after* the promotion?  
>  
>  This is end of the slave log. Nothing. Just absent replication.
> >>> 
> >>> This is unusual. Could you log some more details about replication
> >>> tryouts to your PostgreSQL logs? Set log_replication_commands and lower
> >>> log_min_messages to debug ?
> >> 
> >> Sure, this is PostgreSQL logs for the cluster tuchanka3.
> >> Tuchanka3a is an old (failed) master.  
> > 
> > According to your logs:
> > 
> > 20:29:41 tuchanka3a: freeze
> > 20:30:39 tuchanka3c: wal receiver timeout (default 60s timeout)
> > 20:30:39 tuchanka3c: switched to archives, and error'ed (expected)
> > 20:30:39 tuchanka3c: switched to stream again (expected)
> > no more news from this new wal receiver 
> > 20:34:21 tuchanka3b: promoted
> > 
> > I'm not sure where your floating IP is located at 20:30:39, but I suppose it
> > is still on tuchanka3a as the wal receiver don't hit any connection error
> > and tuchanka3b is not promoted yet.  
> 
> I think so.
> 
> > 
> > So at this point, I suppose the wal receiver is stuck in libpqrcv_connect
> > waiting for frozen tuchanka3a to answer, with no connection timeout. You
> > might track tcp sockets on tuchanka3a to confirm this.  
> 
> I don't know how to do this.

Use ss, see its manual page. Hare is an example, using standard 5432 pgsql port:

  ss -tapn 'dport = 5432 or sport = 5432'

Look for Local and Peer addresses and their status.

> > To avoid such a wait, try to add eg. connect_timeout=2 to your
> > primary_conninfo parameter. See:
> > https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
> >   
> 
> Nope, this was not enough. But I went further and I added tcp keepalive
> options too. So now paf file, for instance in tuchanka3c, is:
> 
> # recovery.conf for krogan3, pgsqlms pacemaker module
> primary_conninfo = 'host=krogan3 user=replicant application_name=tuchanka3c
> connect_timeout=5 keepalives=1 keepalives_idle=1 keepalives_interval=3
> keepalives_count=3' recovery_target_timeline = 'latest' standby_mode = 'on'
> 
> And now the problem with PostgreSQL-STOP is solved. But I surprised, why this
> was needed? I though that wal_receiver_timeout must be enough for this case.

Because wal_receiver_timeout apply on already established and streaming
connections, when the server end streaming becomes silent.

The timeout you have happen during the connection establishment, where
connect_timeout takes effect.

In regards with keepalive parameters, I am a bit surprised. According to the
source code, parameters defaults are:

  keepalives=1
  keepalives_idle=1
  keepalives_interval=1
  keepalives_count=1

But I just had a quick look there, so I probably miss something.

Regards,




Re: BUG? Slave don't reconnect to the master

2020-09-29 Thread Олег Самойлов



> On 29 Sep 2020, at 12:31, Jehan-Guillaume de Rorthais  wrote:
> 
> On Thu, 24 Sep 2020 15:22:46 +0300
> Олег Самойлов  wrote:
> 
>> Hi, Jehan.
>> 
>>> On 9 Sep 2020, at 18:19, Jehan-Guillaume de Rorthais 
>>> wrote:
>>> 
>>> On Mon, 7 Sep 2020 23:46:17 +0300
>>> Олег Самойлов  wrote:
>>> 
> [...]  
 10:30:55.965 FATAL:  terminating walreceiver process dpue to
 administrator cmd 10:30:55.966 LOG:  redo done at 0/1600C4B0
 10:30:55.966 LOG:  last completed transaction was at log time
 10:25:38.76429 10:30:55.968 LOG:  selected new timeline ID: 4
 10:30:56.001 LOG:  archive recovery complete
 10:30:56.005 LOG:  database system is ready to accept connections  
>>> 
 The slave with didn't reconnected replication, tuchanka3c. Also I
 separated logs copied from the old master by a blank line:
 
 [...]
 
 10:20:25.168 LOG:  database system was interrupted; last known up at
 10:20:19 10:20:25.180 LOG:  entering standby mode
 10:20:25.181 LOG:  redo starts at 0/1198
 10:20:25.183 LOG:  consistent recovery state reached at 0/11000A68
 10:20:25.183 LOG:  database system is ready to accept read only
 connections 10:20:25.193 LOG:  started streaming WAL from primary at
 0/1200 on tl 3 10:25:05.370 LOG:  could not send data to client:
 Connection reset by peer 10:26:38.655 FATAL:  terminating walreceiver
 due to timeout 10:26:38.655 LOG:  record with incorrect prev-link
 0/1200C4B0 at 0/1600C4D8  
>>> 
>>> This message appear before the effective promotion of tuchanka3b. Do you
>>> have logs about what happen *after* the promotion?  
>> 
>> This is end of the slave log. Nothing. Just absent replication.
> 
> This is unusual. Could you log some more details about replication
> tryouts to your PostgreSQL logs? Set log_replication_commands and lower
> log_min_messages to debug ?
 
 Sure, this is PostgreSQL logs for the cluster tuchanka3.
 Tuchanka3a is an old (failed) master.  
>>> 
>>> According to your logs:
>>> 
>>> 20:29:41 tuchanka3a: freeze
>>> 20:30:39 tuchanka3c: wal receiver timeout (default 60s timeout)
>>> 20:30:39 tuchanka3c: switched to archives, and error'ed (expected)
>>> 20:30:39 tuchanka3c: switched to stream again (expected)
>>>no more news from this new wal receiver 
>>> 20:34:21 tuchanka3b: promoted
>>> 
>>> I'm not sure where your floating IP is located at 20:30:39, but I suppose it
>>> is still on tuchanka3a as the wal receiver don't hit any connection error
>>> and tuchanka3b is not promoted yet.  
>> 
>> I think so.
>> 
>>> 
>>> So at this point, I suppose the wal receiver is stuck in libpqrcv_connect
>>> waiting for frozen tuchanka3a to answer, with no connection timeout. You
>>> might track tcp sockets on tuchanka3a to confirm this.  
>> 
>> I don't know how to do this.
> 
> Use ss, see its manual page. Hare is an example, using standard 5432 pgsql 
> port:
> 
>  ss -tapn 'dport = 5432 or sport = 5432'
> 
> Look for Local and Peer addresses and their status.
> 
>>> To avoid such a wait, try to add eg. connect_timeout=2 to your
>>> primary_conninfo parameter. See:
>>> https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
>>>   
>> 
>> Nope, this was not enough. But I went further and I added tcp keepalive
>> options too. So now paf file, for instance in tuchanka3c, is:
>> 
>> # recovery.conf for krogan3, pgsqlms pacemaker module
>> primary_conninfo = 'host=krogan3 user=replicant application_name=tuchanka3c
>> connect_timeout=5 keepalives=1 keepalives_idle=1 keepalives_interval=3
>> keepalives_count=3' recovery_target_timeline = 'latest' standby_mode = 'on'
>> 
>> And now the problem with PostgreSQL-STOP is solved. But I surprised, why this
>> was needed? I though that wal_receiver_timeout must be enough for this case.
> 
> Because wal_receiver_timeout apply on already established and streaming
> connections, when the server end streaming becomes silent.
> 
> The timeout you have happen during the connection establishment, where
> connect_timeout takes effect.
> 
> In regards with keepalive parameters, I am a bit surprised. According to the
> source code, parameters defaults are:
> 
>  keepalives=1
>  keepalives_idle=1
>  keepalives_interval=1
>  keepalives_count=1
> 
> But I just had a quick look there, so I probably miss something.

According to the official documentation, if keepalive parameters are not 
specified, then used default value from the OS.
https://www.postgresql.org/docs/12/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS

Cite:  A value of 0 (the default) selects the operating system's default.

I don't know what is the default values for the CentOS 7. I can only assert 
that adding keepalive is solved issue with Postgres-STOP test and looked like 
problems with ForkBomb 

Re: BUG? Slave don't reconnect to the master

2020-09-29 Thread Jehan-Guillaume de Rorthais
On Tue, 29 Sep 2020 16:22:18 +0300
Олег Самойлов  wrote:
[...]
> > In regards with keepalive parameters, I am a bit surprised. According to the
> > source code, parameters defaults are:
> > 
> >  keepalives=1
> >  keepalives_idle=1
> >  keepalives_interval=1
> >  keepalives_count=1
> > 
> > But I just had a quick look there, so I probably miss something.  

I did miss something. See bellow.

> According to the official documentation, if keepalive parameters are not
> specified, then used default value from the OS.
> https://www.postgresql.org/docs/12/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS

This settings are related to the server side.

The one you are setting in primary_conninfo are related to the client side:

https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS

> Cite:  A value of 0 (the default) selects the operating system's default.
> 
> I don't know what is the default values for the CentOS 7.

sysctl -ar net.ipv4.tcp_keepalive_

> I can only assert that adding keepalive is solved issue with Postgres-STOP
> test and looked like problems with ForkBomb too. And keep in mind, I still
> use PostgreSQL 11. May be 12 or 13 something changed.

Sorry, there was some misunderstanding of the source code on my side. The
"keepalives" parameter is enabled by default on client side, but if you don't
set keepalives_idle, keepalives_interval and keepalives_count, they fallback to
system default ones which are 7200, 75 and 9 (on Debian and CentOS). So more
than 2 hours.

Regards,




Re: BUG? Slave don't reconnect to the master

2020-09-29 Thread Олег Самойлов



> On 29 Sep 2020, at 17:16, Jehan-Guillaume de Rorthais  wrote:
> 
> Sorry, there was some misunderstanding of the source code on my side. The
> "keepalives" parameter is enabled by default on client side, but if you don't
> set keepalives_idle, keepalives_interval and keepalives_count, they fallback 
> to
> system default ones which are 7200, 75 and 9 (on Debian and CentOS). So more
> than 2 hours.

I can confirm. After roughly 2 hours the situation was somehow resolved and 
tests continued.



Compile 12.2 with specific OpenSSL libraries

2020-09-29 Thread Joshua Poehls
I'm compiling PG 12.2 (on CentOS 7) and need to ensure that it uses a
specific version of OpenSSL (1.1.1h). I've added what I think are the
appropriate `./configure` flags (see below) and compilation is successful
but when I run `ldd -d initdb` or `readelf` I can see that it linked to a
different version of libcrypto/libssl.

--with-openssl --with-includes=/path/to/ssl/include
--with-libraries=/path/to/ssl/lib

My theory is that even though I'm adding --with-libraries/--with-includes,
the build script is finding and using the version of OpenSSL in /lib64/
because it has higher precedence somewhere.

How can I force the build to use the version of OpenSSL in the directories
I specify and not in the system /lib64/ directory?

--
Joshua Poehls
jos...@poehls.me


Re: Compile 12.2 with specific OpenSSL libraries

2020-09-29 Thread Tom Lane
Joshua Poehls  writes:
> I'm compiling PG 12.2 (on CentOS 7) and need to ensure that it uses a
> specific version of OpenSSL (1.1.1h). I've added what I think are the
> appropriate `./configure` flags (see below) and compilation is successful
> but when I run `ldd -d initdb` or `readelf` I can see that it linked to a
> different version of libcrypto/libssl.

> --with-openssl --with-includes=/path/to/ssl/include
> --with-libraries=/path/to/ssl/lib

> My theory is that even though I'm adding --with-libraries/--with-includes,
> the build script is finding and using the version of OpenSSL in /lib64/
> because it has higher precedence somewhere.

Linux is generally pretty unfriendly to putting shlibs in nonstandard
places.  If you want to do that, you need to add an "rpath" spec to
the calling programs, or set LD_LIBRARY_PATH, or fool around with the
dynamic linker's configuration files.  "man ld.so" will get you started.

regards, tom lane




Re: Compile 12.2 with specific OpenSSL libraries

2020-09-29 Thread Joshua Poehls
rpath and LD_LIBRARY_PATH are only relevant for the runtime
linking/resolution, right? My concern is that at *compilation time* the
wrong version of OpenSSL is being picked up. Is there a way to see the full
library/header path the compiler is using so that I can verify the path I
specified with --with-headers and --with-libraries is taking precedence
over the system paths?

On Tue, Sep 29, 2020 at 1:25 PM Tom Lane  wrote:

> Joshua Poehls  writes:
> > I'm compiling PG 12.2 (on CentOS 7) and need to ensure that it uses a
> > specific version of OpenSSL (1.1.1h). I've added what I think are the
> > appropriate `./configure` flags (see below) and compilation is successful
> > but when I run `ldd -d initdb` or `readelf` I can see that it linked to a
> > different version of libcrypto/libssl.
>
> > --with-openssl --with-includes=/path/to/ssl/include
> > --with-libraries=/path/to/ssl/lib
>
> > My theory is that even though I'm adding
> --with-libraries/--with-includes,
> > the build script is finding and using the version of OpenSSL in /lib64/
> > because it has higher precedence somewhere.
>
> Linux is generally pretty unfriendly to putting shlibs in nonstandard
> places.  If you want to do that, you need to add an "rpath" spec to
> the calling programs, or set LD_LIBRARY_PATH, or fool around with the
> dynamic linker's configuration files.  "man ld.so" will get you started.
>
> regards, tom lane
>


-- 
--
Joshua Poehls
jos...@poehls.me


Re: Compile 12.2 with specific OpenSSL libraries

2020-09-29 Thread Tom Lane
Joshua Poehls  writes:
> rpath and LD_LIBRARY_PATH are only relevant for the runtime
> linking/resolution, right? My concern is that at *compilation time* the
> wrong version of OpenSSL is being picked up.

No; --with-libraries will control what's linked against.  The problem
you face is that the dynamic linker has its own idea of how to resolve
the library references at runtime.  (ldd just reports on what the
dynamic linker would do; there's no trace in the executable file of
exactly which copy of a shlib was linked against.)

regards, tom lane




Re: Compile 12.2 with specific OpenSSL libraries

2020-09-29 Thread Joshua Poehls
Thanks, Tom. I really appreciate you taking the time to help me out!

I think I'm good now. I ran `readelf -d libpq.so | grep runpath` and I see
that the path I specified in --with-libraries is present (albeit after my
system path). It also appears the version I expected of libssl is being
used (libssl.so.1.1).

At this point I think it's fairly straightforward to set the rpath to lock
down the runtime linking as you mentioned. Thanks again!

On Tue, Sep 29, 2020 at 1:33 PM Tom Lane  wrote:

> Joshua Poehls  writes:
> > rpath and LD_LIBRARY_PATH are only relevant for the runtime
> > linking/resolution, right? My concern is that at *compilation time* the
> > wrong version of OpenSSL is being picked up.
>
> No; --with-libraries will control what's linked against.  The problem
> you face is that the dynamic linker has its own idea of how to resolve
> the library references at runtime.  (ldd just reports on what the
> dynamic linker would do; there's no trace in the executable file of
> exactly which copy of a shlib was linked against.)
>
> regards, tom lane
>


-- 
--
Joshua Poehls
jos...@poehls.me


Re: Gurjeet Singh Index Adviser User Interface

2020-09-29 Thread Yessica Brinkmann
I will greatly appreciate a help with this topic please. I really need to use that interface to be able to test my thesis. And I am not being able to use.Best regards,Yessica Brinkmann Mensaje original Asunto: Gurjeet Singh Index Adviser User InterfaceDe: Yessica Brinkmann Para: pgsql-general CC: 
Goodnight,I would like to ask the following question:As some of you may recall, I was doing my university thesis as a modification of Gurjeet Singh's Index Adviser.Now I have finished the programming part.But I have the following problem:Gurjeet Singh's Index Adviser readme describes how to use an Index Adviser interface called pg_advise_index tool.The readme mentions the following:i) pg_advise_index tool.   -Create a file that contains all the queries (semicolon terminated; maybe multi-line) that are expected to be executed by the application; andfeed this file to the pg_advise_index tool with appropriate options.pg_advise_index -d DB -h host -U user -s 10M -o advisory.sql workload.sqlpg_advise_index will open a connection with the PostgreSQL server bysetting appropriate session level options that will force the backend to loadthe pg_index_adviser plugin. It will then prepend the keywords EXPLAIN to eachof the queries found in the workload file, and execute them against the backend.For each query EXPLAINed, the backend will generate advice for each index thatmight have been beneficial in executing these queries.At the end, pg_advise_index will enumerate all the indexes suggested forthe current session, and output the CREATE INDEX statements for each of them.Optinally, if the -size option was specified, pg_advise_index will output suggestionsfor only those indexes, that fit into that size.-- To test my thesis, I need to use this interface pg_advise_index tool, but unfortunately I have not been able to use it yet.I would like to know if any of you have any experience testing or using this interface, or if you understand what the readme says anyway.In that case, please, can you help me. Now to test my thesis.I've actually already tried to run this interface pg_advise_index tool in various ways but haven't been able to.What I don't understand specifically is from which directory should I run the commandpg_advise_index -d DB -h host -U user -s 10M -o advisory.sql workload.sqlI will greatly appreciate any help, please.
The Index Adviser readme link is as follows:

https://github.com/gurjeet/pg_adviser/blob/master/index_adviser/README.index_adviserBest regards,

Yessica Brinkmann


Re: Gurjeet Singh Index Adviser User Interface

2020-09-29 Thread Rob Sargent




On 9/29/20 3:46 PM, Yessica Brinkmann wrote:
I will greatly appreciate a help with this topic please. I really need 
to use that interface to be able to test my thesis. And I am not being 
able to use.

Best regards,
Yessica Brinkmann


I didn't see any reported error messages in your post.

And you followed all the installation instructions: apply patch, compile 
install postgres, etc?  Not for the faint of heart, to be sure.


As presented

pg_advise_index -d DB -h host -U user -s 10M -o advisory.sql workload.sql

would need to be run in the directory containing advisory.sql and your 
PATH would need to include the directory containing the 
'pg_advise_index' executable.







Re: Gurjeet Singh Index Adviser User Interface

2020-09-29 Thread Yessica Brinkmann
 Goodnight,
Thank you very much for the answer.
I followed all the installation instructions: apply patch, compile
install postgres, etc.
I just tried everything with the UI ii) Manually (through psql session),
which is also mentioned in the readme, and everything works for me. But to
better test my thesis, I would also need to use the other user interface
mentioned in the readme and that is precisely i) pg_advise_index tool.
I understand what you are telling me about the directory. I will be testing
this way.
Many thanks.
Best regards,
Yessica brinkmann

El mar., 29 sept. 2020 a las 18:21, Rob Sargent ()
escribió:

>
>
> On 9/29/20 3:46 PM, Yessica Brinkmann wrote:
> > I will greatly appreciate a help with this topic please. I really need
> > to use that interface to be able to test my thesis. And I am not being
> > able to use.
> > Best regards,
> > Yessica Brinkmann
>
> I didn't see any reported error messages in your post.
>
> And you followed all the installation instructions: apply patch, compile
> install postgres, etc?  Not for the faint of heart, to be sure.
>
> As presented
>
> pg_advise_index -d DB -h host -U user -s 10M -o advisory.sql workload.sql
>
> would need to be run in the directory containing advisory.sql and your
> PATH would need to include the directory containing the
> 'pg_advise_index' executable.
>
>
>
>
>

Libre
de virus. www.avg.com

<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>