Re: [GENERAL] error while autovacuuming

2010-10-03 Thread Gurjeet Singh
On Fri, Oct 1, 2010 at 9:16 PM, Craig Ringer wrote:

> On 2/10/2010 5:44 AM, Scott Marlowe wrote:
>
>> On Fri, Oct 1, 2010 at 2:06 PM, tamanna madaan
>>   wrote:
>>
>>> Hi All
>>>
>>> I am planning to upgrade  postgres to avoid this autovacuum problem.
>>>
>>>  From the release notes of postgres-8.1.x it seems this issue
>>> has been solved in postgres-8.1.6 and hence upgrading
>>> to latest 8.1 release i.e 8.1.21 will solve this issue.
>>>
>>> But I was just wondering if postgres-8.4 will also be having this fix
>>> (as I couldn't find it in release notes of postgres-8.4).
>>>
>>
>> 8.4 has much better autovacuuming, seeing as it's multithreaded and
>> you can have a lot more control over how it behaves.  This fix is
>> definitely in 8.4 and 8.3, or should I say the bug you hit was long
>> gone by the time the 8.4 branch was cut.
>>
>
> OP: Do be aware that if you jump straight to 8.4, there are some
> incompatibilities that may require changes to your code. The one people seem
> to be bitten by most frequently is the removal of implicit casts to text:
>
> http://www.postgresql.org/docs/current/static/release-8-3.html
>
> ... but, in general, you should read the release notes for each major
> version and see if you need to change anything to achieve a smooth upgrade.
>

I think, from the beginning the OP has been  unaware of the fact that
upgrading to latest minor release does not require a huge effort, and that
it involves very low risk of breaking any applications, Slony or otherwise.

OP should have been pointed to our versioning policy at
http://www.postgresql.org/support/versioning  . That would have kept this
thread very short.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


Re: [GENERAL] Streaming Recovery - Automated Monitoring

2010-10-03 Thread Karl Denninger
 On 10/3/2010 1:34 AM, Guillaume Lelarge wrote:
> Le 03/10/2010 07:07, Karl Denninger a écrit :
>>  On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote:
>>> I hope u checked point #11 
>>> http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use
>>>
>>> * *11.* You can calculate the replication lag by comparing the
>>>   current WAL write location on the primary with the last WAL
>>>   location received/replayed by the standby. They can be retrieved
>>>   using /pg_current_xlog_location/ on the primary and the
>>>   /pg_last_xlog_receive_location///pg_last_xlog_replay_location/
>>>   on the standby, respectively.
>>>
>>> $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)
>>>  pg_current_xlog_location 
>>> --
>>>  0/200
>>> (1 row)
>>>
>>> $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby 
>>> host)
>>>  pg_last_xlog_receive_location 
>>> ---
>>>  0/200
>>> (1 row)
>>>
>>> $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby 
>>> host)
>>>  pg_last_xlog_replay_location 
>>> --
>>>  0/200
>>> (1 row)
>>>
>>> Regds
>>> Rajesh Kumar Mallah.
>> Yes, I did.
>>
>> Now how do I get an arithmetic difference between the two?  There will
>> (usually) be a small difference between the master and slave on a busy
>> system - what I want to do is query both and if the difference in their
>> locations is greater than some defined size, start raising hell (e.g.
>> sending SMS to people, etc)
>>
>> I can SEE the difference, but I don't see a way to COMPUTE a difference,
>> and there does not appear to be a function that will accept the log file
>> location as an argument for conversion - the one documented for offsets
>> (which might otherwise work) does not work on the slave as I noted.
>>
>> With Slony there was a set of system tables that would tell me how many
>> unapplied changes were in the queue.  From this I could determine health
>> - if the number was more than some reasonably-small amount, something
>> was broken and alarms were to be raised.
>>
>> I'm looking for a way to implement the same sort of functionality here.
>>
>> ticker=# select pg_last_xlog_replay_location();
>>  pg_last_xlog_replay_location
>> --
>>  37A/327D1888
>> (1 row)
>>
>> ticker=# select pg_current_xlog_location();
>>  pg_current_xlog_location
>> --
>>  37A/3280DCB8
>> (1 row)
>>
>> How do I get an arithmetic difference between these two
>> programmatically, and will such always be monoatomically increasing
>> (that is, will they ever roll over, thereby giving me a potential
>> NEGATIVE difference?)
>>
> pgPool-II does that computation. You should check that in its source
> code. File pool_worker_child.c, function check_replication_time_lag().
> It creates a long value from the LSN returned by these functions. Here
> is the computation:
>
>   lsn = xlogid * 16 * 1024 * 1024 * 255 + xrecoff;
>
> In your example, xlogid is 37A and xrecoff is 327D1888 on the slave and
> 3280DCB8 on the master. The hexadecimal values were first converted to
> their unsigned decimal integer representation (same file, function
> text_to_lsn()).
>
> You should really get a look at this file. It's probably a better
> explanation than what I'm trying to do :)
>

Here's a little program to do it - change "MAX_OFFSET" and modify
(particularly the "notify" function) to suit and place a file called
"CHECK.cluster" in the following form in the home directory of whatever
this runs as (it will look in the home directory of the euid of the
process):

master-name connection-string-to-connect-to-master
slave1-name slave-string-to-connect
slave2-name slave-string-to-connect
..

It will emit a notification for each slave that is more than
"MAX_OFFSET" behind, or if any slave is not in recovery mode (bad; it
thinks it's a master!) or worse, if the MASTER is in recovery mode, or
if it can't talk to any of the declared nodes.

Note that "CHECK.cluster" contains connection information so make sure
you run this as someone and with a home directory for that someone
appropriately permitted to prevent leakage of the connection credentials.

Ain't gonna claim it's elegant, but it was something I was able to
quickly bang out in a few minutes and it works.  Stuck in the CRON if
you have the cron's email going somewhere that will get noticed fast
(e.g. a Blackberry email, etc) you can run this on whatever interval you
need and it'll do the job.

Postgresql's development team can consider this a contribution to the
codebase if someone wants to clean it up a bit and include it.  You
could obviously pick up the config file as an argument rather than using
the running euid
of the process to grab it; I'm using it in an environment where the
latter works better for me - YMMV.


/*
 * check-replication - Checks the replication state of a Postgresql
maste

Re: [GENERAL] Streaming Recovery - Automated Monitoring - CODE LINK

2010-10-03 Thread Karl Denninger
 On 10/3/2010 3:44 PM, Karl Denninger wrote:
>  On 10/3/2010 1:34 AM, Guillaume Lelarge wrote:
>> Le 03/10/2010 07:07, Karl Denninger a écrit :
>>>  On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote:
 I hope u checked point #11 
 http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use

 * *11.* You can calculate the replication lag by comparing the
   current WAL write location on the primary with the last WAL
   location received/replayed by the standby. They can be retrieved
   using /pg_current_xlog_location/ on the primary and the
   /pg_last_xlog_receive_location///pg_last_xlog_replay_location/
   on the standby, respectively.

 $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)
  pg_current_xlog_location 
 --
  0/200
 (1 row)

 $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby 
 host)
  pg_last_xlog_receive_location 
 ---
  0/200
 (1 row)

 $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby 
 host)
  pg_last_xlog_replay_location 
 --
  0/200
 (1 row)

 Regds
 Rajesh Kumar Mallah.
>>> Yes, I did.
>>>
>>> Now how do I get an arithmetic difference between the two?  There will
>>> (usually) be a small difference between the master and slave on a busy
>>> system - what I want to do is query both and if the difference in their
>>> locations is greater than some defined size, start raising hell (e.g.
>>> sending SMS to people, etc)
>>>
>>> I can SEE the difference, but I don't see a way to COMPUTE a difference,
>>> and there does not appear to be a function that will accept the log file
>>> location as an argument for conversion - the one documented for offsets
>>> (which might otherwise work) does not work on the slave as I noted.
>>>
>>> With Slony there was a set of system tables that would tell me how many
>>> unapplied changes were in the queue.  From this I could determine health
>>> - if the number was more than some reasonably-small amount, something
>>> was broken and alarms were to be raised.
>>>
>>> I'm looking for a way to implement the same sort of functionality here.
>>>
>>> ticker=# select pg_last_xlog_replay_location();
>>>  pg_last_xlog_replay_location
>>> --
>>>  37A/327D1888
>>> (1 row)
>>>
>>> ticker=# select pg_current_xlog_location();
>>>  pg_current_xlog_location
>>> --
>>>  37A/3280DCB8
>>> (1 row)
>>>
>>> How do I get an arithmetic difference between these two
>>> programmatically, and will such always be monoatomically increasing
>>> (that is, will they ever roll over, thereby giving me a potential
>>> NEGATIVE difference?)
>>>
>> pgPool-II does that computation. You should check that in its source
>> code. File pool_worker_child.c, function check_replication_time_lag().
>> It creates a long value from the LSN returned by these functions. Here
>> is the computation:
>>
>>   lsn = xlogid * 16 * 1024 * 1024 * 255 + xrecoff;
>>
>> In your example, xlogid is 37A and xrecoff is 327D1888 on the slave and
>> 3280DCB8 on the master. The hexadecimal values were first converted to
>> their unsigned decimal integer representation (same file, function
>> text_to_lsn()).
>>
>> You should really get a look at this file. It's probably a better
>> explanation than what I'm trying to do :)
>>
> Here's a little program to do it - change "MAX_OFFSET" and modify
> (particularly the "notify" function) to suit and place a file called
> "CHECK.cluster" in the following form in the home directory of whatever
> this runs as (it will look in the home directory of the euid of the
> process):
>
> master-name connection-string-to-connect-to-master
> slave1-name slave-string-to-connect
> slave2-name slave-string-to-connect
> ..
>
> It will emit a notification for each slave that is more than
> "MAX_OFFSET" behind, or if any slave is not in recovery mode (bad; it
> thinks it's a master!) or worse, if the MASTER is in recovery mode, or
> if it can't talk to any of the declared nodes.
>
> Note that "CHECK.cluster" contains connection information so make sure
> you run this as someone and with a home directory for that someone
> appropriately permitted to prevent leakage of the connection credentials.
>
> Ain't gonna claim it's elegant, but it was something I was able to
> quickly bang out in a few minutes and it works.  Stuck in the CRON if
> you have the cron's email going somewhere that will get noticed fast
> (e.g. a Blackberry email, etc) you can run this on whatever interval you
> need and it'll do the job.
>
> Postgresql's development team can consider this a contribution to the
> codebase if someone wants to clean it up a bit and include it.  You
> could obviously pick up the config file as an argument rather than using
> the running