[GENERAL] 9.1.11 - many backends in "semtimedop" syscall

2014-03-06 Thread hubert depesz lubaczewski
Hi,
Ihave following situation:
amazon, but not RDS - just plain EC2, with ubuntu system.
zone reclaiming is disabled (at least in my system, no idea about host).
and there is no "transparent huge page support".

today we had at least 3 cases where bunch of abckends (~40) gets
"interlocked".
Some of them are shown as "waiting" in pg_stat_activity, but not all.
simple "insert into table ()" - without any triggers - gets killed after
10 minutes with no visible progress.

From what I saw - most of the backends are "locked" (though it's not
visible in pg_stat_actrivity) on the same table.

Did take coredump from one of the backends, but there are no symbols, so
the backtrace is just:

root@xx:/mnt# gdb --batch --quiet -ex "thread apply all bt full" -ex 
"quit" /usr/lib/postgresql/9.1/bin/postgres pg.core.21422
[New LWP 21422]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres'.
#0  0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (LWP 21422):
#0  0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x005f65e8 in PGSemaphoreLock ()
No symbol table info available.
#2  0x00636125 in LWLockAcquire ()
No symbol table info available.
#3  0x00630f91 in LockAcquireExtended ()
No symbol table info available.
#4  0x0062f88c in LockRelationOid ()
No symbol table info available.
#5  0x00470f6d in relation_open ()
No symbol table info available.
#6  0x0047b013 in index_open ()
No symbol table info available.
#7  0x0057bb4c in ExecOpenIndices ()
No symbol table info available.
#8  0x005894c8 in ExecInitModifyTable ()
No symbol table info available.
#9  0x0057266a in ExecInitNode ()
No symbol table info available.
#10 0x00570e4a in standard_ExecutorStart ()
No symbol table info available.
#11 0x00593406 in ?? ()
No symbol table info available.
#12 0x00593947 in SPI_execute_plan_with_paramlist ()
No symbol table info available.
#13 0x7ff8c34c8aed in ?? () from
/usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#14 0x7ff8c34c9716 in ?? () from
/usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#15 0x7ff8c34ca252 in ?? () from
/usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#16 0x7ff8c34ca252 in ?? () from
/usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#17 0x7ff8c34cc9dc in ?? () from
/usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#18 0x7ff8c34ccf01 in plpgsql_exec_trigger () from
/usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#19 0x7ff8c34c214a in plpgsql_call_handler () from
/usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#20 0x00709d67 in ?? ()
No symbol table info available.
#21 0x00555fda in ?? ()
No symbol table info available.
#22 0x0055668f in ?? ()
No symbol table info available.
#23 0x0055ba6a in AfterTriggerEndQuery ()
No symbol table info available.
#24 0x0056febf in standard_ExecutorFinish ()
No symbol table info available.
#25 0x00645a2a in ?? ()
No symbol table info available.
#26 0x00645c13 in ?? ()
No symbol table info available.
#27 0x00646962 in PortalRun ()
No symbol table info available.
#28 0x0064274a in PostgresMain ()
No symbol table info available.
#29 0x00604443 in ?? ()
No symbol table info available.
#30 0x00604eb1 in PostmasterMain ()
No symbol table info available.
#31 0x0045a720 in main ()
No symbol table info available.


What could that be, and how to get rid of the problem?

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/


signature.asc
Description: Digital signature


[GENERAL] extract psql meta-commands into library?

2014-03-06 Thread Catherine Devlin
I'd like to provide access to psql's "backslash" meta-commands for
IPython's %sql magic, as well as some other non-psql tools.  When I
asked about standalone implementations of the meta-commands, Craig
Ringer suggested that extracting them from psql into a libpsql library
(which psql would then use) would make that easy for all matter of
external uses.

Does this sound doable / of interest?

(Originally asked at
http://stackoverflow.com/questions/22211916/standalone-psql-meta-command-implementation)

ipython-sql: https://pypi.python.org/pypi/ipython-sql/0.3.1

Thanks!
-- 
- Catherine
http://catherinedevlin.blogspot.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall

2014-03-06 Thread Tom Lane
hubert depesz lubaczewski  writes:
> Thread 1 (LWP 21422):
> #0  0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #1  0x005f65e8 in PGSemaphoreLock ()
> No symbol table info available.
> #2  0x00636125 in LWLockAcquire ()
> No symbol table info available.
> #3  0x00630f91 in LockAcquireExtended ()
> No symbol table info available.
> #4  0x0062f88c in LockRelationOid ()
> No symbol table info available.
> #5  0x00470f6d in relation_open ()
> No symbol table info available.

Huh.  Looks like it's blocked trying to acquire one of the lock-partition
LWLocks, which is odd because those ought never be held very long.
Somebody else has failed to release that LWLock, looks like.

Did you by any chance capture stack traces from all of the backends?
The interesting one would be the one that *doesn't* look like this.
Or possibly there's more than one such.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall

2014-03-06 Thread hubert depesz lubaczewski
On Thu, Mar 06, 2014 at 11:56:06AM -0500, Tom Lane wrote:
> hubert depesz lubaczewski  writes:
> > Thread 1 (LWP 21422):
> > #0  0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
> > No symbol table info available.
> > #1  0x005f65e8 in PGSemaphoreLock ()
> > No symbol table info available.
> > #2  0x00636125 in LWLockAcquire ()
> > No symbol table info available.
> > #3  0x00630f91 in LockAcquireExtended ()
> > No symbol table info available.
> > #4  0x0062f88c in LockRelationOid ()
> > No symbol table info available.
> > #5  0x00470f6d in relation_open ()
> > No symbol table info available.
> 
> Huh.  Looks like it's blocked trying to acquire one of the lock-partition
> LWLocks, which is odd because those ought never be held very long.
> Somebody else has failed to release that LWLock, looks like.
> 
> Did you by any chance capture stack traces from all of the backends?
> The interesting one would be the one that *doesn't* look like this.
> Or possibly there's more than one such.

I didn't have a chance to do it. Can try if there is a way to get trace
*without* making core (sorry, my c/gdb knowledge is very, very limited).

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/


signature.asc
Description: Digital signature


Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall

2014-03-06 Thread Tom Lane
hubert depesz lubaczewski  writes:
> I didn't have a chance to do it. Can try if there is a way to get trace
> *without* making core (sorry, my c/gdb knowledge is very, very limited).

Sure, you just attach to the process:

$ gdb /path/to/postgres PID-of-process
gdb> bt
gdb> quit

This is usually preferable to forcing a core dump.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall

2014-03-06 Thread hubert depesz lubaczewski
On Thu, Mar 06, 2014 at 12:02:50PM -0500, Tom Lane wrote:
> hubert depesz lubaczewski  writes:
> > I didn't have a chance to do it. Can try if there is a way to get trace
> > *without* making core (sorry, my c/gdb knowledge is very, very limited).
> 
> Sure, you just attach to the process:
> 
>   $ gdb /path/to/postgres PID-of-process
>   gdb> bt
>   gdb> quit
> 
> This is usually preferable to forcing a core dump.

Thank you. If the problem will strike again, I will do it on all (or
most, depending how fast I can make it) backends.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/


signature.asc
Description: Digital signature


Re: [GENERAL] extract psql meta-commands into library?

2014-03-06 Thread Tom Lane
Catherine Devlin  writes:
> I'd like to provide access to psql's "backslash" meta-commands for
> IPython's %sql magic, as well as some other non-psql tools.  When I
> asked about standalone implementations of the meta-commands, Craig
> Ringer suggested that extracting them from psql into a libpsql library
> (which psql would then use) would make that easy for all matter of
> external uses.

> Does this sound doable / of interest?

Hm ... the code in psql's describe.c is not terribly conducive to that.
Parsing of the backslash command, execution of the query/queries, and
presentation of the results is all rather tightly bound up; you'd have
to think about how to decouple those.

If you could do it in a way that didn't result in a quantum jump in
the complexity/unreadability of the code, I think there'd be interest.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
For starters, this happened again this morning (no data prior to 4:45 am and 
sequence reset), so whatever is going on appears to be reoccurring. Also, I 
forgot to mention if it is significant: this is running on slackware liunux 14.0

On Mar 5, 2014, at 1:00 PM, Adrian Klaver  wrote:

> On 03/05/2014 10:22 AM, Israel Brewster wrote:
>> My first thought was "Oh, I must have a typo in my
>> cleanup routine, such that it is deleting all records rather than only
>> those a week old, and it's just that no one has noticed until now". So I
>> looked at that, but changing the delete to a select appeared to produce
>> the proper results, in that no records were selected:
> 
> Well it would, if the records only go back to 4 AM this morning. In other 
> words if no records exist before 4 AM today, no records exist before 7 days 
> ago also or am I missing something?

If the delete is correct, you are absolutely right. My first theory, however, 
was that I made a typo, and the delete was deleting ALL records, in which case 
changing it to a select would select all records. As it did not, that seems to 
confirm the delete is correct, and therefore not the problem.

> A sequence is just a special table.
> 
> So what does SELECT * from the sequence show?

tracking=> SELECT * FROM data_id_seq;
 sequence_name | last_value | start_value | increment_by |  max_value  
| min_value | cache_value | log_cnt | is_cycled | is_called 
---++-+--+-+---+-+-+---+---
 data_id_seq   |   1184 |   1 |1 | 9223372036854775807 
| 1 |   1 |  16 | f | t


> 
>> 
>> Also odd is that my cleanup script runs at 1am. I have records of there
>> being new data in the database up to 3:51am, but the oldest record
>> currently in the DB is from 4:45am (as specified by the default of now()
>> on the column). So I know records were added after my delete command
>> ran, but before this reset occurred.
> 
> I am not sure what you are calling the 'reset'?
> Did something happen between 3:51 AM and 4:45 AM?

Yes: All my data was deleted and the sequence reset to 1.

> Also not sure why you call the 4:45 AM record the oldest, when you say you 
> can identify records from 3:51 AM?

As I mentioned, I archive the records to permanent storage. This archive 
process happens every hour (for various reasons). That is how I know we had 
records for 3:51 am: they exist in the permanent archive. However, they don't 
exist in the local database any more.

> 
>> 
>> So my question is, aside from someone going in and mucking about in the
>> wee hours of the morning, what could possibly cause this behavior? What
>> sort of event could cause all data to be deleted from the table, and the
>> sequence to be reset? Especially while there is an active connection?
>> Thanks for any ideas, however wild or off the wall :-)
> 
> What is in the Postgres/system logs for the time period(s) you mention?

The postgres log has a lot of errors in it, some of which MAY explain the 
issue. For example:

cp: cannot create regular file '/mnt/pglogs/000100040094': 
Permission denied
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: test ! -f 
/mnt/pglogs/000100040094 && cp pg_xlog/000100040094 
/mnt/pglogs/000100040094
WARNING:  transaction log file "000100040094" could not be 
archived: too many failures
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down

However, there are no timestamps on any of the entries (can I fix that?), so I 
don't know if those are current entries, or from back before I got the mount 
for the logs working. At this time, the mount point IS working correctly, and 
from what I can tell so is the archive command. The latest entry is from 
yesterday (modify date on the log shows Mar 5, 9:21, when I was messing with it 
yesterday), however, so there are no entries from this morning when it happened 
again. I don't see anything of interest in the syslog or messages log.

> 
>> 
>> ---
>> Israel Brewster
>> Computer Support Technician II
>> Era Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7250 x7293
>> ---
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 5, 2014, at 10:01 AM, Alvaro Herrera  wrote:

> Israel Brewster wrote:
> 
>> So my question is, aside from someone going in and mucking about in the wee 
>> hours of the morning, what could possibly cause this behavior? What sort of 
>> event could cause all data to be deleted from the table, and the sequence to 
>> be reset? Especially while there is an active connection? Thanks for any 
>> ideas, however wild or off the wall :-)
> 
> Is this running off a NFS mount or something?  I'm wondering about the
> filesystem getting unmounted and an empty copy of the database being in
> the mount point.

Good thought - I hadn't thought of a file system level reset. Unfortunately the 
data folder is on the root partition, which is a standard SATA hard drive, and 
the date on the data folder shows February 27th, so apparently the folder 
itself at least hasn't been re-created.

> 
> -- 
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
> 
> 
> -- 
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---

BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 5, 2014, at 10:01 AM, Thom Brown  wrote:On 5 March 2014 18:22, Israel Brewster  wrote:

I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:

DELETE FROM data WHERE pointtimeThen I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd.  Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.Not famillar with a logged vs unlogged table (still learning all the features of PostgreSQL), but as you said the sequence resetting is rather odd.

So if all the data went missing, and the sequence reset, the only thing I can think of is:Someone ran:TRUNCATE data RESTART IDENTITY;Considering that I'm the only one in the company that knows SQL at all beyond a simple single-table select (I keep having to explain joins and how they aren't evil to the other programmer here), not likely :-)

or someone restored the table structure from a backup that deleted the original table.Now that's a thought...Maybe my backup routine is working backwards (pushing from the backup archive to the primary machine rather than from the primary to the backup). I did switch primary and secondary machines a while ago, but I thought I had checked that. What would be the effect of overwriting the data files while the database is active?Do you log DDL?Not sure what that is, so I'll assume no :-)Was the table partitioned?Nope.You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.I'll look at updating. Thanks.

-- Thom

---Israel BrewsterComputer Support Technician IIEra Alaska5245 Airport Industrial RdFairbanks, AK 99709(907) 450-7250 x7293---BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Adrian Klaver

On 03/06/2014 09:33 AM, Israel Brewster wrote:

For starters, this happened again this morning (no data prior to 4:45 am and 
sequence reset), so whatever is going on appears to be reoccurring. Also, I 
forgot to mention if it is significant: this is running on slackware liunux 14.0

On Mar 5, 2014, at 1:00 PM, Adrian Klaver  wrote:


On 03/05/2014 10:22 AM, Israel Brewster wrote:

My first thought was "Oh, I must have a typo in my
cleanup routine, such that it is deleting all records rather than only
those a week old, and it's just that no one has noticed until now". So I
looked at that, but changing the delete to a select appeared to produce
the proper results, in that no records were selected:


Well it would, if the records only go back to 4 AM this morning. In other words 
if no records exist before 4 AM today, no records exist before 7 days ago also 
or am I missing something?


If the delete is correct, you are absolutely right. My first theory, however, 
was that I made a typo, and the delete was deleting ALL records, in which case 
changing it to a select would select all records. As it did not, that seems to 
confirm the delete is correct, and therefore not the problem.


A sequence is just a special table.

So what does SELECT * from the sequence show?


tracking=> SELECT * FROM data_id_seq;
  sequence_name | last_value | start_value | increment_by |  max_value  
| min_value | cache_value | log_cnt | is_cycled | is_called
---++-+--+-+---+-+-+---+---
  data_id_seq   |   1184 |   1 |1 | 9223372036854775807 
| 1 |   1 |  16 | f | t






Also odd is that my cleanup script runs at 1am. I have records of there
being new data in the database up to 3:51am, but the oldest record
currently in the DB is from 4:45am (as specified by the default of now()
on the column). So I know records were added after my delete command
ran, but before this reset occurred.


I am not sure what you are calling the 'reset'?
Did something happen between 3:51 AM and 4:45 AM?


Yes: All my data was deleted and the sequence reset to 1.


Also not sure why you call the 4:45 AM record the oldest, when you say you can 
identify records from 3:51 AM?


As I mentioned, I archive the records to permanent storage. This archive 
process happens every hour (for various reasons). That is how I know we had 
records for 3:51 am: they exist in the permanent archive. However, they don't 
exist in the local database any more.


Well something is happening. See my notes on logging below to help track 
down the cause.








So my question is, aside from someone going in and mucking about in the
wee hours of the morning, what could possibly cause this behavior? What
sort of event could cause all data to be deleted from the table, and the
sequence to be reset? Especially while there is an active connection?
Thanks for any ideas, however wild or off the wall :-)


What is in the Postgres/system logs for the time period(s) you mention?


The postgres log has a lot of errors in it, some of which MAY explain the 
issue. For example:

cp: cannot create regular file '/mnt/pglogs/000100040094': 
Permission denied
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: test ! -f /mnt/pglogs/000100040094 
&& cp pg_xlog/000100040094 /mnt/pglogs/000100040094
WARNING:  transaction log file "000100040094" could not be 
archived: too many failures
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down


Might be good to explain your archive setup.



However, there are no timestamps on any of the entries (can I fix that?), so I 
don't know if those are current entries, or from back before I got the mount 
for the logs working. At this time, the mount point IS working correctly, and 
from what I can tell so is the archive command. The latest entry is from 
yesterday (modify date on the log shows Mar 5, 9:21, when I was messing with it 
yesterday), however, so there are no entries from this morning when it happened 
again. I don't see anything of interest in the syslog or messages log.


Yes you can, timestamps and a lot more. For all the details go here:

http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html

At the least I would:

log_statement = 'mod'

log_connections = on
log_disconnections = on

log_line_prefix = '%u-%m-%x'







---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---







--
Adrian Klaver
adrian.kla...@aklaver.com








--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your s

Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Michael Nolan
On 3/6/14, Israel Brewster  wrote:



> LOG:  received smart shutdown request
> LOG:  autovacuum launcher shutting down
> LOG:  shutting down
> LOG:  database system is shut down
>
> However, there are no timestamps on any of the entries (can I fix that?)

Yes, change the log_line_prefix in the postgresql.conf file and reload it.

I use:

log_line_prefix = '%m %u '

You might also want to use this, at least temporarily:

log_statement = all

--
Mike Nolan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 6, 2014, at 9:03 AM, Adrian Klaver  wrote:

> On 03/06/2014 09:33 AM, Israel Brewster wrote:
>> For starters, this happened again this morning (no data prior to 4:45 am and 
>> sequence reset), so whatever is going on appears to be reoccurring. Also, I 
>> forgot to mention if it is significant: this is running on slackware liunux 
>> 14.0
>> 
>> On Mar 5, 2014, at 1:00 PM, Adrian Klaver  wrote:
>> 
>>> On 03/05/2014 10:22 AM, Israel Brewster wrote:
 My first thought was "Oh, I must have a typo in my
 cleanup routine, such that it is deleting all records rather than only
 those a week old, and it's just that no one has noticed until now". So I
 looked at that, but changing the delete to a select appeared to produce
 the proper results, in that no records were selected:
>>> 
>>> Well it would, if the records only go back to 4 AM this morning. In other 
>>> words if no records exist before 4 AM today, no records exist before 7 days 
>>> ago also or am I missing something?
>> 
>> If the delete is correct, you are absolutely right. My first theory, 
>> however, was that I made a typo, and the delete was deleting ALL records, in 
>> which case changing it to a select would select all records. As it did not, 
>> that seems to confirm the delete is correct, and therefore not the problem.
>> 
>>> A sequence is just a special table.
>>> 
>>> So what does SELECT * from the sequence show?
>> 
>> tracking=> SELECT * FROM data_id_seq;
>>  sequence_name | last_value | start_value | increment_by |  max_value
>>   | min_value | cache_value | log_cnt | is_cycled | is_called
>> ---++-+--+-+---+-+-+---+---
>>  data_id_seq   |   1184 |   1 |1 | 
>> 9223372036854775807 | 1 |   1 |  16 | f | t
>> 
>> 
>>> 
 
 Also odd is that my cleanup script runs at 1am. I have records of there
 being new data in the database up to 3:51am, but the oldest record
 currently in the DB is from 4:45am (as specified by the default of now()
 on the column). So I know records were added after my delete command
 ran, but before this reset occurred.
>>> 
>>> I am not sure what you are calling the 'reset'?
>>> Did something happen between 3:51 AM and 4:45 AM?
>> 
>> Yes: All my data was deleted and the sequence reset to 1.
>> 
>>> Also not sure why you call the 4:45 AM record the oldest, when you say you 
>>> can identify records from 3:51 AM?
>> 
>> As I mentioned, I archive the records to permanent storage. This archive 
>> process happens every hour (for various reasons). That is how I know we had 
>> records for 3:51 am: they exist in the permanent archive. However, they 
>> don't exist in the local database any more.
> 
> Well something is happening. See my notes on logging below to help track down 
> the cause.

Yep.

> 
>> 
>>> 
 
 So my question is, aside from someone going in and mucking about in the
 wee hours of the morning, what could possibly cause this behavior? What
 sort of event could cause all data to be deleted from the table, and the
 sequence to be reset? Especially while there is an active connection?
 Thanks for any ideas, however wild or off the wall :-)
>>> 
>>> What is in the Postgres/system logs for the time period(s) you mention?
>> 
>> The postgres log has a lot of errors in it, some of which MAY explain the 
>> issue. For example:
>> 
>> cp: cannot create regular file '/mnt/pglogs/000100040094': 
>> Permission denied
>> LOG:  archive command failed with exit code 1
>> DETAIL:  The failed archive command was: test ! -f 
>> /mnt/pglogs/000100040094 && cp pg_xlog/000100040094 
>> /mnt/pglogs/000100040094
>> WARNING:  transaction log file "000100040094" could not be 
>> archived: too many failures
>> LOG:  received smart shutdown request
>> LOG:  autovacuum launcher shutting down
>> LOG:  shutting down
>> LOG:  database system is shut down
> 
> Might be good to explain your archive setup.

Ok, here goes: We have the primary system which receives the data and handles 
all requests for said data. There is also a hot standby server keep in sync 
with streaming replication. The WALs are archived to a NFS share on this 
machine.

Once an hour a python script runs that a) Selects all unsynced records from the 
postgresql db, b) stores a subset of them in our permanent archive, and c) 
marks the previously selected records as synced (UPDATE data SET syncd=true 
WHERE id in (...) )

Additionally, I have a) a script that runs at 8:00pm every evening that uses 
pg_dump to dump the contents of the database to a backup file, and b) a script 
that runs at 8:00 each morning that rsync's various config files and scripts 
(such as my data retrieval scripts) from the primary machine to a backup 
location on the secondary machine.

None of the 

Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Adrian Klaver

On 03/06/2014 10:43 AM, Israel Brewster wrote:

On Mar 6, 2014, at 9:03 AM, Adrian Klaver  wrote:





Well something is happening. See my notes on logging below to help track down 
the cause.


Yep.





Might be good to explain your archive setup.


Ok, here goes: We have the primary system which receives the data and handles 
all requests for said data. There is also a hot standby server keep in sync 
with streaming replication. The WALs are archived to a NFS share on this 
machine.

Once an hour a python script runs that a) Selects all unsynced records from the 
postgresql db, b) stores a subset of them in our permanent archive, and c) 
marks the previously selected records as synced (UPDATE data SET syncd=true 
WHERE id in (...) )

Additionally, I have a) a script that runs at 8:00pm every evening that uses 
pg_dump to dump the contents of the database to a backup file, and b) a script 
that runs at 8:00 each morning that rsync's various config files and scripts 
(such as my data retrieval scripts) from the primary machine to a backup 
location on the secondary machine.

None of the scripts run anywhere near the apparent 4:40ish cutoff time for my 
data


Are all the scripts running from one machine?
If so, have you checked that the times are set correctly on the various 
machines?





Make sense? Probably not the best setup, but then that's what happens when you 
figure out stuff for yourself rather than having formal training :-) I'm 
DEFINITELY open to suggestions :-)


'Makes sense' is context sensitive. It really depends on what you want 
to achieve. My procedure is to define the end result first and then work 
backwards from there.










I'll get those in the config, and we'll see what happens tomorrow morning. 
Hopefully that will give more information. Thanks for the link and information!






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Israel Brewster
On Mar 6, 2014, at 10:03 AM, Adrian Klaver  wrote:

> On 03/06/2014 10:43 AM, Israel Brewster wrote:
>> On Mar 6, 2014, at 9:03 AM, Adrian Klaver  wrote:
>> 
> 
>>> 
>>> Well something is happening. See my notes on logging below to help track 
>>> down the cause.
>> 
>> Yep.
>> 
> 
>>> 
>>> Might be good to explain your archive setup.
>> 
>> Ok, here goes: We have the primary system which receives the data and 
>> handles all requests for said data. There is also a hot standby server keep 
>> in sync with streaming replication. The WALs are archived to a NFS share on 
>> this machine.
>> 
>> Once an hour a python script runs that a) Selects all unsynced records from 
>> the postgresql db, b) stores a subset of them in our permanent archive, and 
>> c) marks the previously selected records as synced (UPDATE data SET 
>> syncd=true WHERE id in (...) )
>> 
>> Additionally, I have a) a script that runs at 8:00pm every evening that uses 
>> pg_dump to dump the contents of the database to a backup file, and b) a 
>> script that runs at 8:00 each morning that rsync's various config files and 
>> scripts (such as my data retrieval scripts) from the primary machine to a 
>> backup location on the secondary machine.
>> 
>> None of the scripts run anywhere near the apparent 4:40ish cutoff time for 
>> my data
> 
> Are all the scripts running from one machine?
> If so, have you checked that the times are set correctly on the various 
> machines?

Three different machines (due to OS requirements), but yeah the times all 
appear to be correct.

> 
> 
>> 
>> Make sense? Probably not the best setup, but then that's what happens when 
>> you figure out stuff for yourself rather than having formal training :-) I'm 
>> DEFINITELY open to suggestions :-)
> 
> 'Makes sense' is context sensitive. It really depends on what you want to 
> achieve. My procedure is to define the end result first and then work 
> backwards from there.

Good point. However, I was asking more in the general "did I explain it well 
enough to be understood" rather than in the "is it a sensible setup" aspect. 
The rest of the sentence was just me being insecure :-D

> 
>> 
>>> 
> 
>> 
>> I'll get those in the config, and we'll see what happens tomorrow morning. 
>> Hopefully that will give more information. Thanks for the link and 
>> information!
>> 
>> 
> 

---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---

BEGIN:VCARD
VERSION:3.0
N:Brewster;Israel;;;
FN:Israel Brewster
ORG:Frontier Flying Service;MIS
TITLE:PC Support Tech II
EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com
TEL;type=WORK;type=pref:907-450-7293
item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701;
item1.X-ABADR:us
CATEGORIES:General
X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson
END:VCARD

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Adrian Klaver

On 03/06/2014 11:09 AM, Israel Brewster wrote:

On Mar 6, 2014, at 10:03 AM, Adrian Klaver  wrote:





Are all the scripts running from one machine?
If so, have you checked that the times are set correctly on the various 
machines?


Three different machines (due to OS requirements), but yeah the times all 
appear to be correct.


So it appears you will have to wait to see what the logging reports. 
Should have mentioned that you need to make sure you do something like 
pg_ctl reload on the Postgres server to get the postgressql.conf changes 
to take effect.



Another thought. Might not be bad idea to grep your scripts for ALTER, 
SEQUENCE or any other relevant keywords. Just in case something slipped 
in you where not aware of.









Make sense? Probably not the best setup, but then that's what happens when you 
figure out stuff for yourself rather than having formal training :-) I'm 
DEFINITELY open to suggestions :-)


'Makes sense' is context sensitive. It really depends on what you want to 
achieve. My procedure is to define the end result first and then work backwards 
from there.


Good point. However, I was asking more in the general "did I explain it well enough to be 
understood" rather than in the "is it a sensible setup" aspect. The rest of the 
sentence was just me being insecure :-D


I understood the basics of what you are doing. The details probably can 
wait pending the log information. Hey, all of us are beginners/learning 
in something.








---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] high throughput 9.3, master/slave question

2014-03-06 Thread Matthew Chambers


On 02/03/14 05:08, bricklen wrote:
On Fri, Feb 28, 2014 at 1:54 PM, Matthew Chambers 
mailto:mchamb...@wetafx.co.nz>> wrote:



Initially, I had my application servers using the slave for short,
read only queries, but this turned out to be highly unstable. The
slave would start refusing connections, and the logs would fill with:

ERROR:  canceling statement due to conflict with recovery

I've tried these 2 settings:

max_standby_archive_delay = -1
max_standby_streaming_delay = -1

But then I starting getting these:
DETAIL:  User transaction caused buffer deadlock with recovery.

Read requests come in at anywhere between 200 and 1000/second.

I was wondering if there is some combination of configuration
settings that would safely let me use the slave for read only queries?


Have you tried setting max_standby_archive|streaming_delay to several 
minutes (or whatever makes sense for your typical query durations), 
rather than disabling those settings with -1?


No, I have not tired this, but I'll give it a go.  Is there anyone doing 
this that has a configuration I can check out?  Is this even possible to do?


-Matt


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread David Johnston
Israel Brewster-2 wrote
> So my question is, aside from someone going in and mucking about in the
> wee hours of the morning, what could possibly cause this behavior? What
> sort of event could cause all data to be deleted from the table, and the
> sequence to be reset? Especially while there is an active connection?
> Thanks for any ideas, however wild or off the wall :-)

Nothing so far makes this likely but any chance there is some kind of
Virtual Machine setup in place where all the changes from a given day are
being lost because the VM is resetting back to "factory defaults"?

Also, you say you perform daily pg_dumps.  Have you tried loading these up
and see what their contents are?

Ultimately the log files are going to be needed to do any meaningful
forensic work though.

Any chance you may been debugging the wrong box/database?  That is always
something worth verifying and usually not the difficult.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Mysterious-DB-reset-tp5794868p5795031.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Mysterious DB reset

2014-03-06 Thread Steve Crawford

On 03/06/2014 09:33 AM, Israel Brewster wrote:

For starters, this happened again this morning (no data prior to 4:45 am and 
sequence reset), so whatever is going on appears to be reoccurring. Also, I 
forgot to mention if it is significant: this is running on slackware liunux 14.0


Also odd is that my cleanup script runs at 1am. I have records of there
being new data in the database up to 3:51am, but the oldest record
currently in the DB is from 4:45am (as specified by the default of now()
on the column). So I know records were added after my delete command
ran, but before this reset occurred.


A shot in the dark...

Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and 
the crontabs of any automatic scripts that connect. I'm not sure about 
Slackware but Red Hat and Centos run the cron.daily scripts at (wait for 
it...) just after 4am.


Some of the default daily scripts like logrotate can have "side effects" 
like restarting the service that writes to the log file being rotated.


Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Offending My Tender Sensibilities -OR- OLTP on a Star Schema

2014-03-06 Thread Merlin Moncure
On Tue, Mar 4, 2014 at 3:08 PM, Rob Sargent  wrote:
> Do you make a distinction between a key and an index?  I'm not picking up on
> design-by-natural-key and what that entails. Especially the notion that the
> natural key of a given item might be mutable. What stops it from colliding
> with the next item? (I have not had the pleasure of working in a domain
> where natural keys are obvious if they existed at all. "What's in a name",
> after all. )

If your keys are mutable then you definitely have to take that into
consideration for key style choice...but not for your stated concern.
 Even though you can cascade (via RI) updated keys to various tables
performance can certainly suffer vs a surrogate.  This is the main
reason not to use natural keys: slow, perhaps even pathologically slow
update performance on the key.

However, collisions are a reason *to* use natural keys.  If you can'd
handle them with your proposed key then either:
a) you've misidentified the key
or
b) you'er allowing duplicate unique entries in the system and when you
should not be

Even when using surrogates, it's still a good practice to identify
what makes a record unique wherever possible and place unique
constraints on those fields.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Offending My Tender Sensibilities -OR- OLTP on a Star Schema

2014-03-06 Thread Rob Sargent

On 03/06/2014 03:52 PM, Merlin Moncure wrote:

On Tue, Mar 4, 2014 at 3:08 PM, Rob Sargent  wrote:

Do you make a distinction between a key and an index?  I'm not picking up on
design-by-natural-key and what that entails. Especially the notion that the
natural key of a given item might be mutable. What stops it from colliding
with the next item? (I have not had the pleasure of working in a domain
where natural keys are obvious if they existed at all. "What's in a name",
after all. )

If your keys are mutable then you definitely have to take that into
consideration for key style choice...but not for your stated concern.
  Even though you can cascade (via RI) updated keys to various tables
performance can certainly suffer vs a surrogate.  This is the main
reason not to use natural keys: slow, perhaps even pathologically slow
update performance on the key.

However, collisions are a reason *to* use natural keys.  If you can'd
handle them with your proposed key then either:
a) you've misidentified the key
or
b) you'er allowing duplicate unique entries in the system and when you
should not be

Even when using surrogates, it's still a good practice to identify
what makes a record unique wherever possible and place unique
constraints on those fields.

merlin
Yes. And in my mind therein resides the semantic difference between a 
primary key and a unique index.

Thanks.


[GENERAL] problem with trigger function

2014-03-06 Thread Susan Cassidy
I'm having a problem with a trigger function.  I've been googling for over
an hour, with no luck with my specific problem.

I get this error:
ERROR:  missing FROM-clause entry for table "new"
LINE 1: insert into metric_double_values_201203 values (NEW.metricID...
^
QUERY:  insert into metric_double_values_201203 values (NEW.metricID,
NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)
CONTEXT:  PL/pgSQL function metric_double_insert_func() line 8 at EXECUTE
statement


>From this trigger function:

CREATE OR REPLACE FUNCTION metric_double_insert_func()
RETURNS TRIGGER AS $$
  DECLARE insert_sql text;
BEGIN
 insert_sql:='insert into metric_double_values_' ||
to_char(NEW.datetimeval,'MM') || ' values (NEW.metricID, NEW.sourceID,
NEW.timestamp, NEW.value, NEW.datetimeval)';
EXECUTE insert_sql using NEW;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

DROP TRIGGER insert_metric_double_insert_trigger on metric_double_values;
CREATE TRIGGER insert_metric_double_insert_trigger
BEFORE INSERT ON metric_double_values
FOR EACH ROW EXECUTE PROCEDURE metric_double_insert_func();


This was an attempt at eliminating the error I got when trying to insert
with values (NEW.*) using NEW:
ERROR:  missing FROM-clause entry for table "new"
LINE 1: insert into metric_double_values_201203 values (NEW.*)
^
QUERY:  insert into metric_double_values_201203 values (NEW.*)
CONTEXT:  PL/pgSQL function metric_double_insert_func() line 7 at EXECUTE
statement

I don't know what from clause it is talking about

This is a trigger for inserting rows into the proper partition table based
on date.

Any help appreciated.

Thanks,
Susan


Re: [GENERAL] problem with trigger function

2014-03-06 Thread Adrian Klaver

On 03/06/2014 04:08 PM, Susan Cassidy wrote:

I'm having a problem with a trigger function.  I've been googling for
over an hour, with no luck with my specific problem.

I get this error:
ERROR:  missing FROM-clause entry for table "new"
LINE 1: insert into metric_double_values_201203 values (NEW.metricID...
 ^
QUERY:  insert into metric_double_values_201203 values (NEW.metricID,
NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)
CONTEXT:  PL/pgSQL function metric_double_insert_func() line 8 at
EXECUTE statement


 From this trigger function:

CREATE OR REPLACE FUNCTION metric_double_insert_func()
RETURNS TRIGGER AS $$
   DECLARE insert_sql text;
BEGIN
  insert_sql:='insert into metric_double_values_' ||
to_char(NEW.datetimeval,'MM') || ' values (NEW.metricID,
NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)';
 EXECUTE insert_sql using NEW;
 RETURN NULL;
END;
$$
LANGUAGE plpgsql;


The basic problem is here:

insert_sql:='insert into metric_double_values_' || 
to_char(NEW.datetimeval,'MM') || ' values (NEW.metricID, 
NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)';


in particular:

 ' values (NEW.metricID,...'

You are quoting the NEW values which Postgres then interprets as values 
coming from the table new as new.metric_id, etc.


You need to use the parameter placeholders,$1, $2, etc. See here for 
some examples:


http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN

40.5.4. Executing Dynamic Commands



DROP TRIGGER insert_metric_double_insert_trigger on metric_double_values;
CREATE TRIGGER insert_metric_double_insert_trigger
 BEFORE INSERT ON metric_double_values
 FOR EACH ROW EXECUTE PROCEDURE metric_double_insert_func();


This was an attempt at eliminating the error I got when trying to insert
with values (NEW.*) using NEW:
ERROR:  missing FROM-clause entry for table "new"
LINE 1: insert into metric_double_values_201203 values (NEW.*)
 ^
QUERY:  insert into metric_double_values_201203 values (NEW.*)
CONTEXT:  PL/pgSQL function metric_double_insert_func() line 7 at
EXECUTE statement

I don't know what from clause it is talking about

This is a trigger for inserting rows into the proper partition table
based on date.

Any help appreciated.

Thanks,
Susan



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] There is bug in PCS 0.9.26: configure pacemaker resource agent for PG stream replication

2014-03-06 Thread leo
Hi all,

   I just complete my HA configuration on Redhat 6.4 enterprise:
Clusterware: Pacemaker 1.1.8 ( CMAN .0.12.1, corosync 1.4.1 )
Resource manager: PCS  0.9.26
PostgreSQL 9.3.3.1
  Detailed configuration follow:
http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#PostgreSQL_.28node1_only.29


  Complete the configuration, we find all PostgreSQL is slave status. We
finally find the root cause after google  searching. PCS don't apply all
resource configuration definition, it cut configuration parameter at end of
the first "op"
  For example:
   pcs -f pgsql_cfg resource create pgsql pgsql \
   pgctl="/usr/bin/pg_ctl" \
   psql="/usr/bin/psql" \
   pgdata="/var/lib/pgsql/data/" \
   rep_mode="sync" \
   node_list="node1 node2" \
   restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
   primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5
keepalives_count=5" \
   master_ip="192.168.2.3" \
   restart_on_promote='true' \
   op start   timeout="60s" interval="0s"  on-fail="restart" \
  ==Other op definition is cut 
   After we upgrade PCS version to the latest 0.9.90. All problem disappear
:)

  BWT, we find the app connection does not be interrupted when RA
automatically switch sync replication to async replication ( for example we
kill the slave node). 




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/There-is-bug-in-PCS-0-9-26-configure-pacemaker-resource-agent-for-PG-stream-replication-tp5795081.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general