[GENERAL] 9.1.11 - many backends in "semtimedop" syscall
Hi, I have the following situation: Amazon, but not RDS - just plain EC2, with an Ubuntu system. Zone reclaim is disabled (at least in my system, no idea about the host), and there is no transparent huge page support. Today we had at least 3 cases where a bunch of backends (~40) got "interlocked". Some of them are shown as "waiting" in pg_stat_activity, but not all. A simple "insert into table ()" - without any triggers - gets killed after 10 minutes with no visible progress. From what I saw, most of the backends are "locked" (though it's not visible in pg_stat_activity) on the same table. I did take a core dump from one of the backends, but there are no symbols, so the backtrace is just:

root@xx:/mnt# gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" /usr/lib/postgresql/9.1/bin/postgres pg.core.21422
[New LWP 21422]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres'.
#0  0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
Thread 1 (LWP 21422):
#0  0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x005f65e8 in PGSemaphoreLock ()
No symbol table info available.
#2  0x00636125 in LWLockAcquire ()
No symbol table info available.
#3  0x00630f91 in LockAcquireExtended ()
No symbol table info available.
#4  0x0062f88c in LockRelationOid ()
No symbol table info available.
#5  0x00470f6d in relation_open ()
No symbol table info available.
#6  0x0047b013 in index_open ()
No symbol table info available.
#7  0x0057bb4c in ExecOpenIndices ()
No symbol table info available.
#8  0x005894c8 in ExecInitModifyTable ()
No symbol table info available.
#9  0x0057266a in ExecInitNode ()
No symbol table info available.
#10 0x00570e4a in standard_ExecutorStart ()
No symbol table info available.
#11 0x00593406 in ?? ()
No symbol table info available.
#12 0x00593947 in SPI_execute_plan_with_paramlist ()
No symbol table info available.
#13 0x7ff8c34c8aed in ?? () from /usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#14 0x7ff8c34c9716 in ?? () from /usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#15 0x7ff8c34ca252 in ?? () from /usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#16 0x7ff8c34ca252 in ?? () from /usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#17 0x7ff8c34cc9dc in ?? () from /usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#18 0x7ff8c34ccf01 in plpgsql_exec_trigger () from /usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#19 0x7ff8c34c214a in plpgsql_call_handler () from /usr/lib/postgresql/9.1/lib/plpgsql.so
No symbol table info available.
#20 0x00709d67 in ?? ()
No symbol table info available.
#21 0x00555fda in ?? ()
No symbol table info available.
#22 0x0055668f in ?? ()
No symbol table info available.
#23 0x0055ba6a in AfterTriggerEndQuery ()
No symbol table info available.
#24 0x0056febf in standard_ExecutorFinish ()
No symbol table info available.
#25 0x00645a2a in ?? ()
No symbol table info available.
#26 0x00645c13 in ?? ()
No symbol table info available.
#27 0x00646962 in PortalRun ()
No symbol table info available.
#28 0x0064274a in PostgresMain ()
No symbol table info available.
#29 0x00604443 in ?? ()
No symbol table info available.
#30 0x00604eb1 in PostmasterMain ()
No symbol table info available.
#31 0x0045a720 in main ()
No symbol table info available.
What could that be, and how to get rid of the problem? Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
[GENERAL] extract psql meta-commands into library?
I'd like to provide access to psql's "backslash" meta-commands for IPython's %sql magic, as well as some other non-psql tools. When I asked about standalone implementations of the meta-commands, Craig Ringer suggested that extracting them from psql into a libpsql library (which psql would then use) would make that easy for all matter of external uses. Does this sound doable / of interest? (Originally asked at http://stackoverflow.com/questions/22211916/standalone-psql-meta-command-implementation) ipython-sql: https://pypi.python.org/pypi/ipython-sql/0.3.1 Thanks! -- - Catherine http://catherinedevlin.blogspot.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall
hubert depesz lubaczewski writes: > Thread 1 (LWP 21422): > #0 0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6 > No symbol table info available. > #1 0x005f65e8 in PGSemaphoreLock () > No symbol table info available. > #2 0x00636125 in LWLockAcquire () > No symbol table info available. > #3 0x00630f91 in LockAcquireExtended () > No symbol table info available. > #4 0x0062f88c in LockRelationOid () > No symbol table info available. > #5 0x00470f6d in relation_open () > No symbol table info available. Huh. Looks like it's blocked trying to acquire one of the lock-partition LWLocks, which is odd because those ought never be held very long. Somebody else has failed to release that LWLock, looks like. Did you by any chance capture stack traces from all of the backends? The interesting one would be the one that *doesn't* look like this. Or possibly there's more than one such. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall
On Thu, Mar 06, 2014 at 11:56:06AM -0500, Tom Lane wrote: > hubert depesz lubaczewski writes: > > Thread 1 (LWP 21422): > > #0 0x7ffa60da2dc7 in semop () from /lib/x86_64-linux-gnu/libc.so.6 > > No symbol table info available. > > #1 0x005f65e8 in PGSemaphoreLock () > > No symbol table info available. > > #2 0x00636125 in LWLockAcquire () > > No symbol table info available. > > #3 0x00630f91 in LockAcquireExtended () > > No symbol table info available. > > #4 0x0062f88c in LockRelationOid () > > No symbol table info available. > > #5 0x00470f6d in relation_open () > > No symbol table info available. > > Huh. Looks like it's blocked trying to acquire one of the lock-partition > LWLocks, which is odd because those ought never be held very long. > Somebody else has failed to release that LWLock, looks like. > > Did you by any chance capture stack traces from all of the backends? > The interesting one would be the one that *doesn't* look like this. > Or possibly there's more than one such. I didn't have a chance to do it. Can try if there is a way to get trace *without* making core (sorry, my c/gdb knowledge is very, very limited). Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/ signature.asc Description: Digital signature
Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall
hubert depesz lubaczewski writes: > I didn't have a chance to do it. Can try if there is a way to get trace > *without* making core (sorry, my c/gdb knowledge is very, very limited). Sure, you just attach to the process: $ gdb /path/to/postgres PID-of-process gdb> bt gdb> quit This is usually preferable to forcing a core dump. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 9.1.11 - many backends in "semtimedop" syscall
On Thu, Mar 06, 2014 at 12:02:50PM -0500, Tom Lane wrote: > hubert depesz lubaczewski writes: > > I didn't have a chance to do it. Can try if there is a way to get trace > > *without* making core (sorry, my c/gdb knowledge is very, very limited). > > Sure, you just attach to the process: > > $ gdb /path/to/postgres PID-of-process > gdb> bt > gdb> quit > > This is usually preferable to forcing a core dump. Thank you. If the problem strikes again, I will do it on all (or most, depending how fast I can make it) backends. Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
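A minimal sketch of enumerating the backends to attach gdb to, run from a separate psql session; the column names assume 9.1's pg_stat_activity (procpid, waiting, current_query), which 9.2 and later renamed to pid and query:

SELECT procpid, waiting, current_query
FROM pg_stat_activity
WHERE procpid <> pg_backend_pid()   -- skip the session running this query
ORDER BY procpid;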
Re: [GENERAL] extract psql meta-commands into library?
Catherine Devlin writes: > I'd like to provide access to psql's "backslash" meta-commands for > IPython's %sql magic, as well as some other non-psql tools. When I > asked about standalone implementations of the meta-commands, Craig > Ringer suggested that extracting them from psql into a libpsql library > (which psql would then use) would make that easy for all matter of > external uses. > Does this sound doable / of interest? Hm ... the code in psql's describe.c is not terribly conducive to that. Parsing of the backslash command, execution of the query/queries, and presentation of the results is all rather tightly bound up; you'd have to think about how to decouple those. If you could do it in a way that didn't result in a quantum jump in the complexity/unreadability of the code, I think there'd be interest. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
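Until such a library exists, one low-tech way to see the catalog queries behind the backslash commands is psql's own echo-hidden mode; a small illustrative sketch (the query output itself is omitted here):

-- make psql print the pg_catalog query it runs for each backslash command
\set ECHO_HIDDEN on
\dt
-- equivalently, start the session with: psql -E (or --echo-hidden)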
Re: [GENERAL] Mysterious DB reset
For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on appears to be recurring. Also, I forgot to mention if it is significant: this is running on Slackware Linux 14.0.

On Mar 5, 2014, at 1:00 PM, Adrian Klaver wrote:
> On 03/05/2014 10:22 AM, Israel Brewster wrote:
>> My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected:
>
> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM today, no records exist before 7 days ago also or am I missing something?

If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete was deleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to confirm the delete is correct, and therefore not the problem.

> A sequence is just a special table.
>
> So what does SELECT * from the sequence show?

tracking=> SELECT * FROM data_id_seq;
 sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
 data_id_seq   |       1184 |           1 |            1 | 9223372036854775807 |         1 |           1 |      16 | f         | t

>> Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.
>
> I am not sure what you are calling the 'reset'?
> Did something happen between 3:51 AM and 4:45 AM?

Yes: All my data was deleted and the sequence reset to 1.

> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM?

As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various reasons). That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist in the local database any more.

>> So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)
>
> What is in the Postgres/system logs for the time period(s) you mention?

The postgres log has a lot of errors in it, some of which MAY explain the issue. For example:

cp: cannot create regular file '/mnt/pglogs/000100040094': Permission denied
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: test !
-f /mnt/pglogs/000100040094 && cp pg_xlog/000100040094 /mnt/pglogs/000100040094 WARNING: transaction log file "000100040094" could not be archived: too many failures LOG: received smart shutdown request LOG: autovacuum launcher shutting down LOG: shutting down LOG: database system is shut down However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current entries, or from back before I got the mount for the logs working. At this time, the mount point IS working correctly, and from what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar 5, 9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened again. I don't see anything of interest in the syslog or messages log. > >> >> --- >> Israel Brewster >> Computer Support Technician II >> Era Alaska >> 5245 Airport Industrial Rd >> Fairbanks, AK 99709 >> (907) 450-7250 x7293 >> --- >> >> >> >> > > > -- > Adrian Klaver > adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Mysterious DB reset
On Mar 5, 2014, at 10:01 AM, Alvaro Herrera wrote: > Israel Brewster wrote: > >> So my question is, aside from someone going in and mucking about in the wee >> hours of the morning, what could possibly cause this behavior? What sort of >> event could cause all data to be deleted from the table, and the sequence to >> be reset? Especially while there is an active connection? Thanks for any >> ideas, however wild or off the wall :-) > > Is this running off a NFS mount or something? I'm wondering about the > filesystem getting unmounted and an empty copy of the database being in > the mount point. Good thought - I hadn't thought of a file system level reset. Unfortunately the data folder is on the root partition, which is a standard SATA hard drive, and the date on the data folder shows February 27th, so apparently the folder itself at least hasn't been re-created. > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general --- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 --- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Mysterious DB reset
On Mar 5, 2014, at 10:01 AM, Thom Brown wrote:

On 5 March 2014 18:22, Israel Brewster wrote:

I have a Postgresql 9.2.4 database containing real-time tracking data for our aircraft for the past week (theoretically). It is populated by two different processes: one that runs every few minutes, retrieving data from a number of sources and storing it in the DB, and one that has an "always on" connection to the DB streaming data into the database in realtime (often several records per second). To keep the database size manageable I have a cron job that runs every evening to delete all records that are more than a week old, after archiving a subset of them in permanent storage.

This morning my boss e-mailed me, complaining that only a couple of aircraft were showing up in the list (SELECT distinct(tail) FROM data being the command that populates the list). Upon looking at the data I saw that it only went back to 4am this morning, rather than the week I was expecting. My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected: DELETE FROM data WHERE pointtime

Then I noticed something even more odd. My database has an id column, which is defined as a SERIAL. As we all know, a serial is a monotonically increasing number that is not affected by deletes. However, the oldest record in my database, from 4am this morning, had an id of 1. Even though I KNOW there was data in the system yesterday. Even if my DELETE command was wrong and deleted ALL records, that shouldn't have reset the SERIAL column to 1! I also know that I have not been in the database mucking around with the sequence value - to be completely honest, I don't even know the exact command to reset it - I'd have to google it if I wanted to.

Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred.

So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-)

That is odd. Even if it were an unlogged table, and there was a crash, the sequence wouldn't reset, and even if it was running in a very long-running transaction held open by a buggy connection pooler, the sequence would still progress as it's immune to the effects of transactions.

Not familiar with a logged vs unlogged table (still learning all the features of PostgreSQL), but as you said the sequence resetting is rather odd.

So if all the data went missing, and the sequence reset, the only thing I can think of is: someone ran: TRUNCATE data RESTART IDENTITY;

Considering that I'm the only one in the company that knows SQL at all beyond a simple single-table select (I keep having to explain joins and how they aren't evil to the other programmer here), not likely :-)

or someone restored the table structure from a backup that deleted the original table.

Now that's a thought... Maybe my backup routine is working backwards (pushing from the backup archive to the primary machine rather than from the primary to the backup). I did switch primary and secondary machines a while ago, but I thought I had checked that. What would be the effect of overwriting the data files while the database is active?

Do you log DDL?

Not sure what that is, so I'll assume no :-)

Was the table partitioned?

Nope.

You should also really be on 9.2.7, although I can't think of any bug that's been fixed which could be responsible for this issue.

I'll look at updating. Thanks.

--
Thom

---
Israel Brewster
Computer Support Technician II
Era Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x7293
---
Re: [GENERAL] Mysterious DB reset
On 03/06/2014 09:33 AM, Israel Brewster wrote: For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on appears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 On Mar 5, 2014, at 1:00 PM, Adrian Klaver wrote: On 03/05/2014 10:22 AM, Israel Brewster wrote: My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected: Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM today, no records exist before 7 days ago also or am I missing something? If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete was deleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to confirm the delete is correct, and therefore not the problem. A sequence is just a special table. So what does SELECT * from the sequence show? tracking=> SELECT * FROM data_id_seq; sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled | is_called ---++-+--+-+---+-+-+---+--- data_id_seq | 1184 | 1 |1 | 9223372036854775807 | 1 | 1 | 16 | f | t Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred. I am not sure what you are calling the 'reset'? Did something happen between 3:51 AM and 4:45 AM? Yes: All my data was deleted and the sequence reset to 1. Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM? As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various reasons). That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist in the local database any more. Well something is happening. See my notes on logging below to help track down the cause. So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-) What is in the Postgres/system logs for the time period(s) you mention? The postgres log has a lot of errors in it, some of which MAY explain the issue. For example: cp: cannot create regular file '/mnt/pglogs/000100040094': Permission denied LOG: archive command failed with exit code 1 DETAIL: The failed archive command was: test ! -f /mnt/pglogs/000100040094 && cp pg_xlog/000100040094 /mnt/pglogs/000100040094 WARNING: transaction log file "000100040094" could not be archived: too many failures LOG: received smart shutdown request LOG: autovacuum launcher shutting down LOG: shutting down LOG: database system is shut down Might be good to explain your archive setup. 
However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current entries, or from back before I got the mount for the logs working. At this time, the mount point IS working correctly, and from what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar 5, 9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened again. I don't see anything of interest in the syslog or messages log. Yes you can, timestamps and a lot more. For all the details go here: http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html At the least I would: log_statement = 'mod' log_connections = on log_disconnections = on log_line_prefix = '%u-%m-%x' --- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 --- -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your s
Re: [GENERAL] Mysterious DB reset
On 3/6/14, Israel Brewster wrote: > LOG: received smart shutdown request > LOG: autovacuum launcher shutting down > LOG: shutting down > LOG: database system is shut down > > However, there are no timestamps on any of the entries (can I fix that?) Yes, change the log_line_prefix in the postgresql.conf file and reload it. I use: log_line_prefix = '%m %u ' You might also want to use this, at least temporarily: log_statement = all -- Mike Nolan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
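As a small illustrative follow-up: after editing postgresql.conf, the logging changes can be applied and checked from a psql session without a restart (pg_ctl reload on the server does the same thing):

SELECT pg_reload_conf();   -- signal the server to re-read postgresql.conf
SHOW log_line_prefix;      -- confirm the new value took effect
SHOW log_statement;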
Re: [GENERAL] Mysterious DB reset
On Mar 6, 2014, at 9:03 AM, Adrian Klaver wrote: > On 03/06/2014 09:33 AM, Israel Brewster wrote: >> For starters, this happened again this morning (no data prior to 4:45 am and >> sequence reset), so whatever is going on appears to be reoccurring. Also, I >> forgot to mention if it is significant: this is running on slackware liunux >> 14.0 >> >> On Mar 5, 2014, at 1:00 PM, Adrian Klaver wrote: >> >>> On 03/05/2014 10:22 AM, Israel Brewster wrote: My first thought was "Oh, I must have a typo in my cleanup routine, such that it is deleting all records rather than only those a week old, and it's just that no one has noticed until now". So I looked at that, but changing the delete to a select appeared to produce the proper results, in that no records were selected: >>> >>> Well it would, if the records only go back to 4 AM this morning. In other >>> words if no records exist before 4 AM today, no records exist before 7 days >>> ago also or am I missing something? >> >> If the delete is correct, you are absolutely right. My first theory, >> however, was that I made a typo, and the delete was deleting ALL records, in >> which case changing it to a select would select all records. As it did not, >> that seems to confirm the delete is correct, and therefore not the problem. >> >>> A sequence is just a special table. >>> >>> So what does SELECT * from the sequence show? >> >> tracking=> SELECT * FROM data_id_seq; >> sequence_name | last_value | start_value | increment_by | max_value >> | min_value | cache_value | log_cnt | is_cycled | is_called >> ---++-+--+-+---+-+-+---+--- >> data_id_seq | 1184 | 1 |1 | >> 9223372036854775807 | 1 | 1 | 16 | f | t >> >> >>> Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred. >>> >>> I am not sure what you are calling the 'reset'? >>> Did something happen between 3:51 AM and 4:45 AM? >> >> Yes: All my data was deleted and the sequence reset to 1. >> >>> Also not sure why you call the 4:45 AM record the oldest, when you say you >>> can identify records from 3:51 AM? >> >> As I mentioned, I archive the records to permanent storage. This archive >> process happens every hour (for various reasons). That is how I know we had >> records for 3:51 am: they exist in the permanent archive. However, they >> don't exist in the local database any more. > > Well something is happening. See my notes on logging below to help track down > the cause. Yep. > >> >>> So my question is, aside from someone going in and mucking about in the wee hours of the morning, what could possibly cause this behavior? What sort of event could cause all data to be deleted from the table, and the sequence to be reset? Especially while there is an active connection? Thanks for any ideas, however wild or off the wall :-) >>> >>> What is in the Postgres/system logs for the time period(s) you mention? >> >> The postgres log has a lot of errors in it, some of which MAY explain the >> issue. For example: >> >> cp: cannot create regular file '/mnt/pglogs/000100040094': >> Permission denied >> LOG: archive command failed with exit code 1 >> DETAIL: The failed archive command was: test ! 
-f >> /mnt/pglogs/000100040094 && cp pg_xlog/000100040094 >> /mnt/pglogs/000100040094 >> WARNING: transaction log file "000100040094" could not be >> archived: too many failures >> LOG: received smart shutdown request >> LOG: autovacuum launcher shutting down >> LOG: shutting down >> LOG: database system is shut down > > Might be good to explain your archive setup. Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is also a hot standby server kept in sync with streaming replication. The WALs are archived to a NFS share on this machine. Once an hour a python script runs that a) selects all unsynced records from the postgresql db, b) stores a subset of them in our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE id in (...) ). Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the database to a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts (such as my data retrieval scripts) from the primary machine to a backup location on the secondary machine. None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data.
Re: [GENERAL] Mysterious DB reset
On 03/06/2014 10:43 AM, Israel Brewster wrote: On Mar 6, 2014, at 9:03 AM, Adrian Klaver wrote: Well something is happening. See my notes on logging below to help track down the cause. Yep. Might be good to explain your archive setup. Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is also a hot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of them in our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE id in (...) ) Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the database to a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts (such as my data retrieval scripts) from the primary machine to a backup location on the secondary machine. None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data Are all the scripts running from one machine? If so, have you checked that the times are set correctly on the various machines? Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather than having formal training :-) I'm DEFINITELY open to suggestions :-) 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the end result first and then work backwards from there. I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information. Thanks for the link and information! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
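One simple way to double-check the clock-agreement point is to run the same query against each server and compare the results; a minimal sketch:

SELECT now() AS server_time, current_setting('TimeZone') AS server_timezone;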
Re: [GENERAL] Mysterious DB reset
On Mar 6, 2014, at 10:03 AM, Adrian Klaver wrote: > On 03/06/2014 10:43 AM, Israel Brewster wrote: >> On Mar 6, 2014, at 9:03 AM, Adrian Klaver wrote: >> > >>> >>> Well something is happening. See my notes on logging below to help track >>> down the cause. >> >> Yep. >> > >>> >>> Might be good to explain your archive setup. >> >> Ok, here goes: We have the primary system which receives the data and >> handles all requests for said data. There is also a hot standby server keep >> in sync with streaming replication. The WALs are archived to a NFS share on >> this machine. >> >> Once an hour a python script runs that a) Selects all unsynced records from >> the postgresql db, b) stores a subset of them in our permanent archive, and >> c) marks the previously selected records as synced (UPDATE data SET >> syncd=true WHERE id in (...) ) >> >> Additionally, I have a) a script that runs at 8:00pm every evening that uses >> pg_dump to dump the contents of the database to a backup file, and b) a >> script that runs at 8:00 each morning that rsync's various config files and >> scripts (such as my data retrieval scripts) from the primary machine to a >> backup location on the secondary machine. >> >> None of the scripts run anywhere near the apparent 4:40ish cutoff time for >> my data > > Are all the scripts running from one machine? > If so, have you checked that the times are set correctly on the various > machines? Three different machines (due to OS requirements), but yeah the times all appear to be correct. > > >> >> Make sense? Probably not the best setup, but then that's what happens when >> you figure out stuff for yourself rather than having formal training :-) I'm >> DEFINITELY open to suggestions :-) > > 'Makes sense' is context sensitive. It really depends on what you want to > achieve. My procedure is to define the end result first and then work > backwards from there. Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in the "is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D > >> >>> > >> >> I'll get those in the config, and we'll see what happens tomorrow morning. >> Hopefully that will give more information. Thanks for the link and >> information! >> >> > --- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 --- BEGIN:VCARD VERSION:3.0 N:Brewster;Israel;;; FN:Israel Brewster ORG:Frontier Flying Service;MIS TITLE:PC Support Tech II EMAIL;type=INTERNET;type=WORK;type=pref:isr...@frontierflying.com TEL;type=WORK;type=pref:907-450-7293 item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701; item1.X-ABADR:us CATEGORIES:General X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson END:VCARD -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Mysterious DB reset
On 03/06/2014 11:09 AM, Israel Brewster wrote: On Mar 6, 2014, at 10:03 AM, Adrian Klaver wrote: Are all the scripts running from one machine? If so, have you checked that the times are set correctly on the various machines? Three different machines (due to OS requirements), but yeah the times all appear to be correct. So it appears you will have to wait to see what the logging reports. Should have mentioned that you need to make sure you do something like pg_ctl reload on the Postgres server to get the postgressql.conf changes to take effect. Another thought. Might not be bad idea to grep your scripts for ALTER, SEQUENCE or any other relevant keywords. Just in case something slipped in you where not aware of. Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather than having formal training :-) I'm DEFINITELY open to suggestions :-) 'Makes sense' is context sensitive. It really depends on what you want to achieve. My procedure is to define the end result first and then work backwards from there. Good point. However, I was asking more in the general "did I explain it well enough to be understood" rather than in the "is it a sensible setup" aspect. The rest of the sentence was just me being insecure :-D I understood the basics of what you are doing. The details probably can wait pending the log information. Hey, all of us are beginners/learning in something. --- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 --- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] high throughput 9.3, master/slave question
On 02/03/14 05:08, bricklen wrote:

On Fri, Feb 28, 2014 at 1:54 PM, Matthew Chambers <mchamb...@wetafx.co.nz> wrote:

Initially, I had my application servers using the slave for short, read only queries, but this turned out to be highly unstable. The slave would start refusing connections, and the logs would fill with: ERROR: canceling statement due to conflict with recovery

I've tried these 2 settings:
max_standby_archive_delay = -1
max_standby_streaming_delay = -1

But then I started getting these: DETAIL: User transaction caused buffer deadlock with recovery.

Read requests come in at anywhere between 200 and 1000/second. I was wondering if there is some combination of configuration settings that would safely let me use the slave for read only queries?

Have you tried setting max_standby_archive|streaming_delay to several minutes (or whatever makes sense for your typical query durations), rather than disabling those settings with -1?

No, I have not tried this, but I'll give it a go. Is there anyone doing this that has a configuration I can check out? Is this even possible to do?

-Matt
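For illustration only, bricklen's suggestion would look roughly like this in the standby's postgresql.conf; the 5-minute values are placeholders to be tuned to typical query durations, not a recommendation from the thread:

max_standby_streaming_delay = 5min
max_standby_archive_delay = 5min
# both settings take effect on a config reload; no restart needed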
Re: [GENERAL] Mysterious DB reset
Israel Brewster-2 wrote
> So my question is, aside from someone going in and mucking about in the
> wee hours of the morning, what could possibly cause this behavior? What
> sort of event could cause all data to be deleted from the table, and the
> sequence to be reset? Especially while there is an active connection?
> Thanks for any ideas, however wild or off the wall :-)

Nothing so far makes this likely, but any chance there is some kind of Virtual Machine setup in place where all the changes from a given day are being lost because the VM is resetting back to "factory defaults"?

Also, you say you perform daily pg_dumps. Have you tried loading these up and seeing what their contents are? Ultimately the log files are going to be needed to do any meaningful forensic work though.

Any chance you may have been debugging the wrong box/database? That is always something worth verifying and usually not that difficult.

David J.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Mysterious-DB-reset-tp5794868p5795031.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
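A quick, purely illustrative way to rule out the "wrong box/database" scenario from whatever client session you are debugging in:

SELECT current_database(),
       inet_server_addr(),
       inet_server_port(),
       version();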
Re: [GENERAL] Mysterious DB reset
On 03/06/2014 09:33 AM, Israel Brewster wrote: For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on appears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 Also odd is that my cleanup script runs at 1am. I have records of there being new data in the database up to 3:51am, but the oldest record currently in the DB is from 4:45am (as specified by the default of now() on the column). So I know records were added after my delete command ran, but before this reset occurred. A shot in the dark... Have you searched /etc/crontab, root's crontab, PostgreSQL's crontab and the crontabs of any automatic scripts that connect. I'm not sure about Slackware but Red Hat and Centos run the cron.daily scripts at (wait for it...) just after 4am. Some of the default daily scripts like logrotate can have "side effects" like restarting the service that writes to the log file being rotated. Cheers, Steve -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Offending My Tender Sensibilities -OR- OLTP on a Star Schema
On Tue, Mar 4, 2014 at 3:08 PM, Rob Sargent wrote: > Do you make a distinction between a key and an index? I'm not picking up on > design-by-natural-key and what that entails. Especially the notion that the > natural key of a given item might be mutable. What stops it from colliding > with the next item? (I have not had the pleasure of working in a domain > where natural keys are obvious if they existed at all. "What's in a name", > after all. ) If your keys are mutable then you definitely have to take that into consideration for key style choice...but not for your stated concern. Even though you can cascade (via RI) updated keys to various tables, performance can certainly suffer vs a surrogate. This is the main reason not to use natural keys: slow, perhaps even pathologically slow update performance on the key. However, collisions are a reason *to* use natural keys. If you can't handle them with your proposed key then either: a) you've misidentified the key or b) you're allowing duplicate unique entries in the system when you should not be. Even when using surrogates, it's still a good practice to identify what makes a record unique wherever possible and place unique constraints on those fields. merlin -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
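To make the last point concrete, a minimal sketch (table and column names are invented for illustration): a surrogate primary key is used for joins, while a unique constraint still enforces the natural key so duplicates collide.

CREATE TABLE aircraft (
    id          serial PRIMARY KEY,   -- surrogate key: cheap, immutable target for foreign keys
    tail_number text NOT NULL,        -- natural key: may occasionally be updated
    operator    text NOT NULL,
    UNIQUE (tail_number)              -- duplicates still rejected even though id is the PK
);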
Re: [GENERAL] Offending My Tender Sensibilities -OR- OLTP on a Star Schema
On 03/06/2014 03:52 PM, Merlin Moncure wrote: On Tue, Mar 4, 2014 at 3:08 PM, Rob Sargent wrote: Do you make a distinction between a key and an index? I'm not picking up on design-by-natural-key and what that entails. Especially the notion that the natural key of a given item might be mutable. What stops it from colliding with the next item? (I have not had the pleasure of working in a domain where natural keys are obvious if they existed at all. "What's in a name", after all. ) If your keys are mutable then you definitely have to take that into consideration for key style choice...but not for your stated concern. Even though you can cascade (via RI) updated keys to various tables performance can certainly suffer vs a surrogate. This is the main reason not to use natural keys: slow, perhaps even pathologically slow update performance on the key. However, collisions are a reason *to* use natural keys. If you can'd handle them with your proposed key then either: a) you've misidentified the key or b) you'er allowing duplicate unique entries in the system and when you should not be Even when using surrogates, it's still a good practice to identify what makes a record unique wherever possible and place unique constraints on those fields. merlin Yes. And in my mind therein resides the semantic difference between a primary key and a unique index. Thanks.
[GENERAL] problem with trigger function
I'm having a problem with a trigger function. I've been googling for over an hour, with no luck with my specific problem. I get this error:

ERROR: missing FROM-clause entry for table "new"
LINE 1: insert into metric_double_values_201203 values (NEW.metricID...
^
QUERY: insert into metric_double_values_201203 values (NEW.metricID, NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)
CONTEXT: PL/pgSQL function metric_double_insert_func() line 8 at EXECUTE statement

From this trigger function:

CREATE OR REPLACE FUNCTION metric_double_insert_func()
RETURNS TRIGGER AS $$
DECLARE
    insert_sql text;
BEGIN
    insert_sql := 'insert into metric_double_values_' || to_char(NEW.datetimeval,'MM')
                  || ' values (NEW.metricID, NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)';
    EXECUTE insert_sql using NEW;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER insert_metric_double_insert_trigger on metric_double_values;
CREATE TRIGGER insert_metric_double_insert_trigger
    BEFORE INSERT ON metric_double_values
    FOR EACH ROW EXECUTE PROCEDURE metric_double_insert_func();

This was an attempt at eliminating the error I got when trying to insert with values (NEW.*) using NEW:

ERROR: missing FROM-clause entry for table "new"
LINE 1: insert into metric_double_values_201203 values (NEW.*)
^
QUERY: insert into metric_double_values_201203 values (NEW.*)
CONTEXT: PL/pgSQL function metric_double_insert_func() line 7 at EXECUTE statement

I don't know what from clause it is talking about. This is a trigger for inserting rows into the proper partition table based on date.

Any help appreciated.

Thanks,
Susan
Re: [GENERAL] problem with trigger function
On 03/06/2014 04:08 PM, Susan Cassidy wrote: I'm having a problem with a trigger function. I've been googling for over an hour, with no luck with my specific problem. I get this error: ERROR: missing FROM-clause entry for table "new" LINE 1: insert into metric_double_values_201203 values (NEW.metricID... ^ QUERY: insert into metric_double_values_201203 values (NEW.metricID, NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval) CONTEXT: PL/pgSQL function metric_double_insert_func() line 8 at EXECUTE statement From this trigger function: CREATE OR REPLACE FUNCTION metric_double_insert_func() RETURNS TRIGGER AS $$ DECLARE insert_sql text; BEGIN insert_sql:='insert into metric_double_values_' || to_char(NEW.datetimeval,'MM') || ' values (NEW.metricID, NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)'; EXECUTE insert_sql using NEW; RETURN NULL; END; $$ LANGUAGE plpgsql; The basic problem is here: insert_sql:='insert into metric_double_values_' || to_char(NEW.datetimeval,'MM') || ' values (NEW.metricID, NEW.sourceID, NEW.timestamp, NEW.value, NEW.datetimeval)'; in particular: ' values (NEW.metricID,...' You are quoting the NEW values which Postgres then interprets as values coming from the table new as new.metric_id, etc. You need to use the parameter placeholders,$1, $2, etc. See here for some examples: http://www.postgresql.org/docs/9.3/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN 40.5.4. Executing Dynamic Commands DROP TRIGGER insert_metric_double_insert_trigger on metric_double_values; CREATE TRIGGER insert_metric_double_insert_trigger BEFORE INSERT ON metric_double_values FOR EACH ROW EXECUTE PROCEDURE metric_double_insert_func(); This was an attempt at eliminating the error I got when trying to insert with values (NEW.*) using NEW: ERROR: missing FROM-clause entry for table "new" LINE 1: insert into metric_double_values_201203 values (NEW.*) ^ QUERY: insert into metric_double_values_201203 values (NEW.*) CONTEXT: PL/pgSQL function metric_double_insert_func() line 7 at EXECUTE statement I don't know what from clause it is talking about This is a trigger for inserting rows into the proper partition table based on date. Any help appreciated. Thanks, Susan -- Adrian Klaver adrian.kla...@aklaver.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
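Following that pointer, a minimal sketch of one way the function could be written with a parameter placeholder instead of quoting NEW inside the string. The 'YYYYMM' format is an assumption made to match the metric_double_values_201203 name in the error message, and the sketch is illustrative only, not tested against the original schema:

CREATE OR REPLACE FUNCTION metric_double_insert_func()
RETURNS TRIGGER AS $$
DECLARE
    insert_sql text;
BEGIN
    -- Build the statement with a $1 placeholder; NEW is supplied via USING,
    -- so it is never spelled out inside the quoted SQL string.
    insert_sql := 'insert into metric_double_values_'
                  || to_char(NEW.datetimeval, 'YYYYMM')
                  || ' select ($1).*';
    EXECUTE insert_sql USING NEW;
    RETURN NULL;   -- suppress the insert into the parent table
END;
$$ LANGUAGE plpgsql;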
[GENERAL] There is bug in PCS 0.9.26: configure pacemaker resource agent for PG stream replication
Hi all, I just completed my HA configuration on Redhat 6.4 enterprise: Clusterware: Pacemaker 1.1.8 ( CMAN .0.12.1, corosync 1.4.1 ) Resource manager: PCS 0.9.26 PostgreSQL 9.3.3.1 Detailed configuration follows: http://clusterlabs.org/wiki/PgSQL_Replicated_Cluster#PostgreSQL_.28node1_only.29 After completing the configuration, we found all PostgreSQL nodes were in slave status. We finally found the root cause after some Google searching: PCS 0.9.26 does not apply the whole resource configuration definition; it cuts the configuration parameters off at the end of the first "op". For example:

pcs -f pgsql_cfg resource create pgsql pgsql \
   pgctl="/usr/bin/pg_ctl" \
   psql="/usr/bin/psql" \
   pgdata="/var/lib/pgsql/data/" \
   rep_mode="sync" \
   node_list="node1 node2" \
   restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
   primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
   master_ip="192.168.2.3" \
   restart_on_promote='true' \
   op start timeout="60s" interval="0s" on-fail="restart" \
   == the other op definitions are cut here

After we upgraded PCS to the latest version, 0.9.90, all the problems disappeared :) BTW, we find the app connection is not interrupted when the RA automatically switches sync replication to async replication (for example, when we kill the slave node). -- View this message in context: http://postgresql.1045698.n5.nabble.com/There-is-bug-in-PCS-0-9-26-configure-pacemaker-resource-agent-for-PG-stream-replication-tp5795081.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general