[GENERAL] Strange Postgresql crash

2006-11-16 Thread Eric Rousse
 each
random_page_cost = 4            # units are one sequential page fetch cost
cpu_tuple_cost = 0.01           # (same)
cpu_index_tuple_cost = 0.001    # (same)
cpu_operator_cost = 0.0025      # (same)
log_connections = false
log_pid = true
log_statement = false
log_duration = false
log_timestamp = true
log_min_error_statement = notice # Values in order of increasing severity:
#   debug5, debug4, debug3, debug2, debug1,
#   info, notice, warning, error, panic(off)

syslog = 0  # range 0-2
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'

I tested my memory with memtest, and it's fine. I also ran some stress
tests under Linux, using stress and bonnie++, to see whether the machine
would crash with ACPI enabled or not while doing a dump. So far it's okay.


The machine: Linux aquilonII 2.6.17-1.2142_FC4 #1 Tue Jul 11 22:41:14 EDT 2006 i686 i686 i386 GNU/Linux


Does anyone have a suggestion?

--
Eric Rousse
514-655-1001

Telmatik inc.
204 Montarville, suite 250
Boucherville, QC, Canada
J4B 6S2

www.telmatik.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


Re: [GENERAL] Strange Postgresql crash

2006-11-16 Thread Eric Rousse
Duh, right! I didn't think of that one! The strange thing, though, is
that it didn't use to happen frequently; only recently has it started to
crash regularly.


Here's the content of the crontab:

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

00 3 * * * root /export/dbsystem/pg_backup.sh va > /dev/null 2>&1
00 4 * * * root /export/dbsystem/pg_backup.sh b > /dev/null 2>&1
00 5 * * * root rsync --password-file=/etc/.rs_sec -Cauzbqr /export/dbsystem/backup/ rsync://[EMAIL PROTECTED]/rsync/



I'll move cron.daily to 4:30
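
The overlap in the crontab above can also be checked mechanically. The
following is only a minimal Python sketch, not part of the original
setup: the function names daily_jobs and close_starts and the 10-minute
window are assumptions made for illustration. It looks only at entries
that run once every day (numeric minute and hour, "*" for the other
three time fields), so the hourly, weekly and monthly run-parts lines
are ignored, and it does not handle jobs on either side of midnight.

```python
from itertools import combinations

# Entries copied from the system crontab shown above (rsync line omitted).
CRONTAB = """\
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
00 3 * * * root /export/dbsystem/pg_backup.sh va > /dev/null 2>&1
00 4 * * * root /export/dbsystem/pg_backup.sh b > /dev/null 2>&1
"""


def daily_jobs(text):
    """Return (minute_of_day, command) for entries that run once every day."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # System crontab format: minute hour dom month dow user command
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # e.g. SHELL=..., PATH=... assignments
        minute, hour, dom, month, dow, _user, command = fields
        if (dom, month, dow) != ("*", "*", "*"):
            continue  # weekly or monthly entry
        if not (minute.isdigit() and hour.isdigit()):
            continue  # hourly entry or a range/step expression
        jobs.append((int(hour) * 60 + int(minute), command))
    return jobs


def close_starts(jobs, window=10):
    """Return pairs of commands whose start times are within `window` minutes."""
    pairs = []
    for (t1, c1), (t2, c2) in combinations(sorted(jobs), 2):
        if abs(t1 - t2) <= window:
            pairs.append((c1, c2))
    return pairs


for first, second in close_starts(daily_jobs(CRONTAB)):
    print(first, "<->", second)
```

With the entries above, the only pair reported is the 04:00 backup and
the 04:02 cron.daily run.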

brian wrote:

Eric Rousse wrote:

Hello all,

I've been experiencing a strange crash. I never really looked into it,
since it was happening only every 1-2 months or so. But in the past
week I've seen it a lot, and I have no clue what causes it, other than
that it coincides with the backups.


So, here's some info about it and about my machine:

When: it crashes at night, at around 4AM, during the backup:

00 3 * * * root /export/dbsystem/pg_backup.sh va > /dev/null 2>&1
00 4 * * * root /export/dbsystem/pg_backup.sh b > /dev/null 2>&1

I moved the vacuum to another time, just to make sure the two jobs do
not conflict. Who knows!




Is there anything else running at that time? What does /etc/crontab
contain? I ask because my Fedora box runs the cron.daily scripts at
4:02am by default.


brian




Re: [GENERAL] Strange Postgresql crash

2006-11-16 Thread Eric Rousse

Hi Tom,

Yeah, that's what I suspect: it seems more like a hardware/OS issue,
since I really have no evidence against PostgreSQL other than my daily
dump, which crashes *sometimes*.

I didn't know about badblocks; I'll try it. Last time, I ran a full
fsck on all the volumes and everything was clean. But this server has a
SATA RAID, and I've never known whether a RAID would actually replicate
bad blocks to the other disk.


Thanks for your advice!

Tom Lane wrote:

Eric Rousse <[EMAIL PROTECTED]> writes:
  

...
2006-11-16 04:00:39 [8763]   LOG:  connection received: host=10.1.1.54 port=4894
2006-11-16 04:00:40 [8763]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 04:00:40 [8763]   LOG:  incomplete startup packet
2006-11-16 04:02:26 [2534]   LOG:  database system was interrupted at 2006-11-16 03:57:36 EST
2006-11-16 04:02:26 [2534]   LOG:  checkpoint record is at C/6733EB68
...



I think what you're seeing here is probably a kernel-level crash and
system reboot.  It's not any normal sort of Postgres problem, because
if it were you'd see the postmaster bleating about crash of one of its
child processes.  Here it appears that the postmaster and all its
children died at once leaving no messages behind --- and that just
doesn't happen without either manual intervention or a system crash.

If it seems to be triggered by running a PG backup, it could be that
you've got a disk hardware problem that only manifests when you try to
read a particular data block :-(.  Have you tried running "badblocks"?

    regards, tom lane
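
The reasoning above (a restart with no preceding complaint from the
postmaster about a dead child process points away from Postgres itself)
can be applied to a log file with a small script. What follows is a
rough Python sketch under stated assumptions, not an official tool: the
function name classify_restarts and the 50-line lookback are
hypothetical, and the CHILD_CRASH pattern is an assumption about how
the postmaster words its report of a crashed backend, which may differ
between versions. The sample lines are the ones quoted in this message.

```python
import re

# Log excerpt quoted in this thread.
LOG = """\
2006-11-16 04:00:39 [8763]   LOG:  connection received: host=10.1.1.54 port=4894
2006-11-16 04:00:40 [8763]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 04:00:40 [8763]   LOG:  incomplete startup packet
2006-11-16 04:02:26 [2534]   LOG:  database system was interrupted at 2006-11-16 03:57:36 EST
2006-11-16 04:02:26 [2534]   LOG:  checkpoint record is at C/6733EB68
"""

# Assumed wording of the postmaster's report about a crashed child process.
CHILD_CRASH = re.compile(
    r"server process \(pid \d+\) "
    r"(was terminated by signal|exited with exit code)",
    re.IGNORECASE,
)
INTERRUPTED = "database system was interrupted at"


def classify_restarts(log_text, lookback=50):
    """For each recovery start, report whether a child crash was logged before it."""
    lines = log_text.splitlines()
    results = []
    for i, line in enumerate(lines):
        if INTERRUPTED not in line:
            continue
        # Look at the lines logged shortly before the recovery message.
        window = lines[max(0, i - lookback):i]
        crashed = any(CHILD_CRASH.search(prev) for prev in window)
        verdict = (
            "backend crash reported"
            if crashed
            else "no crash report before restart"
        )
        results.append((line[:19], verdict))  # first 19 chars: the timestamp
    return results


for when, verdict in classify_restarts(LOG):
    print(when, verdict)
```

A "no crash report before restart" verdict is only a hint that the whole
machine went down; it does not rule out a manual kill of the postmaster.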

  


