[BUGS] BUG #7600: Database crash with data corruption

2012-10-12 Thread hankiewicz
The following bug has been logged on the website:

Bug reference:  7600
Logged by:  Database crash with data corruption
Email address:  hankiew...@gmail.com
PostgreSQL version: 9.2.1
Operating system:   Linux
Description:

We experienced database crash. 
Our configuration:
Single postgres server instance with 17 databases installed on it. Postgres
server is installed in chroot on Linux Centos.

The list of steps to reproduce bug:
1) Create two database on single server instance (production and sandbox)
2) On production perform normal operations (we use Trac with over 50 active
users)
3) on the sandbox database start explain plan of query
4) while explain plan is executing drop sandbox database
5) after recovery was completed we have discovered that sequences on
production database had wrong values

Our log:
ERROR:  database "sandbox" is being accessed by other users
DETAIL:  There are 2 other sessions using the database.
STATEMENT:  DROP DATABASE sandbox;

ERROR:  database "sandbox" is being accessed by other users
DETAIL:  There is 1 other session using the database.
STATEMENT:  DROP DATABASE sandbox;

LOG:  server process (PID 1482) was terminated by signal 2: Killed
DETAIL:  Failed process was running: EXPLAIN (ANALYZE off, VERBOSE off,
COSTS on, BUFFERS off, TIMING off )select * from ticket_custom tcu join 
(ticket ti left join 
(select tcin.field,tcin.time as tc_time, 
tcin.ticket, tcin.oldvalue 
from ticket_change tcin 
where tcin.field = 'status' and tcin.time = 
(select max(tcmax.time) from
ticket_change tcmax where tcmax.field = 'status' and tcmax.ticket =
tcin.ticket)) tc
on ti.id = tc.ticket) con 
on tcu.ticket = con.id 
join ticket_custom tcu2 on tcu2.ticket = con.id 
join ticket_custom tcu3 on tcu3.ticket = con.id 
where 
tcu.ticket = con.id and 
tcu.name = 'scen' and
tcu2.name = 'err_type' and
tcu3.name = 'err_details' and
con.component not in ('', 'xxx ') and
--  tcu.value not in ('',' ') and 
con.status <> 'closed' and 
con.type = 'defect' and
con.owner <> 'TEST_TEAM'
order by 1 asc,5 asc
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repe

Re: [BUGS] BUG #7600: Database crash with data corruption

2012-10-15 Thread Michał Hankiewicz
You misunderstand me.
According to the server logs I have sent in the first message process
received signal 2 (it is SIGINT) and according to the documentation this
signal should not couse server crash. Wrong values of sequences does not
mean hole in generated values, but sequence started to generate the same
values once again causing unique constraint violation.

regards
Michał Hankiewicz

On Sat, Oct 13, 2012 at 1:10 PM, Craig Ringer  wrote:

> On 10/12/2012 09:35 PM, hankiew...@gmail.com wrote:
>
>> 5) after recovery was completed we have discovered that sequences on
>> production database had wrong values
>>
>
> To follow up on Tom's explanation, if you're relying on sequences not
> having "holes" then your design is dangerously mistaken. A simple ROLLBACK
> throws away any IDs that were obtained from sequences in that transaction.
>
> If you're saying that the sequence is wrong in that it's LOWER than IDs
> present in the table, then that's a problem.
>
> --
> Craig Ringer
>


Re: [BUGS] BUG #7600: Database crash with data corruption

2012-10-15 Thread Michał Hankiewicz
On Mon, Oct 15, 2012 at 3:31 PM, Craig Ringer  wrote:

>
> OK, that sounds more like a problem. It wasn't clear from your original
> post that it was replaying used sequence values.
>
> SIGINT should normally just terminate the statement, eg:
>
> regress=# SELECT pg_sleep(100);
> ERROR:  canceling statement due to user request
>
> so I agree that something isn't right here.
>
> Is this something you've been able to reproduce? Or is it a one-off
> incident so far?
>
> --
> Craig Ringer
>
>
It is just one-off incident. We have increased sequences values and since
friday everything is running fine.
We haven't tried to reproduce this situation.

Michał Hankiewicz