Timo, I reinstalled all our servers with -O0 and full debug info
("-gdwarf-2 -g3"), so I should now be able to look up anything you need.
Here are some more examples with the data you asked for last time, now
without optimization. I've also noticed that a few users seem to be
regulars with this problem.
#4 0x00000000004d8fd7 in i_panic (format=0x501688 "file %s: line %d
(%s): assertion failed: (%s)") at failures.c:207
args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area =
0x7fffcf52b5a0, reg_save_area = 0x7fffcf52b4e0}}
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405
recs = (struct maildir_uidlist_rec * const *) 0x1b607f60
pos = (struct maildir_uidlist_rec * const *) 0x0
idx = 0
count = 2
__PRETTY_FUNCTION__ = "maildir_uidlist_records_array_delete"
#6 0x000000000044f8fc in maildir_uidlist_next (uidlist=0x1b612330,
line=0x1b60426a
"1249562340.M302085P7316V04240006I00DB2CF1_0.gehenna7.rutgers.edu,S=19277:2,S")
at maildir-uidlist.c:552
rec = (struct maildir_uidlist_rec *) 0x1b6209f8
old_rec = (struct maildir_uidlist_rec *) 0x1b61f1f8
recs = (struct maildir_uidlist_rec * const *) 0x1b604264
count = 0
uid = 2807
--------------------
(gdb) fr 5
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405
405 i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x1b61fa60
(gdb) p count
$2 = 2
(gdb) p *recs[0]
$3 = {uid = 2735, flags = 6, filename = 0x1b61fa78
"1262739815.M913182P4280V04240006I014D0F92_0.gehenna7.rutgers.edu,S=11791:2,",
extensions = 0x1b620980 "W12036"}
(gdb) p *recs[1]
$4 = {uid = 2736, flags = 0, filename = 0x1b6209a0
"1262739645.M307617P13458V04240006I014D0F91_0.gehenna10.rutgers.edu,S=5005:2,S",
extensions = 0x1b6209f0 "W5124"}
(gdb)
Another user for comparison:
(gdb) fr 5
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x6bd7680, rec=0x6bc93d0) at maildir-uidlist.c:405
405 i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x6d5c418
(gdb) p count
$2 = 7
(gdb) p *recs[0]
$3 = {uid = 52566, flags = 4, filename = 0x6d5c438
"1262781360.M836147P9323V04240007I0148DC53_0.gehenna5.rutgers.edu,S=2157:2,a",
extensions = 0x6d5c430 "W2201"}
(gdb) p *recs[1]
$4 = {uid = 52568, flags = 0, filename = 0x6cd4590
"1262628888.M94168P6001V04240006I003C41C2_0.gehenna10.rutgers.edu,S=22946:2,Sa",
extensions = 0x6cd4588 "W23481"}
(gdb) p *recs[2]
$5 = {uid = 52569, flags = 0, filename = 0x6bc9328
"1041233872.000006.mbox:2,S", extensions = 0x6bc9318 "W6334"}
(gdb) p *recs[3]
$6 = {uid = 52570, flags = 0, filename = 0x6bc9370
"1041233872.000013.mbox:2,S", extensions = 0x6bc9360 "W1193"}
(gdb) p *recs[4]
$7 = {uid = 52571, flags = 0, filename = 0x6bc92e0
"1041233872.000014.mbox:2,RS", extensions = 0x6bc92d0 "W3095"}
(gdb) p *recs[5]
$8 = {uid = 52572, flags = 4, filename = 0x6d58ec8
"1262782141.M730129P15749V03E80006I0148DC54_0.gehenna9.rutgers.edu,S=7454:2,",
extensions = 0x6d62160 "W7626"}
(gdb) p *recs[6]
$9 = {uid = 52573, flags = 0, filename = 0x6d62140
"1041233872.000016.mbox:2,S", extensions = 0x0}
(gdb)
And one more user:
(gdb) fr 5
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x10f44680, rec=0x10f67120) at maildir-uidlist.c:405
405 i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x11093760
(gdb) p count
$2 = 2
(gdb) p *recs[0]
$3 = {uid = 55108, flags = 0, filename = 0x11093778
"1262788462.M201250P7938V04240006I01EF76E0_0.gehenna10.rutgers.edu,S=1731278:2,",
extensions = 0x1109d240 "W1753846"}
(gdb) p *recs[1]
$4 = {uid = 55109, flags = 4, filename = 0x1109d268
"1262788793.M851477P3866V045C0007I01EF76E3_0.gehenna8.rutgers.edu,S=19990",
extensions = 0x0}
On 12/30/2009 12:10 PM, David Halik wrote:
Ok, I think I've got some more info and a more accurate timeline for
you. I tried this on two different dumps from two different users. The
count was 4 in the first example and 0 in the second; I'm guessing
that's considered "small"? The links to my gdb sessions for both are
below, with some of the info you were looking for. The corresponding
logs are also there so you can see how each failed. I put everything on
pastebin so it's a little easier to read.
By the way, I also found that the stale NFS file handle message does
appear first in each instance; it was just farther back in the logs.
The "Lowering uid" message also appears immediately after every stale
NFS message, which in turn causes all of this some amount of time later
(sometimes 5 minutes, sometimes 20), when a user performs a new action.
The "file reappeared" message only occurs some of the time. Here's the
chain of events in every case so far that I can see:
1) fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist)
failed: Stale NFS file handle
2) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: next_uid was
lowered (n -> n-1, hdr=n-1)
...a few minutes later...
(may or may not be a "message reappeared" warning at this point)
3) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file
entry at line 3:
1261057547.M378185P17303V03E80002I0197FB4A_0.gehenna9.rutgers.edu,S=7174:2,RS
(uid i -> n+1,2,3 )
4) Panic: file maildir-uidlist.c: line 405
(maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
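The chain above can be pulled out of the logs per incident so the time
gap between the stale handle and the panic is visible (the log path here
is an assumption; the patterns come from the messages quoted above):

```shell
# Print the four stages of the failure chain in log order, with line
# numbers, so the delay between stage 1 and stage 4 is easy to see.
grep -nE 'Stale NFS file handle|next_uid was lowered|Duplicate file entry|assertion failed: \(pos != NULL\)' \
    /var/log/maillog
```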
One thing to note: after the "Expunged message reappeared, giving a
new UID" warning, the user died quickly, and on more than one server
simultaneously. The gdb output is from server gehenna11 for that log
file. The uid in *recs[0] is also the number you can see in the logs
being lowered from 719 -> 718.
First user log: http://pastebin.com/m1718f07b
First user gdb: http://pastebin.com/m40088dc8
The second user also died on more than one server. The output is also
from gehenna11.
Second user log: http://pastebin.com/f3a1756f2
Second user gdb: http://pastebin.com/m59aacde4
On 12/29/2009 7:50 PM, Timo Sirainen wrote:
On 29.12.2009, at 19.09, David Halik wrote:
I'll definitely get back to you on this. Right now we're closed
until after New Year's and I don't want to go updating the dovecot
package on all of our servers until we're all back at work. I did do
some quick poking around and the count is optimized out, so I'll
have the package rebuilt without optimization and let you know what
the values are at the beginning of next week. Thanks again.
Well, you can probably also get the values in a slightly more difficult way:

p count = p uidlist.records.arr.buffer.used / uidlist.records.arr.element_size
p recs[n] = p *(*uidlist.records.v)[n]
--
================================
David Halik
System Administrator
OIT-CSS Rutgers University
dha...@jla.rutgers.edu
================================