Re: Corrupted sizes in cache once again

2023-02-15 Thread Tim Evers
Hi again, so this is the actual bug report :) I installed 2.3.20 from the repo, yet the errors persist. I made the following observations: For a mailbox producing the "broken physical size" messages, the culprit seems to be not the index.cache file but the dovecot-uidlist file. Removing the cache

Re: Corrupted sizes in cache once again

2023-02-02 Thread Stuart Henderson
On 2023-02-02, Tim Evers wrote:
>
> On 02.02.23 at 16:23, Aki Tuomi wrote:
>> For bug reports, we do ask that you try to reproduce it with 2.3.20 (current
>> latest), you can get packages from https://repo.dovecot.org/ and would be
>> nice if you can provide steps to reproduce this issue.
>>
>>

RE: Corrupted sizes in cache once again

2023-02-02 Thread Marc
>
> Maybe I was a bit unclear: I have about 1000 error messages per day from
> random accounts (about 500 in total so far) on all clusters. These are
> transparent to the user, so it's more like background noise at the
> moment.
Do you have ECC memory?
> No VM involved. All machines are baremeta

Re: Corrupted sizes in cache once again

2023-02-02 Thread Tim Evers
Maybe I was a bit unclear: I have about 1000 error messages per day from random accounts (about 500 in total so far) on all clusters. These are transparent to the user, so it's more like background noise at the moment. No VM involved. All machines are baremetal DRBD two-node clusters. As far a

Re: Corrupted sizes in cache once again

2023-02-02 Thread Christopher Wensink
Can you isolate the problem account on a separate VM to see if the problem follows the account or the original VM?
Chris
On 2/2/2023 9:58 AM, Tim Evers wrote:
Good point - these are 8 different DRBD clusters. I failed over one testing this theory. Problem persists. So I would rule out underl

Re: Corrupted sizes in cache once again

2023-02-02 Thread Tim Evers
Good point - these are 8 different DRBD clusters. I failed one over to test this theory. Problem persists. So I would rule out underlying issues. Especially since the "wrong" value is suspiciously often the on-disk size rather than a random value one would expect if there is corruption undern

Re: Corrupted sizes in cache once again

2023-02-02 Thread Tim Evers
On 02.02.23 at 16:23, Aki Tuomi wrote:
On 02/02/2023 17:19 EET Stuart Henderson wrote:
On 2023-02-01, Tim Evers wrote:
I run a fairly large Dovecot Installation (around 100k mailboxes) on several servers. gzip compression is on. Every once in a while I get the dreaded "cache corruptio

RE: Corrupted sizes in cache once again

2023-02-02 Thread Marc
Could even be memory. I once had a faulty memory module (without ECC) in an office machine, and it caused the md5sum of files on TrueCrypt USB backup drives to change constantly. Removed the module, and no more issues.
>
> Something to try, this all could be happening because of underlying d

Re: Corrupted sizes in cache once again

2023-02-02 Thread Christopher Wensink
Something to try: this could all be happening because of an underlying disk failure on the array it is running on. If this is a VM, can you move the operation to another host or data store to rule out hardware issues?
On 2/2/2023 9:19 AM, Stuart Henderson wrote:
On 2023-02-01, Tim Evers wrote:

Re: Corrupted sizes in cache once again

2023-02-02 Thread Aki Tuomi
> On 02/02/2023 17:19 EET Stuart Henderson wrote:
>
>
> On 2023-02-01, Tim Evers wrote:
> > I run a fairly large Dovecot Installation (around 100k mailboxes) on
> > several servers.
> >
> > gzip compression is on.
> >
> > Every once in a while I get the dreaded "cache corruption" messages i

Re: Corrupted sizes in cache once again

2023-02-02 Thread Stuart Henderson
On 2023-02-01, Tim Evers wrote:
> I run a fairly large Dovecot Installation (around 100k mailboxes) on
> several servers.
>
> gzip compression is on.
>
> Every once in a while I get the dreaded "cache corruption" messages in
> the log:
>
> Error: Corrupted record in index cache file
> /[redacte

Re: Corrupted sizes in cache once again

2023-02-01 Thread Tim Evers
I would like to add some more observations: I have another mailbox with one mail's size being saved incorrectly in the cache file:
doveadm dump:
RECORD: seq=40, uid=1202, flags=0x00
 - ext 5 cache :   8504 (3821)
 - ext 6 vsize : 145625 (d9380200)
   : vsize
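
A note on reading the dump values: the hex string in parentheses appears to be the raw little-endian bytes of the decimal number printed before it. This is an assumption based only on the two values quoted in this message, not on Dovecot documentation, but both are consistent with it. A minimal Python sketch to check this (the helper name decode_le_hex is made up for illustration):

```python
def decode_le_hex(hex_bytes: str) -> int:
    """Interpret a hex string as a little-endian unsigned integer.

    Assumption: the parenthesized value in the dump is the raw byte
    sequence of the field, stored least-significant byte first.
    """
    return int.from_bytes(bytes.fromhex(hex_bytes), byteorder="little")


# (hex as printed in the dump, decimal as printed in the dump)
samples = {
    "ext 5 cache": ("3821", 8504),
    "ext 6 vsize": ("d9380200", 145625),
}

for name, (raw, shown) in samples.items():
    decoded = decode_le_hex(raw)
    print(f"{name}: {raw} -> {decoded} (dump shows {shown}, match={decoded == shown})")
```

If the assumption holds, the same helper can be used to compare a stored size field against the size of the message on disk when checking whether the cached value is the compressed or the uncompressed size.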