OK, an update from my end.

Under 2.3.18 (I have not upgraded production to 2.3.19 yet):

Replication issues persist as stated before.

However, I need to note that I had to manually sync a user that was not being listed as a replication failure.

This means I have to force a full sync between servers on all accounts, regardless of reported replication status.
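For now the workaround is scripted. A minimal sketch of the idea is below (the full sync.recover script is quoted further down this thread; the doveadm path and the status output format are from my own setup, so treat them as assumptions):

#!/usr/local/bin/python3
# Sketch: force a full dsync for every user, regardless of the replicator's
# reported status. Assumes doveadm lives at /usr/local/bin/doveadm.
import subprocess

status = subprocess.run(
    ["/usr/local/bin/doveadm", "replicator", "status", "*"],
    capture_output=True, text=True, check=True
).stdout.splitlines()

for line in status[1:]:        # skip the header line
    if not line.strip():
        continue
    user = line.split()[0]
    # -d = sync to the configured mail_replica; -N/-l/-U as in sync.recover below
    subprocess.run(["/usr/local/bin/doveadm", "sync", "-u", user,
                    "-d", "-N", "-l", "30", "-U"])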

This was discovered this morning on a customer's account that did not replicate between the servers properly; as a result, emails were being delivered days late because the client was accessing the other server.

It's one thing for mail to be 10 minutes late, but a day late is not practical.

Again, not complaining.

I will load 2.3.19 on the test servers, try that, and advise; I will also test for the folder-count replication issue and report back.
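For the folder-count test I will generate folders the same way as the mbox.gen script quoted further down; a trimmed sketch (the test account name is a placeholder, the doveadm path is from my setup):

#!/usr/local/bin/python3
# Sketch: create several hundred folders under INBOX for one test user and
# watch whether replication stalls once the count passes ~256.
import subprocess

user = "test@example.com"    # placeholder test account

for count in range(0, 400):
    box = "INBOX/folder-%s" % count
    # -s also subscribes the newly created folder
    subprocess.run(["/usr/local/bin/doveadm", "mailbox", "create",
                    "-s", "-u", user, box])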

Please note that NO errors are being thrown in the debug log: it reports the replication request, which gets queued but never completes.
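To be clear on what "queued but does not complete" looks like, I am watching it with the standard doveadm commands (output omitted here; examples appear further down the thread):

# doveadm replicator status '*'
# doveadm replicator dsync-status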





Happy Thursday !!!
Thanks - paul

Paul Kudla


Scom.ca Internet Services <http://www.scom.ca>
004-1009 Byron Street South
Whitby, Ontario - Canada
L1N 4S3

Toronto 416.642.7266
Main 1.866.411.7266
Fax 1.888.892.7266

On 5/11/2022 12:25 AM, Cassidy B. Larson wrote:
Hi Aki,

We just installed 2.3.19 and are seeing a couple of users throwing "INBOX/dovecot.index reset, view is now inconsistent", with their replicator status erroring out. We tried force-resync on the full mailbox, but to no avail so far. Wasn't this bug supposed to be fixed in 2.3.19?
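For reference, the resync was run roughly like this (user anonymized, so the exact invocation is approximate):

doveadm force-resync -u <user> '*'
doveadm replicator status <user>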

Thanks,

Cassidy

On Thu, Apr 28, 2022 at 5:02 AM Aki Tuomi <aki.tu...@open-xchange.com <mailto:aki.tu...@open-xchange.com>> wrote:

    2.3.19 is around the corner, so not long. I cannot yet promise an
    exact date, but hopefully within a week or two.

    Aki

     > On 28/04/2022 13:57 Paul Kudla (SCOM.CA <http://SCOM.CA> Internet
    Services Inc.) <p...@scom.ca <mailto:p...@scom.ca>> wrote:
     >
     >
     > Thanks for the update.
     >
     > is this for both replication issues (folders +300 etc)
     >
     > Just Asking - Any ETA
     >
     >
     >
     >
     >
     > Happy Thursday !!!
     > Thanks - paul
     >
     > Paul Kudla
     >
     >
     > Scom.ca Internet Services <http://www.scom.ca <http://www.scom.ca>>
     > 004-1009 Byron Street South
     > Whitby, Ontario - Canada
     > L1N 4S3
     >
     > Toronto 416.642.7266
     > Main 1.866.411.7266
     > Fax 1.888.892.7266
     >
     > On 4/27/2022 9:01 AM, Aki Tuomi wrote:
     > >
     > > Hi!
     > >
     > > This is probably going to get fixed in 2.3.19, this looks like
    an issue we are already fixing.
     > >
     > > Aki
     > >
     > >> On 26/04/2022 16:38 Paul Kudla (SCOM.CA <http://SCOM.CA>
    Internet Services Inc.) <p...@scom.ca <mailto:p...@scom.ca>> wrote:
     > >>
     > >>
     > >> Agreed there seems to be no way of posting these kinds of
    issues to see
     > >> if they are even being addressed or even known about moving
    forward on
     > >> new updates
     > >>
     > >> i read somewhere there is a new branch soming out but nothing
    as of yet?
     > >>
     > >> 2.4 maybe ....
     > >> 5.0 ........
     > >>
     > >> my previous replication issues (back in feb) went unanswered.
     > >>
     > >> not faulting anyone, but the developers do seem to be
    disconnected from
     > >> issues as of late? or concentrating on other issues.
     > >>
     > >> I have no problem with support contracts for day to day maintence
     > >> however as a programmer myself they usually dont work as the
    other end
     > >> relies on the latest source code anyways. Thus can not help.
     > >>
     > >> I am trying to take a part the replicator c programming based
    on 2.3.18
     > >> as most of it does work to some extent.
     > >>
     > >> tcps just does not work (ie 600 seconds default in the c
    programming)
     > >>
     > >> My thoughts are tcp works ok but fails when the replicator through
     > >> dsync-client.c when asked to return the folder list?
     > >>
     > >>
     > >> replicator-brain.c seems to control the overall process and
    timing.
     > >>
     > >> replicator-queue.c seems to handle the que file that does seem
    to carry
     > >> acurate info.
     > >>
     > >>
     > >> things in the source code are documented enough to figure this
    out but i
     > >> am still going through all the related .h files documentation
    wise which
     > >> are all over the place.
     > >>
     > >> there is no clear documentation on the .h lib files so i have
    to walk
     > >> through the tree one at a time finding relative code.
     > >>
     > >> since the dsync from doveadm does see to work ok i have to
    assume the
     > >> dsync-client used to compile the replicator is at fault
    somehow or a
     > >> call from it upstream?
     > >>
     > >> Thanks for your input on the other issues noted below, i will
    keep that
     > >> in mind when disassembling the source code.
     > >>
     > >> No sense in fixing one thing and leaving something else
    behind, probably
     > >> all related anyways.
     > >>
     > >> i have two test servers avaliable so i can play with all this
    offline to
     > >> reproduce the issues
     > >>
     > >> Unfortunately I have to make a living first, this will be
    addressed when
     > >> possible as i dont like systems that are live running this way and
     > >> currently only have 5 accounts with this issue (mine included)
     > >>
     > >>
     > >>
     > >>
     > >> Happy Tuesday !!!
     > >> Thanks - paul
     > >>
     > >> Paul Kudla
     > >>
     > >>
     > >> Scom.ca Internet Services <http://www.scom.ca
    <http://www.scom.ca>>
     > >> 004-1009 Byron Street South
     > >> Whitby, Ontario - Canada
     > >> L1N 4S3
     > >>
     > >> Toronto 416.642.7266
     > >> Main 1.866.411.7266
     > >> Fax 1.888.892.7266
     > >>
     > >> On 4/26/2022 9:03 AM, Reuben Farrelly wrote:
     > >>>
     > >>> I ran into this back in February and documented a
    reproducible test case
     > >>> (and sent it to this list).  In short - I was able to
    reproduce this by
     > >>> having a valid and consistent mailbox on the source/local,
    creating a
     > >>> very standard empty Maildir/(new|cur|tmp) folder on the
    remote replica,
     > >>> and then initiating the replicate from the source. This
    consistently
     > >>> caused dsync to fail replication with the error
    "dovecot.index reset,
     > >>> view is now inconsistent" and sync aborted, leaving the
    replica mailbox
     > >>> in a screwed up inconsistent state. Client connections on the
    source
     > >>> replica were also dropped when this error occurred.  You can
    see the
     > >>> error by enabling debug level logging if you initiate dsync
    manually on
     > >>> a test mailbox.
     > >>>
     > >>> The only workaround I found was to remove the remote Maildir
    and let
     > >>> Dovecot create the whole thing from scratch.  Dovecot did not
    like any
     > >>> existing folders on the destination replica even if they were
    the same
     > >>> names as the source and completely empty.  I was able to
    reproduce this
     > >>> the bare minimum of folders - just an INBOX!
     > >>>
     > >>> I have no idea if any of the developers saw my post or if the
    bug has
     > >>> been fixed for the next release.  But it seemed to be quite a
    common
     > >>> problem over time (saw a few posts from people going back a
    long way
     > >>> with the same problem) and it is seriously disruptive to
    clients.  The
     > >>> error message is not helpful in tracking down the problem either.
     > >>>
     > >>> Secondly, I also have had an ongoing and longstanding problem
    using
     > >>> tcps: for replication.  For some reason using tcps: (with no
    other
     > >>> changes at all to the config) results in a lot of timeout
    messages
> >>> "Error: dsync I/O has stalled, no activity for 600 seconds". This goes
     > >>> away if I revert back to tcp: instead of tcps - with tcp: I
    very rarely
     > >>> get timeouts.  No idea why, guess this is a bug of some sort
    also.
     > >>>
     > >>> It's disappointing that there appears to be no way to have
    these sorts
     > >>> or problems addressed like there once was.  I am not using
    Dovecot for
     > >>> commercial purposes so paying a fortune for a support
    contract for a
     > >>> high end installation just isn't going to happen, and this
    list seems to
     > >>> be quite ordinary for getting support and reporting bugs
    nowadays....
     > >>>
     > >>> Reuben
     > >>>
     > >>> On 26/04/2022 7:21 pm, Paul Kudla (SCOM.CA <http://SCOM.CA>
    Internet Services Inc.) wrote:
     > >>>
     > >>>>
     > >>>> side issue
     > >>>>
     > >>>> if you are getting inconsistant dsyncs there is no real way
    to fix
     > >>>> this in the long run.
     > >>>>
     > >>>> i know its a pain (already had to my self)
     > >>>>
     > >>>> i needed to do a full sync, take one server offline, delete
    the user
     > >>>> dir (with dovecot offline) and then rsync (or somehow
    duplicate the
     > >>>> main server's user data) over the the remote again.
     > >>>>
     > >>>> then bring remote back up and it kind or worked worked
     > >>>>
     > >>>> best suggestion is to bring the main server down at night so
    the copy
     > >>>> is clean?
     > >>>>
     > >>>> if using postfix you can enable the soft bounce option and
    the mail
     > >>>> will back spool until everything comes back online
     > >>>>
     > >>>> (needs to be enable on bother servers)
     > >>>>
     > >>>> replication was still an issue on accounts with 300+ folders
    in them,
     > >>>> still working on a fix for that.
     > >>>>
     > >>>>
     > >>>> Happy Tuesday !!!
     > >>>> Thanks - paul
     > >>>>
     > >>>> Paul Kudla
     > >>>>
     > >>>>
     > >>>> Scom.ca Internet Services <http://www.scom.ca
    <http://www.scom.ca>>
     > >>>> 004-1009 Byron Street South
     > >>>> Whitby, Ontario - Canada
     > >>>> L1N 4S3
     > >>>>
     > >>>> Toronto 416.642.7266
     > >>>> Main 1.866.411.7266
     > >>>> Fax 1.888.892.7266
     > >>>>
     > >>>> On 4/25/2022 10:01 AM, Arnaud Abélard wrote:
     > >>>>> Ah, I'm now getting errors in the logs, that would explains the
     > >>>>> increasing number of failed sync requests:
     > >>>>>
     > >>>>> dovecot:
    imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>:
     > >>>>> Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset,
    view is
     > >>>>> now inconsistent
     > >>>>>
     > >>>>>
     > >>>>> And sure enough:
     > >>>>>
     > >>>>> # dovecot replicator status xxxxx
     > >>>>>
     > >>>>> xxxxx         none     00:02:54  07:11:28  -            y
     > >>>>>
     > >>>>>
     > >>>>> What could explain that error?
     > >>>>>
     > >>>>> Arnaud
     > >>>>>
     > >>>>>
     > >>>>>
     > >>>>> On 25/04/2022 15:13, Arnaud Abélard wrote:
     > >>>>>> Hello,
     > >>>>>>
     > >>>>>> On my side we are running Linux (Debian Buster).
     > >>>>>>
     > >>>>>> I'm not sure my problem is actually the same as Paul or you
     > >>>>>> Sebastian since I have a lot of boxes but those are
    actually small
     > >>>>>> (quota of 110MB) so I doubt any of them have more than a
    dozen imap
     > >>>>>> folders.
     > >>>>>>
     > >>>>>> The main symptom is that I have tons of full sync requests
    awaiting
     > >>>>>> but even though no other sync is pending the replicator
    just waits
     > >>>>>> for something to trigger those syncs.
     > >>>>>>
     > >>>>>> Today, with users back I can see that normal and
    incremental syncs
     > >>>>>> are being done on the 15 connections, with an occasional
    full sync
     > >>>>>> here or there and lots of "Waiting 'failed' requests":
     > >>>>>>
     > >>>>>> Queued 'sync' requests        0
     > >>>>>>
     > >>>>>> Queued 'high' requests        0
     > >>>>>>
     > >>>>>> Queued 'low' requests         0
     > >>>>>>
     > >>>>>> Queued 'failed' requests      122
     > >>>>>>
     > >>>>>> Queued 'full resync' requests 28785
     > >>>>>>
     > >>>>>> Waiting 'failed' requests     4294
     > >>>>>>
     > >>>>>> Total number of known users   42512
     > >>>>>>
     > >>>>>>
     > >>>>>>
     > >>>>>> So, why didn't the replicator take advantage of the weekend to
     > >>>>>> replicate the mailboxes while no user were using them?
     > >>>>>>
     > >>>>>> Arnaud
     > >>>>>>
     > >>>>>>
     > >>>>>>
     > >>>>>>
     > >>>>>> On 25/04/2022 13:54, Sebastian Marske wrote:
     > >>>>>>> Hi there,
     > >>>>>>>
     > >>>>>>> thanks for your insights and for diving deeper into this
    Paul!
     > >>>>>>>
     > >>>>>>> For me, the users ending up in 'Waiting for dsync to
    finish' all have
     > >>>>>>> more than 256 Imap folders as well (ranging from 288 up
    to >5500;
     > >>>>>>> as per
     > >>>>>>> 'doveadm mailbox list -u <username> | wc -l'). For more
    details on my
     > >>>>>>> setup please see my post from February [1].
     > >>>>>>>
     > >>>>>>> @Arnaud: What OS are you running on?
     > >>>>>>>
     > >>>>>>>
     > >>>>>>> Best
     > >>>>>>> Sebastian
     > >>>>>>>
     > >>>>>>>
     > >>>>>>> [1]
    https://dovecot.org/pipermail/dovecot/2022-February/124168.html
    <https://dovecot.org/pipermail/dovecot/2022-February/124168.html>
     > >>>>>>>
     > >>>>>>>
     > >>>>>>> On 4/24/22 19:36, Paul Kudla (SCOM.CA <http://SCOM.CA>
    Internet Services Inc.) wrote:
     > >>>>>>>>
     > >>>>>>>> Question having similiar replication issues
     > >>>>>>>>
     > >>>>>>>> pls read everything below and advise the folder counts
    on the
     > >>>>>>>> non-replicated users?
     > >>>>>>>>
     > >>>>>>>> i find  the total number of folders / account seems to
    be a factor
     > >>>>>>>> and
     > >>>>>>>> NOT the size of the mail box
     > >>>>>>>>
     > >>>>>>>> ie i have customers with 40G of emails no problem over
    40 or so
     > >>>>>>>> folders
     > >>>>>>>> and it works ok
     > >>>>>>>>
     > >>>>>>>> 300+ folders seems to be the issue
     > >>>>>>>>
     > >>>>>>>> i have been going through the replication code
     > >>>>>>>>
     > >>>>>>>> no errors being logged
     > >>>>>>>>
     > >>>>>>>> i am assuming that the replication --> dhclient -->
    other server is
     > >>>>>>>> timing out or not reading the folder lists correctly (ie
    dies after X
     > >>>>>>>> folders read)
     > >>>>>>>>
     > >>>>>>>> thus i am going through the code patching for log
    entries etc to find
     > >>>>>>>> the issues.
     > >>>>>>>>
     > >>>>>>>> see
     > >>>>>>>>
     > >>>>>>>> [13:33:57] mail18.scom.ca <http://mail18.scom.ca>
    [root:0] /usr/local/var/lib/dovecot
     > >>>>>>>> # ll
     > >>>>>>>> total 86
     > >>>>>>>> drwxr-xr-x  2 root  wheel  uarch    4B Apr 24 11:11 .
     > >>>>>>>> drwxr-xr-x  4 root  wheel  uarch    4B Mar  8  2021 ..
     > >>>>>>>> -rw-r--r--  1 root  wheel  uarch   73B Apr 24 11:11
    instances
     > >>>>>>>> -rw-r--r--  1 root  wheel  uarch  160K Apr 24 13:33
    replicator.db
     > >>>>>>>>
     > >>>>>>>> [13:33:58] mail18.scom.ca <http://mail18.scom.ca>
    [root:0] /usr/local/var/lib/dovecot
     > >>>>>>>> #
     > >>>>>>>>
     > >>>>>>>> replicator.db seems to get updated ok but never
    processed properly.
     > >>>>>>>>
     > >>>>>>>> # sync.users
     > >>>>>>>> n...@elirpa.com
    <mailto:n...@elirpa.com>                   high     00:09:41
    463:47:01 -     y
     > >>>>>>>> ke...@elirpa.com
    <mailto:ke...@elirpa.com>                  high     00:09:23
    463:45:43 -     y
     > >>>>>>>> p...@scom.ca <mailto:p...@scom.ca>                    high     00:09:41 463:46:51 -     y
     > >>>>>>>> e...@scom.ca <mailto:e...@scom.ca>                    high     00:09:43 463:47:01 -     y
     > >>>>>>>> ed.ha...@dssmgmt.com
    <mailto:ed.ha...@dssmgmt.com>              high     00:09:42
    463:46:58 -     y
     > >>>>>>>> p...@paulkudla.net
    <mailto:p...@paulkudla.net>                high     00:09:44 463:47:03
     > >>>>>>>> 580:35:07
     > >>>>>>>>      y
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> so ....
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> two things :
     > >>>>>>>>
     > >>>>>>>> first to get the production stuff to work i had to write
    a script
     > >>>>>>>> that
     > >>>>>>>> whould find the bad sync's and the force a dsync between
    the servers
     > >>>>>>>>
     > >>>>>>>> i run this every five minutes or each server.
     > >>>>>>>>
     > >>>>>>>> in crontab
     > >>>>>>>>
     > >>>>>>>> */10    *                *    *    *    root /usr/bin/nohup
     > >>>>>>>> /programs/common/sync.recover > /dev/null
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> python script to sort things out
     > >>>>>>>>
     > >>>>>>>> # cat /programs/common/sync.recover
     > >>>>>>>> #!/usr/local/bin/python3
     > >>>>>>>>
     > >>>>>>>> #Force sync between servers that are reporting bad?
     > >>>>>>>>
     > >>>>>>>> import os,sys,django,socket
     > >>>>>>>> from optparse import OptionParser
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> from lib import *
     > >>>>>>>>
     > >>>>>>>> #Sample Re-Index MB
     > >>>>>>>> #doveadm -D force-resync -u p...@scom.ca
    <mailto:p...@scom.ca> -f INBOX*
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> USAGE_TEXT = '''\
     > >>>>>>>> usage: %%prog %s[options]
     > >>>>>>>> '''
     > >>>>>>>>
     > >>>>>>>> parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')
     > >>>>>>>>
     > >>>>>>>> parser.add_option("-m", "--send_to", dest="send_to",
    help="Send
     > >>>>>>>> Email To")
     > >>>>>>>> parser.add_option("-e", "--email", dest="email_box",
    help="Box to
     > >>>>>>>> Index")
     > >>>>>>>> parser.add_option("-d", "--detail",action='store_true',
     > >>>>>>>> dest="detail",default =False, help="Detailed report")
     > >>>>>>>> parser.add_option("-i", "--index",action='store_true',
     > >>>>>>>> dest="index",default =False, help="Index")
     > >>>>>>>>
     > >>>>>>>> options, args = parser.parse_args()
     > >>>>>>>>
     > >>>>>>>> print (options.email_box)
     > >>>>>>>> print (options.send_to)
     > >>>>>>>> print (options.detail)
     > >>>>>>>>
     > >>>>>>>> #sys.exit()
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> print ('Getting Current User Sync Status')
     > >>>>>>>> command = commands("/usr/local/bin/doveadm replicator
    status '*'")
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> #print command
     > >>>>>>>>
     > >>>>>>>> sync_user_status = command.output.split('\n')
     > >>>>>>>>
     > >>>>>>>> #print sync_user_status
     > >>>>>>>>
     > >>>>>>>> synced = []
     > >>>>>>>>
     > >>>>>>>> for n in range(1,len(sync_user_status)) :
     > >>>>>>>>           user = sync_user_status[n]
     > >>>>>>>>           print ('Processing User : %s' %user.split(' ')[0])
     > >>>>>>>>           if user.split(' ')[0] != options.email_box :
     > >>>>>>>>                   if options.email_box != None :
     > >>>>>>>>                           continue
     > >>>>>>>>
     > >>>>>>>>           if options.index == True :
     > >>>>>>>>                   command = '/usr/local/bin/doveadm -D
    force-resync
     > >>>>>>>> -u %s
     > >>>>>>>> -f INBOX*' %user.split(' ')[0]
     > >>>>>>>>                   command = commands(command)
     > >>>>>>>>                   command = command.output
     > >>>>>>>>
     > >>>>>>>>           #print user
     > >>>>>>>>           for nn in range (len(user)-1,0,-1) :
     > >>>>>>>>                   #print nn
     > >>>>>>>>                   #print user[nn]
     > >>>>>>>>
     > >>>>>>>>                   if user[nn] == '-' :
     > >>>>>>>>                           #print 'skipping ... %s'
    %user.split(' ')[0]
     > >>>>>>>>
     > >>>>>>>>                           break
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>                   if user[nn] == 'y': #Found a Bad Mailbox
     > >>>>>>>>                           print ('syncing ... %s'
    %user.split(' ')[0])
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>                           if options.detail == True :
     > >>>>>>>>                                   command =
    '/usr/local/bin/doveadm -D
     > >>>>>>>> sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
     > >>>>>>>>                                   print (command)
     > >>>>>>>>                                   command =
    commands(command)
     > >>>>>>>>                                   command =
    command.output.split('\n')
     > >>>>>>>>                                   print (command)
     > >>>>>>>>                                   print ('Processed
    Mailbox for ...
     > >>>>>>>> %s'
     > >>>>>>>> %user.split(' ')[0] )
> >>>>>>>> synced.append('Processed Mailbox
     > >>>>>>>> for ...
     > >>>>>>>> %s' %user.split(' ')[0])
     > >>>>>>>>                                   for nnn in
    range(len(command)):
     > >>>>>>>> synced.append(command[nnn] + '\n')
     > >>>>>>>>                                   break
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>                           if options.detail == False :
     > >>>>>>>>                                   #command =
     > >>>>>>>> '/usr/local/bin/doveadm -D
     > >>>>>>>> sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
     > >>>>>>>>                                   #print (command)
     > >>>>>>>>                                   #command =
    os.system(command)
     > >>>>>>>>                                   command =
    subprocess.Popen(
     > >>>>>>>> ["/usr/local/bin/doveadm sync -u %s -d -N -l 30 -U"
    %user.split('
     > >>>>>>>> ')[0]
     > >>>>>>>> ], \
     > >>>>>>>>                                   shell = True, stdin=None,
     > >>>>>>>> stdout=None,
     > >>>>>>>> stderr=None, close_fds=True)
     > >>>>>>>>
     > >>>>>>>>                                   print ( 'Processed
    Mailbox for
     > >>>>>>>> ... %s'
     > >>>>>>>> %user.split(' ')[0] )
> >>>>>>>> synced.append('Processed Mailbox
     > >>>>>>>> for ...
     > >>>>>>>> %s' %user.split(' ')[0])
     > >>>>>>>>                                   #sys.exit()
     > >>>>>>>>                                   break
     > >>>>>>>>
     > >>>>>>>> if len(synced) != 0 :
     > >>>>>>>>           #send email showing bad synced boxes ?
     > >>>>>>>>
     > >>>>>>>>           if options.send_to != None :
     > >>>>>>>>                   send_from = 'moni...@scom.ca
    <mailto:moni...@scom.ca>'
     > >>>>>>>>                   send_to = ['%s' %options.send_to]
     > >>>>>>>>                   send_subject = 'Dovecot Bad Sync
    Report for : %s'
     > >>>>>>>> %(socket.gethostname())
     > >>>>>>>>                   send_text = '\n\n'
     > >>>>>>>>                   for n in range (len(synced)) :
     > >>>>>>>>                           send_text = send_text +
    synced[n] + '\n'
     > >>>>>>>>
     > >>>>>>>>                   send_files = []
     > >>>>>>>>                   sendmail (send_from, send_to,
    send_subject,
     > >>>>>>>> send_text,
     > >>>>>>>> send_files)
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> sys.exit()
     > >>>>>>>>
     > >>>>>>>> second :
     > >>>>>>>>
     > >>>>>>>> i posted this a month ago - no response
     > >>>>>>>>
     > >>>>>>>> please appreciate that i am trying to help ....
     > >>>>>>>>
     > >>>>>>>> after much testing i can now reporduce the replication
    issues at hand
     > >>>>>>>>
     > >>>>>>>> I am running on freebsd 12 & 13 stable (both test and
    production
     > >>>>>>>> servers)
     > >>>>>>>>
     > >>>>>>>> sdram drives etc ...
     > >>>>>>>>
     > >>>>>>>> Basically replication works fine until reaching a folder
    quantity
     > >>>>>>>> of ~
     > >>>>>>>> 256 or more
     > >>>>>>>>
     > >>>>>>>> to reproduce using doveadm i created folders like
     > >>>>>>>>
     > >>>>>>>> INBOX/folder-0
     > >>>>>>>> INBOX/folder-1
     > >>>>>>>> INBOX/folder-2
     > >>>>>>>> INBOX/folder-3
     > >>>>>>>> and so forth ......
     > >>>>>>>>
     > >>>>>>>> I created 200 folders and they replicated ok on both servers
     > >>>>>>>>
     > >>>>>>>> I created another 200 (400 total) and the replicator got
    stuck and
     > >>>>>>>> would
     > >>>>>>>> not update the mbox on the alternate server anymore and
    is still
     > >>>>>>>> updating 4 days later ?
     > >>>>>>>>
     > >>>>>>>> basically replicator goes so far and either hangs or
    more likely
     > >>>>>>>> bails
     > >>>>>>>> on an error that is not reported to the debug reporting ?
     > >>>>>>>>
     > >>>>>>>> however dsync will sync the two servers but only when
    run manually
     > >>>>>>>> (ie
     > >>>>>>>> all the folders will sync)
     > >>>>>>>>
     > >>>>>>>> I have two test servers avaliable if you need any kind
    of access -
     > >>>>>>>> again
     > >>>>>>>> here to help.
     > >>>>>>>>
     > >>>>>>>> [07:28:42] mail18.scom.ca <http://mail18.scom.ca> [root:0] ~
     > >>>>>>>> # sync.status
     > >>>>>>>> Queued 'sync' requests        0
     > >>>>>>>> Queued 'high' requests        6
     > >>>>>>>> Queued 'low' requests         0
     > >>>>>>>> Queued 'failed' requests      0
     > >>>>>>>> Queued 'full resync' requests 0
     > >>>>>>>> Waiting 'failed' requests     0
     > >>>>>>>> Total number of known users   255
     > >>>>>>>>
     > >>>>>>>> username                       type        status
> >>>>>>>> p...@scom.ca <mailto:p...@scom.ca> normal      Waiting for dsync to
     > >>>>>>>> finish
> >>>>>>>> ke...@elirpa.com <mailto:ke...@elirpa.com> incremental Waiting for dsync to
     > >>>>>>>> finish
     > >>>>>>>> ed.ha...@dssmgmt.com
    <mailto:ed.ha...@dssmgmt.com>           incremental Waiting for dsync to
     > >>>>>>>> finish
> >>>>>>>> e...@scom.ca <mailto:e...@scom.ca> incremental Waiting for dsync to
     > >>>>>>>> finish
> >>>>>>>> n...@elirpa.com <mailto:n...@elirpa.com> incremental Waiting for dsync to
     > >>>>>>>> finish
     > >>>>>>>> p...@paulkudla.net
    <mailto:p...@paulkudla.net>             incremental Waiting for dsync to
     > >>>>>>>> finish
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> i have been going through the c code and it seems the
    replication
     > >>>>>>>> gets
     > >>>>>>>> requested ok
     > >>>>>>>>
     > >>>>>>>> replicator.db does get updated ok with the replicated
    request for the
     > >>>>>>>> mbox in question.
     > >>>>>>>>
     > >>>>>>>> however i am still looking for the actual replicator
    function in the
     > >>>>>>>> lib's that do the actual replication requests
     > >>>>>>>>
     > >>>>>>>> the number of folders & subfolders is defanately the
    issue - not the
     > >>>>>>>> mbox pyhsical size as thought origionally.
     > >>>>>>>>
     > >>>>>>>> if someone can point me in the right direction, it seems
    either the
     > >>>>>>>> replicator is not picking up on the number of folders to
    replicat
     > >>>>>>>> properly or it has a hard set limit like 256 / 512 /
    65535 etc and
     > >>>>>>>> stops
     > >>>>>>>> the replication request thereafter.
     > >>>>>>>>
     > >>>>>>>> I am mainly a machine code programmer from the 80's and have
     > >>>>>>>> concentrated on python as of late, 'c' i am starting to
    go through
     > >>>>>>>> just
     > >>>>>>>> to give you a background on my talents.
     > >>>>>>>>
     > >>>>>>>> It took 2 months to finger this out.
     > >>>>>>>>
     > >>>>>>>> this issue also seems to be indirectly causing the
    duplicate messages
     > >>>>>>>> supression not to work as well.
     > >>>>>>>>
     > >>>>>>>> python programming to reproduce issue (loops are for
    last run
     > >>>>>>>> started @
     > >>>>>>>> 200 - fyi) :
     > >>>>>>>>
     > >>>>>>>> # cat mbox.gen
     > >>>>>>>> #!/usr/local/bin/python2
     > >>>>>>>>
     > >>>>>>>> import os,sys
     > >>>>>>>>
     > >>>>>>>> from lib import *
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> user = 'p...@paulkudla.net <mailto:p...@paulkudla.net>'
     > >>>>>>>>
     > >>>>>>>> """
     > >>>>>>>> for count in range (0,600) :
     > >>>>>>>>           box = 'INBOX/folder-%s' %count
     > >>>>>>>>           print count
     > >>>>>>>>           command = '/usr/local/bin/doveadm mailbox
    create -s -u %s
     > >>>>>>>> %s'
     > >>>>>>>> %(user,box)
     > >>>>>>>>           print command
     > >>>>>>>>           a = commands.getoutput(command)
     > >>>>>>>>           print a
     > >>>>>>>> """
     > >>>>>>>>
     > >>>>>>>> for count in range (0,600) :
     > >>>>>>>>           box = 'INBOX/folder-0/sub-%s' %count
     > >>>>>>>>           print count
     > >>>>>>>>           command = '/usr/local/bin/doveadm mailbox
    create -s -u %s
     > >>>>>>>> %s'
     > >>>>>>>> %(user,box)
     > >>>>>>>>           print command
     > >>>>>>>>           a = commands.getoutput(command)
     > >>>>>>>>           print a
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>           #sys.exit()
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> Happy Sunday !!!
     > >>>>>>>> Thanks - paul
     > >>>>>>>>
     > >>>>>>>> Paul Kudla
     > >>>>>>>>
     > >>>>>>>>
     > >>>>>>>> Scom.ca Internet Services <http://www.scom.ca
    <http://www.scom.ca>>
     > >>>>>>>> 004-1009 Byron Street South
     > >>>>>>>> Whitby, Ontario - Canada
     > >>>>>>>> L1N 4S3
     > >>>>>>>>
     > >>>>>>>> Toronto 416.642.7266
     > >>>>>>>> Main 1.866.411.7266
     > >>>>>>>> Fax 1.888.892.7266
     > >>>>>>>>
     > >>>>>>>> On 4/24/2022 10:22 AM, Arnaud Abélard wrote:
     > >>>>>>>>> Hello,
     > >>>>>>>>>
     > >>>>>>>>> I am working on replicating a server (and adding
    compression on the
     > >>>>>>>>> other side) and since I had "Error: dsync I/O has
    stalled, no
     > >>>>>>>>> activity
     > >>>>>>>>> for 600 seconds (version not received)" errors I
    upgraded both
     > >>>>>>>>> source
     > >>>>>>>>> and destination server with the latest 2.3 version
    (2.3.18). While
     > >>>>>>>>> before the upgrade all the 15 replication connections
    were busy
     > >>>>>>>>> after
     > >>>>>>>>> upgrading dovecot replicator dsync-status shows that
    most of the
     > >>>>>>>>> time
     > >>>>>>>>> nothing is being replicated at all. I can see some brief
     > >>>>>>>>> replications
     > >>>>>>>>> that last, but 99,9% of the time nothing is happening
    at all.
     > >>>>>>>>>
     > >>>>>>>>> I have a replication_full_sync_interval of 12 hours but
    I have
     > >>>>>>>>> thousands of users with their last full sync over 90
    hours ago.
     > >>>>>>>>>
     > >>>>>>>>> "doveadm replicator status" also shows that i have over
    35,000
     > >>>>>>>>> queued
     > >>>>>>>>> full resync requests, but no sync, high or low queued
    requests so
     > >>>>>>>>> why
     > >>>>>>>>> aren't the full requests occuring?
     > >>>>>>>>>
     > >>>>>>>>> There are no errors in the logs.
     > >>>>>>>>>
     > >>>>>>>>> Thanks,
     > >>>>>>>>>
     > >>>>>>>>> Arnaud
     > >>>>>>>>>
     > >>>>>>>>>
     > >>>>>>>>>
     > >>>>>>>>>
     > >>>>>>>>>
     > >>>>>>
     > >>>>>
     > >>>
     > >

