So we've discussed this at length in #cyrus on freenode, and concluded that the 
issue is that twoskip is doing far too many munmap / mmap calls during an 
unlocked foreach (which is what the LIST command uses).
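
To make the cost concrete, here is a toy sketch in Python. It is not the twoskip code: the MappedFile class, both method names and the size-only check are made up for illustration, and a real implementation may need to check more than the file size. It just counts how often a file gets mapped when an iteration refreshes its mapping before every record, compared with remapping only when the file looks different.

```python
import mmap
import os
import tempfile


class MappedFile:
    """Toy stand-in for a database file read through mmap (hypothetical)."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
        self.map = None
        self.mapped_size = 0
        self.map_calls = 0

    def remap_always(self):
        # Naive strategy: drop and recreate the mapping on every call.
        if self.map is not None:
            self.map.close()
        self.mapped_size = os.fstat(self.fd).st_size
        self.map = mmap.mmap(self.fd, self.mapped_size, access=mmap.ACCESS_READ)
        self.map_calls += 1

    def remap_if_changed(self):
        # Cheaper strategy: one fstat, then remap only if the size differs.
        # Assumption: size alone is enough to detect a change in this toy.
        size = os.fstat(self.fd).st_size
        if self.map is None or size != self.mapped_size:
            if self.map is not None:
                self.map.close()
            self.map = mmap.mmap(self.fd, size, access=mmap.ACCESS_READ)
            self.mapped_size = size
            self.map_calls += 1

    def close(self):
        if self.map is not None:
            self.map.close()
        os.close(self.fd)


def count_map_calls(strategy_name, records=1000):
    """Walk fixed-size records, refreshing the mapping before each one."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 16 * records)
        path = tmp.name
    mapped = MappedFile(path)
    try:
        refresh = getattr(mapped, strategy_name)
        for i in range(records):
            refresh()
            _ = mapped.map[i * 16:(i + 1) * 16]  # read one record
        return mapped.map_calls
    finally:
        mapped.close()
        os.unlink(path)
```

Calling count_map_calls("remap_always") maps the file once per record, while count_map_calls("remap_if_changed") maps it once for the whole walk as long as nothing else touches the file. Multiply the first pattern by a few hundred imapd processes all walking mailboxes.db and the pressure on the VM system adds up quickly.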

I've filed a bug:

https://github.com/cyrusimap/cyrus-imapd/issues/5

And I'm looking at what it would take to fix this behaviour in twoskip now 
(yes, I know I should be sleeping, but I'm not going to be able to sleep until 
I understand how this got broken!)

Bron.

On Fri, Jul 15, 2016, at 20:41, Hynek Schlawack via Info-cyrus wrote:
> Hello,
> 
> we’ve updated one of our Cyrus IMAP backends from 2.4 to 2.5.8 on FreeBSD 
> 10.3 with ZFS and now we have an operational emergency.
> 
> Cyrus IMAPd starts fine and keeps working for about 5 to 20 minutes (rather 
> sluggishly tho).  At some point the server load starts growing and explodes 
> eventually until we have to restart the IMAP daemons which gives us another 5 
> to 20 minutes.
> 
> It doesn’t really matter if we run `reconstruct` in the background or not.
> 
> 
> # Observations:
> 
> 1. While healthy, the imapd daemons’ states are mostly `select` or `RUN`.  
> Once things get critical they all are mostly in `zfs` (but do occasionally 
> switch).
> 2. Customers report that their mail clients are downloading all e-mails.  
> That’s obviously extra bad given we seem to be running into some kind of I/O
> problem.  Running `truss` on busy imapd processes seems to confirm that.
> 3. Once hell breaks loose, IO collapses even on other file systems/hard disks.
> 4. `top` mentions processes in `lock` state – sometimes even more than 200.  
> That’s nothing we see on our other backends.
> 5. There seems to be a correlation between processes hanging in `zfs` state 
> and `truss` showing them accessing mailboxes.db.  Don’t know if it’s related, 
> but soon after the upgrade, mailboxes.db broke and we had to reconstruct it.
> 
> 
> # Additional key data:
> 
> - 25,000 accounts
> - 4.5 TB data
> - 64 GB RAM, no apparent swapping
> - 16 cores CPU
> - nginx in front of it.
> 
> ## zpool iostat 5
> 
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank        4.52T   697G    144  2.03K  1.87M  84.2M
> tank        4.52T   697G     84    730  2.13M  3.94M
> tank        4.52T   697G    106    904  2.78M  4.52M
> tank        4.52T   697G    115    917  3.07M  5.11M
> tank        4.52T   697G    101   1016  4.04M  5.06M
> tank        4.52T   697G    124  1.03K  3.27M  6.59M
> 
> Which doesn’t look special.
> 
> The data used to be on HDDs and worked fine with an SSD ZIL.  After the 
> upgrade and ensuing problems we tried a Hail Mary by replacing the HDDs with 
> SSDs, to no avail (we migrated a ZFS snapshot for that).
> 
> So we do *not* believe it’s really a traditional I/O bottleneck since it only 
> started *after* the upgrade to 2.5 and did not go away by adding SSDs.  The 
> change notes led us to believe that there shouldn’t be any I/O storm due to 
> mailbox conversions but is it true in any case?  How could we double check?  
> Observation #2 from above leads us to believe that there are in fact some 
> meta data problems.  We’re reconstructing in the background but that’s going 
> to take days; which is sadly time we don’t really have.
> 
> ## procstat -w 1 of an active imapd
> 
>   PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     *vm objec FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     zfs       FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     -         FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 45016 43150 43150 43150     0   1 toor     select    FreeBSD ELF64 imapd
> 
> 
> Has anyone had similar problems (and gotten them solved, ideally!)?
> 
> Are there any known incompatibilities between Cyrus 2.5.8 and FreeBSD/ZFS?
> 
> Has anyone ever successfully downgraded from 2.5.8 back to 2.4?
> 
> Do we have any other options?
> 
> Any help would be *very much* appreciated!
> —h
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
> To Unsubscribe:
> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus


-- 
  Bron Gondwana
  br...@fastmail.fm