On Mon, 9 May 2016, Nick Fisk wrote:
> Hi All,
> 
> I've been testing an active/active Samba cluster over CephFS; performance
> seems really good with small files compared to Gluster. Soft reboots work
> beautifully with little to no interruption in file access. However, when I
> perform a hard shutdown/reboot of one of the Samba nodes, the remaining node
> detects that the other Samba node has disappeared but then eventually bans
> itself. If I leave everything for around 5 minutes, CTDB unbans itself and
> then everything continues running.
> 
> From what I can work out, it looks like the MDS still has a stale session from
> the powered-down node, so it won't let the remaining node access the CTDB lock
> file (which also sits on the CephFS). CTDB meanwhile is hammering
> away trying to access the lock file, but it sees what it thinks is a split
> brain scenario because something still has a lock on the lockfile, and so
> bans itself.
> 
> I'm guessing the solution is to either reduce the mds session timeout or
> increase the amount of time/retries for CTDB, but I'm not sure what's the
> best approach. Does anyone have any ideas?

I believe Ira was looking at this exact issue, and addressed it by 
lowering the mds_session_timeout to 30 seconds?
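
As a rough sketch only, that change might look like the following in
ceph.conf. The [mds] section placement is an assumption, and the value of
30 is simply the figure mentioned above, not a tested recommendation; the
MDS would also need to pick up the new value (e.g. via a restart) before
it takes effect.

```ini
[mds]
# Assumed placement; value taken from the suggestion above (seconds).
mds session timeout = 30
```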

sage

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
