Re: [ceph-users] Multi-MDS Failover

2018-05-19 Thread Blair Bethwaite
On 19 May 2018 at 09:20, Scottix wrote:
> It would be nice to have an option to have all IO blocked if it hits a
> degraded state until it recovers. Since you are unaware of other MDS state,
> seems like that would be tough to do.

I agree this would be a nice knob to have from the perspective of

Re: [ceph-users] Multi-MDS Failover

2018-05-18 Thread Scottix
So we have been testing this quite a bit. Having the failure domain be partially available is OK for us, but odd, since we don't know what will be down; compared to a single MDS, we know everything will be blocked. It would be nice to have an option to have all IO blocked if it hits a degraded state until it recovers.
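
For context, a minimal sketch of the opposite direction, assuming a Luminous cluster with a file system named "cephfs" and two active ranks: shrinking back to a single active MDS restores the single-MDS behaviour where a failure blocks everything at once rather than leaving the tree partially available.

    ceph fs set cephfs max_mds 1     # allow only one active rank again
    ceph mds deactivate cephfs:1     # Luminous still needs surplus ranks stopped explicitly (syntax assumed)
    ceph fs status cephfs            # confirm only rank 0 remains active; the other daemon returns to standby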

Re: [ceph-users] Multi-MDS Failover

2018-05-18 Thread Gregory Farnum
On Fri, May 18, 2018 at 11:56 AM Webert de Souza Lima wrote:
> Hello,
>
> On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann wrote:
>
>> additionally: if rank 0 is lost, the whole FS stands still (no new
>> client can mount the fs; no existing client can change a directory, etc.).
>>
>> my guess

Re: [ceph-users] Multi-MDS Failover

2018-05-18 Thread Webert de Souza Lima
Hello,

On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann wrote:
> additionally: if rank 0 is lost, the whole FS stands still (no new
> client can mount the fs; no existing client can change a directory, etc.).
>
> my guess is that the root of a cephfs (/; which is always served by rank
> 0) is needed

Re: [ceph-users] Multi-MDS Failover

2018-04-30 Thread Daniel Baumann
On 04/27/2018 07:11 PM, Patrick Donnelly wrote:
> The answer is that there may be partial availability from
> the up:active ranks which may hand out capabilities for the subtrees
> they manage or no availability if that's not possible because it
> cannot obtain the necessary locks.

additionally: if rank 0 is lost, the whole FS stands still (no new client can mount the fs; no existing client can change a directory, etc.).
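
For context, a minimal sketch of the subtree mechanism referred to above: on Luminous a directory can be pinned to a particular rank via the ceph.dir.pin extended attribute, which at least makes it predictable which part of the tree a given rank (and therefore a given failure) affects. The mount point /mnt/cephfs and the directory names are assumptions.

    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects   # serve this subtree from rank 1
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/home       # keep this subtree on rank 0
    getfattr -n ceph.dir.pin /mnt/cephfs/projects        # verify the pin took effect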

Re: [ceph-users] Multi-MDS Failover

2018-04-27 Thread Patrick Donnelly
On Thu, Apr 26, 2018 at 7:04 PM, Scottix wrote:
> Ok let me try to explain this better, we are doing this back and forth and
> it's not going anywhere. I'll just be as genuine as I can and explain the
> issue.
>
> What we are testing is a critical failure scenario and actually more of a
> real world scenario.

Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Scottix
Ok, let me try to explain this better; we are doing this back and forth and it's not going anywhere. I'll just be as genuine as I can and explain the issue.

What we are testing is a critical failure scenario and actually more of a real world scenario. Basically just what happens when it is 1AM and t

Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Patrick Donnelly
On Thu, Apr 26, 2018 at 4:40 PM, Scottix wrote:
>> Of course -- the mons can't tell the difference!
> That is really unfortunate, it would be nice to know if the filesystem has
> been degraded and to what degree.

If a rank is laggy/crashed, the file system as a whole is generally unavailable. The
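
For context, a few ways to see per-rank state and the related health checks on a Luminous cluster (the file system name "cephfs" is an assumption):

    ceph fs status cephfs   # per-rank state: active, replay, reconnect, failed, ...
    ceph health detail      # surfaces checks such as FS_DEGRADED / MDS_ALL_DOWN when ranks are not up:active
    ceph fs dump            # full FSMap, including failed ranks and the available standbys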

Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Scottix
> Of course -- the mons can't tell the difference!

That is really unfortunate, it would be nice to know if the filesystem has been degraded and to what degree.

> You must have standbys for high availability.

This is the docs. Ok, but what if you have your standby go down and a master go down? This
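
For context: on Luminous the monitors can at least warn when the standby pool runs dry; a minimal sketch, assuming the file system is named "cephfs" and one spare standby is wanted at all times:

    ceph fs set cephfs standby_count_wanted 1   # warn (MDS_INSUFFICIENT_STANDBY) when fewer standbys remain
    ceph health detail                          # shows the warning once the last standby has been consumed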

Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Patrick Donnelly
On Thu, Apr 26, 2018 at 3:16 PM, Scottix wrote:
> Updated to 12.2.5
>
> We are starting to test multi_mds cephfs and we are going through some
> failure scenarios in our test cluster.
>
> We are simulating a power failure to one machine and we are getting mixed
> results of what happens to the file system.

[ceph-users] Multi-MDS Failover

2018-04-26 Thread Scottix
Updated to 12.2.5

We are starting to test multi_mds cephfs and we are going through some failure scenarios in our test cluster.

We are simulating a power failure to one machine and we are getting mixed results of what happens to the file system. This is the status of the mds once we simulate the
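
For context, a minimal sketch of this kind of setup and how its state can be inspected after a simulated failure; the file system name "cephfs" and the count of two active ranks are assumptions, not details from the original post:

    ceph fs set cephfs max_mds 2   # allow two active ranks
    ceph mds stat                  # compact summary of ranks and standbys
    ceph fs status cephfs          # which daemon holds each rank, and its current state
    ceph health detail             # after killing a node: degraded / failed-rank health checks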