On Sun, 14 Jun 2015 17:19:31 -0500 Goldwyn Rodrigues <rgold...@suse.com> wrote:
> On 06/12/2015 01:46 PM, David Teigland wrote:
> > When a node fails, its dirty areas get special treatment from other nodes
> > using the area_resyncing() function.  Should the suspend_list be created
> > before any reads or writes from the file system are processed by md?  It
> > seems to me that gfs journal recovery could read/write to dirty regions
> > (from the failed node) before md was finished setting up the suspend_list.
> > md could probably prevent that by using the recover_prep() dlm callback to
> > set a flag that would block any i/o that arrived before the suspend_list
> > was ready.
> >
>
> Yes, we should call mddev_suspend() in recover_prep() and mddev_resume()
> after suspend_list is created. Thanks for pointing it out.
>

The only thing that nodes need to be careful of, between the time when
some other node disappears and the time when that disappearance has been
completely handled, is reads.

md/raid1 must ensure that if/when the filesystem reads from a region that
the missing node was writing to, the filesystem sees consistent data - on
all nodes.  So it needs to suspend read-balancing while it is uncertain.

Once the bitmap from the node has been loaded, the normal protection
against read-balancing in a "dirty" region is sufficient.  While waiting
for the bitmap to be loaded, the safe thing to do would be to disable
read-balancing completely.

So I think that recover_prep() should set a flag which disables all read
balancing, and recover_done() (or similar) should clear that flag.
Probably there should be one flag for each other node.

Calling mddev_suspend() to suspend all IO is overkill.  Suspending all
read balancing is all that is needed.

Thanks,
NeilBrown
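
The flag scheme described above could be modelled roughly as follows.
This is a minimal user-space sketch in C, not md code: struct
cluster_state, MAX_NODES, model_recover_prep(), model_recover_done(),
read_balance_allowed() and choose_read_disk() are all hypothetical names,
and their signatures are not those of the real dlm callbacks.  It assumes
one bit per other node, and that reading from device 0 stands in for
"read balancing disabled".

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8 /* hypothetical limit on cluster size */

/* Hypothetical per-array state: bit n is set while node n's
 * disappearance has been noticed but its bitmap is not yet loaded. */
struct cluster_state {
	unsigned long recovering;
};

/* Models what recover_prep() would do: mark node 'slot' as uncertain. */
static void model_recover_prep(struct cluster_state *cs, int slot)
{
	assert(slot >= 0 && slot < MAX_NODES);
	cs->recovering |= 1UL << slot;
}

/* Models what recover_done() (or similar) would do once the node's
 * bitmap has been loaded: clear that node's flag. */
static void model_recover_done(struct cluster_state *cs, int slot)
{
	assert(slot >= 0 && slot < MAX_NODES);
	cs->recovering &= ~(1UL << slot);
}

/* Read balancing is allowed only when no node is in the uncertain state. */
static bool read_balance_allowed(const struct cluster_state *cs)
{
	return cs->recovering == 0;
}

/* While balancing is disabled, every read goes to device 0 in this
 * model, so all nodes read the same copy.  Writes are never blocked. */
static int choose_read_disk(const struct cluster_state *cs, int preferred)
{
	return read_balance_allowed(cs) ? preferred : 0;
}
```

With two nodes failing at once, balancing stays off until both flags have
been cleared, which is the reason for keeping one flag per node rather
than a single boolean.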