On Fri, Jun 6, 2014 at 8:38 AM, David Jericho <david.jeri...@aarnet.edu.au> wrote:
> Hi all,
>
> I did a bit of an experiment with multi-mds on Firefly, and it worked fine
> until one of the MDSes crashed while rebalancing. It's not the end of the
> world, and I could just start fresh with the cluster, but I'm keen to see
> if this can be fixed, as running multi-mds is something I would like to do
> in production: while it was working, it reduced load and improved response
> time significantly.
>
> The output of ceph mds dump is:
>
> dumped mdsmap epoch 1232
> epoch 1232
> flags 0
> created 2014-03-24 23:24:35.584469
> modified 2014-06-06 00:17:54.336201
> tableserver 0
> root 0
> session_timeout 60
> session_autoclose 300
> max_file_size 1099511627776
> last_failure 1227
> last_failure_osd_epoch 24869
> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
> max_mds 2
> in 0,1
> up {1=578616}
> failed
> stopped
> data_pools 0,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,101,105
> metadata_pool 1
> inline_data disabled
> 578616: 10.60.8.18:6808/252227 'c' mds.1.36 up:resolve seq 2
> 577576: 10.60.8.19:6800/58928 'd' mds.-1.0 up:standby seq 1
> 577603: 10.60.8.2:6801/245281 'a' mds.-1.0 up:standby seq 1
> 578623: 10.60.8.3:6800/75325 'b' mds.-1.0 up:standby seq 1
>
> Modifying max_mds has no effect, and restarting/rebooting the cluster has
> no effect. No matter what combination of commands I try with the ceph-mds
> binary or via the ceph tool, I cannot make a second MDS start up, which
> would let mds.1 leave resolve and move on to the next step. Running with
> --debug_mds 10 provides no really enlightening information, nor does
> watching the mon logs. At a guess, mds.1 is looking for mds.0 to
> communicate with.
>
> Anyone have some pointers?
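The commands David refers to, on a Firefly-era cluster, would look roughly
like the following (a sketch only; the daemon name c is taken from the dump
above, and flag spelling may vary slightly between releases):

    # grow the number of active MDS ranks ("modifying max_mds")
    ceph mds set_max_mds 2

    # raise MDS debug verbosity on a running daemon, no restart needed
    ceph tell mds.c injectargs '--debug-mds 10'

    # or start a daemon with debugging enabled from the shell
    ceph-mds -i c --debug_mds 10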
Please run the MDSes with --debug_mds 10 and send both MDS logs to me.

Regards,
Yan, Zheng
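A minimal way to capture what is being asked for here, assuming the default
log locations, is to set the MDS debug level in ceph.conf on each MDS host
and restart the daemons:

    [mds]
        debug mds = 10

    # the logs then appear in the usual place on each host, e.g.
    # /var/log/ceph/ceph-mds.c.log and /var/log/ceph/ceph-mds.d.log
    # (daemon names c and d are the ones shown in the dump above)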