Hi all,

I created a single-node Ceph cluster (v0.58) on a VM. Here is my conf file:
[global]
        auth client required = none
        auth cluster required = none
        auth service required = none

[osd]
        osd journal data = 1000
        filestore xattr use omap = true
        # osd data = /var/lib/ceph/osd/ceph-$id

[mon.a]
        host = varunc3-virtual-machine
        mon addr = 10.72.148.201:6789
        # mon data = /var/lib/ceph/mon/ceph-a

[mds.a]
        host = varunc3-virtual-machine
        # mds data = /var/lib/ceph/mds/ceph-a

[osd.0]
        host = varunc3-virtual-machine

Here is the output of ceph -s:

varunc@varunc3-virtual-machine:~$ ceph -s
   health HEALTH_WARN 392 pgs degraded; 392 pgs stuck unclean; mds a is laggy
   monmap e1: 1 mons at {a=10.72.148.201:6789/0}, election epoch 1, quorum 0 a
   osdmap e45: 1 osds: 1 up, 1 in
    pgmap v177: 392 pgs: 392 active+degraded; 0 bytes data, 13007 MB used, 62744 MB / 79745 MB avail
   mdsmap e946: 1/1/1 up {0=a=up:replay(laggy or crashed)}

I believe due to this, I am not able to mount the ceph file system.
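(In case it matters, the mount command I am using is along these lines, with the kernel client; the mount point is just an example:

    sudo mount -t ceph 10.72.148.201:6789:/ /mnt/cephfs

Since auth is set to none in the conf above, I am not passing any secret option.)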
I tried going through the mds log, but could not understand much. I am pasting the part of it that shows the errors (should I paste the whole thing?):

-29> 2013-03-26 19:25:58.301027 b4781b40  1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 10 ==== mdsbeacon(4897/a up:replay seq 2 v909) v2 ==== 103+0+0 (1650300491 0 0) 0x9e2e380 con 0x9e36200
-28> 2013-03-26 19:26:00.824303 b1d7ab40  0 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49340 s=1 pgs=0 cs=0 l=1).connect claims to be 10.72.148.201:6801/16695 not 10.72.148.201:6801/16036 - wrong node!
-27> 2013-03-26 19:26:00.824384 b1d7ab40  2 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49340 s=1 pgs=0 cs=0 l=1).fault 107: Transport endpoint is not connected
-26> 2013-03-26 19:26:02.300921 b257bb40 10 monclient: _send_mon_message to mon.a at 10.72.148.201:6789/0
-25> 2013-03-26 19:26:02.300954 b257bb40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6789/0 -- mdsbeacon(4897/a up:replay seq 3 v909) v2 -- ?+0 0x9e2e8c0 con 0x9e36200
-24> 2013-03-26 19:26:02.301264 b4781b40  1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 11 ==== mdsbeacon(4897/a up:replay seq 3 v909) v2 ==== 103+0+0 (460647212 0 0) 0x9e2ec40 con 0x9e36200
-23> 2013-03-26 19:26:06.301163 b257bb40 10 monclient: _send_mon_message to mon.a at 10.72.148.201:6789/0
-22> 2013-03-26 19:26:06.301200 b257bb40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6789/0 -- mdsbeacon(4897/a up:replay seq 4 v909) v2 -- ?+0 0x9e2e700 con 0x9e36200
-21> 2013-03-26 19:26:06.301512 b4781b40  1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 12 ==== mdsbeacon(4897/a up:replay seq 4 v909) v2 ==== 103+0+0 (1900474344 0 0) 0x9e2ea80 con 0x9e36200
-20> 2013-03-26 19:26:07.224712 b1d7ab40  0 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49341 s=1 pgs=0 cs=0 l=1).connect claims to be 10.72.148.201:6801/16695 not 10.72.148.201:6801/16036 - wrong node!
-19> 2013-03-26 19:26:07.224782 b1d7ab40  2 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49341 s=1 pgs=0 cs=0 l=1).fault 107: Transport endpoint is not connected
-18> 2013-03-26 19:26:07.299025 b377fb40 10 monclient: tick
-17> 2013-03-26 19:26:07.299047 b377fb40 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2013-03-26 19:25:37.299046)
-16> 2013-03-26 19:26:07.299072 b377fb40 10 monclient: renew subs? (now: 2013-03-26 19:26:07.299071; renew after: 2013-03-26 19:28:24.298915) -- no
-15> 2013-03-26 19:26:09.300863 b257bb40 10 monclient: renew_subs
-14> 2013-03-26 19:26:09.300892 b257bb40 10 monclient: _send_mon_message to mon.a at 10.72.148.201:6789/0
-13> 2013-03-26 19:26:09.300911 b257bb40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6789/0 -- mon_subscribe({mdsmap=910+,monmap=2+,osdmap=42}) v2 -- ?+0 0x9e35360 con 0x9e36200
-12> 2013-03-26 19:26:09.301011 b257bb40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16036 -- ping v1 -- ?+0 0x9e35d80 con 0x9e36400
-11> 2013-03-26 19:26:09.301341 b1d7ab40  0 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49342 s=1 pgs=0 cs=0 l=1).connect claims to be 10.72.148.201:6801/16695 not 10.72.148.201:6801/16036 - wrong node!
-10> 2013-03-26 19:26:09.301409 b1d7ab40  2 -- 10.72.148.201:6800/16609 >> 10.72.148.201:6801/16036 pipe(0x9e2e540 sd=17 :49342 s=1 pgs=0 cs=0 l=1).fault 107: Transport endpoint is not connected
 -9> 2013-03-26 19:26:09.301812 b4781b40  1 -- 10.72.148.201:6800/16609 <== mon.0 10.72.148.201:6789/0 13 ==== osd_map(42..45 src has 1..45) v3 ==== 1167+0+0 (3338985292 0 0) 0x9e2dc60 con 0x9e36200
 -8> 2013-03-26 19:26:09.301887 b4781b40  1 -- 10.72.148.201:6800/16609 mark_down 0x9e36400 -- 0x9e2e540
 -7> 2013-03-26 19:26:09.302019 b4781b40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:1 mds0_inotable [read 0~0] 1.b852b893 RETRY) v4 -- ?+0 0x9e26900 con 0x9e36700
 -6> 2013-03-26 19:26:09.302036 b4781b40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:2 mds0_sessionmap [read 0~0] 1.3270c60b RETRY) v4 -- ?+0 0x9e26d80 con 0x9e36700
 -5> 2013-03-26 19:26:09.302051 b4781b40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:3 mds_anchortable [read 0~0] 1.a977f6a7 RETRY) v4 -- ?+0 0x9e4d600 con 0x9e36700
 -4> 2013-03-26 19:26:09.302060 b4781b40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:4 mds_snaptable [read 0~0] 1.d90270ad RETRY) v4 -- ?+0 0x9e4d480 con 0x9e36700
 -3> 2013-03-26 19:26:09.302073 b4781b40  1 -- 10.72.148.201:6800/16609 --> 10.72.148.201:6801/16695 -- osd_op(mds.0.14:5 200.00000000 [read 0~0] 1.844f3494 RETRY) v4 -- ?+0 0x9e4d300 con 0x9e36700
 -2> 2013-03-26 19:26:09.302472 b4781b40  0 mds.0.14 ms_handle_connect on 10.72.148.201:6801/16695
 -1> 2013-03-26 19:26:09.303976 b4781b40  1 -- 10.72.148.201:6800/16609 <== osd.0 10.72.148.201:6801/16695 1 ==== osd_op_reply(1 mds0_inotable [read 0~0] ack = -2 (No such file or directory)) v4 ==== 112+0+0 (3010998831 0 0) 0x9e2d2c0 con 0x9e36700
  0> 2013-03-26 19:26:09.305543 b4781b40 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread b4781b40 time 2013-03-26 19:26:09.304022
mds/MDSTable.cc: 150: FAILED assert(0)

How do I get the mds running?

Regards,
Varun