All,
Ok, it was indeed me.
Firewalld does not seem happy across reboots when NetworkManager is involved
unless you use something like nm-connection-editor to put the NIC in the zone
you want... grrr....
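
(For anyone hitting the same thing, the non-GUI equivalent seems to be roughly
the following; the connection name "System ib0" and the zone "public" below are
only placeholders for whatever your setup uses:)

# See which zone the NIC actually landed in after boot
firewall-cmd --get-active-zones

# Pin the connection to a zone through NetworkManager so it sticks across reboots
nmcli connection modify "System ib0" connection.zone public
nmcli connection up "System ib0"

# Confirm firewalld agrees
firewall-cmd --zone=public --list-interfaces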

Brian Andrus

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Andrus, Brian Contractor
Sent: Thursday, February 11, 2016 2:36 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Multipath devices with infernalis

All,

I have a set of hardware with a few systems connected via IB along with a DDN 
SFA12K.
There are 4 IB/SRP paths to each block device. Those show up as
/dev/mapper/mpath[b-d].

I am trying to do an initial install/setup of ceph on 3 nodes. Each will be a 
monitor as well as host a single OSD.

I am using ceph-deploy to do most of the heavy lifting (on CentOS 7.2.1511).

Everything goes quite smoothly installing the monitors and even the first OSD.

ceph status shows:
    cluster 0d9e68e4-176d-4229-866b-d408f8055e5b
     health HEALTH_OK
     monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
            election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b
     osdmap e5: 1 osds: 1 up, 1 in
            flags sortbitwise
      pgmap v8: 64 pgs, 1 pools, 0 bytes data, 0 objects
            40112 kB used, 43888 GB / 43889 GB avail
                  64 active+clean

But as soon as I try to add the next OSD on the next system using
ceph-deploy osd create ceph-1-35b:/dev/mapper/mpathc
things start acting up.
The last bit from the output seems ok:
[ceph-1-35b][INFO  ] checking OSD status...
[ceph-1-35b][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph-1-35b][WARNIN] there is 1 OSD down
[ceph-1-35b][WARNIN] there is 1 OSD out
[ceph_deploy.osd][DEBUG ] Host ceph-1-35b is now ready for osd use.

But ceph status is now:
    cluster 0d9e68e4-176d-4229-866b-d408f8055e5b
     health HEALTH_OK
     monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
            election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b
     osdmap e6: 2 osds: 1 up, 1 in
            flags sortbitwise
      pgmap v10: 64 pgs, 1 pools, 0 bytes data, 0 objects
            40120 kB used, 43888 GB / 43889 GB avail
                  64 active+clean

And ceph osd tree:
ID WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 42.86040 root default
-2 42.86040     host ceph-1-35a
 0 42.86040         osd.0            up  1.00000          1.00000
 1        0 osd.1                  down        0          1.00000

I don't understand why ceph-deploy didn't activate this one when it did for the
first one. The OSD is not mounted on the other box.
I can try to activate the down OSD by hand (ceph-deploy disk activate
ceph-1-35b:/dev/mapper/mpathc1:/dev/mapper/mpathc2).
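
(As a sanity check before activating, something like this can confirm the
prepare step actually created the data and journal partitions referenced
above; just a rough sketch, nothing beyond the defaults assumed:)

# mpathc1/mpathc2 above are the data and journal partitions from the prepare step
lsblk /dev/mapper/mpathc
ceph-disk list
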
After the activate, things look good for a bit:
    cluster 0d9e68e4-176d-4229-866b-d408f8055e5b
     health HEALTH_OK
     monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
            election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b
     osdmap e8: 2 osds: 2 up, 2 in
            flags sortbitwise
      pgmap v14: 64 pgs, 1 pools, 0 bytes data, 0 objects
            74804 kB used, 87777 GB / 87778 GB avail
                  64 active+clean

But after about a minute, the new OSD goes down:
ceph osd tree
ID WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 85.72079 root default
-2 42.86040     host ceph-1-35a
 0 42.86040         osd.0            up  1.00000          1.00000
-3 42.86040     host ceph-1-35b
 1 42.86040         osd.1          down  1.00000          1.00000

ceph status
    cluster 0d9e68e4-176d-4229-866b-d408f8055e5b
    health HEALTH_WARN
            1/2 in osds are down
     monmap e1: 3 mons at 
{ceph-1-35a=10.100.1.35:6789/0,ceph-1-35b=10.100.1.85:6789/0,ceph-1-36a=10.100.1.36:6789/0}
            election epoch 8, quorum 0,1,2 ceph-1-35a,ceph-1-36a,ceph-1-35b
     osdmap e9: 2 osds: 1 up, 2 in
            flags sortbitwise
      pgmap v15: 64 pgs, 1 pools, 0 bytes data, 0 objects
            74804 kB used, 87777 GB / 87778 GB avail
                  64 active+clean

Has anyone played with getting multipath devices to work as OSDs?
Of course it could be something completely different, and I may need to step
back and see which step is failing. Any insight into where to dig would be
appreciated.
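
(For reference, this is roughly the kind of thing I have been checking on
ceph-1-35b; osd id 1 and the default log location are assumed:)

# Watch the OSD's own log while it drops out
tail -f /var/log/ceph/ceph-osd.1.log

# Infernalis runs the daemons under systemd, so the journal is worth a look too
journalctl -u ceph-osd@1 -n 100

# Double-check the firewall on each node: the mons need 6789 open and the OSDs 6800-7300
firewall-cmd --list-all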

Thanks in advance,
Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
