You can ignore that; it's a known issue: http://tracker.ceph.com/issues/15990

Regardless, what version of Ceph are you running, and what are the details of the OS version you updated to?
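If it's easier, the output of something like the following from one of the updated nodes would cover it (standard commands, nothing here assumes anything about your particular setup):

ceph --version         # Ceph release and build installed on this node
ceph -s                # current cluster status, to confirm everything rejoined after the reboot
cat /etc/os-release    # OS name and version you updated to
uname -r               # running kernel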
On Tue, Nov 29, 2016 at 7:12 PM, Mike Jacobacci <mi...@flowjo.com> wrote:

> Found some more info, but it's getting weird... All three OSD nodes show the
> same "unknown cluster" message on all the OSD disks. I don't know where it
> came from; all the nodes were configured using ceph-deploy on the admin
> node. In any case, the OSDs seem to be up and running and the health is ok.
>
> No ceph-disk@ services are running on any of the OSD nodes, which I hadn't
> noticed before, and each node was set up exactly the same, yet there are
> different services listed under systemctl:
>
> OSD NODE 1:
> Output in earlier email
>
> OSD NODE 2:
>
> ● ceph-disk@dev-sdb1.service  loaded failed failed  Ceph disk activation: /dev/sdb1
> ● ceph-disk@dev-sdb2.service  loaded failed failed  Ceph disk activation: /dev/sdb2
> ● ceph-disk@dev-sdb5.service  loaded failed failed  Ceph disk activation: /dev/sdb5
> ● ceph-disk@dev-sdc2.service  loaded failed failed  Ceph disk activation: /dev/sdc2
> ● ceph-disk@dev-sdc4.service  loaded failed failed  Ceph disk activation: /dev/sdc4
>
> OSD NODE 3:
>
> ● ceph-disk@dev-sdb1.service  loaded failed failed  Ceph disk activation: /dev/sdb1
> ● ceph-disk@dev-sdb3.service  loaded failed failed  Ceph disk activation: /dev/sdb3
> ● ceph-disk@dev-sdb4.service  loaded failed failed  Ceph disk activation: /dev/sdb4
> ● ceph-disk@dev-sdb5.service  loaded failed failed  Ceph disk activation: /dev/sdb5
> ● ceph-disk@dev-sdc2.service  loaded failed failed  Ceph disk activation: /dev/sdc2
> ● ceph-disk@dev-sdc3.service  loaded failed failed  Ceph disk activation: /dev/sdc3
> ● ceph-disk@dev-sdc4.service  loaded failed failed  Ceph disk activation: /dev/sdc4
>
> From my understanding, the disks have already been activated... Should
> these services even be running or enabled?
>
> Mike
>
> On Tue, Nov 29, 2016 at 6:33 PM, Mike Jacobacci <mi...@flowjo.com> wrote:
>
>> Sorry about that...
>> Here is the output of ceph-disk list:
>>
>> ceph-disk list
>> /dev/dm-0 other, xfs, mounted on /
>> /dev/dm-1 swap, swap
>> /dev/dm-2 other, xfs, mounted on /home
>> /dev/sda :
>>  /dev/sda2 other, LVM2_member
>>  /dev/sda1 other, xfs, mounted on /boot
>> /dev/sdb :
>>  /dev/sdb1 ceph journal
>>  /dev/sdb2 ceph journal
>>  /dev/sdb3 ceph journal
>>  /dev/sdb4 ceph journal
>>  /dev/sdb5 ceph journal
>> /dev/sdc :
>>  /dev/sdc1 ceph journal
>>  /dev/sdc2 ceph journal
>>  /dev/sdc3 ceph journal
>>  /dev/sdc4 ceph journal
>>  /dev/sdc5 ceph journal
>> /dev/sdd :
>>  /dev/sdd1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.0
>> /dev/sde :
>>  /dev/sde1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.1
>> /dev/sdf :
>>  /dev/sdf1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.2
>> /dev/sdg :
>>  /dev/sdg1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.3
>> /dev/sdh :
>>  /dev/sdh1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.4
>> /dev/sdi :
>>  /dev/sdi1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.5
>> /dev/sdj :
>>  /dev/sdj1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.6
>> /dev/sdk :
>>  /dev/sdk1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.7
>> /dev/sdl :
>>  /dev/sdl1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.8
>> /dev/sdm :
>>  /dev/sdm1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.9
>>
>> On Tue, Nov 29, 2016 at 6:32 PM, Mike Jacobacci <mi...@flowjo.com> wrote:
>>
>>> I forgot to add:
>>>
>>> On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci <mi...@flowjo.com> wrote:
>>>
>>>> So it looks like the journal partition is mounted:
>>>>
>>>> ls -lah /var/lib/ceph/osd/ceph-0/journal
>>>> lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal -> /dev/sdb1
>>>>
>>>> Here is the output of journalctl -xe when I try to start the
>>>> ceph-disk@dev-sdb1 service:
>>>>
>>>> sh[17481]: mount_activate: Failed to activate
>>>> sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
>>>> sh[17481]: command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.m9ek7W
>>>> sh[17481]: Traceback (most recent call last):
>>>> sh[17481]:   File "/usr/sbin/ceph-disk", line 9, in <module>
>>>> sh[17481]:     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5011, in run
>>>> sh[17481]:     main(sys.argv[1:])
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4962, in main
>>>> sh[17481]:     args.func(args)
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4720, in <lambda>
>>>> sh[17481]:     func=lambda args: main_activate_space(name, args),
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3739, in main_activate_space
>>>> sh[17481]:     reactivate=args.reactivate,
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3073, in mount_activate
>>>> sh[17481]:     (osd_id, cluster) = activate(path, activate_key_template, init)
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3220, in activate
>>>> sh[17481]:     ' with fsid %s' % ceph_fsid)
>>>> sh[17481]: ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>>>> sh[17481]: Traceback (most recent call last):
>>>> sh[17481]:   File "/usr/sbin/ceph-disk", line 9, in <module>
>>>> sh[17481]:     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5011, in run
>>>> sh[17481]:     main(sys.argv[1:])
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4962, in main
>>>> sh[17481]:     args.func(args)
>>>> sh[17481]:   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4399, in main_trigger
>>>> sh[17481]:     raise Error('return code ' + str(ret))
>>>> sh[17481]: ceph_disk.main.Error: Error: return code 1
>>>> systemd[1]: ceph-disk@dev-sdb1.service: main process exited, code=exited, status=1/FAILURE
>>>> systemd[1]: Failed to start Ceph disk activation: /dev/sdb1.
>>>>
>>>> I don't understand this error:
>>>> ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>>>>
>>>> My fsid in ceph.conf is:
>>>> fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
>>>>
>>>> I don't know why the fsid would change or be different. I thought I had
>>>> a basic cluster setup; I don't understand what's going wrong.
>>>>
>>>> Mike
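On the fsid mismatch above: as far as I can tell from that traceback, ceph-disk reads the ceph_fsid recorded on the OSD data partition and then looks for a conf in /etc/ceph whose fsid matches it, which is why activation fails even though the running OSDs are fine. A quick way to see what each side actually says (assuming the default ceph-deploy layout, with osd.0 as the example):

grep fsid /etc/ceph/ceph.conf            # fsid your config file advertises
cat /var/lib/ceph/osd/ceph-0/ceph_fsid   # fsid stamped on the OSD data partition when it was prepared
cat /var/lib/ceph/osd/ceph-0/whoami      # OSD id stored alongside it

If those two fsids really do differ, the OSDs were prepared against a different ceph.conf than the one now sitting in /etc/ceph.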
>>>>
>>>> On Tue, Nov 29, 2016 at 5:15 PM, Mike Jacobacci <mi...@flowjo.com> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> Thanks, I wasn't sure if something happened to the journal partitions or not.
>>>>>
>>>>> Right now, the ceph-osd.0-9 services are back up and the cluster
>>>>> health is good, but none of the ceph-disk@dev-sd* services are
>>>>> running. How can I get the journal partitions mounted again?
>>>>>
>>>>> Cheers,
>>>>> Mike
>>>>>
>>>>> On Tue, Nov 29, 2016 at 4:30 PM, John Petrini <jpetr...@coredial.com> wrote:
>>>>>
>>>>>> Also, don't run sgdisk again; that's just for creating the journal
>>>>>> partitions. ceph-disk is a service used for prepping disks; only the OSD
>>>>>> services need to be running as far as I know. Are the ceph-osd@x
>>>>>> services running now that you've mounted the disks?
>>>>>>
>>>>>> ___
>>>>>> John Petrini
>>>>>> NOC Systems Administrator // CoreDial, LLC // coredial.com
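To add to that: if you want to confirm the journal partitions still carry the right type codes without rewriting anything, they can be inspected read-only, e.g.:

sgdisk -i 1 /dev/sdb                        # print partition 1's GUID type code (read-only, unlike sgdisk -t)
lsblk -o NAME,PARTTYPE,FSTYPE,MOUNTPOINT    # type code, filesystem and mount point for every partition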
>>>>>>
>>>>>> On Tue, Nov 29, 2016 at 7:27 PM, John Petrini <jpetr...@coredial.com> wrote:
>>>>>>
>>>>>>> What command are you using to start your OSD's?
>>>>>>>
>>>>>>> ___
>>>>>>> John Petrini
>>>>>>> NOC Systems Administrator // CoreDial, LLC // coredial.com
>>>>>>>
>>>>>>> On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci <mi...@flowjo.com> wrote:
>>>>>>>
>>>>>>>> I was able to bring the OSDs up by looking at my other OSD node,
>>>>>>>> which is the exact same hardware/disks, and finding out which disks map.
>>>>>>>> But I still can't bring up any of the ceph-disk@dev-sd* services...
>>>>>>>> When I first installed the cluster and got the OSDs up, I had
>>>>>>>> to run the following:
>>>>>>>>
>>>>>>>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>>>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>>>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>>>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>>>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>>>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>>>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>>>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>>>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>>>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>>>>
>>>>>>>> Do I need to run that again?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond <sean.redmo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Normally they mount based upon the GPT label; if that's not working,
>>>>>>>>> you can mount the disk under /mnt and then cat the file called whoami
>>>>>>>>> to find out the OSD number.
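Concretely, that check looks something like this for one of the data partitions (taking /dev/sdd1 from the ceph-disk list output above as the example, and assuming it isn't already mounted):

mount /dev/sdd1 /mnt     # temporarily mount the OSD data partition
cat /mnt/whoami          # OSD id stored on the partition (e.g. 0 -> ceph-osd@0)
cat /mnt/ceph_fsid       # cluster fsid recorded when the OSD was prepared
umount /mnt              # unmount once you've noted the mapping

If the OSD is already running, the same files are visible under /var/lib/ceph/osd/ceph-<id>/ without mounting anything.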
>>>>>>>>>
>>>>>>>>> On 29 Nov 2016 23:56, "Mike Jacobacci" <mi...@flowjo.com> wrote:
>>>>>>>>>
>>>>>>>>>> OK, I am in some trouble now and would love some help! After
>>>>>>>>>> updating, none of the OSDs on the node will come back up:
>>>>>>>>>>
>>>>>>>>>> ● ceph-disk@dev-sdb1.service  loaded failed failed  Ceph disk activation: /dev/sdb1
>>>>>>>>>> ● ceph-disk@dev-sdb2.service  loaded failed failed  Ceph disk activation: /dev/sdb2
>>>>>>>>>> ● ceph-disk@dev-sdb3.service  loaded failed failed  Ceph disk activation: /dev/sdb3
>>>>>>>>>> ● ceph-disk@dev-sdb4.service  loaded failed failed  Ceph disk activation: /dev/sdb4
>>>>>>>>>> ● ceph-disk@dev-sdb5.service  loaded failed failed  Ceph disk activation: /dev/sdb5
>>>>>>>>>> ● ceph-disk@dev-sdc1.service  loaded failed failed  Ceph disk activation: /dev/sdc1
>>>>>>>>>> ● ceph-disk@dev-sdc2.service  loaded failed failed  Ceph disk activation: /dev/sdc2
>>>>>>>>>> ● ceph-disk@dev-sdc3.service  loaded failed failed  Ceph disk activation: /dev/sdc3
>>>>>>>>>> ● ceph-disk@dev-sdc4.service  loaded failed failed  Ceph disk activation: /dev/sdc4
>>>>>>>>>> ● ceph-disk@dev-sdc5.service  loaded failed failed  Ceph disk activation: /dev/sdc5
>>>>>>>>>> ● ceph-disk@dev-sdd1.service  loaded failed failed  Ceph disk activation: /dev/sdd1
>>>>>>>>>> ● ceph-disk@dev-sde1.service  loaded failed failed  Ceph disk activation: /dev/sde1
>>>>>>>>>> ● ceph-disk@dev-sdf1.service  loaded failed failed  Ceph disk activation: /dev/sdf1
>>>>>>>>>> ● ceph-disk@dev-sdg1.service  loaded failed failed  Ceph disk activation: /dev/sdg1
>>>>>>>>>> ● ceph-disk@dev-sdh1.service  loaded failed failed  Ceph disk activation: /dev/sdh1
>>>>>>>>>> ● ceph-disk@dev-sdi1.service  loaded failed failed  Ceph disk activation: /dev/sdi1
>>>>>>>>>> ● ceph-disk@dev-sdj1.service  loaded failed failed  Ceph disk activation: /dev/sdj1
>>>>>>>>>> ● ceph-disk@dev-sdk1.service  loaded failed failed  Ceph disk activation: /dev/sdk1
>>>>>>>>>> ● ceph-disk@dev-sdl1.service  loaded failed failed  Ceph disk activation: /dev/sdl1
>>>>>>>>>> ● ceph-disk@dev-sdm1.service  loaded failed failed  Ceph disk activation: /dev/sdm1
>>>>>>>>>> ● ceph-osd@0.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@1.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@2.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@3.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@4.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@5.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@6.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@7.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@8.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>> ● ceph-osd@9.service  loaded failed failed  Ceph object storage daemon
>>>>>>>>>>
>>>>>>>>>> I did some searching and saw that the issue is that the disks
>>>>>>>>>> aren't mounting... My question is how can I mount them correctly again
>>>>>>>>>> (note sdb and sdc are SSDs for cache)? I am not sure which disk maps to
>>>>>>>>>> ceph-osd@0 and so on. Also, can I add them to /etc/fstab as a workaround?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 29, 2016 at 10:41 AM, Mike Jacobacci <mi...@flowjo.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I would like to install OS updates on the Ceph cluster and
>>>>>>>>>>> activate a second 10Gb port on the OSD nodes, so I wanted to verify the
>>>>>>>>>>> correct steps to perform maintenance on the cluster. We are only using rbd
>>>>>>>>>>> to back our XenServer VMs at this point, and our cluster consists of 3 OSD
>>>>>>>>>>> nodes, 3 mon nodes and 1 admin node... So would these be the correct steps:
>>>>>>>>>>>
>>>>>>>>>>> 1. Shut down the VMs?
>>>>>>>>>>> 2. Run "ceph osd set noout" on the admin node.
>>>>>>>>>>> 3. Install updates on each monitor node and reboot one at a time.
>>>>>>>>>>> 4. Install updates on the OSD nodes and activate the second 10Gb port,
>>>>>>>>>>>    rebooting one OSD node at a time.
>>>>>>>>>>> 5. Once all nodes are back up, run "ceph osd unset noout".
>>>>>>>>>>> 6. Bring the VMs back online.
>>>>>>>>>>>
>>>>>>>>>>> Does this sound correct?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Mike
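For completeness, steps 2 and 5 of that original plan are just the standard maintenance toggle, run from any node with an admin keyring:

ceph osd set noout      # keep OSDs from being marked out (and data from rebalancing) while nodes reboot
ceph osd unset noout    # restore normal behaviour once every node is back and the cluster is healthy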
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com