I forgot to add:

On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci <mi...@flowjo.com> wrote:

> So it looks like the journal partition is mounted:
>
> ls -lah /var/lib/ceph/osd/ceph-0/journal
> lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal
> -> /dev/sdb1
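>
> (As an extra sanity check, and assuming the partition type GUID set with
> sgdisk during the original setup is still in place, the journal partition
> can be inspected with something like:
>
> sgdisk -i 1 /dev/sdb
>
> which should show the Ceph journal partition GUID code.)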
>
> Here is the output of journalctl -xe when I try to start the
> ceph-disk@dev-sdb1 service:
>
> sh[17481]: mount_activate: Failed to activate
> sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
> sh[17481]: command_check_call: Running command: /bin/umount --
> /var/lib/ceph/tmp/mnt.m9ek7W
> sh[17481]: Traceback (most recent call last):
> sh[17481]: File "/usr/sbin/ceph-disk", line 9, in <module>
> sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
> 'ceph-disk')()
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 5011, in run
> sh[17481]: main(sys.argv[1:])
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4962, in main
> sh[17481]: args.func(args)
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4720, in <lambda>
> sh[17481]: func=lambda args: main_activate_space(name, args),
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 3739, in main_activate_space
> sh[17481]: reactivate=args.reactivate,
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 3073, in mount_activate
> sh[17481]: (osd_id, cluster) = activate(path, activate_key_template, init)
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 3220, in activate
> sh[17481]: ' with fsid %s' % ceph_fsid)
> sh[17481]: ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph
> with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
> sh[17481]: Traceback (most recent call last):
> sh[17481]: File "/usr/sbin/ceph-disk", line 9, in <module>
> sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts',
> 'ceph-disk')()
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 5011, in run
> sh[17481]: main(sys.argv[1:])
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4962, in main
> sh[17481]: args.func(args)
> sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py",
> line 4399, in main_trigger
> sh[17481]: raise Error('return code ' + str(ret))
> sh[17481]: ceph_disk.main.Error: Error: return code 1
> systemd[1]: ceph-disk@dev-sdb1.service: main process exited, code=exited,
> status=1/FAILURE
> systemd[1]: Failed to start Ceph disk activation: /dev/sdb1.
>
> I don't understand this error:
> ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid
> e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
>
> My fsid in ceph.conf is:
> fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
>
> I don't know why the fsid would have changed or be different. I thought I had
> a basic cluster setup; I don't understand what's going wrong.
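>
> (A quick way to compare the two, assuming a filestore layout where each OSD
> data directory contains a ceph_fsid file, would be something like:
>
> cat /var/lib/ceph/osd/ceph-0/ceph_fsid
> grep fsid /etc/ceph/ceph.conf
>
> If those two values don't match, ceph-disk refuses to activate the OSD with
> this error.)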
>
> Mike
>
> On Tue, Nov 29, 2016 at 5:15 PM, Mike Jacobacci <mi...@flowjo.com> wrote:
>
>> Hi John,
>>
>> Thanks, I wasn't sure whether something had happened to the journal
>> partitions or not.
>>
>> Right now the ceph-osd@0-9 services are back up and the cluster health
>> is good, but none of the ceph-disk@dev-sd* services are running. How can
>> I get the journal partitions mounted again?
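>>
>> (I'm guessing something along these lines is what's needed, though I'm not
>> sure the subcommand is right for this ceph-disk version:
>>
>> ceph-disk activate-journal /dev/sdb1
>>
>> or simply starting the corresponding unit again with "systemctl start
>> ceph-disk@dev-sdb1".)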
>>
>> Cheers,
>> Mike
>>
>> On Tue, Nov 29, 2016 at 4:30 PM, John Petrini <jpetr...@coredial.com>
>> wrote:
>>
>>> Also, don't run sgdisk again; that's just for creating the journal
>>> partitions. ceph-disk is a service used for prepping disks; only the OSD
>>> services need to be running, as far as I know. Are the ceph-osd@x
>>> services running now that you've mounted the disks?
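>>>
>>> You can check with something like:
>>>
>>> systemctl list-units 'ceph-osd@*'
>>> systemctl status ceph-osd@0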
>>>
>>>
>>> On Tue, Nov 29, 2016 at 7:27 PM, John Petrini <jpetr...@coredial.com>
>>> wrote:
>>>
>>>> What command are you using to start your OSDs?
>>>>
>>>>
>>>> On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci <mi...@flowjo.com>
>>>> wrote:
>>>>
>>>>> I was able to bring the OSDs up by looking at my other OSD node, which
>>>>> is the exact same hardware/disks, and working out which disks map to
>>>>> which OSDs. But I still can't bring up any of the ceph-disk@dev-sd*
>>>>> services... When I first installed the cluster and got the OSDs up, I
>>>>> had to run the following:
>>>>>
>>>>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>
>>>>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>
>>>>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>
>>>>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>
>>>>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
>>>>>
>>>>> # sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>
>>>>> # sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>
>>>>> # sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>
>>>>> # sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>
>>>>> # sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
>>>>>
>>>>>
>>>>> Do I need to run that again?
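>>>>>
>>>>> (I assume I could first check whether the partitions still carry that
>>>>> type code with something like:
>>>>>
>>>>> # sgdisk -i 1 /dev/sdb
>>>>>
>>>>> before re-running them, but I'm not sure.)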
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Mike
>>>>>
>>>>> On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond <sean.redmo...@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Normally they mount based on the GPT label. If that's not working, you
>>>>>> can mount the disk under /mnt and then cat the file called whoami to
>>>>>> find out the OSD number.
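>>>>>>
>>>>>> For example, something like this (using a data partition, not one of the
>>>>>> journal partitions):
>>>>>>
>>>>>> mount /dev/sdd1 /mnt
>>>>>> cat /mnt/whoami
>>>>>> umount /mnt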
>>>>>>
>>>>>> On 29 Nov 2016 23:56, "Mike Jacobacci" <mi...@flowjo.com> wrote:
>>>>>>
>>>>>>> OK, I am in some trouble now and would love some help! After
>>>>>>> updating, none of the OSDs on the node will come back up:
>>>>>>>
>>>>>>> ● ceph-disk@dev-sdb1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdb1
>>>>>>> ● ceph-disk@dev-sdb2.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdb2
>>>>>>> ● ceph-disk@dev-sdb3.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdb3
>>>>>>> ● ceph-disk@dev-sdb4.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdb4
>>>>>>> ● ceph-disk@dev-sdb5.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdb5
>>>>>>> ● ceph-disk@dev-sdc1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdc1
>>>>>>> ● ceph-disk@dev-sdc2.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdc2
>>>>>>> ● ceph-disk@dev-sdc3.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdc3
>>>>>>> ● ceph-disk@dev-sdc4.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdc4
>>>>>>> ● ceph-disk@dev-sdc5.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdc5
>>>>>>> ● ceph-disk@dev-sdd1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdd1
>>>>>>> ● ceph-disk@dev-sde1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sde1
>>>>>>> ● ceph-disk@dev-sdf1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdf1
>>>>>>> ● ceph-disk@dev-sdg1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdg1
>>>>>>> ● ceph-disk@dev-sdh1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdh1
>>>>>>> ● ceph-disk@dev-sdi1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdi1
>>>>>>> ● ceph-disk@dev-sdj1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdj1
>>>>>>> ● ceph-disk@dev-sdk1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdk1
>>>>>>> ● ceph-disk@dev-sdl1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdl1
>>>>>>> ● ceph-disk@dev-sdm1.service
>>>>>>>        loaded failed failed    Ceph disk activation: /dev/sdm1
>>>>>>> ● ceph-osd@0.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@1.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@2.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@3.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@4.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@5.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@6.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@7.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@8.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>> ● ceph-osd@9.service
>>>>>>>        loaded failed failed    Ceph object storage daemon
>>>>>>>
>>>>>>> I did some searching and saw that the issue is that the disks aren't
>>>>>>> mounting... My question is: how can I mount them correctly again (note
>>>>>>> that sdb and sdc are SSDs for cache)? I am not sure which disk maps to
>>>>>>> ceph-osd@0 and so on. Also, can I add them to /etc/fstab as a
>>>>>>> workaround?
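>>>>>>>
>>>>>>> (I'm guessing something like:
>>>>>>>
>>>>>>> ceph-disk list
>>>>>>>
>>>>>>> would show which partition belongs to which OSD and journal, but I'm not
>>>>>>> sure.)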
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Mike
>>>>>>>
>>>>>>> On Tue, Nov 29, 2016 at 10:41 AM, Mike Jacobacci <mi...@flowjo.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I would like to install OS updates on the Ceph cluster and activate
>>>>>>>> a second 10Gb port on the OSD nodes, so I wanted to verify the correct
>>>>>>>> steps to perform maintenance on the cluster. We are only using RBD to
>>>>>>>> back our XenServer VMs at this point, and our cluster consists of 3 OSD
>>>>>>>> nodes, 3 Mon nodes and 1 admin node... So would these be the correct
>>>>>>>> steps:
>>>>>>>>
>>>>>>>> 1. Shut down the VMs?
>>>>>>>> 2. Run "ceph osd set noout" on the admin node.
>>>>>>>> 3. Install updates on each monitor node and reboot them one at a time.
>>>>>>>> 4. Install updates on the OSD nodes and activate the second 10Gb port,
>>>>>>>> rebooting one OSD node at a time.
>>>>>>>> 5. Once all nodes are back up, run "ceph osd unset noout".
>>>>>>>> 6. Bring the VMs back online.
>>>>>>>>
>>>>>>>> Does this sound correct?
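>>>>>>>>
>>>>>>>> As a rough sketch of what I mean for steps 2 and 5, plus checking
>>>>>>>> health in between:
>>>>>>>>
>>>>>>>> ceph osd set noout
>>>>>>>> # update/reboot one node at a time, waiting for "ceph -s" to settle
>>>>>>>> # before moving on to the next node
>>>>>>>> ceph osd unset noout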
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Mike
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
