Hi,
ceph-deploy 1.5.3 can cause trouble if a reboot is done between preparation and activation of an OSD:

The OSD disk was /dev/sdb at this time; the OSD data was to go to sdb1, formatted as btrfs, and the journal to sdb2.
I prepared an osd:

root@bd-a:/etc/ceph# ceph-deploy -v --overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v --overwrite-conf osd --fs-type btrfs prepare bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks bd-1:/dev/sdb1:/dev/sdb2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] Deploying osd to bd-1
[bd-1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bd-1][INFO  ] Running command: udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host bd-1 disk /dev/sdb1 journal /dev/sdb2 activate False
[bd-1][INFO  ] Running command: ceph-disk-prepare --fs-type btrfs --cluster ceph -- /dev/sdb1 /dev/sdb2
[bd-1][DEBUG ]
[bd-1][DEBUG ] WARNING! - Btrfs v3.12 IS EXPERIMENTAL
[bd-1][DEBUG ] WARNING! - see http://btrfs.wiki.kernel.org before using
[bd-1][DEBUG ]
[bd-1][DEBUG ] fs created label (null) on /dev/sdb1
[bd-1][DEBUG ]  nodesize 32768 leafsize 32768 sectorsize 4096 size 19.99TiB
[bd-1][DEBUG ] Btrfs v3.12
[bd-1][WARNIN] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
[bd-1][WARNIN] Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
[bd-1][WARNIN] Error: Partition(s) 1 on /dev/sdb1 have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host bd-1 is now ready for osd use.
Unhandled exception in thread started by
sys.excepthook is missing
lost sys.stderr
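
A side note: as far as I can see, the reboot is only needed because the kernel could not re-read the new partition table of /dev/sdb (the "Partition(s) 1 ... in use" error above). Maybe something like the following would have been enough instead of a full reboot; this is only my own untested idea, not something ceph-deploy suggested, and if the partition really is held open it may fail the same way:

partprobe /dev/sdb          # or: blockdev --rereadpt /dev/sdb
udevadm settle              # wait for udev to pick up the new partition entries
udevadm trigger --subsystem-match=block --action=add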

ceph-deploy told me to reboot, so I did.
After the reboot the OSD disk had changed from sdb to sda; device names are not stable across reboots, which is a known behaviour of Linux (Ubuntu).
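
For the record, after such a rename the prepared partitions can still be found via the persistent names under /dev/disk, or with ceph-disk; this is just a sketch of what I would look at:

ls -l /dev/disk/by-id/         # stable names that do not change with sdX
ls -l /dev/disk/by-partuuid/   # partition UUIDs (GPT)
ceph-disk list                 # shows which partitions are ceph data / journal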

root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd activate bd-1:/dev/sda1:/dev/sda2
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks bd-1:/dev/sda1:/dev/sda2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sda1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[bd-1][INFO ] Running command: ceph-disk-activate --mark-init upstart --mount /dev/sda1
[bd-1][WARNIN] got monmap epoch 1
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:07.222697 7f5c111af800 -1 journal check: ondisk fsid c8ce6ee2-f21b-4ba3-a20e-649224244b9a doesn't match expected fcaaf66f-b7b7-4702-83a4-54832b7131fa, invalid (someone else's?) journal
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN]  HDIO_DRIVE_CMD(identify) failed: Invalid argument
[bd-1][WARNIN] 2014-06-10 11:45:08.125384 7f5c111af800 -1 filestore(/var/lib/ceph/tmp/mnt.LryOxo) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320327 7f5c111af800 -1 created object store /var/lib/ceph/tmp/mnt.LryOxo journal /var/lib/ceph/tmp/mnt.LryOxo/journal for osd.4 fsid 08066b4a-3f36-4e3f-bd1e-15c006a09057
[bd-1][WARNIN] 2014-06-10 11:45:08.320367 7f5c111af800 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.LryOxo/keyring: can't open /var/lib/ceph/tmp/mnt.LryOxo/keyring: (2) No such file or directory
[bd-1][WARNIN] 2014-06-10 11:45:08.320419 7f5c111af800 -1 created new key in keyring /var/lib/ceph/tmp/mnt.LryOxo/keyring
[bd-1][WARNIN] added key for osd.4
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[bd-1][WARNIN] there are 2 OSDs down
[bd-1][WARNIN] there are 2 OSDs out
root@bd-a:/etc/ceph# ceph -s
    cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
     health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; recovery 19/60 objects degraded (31.667%); clock skew detected on mon.bd-1
     monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 4034, quorum 0,1,2 bd-0,bd-1,bd-2
     mdsmap e2815: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
     osdmap e1717: 6 osds: 4 up, 4 in
      pgmap v46008: 992 pgs, 11 pools, 544 kB data, 20 objects
            10324 MB used, 125 TB / 125 TB avail
            19/60 objects degraded (31.667%)
                   2 active
                 679 active+degraded
                 311 active+remapped
root@bd-a:/etc/ceph# ceph osd tree
# id    weight  type name       up/down reweight
-1      189.1   root default
-2      63.63           host bd-0
0       43.64                   osd.0   up      1
3       19.99                   osd.3   up      1
-3      63.63           host bd-1
1       43.64                   osd.1   down    0
4       19.99                   osd.4   down    0
-4      61.81           host bd-2
2       43.64                   osd.2   up      1
5       18.17                   osd.5   up      1
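
Looking at the log above, the "ondisk fsid ... doesn't match expected ..." warning suggests that /dev/sda2 was not the journal belonging to the freshly prepared data partition, so ceph-disk created a new object store and key for osd.4. If I understand ceph-disk correctly, the data partition records the UUID of its journal, so the pairing could be checked before activating, roughly like this (journal_uuid is the file name I would expect ceph-disk prepare to have written):

mount /dev/sda1 /mnt
cat /mnt/journal_uuid          # UUID of the journal this data partition expects
ls -l /dev/disk/by-partuuid/   # the matching journal partition should appear under that UUID
umount /mnt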

At this point I rebooted bd-1 once more, and the OSD disk was /dev/sdb again.
So I tried again to activate the OSD:


root@bd-a:/etc/ceph# ceph-deploy -v osd activate bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy -v osd activate bd-1:/dev/sdb1:/dev/sdb2
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks bd-1:/dev/sdb1:/dev/sdb2
[bd-1][DEBUG ] connected to host: bd-1
[bd-1][DEBUG ] detect platform information from remote host
[bd-1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Ubuntu 14.04 trusty
[ceph_deploy.osd][DEBUG ] activating host bd-1 disk /dev/sdb1
[ceph_deploy.osd][DEBUG ] will use init type: upstart
[bd-1][INFO ] Running command: ceph-disk-activate --mark-init upstart --mount /dev/sdb1
[bd-1][INFO  ] checking OSD status...
[bd-1][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[bd-1][WARNIN] there are 2 OSDs down
[bd-1][WARNIN] there are 2 OSDs out
root@bd-a:/etc/ceph# ceph osd tree
# id    weight  type name       up/down reweight
-1      189.1   root default
-2      63.63           host bd-0
0       43.64                   osd.0   up      1
3       19.99                   osd.3   up      1
-3      63.63           host bd-1
1       43.64                   osd.1   down    0
4       19.99                   osd.4   down    0
-4      61.81           host bd-2
2       43.64                   osd.2   up      1
5       18.17                   osd.5   up      1
root@bd-a:/etc/ceph# ceph -s
    cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
     health HEALTH_WARN 679 pgs degraded; 992 pgs stuck unclean; recovery 10/60 objects degraded (16.667%); clock skew detected on mon.bd-1
     monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 4060, quorum 0,1,2 bd-0,bd-1,bd-2
     mdsmap e2823: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
     osdmap e1759: 6 osds: 4 up, 4 in
      pgmap v46110: 992 pgs, 11 pools, 544 kB data, 20 objects
            10320 MB used, 125 TB / 125 TB avail
            10/60 objects degraded (16.667%)
                 679 active+degraded
                 313 active+remapped
root@bd-a:/etc/ceph#
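
At this point only the two OSDs on bd-1 (osd.1 and osd.4) were still down. Instead of yet another reboot, I suspect starting them via upstart on Ubuntu 14.04 would also have worked, something like:

initctl list | grep ceph       # check which ceph upstart jobs exist and their state
start ceph-osd id=1
start ceph-osd id=4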

After another reboot everything was OK:

ceph -s
    cluster 08066b4a-3f36-4e3f-bd1e-15c006a09057
     health HEALTH_OK
     monmap e1: 3 mons at {bd-0=xxx.xxx.xxx.20:6789/0,bd-1=xxx.xxx.xxx.21:6789/0,bd-2=xxx.xxx.xxx.22:6789/0}, election epoch 4220, quorum 0,1,2 bd-0,bd-1,bd-2
     mdsmap e2895: 1/1/1 up {0=bd-2=up:active}, 2 up:standby
     osdmap e1939: 6 osds: 6 up, 6 in
      pgmap v47099: 992 pgs, 11 pools, 551 kB data, 20 objects
            117 MB used, 189 TB / 189 TB avail
                 992 active+clean
root@bd-a:~#


Would it be possible for the author of ceph-deploy to make the reboot between these two steps unnecessary?
Then it would also be possible to use create instead of prepare+activate.
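
For reference, what I mean is something like the following, giving ceph-deploy the whole disk; as far as I understand, ceph-disk then labels the partitions with GPT type GUIDs and udev activates them no matter whether the disk comes up as sda or sdb (the command line is only a sketch, and sdc is just a placeholder for a separate journal disk):

ceph-deploy -v osd --fs-type btrfs create bd-1:sdb
ceph-deploy -v osd --fs-type btrfs create bd-1:sdb:sdc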

Thank you,
  Markus

--
Best regards,
  Markus Goldberg

--------------------------------------------------------------------------
Markus Goldberg       Universität Hildesheim
                      Rechenzentrum
Tel +49 5121 88392822 Marienburger Platz 22, D-31141 Hildesheim, Germany
Fax +49 5121 88392823 email goldb...@uni-hildesheim.de
--------------------------------------------------------------------------

