Hi Frederic,

I think the problem is that you have a package with this bug:
http://tracker.ceph.com/issues/9747

Let me know if using the latest from EPEL (i.e. what was built from 
https://dl.fedoraproject.org/pub/epel/7/SRPMS/c/ceph-0.80.5-8.el7.src.rpm) 
solves the problem. I'm learning a lot about udev, CentOS and RHEL in the 
process ;-)

Cheers

On 11/10/2014 17:31, Loic Dachary wrote:
> Hi,
> 
> On RHEL7 it works as expected:
> 
> [ubuntu@mira042 ~]$ sudo ceph-disk prepare /dev/sdg
> 
> ***************************************************************
> Found invalid GPT and valid MBR; converting MBR to GPT format.
> ***************************************************************
> 
> Information: Moved requested sector from 34 to 2048 in
> order to align on 2048-sector boundaries.
> The operation has completed successfully.
> partx: /dev/sdg: error adding partition 2
> Information: Moved requested sector from 10485761 to 10487808 in
> order to align on 2048-sector boundaries.
> The operation has completed successfully.
> meta-data=/dev/sdg1              isize=2048   agcount=4, agsize=60719917 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=0
> data     =                       bsize=4096   blocks=242879665, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
> log      =internal log           bsize=4096   blocks=118593, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> The operation has completed successfully.
> partx: /dev/sdg: error adding partitions 1-2
> [ubuntu@mira042 ~]$ df
> Filesystem     1K-blocks    Used Available Use% Mounted on
> /dev/sda1      974540108 3412868 971110856   1% /
> devtmpfs         8110996       0   8110996   0% /dev
> tmpfs            8130388       0   8130388   0% /dev/shm
> tmpfs            8130388   58188   8072200   1% /run
> tmpfs            8130388       0   8130388   0% /sys/fs/cgroup
> /dev/sdh1      971044288   34740 971009548   1% /var/lib/ceph/osd/ceph-0
> /dev/sdg1      971044288   33700 971010588   1% /var/lib/ceph/osd/ceph-1
> 
> There is an important difference though: RHEL7 does not use 
> https://github.com/ceph/ceph/blob/giant/src/ceph-disk-udev . It should not be 
> necessary on CentOS 7 either, but it looks like it is in use, since the debug 
> output you get comes from it. There must be something wrong in the source 
> package you are using, around this point: 
> https://github.com/ceph/ceph/blob/giant/ceph.spec.in#L382
> 
> I checked 
> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHOS/SRPMS/ceph-0.80.5-1.el7ost.src.rpm
> and it is as expected. Could you let me know where you got the package from?
> And what the version is according to:
> 
> $ yum info ceph
> Installed Packages
> Name        : ceph
> Arch        : x86_64
> Epoch       : 1
> Version     : 0.80.5
> Release     : 8.el7
> Size        : 37 M
> Repo        : installed
> From repo   : epel
> Summary     : User space components of the Ceph file system
> URL         : http://ceph.com/
> License     : GPL-2.0
> Description : Ceph is a massively scalable, open-source, distributed
>             : storage system that runs on commodity hardware and delivers
>             : object, block and file system storage.
> 
> I'm not very familiar with RPMs; maybe "Release : 8.el7" means it is a 
> more recent build of the package, but ceph-0.80.5-1.el7ost.src.rpm suggests 
> the Release should be 1.el7 and not 8.el7.
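[Editor's note: on the Version/Release question above, GNU `sort -V` gives a quick approximation of how these strings order (rpm's own rpmvercmp has more rules, so treat this only as a sanity check):]

```shell
# Hedged sketch: sort -V orders version-like strings; 1.el7 sorts before
# 8.el7, so "Release: 8.el7" is a newer build of the same Version 0.80.5.
printf '1.el7\n8.el7\n' | sort -V | tail -1
```

Note the `el7` vs `el7ost` suffixes also mark different build targets (EPEL vs RHOS), so the two packages are not just different rebuild numbers of the same source.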
> 
> Cheers
> 
> On 10/10/2014 15:59, SCHAER Frederic wrote:
>> Hi Loic,
>>
>> Patched, and still not working (sorry)...
>> I'm attaching the prepare output, and also a different, "real" udev debug 
>> output I captured using "udevadm monitor --environment" (udev.log file).
>>
>> I added a "sync" command in ceph-disk-udev (this did not change a thing), 
>> and I noticed that the udev script is called 3 times when adding one disk, 
>> and that the debug output was captured and then all mixed into one file.
>> This may lead to log misinterpretation (race conditions?)...
>> I changed the logging a bit in order to get one file per call, and attached 
>> those logs to this mail.
>>
>> File timestamps are as follows:
>>   File: '/var/log/udev_ceph.log.out.22706'
>> Change: 2014-10-10 15:48:09.136386306 +0200
>>   File: '/var/log/udev_ceph.log.out.22749'
>> Change: 2014-10-10 15:48:11.502425395 +0200
>>   File: '/var/log/udev_ceph.log.out.22750'
>> Change: 2014-10-10 15:48:11.606427113 +0200
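[Editor's note: if the three near-simultaneous invocations above turn out to matter, one way to rule out interleaving while debugging is to serialize the hook with flock(1). A hedged sketch, with hypothetical lock and log paths; the real hook call is left as a comment:]

```shell
# Sketch: serialize concurrent udev hook invocations with flock(1);
# each call still gets its own PID-suffixed log file, as in the mail above.
LOCK=/tmp/ceph-disk-udev.lock            # hypothetical lock path
exec 9>"$LOCK"
flock -w 30 9                            # wait up to 30s for the exclusive lock
echo "running serialized section (pid $$)" > "/tmp/udev_ceph.log.out.$$"
# ... invoke the real hook here, e.g. ceph-disk-udev "$@" ...
flock -u 9                               # release the lock
```

This only changes timing, so it is a diagnostic aid rather than a fix; if the problem disappears under the lock, a race between the three calls is the likely culprit.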
>>
>> Actually, I can reproduce the UUID=0 thing with this command:
>>
>> [root@ceph1 ~]# /usr/sbin/ceph-disk -v activate-journal /dev/sdc2
>> INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid 
>> --osd-journal /dev/sdc2
>> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> DEBUG:ceph-disk:Journal /dev/sdc2 has OSD UUID 
>> 00000000-0000-0000-0000-000000000000
>> INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- 
>> /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
>> error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such 
>> file or directory
>> ceph-disk: Cannot discover filesystem type: device 
>> /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command 
>> '/sbin/blkid' returned non-zero exit status 2
>>
>> Ah - to answer previous mails:
>> - I tried to manually create the GPT partition table to see if things 
>> would improve, but this was not the case (I also tried zeroing out the 
>> start and end of the disks, and adding random data)
>> - running ceph-disk prepare twice does not work; it's just that once every 
>> 20 (?) times it "surprisingly does not fail" on this hardware/OS 
>> combination ;)
>>
>> Regards
>>
>> -----Original Message-----
>> From: Loic Dachary [mailto:[email protected]] 
>> Sent: Friday, October 10, 2014 14:37
>> To: SCHAER Frederic; [email protected]
>> Subject: Re: [ceph-users] ceph-dis prepare : 
>> UUID=00000000-0000-0000-0000-000000000000
>>
>> Hi Frederic,
>>
>> To be 100% sure, it would be great if you could manually patch your local 
>> ceph-disk script and change 'partprobe', into 'partx', '-a', in 
>> https://github.com/ceph/ceph/blob/v0.80.6/src/ceph-disk#L1284 and then run:
>>
>> ceph-disk zap
>> ceph-disk prepare
>>
>> and hopefully it will show up as it should. It works for me on CentOS 7, 
>> but ...
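[Editor's note: the manual patch above amounts to swapping the `'partprobe',` list element for `'partx', '-a',`. A hedged sed one-liner, demonstrated here on a scratch file that mimics the call site (the real file is typically /usr/sbin/ceph-disk; back it up and check the exact formatting with grep first, since the source layout may differ):]

```shell
# Sketch: apply the partprobe -> partx -a swap with sed, keeping a .orig backup.
# The printf only fabricates a stand-in for the real call site in ceph-disk.
printf "    args = [\n        'partprobe',\n        dev,\n    ]\n" > /tmp/ceph-disk.demo
sed -i.orig "s/'partprobe',/'partx', '-a',/" /tmp/ceph-disk.demo
grep -n "'partx', '-a'," /tmp/ceph-disk.demo   # confirm the change landed
```

On a real node, substitute /usr/sbin/ceph-disk for the demo path and verify with `ceph-disk zap` / `ceph-disk prepare` as described above.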
>>
>> Cheers
>>
>> On 10/10/2014 14:33, Loic Dachary wrote:
>>> Hi Frederic,
>>>
>>> It looks like this is just because 
>>> https://github.com/ceph/ceph/blob/v0.80.6/src/ceph-disk#L1284 should call 
>>> partx instead of partprobe. The udev debug output makes this quite clear: 
>>> http://tracker.ceph.com/issues/9721
>>>
>>> I think 
>>> https://github.com/dachary/ceph/commit/8d914001420e5bfc1e12df2d4882bfe2e1719a5c#diff-788c3cea6213c27f5fdb22f8337096d5R1285
>>> fixes it.
>>>
>>> Cheers
>>>
>>> On 09/10/2014 16:29, SCHAER Frederic wrote:
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Loic Dachary [mailto:[email protected]] 
>>>> Sent: Thursday, October 9, 2014 16:20
>>>> To: SCHAER Frederic; [email protected]
>>>> Subject: Re: [ceph-users] ceph-dis prepare : 
>>>> UUID=00000000-0000-0000-0000-000000000000
>>>>
>>>>
>>>>
>>>> On 09/10/2014 16:04, SCHAER Frederic wrote:
>>>>> Hi Loic,
>>>>>
>>>>> Back to sdb, as the sde output was from another machine on which I ran 
>>>>> partx -u afterwards.
>>>>> To reply to your last question first: I think the SG_IO error comes from 
>>>>> the fact that the disks are exported as single-disk RAID0 volumes on a 
>>>>> PERC 6/E, which does not support JBOD - this is decommissioned hardware 
>>>>> on which I'd like to test and validate that we can use Ceph for our use 
>>>>> case...
>>>>>
>>>>> So, back to the UUID.
>>>>> It's funny: I retried, and ceph-disk prepare worked this time. I tried 
>>>>> on another disk, and it failed.
>>>>> There is a difference in the output from ceph-disk: on the failing disk, 
>>>>> I have these extra lines after the disks are prepared:
>>>>>
>>>>> (...)
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>> Warning: The kernel is still using the old partition table.
>>>>> The new table will be used at the next reboot.
>>>>> The operation has completed successfully.
>>>>> partx: /dev/sdc: error adding partitions 1-2
>>>>>
>>>>> I didn't have the warning about the old partition table on the disk 
>>>>> that worked.
>>>>> So on this new disk, I have:
>>>>>
>>>>> [root@ceph1 ~]# mount /dev/sdc1 /mnt
>>>>> [root@ceph1 ~]# ll /mnt/
>>>>> total 16
>>>>> -rw-r--r-- 1 root root 37 Oct  9 15:58 ceph_fsid
>>>>> -rw-r--r-- 1 root root 37 Oct  9 15:58 fsid
>>>>> lrwxrwxrwx 1 root root 58 Oct  9 15:58 journal -> 
>>>>> /dev/disk/by-partuuid/5e50bb8b-0b99-455f-af71-10815a32bfbc
>>>>> -rw-r--r-- 1 root root 37 Oct  9 15:58 journal_uuid
>>>>> -rw-r--r-- 1 root root 21 Oct  9 15:58 magic
>>>>>
>>>>> [root@ceph1 ~]# cat /mnt/journal_uuid
>>>>> 5e50bb8b-0b99-455f-af71-10815a32bfbc
>>>>>
>>>>> [root@ceph1 ~]# sgdisk --info=1 /dev/sdc
>>>>> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
>>>>> Partition unique GUID: 244973DE-7472-421C-BB25-4B09D3F8D441
>>>>> First sector: 10487808 (at 5.0 GiB)
>>>>> Last sector: 1952448478 (at 931.0 GiB)
>>>>> Partition size: 1941960671 sectors (926.0 GiB)
>>>>> Attribute flags: 0000000000000000
>>>>> Partition name: 'ceph data'
>>>>>
>>>>> [root@ceph1 ~]# sgdisk --info=2 /dev/sdc
>>>>> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
>>>>> Partition unique GUID: 5E50BB8B-0B99-455F-AF71-10815A32BFBC
>>>>> First sector: 2048 (at 1024.0 KiB)
>>>>> Last sector: 10485760 (at 5.0 GiB)
>>>>> Partition size: 10483713 sectors (5.0 GiB)
>>>>> Attribute flags: 0000000000000000
>>>>> Partition name: 'ceph journal'
>>>>>
>>>>> Puzzling, isn't it?
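[Editor's note: the "Partition unique GUID" sgdisk prints above is the same value udev exposes under /dev/disk/by-partuuid/. On disk it is stored mixed-endian: the first three dash-separated groups are byte-swapped, the last two are not. A small illustrative sketch of that mapping, using the journal GUID from the sgdisk output above (helper name `swap` is mine):]

```shell
# Sketch: show the on-disk byte order of a GPT partition unique GUID.
guid=5E50BB8B-0B99-455F-AF71-10815A32BFBC
IFS=- read -r a b c d e <<EOF
$guid
EOF
# Reverse the bytes (hex digit pairs) of one group.
swap() { echo "$1" | sed 's/../& /g' | awk '{for(i=NF;i>0;i--)printf "%s",$i}'; }
echo "on-disk byte order: $(swap $a)$(swap $b)$(swap $c)$d$e"
```

This is only context for reading the sgdisk/blkid output; the all-zeros UUID in the activation error below comes from a journal header that could not be read, not from any byte-order issue.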
>>>>>
>>>>>
>>>>
>>>> Yes :-) Just to be 100% sure: when you try to activate this /dev/sdc, 
>>>> does it show an error and complain that the journal uuid is 0000-000* 
>>>> etc.? If so, could you copy your udev debug output?
>>>>
>>>> Cheers
>>>>
>>>> [>- FS : -<]  
>>>>
>>>> No, when I manually activate the disk instead of attempting to go the 
>>>> udev way, it seems to work:
>>>> [root@ceph1 ~]# ceph-disk activate /dev/sdc1
>>>> got monmap epoch 1
>>>> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 
>>>> 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> 2014-10-09 16:21:43.286288 7f2be6a027c0 -1 journal check: ondisk fsid 
>>>> 00000000-0000-0000-0000-000000000000 doesn't match expected 
>>>> 244973de-7472-421c-bb25-4b09d3f8d441, invalid (someone else's?) journal
>>>> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 
>>>> 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 
>>>> 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 
>>>> 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> 2014-10-09 16:21:43.301957 7f2be6a027c0 -1 
>>>> filestore(/var/lib/ceph/tmp/mnt.4lJlzP) could not find 
>>>> 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
>>>> 2014-10-09 16:21:43.305941 7f2be6a027c0 -1 created object store 
>>>> /var/lib/ceph/tmp/mnt.4lJlzP journal /var/lib/ceph/tmp/mnt.4lJlzP/journal 
>>>> for osd.47 fsid 70ac4a78-46c0-45e6-8ff9-878b37f50fa1
>>>> 2014-10-09 16:21:43.305992 7f2be6a027c0 -1 auth: error reading file: 
>>>> /var/lib/ceph/tmp/mnt.4lJlzP/keyring: can't open 
>>>> /var/lib/ceph/tmp/mnt.4lJlzP/keyring: (2) No such file or directory
>>>> 2014-10-09 16:21:43.306099 7f2be6a027c0 -1 created new key in keyring 
>>>> /var/lib/ceph/tmp/mnt.4lJlzP/keyring
>>>> added key for osd.47
>>>> === osd.47 ===
>>>> create-or-move updating item name 'osd.47' weight 0.9 at location 
>>>> {host=ceph1,root=default} to crush map
>>>> Starting Ceph osd.47 on ceph1...
>>>> Running as unit run-12392.service.
>>>>
>>>> The OSD then appeared in the osd tree...
>>>> I attached the logs to this email (I just added a "set -x" in the script 
>>>> called by udev, and redirected the output).
>>>>
>>>> Regards
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
> 
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre


