Hi, I tried to add a new monitor, but now I am unable to use the ceph command.

After running ceph-deploy mon create myhostname I get:

# ceph status
2015-07-30 10:42:39.682038 7f7b16d90700 0 librados: client.admin authentication error (1) Operation not permitted
Error connecting to cluster: PermissionError

Could you help me fix this, and tell me how to change the keys, please?
Thanks a lot in advance, and sorry, I am a newbie on this topic.
K

> On 30 Jul 2015, at 11:39, Khalid Ahsein <kahs...@gmail.com> wrote:
>
> Good morning Christian,
>
> Thank you for your quick response.
> So I need to upgrade to 64 GB or 96 GB to be more secure?
>
> And sorry, I thought that 2 monitors was the minimum. We will work on adding a
> new host quickly.
>
> About osd_pool_default_min_size: should I change something for the future?
>
> Thank you again,
> K
>
>> On 30 Jul 2015, at 11:12, Christian Balzer <ch...@gol.com> wrote:
>>
>> Hello,
>>
>> On Thu, 30 Jul 2015 10:55:30 +0200 Khalid Ahsein wrote:
>>
>>> Hello everybody,
>>>
>>> I have been running a Ceph cluster for 4 months, configured with two monitors:
>>>
>>> 1 host: 16 GB RAM - 12x 4 TB disks - 12 OSDs - 1 monitor - RAID-1 for system
>>> 1 host: 16 GB RAM - 12x 4 TB disks - 12 OSDs - 1 monitor - RAID-1 for system
>>>
>> Too little RAM, just 2 monitors, just 2 nodes...
>>
>>> Last night I encountered an issue with the crash of the first host.
>>>
>>> My first question is: why, with 1 host down, was my whole cluster down
>>> (unable to run ceph status; the command hangs) and all my RBDs stuck
>>> with no possibility of R/W?
>>
>> Re-read the documentation: you need at least 3 monitors to survive the
>> loss of one (monitor) node.
>>
>> Your osd_pool_default_min_size would have left you in a usable situation;
>> 2 nodes is really a minimal case.
>>
>>> I rebooted the first host, and 2 hours later the second went down with
>>> the same issue (all RBDs down and ceph hanging).
>>>
>>> After the reboot, here is ceph status:
>>>
>>> # ceph status
>>>     cluster 9c29f469-7bad-4b64-97bf-3fbb1bbc0c5f
>>>      health HEALTH_ERR
>>>             3 pgs inconsistent
>>>             1 pgs peering
>>>             1 pgs stuck inactive
>>>             1 pgs stuck unclean
>>>             36 requests are blocked > 32 sec
>>>             928 scrub errors
>>>             clock skew detected on mon.drt-becks
>>>      monmap e1: 2 mons at {drt-becks=172.16.21.6:6789/0,drt-marco=172.16.21.4:6789/0}
>>>             election epoch 26, quorum 0,1 drt-marco,drt-becks
>>>      osdmap e961: 24 osds: 24 up, 24 in
>>>       pgmap v2532968: 400 pgs, 1 pools, 512 GB data, 130 kobjects
>>>             1039 GB used, 88092 GB / 89177 GB avail
>>>                  393 active+clean
>>>                    3 active+clean+scrubbing+deep
>>>                    3 active+clean+inconsistent
>>>                    1 peering
>>>   client io 57290 B/s wr, 7 op/s
>>>
>> You will want to:
>> a) fix your NTP setup (clock skew)
>> b) check your logs about the scrub errors
>> c) same for the stuck requests
>>
>>> I also found this error in dmesg about the crash:
>>>
>>> Message from syslogd@drt-marco at Jul 30 04:03:57 ...
>>> kernel:[4876519.657178] BUG: soft lockup - CPU#7 stuck for 22s! [btrfs-cleaner:32713]
>>>
>>> All my volumes are on BTRFS; maybe that was not a good idea?
>>>
>> Depending on your OS and kernel version, most definitely.
>> Plenty of BTRFS problems are to be found in the ML archives.
>>
>> Christian
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
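For the client.admin authentication error at the top of this thread, one common recovery path is to re-gather the cluster keyrings from a healthy monitor and push the admin keyring back out. A minimal sketch, assuming ceph-deploy is run from the original admin node and that drt-marco still holds a good copy of the keys:

    # fetch ceph.client.admin.keyring (and the bootstrap keyrings)
    # from a monitor that is up and in quorum
    ceph-deploy gatherkeys drt-marco

    # distribute the admin keyring to every host that should be able
    # to run ceph commands (hostnames here are the ones from the thread)
    ceph-deploy admin drt-marco drt-becks myhostname

    # the keyring must be readable by the user running the command
    chmod +r /etc/ceph/ceph.client.admin.keyring
    ceph status

If the new monitor was created with a mismatched mon. key, it may also need to be removed and re-added once quorum is healthy again.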
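On Christian's points a) and b), a sketch of the usual first steps, assuming ntpd is the time daemon on the skewed monitor host and that the OSD logs identify which object copies are damaged:

    # on mon.drt-becks: verify time synchronisation and restart it
    # (the service name may be ntp or ntpd depending on the distro)
    ntpq -p
    service ntp restart

    # list the PGs behind the "3 pgs inconsistent" / 928 scrub errors
    ceph health detail | grep inconsistent

    # after checking the OSD logs for the affected objects,
    # repair one PG at a time
    ceph pg repair <pgid>

Note that on releases of this vintage, pg repair overwrites the replicas with the primary's copy of an object, so it is worth confirming that the primary holds good data first.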
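On the osd_pool_default_min_size question: that option only sets the default for pools created afterwards; an existing pool keeps its own size/min_size values. A sketch for inspecting and adjusting them, assuming the single pool in the pgmap above is the default rbd pool:

    # inspect the replication settings of the existing pool
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # with only 2 nodes, min_size 1 lets client I/O continue while one
    # replica is down (at the cost of temporarily reduced redundancy)
    ceph osd pool set rbd min_size 1

For future pools, the equivalent default can be set in ceph.conf under [global] with "osd pool default min size = 1". None of this helps with monitor quorum, though; only adding a third monitor does.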
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com