Hi, I tried to add a new monitor, but now I am unable to use the ceph command.

After running ceph-deploy mon create myhostname I get:

# ceph status
2015-07-30 10:42:39.682038 7f7b16d90700 0 librados: client.admin authentication error (1) Operation not permitted
Error connecting to cluster: PermissionError

Could you help me fix this, and tell me how to change the keys, please?
Thanks a lot in advance, and sorry, I am a newbie on this topic.
K

> On 30 Jul 2015, at 11:39, Khalid Ahsein <kahs...@gmail.com> wrote:
>
> Good morning Christian,
>
> Thank you for your quick response.
> So I need to upgrade to 64 GB or 96 GB to be more secure?
>
> And sorry, I thought that 2 monitors was the minimum. We will work on adding a
> new host quickly.
>
> About osd_pool_default_min_size: should I change something for the future?
>
> Thank you again,
> K
>
>> On 30 Jul 2015, at 11:12, Christian Balzer <ch...@gol.com> wrote:
>>
>> Hello,
>>
>> On Thu, 30 Jul 2015 10:55:30 +0200 Khalid Ahsein wrote:
>>
>>> Hello everybody,
>>>
>>> I have been running a Ceph cluster for 4 months, configured with two monitors:
>>>
>>> 1 host: 16 GB RAM - 12x 4 TB disks - 12 OSDs - 1 monitor - RAID-1 for system
>>> 1 host: 16 GB RAM - 12x 4 TB disks - 12 OSDs - 1 monitor - RAID-1 for system
>>>
>> Too little RAM, just 2 monitors, just 2 nodes...
>>
>>> Last night I encountered an issue with the crash of the first host.
>>>
>>> My first question is: why, with 1 host down, was my whole cluster down
>>> (unable to run ceph status; the command hangs) and all my RBDs stuck
>>> with no possibility of R/W?
>>
>> Re-read the documentation: you need at least 3 monitors to survive the
>> loss of one (monitor) node.
>>
>> Your osd_pool_default_min_size would have left you in a usable situation;
>> 2 nodes is really a minimal case.
>>
>>> I rebooted the first host, and 2 hours later the second went down with
>>> the same issue (all RBDs down and ceph hanging).
>>>
>>> After the reboot, here is ceph status:
>>>
>>> # ceph status
>>>     cluster 9c29f469-7bad-4b64-97bf-3fbb1bbc0c5f
>>>      health HEALTH_ERR
>>>             3 pgs inconsistent
>>>             1 pgs peering
>>>             1 pgs stuck inactive
>>>             1 pgs stuck unclean
>>>             36 requests are blocked > 32 sec
>>>             928 scrub errors
>>>             clock skew detected on mon.drt-becks
>>>      monmap e1: 2 mons at {drt-becks=172.16.21.6:6789/0,drt-marco=172.16.21.4:6789/0}
>>>             election epoch 26, quorum 0,1 drt-marco,drt-becks
>>>      osdmap e961: 24 osds: 24 up, 24 in
>>>       pgmap v2532968: 400 pgs, 1 pools, 512 GB data, 130 kobjects
>>>             1039 GB used, 88092 GB / 89177 GB avail
>>>                  393 active+clean
>>>                    3 active+clean+scrubbing+deep
>>>                    3 active+clean+inconsistent
>>>                    1 peering
>>>   client io 57290 B/s wr, 7 op/s
>>>
>> You will want to:
>> a) fix your NTP setup (clock skew)
>> b) check your logs about the scrub errors
>> c) same for the stuck requests
>>
>>> I also found this error in dmesg about the crash:
>>>
>>> Message from syslogd@drt-marco at Jul 30 04:03:57 ...
>>> kernel:[4876519.657178] BUG: soft lockup - CPU#7 stuck for 22s! [btrfs-cleaner:32713]
>>>
>>> All my volumes are on BTRFS; maybe that was not a good idea?
>>>
>> Depending on your OS and kernel version, most definitely.
>> Plenty of BTRFS problems are to be found in the ML archives.
>>
>> Christian
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
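For the client.admin authentication error at the top of this thread, one common recovery path is to re-gather the cluster keyrings from a healthy monitor and push the admin keyring back out. A minimal sketch, assuming ceph-deploy is run from the original admin node and that drt-marco still holds a good copy of the keys:

    # fetch ceph.client.admin.keyring (and the bootstrap keyrings)
    # from a monitor that is up and in quorum
    ceph-deploy gatherkeys drt-marco

    # distribute the admin keyring to every host that should be able
    # to run ceph commands (hostnames here are the ones from the thread)
    ceph-deploy admin drt-marco drt-becks myhostname

    # the keyring must be readable by the user running the command
    chmod +r /etc/ceph/ceph.client.admin.keyring
    ceph status

If the new monitor was created with a mismatched mon. key, it may also need to be removed and re-added once quorum is healthy again.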
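On Christian's points a) and b), a sketch of the usual first steps, assuming ntpd is the time daemon on the skewed monitor host and that the OSD logs identify which object copies are damaged:

    # on mon.drt-becks: verify time synchronisation and restart it
    # (the service name may be ntp or ntpd depending on the distro)
    ntpq -p
    service ntp restart

    # list the PGs behind the "3 pgs inconsistent" / 928 scrub errors
    ceph health detail | grep inconsistent

    # after checking the OSD logs for the affected objects,
    # repair one PG at a time
    ceph pg repair <pgid>

Note that on releases of this vintage, pg repair overwrites the replicas with the primary's copy of an object, so it is worth confirming that the primary holds good data first.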
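On the osd_pool_default_min_size question: that option only sets the default for pools created afterwards; an existing pool keeps its own size/min_size values. A sketch for inspecting and adjusting them, assuming the single pool in the pgmap above is the default rbd pool:

    # inspect the replication settings of the existing pool
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # with only 2 nodes, min_size 1 lets client I/O continue while one
    # replica is down (at the cost of temporarily reduced redundancy)
    ceph osd pool set rbd min_size 1

For future pools, the equivalent default can be set in ceph.conf under [global] with "osd pool default min size = 1". None of this helps with monitor quorum, though; only adding a third monitor does.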
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com