On 30/10/2017 10:31, Alwin Antreich wrote:
Hello Marco,

On Mon, Oct 23, 2017 at 05:48:10PM +0200, Marco Baldini - H.S. Amiata wrote:
Hello

The ceph-mon services do not restart on any node. Yesterday I manually restarted
ceph-mon and ceph-mgr on every node, and since then they have not restarted.

*pve-hs-2$ systemctl status [email protected]*
  [email protected] - Ceph cluster monitor daemon
    Loaded: loaded (/lib/systemd/system/[email protected]; enabled; vendor preset: enabled)
   Drop-In: /lib/systemd/system/[email protected]
            └─ceph-after-pve-cluster.conf
    Active:*active (running) since Sun 2017-10-22 12:04:22 CEST; 1 day 5h ago*
  Main PID: 24825 (ceph-mon)
     Tasks: 23
    CGroup: /system.slice/system-ceph\x2dmon.slice/[email protected]
            └─24825 /usr/bin/ceph-mon -f --cluster ceph --id pve-hs-2 --setuser ceph --setgroup ceph

Oct 22 12:04:22 pve-hs-2 systemd[1]: Stopped Ceph cluster monitor daemon.
Oct 22 12:04:22 pve-hs-2 systemd[1]: Started Ceph cluster monitor daemon.

*pve-hs-main$ systemctl status [email protected]*
  [email protected] - Ceph cluster monitor daemon
    Loaded: loaded (/lib/systemd/system/[email protected]; enabled; vendor preset: enabled)
   Drop-In: /lib/systemd/system/[email protected]
            └─ceph-after-pve-cluster.conf
    Active:*active (running) since Sun 2017-10-22 12:08:59 CEST; 1 day 5h ago*
  Main PID: 24857 (ceph-mon)
    CGroup: /system.slice/system-ceph\x2dmon.slice/[email protected]
            └─24857 /usr/bin/ceph-mon -f --cluster ceph --id pve-hs-main --setuser ceph --setgroup ceph

Oct 22 12:08:59 pve-hs-main systemd[1]: Started Ceph cluster monitor daemon.

*pve-hs-3$ systemctl status [email protected]*
  [email protected] - Ceph cluster monitor daemon
    Loaded: loaded (/lib/systemd/system/[email protected]; enabled; vendor preset: enabled)
   Drop-In: /lib/systemd/system/[email protected]
            └─ceph-after-pve-cluster.conf
    Active:*active (running) since Sun 2017-10-22 12:07:43 CEST; 1 day 5h ago*
  Main PID: 13077 (ceph-mon)
     Tasks: 23
    CGroup: /system.slice/system-ceph\x2dmon.slice/[email protected]
            └─13077 /usr/bin/ceph-mon -f --cluster ceph --id pve-hs-3 --setuser ceph --setgroup ceph


At 17:38 I have this in the syslog/journal of pve-hs-2:

Oct 23 17:38:47 pve-hs-2 kernel: [255282.309979] libceph: mon1 10.10.10.252:6789 session lost, hunting for new mon
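
Whether a monitor restarted around the time of a lost session can be checked by comparing the "active since" time from `systemctl status` with the timestamp of the kernel message. The following is only a minimal sketch of that comparison: the function names are hypothetical, the year has to be supplied by hand because syslog timestamps do not carry one, the timezone name is ignored, and month names are assumed to be in English.

```python
from datetime import datetime


def active_since(status_line: str) -> datetime:
    """Extract the start time from a systemctl 'Active:' line.

    Expects text such as
    'active (running) since Sun 2017-10-22 12:04:22 CEST; 1 day 5h ago'.
    """
    fields = status_line.split("since", 1)[1].split(";", 1)[0].split()
    # fields is [weekday, date, time, timezone]; only date and time are used
    return datetime.strptime(f"{fields[1]} {fields[2]}", "%Y-%m-%d %H:%M:%S")


def syslog_time(line: str, year: int) -> datetime:
    """Parse the leading 'Oct 23 17:38:47' stamp of a syslog line."""
    # The stamp is the first 15 characters; the year is an explicit assumption
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def restarted_after(status_line: str, event_line: str, year: int) -> bool:
    """True if the unit became active later than the logged event."""
    return active_since(status_line) > syslog_time(event_line, year)


status = "Active: active (running) since Sun 2017-10-22 12:04:22 CEST; 1 day 5h ago"
event = ("Oct 23 17:38:47 pve-hs-2 kernel: [255282.309979] libceph: mon1 "
         "10.10.10.252:6789 session lost, hunting for new mon")
print(restarted_after(status, event, 2017))
```

With the values quoted in this message, the monitor on pve-hs-2 has been active since before the session loss, which matches the statement that the services did not restart.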

On the same node, my ceph-mon.pve-hs-2.log at 17:38 is here:
https://pastebin.com/8BCUm5Mr

Thanks




On 23/10/2017 16:26, Alwin Antreich wrote:
Do the ceph-mon services restart when the session is lost?
What do you see in the ceph-mon.log on the failing mon node?

--
Cheers,
Alwin

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

What is in the other ceph/syslog log files? Please also check your
dmesg; there may be a problem with your bond/LACP.


Actually, after some server reboots, the problem seems to have solved itself. That's strange, because there have been no changes to the server or network configurations.

Only yesterday I had this in dmesg -xe

kern  :warn  : [Oct29 06:39] libceph: mon2 10.10.10.253:6789 socket closed (con state OPEN)
kern  :info  : [  +0.000029] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon
kern  :info  : [  +0.031530] libceph: mon0 10.10.10.251:6789 session established


On the other nodes there were no warnings or errors at that time.
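
To compare nodes more systematically, the libceph kernel messages can be grouped by monitor. This is a rough sketch only: the regular expression is an assumption based on the three message shapes quoted in this thread, and `summarize` is a hypothetical helper, not part of any Ceph tooling.

```python
import re
from collections import defaultdict

# Assumed shape: "libceph: monN <ip>:<port> <message>", as in the lines above
LIBCEPH = re.compile(r"libceph: (mon\d+) (\d+\.\d+\.\d+\.\d+:\d+) (.+)")


def summarize(lines):
    """Group libceph monitor messages by (monitor name, address)."""
    events = defaultdict(list)
    for line in lines:
        match = LIBCEPH.search(line)
        if match:
            mon, addr, message = match.groups()
            events[(mon, addr)].append(message.strip())
    return dict(events)


sample = [
    "kern  :warn  : [Oct29 06:39] libceph: mon2 10.10.10.253:6789 socket closed (con state OPEN)",
    "kern  :info  : [  +0.000029] libceph: mon2 10.10.10.253:6789 session lost, hunting for new mon",
    "kern  :info  : [  +0.031530] libceph: mon0 10.10.10.251:6789 session established",
]
for key, messages in summarize(sample).items():
    print(key, messages)
```

Feeding it the saved `dmesg` output from each node (for example via `open(path).read().splitlines()`) would show which monitors lost sessions and which took over.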

I think the problem is solved. I don't know how, but Ceph is running fine now.

Thanks










