What are the specs of the machines?
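If it helps to gather that, something like the following on each node will show RAM, CPU count and what each OSD daemon is currently holding in memory (a quick sketch; replace osd.1 with an OSD id that actually runs on the host you are checking):

    free -h                                  # total / available RAM on the node
    nproc                                    # CPU core count
    sudo ceph osd df tree                    # which OSDs sit on which host, and their utilisation
    sudo ceph daemon osd.1 dump_mempools     # memory currently held by that OSD daemon, by pool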
Recovery work will use more memory than normal, clean operation, and it looks like you're maxing out the available memory on the machines while Ceph is trying to recover. (One way to take some of that pressure off is sketched after the quoted thread below.)

---- On Tue, 10 Sep 2019 18:10:50 +0800 amudha...@gmail.com wrote ----

I have also found the below errors in dmesg.

[332884.028810] systemd-journald[6240]: Failed to parse kernel command line, ignoring: Cannot allocate memory
[332885.054147] systemd-journald[6240]: Out of memory.
[332894.844765] systemd[1]: systemd-journald.service: Main process exited, code=exited, status=1/FAILURE
[332897.199736] systemd[1]: systemd-journald.service: Failed with result 'exit-code'.
[332906.503076] systemd[1]: Failed to start Journal Service.
[332937.909198] systemd[1]: ceph-crash.service: Main process exited, code=exited, status=1/FAILURE
[332939.308341] systemd[1]: ceph-crash.service: Failed with result 'exit-code'.
[332949.545907] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[332949.546631] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 7.
[332949.546781] systemd[1]: Stopped Journal Service.
[332949.566402] systemd[1]: Starting Journal Service...
[332950.190332] systemd[1]: ceph-osd@1.service: Main process exited, code=killed, status=6/ABRT
[332950.190477] systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
[332950.842297] systemd-journald[6249]: File /var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted or uncleanly shut down, renaming and replacing.
[332951.019531] systemd[1]: Started Journal Service.

On Tue, Sep 10, 2019 at 3:04 PM Amudhan P <amudha...@gmail.com> wrote:

Hi,

I am using ceph version 13.2.6 (mimic) on a test setup, trying out CephFS.

My current setup: 3 nodes; one node contains two bricks and the other two nodes contain a single brick each. The volume is 3-replica.

I am trying to simulate a node failure. I powered down one host and started getting the message "-bash: fork: Cannot allocate memory" on the other systems when running any command, and the systems stopped responding to commands. What could be the reason for this?

At this stage, I am able to read some of the data stored in the volume, while other reads are just waiting on IO.

Output from "sudo ceph -s":

  cluster:
    id:     7c138e13-7b98-4309-b591-d4091a1742b4
    health: HEALTH_WARN
            1 osds down
            2 hosts (3 osds) down
            Degraded data redundancy: 5313488/7970232 objects degraded (66.667%), 64 pgs degraded

  services:
    mon: 1 daemons, quorum mon01
    mgr: mon01(active)
    mds: cephfs-tst-1/1/1 up {0=mon01=up:active}
    osd: 4 osds: 1 up, 2 in

  data:
    pools:   2 pools, 64 pgs
    objects: 2.66 M objects, 206 GiB
    usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
    pgs:     5313488/7970232 objects degraded (66.667%)
             64 active+undersized+degraded

  io:
    client:   79 MiB/s rd, 24 op/s rd, 0 op/s wr

Output from "sudo ceph osd df":

ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 1.81940        0     0 B     0 B     0 B      0    0   0
 3   hdd 1.81940        0     0 B     0 B     0 B      0    0   0
 1   hdd 1.81940  1.00000 1.8 TiB 211 GiB 1.6 TiB  11.34 1.00   0
 2   hdd 1.81940  1.00000 1.8 TiB 210 GiB 1.6 TiB  11.28 1.00  64
                    TOTAL 3.6 TiB 421 GiB 3.2 TiB  11.31
MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03

regards
Amudhan
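A rough sketch of what I mean by taking pressure off while it recovers (example values only, not tuned recommendations; these are standard Mimic options, but the cache change may need an OSD restart to fully take effect):

    # slow recovery/backfill down so each OSD keeps less work (and memory) in flight
    sudo ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-sleep 0.1'

    # cap the BlueStore cache per OSD (example: 512 MiB; the HDD default is 1 GiB)
    sudo ceph tell osd.* injectargs '--bluestore-cache-size 536870912'

If your Mimic point release has osd_memory_target (check with "ceph daemon osd.1 config show | grep memory_target"), that is a cleaner way to cap per-OSD memory than the raw cache size.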
_______________________________________________ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io