It's a test cluster; each node has a single OSD and 4 GB RAM.
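If memory is the bottleneck, one thing worth trying (a sketch, assuming BlueStore OSDs on Mimic 13.2.x, where osd_memory_target defaults to 4 GiB, which is as much as each host here has) is to lower the per-OSD memory target so the OS and other daemons keep some headroom:

```shell
# Assumption: BlueStore OSDs on Mimic, osd_memory_target default of 4 GiB.
# Lower the per-OSD memory target to 1.5 GiB so a 4 GB host is not
# exhausted when recovery inflates OSD memory use.
ceph config set osd osd_memory_target 1610612736

# Check that the value took effect:
ceph config get osd osd_memory_target
```

The value applies cluster-wide to all OSDs via the centralized config; it can also be set per-OSD (e.g. `osd.1`) if only some hosts are short on RAM.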

On Tue, Sep 10, 2019 at 3:42 PM Ashley Merrick <singap...@amerrick.co.uk>
wrote:

> What are the specs of the machines?
>
> Recovery work will use more memory than general clean operation, and it
> looks like you're maxing out the available memory on the machines while
> Ceph is trying to recover.
>
>
>
> ---- On Tue, 10 Sep 2019 18:10:50 +0800 amudha...@gmail.com wrote ----
>
> I have also found below error in dmesg.
>
> [332884.028810] systemd-journald[6240]: Failed to parse kernel command
> line, ignoring: Cannot allocate memory
> [332885.054147] systemd-journald[6240]: Out of memory.
> [332894.844765] systemd[1]: systemd-journald.service: Main process exited,
> code=exited, status=1/FAILURE
> [332897.199736] systemd[1]: systemd-journald.service: Failed with result
> 'exit-code'.
> [332906.503076] systemd[1]: Failed to start Journal Service.
> [332937.909198] systemd[1]: ceph-crash.service: Main process exited,
> code=exited, status=1/FAILURE
> [332939.308341] systemd[1]: ceph-crash.service: Failed with result
> 'exit-code'.
> [332949.545907] systemd[1]: systemd-journald.service: Service has no
> hold-off time, scheduling restart.
> [332949.546631] systemd[1]: systemd-journald.service: Scheduled restart
> job, restart counter is at 7.
> [332949.546781] systemd[1]: Stopped Journal Service.
> [332949.566402] systemd[1]: Starting Journal Service...
> [332950.190332] systemd[1]: ceph-osd@1.service: Main process exited,
> code=killed, status=6/ABRT
> [332950.190477] systemd[1]: ceph-osd@1.service: Failed with result
> 'signal'.
> [332950.842297] systemd-journald[6249]: File
> /var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted
> or uncleanly shut down, renaming and replacing.
> [332951.019531] systemd[1]: Started Journal Service.
>
> On Tue, Sep 10, 2019 at 3:04 PM Amudhan P <amudha...@gmail.com> wrote:
>
> Hi,
>
> I am using ceph version 13.2.6 (mimic) on a test setup, trying out CephFS.
>
> My current setup:
> 3 nodes: one node contains two bricks and the other two nodes contain a
> single brick each.
>
> The volume is 3-replica; I am trying to simulate a node failure.
>
> I powered down one host and started getting the message
> "-bash: fork: Cannot allocate memory" on the other systems when running
> any command, and the systems stopped responding to commands.
>
> What could be the reason for this?
> At this stage, I am able to read some of the data stored in the volume,
> while some reads are just waiting for I/O.
>
> output from "sudo ceph -s"
>   cluster:
>     id:     7c138e13-7b98-4309-b591-d4091a1742b4
>     health: HEALTH_WARN
>             1 osds down
>             2 hosts (3 osds) down
>             Degraded data redundancy: 5313488/7970232 objects degraded
> (66.667%), 64 pgs degraded
>
>   services:
>     mon: 1 daemons, quorum mon01
>     mgr: mon01(active)
>     mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
>     osd: 4 osds: 1 up, 2 in
>
>   data:
>     pools:   2 pools, 64 pgs
>     objects: 2.66 M objects, 206 GiB
>     usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
>     pgs:     5313488/7970232 objects degraded (66.667%)
>              64 active+undersized+degraded
>
>   io:
>     client:   79 MiB/s rd, 24 op/s rd, 0 op/s wr
>
> output from "sudo ceph osd df"
> ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
>  0   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
>  3   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
>  1   hdd 1.81940  1.00000 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
>  2   hdd 1.81940  1.00000 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
>                     TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
> MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03
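As a side note, the degraded percentage in the status output is consistent with two of three replica sets being down. A quick sanity check (a sketch; the object count of 2,656,744 is inferred from the "2.66 M objects" line and the replica count of 3 from the pool description):

```python
# Sanity-check the degraded fraction reported by `ceph -s`.
# With a 3-replica pool and 2 of 3 OSD hosts down, 2/3 of all
# object copies are expected to be degraded.
objects = 2_656_744          # inferred from "2.66 M objects" in `ceph -s`
replicas = 3                 # 3-replica pool per the original message
total_copies = objects * replicas
degraded = 5_313_488         # degraded count from the status output

assert total_copies == 7_970_232          # matches 7970232 in the output
assert degraded / total_copies == 2 / 3   # matches the 66.667% shown
```

So the 66.667% figure is exactly what a healthy 3-replica pool with one surviving copy would report; the cluster itself is behaving as expected, and the problem is the memory exhaustion on the remaining hosts.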
>
> regards
> Amudhan
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>