It's a test cluster; each node has a single OSD and 4 GB of RAM.
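For reference, Mimic's default osd_memory_target is 4 GiB per OSD, which already equals these nodes' total RAM before recovery even starts. A minimal ceph.conf sketch for capping it on small nodes; the 1.5 GiB figure is an assumption for a 4 GB host, not a value tested here:

    # /etc/ceph/ceph.conf -- cap BlueStore's per-OSD memory target
    # (1.5 GiB is an assumed value for a 4 GB node; tune to taste)
    [osd]
    osd_memory_target = 1610612736

The same cap should be settable at runtime with "ceph config set osd osd_memory_target 1610612736", though older 13.2.x point releases may need an OSD restart to pick it up.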
On Tue, Sep 10, 2019 at 3:42 PM Ashley Merrick <singap...@amerrick.co.uk> wrote:

> What are the specs of the machines?
>
> Recovery work will use more memory than normal clean operation, and it
> looks like you're maxing out the available memory on the machines while
> Ceph is trying to recover.
>
> ---- On Tue, 10 Sep 2019 18:10:50 +0800 amudha...@gmail.com <amudha...@gmail.com> wrote ----
>
> I have also found the errors below in dmesg.
>
> [332884.028810] systemd-journald[6240]: Failed to parse kernel command line, ignoring: Cannot allocate memory
> [332885.054147] systemd-journald[6240]: Out of memory.
> [332894.844765] systemd[1]: systemd-journald.service: Main process exited, code=exited, status=1/FAILURE
> [332897.199736] systemd[1]: systemd-journald.service: Failed with result 'exit-code'.
> [332906.503076] systemd[1]: Failed to start Journal Service.
> [332937.909198] systemd[1]: ceph-crash.service: Main process exited, code=exited, status=1/FAILURE
> [332939.308341] systemd[1]: ceph-crash.service: Failed with result 'exit-code'.
> [332949.545907] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
> [332949.546631] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 7.
> [332949.546781] systemd[1]: Stopped Journal Service.
> [332949.566402] systemd[1]: Starting Journal Service...
> [332950.190332] systemd[1]: ceph-osd@1.service: Main process exited, code=killed, status=6/ABRT
> [332950.190477] systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
> [332950.842297] systemd-journald[6249]: File /var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted or uncleanly shut down, renaming and replacing.
> [332951.019531] systemd[1]: Started Journal Service.
>
> On Tue, Sep 10, 2019 at 3:04 PM Amudhan P <amudha...@gmail.com> wrote:
>
> Hi,
>
> I am using ceph version 13.2.6 (mimic) on a test setup, trying out CephFS.
>
> My current setup: 3 nodes; one node contains two OSDs and the other two
> nodes contain a single OSD each.
>
> The volume is 3-replica. I am trying to simulate a node failure.
>
> I powered down one host and started getting the message "-bash: fork:
> Cannot allocate memory" on the other systems when running any command,
> and the systems stopped responding.
>
> What could be the reason for this? At this stage I can read some of the
> data stored in the volume, while other reads just wait on I/O.
>
> Output from "sudo ceph -s":
>
>   cluster:
>     id:     7c138e13-7b98-4309-b591-d4091a1742b4
>     health: HEALTH_WARN
>             1 osds down
>             2 hosts (3 osds) down
>             Degraded data redundancy: 5313488/7970232 objects degraded (66.667%), 64 pgs degraded
>
>   services:
>     mon: 1 daemons, quorum mon01
>     mgr: mon01(active)
>     mds: cephfs-tst-1/1/1 up {0=mon01=up:active}
>     osd: 4 osds: 1 up, 2 in
>
>   data:
>     pools:   2 pools, 64 pgs
>     objects: 2.66 M objects, 206 GiB
>     usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
>     pgs:     5313488/7970232 objects degraded (66.667%)
>              64 active+undersized+degraded
>
>   io:
>     client: 79 MiB/s rd, 24 op/s rd, 0 op/s wr
>
> Output from "sudo ceph osd df":
>
>   ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
>    0   hdd 1.81940        0     0 B     0 B     0 B      0    0   0
>    3   hdd 1.81940        0     0 B     0 B     0 B      0    0   0
>    1   hdd 1.81940  1.00000 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
>    2   hdd 1.81940  1.00000 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
>                     TOTAL   3.6 TiB 421 GiB 3.2 TiB 11.31
>   MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03
>
> regards
> Amudhan
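As an aside on the failure test quoted above: for a planned power-down, flagging the cluster first avoids kicking off the recovery/backfill that drove memory usage here. A short sketch, purely illustrative:

    ceph osd set noout      # down OSDs stay "in"; no backfill starts
    # ... power the node off, run the test, power it back on ...
    ceph osd unset noout    # return to normal recovery behaviour

With noout set, PGs go degraded but the surviving OSDs do not start backfilling, so memory stays close to steady-state for the duration of the test.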
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io