Public bug reported:

How to reproduce:
On the initial installation, the Z cluster had 1 monitor node, 3 OSDs, 1 MDS and 1 MGR. In order to form a quorum, 2 more nodes, which were already OSD nodes, were added as monitor nodes. The Z cluster then had 3 monitor nodes, 2 of which were both OSDs and monitors. However, at some point during the stress-ng run, the monitor daemon on the cluster crashed repeatedly, back to back. The crashes stopped only after both monitor nodes that were also OSDs were removed from the quorum; after that, the cluster remained stable.

Topology:

    root@m8330013:~# ceph node ls all
    {
      "mon": {
        "m8330013": [ "m8330013" ],
        "m8330014": [ "m8330014" ],
        "m8330015": [ "m8330015" ]
      },
      "osd": {
        "m8330014": [ 0 ],
        "m8330015": [ 1 ],
        "m8330016": [ 2 ]
      },
      "mds": {
        "m8330013": [ "m8330013" ]
      },
      "mgr": {
        "m8330013": [ "m8330013" ],
        "m8330015": [ "m8330015" ]
      }
    }
    root@m8330013:~#

The job file below runs each filesystem stressor sequentially, one instance per CPU, for 5 minutes, and then shows the cumulative user and system time of all the processes at the end of the stress run.
Stress-ng job file:

    run sequential
    metrics
    verbose
    timeout 5m
    times
    timestamp
    #0 means 1 stressor per CPU
    access 0
    bind-mount 0
    chdir 0
    chmod 0
    chown 0
    copy-file 0
    dentry 0
    dir 0
    dirdeep 0
    dnotify 0
    dup 0
    eventfd 0
    fallocate 0
    fanotify 0
    fcntl 0
    fiemap 0
    file-ioctl 0
    filename 0
    flock 0
    fstat 0
    getdent 0
    handle 0
    inode-flags 0
    inotify 0
    io 0
    iomix 0
    ioprio 0
    lease 0
    link 0
    locka 0
    lockf 0
    lockofd 0
    mknod 0
    open 0
    procfs 0
    rename 0
    symlink 0
    sync-file 0
    utime 0
    xattr 0

Command for execution:

    stress-ng --job <job_file> --temp-path <cephfs_mountpoint> --log-file <log_file>

A proposed fix was sent upstream: https://github.com/ceph/ceph/pull/36697

As mentioned above, the fix for this issue landed upstream in PR https://github.com/ceph/ceph/pull/36697, which was backported to the Octopus (15.2.x) release in PR https://github.com/ceph/ceph/pull/36813. This backported patch appears to apply cleanly to ceph 15.2.3 in the focal-updates git tree at https://git.launchpad.net/ubuntu/+source/ceph/log/?h=applied/ubuntu/focal-updates. Please apply the backported patch to this tree. Thanks.

Please be aware that upstream's backport patch https://github.com/ceph/ceph/pull/36813 merged 2 patches from the master branch together, https://github.com/ceph/ceph/pull/35920 and https://github.com/ceph/ceph/pull/36697, and we need both of them.

** Affects: ceph (Ubuntu)
     Importance: Undecided
       Assignee: Skipper Bug Screeners (skipper-screen-team)
         Status: New

** Tags: architecture-s39064 bugnameltc-188070 severity-high targetmilestone-inin2004

** Tags added: architecture-s39064 bugnameltc-188070 severity-high targetmilestone-inin2004

** Changed in: ubuntu
     Assignee: (unassigned) => Skipper Bug Screeners (skipper-screen-team)

** Package changed: ubuntu => ceph (Ubuntu)
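The reproduction turned on the hosts that ran both a monitor and an OSD. As a minimal sketch, those hosts can be picked out of the `ceph node ls all` output in the report with Python's standard `json` module. The function name `colocated_mon_osd_hosts` is hypothetical, and the code assumes the JSON keeps the `{daemon type: {host: [daemons]}}` shape shown in the topology above.

```python
import json
from typing import List


def colocated_mon_osd_hosts(node_ls_json: str) -> List[str]:
    """Return the hosts that run a monitor and at least one OSD.

    Assumes the JSON shape printed by `ceph node ls all`: each daemon type
    ("mon", "osd", "mds", "mgr") maps to an object of {host: [daemons]}.
    """
    nodes = json.loads(node_ls_json)
    # Every host listed under "mon" runs a monitor.
    mon_hosts = set(nodes.get("mon", {}))
    # Hosts listed under "osd" with a non-empty list of OSD ids.
    osd_hosts = {host for host, osds in nodes.get("osd", {}).items() if osds}
    return sorted(mon_hosts & osd_hosts)


# Topology from this bug report (`ceph node ls all` on m8330013).
TOPOLOGY = """
{ "mon": { "m8330013": [ "m8330013" ], "m8330014": [ "m8330014" ],
           "m8330015": [ "m8330015" ] },
  "osd": { "m8330014": [ 0 ], "m8330015": [ 1 ], "m8330016": [ 2 ] },
  "mds": { "m8330013": [ "m8330013" ] },
  "mgr": { "m8330013": [ "m8330013" ], "m8330015": [ "m8330015" ] } }
"""

print(colocated_mon_osd_hosts(TOPOLOGY))  # ['m8330014', 'm8330015']
```

Run against the reported topology, it prints the two monitor nodes that were also OSDs, i.e. the ones removed from the quorum before the cluster became stable.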
https://bugs.launchpad.net/bugs/1900690

Title:
  [Ubuntu 20.04] ceph: messages,mds: Fix decoding of enum types on big-endian systems
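The title concerns enum types being decoded incorrectly on big-endian systems, which includes IBM Z (s390x). The sketch below is only a generic illustration of one byte-order trap behind this class of bug; it is not taken from the Ceph pull requests, whose actual fix may address a different variant. Reading a multi-byte integer through a narrower view (as a pointer cast to a smaller enum or integer type does in C/C++) yields the low-order byte on a little-endian host but the high-order byte on a big-endian one. The helper names `narrowed_read` and `converted_by_value` are hypothetical.

```python
import struct


def narrowed_read(value: int, host_order: str) -> int:
    """Store `value` as a 32-bit unsigned int in the given host byte order
    ("<" little-endian, ">" big-endian), then read back only the first byte.

    This mimics reading a 32-bit object through a 1-byte view in C/C++,
    e.g. *(uint8_t *)&u32, whose result depends on the host's byte order.
    """
    storage = struct.pack(host_order + "I", value)
    return storage[0]


def converted_by_value(value: int) -> int:
    """Narrow by value instead of through a memory view; byte-order independent."""
    return value & 0xFF


for order, name in (("<", "little-endian"), (">", "big-endian")):
    print(name, narrowed_read(2, order), converted_by_value(2))
# little-endian 2 2
# big-endian 0 2
```

The memory-view read only happens to work on little-endian hosts, which is why this kind of defect can go unnoticed until the code runs on a big-endian machine; converting by value gives the same result everywhere.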