Hello netbsd,

Could you work out a way to trigger this crash in a normal ocfs2 cluster, e.g. reproduction steps or a shell script?

Thanks
Gang

>>>
> Hello,
>
> Find the full log below:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.ubuntu.com_25625787_&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=5ZRqjhlhVphYeGDUyONVUUBrtPi8rLz88ZN7_wbNlNQ&s=CGsTC_h47c4MXFb4l_7fmVPQ9Ru96AAupsNcqdb76Lk&e=
>
> The VM was restarted at 9:27 and there has been no problem since then. We are
> rsyncing about 2TB of data (a lot of small files) between 2 OCFS shares on
> the same VM:
>
> /dev/vdc 4.8T 2.8T 2.1T 58% /mnt/s1
> /dev/vdf 4.8T 985G 3.9T 21% /mnt/s2
>
> rsync -av --numeric-ids --delete /mnt/s1/ /mnt/s2/
>
> On 2017-09-27 10:53, Gang He wrote:
>> Hello netbsd,
>>
>> The ocfs2 project is still being developed by us (SUSE, Huawei,
>> Oracle, H3C, etc.).
>> If you encounter a problem, please send mail to the ocfs2-devel
>> mailing list; we usually watch that list for ocfs2 kernel-related issues.
>>
>> >>>>>
>>> Hello All,
>>>
>>> I wrote earlier about our OCFS2 crash issue in KVM due to a bug in the
>>> SMP code.
>>>
>>> For this we came up with a solution:
>>>
>>> Instead of using multiple vcpus
>>> <vcpu placement='static'>8</vcpu>
>>>
>>> using a single one and multiple cores instead:
>>> <topology sockets='8' cores='8' threads='1'/>
>>>
>>> And applying key tuning options in sysctl.conf:
>>>
>>> vm.min_free_kbytes=131072
>>> vm.zone_reclaim_mode=1
>>>
>>> This seemed to help: the fs did not crash right away when we were
>>> hammering it with Apache benchmarks of 10000 requests. However, last
>>> night I started a large rsync operation from a 5TB OCFS2 FS mounted in
>>> the VM to another OCFS2 FS mounted in the same VM and ended up with:
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_gFeGg5&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=cYprGRHz-oQmhnx4HIke8sTdCG_tf8Jb-rF6sHnYLnk&s=ajWfQIlUZOpElFWxoKcmvTIk7J3PpuCJITcnXfJQHrc&e=
>> From the kernel crash backtrace, the problem appears to be that taking
>> too long to acquire a spin_lock triggers an NMI.
>> Could you give detailed reproduction steps? We want to reproduce
>> this issue locally and then try to fix it.
>>
>> Thanks
>> Gang
>>
>>>
>>> After trying a lot of different kernels starting from the 3.x series,
>>> we are now using the latest 4.13.2 kernel with the default configuration,
>>> but these issues are still present. Is this OCFS2 project still being
>>> developed?
>>> With this crashing and unreliability it cannot be used in production
>>> unless you put in place a bunch of safeguards to reset the whole
>>> virtual machine when it crashes.
>>>
>>> Thanks
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users@oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users