I tried Ubuntu kernel "4.18.0-17-generic #18~18.04.1-Ubuntu". Crashed the same way on high load as the 4.15.0-47 does.
Now testing 4.15.0-48 from Kai-Heng. Still haven't found the trigger for that bug. Seems to be load related - we're having five servers each running many threads reading/writing gigabytes of data to the share. There might be even 100+ processes trying to set a lock one the same file at the same time. Seems to get better if we reduce the number of parallel threads. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1824981 Title: cifs set_oplock buffer overflow in strcat Status in linux package in Ubuntu: Confirmed Bug description: Ubuntu 18.04.2 LTS Linux SRV013 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux DELL R740, 2 CPU (40 Cores, 80 Threads), 384 GiB RAM top - 12:39:53 up 3:41, 4 users, load average: 66.19, 64.06, 76.90 Tasks: 1076 total, 1 running, 675 sleeping, 12 stopped, 1 zombie %Cpu(s): 28.2 us, 0.3 sy, 0.0 ni, 71.5 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st KiB Mem : 39483801+total, 24077185+free, 57428284 used, 96637872 buff/cache KiB Swap: 999420 total, 999420 free, 0 used. 33477683+avail Mem We've seen the following bug many times since we introduced new machines running Ubuntu 18. Wasn't an issue older machines running Ubuntu 16. Three different machines are affected, so it's rather not a hardware issue. | detected buffer overflow in strcat | ------------[ cut here ]------------ | kernel BUG at /build/linux-6ZmFRN/linux-4.15.0/lib/string.c:1052! | invalid opcode: 0000 [#1] SMP PTI | Modules linked in: [...] | Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 1.6.11 11/20/2018 | RIP: 0010:fortify_panic+0x13/0x22 | [...] | Call Trace: | smb21_set_oplock_level+0x147/0x1a0 [cifs] | smb3_set_oplock_level+0x22/0x90 [cifs] | smb2_set_fid+0x76/0xb0 [cifs] | cifs_new_fileinfo+0x259/0x390 [cifs] | ? smb2_get_lease_key+0x40/0x40 [cifs] | ? cifs_new_fileinfo+0x259/0x390 [cifs] | cifs_open+0x3db/0x8d0 [cifs] | [...] (Full dmesg output attached) After hitting this bug there are many cifs related dmesg entries, processes lock up and eventually the systems freezes. The share is mounted using: //server/share /mnt/server/ cifs defaults,auto,iocharset=utf8,noperm,file_mode=0777,dir_mode=0777,credentials=/root/passwords/share,domain=myDomain,uid=myUser,gid=10513,mfsymlinks Currently we're testing the cifs mount options "cache=none" as the bug seems to be oplock related. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824981/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp