** Tags removed: targetmilestone-inin1610
** Tags added: targetmilestone-inin1704

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1641235

Title:
  Ubuntu 16.10: kdump over nfs did not generate complete vmcore

Status in makedumpfile package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Trusty:
  Confirmed
Status in makedumpfile source package in Xenial:
  Confirmed
Status in makedumpfile source package in Yakkety:
  Confirmed

Bug description:
  == Comment: #0 - HARSHA THYAGARAJA - 2016-11-03 08:05:59 ==
  ---Problem Description---
  kdump over nfs did not generate complete vmcore

  ---uname output---
  Linux ltciofvtr-firestone1 4.8.0-26-generic #28-Ubuntu SMP Tue Oct 18 14:41:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = PowerNV (Baremetal) - Firestone

  ---Steps to Reproduce---
  1. Setup NFS
  2. Trigger crash: echo c > /proc/sysrq-trigger

  == Comment: #6 - Kevin W. Rudd - 2016-11-04 16:30:49 ==
  Hi Harsha.

  It looks like the base kdump NFS functionality works just fine. The known issue with makedumpfile is causing it to drop back to using "cp" to transfer the entire, non-compressed /proc/vmcore image. That's a rather large amount of data to send over to the remote server, and it appears to be sending back an I/O error after the first 122G.

  Further debug would need to be done to determine if this is a client-side or server-side issue. I recommend first bringing your remote NFS server up to the current release as it is currently a bit down-rev.

  == Comment: #8 - HARSHA THYAGARAJA - 2016-11-10 02:02:31 ==
  Hi Kevin,
  I updated my peer to Ubuntu 16.10 and still saw the same observation. A snippet of the problem at hand is pasted below.

  [ 20.610748] kdump-tools[4559]: Starting kdump-tools: * Mounting NFS mountpoint 150.1.1.20:/home/tools ...
  [ 53.400516] kdump-tools[4559]: * Dumping to NFS mountpoint 150.1.1.20:/home/tools/201611100158
  [ 53.409242] kdump-tools[4559]: * running makedumpfile -c -d 31 /proc/vmcore /mnt/var/crash/9.47.84.18-201611100158/dump-incomplete
  [ 53.526593] kdump-tools[4559]: get_mem_map: Can't distinguish the memory type.
  [ 53.527154] kdump-tools[4559]: The kernel version is not supported.
  [ 53.527488] kdump-tools[4559]: The makedumpfile operation may be incomplete.
  [ 53.527813] kdump-tools[4559]: makedumpfile Failed.
  [ 53.528117] kdump-tools[4559]: * kdump-tools: makedumpfile failed, falling back to 'cp'
  [ 90.754092] kdump-tools[4559]: cp: error writing '/mnt/var/crash/9.47.84.18-201611100158/vmcore-incomplete': Input/output error
  [ 90.754857] kdump-tools[4559]: * kdump-tools: failed to save vmcore in /mnt/var/crash/9.47.84.18-201611100158
  [ 90.756155] kdump-tools[4559]: * running makedumpfile --dump-dmesg /proc/vmcore /mnt/var/crash/9.47.84.18-201611100158/dmesg.201611100158
  [ 90.758731] kdump-tools[4559]: get_mem_map: Can't distinguish the memory type.
  [ 90.759089] kdump-tools[4559]: The kernel version is not supported.
  [ 90.759436] kdump-tools[4559]: The makedumpfile operation may be incomplete.
  [ 90.759780] kdump-tools[4559]: makedumpfile Failed.
  [ 90.760094] kdump-tools[4559]: * kdump-tools: makedumpfile --dump-dmesg failed. dmesg content will be unavailable
  [ 90.760668] kdump-tools[4559]: * kdump-tools: failed to save dmesg content in /mnt/var/crash/9.47.84.18-201611100158
  [ 90.846117] kdump-tools[4559]: Thu, 10 Nov 2016 01:59:56 -0500
  [ 90.886629] kdump-tools[4559]: Failed to read reboot parameter file: No such file or directory
  [ 90.887070] kdump-tools[4559]: Rebooting.

  == Comment: #13 - Kevin W. Rudd - 2016-11-11 17:12:33 ==
  I was able to replicate this with debugging at both the kdump client and remote NFS server.
  The server was perfectly happy with the data coming at it, and appeared to be processing a COMMIT request from the client when the client shut down the connection.

  Looking at the client-side logs after a failure showed that it was logging "server ... not responding" messages, and bailed on the connection within the span of just a few seconds.

  This appears to be due to a very over-aggressive timeout being specified in /usr/sbin/kdump-config:

    mount -t nfs -o nolock -o tcp -o soft -o timeo=5 -o retrans=5 $NFS $KDUMP_COREDIR

  The timeo value is deciseconds, and "5" is far too aggressive for this type of connection. From my observations, the COMMIT was not issued until about 60G was transferred, and most remote servers will take a lot longer than 5 tenths of a second to flush that amount of data and respond to the COMMIT.

  I'm not sure what problem specifying this timeo value was supposed to address, but it would be better to leave the timeo value at its default for a tcp connection (let the TCP protocol handle any communication timeouts on its own).

  When I modified kdump-config to use the default timeo of 600, the kdump process transferred the entire vmcore without error.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1641235/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
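
Editorial aside on the timeo arithmetic in comment #13: the sketch below converts a timeo value from deciseconds to seconds and makes a rough estimate of how long a soft mount might wait before giving up. It is not taken from kdump-config or the kernel. The function names are hypothetical, and the backoff model (each retransmission lengthens the wait by another timeo, capped at 600 seconds) is an assumption based on how the nfs(5) manual page describes NFS over TCP; the actual client may account for the total differently.

```python
# Rough model only. Assumptions, not kernel code:
#  - timeo is expressed in tenths of a second (as stated in comment #13)
#  - linear backoff: each retransmission waits one more `timeo` than the last
#  - each wait is capped at 600 seconds (6000 deciseconds)
#  - a soft mount gives up after `retrans` retransmissions


def timeo_seconds(timeo_ds):
    """Convert an NFS timeo value (deciseconds) to seconds."""
    return timeo_ds / 10.0


def soft_mount_budget(timeo_ds, retrans, cap_ds=6000):
    """Estimate total seconds waited over the first try plus all retries."""
    waits = [min(timeo_ds * (attempt + 1), cap_ds)
             for attempt in range(retrans + 1)]
    return sum(waits) / 10.0


if __name__ == "__main__":
    # Compare the value from kdump-config (5) with the TCP default (600).
    for timeo in (5, 600):
        print(f"timeo={timeo}: first wait {timeo_seconds(timeo):.1f}s, "
              f"estimated budget with retrans=5: "
              f"{soft_mount_budget(timeo, 5):.1f}s")
```

Under this model the timeo=5 setting leaves only seconds for the server to answer a COMMIT, which is consistent with the client bailing "within the span of just a few seconds", while the default leaves minutes.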