[Kernel-packages] [Bug 1641235] Re: Ubuntu 16.10: kdump over nfs did not generate complete vmcore

bugproxy Tue, 24 Jan 2017 00:36:13 -0800

** Tags removed: targetmilestone-inin1610
** Tags added: targetmilestone-inin1704


-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1641235

Title:
  Ubuntu 16.10: kdump over nfs did not generate complete vmcore

Status in makedumpfile package in Ubuntu:
  Fix Released
Status in makedumpfile source package in Trusty:
  Confirmed
Status in makedumpfile source package in Xenial:
  Confirmed
Status in makedumpfile source package in Yakkety:
  Confirmed

Bug description:
  == Comment: #0 - HARSHA THYAGARAJA - 2016-11-03 08:05:59 ==
  ---Problem Description---
  kdump over nfs did not generate complete vmcore
   
  ---uname output---
  Linux ltciofvtr-firestone1 4.8.0-26-generic #28-Ubuntu SMP Tue Oct 18 14:41:40 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
   
  Machine Type = PowerNV (Baremetal) - Firestone 
   
  ---Steps to Reproduce---
  1. Set up NFS
  2. Trigger crash: echo c > /proc/sysrq-trigger

  
  == Comment: #6 - Kevin W. Rudd - 2016-11-04 16:30:49 ==

  Hi Harsha.

  It looks like the base kdump NFS functionality works just fine.  The
  known issue with makedumpfile is causing it to drop back to using "cp"
  to transfer the entire, non-compressed /proc/vmcore image.  That's a
  rather large amount of data to send over to the remote server, and it
  appears to be sending back an I/O error after the first 122G.

  Further debug would need to be done to determine if this is a client-
  side or server-side issue.  I recommend first bringing your remote NFS
  server up to the current release as it is currently a bit down-rev.

  == Comment: #8 - HARSHA THYAGARAJA  - 2016-11-10 02:02:31 ==

  Hi Kevin,
  I updated my peer to Ubuntu 16.10 and still saw the same behavior.
  A snippet of the console log is pasted below.

  [   20.610748] kdump-tools[4559]: Starting kdump-tools:  * Mounting NFS mountpoint 150.1.1.20:/home/tools ...
  [   53.400516] kdump-tools[4559]:  * Dumping to NFS mountpoint 150.1.1.20:/home/tools/201611100158
  [   53.409242] kdump-tools[4559]:  * running makedumpfile -c -d 31 /proc/vmcore /mnt/var/crash/9.47.84.18-201611100158/dump-incomplete
  [   53.526593] kdump-tools[4559]: get_mem_map: Can't distinguish the memory type.
  [   53.527154] kdump-tools[4559]: The kernel version is not supported.
  [   53.527488] kdump-tools[4559]: The makedumpfile operation may be incomplete.
  [   53.527813] kdump-tools[4559]: makedumpfile Failed.
  [   53.528117] kdump-tools[4559]:  * kdump-tools: makedumpfile failed, falling back to 'cp'
  [   90.754092] kdump-tools[4559]: cp: error writing '/mnt/var/crash/9.47.84.18-201611100158/vmcore-incomplete': Input/output error
  [   90.754857] kdump-tools[4559]:  * kdump-tools: failed to save vmcore in /mnt/var/crash/9.47.84.18-201611100158
  [   90.756155] kdump-tools[4559]:  * running makedumpfile --dump-dmesg /proc/vmcore /mnt/var/crash/9.47.84.18-201611100158/dmesg.201611100158
  [   90.758731] kdump-tools[4559]: get_mem_map: Can't distinguish the memory type.
  [   90.759089] kdump-tools[4559]: The kernel version is not supported.
  [   90.759436] kdump-tools[4559]: The makedumpfile operation may be incomplete.
  [   90.759780] kdump-tools[4559]: makedumpfile Failed.
  [   90.760094] kdump-tools[4559]:  * kdump-tools: makedumpfile --dump-dmesg failed. dmesg content will be unavailable
  [   90.760668] kdump-tools[4559]:  * kdump-tools: failed to save dmesg content in /mnt/var/crash/9.47.84.18-201611100158
  [   90.846117] kdump-tools[4559]: Thu, 10 Nov 2016 01:59:56 -0500
  [   90.886629] kdump-tools[4559]: Failed to read reboot parameter file: No such file or directory
  [   90.887070] kdump-tools[4559]: Rebooting.

  == Comment: #13 - Kevin W. Rudd  - 2016-11-11 17:12:33 ==

  I was able to replicate this with debugging at both the kdump client
  and remote NFS server.  The server was perfectly happy with the data
  coming at it, and appeared to be processing a COMMIT request from the
  client when the client shut down the connection.

  Looking at the client-side logs after a failure showed that it was
  logging "server ... not responding" messages, and bailed on the
  connection within the span of just a few seconds.

  This appears to be due to an overly aggressive timeout being
  specified in /usr/sbin/kdump-config:

  mount -t nfs -o nolock -o tcp -o soft -o timeo=5 -o retrans=5 $NFS $KDUMP_COREDIR

  The timeo value is in deciseconds, and "5" is far too aggressive for this
  type of connection.  From my observations, the COMMIT was not issued
  until about 60G was transferred, and most remote servers will take a
  lot longer than 5 tenths of a second to flush that amount of data and
  respond to the COMMIT.
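
  To make the units concrete, here is a minimal sketch that converts a
  timeo value from deciseconds to seconds.  The helper name
  timeo_to_seconds is hypothetical and is not part of kdump-config; it
  only illustrates the arithmetic for the two values discussed in this
  report (5 and the TCP default of 600).

```shell
# Hypothetical helper (not part of kdump-config): convert an NFS "timeo"
# value, given in deciseconds (tenths of a second), to seconds.
timeo_to_seconds() {
    # Integer part is value / 10, fractional digit is value % 10.
    printf '%d.%d\n' "$(( $1 / 10 ))" "$(( $1 % 10 ))"
}

timeo_to_seconds 5     # value passed by kdump-config: prints 0.5
timeo_to_seconds 600   # default for NFS over TCP:    prints 60.0
```

  In other words, the configured value gives the server half a second
  to answer, compared with a full minute under the default.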

  I'm not sure what problem specifying this timeo value was supposed to
  address, but it would be better to leave the timeo value at its
  default for a tcp connection (let the TCP protocol handle any
  communication timeouts on its own).  When I modified kdump-config to
  use the default timeo of 600, the kdump process transferred the entire
  vmcore without error.
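
  One way such a change could look is sketched below.  This is an
  assumption about the shape of the edit, not the actual patch: it
  simply drops the explicit "-o timeo=5" so that the default applies,
  and keeps the other options from the original line.  The function
  build_mount_cmd and the example arguments are hypothetical, and the
  command is only printed, not executed.

```shell
# Sketch only: print the NFS mount command with the explicit timeo option
# removed, so the client falls back to its default timeo for TCP.
# build_mount_cmd is a hypothetical helper, not a kdump-config function.
build_mount_cmd() {
    # $1 = NFS export (server:/path), $2 = local mount point
    echo "mount -t nfs -o nolock -o tcp -o soft -o retrans=5 $1 $2"
}

# Placeholder values for illustration; kdump-config supplies the real
# $NFS and $KDUMP_COREDIR.
build_mount_cmd "server:/export" "/mnt/crash"
```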

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1641235/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
