Hello Gary,

What does "top" return? A high load average? A high %sys? ...
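If top is awkward to capture, the same numbers can be read straight from /proc. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names (parse_loadavg, cpu_shares, sample) are made up for illustration and are not part of any tool. It parses the load averages and splits CPU time into user, system, idle and iowait between two samples.

```python
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def cpu_shares(before, after):
    """Percent of CPU time per state between two aggregate 'cpu' lines.

    Both arguments are the first line of /proc/stat, taken at two moments.
    Only the first five counters are considered (irq, softirq and steal are
    ignored), so the shares are relative to those five states.
    """
    names = ("user", "nice", "system", "idle", "iowait")
    old = [int(x) for x in before.split()[1:6]]
    new = [int(x) for x in after.split()[1:6]]
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas) or 1  # avoid dividing by zero on identical samples
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}


def sample(interval=1.0):
    """Read /proc once for load and twice for CPU shares (Linux only)."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/stat") as f:
        first = f.readline()
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = f.readline()
    return load, cpu_shares(first, second)
```

Calling sample() while the machine is sluggish gives a quick split: a large iowait share with little user or system time would point at the disk rather than at a runaway process.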
2014-09-29 4:32 GMT+02:00 Gary Roach <gary719_li...@verizon.net>:

> On 09/21/2014 11:54 AM, Gary Roach wrote:
>
> Hi all
>
> For the last few months I have been plagued by very slow response from my
> system. As an example, it takes 2 1/2 minutes to drag and drop 80 files
> from my email inbox to the trash bin in icedove. It has taken as long as 5
> minutes for iceweasel to load. This problem is not just these packages but
> also applies to all of the rest of my programs.
>
> I am using Debian Wheezy with an i5750 4 core processor on a fast Intel
> board. I run a kde desktop. All the software is up to date.
>
> I have checked all of the log files and can't find any anomalies.
> Rebooting doesn't help.
>
> Using the KDE System Monitor (ksysguard) I have noticed that at least one
> of the processors goes to 100% and stays there for long periods even though
> there is no noticeable activity in the process tables. The only other thing
> I have noticed (the printing just hung up while I am writing this) is that
> the hard drive indicator comes on and stays on during the processor
> activity. I ran some checks on the hard drive but found no indication of
> any hard drive problems.
>
> Has anyone else had a similar problem or have any idea what is going on?
>
> Thanks in advance
>
> Gary R.
>
>
> After several days of investigation, I am still not sure what is causing
> the problem. I have noted the following:
>
> The journaling program jbd2/sda1-8 is taking up most of the I/O time. I do
> have noatime set in fstab.
>
> Having two identical machines is very helpful. My wife's system is fast and
> never bogs down. Mine is a real dog at this time. I ran iotop on both
> systems. On my wife's, kjournal will pop to the top of the list for a very
> short period when loading a new application. Otherwise iotop is quiet. On
> mine kjournal doesn't show at all but jbd2/sda1-8 shows up all of the time.
> There is constant activity on my system even when nothing is happening.
>
> Below is the output from gsmartcontrol. Note that vendor specific
> attributes 197 and 198 raw values are 757 and 232 respectively and are
> highlighted in pink in the gsmartcontrol program. My wife's computer's raw
> values are included in the last column of the smart attributes table for
> comparison. Also note that there were no errors reported for my wife's
> system. Do I need to replace the drive or is this fixable? Is this even the
> root cause of my problems? I'm not too sharp on reading this data and will
> appreciate comments from a more knowledgeable person.
>
>
> smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar Green
> Device Model:     WDC WD5000AADS-00M2B0
> Serial Number:    WD-WCAV59765616
> LU WWN Device Id: 5 0014ee 25992fc38
> Firmware Version: 01.00A01
> User Capacity:    500,107,862,016 bytes [500 GB]
> Sector Size:      512 bytes logical/physical
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Sun Sep 28 18:12:56 2014 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x84) Offline data collection activity
>                                         was suspended by an interrupting command from host.
>                                         Auto Offline Data Collection: Enabled.
> Self-test execution status:      ( 121) The previous self-test completed having
>                                         the read element of the test failed.
> Total time to complete Offline
> data collection:                 (9960) seconds.
> Offline data collection
> capabilities:                    (0x7b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        ( 118) minutes.
> Conveyance self-test routine
> recommended polling time:        (   5) minutes.
> SCT capabilities:              (0x3037) SCT Status supported.
>                                         SCT Feature Control supported.
>                                         SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE  (Wife Raw)
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       84123      0
>   3 Spin_Up_Time            0x0027   114   108   021    Pre-fail  Always       -       7283       3575
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       269        64
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0          0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0          0
>   9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       35966      31847
>  10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0          0
>  11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0          0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       267        62
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       214        48
> 193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       4000934    2618304
> 194 Temperature_Celsius     0x0022   110   106   000    Old_age   Always       -       37         33
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0          0
> 197 Current_Pending_Sector  0x0032   191   187   000    Old_age   Always       -       757        0
> 198 Offline_Uncorrectable   0x0030   198   189   000    Old_age   Offline      -       232        0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0          0
> 200 Multi_Zone_Error_Rate   0x0008   197   193   000    Old_age   Offline      -       723        0
>
> SMART Error Log Version: 1
> ATA Error Count: 54177 (device log contains only the most recent five errors)
>         CR = Command Register [HEX]
>         FR = Features Register [HEX]
>         SC = Sector Count Register [HEX]
>         SN = Sector Number Register [HEX]
>         CL = Cylinder Low Register [HEX]
>         CH = Cylinder High Register [HEX]
>         DH = Device/Head Register [HEX]
>         DC = Device Command Register [HEX]
>         ER = Error register [HEX]
>         ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
>
> Error 54177 occurred at disk power-on lifetime: 35960 hours (1498 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 00 34 54 50 e0  Error: UNC at LBA = 0x00505434 = 5264436
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 00 20 54 50 e0 08  27d+02:56:44.789  READ DMA
>   ec 00 00 00 00 00 a0 08  27d+02:56:44.781  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 08  27d+02:56:44.781  SET FEATURES [Set transfer mode]
>
> Error 54176 occurred at disk power-on lifetime: 35960 hours (1498 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 00 34 54 50 e0  Error: UNC at LBA = 0x00505434 = 5264436
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 00 20 54 50 e0 08  27d+02:56:41.592  READ DMA
>   ec 00 00 00 00 00 a0 08  27d+02:56:41.584  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 08  27d+02:56:41.584  SET FEATURES [Set transfer mode]
>
> Error 54175 occurred at disk power-on lifetime: 35960 hours (1498 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 00 34 54 50 e0  Error: UNC at LBA = 0x00505434 = 5264436
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 00 20 54 50 e0 08  27d+02:56:38.459  READ DMA
>   ec 00 00 00 00 00 a0 08  27d+02:56:38.451  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 08  27d+02:56:38.451  SET FEATURES [Set transfer mode]
>
> Error 54174 occurred at disk power-on lifetime: 35960 hours (1498 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 00 34 54 50 e0  Error: UNC at LBA = 0x00505434 = 5264436
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 00 20 54 50 e0 08  27d+02:56:35.518  READ DMA
>   ec 00 00 00 00 00 a0 08  27d+02:56:35.510  IDENTIFY DEVICE
>   ef 03 46 00 00 00 a0 08  27d+02:56:35.510  SET FEATURES [Set transfer mode]
>
> Error 54173 occurred at disk power-on lifetime: 35960 hours (1498 days + 8 hours)
>   When the command that caused the error occurred, the device was active or idle.
>
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 00 34 54 50 e0  Error: UNC at LBA = 0x00505434 = 5264436
>
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   c8 00 00 20 54 50 e0 08  27d+02:56:32.353  READ DMA
>   c8 00 88 68 57 50 e0 08  27d+02:56:32.338  READ DMA
>   c8 00 20 00 54 50 e0 08  27d+02:56:31.798  READ DMA
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure     90%         35888         307316
> # 2  Short offline       Completed: read failure     90%         35887         330254
> # 3  Extended offline    Completed: read failure     90%         35887         410646
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> Thanks for your help
>
> Gary R.
>
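
For comparing the two machines without reading the whole attribute table by eye, something like the following could pull out the counters in question. It is a rough sketch only, assuming the ten-column attribute layout shown in the quoted report (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw). The choice of attributes 5, 196, 197 and 198 as the ones to watch is a common rule of thumb rather than anything stated in the report, and the names WATCHED and nonzero_watched are made up for illustration.

```python
# Attribute IDs assumed worth watching (rule of thumb, not from the report):
# reallocated sectors, reallocation events, pending sectors, uncorrectables.
WATCHED = {5, 196, 197, 198}


def nonzero_watched(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0.

    smart_text is the attribute table as printed by smartctl; leading "> "
    mail quoting and any extra trailing columns are tolerated.
    """
    hits = {}
    for line in smart_text.splitlines():
        parts = line.lstrip("> ").split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header, blank line, or something that is not a table row
        if int(parts[0]) in WATCHED and parts[9].isdigit():
            raw = int(parts[9])
            if raw > 0:
                hits[parts[1]] = raw
    return hits


report = """
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   191   187   000    Old_age   Always       -       757
198 Offline_Uncorrectable   0x0030   198   189   000    Old_age   Offline      -       232
"""
print(nonzero_watched(report))
```

Run against the rows quoted above, this reports Current_Pending_Sector and Offline_Uncorrectable with raw values 757 and 232, which matches the entries gsmartcontrol highlighted.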