On Sat, Jun 20, 2015 at 11:35 AM, Theodore Ts'o <[email protected]> wrote:
> On Sat, Jun 20, 2015 at 11:05:31AM -0400, Zack Weinberg wrote:
>>
>> e2fsck successfully repairs both the skeleton image and the complete
>> partition image when they are on a known-good disk.
>
> OK, so this is a storage device issue.  I'd be taking a very jaundiced
> look at the reliability/correctness of your drives.
>
> It could be that they have a firmware bug in how they handle 512e
> emulation.  (See below.)  Or maybe one or more is starting to go bad.
> (Not all drive failures are predicted by S.M.A.R.T.  In fact, only
> about 50-66% of drive failures are predicted by SMART.  Think about
> that the next time you are tempted to skimp on backups.  :-)

Either is possible.  These are an identical pair of Seagate
drives and they're about five years old.  They *claim* to have
512-byte physical sectors (per hdparm -I -- full dump at the bottom)
but I would totally believe they are faking that.  Also, the
computer's power supply failed catastrophically in the middle of a
system upgrade, which is how the root filesystem got so very
corrupted.  That could certainly have caused physical damage.  (The
drives are currently attached to a different computer for data
recovery.)

The fsck behavior I originally reported continues to be 100%
reproducible on the physical partition.  There are no hard errors in
the SMART logs for either drive.  (After I'm done copying data off the
/home partition, which was not corrupted, I will run extended
self-tests.)  Before the catastrophic power supply failure, there were
no problems writing data to either filesystem inside the RAID array.
And the outer partitions are properly aligned.  Putting all of those
things together, I wonder whether this might be a bug in direct (not
filesystem) access to the block devices for misaligned partitions
within MD-RAID0.
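To make the question concrete, here is a rough sketch of how MD RAID0 maps a sector of the array onto a member drive. It assumes equal-sized members and plain striping. The 512 KiB chunk size (mdadm's usual default) and the 2048-sector md data offset are placeholders, not values read from my array (`mdadm --examine` reports the real ones), and the helper names are made up for illustration.

```python
from __future__ import annotations

SECTOR = 512   # logical sector size in bytes
PHYS = 4096    # ASSUMPTION: 4 KiB physical sectors (i.e. the drives are really 512e)


def raid0_map(md_sector: int, n_disks: int, chunk_sectors: int) -> tuple[int, int]:
    """Map a sector of the md device to (member index, sector in that member's data area).

    Plain striping over equal-sized members: chunk k of the array lives on
    member k % n_disks, in stripe k // n_disks.
    """
    chunk, within = divmod(md_sector, chunk_sectors)
    return chunk % n_disks, (chunk // n_disks) * chunk_sectors + within


def drive_lba(part_sector: int, part_start: int, outer_start: int,
              data_offset: int, n_disks: int = 2,
              chunk_sectors: int = 1024) -> tuple[int, int]:
    """Absolute LBA on the member drive for a sector inside a partition of the md device.

    part_start:    where the inner partition begins on the md device (63 here)
    outer_start:   where the RAID member partition begins on the drive (2048 here)
    data_offset:   HYPOTHETICAL md data offset in sectors
    chunk_sectors: 1024 sectors = 512 KiB chunks (also an assumption)
    """
    disk, member_sector = raid0_map(part_start + part_sector, n_disks, chunk_sectors)
    return disk, outer_start + data_offset + member_sector


def is_4k_aligned(lba: int) -> bool:
    """True if the LBA falls on a 4 KiB physical-sector boundary."""
    return (lba * SECTOR) % PHYS == 0


if __name__ == "__main__":
    for start in (63, 2048):
        disk, lba = drive_lba(0, part_start=start, outer_start=2048, data_offset=2048)
        print(f"inner start {start}: member {disk}, LBA {lba}, "
              f"4K-aligned={is_4k_aligned(lba)}")
    # inner start 63: member 0, LBA 4159, 4K-aligned=False
    # inner start 2048: member 0, LBA 5120, 4K-aligned=True
```

In this sketch the outer partition start, the data offset and the chunk size are all multiples of 8 sectors, so whether a filesystem block lands on a 4 KiB boundary on the physical drive depends only on where the inner partition starts: 63 is not a multiple of 8, and 2048 is.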

Is it possible for you to construct a similarly-misaligned partition
within an MD-RAID0 array, unpack the skeleton image I sent you into
that partition, and then try to reproduce my original fsck report on
that?  Do you need more information from me first?

...
> Yeah, that's not good.  Congratulations, whatever software set up your
> RAID configuration is as intelligent (or as obsolete) as Windows XP.
> Which explains why hard drive vendors are still selling 512e drives,
> although they devoutly wish they could stop.

In this case, that would have been cfdisk as of roughly nine months ago,
and I *think* the problem was that it didn't know what to do with an MD
device.  Notice how the outer partitions start at sector 2048 but the
inner partitions start at sector 63?
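If the drives turned out to be 512e despite what hdparm reports (4096-byte physical sectors behind a 512-byte interface), a start at sector 63 would have a real cost. Here is a quick sketch of the arithmetic. The 4096-byte physical size is an assumption, and the helper names are hypothetical. Because 2048 is a multiple of 8, adding the outer offset does not change the result (e.g. 2048 + 63 behaves like 63).

```python
LOGICAL = 512    # bytes per sector as the drive presents them
PHYSICAL = 4096  # ASSUMPTION: hidden 4 KiB physical sectors (512e)


def physical_sectors_touched(start_lba: int, length: int) -> int:
    """Number of physical sectors overlapped by a write of `length` bytes at `start_lba`."""
    first = (start_lba * LOGICAL) // PHYSICAL
    last = (start_lba * LOGICAL + length - 1) // PHYSICAL
    return last - first + 1


def needs_read_modify_write(start_lba: int, length: int) -> bool:
    """True if the write does not both begin and end on physical-sector boundaries."""
    start = start_lba * LOGICAL
    return start % PHYSICAL != 0 or (start + length) % PHYSICAL != 0


if __name__ == "__main__":
    for start in (63, 2048):
        print(start, physical_sectors_touched(start, 4096),
              needs_read_modify_write(start, 4096))
    # 63 2 True
    # 2048 1 False
```

So, under that assumption, every 4 KiB filesystem block in the inner partitions straddles two physical sectors, and the firmware has to read-modify-write both. That is exactly the 512e emulation path where a firmware bug would bite.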

(The disks are much older than the installation because the computer
is secondhand, and had been completely wiped.)

---
# hdparm -I /dev/sdd

/dev/sdd:

ATA device, with non-removable media
    Model Number:       ST3320418AS
    Serial Number:      9VM5KB8B
    Firmware Revision:  CC44
    Transport:          Serial
Standards:
    Used: unknown (minor revision code 0x0029)
    Supported: 8 7 6 5
    Likely used: 8
Configuration:
    Logical        max    current
    cylinders    16383    16383
    heads        16    16
    sectors/track    63    63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:  268435455
    LBA48  user addressable sectors:  625142448
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:      305245 MBytes
    device size with M = 1000*1000:      320072 MBytes (320 GB)
    cache/buffer size  = 16384 KBytes
    Nominal Media Rotation Rate: 7200
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16    Current = 16
    Recommended acoustic management value: 208, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled    Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    DOWNLOAD_MICROCODE
            Power-Up In Standby feature set
            SET_FEATURES required to spinup after power up
            SET_MAX security extension
       *    Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    64-bit World wide name
            Write-Read-Verify feature set
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Phy event counters
            Device-initiated interface power management
       *    Software settings preservation
       *    SMART Command Transport (SCT) feature set
       *    SCT Read/Write Long (AC1), obsolete
       *    SCT Write Same (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
            unknown 206[12] (vendor specific)
Security:
    Master password revision code = 65534
        supported
    not    enabled
    not    locked
        frozen
    not    expired: security count
        supported: enhanced erase
    62min for SECURITY ERASE UNIT. 62min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c50019a09334
    NAA        : 5
    IEEE OUI    : 000c50
    Unique ID    : 019a09334
Checksum: correct

# hdparm -I /dev/sdd > /tmp/A
# hdparm -I /dev/sde > /tmp/B
# diff /tmp/{A,B}
2c2
< /dev/sdd:
---
> /dev/sde:
6c6
<     Serial Number:      9VM5KB8B
---
>     Serial Number:      9VM5KB9Y
87c87
< Logical Unit WWN Device Identifier: 5000c50019a09334
---
> Logical Unit WWN Device Identifier: 5000c50019a095ed
90c90
<     Unique ID    : 019a09334
---
>     Unique ID    : 019a095ed
