On Sunday 23 June 2013, Pavel Machek wrote:
> On Sun 2013-06-23 17:27:52, Mark Lord wrote:
> > On 13-06-23 03:00 PM, Pavel Machek wrote:
> > > Thanks for the hint. (Insert rant about hdparm documentation
> > > explaining that it is bad idea, but not telling me _why_ is it bad
> > > idea. Can I expect cache consistency issues after that, or is it just
> > > simple "you are writing to the disk without any checks"? Plus, I guess
> > > documentation should mention what sector number is. I guess sectors
> > > are 512bytes for the old drives, but is it 512 or 4096 for new
> > > drives?)
> >
> > For ATA, use the "logical sector size".
> > For all existing drives out there, that's a 512 byte unit.
>
> I guessed so. (It would be good to actually document it, as well as
> documenting exactly why it is dangerous. Is it okay to send patches?)
>
> > > ...but it does not do the trick :-(. It behaves strangely as if it was
> > > still cached somewhere. Do I need to turn off the write back cache?
> >
> > No, it works just fine.  You probably have more than one bad sector.
> > After you see a read failure, run "smartctl -a" and look at the error
> > logs to see what sector the drive is choking on.
>
> Well, I definitely have more than one bad sector, but I did try to
> read exactly the same sector and it failed. See below.
>
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector 961237188 /dev/sda | uniq
>
> /dev/sda:
> FAILED: Input/output error
> reading sector 961237188:
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --write-sector 961237188 /dev/sda
>
> /dev/sda:
> re-writing sector 961237188: succeeded
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector 961237188 /dev/sda | uniq
>
> /dev/sda:
> FAILED: Input/output error
> reading sector 961237188:
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --write-sector 961237188 /dev/sda
>
> /dev/sda:
> re-writing sector 961237188: succeeded
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector 961237188 /dev/sda | uniq
>
> /dev/sda:
> reading sector 961237188: succeeded
> 0000 0000 0000 0000 0000 0000 0000 0000
> root@amd:~# dd if=/dev/sda4 of=/dev/zero bs=4096 skip=$[8958947328/4096]
> dd: reading `/dev/sda4': Input/output error
> 102+0 records in
> 102+0 records out
> 417792 bytes (418 kB) copied, 6.12536 s, 68.2 kB/s
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector 961237188 /dev/sda | uniq
>
> /dev/sda:
> reading sector 961237188: succeeded
> 0000 0000 0000 0000 0000 0000 0000 0000
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector 961237188 /dev/sda | uniq
>
> /dev/sda:
> reading sector 961237188: succeeded
> 0000 0000 0000 0000 0000 0000 0000 0000
> root@amd:~# hdparm --yes-i-know-what-i-am-doing  --read-sector 961237188 /dev/sda | uniq
>
> /dev/sda:
> FAILED: Input/output error
> reading sector 961237188:
> root@amd:~#
>
> > Or just low-level format it all with "hdparm --security-erase".
>
> I'd like to understand what is going on there. I can mark the blocks
> as bad at ext3 level, but I'd really like to understand what is going
> on there, and if it is hw issue, sata issue or block layer issue.
>
> (Plus, given that remapping does not work, I'd be afraid that it will
> kill the disk for good).
>
> The disk is
>
> root@amd:~# smartctl -a /dev/sda
> smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate Momentus 5400.6 series
> Device Model:     ST9500325AS
> Serial Number:    5VE41HDA
> Firmware Version: 0001SDM1
> User Capacity:    500,107,862,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4
> Local Time is:    Sun Jun 23 23:49:15 2013 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> Thanks for support,
>                                                                       Pavel

Being tired of using hdparm manually, I created a simple hdd_realloc utility
that reads the disk in big blocks (1 MB). When there's a read error, it re-reads
the failed block sector by sector and rewrites (with zeros) the sectors that
fail to read. It works fine for disks with just a couple of pending sectors.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE      1048576
#define SECTOR_SIZE     512

int main(int argc, char *argv[]) {
        if (argc < 2) {
                fprintf(stderr, "Usage: %s <device> [pos]\n", argv[0]);
                return 1;
        }
        int dev = open(argv[1], O_RDWR | O_DIRECT | O_SYNC);
        if (dev < 0) {  /* open() returns -1 on failure; 0 is a valid descriptor */
                perror("Unable to open device");
                return 2;
        }

        posix_fadvise(dev, 0, 0, POSIX_FADV_RANDOM);

        off64_t startpos = 0, pos = 0;
        if (argc > 2) {
                /* Start offset in bytes; O_DIRECT needs it to be a multiple
                 * of SECTOR_SIZE. */
                startpos = strtoll(argv[2], NULL, 10);
        }
        pos = startpos;
        char *buf = valloc(BLOCK_SIZE);
        char *zeros = valloc(SECTOR_SIZE);
        if (!buf || !zeros) {
                fprintf(stderr, "Memory allocation error\n");
                return 2;
        }
        memset(zeros, 0, SECTOR_SIZE);

        time_t starttime = time(NULL);

        while (1) {
                /* Return to the start of the line and redraw the progress report. */
                long long elapsed = time(NULL) - starttime;
                printf("\rPosition: %lld B (%lld MiB, %lld GiB, sector %lld), rate %lld MiB/s",
                        (long long)pos, (long long)pos / 1024 / 1024,
                        (long long)pos / 1024 / 1024 / 1024, (long long)pos / SECTOR_SIZE,
                        (long long)(pos - startpos) / 1024 / 1024 / (elapsed ? elapsed : 1));
                fflush(stdout);
                lseek64(dev, pos, SEEK_SET);
                int count = read(dev, buf, BLOCK_SIZE);
                if (count == 0) {/* EOF */
                        printf("End of disk\n");
                        break;
                }
                if (count < 0) { /* read error */
                        printf("\n");
                        perror("Read error");
                        printf("Examining %lld\n", pos);
                        for (int i = 0; i < BLOCK_SIZE/SECTOR_SIZE; i++) {
                                lseek64(dev, pos, SEEK_SET);
                                if (read(dev, buf, SECTOR_SIZE) < SECTOR_SIZE) {
                                        printf("Unable to read at %lld, rewriting...", pos);
                                        lseek64(dev, pos, SEEK_SET);
                                        int result = write(dev, zeros, SECTOR_SIZE);
                                        if (result < 0) {
                                                printf("write error\n");
                                        } else {
                                                lseek64(dev, pos, SEEK_SET);
                                                if (read(dev, buf, SECTOR_SIZE) < SECTOR_SIZE)
                                                        printf("read error after rewrite\n");
                                                else
                                                        printf("OK\n");
                                        }
                                }
                                pos += SECTOR_SIZE;
                        }
                } else /* no error */
                        pos += count;
        }

        return 0;
}
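
A note on the optional [pos] argument: it is a byte offset, while hdparm and
smartctl report sector numbers, and with O_DIRECT the offset should be
sector-aligned. The conversion can be sketched roughly as below, assuming
512-byte logical sectors as Mark describes above. The helper names are
hypothetical and not part of the utility itself.

```c
#define SECTOR_SIZE 512 /* assumption: 512-byte logical sectors */

/* Convert a sector number (as printed by hdparm or smartctl) to a byte offset. */
long long sector_to_offset(long long sector)
{
        return sector * SECTOR_SIZE;
}

/* Convert a byte offset to the number of the sector that contains it. */
long long offset_to_sector(long long pos)
{
        return pos / SECTOR_SIZE;
}

/* Round a byte offset down to the start of the sector that contains it. */
long long align_to_sector(long long pos)
{
        return pos - (pos % SECTOR_SIZE);
}
```

For the sector in Pavel's log, 961237188 * 512 = 492153440256, so passing
492153440256 as [pos] should make the scan start at that sector.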


-- 
Ondrej Zary