Re: [zfs-discuss] booting from ashift=12 pool..

2011-07-29 Thread Hans Rosenfeld
On Fri, Jul 29, 2011 at 01:04:49AM -0400, Daniel Carosone wrote:
> .. evidently doesn't work.  GRUB reboots the machine moments after
> loading stage2, and doesn't recognise the fstype when examining the
> disk loaded from an alternate source.
> 
> This is with SX-151.  Here's hoping a future version (with grub2?)
> resolves this, as well as lets us boot from raidz.
> 
> Just a note for the archives in case it helps someone else get back
> the afternoon I just burnt.

I've noticed this behaviour this morning and have been debugging it
since. I found out that, for some unknown reason, grub fails to get the
disk geometry, assumes 0 sectors/track and then does a divide-by-zero.

I don't think this is a zfs issue.
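
To illustrate the failure mode described above (a hypothetical C sketch, not
GRUB's actual code; the struct and function names are invented): converting a
linear block address to cylinder/head/sector divides by the sectors-per-track
value, so a geometry probe that reports 0 sectors/track ends in a
divide-by-zero unless it is guarded.

#include <stdio.h>

struct chs_geometry {
    unsigned int cylinders;
    unsigned int heads;
    unsigned int sectors;      /* sectors per track */
};

/*
 * Convert an LBA to C/H/S. Returns -1 instead of crashing when the
 * geometry probe reported 0 heads or 0 sectors per track.
 */
static int
lba_to_chs(unsigned long lba, const struct chs_geometry *geo,
           unsigned int *c, unsigned int *h, unsigned int *s)
{
    if (geo->sectors == 0 || geo->heads == 0)
        return -1;
    *s = (unsigned int)(lba % geo->sectors) + 1;
    *h = (unsigned int)((lba / geo->sectors) % geo->heads);
    *c = (unsigned int)(lba / ((unsigned long)geo->sectors * geo->heads));
    return 0;
}

int
main(void)
{
    struct chs_geometry bad = { 1024, 255, 0 };   /* failed probe: 0 sectors/track */
    unsigned int c, h, s;

    if (lba_to_chs(2048, &bad, &c, &h, &s) != 0)
        printf("bad geometry, refusing to convert\n");
    return 0;
}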

Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown



Re: [zfs-discuss] booting from ashift=12 pool..

2011-07-29 Thread Fajar A. Nugraha
On Fri, Jul 29, 2011 at 4:57 PM, Hans Rosenfeld  wrote:
> On Fri, Jul 29, 2011 at 01:04:49AM -0400, Daniel Carosone wrote:
>> .. evidently doesn't work.  GRUB reboots the machine moments after
>> loading stage2, and doesn't recognise the fstype when examining the
>> disk loaded from an alternate source.
>>
>> This is with SX-151.  Here's hoping a future version (with grub2?)
>> resolves this, as well as lets us boot from raidz.
>>
>> Just a note for the archives in case it helps someone else get back
>> the afternoon I just burnt.
>
> I've noticed this behaviour this morning and have been debugging it
> since. I found out that, for some unknown reason, grub fails to get the
> disk geometry, assumes 0 sectors/track and then does a divide-by-zero.
>
> I don't think this is a zfs issue.

If the problem is in the zfs code in grub/grub2, then it should be a zfs issue, right?

Anyway, for comparison purposes, with ubuntu + grub2 + zfsonlinux
(which can force ashift at pool creation time) + zfs root,  grub2
won't even install on pools with ashift=12, while it works just fine
with ashift=9. There were also booting problems if you've scrubbed
rpool.

Does zfs code for grub/grub2 also depend on Oracle releasing updates,
or is it simply a matter of no one with enough skill having looked into
it yet?

-- 
Fajar


Re: [zfs-discuss] booting from ashift=12 pool..

2011-07-29 Thread Hans Rosenfeld
On Fri, Jul 29, 2011 at 10:22:27AM -0400, Fajar A. Nugraha wrote:
> On Fri, Jul 29, 2011 at 4:57 PM, Hans Rosenfeld wrote:
> > I've noticed this behaviour this morning and have been debugging it
> > since. I found out that, for some unknown reason, grub fails to get the
> > disk geometry, assumes 0 sectors/track and then does a divide-by-zero.
> >
> > I don't think this is a zfs issue.
> 
> If the problem is in the zfs code in grub/grub2, then it should be a zfs
> issue, right?

I thought that, due to the geometry problem, the zfs code never runs, but
after some more debugging I now know that was wrong. These are in fact two
unrelated problems.

> Anyway, for comparison purposes, with ubuntu + grub2 + zfsonlinux
> (which can force ashift at pool creation time) + zfs root,  grub2
> won't even install on pools with ashift=12, while it works just fine
> with ashift=9. There were also booting problems if you've scrubbed
> rpool.
> 
> Does zfs code for grub/grub2 also depend on Oracle releasing updates,
> or is it simply a matter of no one with enough skill having looked into
> it yet?

I'm working on a patch for grub that fixes the ashift=12 issue. I'm
probably not going to fix the div-by-zero reboot.
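
For context on one piece of what ashift=12 changes on disk (a rough C sketch
of the layout as I understand it, not a description of the patch; the constant
names below are illustrative): each vdev label carries a 128 KB uberblock
ring, and the slot size in that ring is 1 << max(ashift, 10). A 4 KB-sector
pool therefore has 32 slots of 4096 bytes where 512-byte-sector boot code
expects 128 slots of 1024 bytes, so code that hard-codes the 1 KB slot size
walks the ring incorrectly.

#include <stdio.h>
#include <stdint.h>

#define UB_MIN_SHIFT   10                 /* minimum uberblock slot size: 1 KB */
#define UB_RING_SIZE   (128 * 1024)       /* uberblock ring in each vdev label */

/* Slot size grows with ashift once ashift exceeds 10. */
static uint32_t
ub_slot_size(uint32_t ashift)
{
    uint32_t shift = (ashift > UB_MIN_SHIFT) ? ashift : UB_MIN_SHIFT;
    return (uint32_t)1 << shift;
}

int
main(void)
{
    /* ashift=9: 128 slots of 1024 bytes; ashift=12: 32 slots of 4096 bytes. */
    for (uint32_t ashift = 9; ashift <= 12; ashift += 3)
        printf("ashift=%u: %u slots of %u bytes\n", ashift,
               UB_RING_SIZE / ub_slot_size(ashift), ub_slot_size(ashift));
    return 0;
}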


Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown



Re: [zfs-discuss] booting from ashift=12 pool..

2011-07-29 Thread Hans Rosenfeld
On Fri, Jul 29, 2011 at 04:36:33PM +0200, Hans Rosenfeld wrote:
> > Does zfs code for grub/grub2 also depend on Oracle releasing updates,
> > or is it simply a matter of no one with enough skill having looked into
> > it yet?
> 
> I'm working on a patch for grub that fixes the ashift=12 issue. I'm
> probably not going to fix the div-by-zero reboot.

If you want to try it, the patch can be found at
http://cr.illumos.org/view/6qc99xkh/illumos-1303-webrev/illumos-1303-webrev.patch


Hans


-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown


Re: [zfs-discuss] Gen-ATA read sector errors

2011-07-29 Thread Richard Elling
On Jul 28, 2011, at 4:55 AM, Koopmann, Jan-Peter wrote:

> Hi,
> 
> my system is running oi148 on a super micro X8SIL-F board. I have two pools 
> (2 disc mirror, 4 disc RAIDZ) with RAID level SATA drives. (Hitachi HUA72205 
> and SAMSUNG HE103UJ).  The system runs as expected however every few days 
> (sometimes weeks) the system comes to a halt due to these errors:
> 
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0 (Disk1):
> Dec  3 13:51:20 nasjpk  Error for command 'read sector'  Error Level: Fatal
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice] Requested Block: 5503936, Error Block: 5503936
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
> Dec  3 13:51:20 nasjpk gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: XX7

Several things:

1. You are using SATA in IDE-compatibility mode. Usually this is a BIOS setting,
and for most BIOSes, IDE-compatibility mode is the default. Changing to AHCI
is an improvement that includes better error monitoring.

2. In this case, the disk is returning an unrecoverable read error. This is the
most common error for modern HDDs.

3. When #2 happens, consumer-grade disks can get stuck retrying forever.
Enterprise-class drives have limited retries. For the retry-forever disks, the OS
is responsible for ultimately timing out the I/O attempt. For many Solaris
releases, the default retry/timeout cycle lasts 3 to 5 minutes. Because of #1,
the disk cannot service more than one outstanding I/O, so all I/O to the disk
is blocked, impacting the rest of the pool.

> 
> It is not related to this one disk. It happens on all disks. Sometimes 
> several are listed before the system "crashes", sometimes just one. I cannot 
> pinpoint it to a single defect disk though (and already have replaced the 
> disks). I suspect that this is an error with the SATA controller or the 
> driver. Can someone give me a hint on whether or not that assumption sounds 
> feasible? I am planning on getting a new "cheap" 6-8 way SATA2 or SATA3 
> controller and switch over the drives to that controller. If it is 
> driver/controller related the problem should disappear. Is it possible to 
> simply reconnect the drives and all is going to be well or will I have to 
> reinstall due to different SATA "layouts" on the disks or alike? 

The ease of migration depends on your HBA and whether it writes metadata
that is not compatible with other HBAs. For simple HBAs, it is quite common for
disks to be migrated to other machines and the pool imported.

HTH,
 -- richard




Re: [zfs-discuss] Gen-ATA read sector errors

2011-07-29 Thread Richard Elling
Thanks Jens,
I have a vdbench profile and script that will run the new SNIA Solid State
Storage (SSS) Performance Test Suite (PTS). I'd be happy to share if anyone is
interested.
 -- richard

On Jul 28, 2011, at 7:10 AM, Jens Elkner wrote:

> Hi,
> 
> Roy Sigurd Karlsbakk wrote:
>> Crucial RealSSD C300 has been released and showing good numbers for use as 
>> Zil and L2ARC. Does anyone know if this unit flushes its cache on request, 
>> as opposed to Intel units etc?
>> 
> 
> I had a chance to get my hands on a Crucial RealSSD C300/128GB yesterday and
> did some quick testing. Here are the numbers first; some explanation follows
> below:
> 
> cache enabled, 32 buffers:
> Linear read, 64k blocks: 134 MB/s
> random read, 64k blocks: 134 MB/s
> linear read, 4k blocks: 87 MB/s
> random read, 4k blocks: 87 MB/s
> linear write, 64k blocks: 107 MB/s
> random write, 64k blocks: 110 MB/s
> linear write, 4k blocks: 76 MB/s
> random write, 4k blocks: 32 MB/s
> 
> cache enabled, 1 buffer:
> linear write, 4k blocks: 51 MB/s (12800 ops/s)
> random write, 4k blocks: 7 MB/s (1750 ops/s)
> linear write, 64k blocks: 106 MB/s (1610 ops/s)
> random write, 64k blocks: 59 MB/s (920 ops/s)
> 
> cache disabled, 1 buffer:
> linear write, 4k blocks: 4.2 MB/s (1050 ops/s)
> random write, 4k blocks: 3.9 MB/s (980 ops/s)
> linear write, 64k blocks: 40 MB/s (650 ops/s)
> random write, 64k blocks: 40 MB/s (650 ops/s)
> 
> cache disabled, 32 buffers:
> linear write, 4k blocks: 4.5 MB/s, 1120 ops/s
> random write, 4k blocks: 4.2 MB/s, 1050 ops/s
> linear write, 64k blocks: 43 MB/s, 680 ops/s
> random write, 64k blocks: 44 MB/s, 690 ops/s
> 
> cache enabled, 1 buffer, with cache flushes
> linear write, 4k blocks, flush after every write: 1.5 MB/s, 385 writes/s
> linear write, 4k blocks, flush after every 4th write: 4.2 MB/s, 1120 writes/s
> 
> 
> The numbers are rough numbers read quickly from iostat, so please don't
> multiply block size by ops and compare with the bandwidth given ;)
> The test operates directly on top of LDI, just like ZFS.
> - "nk blocks" means the size of each read/write given to the device driver
> - "n buffers" means the number of buffers I keep in flight. This is to keep
>   the command queue of the device busy
> - "cache flush" means a synchronous ioctl DKIOCFLUSHWRITECACHE
> 
> These numbers contain a few surprises (at least for me). The biggest surprise
> is that with cache disabled one cannot get good data rates with small blocks,
> even if one keeps the command queue filled. This is completely different from
> what I've seen from hard drives.
> Also the IOPS with cache flushes are quite low; 385 is not much better than
> a 15k hdd, while the latter scales better. On the other hand, from the large
> drop in performance when using flushes one could infer that they indeed flush
> properly, but I haven't built a test setup for that yet.
> 
> Conclusion: From the measurements I'd infer the device makes a good L2ARC,
> but for a slog device the latency is too high and it doesn't scale well.
> 
> I'll do similar tests on an X-25 and an OCZ Vertex 2 Pro as soon as they arrive.
> 
> If there are numbers you are missing, please tell me; I'll measure them if
> possible. Also please ask if there are questions regarding the test setup.
> 
> --
> Arne

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss