Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?

2010-05-25 Thread Karl Pielorz



--On 24 May 2010 23:41 -0400 rwali...@washdcmail.com wrote:


I haven't seen where anyone has tested this, but the MemoRight SSD (sold
by RocketDisk in the US) seems to claim all the right things:

http://www.rocketdisk.com/vProduct.aspx?ID=1

pdf specs:

http://www.rocketdisk.com/Local/Files/Product-PdfDataSheet-1_MemoRight%20SSD%20GT%20Specification.pdf

They claim to support the cache flush command, and with respect to DRAM
cache backup they say (p. 14/section 3.9 in that pdf):


At the risk of this getting a little off-topic (but hey, we're all looking
for ZFS ZILs ;) - we've had similar issues when looking at SSDs recently
(lack of cache protection during power failure). The SSDs above look
interesting [finally someone has noticed you need to protect the cache] -
but from what I've read about the Intel X25-E's performance, the Intel
drive with its write cache turned off appears to be as fast as, if not
faster than, those drives anyway...


I've tried contacting Intel to find out whether it's true that their
"enterprise" SSD has no cache protection, and what effect turning the write
cache off would have on both performance and write endurance, but I haven't
heard anything back yet.


Picking apart Intel's published benchmarks - they always have the
write cache enabled, which probably speaks volumes...


-Karl


Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?

2010-05-25 Thread Karl Pielorz


--On 25 May 2010 15:28 +0300 Pasi Kärkkäinen  wrote:


I've tried contacting Intel to find out whether it's true that their
"enterprise" SSD has no cache protection, and what effect turning the
write cache off would have on both performance and write endurance, but I
haven't heard anything back yet.



I guess the problem is not the cache by itself, but the fact that they
ignore the CACHE FLUSH command.. and thus the non-battery-backed cache
becomes a problem.


The X25-Es do apparently honour the 'Disable Write Cache' command - with
the write cache off there is no cache to flush; all data is written to
flash immediately, presumably before it's ACK'd to the host.
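
For anyone wanting to try the same thing, this is roughly how I'd expect to
turn the cache off under FreeBSD - only a sketch, and untested here:

"
# FreeBSD (ata(4) / 'ad' devices): disable write caching on ATA disks at
# boot - add to /boot/loader.conf and reboot:
hw.ata.wc="0"
"

On Solaris I believe the same thing can be done per drive from 'format -e'
(cache -> write_cache -> disable), though I've not tried that on an X25-E.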


I've seen a number of other sites test this and find that it 'works' (i.e.
with the write cache enabled you get nasty data loss if the power is lost;
with it disabled, that window is closed). But you obviously take quite a
sizeable performance hit.


We've got an X25-E here which we intend to test for ourselves (wisely ;) - 
to make sure that is the case...
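
The test itself doesn't need to be anything fancy - the rough plan (only a
sketch, with a made-up path) is to stream synchronous writes with a
counter, pull the power mid-run, and see how much of what the host thought
was committed actually survived:

"
#!/bin/sh
# Append an increasing counter to a file on the pool under test, forcing it
# out after every write. Pull the power mid-run, then compare the last
# number printed on the console with the last number found in the file.
i=0
while :; do
  echo $i >> /tank/ssdtest/marker   # /tank/ssdtest is a made-up path
  sync                              # ask for everything to hit stable storage
  echo "committed $i"
  i=$((i+1))
done
"

With the write cache enabled you'd expect the file to be missing records
the console claimed were committed; with it disabled (or with a working
cache flush), the two should match.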


-Karl


Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?

2010-05-25 Thread Karl Pielorz


--On 25 May 2010 11:15 -0700 Brandon High  wrote:


On Tue, May 25, 2010 at 2:08 AM, Karl Pielorz wrote:

I've tried contacting Intel to find out if it's true their "enterprise"
SSD has no cache protection on it, and what the effect of turning the
write


The "E" in X25-E does not mean "enterprise". It means "extreme". Like
the "EE" series CPUs that Intel offers.


Yet most of their web site seems to aim it quite firmly at the 'Enterprise'
market: "Imagine replacing up to 50 high-RPM hard disk drives with one
Intel® X25-E Extreme SATA Solid-State Drive in your servers", or
"Enterprise applications place a premium on performance, reliability, power
consumption and space."


If you don't mind a little data loss risk? :)

I'll post back when we've had a chance to try one in the 'real world' for 
our applications - with and without caching, especially when the plug gets 
pulled :)


Otherwise, at least on the surface the quest for the 'perfect' 
(performance, safety, price, size) ZIL continues...


-Karl


[zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...

2010-02-04 Thread Karl Pielorz


Hi All,

I've been using ZFS for a while now, and everything's been going well. I
use it under FreeBSD - but this question should almost certainly have the
same answer whether it's FreeBSD or Solaris (I think/hope :)...



Imagine if I have a zpool with 2 disks in it, that are mirrored:

"
NAME        STATE     READ WRITE CKSUM
vol         ONLINE       0     0     0
  mirror    ONLINE       0     0     0
    ad1     ONLINE       0     0     0
    ad2     ONLINE       0     0     0
"

(The device names are FreeBSD disks)

If I offline 'ad2' - and then did:

"
dd if=/dev/ad1 of=/dev/ad2
"

(i.e. make a mirror copy of ad1 to ad2 - on a *running* system).
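
In other words, the full sequence I have in mind is roughly this (FreeBSD
device names, and obviously untested as yet):

"
zpool offline vol ad2                # take one half of the mirror offline
dd if=/dev/ad1 of=/dev/ad2 bs=1m     # block-copy the live disk over it
zpool online vol ad2                 # ...then try to bring it back
"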


What would happen when I tried to 'online' ad2 again?


I fully expect it might not be pleasant... I'm just curious as to what's 
going to happen.



When I 'online' ad2, will ZFS look at it and be clever enough to figure out
that the disk is obviously corrupt/unusable/has bad metadata on it - and
resilver accordingly?


Or is it going to see what it thinks is another 'ad1' and get a little 
upset?



I'm trying to set something up here so I can test what happens - I just
thought I'd ask around a bit to see if anyone knows from past experience
what'll happen.



Thanks,

-Karl



Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...

2010-02-04 Thread Karl Pielorz


--On 04 February 2010 11:31 + Karl Pielorz  
wrote:



What would happen when I tried to 'online' ad2 again?


A reply to my own post... I tried this out: when you bring 'ad2' online
again, ZFS immediately logs a 'vdev corrupt' failure and marks 'ad2' (which
at this point is a byte-for-byte copy of 'ad1' as it was being written to
in the background) as 'FAULTED' with 'corrupted data'.


You can't "replace" it with itself at that point, but a detach on ad2, and 
then attaching ad2 back to ad1 results in a resilver, and recovery.
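
i.e. something along the lines of (with the pool/device names from the
original post):

"
zpool detach vol ad2       # drop the FAULTED copy out of the mirror
zpool attach vol ad1 ad2   # re-attach it to ad1 - this kicks off the resilver
"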


So to answer my own question - from my tests it looks like you can do this, 
and "get away with it". It's probably not ideal, but it does work.


A safer bet would be to detach the drive from the pool, and then re-attach 
it (at which point ZFS assumes it's a new drive and probably ignores the 
'mirror image' data that's on it).


-Karl

(The reason for testing this is a weird RAID setup I have, where if 'ad2'
fails and gets replaced, the RAID controller is going to mirror 'ad1' over
to 'ad2' - and cannot be stopped from doing so. However, once the
re-mirroring is complete the RAID controller steps out of the way and
allows raw access to each disk in the mirror. Strange, and a long story -
but true.)



Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...

2010-02-04 Thread Karl Pielorz


--On 04 February 2010 08:58 -0500 Jacob Ritorto  
wrote:



Seems your controller is actually doing only harm here, or am I missing
something?


The RAID controller presents the drives both as a mirrored pair and as
JBOD - *at the same time*...


The machine boots off the partition on the 'mirrored' pair - and ZFS uses
the JBOD devices (a different area of the disks, of course).


It's a little weird, to say the least - and I wouldn't recommend it, but it
does work 'for me' - and it's a way of getting the system to boot off a
mirror and still be able to use ZFS with only 2 drives available in the
chassis.


-Karl


Re: [zfs-discuss] Interesting screwup. suggestions?

2008-08-24 Thread Karl Pielorz


--On 23 August 2008 17:01 -0700 hunter morgan <[EMAIL PROTECTED]> 
wrote:

> ok so i have 3 500gb hard drives in my freebsd fileserver.  they are set
> up in a pool as a raidz1 of 3 and another raidz1 of 2.  like this:

I'm guessing that's a typo - and you mean '5' hard drives, not 3 ;)

> pool0         ONLINE   0 0 0
>   raidz1      ONLINE   0 0 0
>     ad2       ONLINE   0 0 0
>     ad4       ONLINE   0 0 0
>     ad8       ONLINE   0 0 0
>   raidz1      ONLINE   0 0 0
>     ad10      ONLINE   0 0 0
>     ad6       ONLINE   0 0 0
> ideally i would like them to be in a single raidz2 vdev and its not time
> for buying more hard drives yet.  i was thinking worst case i would buy 5
> 500 gb hard drives and set up the raidz2 on them and move the data over
> and then copy that setup back to the original drives and return the
> bought ones but its a pain obviously.  is there anyway i can just tell
> zfs to make it magically do what i want?

You can't 'promote' a raidz1 to raidz2 - building the new array and
shifting the data across is one way you can do this... Or, make sure your
backup solution is good (you do have a backup? :), delete the current pool,
re-create it the way you want, and restore from the backup.
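
As a rough sketch of that second route, using your existing device names
(double-check the syntax, and only do this once the backup is verified):

"
zpool destroy pool0
zpool create pool0 raidz2 ad2 ad4 ad6 ad8 ad10
# ...then restore the data from the backup
"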

Just remember to make sure the backup is verifiably 'good' - and if
possible, make two :)

-Kp



Re: [zfs-discuss] zfs metada corrupted

2008-09-05 Thread Karl Pielorz


--On 05 September 2008 07:37 -0700 Richard Elling <[EMAIL PROTECTED]> 
wrote:

> Also, /dev/ad10 is something I don't recognize... what is it?
> -- richard

'/dev/ad10' is a FreeBSD disk device, which would kind of be fitting, as:

LyeBeng Ong wrote:
> I made a bad judgment and  now my raidz pool is corrupted. I have a raidz
> pool running on Opensolaris b85.  I wanted to try out freenas 0.7 and
> tried to add my pool to freenas.

FreeNAS is FreeBSD based...

-Kp


[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?

2008-09-08 Thread Karl Pielorz

Hi All,

I run ZFS (a version 6 pool) under FreeBSD. Whilst I realise this changes a
*whole heap* of things, I'm more interested in whether I did anything wrong
when I had a recent drive failure...

One of a mirrored pair of drives on the system started failing, badly
(confirmed by 'hard' read & write errors logged to the console). ZFS also
started showing errors, and the machine started hanging, waiting for I/Os
to complete (which is how I noticed it).

How many errors does a drive have to throw before it's considered "failed"
by ZFS? Mine had got to about 30-40 [not a huge amount], but it was making
the system unusable, so I manually attached another hot-spare drive to the
'good' device left in that mirrored pair.
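
For reference, the attach was just the standard one - the device names
below are examples rather than the real ones:

"
zpool attach vol ad6 ad8   # ad6 = surviving half of the mirror, ad8 = spare
"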

However, ZFS was still trying to read data off the failing drive - this
pushed the estimated resilver time up to 755 hours, whilst the number of
errors over the next forty minutes or so climbed to around 300. Not wanting
my data unprotected for 755-odd hours (and fearing the number was just
going up and up), I did:

  zpool detach vol ad4

('ad4' was the failing drive).

This hung all I/O on the pool :( - I waited 5 hours, and then decided to 
reboot.

After the reboot the pool came back OK (with 'ad4' removed), the resilver
continued, and it completed in half an hour.

Thinking about it - perhaps I should have detached ad4 (the failing drive)
before attaching another device? My thinking at the time was that I didn't
know how badly the drive had failed, and obviously removing what might have
been 200Gb of perfectly accessible data from a mirrored pair, prior to
re-silvering to a replacement, didn't sit right.

I'm hoping ZFS shouldn't have hung when I later decided to fix the 
situation, and remove ad4?

-Kp


Re: [zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?

2008-09-08 Thread Karl Pielorz


--On 08 September 2008 07:30 -0700 Richard Elling <[EMAIL PROTECTED]> 
wrote:

> This seems like a reasonable process to follow, I would have done
> much the same.

> [caveat: I've not examined the FreeBSD ZFS port, the following
> presumes the FreeBSD port is similar to the Solaris port]
> ZFS does not have its own timeouts for this sort of problem.
> It relies on the underlying device drivers to manage their
> timeouts.  So there was not much you could do at the ZFS level
> other than detach the disk.

Ok, I'm glad I'm finally getting the hang of ZFS, and 'did the right 
thing(tm)'.

Is there any tunable in ZFS that will tell it "if you get more than x/y/z
read, write or checksum errors, detach the drive as 'failed'"? Maybe on a
per-drive basis?

It'd probably need some way for an admin to override it (i.e. force it to
be ignored) - for those times where you either have to, or for a drive you
know will at least stand a chance of reading the rest of the surface 'past'
the errors.

This would probably be set quite low for 'consumer' grade drives, and
moderately higher for 'enterprise' drives that don't "go out to lunch" for
extended periods while seeing if they can recover a block. You could even
default it to 'infinity' if that's what the current behaviour amounts to.

It'd certainly have saved me a lot of time if, once the number of errors on
the drive had passed a relatively low figure, it had just ditched the
drive...
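
In the meantime, about the best I can think of is a crude userland
workaround - only a sketch, with my pool/device names and an arbitrary
threshold - that polls 'zpool status' and pulls the drive once the counts
get silly:

"
#!/bin/sh
# Poll the pool once a minute; if ad4's read+write+cksum error counts go
# over 50, detach it. (Parsing 'zpool status' like this is fragile - large
# counts get abbreviated, e.g. '1.2K' - so treat this as a sketch only.)
while sleep 60; do
  errs=`zpool status vol | awk '$1 == "ad4" { print $3 + $4 + $5 }'`
  if [ "${errs:-0}" -gt 50 ]; then
    zpool detach vol ad4
    break
  fi
done
"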

One other random thought occurred to me when this happened - if I detach a
drive, does ZFS have to update some metadata on *all* the drives in that
pool (including the one I've detached) to record that it's been detached?
(If that makes sense.)

That might explain why the 'detach' I issued just hung (if it had to update
metadata on the drive I was removing, it probably got caught in the wash of
failing I/Os timing out on that device).

-Karl


Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05

2009-01-07 Thread Karl Pielorz


--On 06 January 2009 16:37 -0800 Carson Gaspar  wrote:

> On 1/6/2009 4:19 PM, Sam wrote:
>> I was hoping that this was the problem (because just buying more
>> discs is the cheapest solution given time=$$) but running it by
>> somebody at work they said going over 90% can cause decreased
>> performance but is unlikely to cause the strange errors I'm seeing.
>> However, I think I'll stick a 1TB drive in as a new volume and pull
>> some data onto it to bring the zpool down to<75% capacity and see if
>> that helps though anyway.  Probably update the OS to 2008.11 as
>> well.
>
> Pool corruption is _always_ a bug. It may be ZFS, or your block devices,
> but something is broken

Agreed - it shouldn't break just because you're using over 90%. Checking on
one of my systems here, I have:

"
Filesystem    1K-blocks       Used     Avail Capacity  Mounted on
vol          2567606528 2403849728 163756800    94%    /vol
"

It's been running like that for months without issue... Whilst it may not
be 'ideal' to run it over 90% (I suspect it's worse for pools made up of
different-sized devices / different redundancy), it's not broken in any
shape or form, with GBs of reads/writes going to that filesystem.
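
If anyone wants to compare: 'df' output on ZFS can be slightly misleading,
so it's also worth looking at the pool-level numbers, e.g.:

"
zpool list vol               # the CAP column is the pool-level % used
zfs get used,available vol   # per-dataset accounting
"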

-Kp


[zfs-discuss] Changing all the underlying device names...

2007-11-29 Thread Karl Pielorz

Hi All,

I'm a new ZFS convert (so far, I've only been impressed by ZFS) - I'm
running it under FreeBSD 7 at the moment.

I've got to 'shuffle' all the underlying devices around on my raidz pool,
so their device names will either all change (e.g. "da0" will become "ad4")
or the devices will get 'jumbled up' (e.g. "ad16" will become "ad22").

I've read bits and pieces about this - from what I've read, I need to do a
'zpool export' on the pool, shut down the system, replace the controllers,
bring it all back up, and then do a 'zpool import'?

The man page mentions that 'zpool import' will search '/dev/dsk', or
another directory I give it - here's hoping the FreeBSD port knows how to
find the disks under /dev? :-)
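
So, roughly, the plan would be (the pool name here is just a placeholder):

"
zpool export mypool
# shut down, swap the controllers over, boot back up...
zpool import                  # with no arguments, lists the pools it can see
zpool import -d /dev mypool   # point it at /dev and import the pool
"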


Finally - if I do this and it all goes horribly wrong, presumably putting
the old controllers back in place, with the drives in the 'right'
positions, means a 'zpool import' will work?

Cheers,

-Kp


Re: [zfs-discuss] ZFS on Freebsd 7.0

2007-12-07 Thread Karl Pielorz


--On 07 December 2007 11:18 -0600 Jason Morton 
<[EMAIL PROTECTED]> wrote:

> I am using ZFS on FreeBSD 7.0_beta3. This is the first time i have used
> ZFS and I have run into something that I am not sure if this is normal,
> but am very concerned about.
>
> SYSTEM INFO:
> hp 320s (storage array)
> 12 disks (750GB each)
> 2GB RAM
> 1GB flash drive (running the OS)

Hi There,

I've been running ZFS under FreeBSD 7.0 for a few months now, and we also
have a lot of HP / ProLiant kit - and, touch wood, so far we've not seen
any issues.

The first thing I'd suggest is to make sure you have the absolute *latest*
firmware for the BIOS and the RAID controller (a P400, I think, in the
320s) from HP's site. We've had a number of problems in the past with
drives 'disappearing', arrays locking up, and errors with previous firmware
- which were all (finally) resolved by updated firmware. Even our latest
delivered batch of 360s and 380s didn't have anything like 'current'
firmware on them.

> When I take a disk offline and replace it with my spare, after the spare
> rebuild it shows there are numerous errors. see below:
> scrub: resilver completed with 946 errors on Thu Dec  6 15:15:32 2007

Being as they're checksum errors, they probably won't be logged on the
console (as ZFS detected them, and not necessarily the underlying CAM
layers) - but it's worth checking in case something "isn't happy".

With that in mind, you might also want to check whether there's anything in
common between da3 and da6 - either in the physical drives, or in where
they sit in the 320s's drive bay/box allocations, as shown by the RAID
controller config (F8 at boot time while the RAID controller is
initialising).
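
From the OS side, a quick way to see how FreeBSD's devices map onto the
controller (just a suggestion - I don't have a 320s here to check against):

"
camcontrol devlist   # lists each 'da' device with its bus/target/lun position
"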

-Kp


[zfs-discuss] Expanding a RAIDZ based Pool...

2007-12-10 Thread Karl Pielorz

Hi,

I've seen/read a number of articles on the net about RAIDZ and things like
dynamic striping et al. I know roughly how this works - but I can't seem to
get to the bottom of whether expanding existing pool space is even
possible.

e.g. If I build a RAIDZ pool with 5 * 400Gb drives, and later add a 6th
400Gb drive to this pool, will its space instantly be available to the
filesystems using that pool? (I can't quite see this working myself.)

Other articles talk about replacing one drive at a time, letting it
re-silver, and then - when the last drive has been replaced - the space
available reflects the new pool size (i.e. replace each 400Gb device in
turn with a 750Gb device; when the last one is done, you'll have a 5 *
750Gb pool, with all the space (minus the RAIDZ overhead) available).

I know I can add additional RAIDZ vdevs to the pool - but that's only any
good for adding multiple drives at once, not singles (if you want to keep
the fault tolerance).
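
For reference, the commands I think would be involved in those two routes
look roughly like this (pool and device names are only examples):

"
# Route 1: grow the existing raidz by swapping each disk for a bigger one,
# one at a time, letting each resilver finish before starting the next:
zpool replace vol ad4 ad10     # old 400Gb disk -> new 750Gb disk
zpool status vol               # watch for the resilver to complete, repeat

# Route 2: add a second raidz vdev (needs several new disks at once):
zpool add vol raidz ad12 ad14 ad16
"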

Thanks,

-Karl