Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?
--On 24 May 2010 23:41 -0400 rwali...@washdcmail.com wrote:

> I haven't seen where anyone has tested this, but the MemoRight SSD (sold
> by RocketDisk in the US) seems to claim all the right things:
> http://www.rocketdisk.com/vProduct.aspx?ID=1
> pdf specs:
> http://www.rocketdisk.com/Local/Files/Product-PdfDataSheet-1_MemoRight%20SSD%20GT%20Specification.pdf
> They claim to support the cache flush command, and with respect to DRAM
> cache backup they say (p. 14/section 3.9 in that pdf):

At the risk of this getting a little off-topic (but hey, we're all looking
for ZFS ZILs ;) - we've had similar issues when looking at SSDs recently
(lack of cache protection during power failure). The drives above look
interesting - finally someone has noted that you need to protect the cache -
but from what I've read about the Intel X25-E, the Intel drive with its
write cache turned off appears to be as fast as, if not faster than, those
drives anyway.

I've tried contacting Intel to find out whether it's true that their
"enterprise" SSD has no cache protection, and what effect turning the write
cache off would have on both performance and write endurance, but I haven't
heard anything back yet.

Picking apart the published Intel benchmarks - they always have the write
cache enabled, which probably speaks volumes...

-Karl
Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?
--On 25 May 2010 15:28 +0300 Pasi Kärkkäinen wrote:

>> I've tried contacting Intel to find out whether it's true that their
>> "enterprise" SSD has no cache protection, and what effect turning the
>> write cache off would have on both performance and write endurance, but
>> I haven't heard anything back yet.
>
> I guess the problem is not the cache by itself, but the fact that they
> ignore the CACHE FLUSH command.. and thus the non-battery-backed cache
> becomes a problem.

The X25-Es do apparently honour the 'disable write cache' command - and
with the write cache off there is no cache to flush: all data is written to
flash immediately, presumably before it's ACK'd to the host.

I've seen a number of other sites do some testing on this and find that it
'works' (i.e. with the write cache enabled you get nasty data loss if the
power is lost; with it disabled, that window is closed). But you obviously
take quite a sizeable performance hit.

We've got an X25-E here which we intend to test for ourselves (wisely ;) -
to make sure that is the case...

-Karl
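For anyone wanting to repeat the experiment under FreeBSD, turning the
drive's write cache off looks roughly like the following. This is a sketch
only - it assumes the SSD appears via the legacy ata(4) driver as an adX
device (or via da(4) for SCSI-attached devices); the device names are
illustrative and the exact knobs depend on your driver and release:

 # ata(4) devices: disable the ATA write cache for all disks at boot
 echo 'hw.ata.wc="0"' >> /boot/loader.conf
 # ...reboot, then confirm the tunable took effect:
 sysctl hw.ata.wc

 # da(4) devices: inspect/clear the WCE bit on the SCSI caching mode page
 camcontrol modepage da0 -m 8        # show the current caching page
 camcontrol modepage da0 -m 8 -e     # edit it interactively (set WCE to 0)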
Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?
--On 25 May 2010 11:15 -0700 Brandon High wrote:

> On Tue, May 25, 2010 at 2:08 AM, Karl Pielorz wrote:
>> I've tried contacting Intel to find out if it's true their "enterprise"
>> SSD has no cache protection on it, and what the effect of turning the
>> write
>
> The "E" in X25-E does not mean "enterprise". It means "extreme". Like the
> "EE" series CPUs that Intel offers.

Yet most of their web site seems to aim it quite firmly at the 'enterprise'
market: "Imagine replacing up to 50 high-RPM hard disk drives with one
Intel® X25-E Extreme SATA Solid-State Drive in your servers", or
"Enterprise applications place a premium on performance, reliability, power
consumption and space."

If you don't mind a little data-loss risk? :)

I'll post back when we've had a chance to try one in the 'real world' for
our applications - with and without caching, and especially when the plug
gets pulled :)

Otherwise, at least on the surface, the quest for the 'perfect'
(performance, safety, price, size) ZIL continues...

-Karl
[zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
Hi All,

I've been using ZFS for a while now and everything's been going well. I use
it under FreeBSD - but this question should get much the same answer
whether it's FreeBSD or Solaris (I think/hope :)...

Imagine I have a zpool with two disks in it, mirrored:

"
        NAME        STATE     READ WRITE CKSUM
        vol         ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad2     ONLINE       0     0     0
"

(The device names are FreeBSD disks.)

If I offline 'ad2' and then did:

"
 dd if=/dev/ad1 of=/dev/ad2
"

(i.e. make a mirror copy of ad1 to ad2 - on a *running* system) - what
would happen when I tried to 'online' ad2 again?

I fully expect it might not be pleasant... I'm just curious as to what's
going to happen. When I 'online' ad2, will ZFS look at it and be clever
enough to figure out that the disk is obviously corrupt/unusable/has bad
metadata on it - and resilver accordingly? Or is it going to see what it
thinks is another 'ad1' and get a little upset?

I'm trying to set something up here so I can test what happens - I just
thought I'd ask around a bit to see if anyone knows what'll happen from
past experience.

Thanks,

-Karl
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
--On 04 February 2010 11:31 +0000 Karl Pielorz wrote:

> What would happen when I tried to 'online' ad2 again?

A reply to my own post... I tried this out. When you bring 'ad2' online
again, ZFS immediately logs a 'vdev corrupt' failure and marks 'ad2' (which
at that point is a byte-for-byte copy of 'ad1', taken while ad1 was being
written to in the background) as 'FAULTED' with 'corrupted data'.

You can't "replace" it with itself at that point, but a detach of ad2
followed by attaching ad2 back to ad1 results in a resilver, and recovery.

So, to answer my own question: from my tests it looks like you can do this
and "get away with it". It's probably not ideal, but it does work. A safer
bet would be to detach the drive from the pool first and then re-attach it
afterwards (at which point ZFS treats it as a new drive and ignores the
'mirror image' data that's on it).

-Karl

(The reason for testing this is a weird RAID setup I have where, if 'ad2'
fails and gets replaced, the RAID controller is going to mirror 'ad1' over
to 'ad2' - and cannot be stopped. However, once the re-mirroring is
complete the RAID controller steps out of the way and allows raw access to
each disk in the mirror. Strange - a long story, but true.)
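For reference, the recovery described above boils down to something like
this (a sketch only, using the pool and device names from the example - on
a real system the names will differ):

 # Recover after ad2 has been overwritten with a raw copy of ad1:
 zpool detach vol ad2        # drop the FAULTED copy from the mirror
 zpool attach vol ad1 ad2    # re-attach it; ZFS resilvers from ad1
 zpool status vol            # watch the resilver complete

 # Safer variant: detach ad2 *before* the controller (or dd) clones ad1
 # onto it, then run the attach afterwards so ZFS treats it as a blank disk.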
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
--On 04 February 2010 08:58 -0500 Jacob Ritorto wrote:

> Seems your controller is actually doing only harm here, or am I missing
> something?

The RAID controller presents the drives both as a mirrored pair and as JBOD
- *at the same time*... The machine boots off the partition on the
'mirrored' pair, and ZFS uses the JBOD devices (a different area of the
disks, of course).

It's a little weird, to say the least - and I wouldn't recommend it - but
it does work 'for me', and it's a way of getting the system to boot off a
mirror and still be able to use ZFS with only two drives available in the
chassis.

-Karl
Re: [zfs-discuss] Interesting screwup. suggestions?
--On 23 August 2008 17:01 -0700 hunter morgan <[EMAIL PROTECTED]> wrote:

> ok so i have 3 500gb hard drives in my freebsd fileserver. they are set
> up in a pool as a raidz1 of 3 and another raidz1 of 2. like this:

I'm guessing that's a typo - and you mean '5' hard drives, not 3 ;)

> pool0       ONLINE       0     0     0
>   raidz1    ONLINE       0     0     0
>     ad2     ONLINE       0     0     0
>     ad4     ONLINE       0     0     0
>     ad8     ONLINE       0     0     0
>   raidz1    ONLINE       0     0     0
>     ad10    ONLINE       0     0     0
>     ad6     ONLINE       0     0     0
>
> ideally i would like them to be in a single raidz2 vdev and its not time
> for buying more hard drives yet. i was thinking worst case i would buy 5
> 500gb hard drives and set up the raidz2 on them and move the data over
> and then copy that setup back to the original drives and return the
> bought ones but its a pain obviously. is there anyway i can just tell
> zfs to make it magically do what i want?

You cannot 'promote' a raidz1 to raidz2 - building the new array and
shifting the data across is one way you can do this...

Or, make sure your backup solution is good (you do have a backup? :),
delete the current pool, re-create it the way you want, and restore from
the backup. Just remember to make sure the backup is verifiably 'good' -
and, if possible, make two :)

-Kp
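If you do go the 'borrow five drives' route, the shuffle might look roughly
like this. A sketch only: the temporary device names and snapshot names are
made up, and 'zfs send -R' needs a reasonably recent ZFS version - on older
releases you would send each filesystem individually or just copy the data:

 # Build a temporary raidz2 pool on the borrowed drives:
 zpool create tmppool raidz2 da0 da1 da2 da3 da4

 # Copy everything across, then rebuild pool0 as a single raidz2:
 zfs snapshot -r pool0@migrate
 zfs send -R pool0@migrate | zfs receive -d -F tmppool
 zpool destroy pool0
 zpool create pool0 raidz2 ad2 ad4 ad6 ad8 ad10

 # Copy it all back and retire the temporary pool:
 zfs snapshot -r tmppool@back
 zfs send -R tmppool@back | zfs receive -d -F pool0
 zpool destroy tmppool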
Re: [zfs-discuss] zfs metada corrupted
--On 05 September 2008 07:37 -0700 Richard Elling <[EMAIL PROTECTED]> wrote:

> Also, /dev/ad10 is something I don't recognize... what is it?
>  -- richard

'/dev/ad10' is a FreeBSD disk device, which would be rather fitting, as
LyeBeng Ong wrote:

> I made a bad judgment and now my raidz pool is corrupted. I have a raidz
> pool running on Opensolaris b85. I wanted to try out freenas 0.7 and
> tried to add my pool to freenas.

FreeNAS is FreeBSD-based...

-Kp
[zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
Hi All,

I run ZFS (a version 6 pool) under FreeBSD. Whilst I realise this changes a
*whole heap* of things, I'm more interested in whether I did anything wrong
when I had a recent drive failure...

One of a mirrored pair of drives on the system started failing, badly
(confirmed by 'hard' read and write errors logged to the console). ZFS also
started showing errors, and the machine started hanging, waiting for I/Os
to complete (which is how I noticed it).

How many errors does a drive have to throw before ZFS considers it
"failed"? Mine had got to about 30-40 (not a huge amount) but was making
the system unusable, so I manually attached another hot-spare drive to the
'good' device left in that mirrored pair.

However, ZFS was still trying to read data off the failing drive - this
pushed the estimated re-silver time up to 755 hours, whilst the number of
errors in the next forty minutes or so climbed to around 300. Not wanting
my data unprotected for 755-odd hours (and fearing the number was just
going up and up), I did:

"
 zpool detach vol ad4
"

('ad4' was the failing drive.) This hung all I/O on the pool :( - I waited
5 hours and then decided to reboot. After the reboot the pool came back OK
(with 'ad4' removed), and the re-silver continued and completed in half an
hour.

Thinking about it - perhaps I should have detached ad4 (the failing drive)
before attaching another device? My thinking at the time was that I didn't
know how badly failed the drive was, and obviously removing what might have
been 200Gb of 'perfectly' accessible data from a mirrored pair, prior to
re-silvering to a replacement, didn't sit right.

I'm hoping ZFS shouldn't have hung when I later decided to fix the
situation and remove ad4?

-Kp
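For anyone following along, the sequence described above amounts to roughly
this (a sketch - 'ad4' is the failing drive from the post; 'ad6' for the
healthy half of the mirror and 'ad8' for the hot-spare are illustrative
names, not taken from the original):

 zpool attach vol ad6 ad8    # add the spare as a third side of the mirror
 zpool status vol            # resilver starts (ad4 was still being read)
 zpool detach vol ad4        # drop the failing drive - this is the step
                             # that hung until the reboot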
Re: [zfs-discuss] ZFS Failing Drive procedure (mirrored pairs) - did I mess this up?
--On 08 September 2008 07:30 -0700 Richard Elling <[EMAIL PROTECTED]> wrote:

> This seems like a reasonable process to follow, I would have done
> much the same.
> [caveat: I've not examined the FreeBSD ZFS port, the following
> presumes the FreeBSD port is similar to the Solaris port]
> ZFS does not have its own timeouts for this sort of problem.
> It relies on the underlying device drivers to manage their
> timeouts. So there was not much you could do at the ZFS level
> other than detach the disk.

OK, I'm glad I'm finally getting the hang of ZFS and 'did the right
thing(tm)'.

Is there any tunable in ZFS that will tell it "if you get more than x/y/z
read, write or checksum errors, detach the drive as 'failed'" - maybe on a
per-drive basis? It would probably need some way for the admin to override
it (i.e. force it to be ignored) for those times when you either have to,
or for a drive you know will at least stand a chance of reading the rest of
the surface 'past' the errors.

This would probably be set quite low for 'consumer' grade drives, and
moderately higher for 'enterprise' drives that don't "go out to lunch" for
extended periods while seeing if they can recover a block. You could even
default it to 'infinity' if that's the current behaviour. It would
certainly have saved me a lot of time if, once the number of errors on the
drive passed a relatively low figure, ZFS had just ditched the drive...

One other random thought occurred to me when this happened: if I detach a
drive, does ZFS have to update some metadata on *all* the drives in that
pool (including the one I've detached) to record that it's been detached?
(If that makes sense.) That might explain why the 'detach' I issued just
hung - if it had to update metadata on the drive I was removing, it
probably got caught in the wash of failing I/Os timing out on that device.

-Karl
Re: [zfs-discuss] Problems at 90% zpool capacity 2008.05
--On 06 January 2009 16:37 -0800 Carson Gaspar wrote:

> On 1/6/2009 4:19 PM, Sam wrote:
>> I was hoping that this was the problem (because just buying more
>> discs is the cheapest solution given time=$$) but running it by
>> somebody at work they said going over 90% can cause decreased
>> performance but is unlikely to cause the strange errors I'm seeing.
>> However, I think I'll stick a 1TB drive in as a new volume and pull
>> some data onto it to bring the zpool down to <75% capacity and see if
>> that helps though anyway. Probably update the OS to 2008.11 as
>> well.
>
> Pool corruption is _always_ a bug. It may be ZFS, or your block devices,
> but something is broken

Agreed - it shouldn't break just because you're using over 90%. Checking on
one of my systems here I have:

"
 Filesystem  1K-blocks       Used     Avail Capacity  Mounted on
 vol        2567606528 2403849728 163756800       94%  /vol
"

It's been running like that for months without issue...

Whilst it may not be 'ideal' to run a pool at over 90% (I suspect it's
worse for pools made up of different-sized devices / redundancy), it's not
broken in any shape or form here, with GBs of reads/writes going to that
filesystem.

-Kp
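A couple of quick ways to keep an eye on how full a pool is (a sketch;
'vol' is the pool from the df output above):

 zpool list vol                            # pool-level size/used/capacity summary
 zfs list -o name,used,available -r vol    # per-dataset usage
 df -k /vol                                # what the post above shows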
[zfs-discuss] Changing all the underlying device names...
Hi All,

I'm a new ZFS convert (so far I've been nothing but impressed by ZFS) - I'm
running it under FreeBSD 7 at the moment.

I've got to 'shuffle' all the underlying devices around on my raidz pool,
so their device names will all either change (e.g. "da0" will become "ad4")
or get 'jumbled up' (e.g. "ad16" will become "ad22").

I've read bits and pieces about this. From what I've read, I need to do a
'zpool export' on the pool, shut down the system, replace the controllers,
bring it all back up, and then do a 'zpool import'?

The man page mentions that 'zpool import' will search '/dev/dsk' or another
directory I give it - here's hoping the FreeBSD port knows how to find the
disks under /dev? :-)

Finally - if I do this and it all goes horribly wrong, presumably putting
the old controllers back in place, with the drives in the 'right'
positions, means a 'zpool import' will still work?

Cheers,

-Kp
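For what it's worth, the procedure being described looks roughly like this
(a sketch; 'vol' is a placeholder pool name, and -d simply points the
device scan at a directory of your choosing):

 zpool export vol           # before shutting down and swapping controllers
 # ...swap the hardware, boot back up...
 zpool import               # list pools found by scanning the default device dir
 zpool import vol           # import by name; ZFS locates the renamed devices
 zpool import -d /dev vol   # or point the search at a specific directory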
Re: [zfs-discuss] ZFS on Freebsd 7.0
--On 07 December 2007 11:18 -0600 Jason Morton <[EMAIL PROTECTED]> wrote:

> I am using ZFS on FreeBSD 7.0_beta3. This is the first time i have used
> ZFS and I have run into something that I am not sure if this is normal,
> but am very concerned about.
>
> SYSTEM INFO:
> hp 320s (storage array)
> 12 disks (750GB each)
> 2GB RAM
> 1GB flash drive (running the OS)

Hi There,

I've been running ZFS under FreeBSD 7.0 for a few months now, and we also
have a lot of HP/ProLiant kit - and, touch wood, so far we've not seen any
issues.

The first thing I'd suggest is to make sure you have the absolute *latest*
firmware for the BIOS and the RAID controller (a P400, I think, in the
320s) from HP's site. We've had a number of problems in the past with
drives 'disappearing', arrays locking up, and errors with older firmware -
all of which were (finally) resolved by firmware updates. Even our latest
delivered batch of 360s and 380s didn't have anything like 'current'
firmware on them.

> When I take a disk offline and replace it with my spare, after the spare
> rebuild it shows there are numerous errors. see below:
> scrub: resilver completed with 946 errors on Thu Dec 6 15:15:32 2007

Being as they're checksum errors, they probably won't be logged on the
console (ZFS detected them, not necessarily the underlying CAM layers) -
but it's worth checking in case something "isn't happy".

With that in mind, you might also want to check whether there's anything in
common between da3 and da6 - either in the physical drives, or in where
they sit in the 320s's drive bay/box allocations, as shown by the RAID
controller config (F8 at boot time when the RAID controller is
initialising).

-Kp
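Some follow-up checks worth running after a resilver that reports errors (a
sketch; 'pool0' is a placeholder for whatever the pool is actually called):

 zpool status -v pool0    # -v lists any files affected by the errors
 zpool clear pool0        # reset the error counters once the cause is fixed
 zpool scrub pool0        # re-read and verify every block against its checksum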
[zfs-discuss] Expanding a RAIDZ based Pool...
Hi,

I've seen/read a number of articles on the net about RAIDZ and things like
dynamic striping et al. I know roughly how this works, but I can't seem to
get to the bottom of expanding existing pool space, if that's even
possible.

E.g. if I build a RAIDZ pool with 5 x 400Gb drives and later add a 6th
400Gb drive to this pool, will its space instantly be available to volumes
using that pool? (I can't quite see this working myself.)

Other articles talk about replacing one drive at a time, letting it
re-silver, and at the end, when the last drive is replaced, the space
available to volumes will reflect the new pool size (i.e. replace each
400Gb device in turn with a 750Gb device - when the last one is done,
you'll have a 5 x 750Gb pool with all the space, minus the RAIDZ overhead,
being available).

I know I can add additional RAIDZ vdevs to the pool - but that's only any
good for adding multiple drives at a time, not singles (if you want to keep
the fault tolerance).

Thanks,

-Karl
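The drive-by-drive upgrade route looks roughly like this. A sketch only:
the device names are illustrative, each resilver must complete before the
next replace starts, and behaviour varies by release - on older ZFS the
extra space appears once the last member has been replaced (sometimes after
an export/import), while newer releases also gate it behind the pool's
autoexpand property:

 zpool replace vol ad4 ad12   # swap one 400Gb member for a 750Gb drive
 zpool status vol             # wait for "resilver completed"
 # ...repeat for each remaining member of the raidz vdev...
 zpool set autoexpand=on vol  # only on releases that have this property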