On Tue, Aug 4, 2009 at 10:40 AM, erik.ableson<eable...@mac.com> wrote:
> You're running into the same problem I had with 2009.06: they have
> "corrected" a bug where the iSCSI target prior to 2009.06 didn't
> completely honor SCSI sync commands issued by the initiator.
> Some background:
> Discussion:
> http://opensolaris.org/jive/thread.jspa?messageID=388492
> "Corrected" bug:
> http://bugs.opensolaris.org/view_bug.do?bug_id=6770534

But this MUST happen. If it doesn't, you are playing Russian Roulette
with your data: a kernel panic can lose up to 1/8 of your system's RAM
worth of your iSCSI target's data (the ZFS lazy write cache)!
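If I remember the tunable right, that 1/8 figure comes from the old write
throttle's zfs_write_limit_shift (default 3, i.e. physmem >> 3). Treat this
as a hedged sketch rather than gospel, but you can peek at the live value
with mdb:

  # prints the current shift value as a decimal (3 => 1/8 of physical memory)
  echo "zfs_write_limit_shift/D" | mdb -k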

> The upshot is that unless you have an SSD (or other high-speed dedicated
> device) attached as a ZIL (or slog) on 2009.06, you won't see anywhere near
> the local-speed performance that the storage is capable of, since you're
> forcing individual transactions all the way down to disk and back up before
> moving on to the next SCSI block command.

Actually I recommend using a controller with an NVRAM cache on it, say
256 MB-512 MB (or more).

This is much faster than an SSD, and it has the advantage that the ZIL
stays striped across the pool, making ZIL reads much faster!

You don't need to use the hardware RAID; export the drives as JBOD or
individual RAID-0 LUNs and make a zpool out of them.
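A minimal sketch of what I mean (the pool name "tank" and the c1t0d0..c1t3d0
device names are just placeholders for whatever single-disk LUNs your
controller exports):

  # build the pool straight on the exported LUNs; the controller's NVRAM
  # cache absorbs the sync writes, so no separate slog device is needed
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
  zpool status tank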

> This iSCSI performance profile is currently specific to 2009.06 and does not
> occur on 2008.11.  As a stopgap (since I don't have a budget for SSDs right
> now) I'm keeping my production servers on 2008.11 (taking into account the
> additional potential risk, but these are machines with battery backed SAS
> cards in a conditioned data center). These machines are serving up iSCSI to
> ESX 3.5 and ESX 4 servers.

God I hope not.

Tick-tock: eventually you will corrupt your iSCSI data with that setup.
It's not a matter of if, it's a matter of when.

> For my freewheeling home use where everything gets tried, crashed, patched
> and put back together with baling twine (and is backed up elsewhere...) I've
> mounted a 1 GB RAM disk which is attached to the pool as a ZIL, and you see
> the performance run in cycles where the ZIL loads up to saturation, flushes
> out to disk and keeps going. I did write a script to regularly dd the RAM
> disk device out to a file so that I can recreate it with the appropriate
> signature if I have to reboot the osol box. This is used with the GlobalSAN
> initiator on OS X as well as various Windows and Linux machines, physical
> and VM.
> Assuming this is a test system that you're playing with and you can destroy
> the pool with impunity, and you don't have an SSD lying around to test with,
> try the following:
> ramdiskadm -a slog 2g (or whatever size you can manage reasonably with the
> available physical RAM - try "vmstat 1 2" to determine available memory)
> zpool add <poolname> log /dev/ramdisk/slog
> If you want to reuse the slog later (RAM disks are not preserved over
> reboot), write the slog volume out to disk and dump it back in after
> restarting:
>  dd if=/dev/ramdisk/slog of=/root/slog.dd

You might as well use a RAM disk ZIL in production with 2008.11 ZVOLs too;
the exposure is about the same.

> All of the above assumes that you are not doing this stuff against rpool.  I
> think that attaching a volatile log device to your boot pool would result in
> a machine that can't mount the root zfs volume.

I think you can re-create the ramdisk and do a replace to bring the pool online.

Just don't do it with your rpool or you will be in a world of hurt.
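Roughly like this after the reboot (an untested sketch; <poolname> is a
placeholder, the size and paths are from Erik's example above, and you may
need a "zpool clear" depending on how the pool comes back up):

  ramdiskadm -a slog 2g                        # recreate the RAM disk
  dd if=/root/slog.dd of=/dev/ramdisk/slog     # restore the saved slog image
  zpool online <poolname> /dev/ramdisk/slog    # or, against a fresh ramdisk:
  # zpool replace <poolname> /dev/ramdisk/slog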

> It's easiest to monitor from the Mac (I find), so try your test again with
> Activity Monitor showing network traffic. You'll see that it hits a
> wire-speed ceiling while it's filling up the ZIL; once the ZIL is saturated,
> your traffic will drop to near nothing and then pick up again after a few
> seconds. If you don't saturate the ZIL, you'll see a continuous, steady
> transfer rate.

I also use a network activity monitor for a quick estimate of throughput
while a test is running. Works well on Mac (Activity Monitor), Windows
(Task Manager) or Linux (take your pick: GUI tools, sysstat, ntop, etc.).
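On Linux, for example, one second of the sysstat package's network report is
enough to watch the same saturation cycles (assumes sysstat is installed):

  sar -n DEV 1     # per-interface rx/tx throughput, refreshed every second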

-Ross
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
