Re: [zfs-discuss] [osol-discuss] Moving Storage to opensolaris+zfs. What about backup?

2010-03-08 Thread erik.ableson
On 8 mars 2010, at 11:33, Svein Skogen wrote:

> Let's say for a moment I should go for this solution, with the rpool tucked 
> away on an usb-stick in the same case as the LTO-3 tapes it "matches" 
> timelinewise (I'm using HP C8017A  kits) as a zfs send -R to a file on the 
> USB stick. (If, and that's a big if, I get amanda or bacula to do a job I'm 
> comfortable with that has been verified. Not a stab at those software 
> projects, more a stab at them being an unknown entity for me), how would I go 
> about restoring:
> 
> a) the boot record
> b) the rpool (and making it actually bootable off the usb stick)
> c) the storage zpool (probably after I get the system back up after a and b, 
> but please humor me).

Assuming that you use a USB key/external drive for booting, all you need to do 
is dd it to an identically sized one while the key is not the current boot 
volume (dd if=/dev/disk1 of=/dev/disk2 while on your computer), and there you 
have your boot record and your rpool. Stick in the backup key, tell your BIOS 
to boot from a USB device and you're running. This requires downtime while 
you're creating the copy.

If the disks that make up your storage zpool are still available, it will 
probably automount without any difficulty, or worst case, you'll need to do a 
zpool import -f <poolname>.  Note that this also brings over all of your zfs 
based sharing configuration (sharenfs & sharesmb) so your clients are back 
online with a minimum of fuss.
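
For example, something along these lines (the pool name is just a placeholder):

# list the pools visible on the attached disks
pfexec zpool import
# import the storage pool, forcing it if it wasn't cleanly exported
pfexec zpool import -f tank
# confirm that the sharing properties came across with the pool
zfs get sharenfs,sharesmb tank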

No zfs send/recv required in this scenario. Note that there are no dependencies 
between the boot pool and the storage pool.  No timeline matching to worry 
about. Think of data backup and boot volume backup as two entirely distinct 
operations to manage.

In a worst case, ie, you lost the whole machine, you have a boot key and you've 
bought new disks.  The boot process is still the same with no tapes or files 
involved. In this case you'll need to create a new zpool from your new disks 
and restore the data.  The restore process depends on your backup process.  If 
you're using amanda or bacula, you create new zfs filesystems and restore to 
them as per the tool in question.  If you've ignored the current advice and are 
using zfs send streams to tape, you'll start with your baseline tape file and 
pipe the file to zfs recv and the name of the destination filesystem you want 
to create. And pray that there are no errors reading from the tape.
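
For what it's worth, that restore would look roughly like this (tape device, block 
size and dataset names are all assumptions to adapt):

# read the baseline stream back off the tape (use the block size it was written with)
dd if=/dev/rmt/0n bs=1048576 | pfexec zfs receive tank/restored
# then replay any incremental streams from the following tape files, in order
dd if=/dev/rmt/0n bs=1048576 | pfexec zfs receive tank/restored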

If you're using zfs send/recv to some other kind of external storage like USB 
drives, you just plug them in, zpool import and be back in business right away 
with the option to do a send/recv to clone the filesystems to the new disks.

Or you can go the traditional route (no downtime for the backup process of the 
boot volume), the instructions at: 

 are quite detailed as to the process involved for both backing up to file and 
restoring.

Erik

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect

2010-03-10 Thread erik.ableson
I've found that the NFS host-based settings require the FQDN, and that the 
reverse lookup must be available in your DNS.

Try "rw,root=host1.mydomain.net"

Cheers,

Erik
On 10 mars 2010, at 05:47, mingli wrote:

> And I update the sharenfs option with "rw,root=@100.198.100.0/24", it works 
> fine, and the NFS client can do the write without error.
> 
> Thanks.
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can we get some documentation on iSCSI sharing after comstar took over?

2010-03-16 Thread erik.ableson

On 16 mars 2010, at 21:00, Marc Nicholas wrote:
> On Tue, Mar 16, 2010 at 3:16 PM, Svein Skogen  wrote:
> 
> > I'll write you a Perl script :)
> 
> I think there are ... several people that'd like a script that gave us
> back some of the ease of the old shareiscsi one-off, instead of having
> to spend time on copy-and-pasting GUIDs they have ... no real use for. ;)
> 
> 
> I'll try and knock something up in the next few days, then!

Try this :

http://www.infrageeks.com/groups/infrageeks/wiki/56503/zvol2iscsi.html
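
If you'd rather do it by hand, the underlying COMSTAR steps are roughly these (the 
zvol name is a placeholder, and the view can be tightened up with host/target groups):

# make sure the target framework and iSCSI target services are running
pfexec svcadm enable stmf
pfexec svcadm enable -r svc:/network/iscsi/target:default
# register the zvol as a logical unit (prints the GUID you'll need)
pfexec sbdadm create-lu /dev/zvol/rdsk/tank/myvol
# expose the LU (here to everyone - use host groups to restrict)
pfexec stmfadm add-view <GUID reported by sbdadm>
# and create an iSCSI target for the initiators to log into
pfexec itadm create-target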

Cheers,

Erik

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread erik.ableson
An interesting thing I just noticed here testing out some Firewire drives with 
OpenSolaris. 

Setup :
OpenSolaris 2009.06 and a dev version (snv_129)
2 500Gb Firewire 400 drives with integrated hubs for daisy-chaining (net: 4 
devices on the chain)
- one SATA bridge
- one PATA bridge

Created a zpool with both drives as simple vdevs
Started a zfs send/recv to backup a local filesystem

Watching zpool iostat I see that the total throughput maxes out at about 
10MB/s.  Thinking that one of the drives may be at fault, I stopped, destroyed 
the pool and created two separate pools, one from each drive. Restarted the 
send/recv to one disk and saw the same max throughput.  Tried the other and 
got the same thing.

Then I started one send/recv to one disk, got the max right away, and started 
a send/recv to the second one and got about 4MB/second while the first 
operation dropped to about 6MB/second.
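
For reference, the test was essentially this (device and dataset names made up for 
the example):

# one pool per Firewire disk
pfexec zpool create fw1 c10t0d0
pfexec zpool create fw2 c11t0d0
# replicate local filesystems to the Firewire pools and watch the throughput
pfexec zfs send tank/data@backup | pfexec zfs recv fw1/data &
pfexec zfs send tank/media@backup | pfexec zfs recv fw2/media &
zpool iostat fw1 fw2 5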

It would appear that the bus bandwidth is limited to about 10MB/sec (~80Mbps) 
which is well below the theoretical 400Mbps that 1394 is supposed to be able to 
handle.  I know that these two disks can go significantly higher since I was 
seeing 30MB/sec when they were used on Macs previously in the same daisy-chain 
configuration.

I get the same symptoms on both the 2009.06 and the b129 machines.

It's not a critical issue to me since these drives will eventually just be used 
for send/recv backups over a slow link, but it doesn't augur well for the day I 
need to restore data...

Anyone else seen this behaviour with Firewire devices and OpenSolaris?

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread erik.ableson
On 18 mars 2010, at 16:58, David Dyer-Bennet wrote:
> On Thu, March 18, 2010 04:50, erik.ableson wrote:
> 
>> It would appear that the bus bandwidth is limited to about 10MB/sec
>> (~80Mbps) which is well below the theoretical 400Mbps that 1394 is
>> supposed to be able to handle.  I know that these two disks can go
>> significantly higher since I was seeing 30MB/sec when they were used on
>> Macs previously in the same daisy-chain configuration.
>> 
>> I get the same symptoms on both the 2009.06 and the b129 machines.
> 
> While it wasn't on Solaris, I must say that I've been consistently
> disappointed by the performance of external 1394 drives on various Linux
> boxes.  I invested in the interface cards for the boxes, and in the
> external drives that supported Firewire, because everything said it
> performed much better for disk IO, but in fact I  have never found it to
> be the case.
> 
> Sort-of-glad to hear I don't have to wonder if I should be trying it on
> Solaris.

Ditto on the Linux front.  I was hoping that Solaris would be the exception, 
but no luck.  I wonder if Apple wouldn't mind lending one of the driver 
engineers to OpenSolaris for a few months...

Hmmm - that makes me wonder about the Darwin drivers - they're open sourced if 
I remember correctly.

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread erik.ableson
On 18 mars 2010, at 15:51, Damon Atkins wrote:

> A system with 100TB of data is 80% full and a user asks their local 
> system admin to restore a directory with large files, as it was 30 days ago, 
> with all Windows/CIFS ACLs and NFSv4 ACLs etc.
> 
> If we used zfs send, we need to go back to a zfs send from some 30 days ago, and 
> find 80TB of disk space to be able to restore it.
> 
> zfs send/recv is great for copying a zfs file system to another file 
> system, even across servers. 

Bingo ! The zfs send/recv scenario is for backup to another site or server.  
Backup in this context being a second copy stored independently from the 
original/master.

In one scenario here, we have individual sites that have zvol backed iSCSI 
volumes based on small, high performance 15K disks in mirror vdevs for the best 
performance.  I only keep about a week of daily snapshots locally.  I use ZFS 
send/recv to a backup system where I have lots of cheap, slow SATA drives in 
RAIDZ6 where I can afford to accumulate a lot more historical snapshots.

The interest is that you can use the same tools in an asymmetric manner, with 
high performance primary systems and one or a few big slow systems to store 
your backups.
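
The mechanics are just a scheduled incremental send piped over ssh; a rough sketch 
with invented pool and host names:

# yesterday's snapshot already exists on both sides
pfexec zfs snapshot -r fastpool/vols@2010-03-18
pfexec zfs send -R -i fastpool/vols@2010-03-17 fastpool/vols@2010-03-18 | \
    ssh backuphost pfexec zfs recv -d backuppool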

Now for instances where I need to go back and get a file back off an NFS 
published filesystem, I can just go browse the .zfs/snapshot directory as 
required - or search for it or whatever I want. It's a live filesystem, not an 
inert object, dependent on external indices and hardware. I think that this is 
the fundamental disconnect in these discussions where people's ideas (or 
requirements) of what constitutes "a backup" are conflicting.

There are two major reasons and types of backups : one is to be able to 
minimize your downtime and get systems running again as quickly as possible. 
(the server's dead - make it come back!). The other is the ability to go back 
in time and rescue data that has become lost, corrupted or otherwise 
unavailable often with very granular requirements. (I need this particular 12K 
file from August 12, 2009) 

For my purposes, most of my backup strategies are oriented towards Business 
Uptime and minimal RTO. Given the data volume I work with using lots of virtual 
machines, tape is strictly an archival tool.  I just can't restore fast enough, 
and it introduces way too many mechanical dependencies into the process (well I 
could if I had an unlimited budget).  I can restart entire sites from a backup 
system by cloning a filesystem off a backup snapshot and presenting the volumes 
to the servers that need it. Granted, I won't have the performance of a primary 
site, but it will work and people can get work done. This responds to the first 
requirement of minimal downtime.

Going back in time is accomplished via lots of snapshots on the backup storage 
system. Which I can afford since I'm not using expensive disks here.

Then you move up the stack into the contents of the volumes and here's where 
you use your traditional backup tools to get data off the top of the stack - 
out of the OS that's handling the contents of the volume and that understands its 
particularities regarding ACLs and private volume formats like VMFS. 

zfs send/recv is for cloning data off the bottom of the stack without requiring 
the least bit of knowledge about what's happening on top. It's just like using 
any of the asynchronous replication tools that are used in SANs. And they make 
no bones about the fact that they are strictly a block-level thing and don't 
even ask them about the contents. At best, they will try to coordinate 
filesystem snapshots and quiescing operations with the block level snapshots.

Other backup tools take your data off the top of the stack in the context where 
it is used with a fuller understanding of the issues of stuff like ACLs.

When dealing with zvols, ZFS should have no responsibility in trying to 
understand what you do in there other than supplying the blocks.  VMFS, NTFS, 
btrfs, ext4, HFS+, XFS, JFS, ReiserFS and that's just the tip of the iceberg...

ZFS has muddied the waters by straddling the SAN and NAS worlds.

> But there needs to be a tool:
> * To restore an individual file or a zvol (with all ACLs/properties)
> * That allows backup vendors (which place backups on tape or disk or CD or 
> ..) build indexes of what is contain in the backup (e.g. filename, owner, 
> size modification dates, type (dir/file/etc) )
> *Stream output suitable for devices like tape drives.
> *Should be able to tell if the file is corrupted when being restored.
> *May support recovery of corrupt data blocks within the stream.
> *Preferable gnutar command-line compatible
> *That admins can use to backup and transfer a subset of files e.g user home 
> directory (which is not a file system) to another server or on to CD to be 
> sent to their new office location, or 

Highly incomplete and in no particular order :
Backup Exec
NetBackup
Bacula
Amanda/Zmanda
Retrospect
Avamar
Ar

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-19 Thread erik.ableson
On 19 mars 2010, at 17:11, Joerg Schilling wrote:

>> I'm curious, why isn't a 'zfs send' stream that is stored on a tape a backup, yet 
>> the implication is that a tar archive stored on a tape is considered a 
>> backup ?
> 
> You cannot get a single file out of the zfs send datastream.

zfs send is a block-level transaction with no filesystem dependencies - it 
could be transmitting a couple of blocks that represent a portion of a file, 
not necessarily an entire file.  And since it can also be used to host a zvol 
with any filesystem format imaginable it doesn't want to know.

Going back to star as an example - from the man page :

"Star archives and extracts multiple files to and from a single file called a 
tarfile. A tarfile is usually a magnetic tape, but it can be any file. In all 
cases, appearance of a directory name refers to the files and (recursively) 
subdirectories of that directory."

This process pulls files (repeat: files! not blocks) off of the top of a 
filesystem so it needs to be presented a filesystem with interpretable file 
objects (like almost all backup tools). ZFS confuses the issue by integrating 
volume management with filesystem management. zfs send is dealing with the 
volume and the blocks that represent the volume without any file-level 
dependence.

It addresses an entirely different type of backup need, that is to be able to 
restore or mirror (especially mirror to another live storage system) an entire 
volume at a point in time.  It does not replace the requirement for file-level 
backups which deal with a different level of granularity. Simply because the 
restore use-case is different.

For example, on my Mac servers, I run two different backup strategies 
concurrently - one is a bootable clone from which I can restart the computer 
immediately in the case of a drive failure.  At the same time, I use the Time 
Machine backups for file level granularity that allows me to easily find a 
particular file at a particular moment. Before Time Machine, this role was 
fulfilled with Retrospect to a tape drive.  However, a block-level dump to tape 
would be of little use for the first use case since the objective is to minimize 
the RTO.

For disaster recovery purposes any of these backup objects can be externalized. 
Offsite rotation of the disks used allows management of the RPO. 

Remember that files exist in a filesystem context and need to be backed up in 
this context.  Volumes exist in another context and can be replicated/backed up 
in this context.

zfs send/recv =  EMC MirrorView, NetApp Snap Mirror, EqualLogic 
Auto-replication, HP StorageWorks Continuous Access, DataCore AIM, etc.
zfs send/recv ≠ star, Backup Exec, CommVault, ufsdump, bacula, zmanda, 
Retrospect, etc.

Erik

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-09 Thread erik.ableson
No idea about the build quality, but is this the sort of thing you're looking 
for?

Not cheap, integrated RAID (sigh), but one cable only
http://www.pc-pitstop.com/das/fit-500.asp

Cheap, simple, 4 eSATA connections on one box
http://www.pc-pitstop.com/sata_enclosures/scsat4eb.asp

Still cheap, uses 4x SFF-8470 for a single cable connection
http://www.pc-pitstop.com/sata_enclosures/scsat44xb.asp

Slightly more expensive, but integrated port multiplier means one standard 
eSATA cable required
http://www.pc-pitstop.com/sata_port_multipliers/scsat05b.asp

On 9 avr. 2010, at 15:14, Edward Ned Harvey wrote:

> I am now in the market to try and identify any *well made* external
> enclosures.  The best I've seen so far is the Dell RD1000, but we're talking
> crazy overpriced, and hard drives that are too small to be useful to me.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for full stystem backup - equivelent of ufsdump/ufsrestore

2010-04-30 Thread erik.ableson

On 30 avr. 2010, at 13:47, Euan Thoms wrote:

> Well I'm so impressed with zfs at the moment! I just got steps 5 and 6 (from 
> my last post) to work, and it works well. Not only does it send the increment 
> over to the backup drive, the latest increment/snapshot appears in the 
> mounted filesystem. In nautilus I can browse an exact copy of my PC, from / 
> to the deepest parts of my home folder. And it will backup my entire system 
> in 1-2 minutes, AMAZING!!
> 
> Below are the steps, try it for yourself on a spare USB HDD:
> 
> # Create backup storage pool on drive c12t0d0
> pfexec zpool create backup-pool c12t0d0
> # Recursively snapshot the root pool (rpool)
> pfexec zfs snapshot -r rpool@first
> 
> # Send the entire pool and all its snapshots to the backup pool, disable 
> mounting
> pfexec zfs send rpool@first | pfexec zfs receive -u backup-pool/rpool
> [snip]
> pfexec zfs send rpool/export/home/euan/VBOX-HDD@first | pfexec zfs 
> receive -u backup-pool/rpool/export/home/euan/VBOX-HDD
> 
> # Take second snapshot at a later point in time
> pfexec zfs snapshot -r rpool@second
> 
> # Send the increments to the backup pool
> pfexec zfs send -i rpool/ROOT@first rpool/ROOT@second | pfexec zfs recv -F 
> backup-pool/rpool/ROOT
> [snip]
> pfexec zfs send -i rpool/export/home/euan/Downloads@first 
> rpool/export/home/euan/Downloads@second | pfexec zfs recv -F 
> backup-pool/rpool/export/home/euan/Downloads

Just a quick comment for the send/recv operations, adding -R makes it recursive 
so you only need one line to send the rpool and all descendant filesystems. 

I use the send/recv operations for all sorts of backup operations. For the 
equivalent of a "full backup" of my boot volumes :
NOW=`date +%Y-%m-%d_%H-%M-%S`
pfexec /usr/sbin/zfs snapshot -r rpool@$NOW
pfexec /usr/sbin/zfs send -R rpool@$NOW | /usr/bin/gzip > /mnt/backups/rpool.$NOW.zip
pfexec /usr/sbin/zfs destroy -r rpool@$NOW

But for any incremental transfers it's better to recv to an actual filesystem 
that you can scrub and confirm that the stream made it over OK.
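
i.e. rather than a stream file, something along these lines (reusing the backup-pool 
layout from above):

pfexec zfs send -R -i rpool@first rpool@second | pfexec zfs recv -Fu backup-pool/rpool
# then verify the copy on the receiving side
pfexec zpool scrub backup-pool
zpool status -v backup-pool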

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Tips for ZFS tuning for NFS store of VM images

2010-07-29 Thread erik.ableson
Hmmm, that's odd. I have a number of VMs running on NFS (hosted on ESX, rather 
than Xen) with no problems at all. I did add a SLOG device to get performance 
up to a reasonable level, but it's been running flawlessly for a few months 
now. Previously I was using iSCSI for most of the connections, but with the 
addition of the SLOG device NFS has become feasible.

All I'm using on the OSOL side is sharenfs=anon=0 and adding the server's 
addresses to /etc/hosts to permit access. Running osol2009.06.

Cheers,

Erik

On 28 juil. 2010, at 21:11, sol wrote:

> Richard Elling wrote:
>> Gregory Gee wrote:
>>> I am using OpenSolaris to host VM images over NFS for XenServer.  I'm 
>>> looking 
>> for tips on what parameters can be set to help optimize my ZFS pool that 
>> holds 
>> my VM images.
>> There is nothing special about tuning for VMs, the normal NFS tuning applies.
> 
> 
> That's not been my experience. Out of the box VMware server would not work 
> with 
> the VMs stored on a zfs pool via NFS. I've not yet found out why but the 
> analytics showed millions of getattr/access/lookup compared to read/write.
> 
> A partial workaround was to turn off access time on the share and to mount 
> with 
> noatime,actimeo=60
> 
> But that's not perfect because when left alone the VM got into a "stuck" 
> state. 
> I've never seen that state before when the VM was hosted on a local disk. 
> Hosting VMs on NFS is not working well so far...
> 
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-30 Thread erik.ableson

Heh :-)  Disk usage is directly related to available space.

At home I have a 4x1Tb raidz filled to overflowing with music, photos,  
movies, archives, and backups for 4 other machines in the house. I'll  
be adding another 4 and an SSD shortly.


It starts with importing CDs into iTunes or WMP, then comes the TV  
recordings, then comes ripping your DVD collection... Hey disk is  
cheap, right?


Once you have gotten out  of the habit of using shiny discs for music,  
video is a logical progression. You also stop being finicky about  
minimizing file space - I've gone from high quality mp3 to lossless  
formats.


I also have some colleagues that have Flip Minos and equivalents that  
capture 720p video and that just chew through disk space. Those 12MB  
shots of baby taking his/her first steps are now multi-gigabyte raw  
video files.


Trust me, it's easy.

Erik

On 30 sept. 2009, at 16:48, David Dyer-Bennet wrote:

I can see that people heavily active in live audio or (especially)  
video
recording would fill disks considerably faster than my still  
photography
does (about 12MB per image, before I start editing it and storing  
extra
copies).  But I have to say that I'm finding the size NAS boxes  
people are
building for what they call "home use" to be rather startling.  I'm  
using
4 400GB disks with 100% redundancy; lots of people are talking about  
using
8 or more 1TB or bigger disks with 25% redundancy.  That's a hugely bigger 
pool!  Do you actually fill up that space?  With what?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Incremental snapshot size

2009-09-30 Thread erik.ableson
Depending on the data content that you're dealing with, you can compress the  
snapshots inline with the send/receive operations by piping the data  
through gzip.  Given that we've been talking about 500Mb text files,  
this seems to be a very likely solution. There was some mention in the  
Kernel Keynote in Australia of inline deduplication, ie  
compression :-) in the zfs send stream. But there remains the question  
of references to deduplicated blocks that no longer exist on the  
destination.
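
A sketch of that approach (filesystem and file names invented for the example):

# compressed full stream to a file
pfexec zfs send tank/docs@monday | gzip -c > /backup/docs.monday.zfs.gz
# compressed incremental between two snapshots
pfexec zfs send -i tank/docs@monday tank/docs@tuesday | gzip -c > /backup/docs.mon-tue.zfs.gz
# and the restore is simply the reverse
gzcat /backup/docs.monday.zfs.gz | pfexec zfs recv tank/docs-restore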


Noting that ZFS deduplication will eventually help in diminishing the  
overall volume you have to treat since, while the output of the text  
editor will be written to different physical blocks, many of these blocks  
will be identical to previously stored blocks (which will also be kept  
since they exist in snapshots), so the send/receive operations  
will consist of a lot more block references rather than complete blocks.


Erik

PS - this is pretty much the operational mode of all products that use  
snapshots.  It's even worse on a lot of other storage systems where  
the snapshot content must be written to a specific reserved volume  
(which is often very small compared to the main data store) rather  
than the host pool. Until deduplication becomes the standard method of  
managing blocks, the volume of data required by this use case will not  
change.


On 30 sept. 2009, at 16:35, Brian Hubbleday wrote:

I took binary dumps of the snapshots taken in between the edits and  
this showed that there was actually very little change in the block  
structure, however the incremental snapshots were very large. So the  
conclusion I draw from this is that the snapshot simply contains  
every written block since the last snapshot regardless of whether  
the data in the block has changed or not.


Okay so snapshots work this way, I'm simply suggesting that things  
could be better.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD value [was SSD over 10gbe not any faster than 10K SAS over GigE]

2009-10-13 Thread erik.ableson

On 13 oct. 2009, at 15:24, Derek Anderson wrote:

Simple answer:  Man hour math.  I have 150 virtual machines on these  
disks for shared storage.  They hold no actual data so who really  
cares if they get lost.  However 150 users of these virtual machines  
will save 5 minutes or so every day of work, which translates to  
$250.   So $3,000 in SSD's which are easily replaced one by one with  
zfs saves the company $250,000 in labor.  So when I replace these  
drives in 6 months, for somewhere around $1500 its a fantastic deal.


Overall, I think this is a reasonable model for the medium sized  
enterprise to work with. In most cases the mythical 5 minutes saved  
will be invisible to the overall operations, and difficult to justify  
to management, but if you can squeeze it into an annual operating  
budget rather than a capital expense that requires separate  
justification you should be good.


The only bad part is I cannot estimate how much life the old disks  
have left because in a few months, I am going to have a  
handful of the fastest SSD's around and not sure if I would trust  
them for much of anything.


As for what to do with the SSDs - you can resell them or give them to  
employees (being clear on their usage and provenance) since they  
represent a risk in a high volume enterprise environment, but could   
probably supply several years worth of service in a single-user mode.  
I'd be very happy to get a top of the line SSD at half price for my  
laptop for a year's projected use...knowing of course that I backup  
daily as a matter of religious observance :-)


Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dumb idea?

2009-10-26 Thread erik.ableson
Or in OS X with smart folders where you define a set of search terms  
and as write operations occur on the known filesystems the folder  
contents will be updated to reflect the current state of the attached  
filesystems


The structures you defined seemed to be designed around the idea of  
reductionism (ie - subfolders representing a subset of the parent)  
which cannot currently be implemented in Libraries or Smart folders  
since the contents are read-only listings.  I don't know for sure  
about the Win7 Libraries behaviour though - it might be more  
permissive in this respect...


Erik

On 25 oct. 2009, at 20:48, j...@lentecs.com wrote:

This actually sounds a little like what ms is trying to accomplish,  
in win7, with libraries.  They will act as standard folders if you  
treat them as such.  But they are really designed to group different  
pools of files into one easy place.  You just have to configure it  
to pull from local and remote sources.  I have heard it works well  
with win home server, and win7 networks.


Its also similar to what google and the like are doing with their  
web crawlers.


But I think this is something better left to run on top of the file  
system.  Rather than integrated into the file system.  A true  
database and "crawling bot" would seem to be the better method of  
implementing this.


--Original Message--
From: Orvar Korvar
Sender: zfs-discuss-boun...@opensolaris.org
To: zfs Discuss
Subject: [zfs-discuss] Dumb idea?
Sent: Oct 24, 2009 8:12 AM

Would this be possible to implement ontop ZFS? Maybe it is a dumb  
idea, I dont know. What do you think, and how to improve this?


Assume all files are put in the zpool, helter skelter. And then you  
can create arbitrary different filters that shows you the files you  
want to see.


As of now, you have files in one directory structure. This makes the  
organization of the files, hardcoded. You have /Movies/Action and  
that is it. But if you had all movies in one large zpool, and if you  
could programmatically define different structures that act as  
filters, you could have different directory structures.


Programmatically defined directory structure1, that acts on the zpool:
/Movies/Action

Programmatically defined directory structure2:
/Movies/Actors/AlPacino

etc.

Maybe this is what MS WinFS was about? Maybe tag the files? Maybe a  
relational database ontop ZFS? Maybe no directories at all? I dont  
know, just brain storming. Is this is a dumb idea? Or old idea?

--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Sent from my BlackBerry® smartphone with SprintSpeed
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Dedup memory overhead

2010-01-21 Thread erik.ableson
Hi all,

I'm going to be trying out some tests using b130 for dedup on a server with 
about 1.7TB of useable storage (14x146GB in two raidz vdevs of 7 disks).  What 
I'm trying to get a handle on is how to estimate the memory overhead required 
for dedup on that amount of storage.  From what I gather, the dedup hash keys 
are held in ARC and L2ARC and as such are in competition for the available 
memory.

So the question is how much memory or L2ARC would be necessary to ensure that 
I'm never going back to disk to read out the hash keys. Better yet would be 
some kind of algorithm for calculating the overhead. eg - averaged block size 
of 4K = a hash key for every 4k stored and a hash occupies 256 bits. An 
associated question is then how does the ARC handle competition between hash 
keys and regular ARC functions?

Based on these estimations, I think that I should be able to calculate the 
following:
1.7             TB
1740.8          GB
1782579.2       MB
1825361100.8    KB
4               average block size (KB)
456340275.2     blocks
256             hash key size (bits)
1.16823E+11     hash key overhead (bits)
14602888806.4   hash key overhead (bytes)
14260633.6      hash key overhead (KB)
13926.4         hash key overhead (MB)
13.6            hash key overhead (GB)
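
The same arithmetic as a quick bc one-liner if you want to play with other block 
sizes (this counts only the 256-bit checksums themselves, not any other per-entry 
overhead):

echo 'scale=1; blocks=(1.7*1024^3)/4; blocks*32/1024^3' | bc
# -> 13.6  (GB of 32-byte hash keys for 1.7TB of 4KB blocks)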

Of course the big question on this will be the average block size - or better 
yet - to be able to analyze an existing datastore to see just how many blocks 
it uses and what is the current distribution of different block sizes. I'm 
currently playing around with zdb with mixed success  on extracting this kind 
of data. That's also a worst case scenario since it's counting really small 
blocks and using 100% of available storage - highly unlikely. 

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, 
flags 0x0

Object  lvl   iblk   dblk   dsize   lsize   %full  type
     0    7    16K    16K   57.0K     64K   77.34  DMU dnode
     1    1    16K     1K   1.50K      1K  100.00  ZFS master node
     2    1    16K    512   1.50K     512  100.00  ZFS delete queue
     3    2    16K    16K   18.0K     32K  100.00  ZFS directory
     4    3    16K   128K    408M    408M  100.00  ZFS plain file
     5    1    16K    16K   3.00K     16K  100.00  FUID table
     6    1    16K     4K   4.50K      4K  100.00  ZFS plain file
     7    1    16K  6.50K   6.50K   6.50K  100.00  ZFS plain file
     8    3    16K   128K    952M    952M  100.00  ZFS plain file
     9    3    16K   128K    912M    912M  100.00  ZFS plain file
    10    3    16K   128K    695M    695M  100.00  ZFS plain file
    11    3    16K   128K    914M    914M  100.00  ZFS plain file
 
Now, if I'm understanding this output properly, object 4 is composed of 128KB 
blocks with a total size of 408MB, meaning that it uses 3264 blocks.  Can 
someone confirm (or correct) that assumption? Also, I note that each object  
(as far as my limited testing has shown) has a single block size with no 
internal variation.

Interestingly, all of my zvols seem to use fixed size blocks - that is, there 
is no variation in the block sizes - they're all the size defined on creation 
with no dynamic block sizes being used. I previously thought that the -b option 
set the maximum size, rather than fixing all blocks.  Learned something today 
:-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

Object  lvl   iblk   dblk   dsize   lsize   %full  type
     0    7    16K    16K   21.0K     16K    6.25  DMU dnode
     1    1    16K    64K       0     64K    0.00  zvol object
     2    1    16K    512   1.50K     512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, 
flags 0x0

Object  lvl   iblk   dblk   dsize   lsize   %full  type
     0    7    16K    16K   21.0K     16K    6.25  DMU dnode
     1    5    16K     8K    240G    250G   97.33  zvol object
     2    1    16K    512   1.50K     512  100.00  zvol prop

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup memory overhead

2010-01-22 Thread erik.ableson

On 21 janv. 2010, at 22:55, Daniel Carosone wrote:

> On Thu, Jan 21, 2010 at 05:04:51PM +0100, erik.ableson wrote:
> 
>> What I'm trying to get a handle on is how to estimate the memory
>> overhead required for dedup on that amount of storage.   
> 
> We'd all appreciate better visibility of this. This requires:
> - time and observation and experience, and
> - better observability tools and (probably) data exposed for them

I'd guess that since every written block is going to go and ask for the hash 
keys, this should result in this data living in the ARC based on the MFU 
ruleset.  The theory being that as a result if I can determine the maximum 
memory requirement for these keys, I know what my minimum memory baseline 
requirements will be to guarantee that I won't be caught short.

>> So the question is how much memory or L2ARC would be necessary to
>> ensure that I'm never going back to disk to read out the hash keys. 
> 
> I think that's a wrong-goal for optimisation.
> 
> For performance (rather than space) issues, I look at dedup as simply
> increasing the size of the working set, with a goal of reducing the
> amount of IO (avoided duplicate writes) in return.

True.  But as a practical aspect, we've seen that overall performance drops off 
the cliff if you overstep your memory bounds and the system is obliged to go to 
disk to evaluate a new block to write against the hash keys. Compounded by the 
fact that the ARC is full so it's obliged to go straight to disk, further 
exacerbating the problem.

It's this particular scenario that I'm trying to avoid and from a business 
aspect of selling ZFS based solutions (whether to a client or to an internal 
project) we need to be able to ensure that the performance is predictable with 
no surprises.

Realizing of course that all of this is based on a slew of uncontrollable 
variables (size of the working set, IO profiles, ideal block sizes, etc.).  The 
empirical approach of "give it lots and we'll see if we need to add an L2ARC 
later" is not really viable for many managers (despite the fact that the real 
world works like this).

> The trouble is that the hash function produces (we can assume) random
> hits across the DDT, so the working set depends on the amount of
> data and the rate of potentially dedupable writes as well as the
> actual dedup hit ratio.  A high rate of writes also means a large
> amount of data in ARC waiting to be written at the same time. This
> makes analysis very hard (and pushes you very fast towards that very
> steep cliff, as we've all seen). 

I don't think  it would be random since _any_ write operation on a deduplicated 
filesystem would require a hash check, forcing them to live in the MFU.  
However I agree that a high write rate would result in memory pressure on the 
ARC which could result in the eviction of the hash keys. So the next factor to 
include in memory sizing is the maximum write rate (determined by IO 
availability). So with a team of two GbE cards, I could conservatively say that 
I need to size for inbound write IO of 160MB/s, worst case accumulated for the 
30 second flush cycle so, say about 5GB of memory (leaving aside ZIL issues 
etc.). Noting that this is all very back of the napkin estimations, and I also 
need to have some idea of what my physical storage is capable of ingesting 
which could add to this value.

> I also think a threshold on the size of blocks to try deduping would
> help.  If I only dedup blocks (say) 64k and larger, i might well get
> most of the space benefit for much less overhead.

Well - since my primary use case is iSCSI presentation to VMware backed by 
zvols and I can manually force the block size on volume creation to 64K, this 
reduces the unpredictability a little bit. That's based on the hypothesis that 
zvols use a fixed block size.
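
e.g. (zvol name and size are placeholders):

# fixed 64K volblocksize for a VMware-facing zvol
pfexec zfs create -V 200G -b 64K tank/esx-datastore01
zfs get volblocksize tank/esx-datastore01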
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 2.5" JBOD

2010-01-25 Thread erik.ableson
On 24 janv. 2010, at 08:36, Erik Trimble wrote:

> These days, I've switched to 2.5" SATA laptop drives for large-storage 
> requirements.
> They're going to cost more $/GB than 3.5" drives, but they're still not 
> horrible ($100 for a 500GB/7200rpm Seagate Momentus).  They're also easier to 
> cram large numbers of them in smaller spaces, so it's easier to get larger 
> number of spindles in the same case. Not to mention being lower-power than 
> equivalent 3.5" drives.

Ditto.  After doing a quick check of the power consumption of various drives 
it's clear that 2.5" drives are significantly less power-hungry, and with 500GB 
drives it's entirely reasonable for many workloads as far as capacity 
requirements go. Even with 5400RPM mechanisms, it's more than enough for most 
home server IOPS requirements, especially if you throw a few more axes at the 
server.

> My sole problem is finding well-constructed high-density 2.5" hot-swap 
> bay/chassis setups. 
> If anyone has a good recommendation for a 1U or 2U JBOD chassis for 2.5" 
> drives, that would really be helpful.

Not cheap, but I've used the HP MSA70 a while ago and was quite happy with the 
results.

And Dell has recently joined the crowd with the MD1120.  I've used the MD1000 
enclosures with 3.5" drives in many installations.

Although both of those models talk about supporting Nearline SATA so I don't 
know if they'll take a regular off the shelf SATA laptop drive.

Outside of that range, I've recently been looking at rebuilding my home storage 
server with a full-sized tower and filling in the front-facing bays with 
multiple SuperMicro 8-in-2 chassis, which include a 2x expander and can be 
cascaded internally, so you should be able to add modules as capacity 
requirements grow.

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs destroy hangs machine if snapshot exists- workaround found

2010-01-27 Thread erik.ableson

On 27 janv. 2010, at 12:10, Georg S. Duck wrote:

> Hi,
> I was suffering for weeks from the following problem:
> a zfs dataset contained an automatic snapshot (monthly) that used 2.8 TB of 
> data. The dataset was deprecated, so I chose to destroy it after I had 
> deleted some files; eventually it was completely blank besides the snapshot 
> that still locked 2.8 TB on the pool.
> 
> 'zfs destroy -r pool/dataset'
> 
> hung the machine within seconds to be completely unresponsive. No respective 
> messages could be found in logs. The issue was reproducible.
> The same happened for 
> 'zfs destroy pool/dataset@snapshot'
> 
> Thus, the conclusion was that the snapshot was indeed the problem. 

For info, I have exactly the same situation here with a snapshot that cannot be 
deleted that results in the same symptoms.  Total freeze, even on the console.  
Server responds to pings, but that's it. All iSCSI, NFS and ssh connections are 
cut. Currently running b130.

I'll try the workaround once I get some spare space to migrate the contents.

Erik


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-09-16 Thread erik.ableson

On 15 sept. 2010, at 22:04, Mike Mackovitch wrote:

> On Wed, Sep 15, 2010 at 12:08:20PM -0700, Nabil wrote:
>> any resolution to this issue?  I'm experiencing the same annoying
>> lockd thing with mac osx 10.6 clients.  I am at pool ver 14, fs ver
>> 3.  Would somehow going back to the earlier 8/2 setup make things
>> better?
> 
> As noted in the earlier thread, the "annoying lockd thing" is not a
> ZFS issue, but rather a networking issue.
> 
> FWIW, I never saw a resolution.  But the suggestions for how to debug
> situations like this still stand:

And for reference, I have a number of 10.6 clients using NFS for sharing Fusion 
virtual machines, iTunes library, iPhoto libraries etc. without any issues.

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Slow zfs import solved (beware iDRAC/ILO)

2010-10-13 Thread erik.ableson
Just a note to pass on in case anyone runs into the same situation.

I have a DELL R510 that is running just fine, up until the day that I needed to 
import a pool from a USB hard drive. I plug in the disk, check it with rmformat 
and try to import the zpool.  And it sits there for practically forever, not 
responding. The machine still responds to network connections etc., it's just 
the import command that takes forever.

After poking around with truss zpool import, I discovered that there were some 
devices that were taking forever to check/enumerate and that they didn't 
correspond to any devices I recognized. Finally I tweaked to the following:

from cfgadm -lv
Mfg: Avocent  Product: USB Composite Device-0  NConfigs: 1  Config: 0  

Which is the supplier of the DELL components for the iDRAC management tools.

It turns out that the Virtual Media options were set to "Attach" in the BIOS so 
on a zpool import it was trying to read from these devices and taking between 
3-10 minutes to timeout per slice/partition to check.

Setting the Virtual Media to "Detach" and the OS will no longer see these 
devices and zpool importing works just fine.

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-14 Thread erik.ableson

On 13 oct. 2010, at 18:37, Marty Scholes wrote:

> The only thing that still stands out is that network operations (iSCSI and 
> NFS) to external drives are slow, correct?
> 
> Just for completeness, what happens if you scp a file to the three different 
> pools?  If the results are the same as NFS and iSCSI, then I think the 
> network can be ruled out.
> 
> I would be leaning toward thinking there is some mismatch between the network 
> protocols and the external controllers/cables/arrays.

Sounding more and more like a networking issue - are the network cards set up 
in an aggregate? I had some similar issues on GbE where there was a mismatch 
between the aggregate settings on the switches and the LACP settings on the 
server. Basically the network was wasting a ton of time trying to renegotiate 
the LACP settings and slowing everything down.
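
e.g. on the OpenSolaris/Nexenta side you can compare what dladm reports against the 
switch configuration:

dladm show-aggr -L    # LACP mode and timer per port
dladm show-aggr -x    # port state and speed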

Ditto for the Linux networking - single port or aggregated dual port?

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread erik.ableson
On 15 oct. 2010, at 22:19, Ian D wrote:

> A little setback  We found out that we also have the issue with the Dell 
> H800 controllers, not just the LSI 9200-16e.  With the Dell it's initially 
> faster as we benefit from the cache, but after a little while it goes sour- 
> from 350MB/sec down to less than 40MB/sec.  We've also tried with a LSI 
> 9200-8e with the same results.
> 
> So to recap...  No matter what HBA we use, copying through the network 
> to/from the external drives is painfully slow when access is done through 
> either NFS or iSCSI.  HOWEVER, it is plenty fast when we do a scp where the 
> data is written to the external drives (or internal ones for that matter) 
> when they are seen by the Nexenta box as local drives- ie when neither NFS or 
> iSCSI are involved.  

Sounds an awful lot like client side issues coupled possibly with networking 
problems.

Have you looked into disabling the Nagle algorithm on the client side? That's 
something that can impact both iSCSI and NFS badly, but ssh is usually not as 
affected... I vaguely remember that being a real performance killer on some 
Linux versions.

Another thing to check would be ensure that noatime is set so that your reads 
aren't triggering writes across the network as well.
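
For the NFS mounts on the Linux side that would be something like (server and paths 
invented for the example):

mount -t nfs -o noatime,tcp,rsize=32768,wsize=32768 nexenta:/tank/data /mnt/data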

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Newbie question : snapshots, replication and recovering failure of Site B

2010-11-01 Thread erik.ableson

On 26 oct. 2010, at 16:21, Matthieu Fecteau wrote:

> Hi,
> 
> I'm planning to use the replication scripts on that page :
> http://www.infrageeks.com/groups/infrageeks/wiki/8fb35/zfs_autoreplicate_script.html
> 
> It uses the timeslider (other way possible) to take snapshots, uses zfs 
> send/receive to replicate and another script for cleaning up the old 
> snapshots.
> 
> My question : in the event that there's no more common snapshot between Site 
> A and Site B, how can we replicate again ? (example : Site B has a power 
> failure and then Site A cleanup his snapshots before Site B is brought back, 
> so that there's no more common snapshots between the sites).
> 
> I'm thinking of using OpenSolaris for my 30TB storage (replicated to another 
> 30TB). If a situation like this happens, will I need to erase eveything in 
> Site B, and start all over again ?  Or is there another more efficient 
> (faster) way ?  How ?

That's the risk of using Time Slider to manage your snapshot deletion...

But there are a few ways around this, the first being to make sure that you 
avoid using volatile snapshots for replication (filter on daily, weekly or 
monthly). And the smart people developing ZFS noted this as an issue and in the 
newer builds (I don't remember in which one this showed up) you can put a hold 
on a snapshot so that in order to delete it you explicitly must remove the 
hold. More details on p203 of the ZFS Admin Guide 2010.01. 
(http://dlc.sun.com/pdf/817-2271/817-2271.pdf)
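
The hold mechanism itself is just this (snapshot name invented):

# keep the replication baseline around until explicitly released
pfexec zfs hold replication tank/data@2010-10-31
zfs holds tank/data@2010-10-31
# a destroy will now be refused until you release the hold
pfexec zfs release replication tank/data@2010-10-31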

Or you can roll your own snapshot schedule based on your specific requirements. 
There are a couple of other scripts on the page that you can use in your own 
scripts to handle creation and cleanup of snapshots. I use an hourly schedule 
on some systems, daily on others and weekly for specific off-site backup 
replication. It all depends on the environment. In all cases, if you run into a 
serious issue where one site is going to be offline for an extended period, 
you'll want to stop your snapshot cleanup routine. The hiccup is that if you're 
using Time Slider, the same process manages creation and deletion so you stop 
taking snapshots when you disable Time Slider. Given the amount of data you're 
looking at I would seriously consider writing your own snapshot taking/deleting 
scripts so that you can have a little more control over them.

That said, I seem to recall reading that Time Slider was going to build in the 
send/recv functions as an option, but I never looked into that any further.

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread erik.ableson

On 19 nov. 2010, at 03:53, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> 
>> SAS Controller
>> and all ZFS Disks/ Pools are passed-through to Nexenta to have full
> ZFS-Disk
>> control like on real hardware. 
> 
> This is precisely the thing I'm interested in.  How do you do that?  On my
> ESXi (test) server, I have a solaris ZFS VM.  When I configure it... and add
> disk ... my options are (a) create a new virtual disk (b) use an existing
> virtual disk, or (c) (grayed out) raw device mapping.  There is a comment
> "Give your virtual machine direct access to a SAN."  So I guess it only is
> available if you have some iscsi target available...
> 
> But you seem to be saying ... don't add the disks individually to the ZFS
> VM.  You seem to be saying...  Ensure the bulk storage is on a separate
> sas/scsi/sata controller from the ESXi OS...  And then add the sas/scsi/sata
> PCI device to the guest, which will implicitly get all of the disks.  Right?
> 
> Or maybe ... the disks have to be scsi (sas)?  And then you can add the scsi
> device directly pass-thru?

As mentioned by Will, you'll need to use the VMDirectPath which allows you to 
map a hardware device (the disk controller) directly to the VM without passing 
through the VMware managed storage stack. Note that you are presenting the 
hardware directly so it needs to be a compatible controller.

You'll need two controllers in the server since ESXi needs at least one disk 
that it controls to be formatted as VMFS to hold some of its files as well as 
the .vmx configuration files for the VM that will host the storage (and the 
swap file so it's got to be at least as large as the memory you plan to assign 
to the VM). Caveats - while you can install ESXi onto a USB drive, you can't 
manually format a USB drive as VMFS so for best performance you'll want at 
least one SATA or SAS controller that you can leave controlled by ESXi and the 
second controller where the bulk of the storage is attached for the ZFS VM.

As far as the eggs in one basket issue goes, you can either use a clustering 
solution like the Nexenta HA between two servers and then you have a highly 
available storage solution based on two servers that can also run your VMs or 
for a more manual failover, just use zfs send|recv to replicate the data.

You can also accomplish something similar if you have only the one controller 
by manually creating local Raw Device Maps of the local disks and presenting 
them individually to the ZFS VM but you don't have direct access to the 
controller so I don't think stuff like blinking a drive will work in this 
configuration since you're not talking directly to the hardware. There's no UI 
for creating RDMs for local drives, but there's a good procedure over at 
 which explains the technique.
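
The gist is creating the RDM pointer file by hand from the ESXi console, roughly 
(device and datastore paths are placeholders):

# find the identifier of the local disk
ls /vmfs/devices/disks/
# create a physical-compatibility RDM pointer on a VMFS datastore
vmkfstools -z /vmfs/devices/disks/<disk identifier> /vmfs/volumes/datastore1/rdms/disk1-rdm.vmdk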

From a performance standpoint it works really well - I have NFS hosted VMs in 
this configuration getting 396MB/s throughput on simple dd tests backed by 10 
zfs mirrored disks, all protected with hourly send|recv to a second box.

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-19 Thread erik.ableson

On 19 nov. 2010, at 15:04, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Günther
>> 
>>   Disabling the ZIL (Don't)  
> 
> This is relative.  There are indeed situations where it's acceptable to
> disable ZIL.  To make your choice, you need to understand a few things...
> 
> #1  In the event of an ungraceful reboot, with your ZIL disabled, after
> reboot, your filesystem will be in a valid state, which is not the latest
> point of time before the crash.  Your filesystem will be valid, but you will
> lose up to 30 seconds of the latest writes leading up to the crash.
> #2  Even if you have ZIL enabled, all of the above statements still apply to
> async writes.  The ZIL only provides nonvolatile storage for sync writes.
> 
> Given these facts, it quickly becomes much less scary to disable the ZIL,
> depending on what you use your server for.

Not to mention that in this particular scenario (local storage, local VM, 
loopback to ESXi) where the NFS server is only publishing to the local host, if 
the local host crashes, there are no other NFS clients involved that have local 
caches that will be out of sync with the storage.

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-12-09 Thread erik.ableson

On 9 déc. 2010, at 13:41, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>> 
>> Also, if you have a NFS datastore, which is not available at the time of
> ESX
>> bootup, then the NFS datastore doesn't come online, and there seems to be
>> no
>> way of telling ESXi to make it come online later.  So you can't auto-boot
>> any guest, which is itself stored inside another guest.
> 
> Someone just told me about
>   esxcfg-nas -r
> So yes, it is possible to make ESX remount the NFS datastore in order to
> boot the other VM's.  The end result should be something which is faster
> than 1G ether, but not as fast as IB, FC, or 10G.

I've got a similar setup running here - with the Nexenta VM set to auto-start, 
you have to wait a bit for the VM to startup until the NFS datastores become 
available, but the actual mount operation from the ESXi side is automatic. I 
suppose that if you played with the startup delays between virtual machines you 
could get everything to start unattended once you know how long it takes for 
the NFS stores to become available.

Combined with send/recv to another box it's an affordable disaster recovery 
solution. And to squeeze every bit of performance out of the configuration, you 
can use VMDirectPath to present the HBA to your storage VM (just remember to 
add another card to boot ESXi or store a VMFS volume for vmx and swap files).

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] sharenfs settings ignored

2009-04-17 Thread erik.ableson

Hi there,

I'm working on a new OS 2008.11 setup here and running into a few  
issues with the nfs integration.  Notably, it appears that subnet  
values attributed to sharenfs are ignored and give back permission  
denied for all connection attempts. I have another environment where  
permission is assigned by FQDN which works fine, but I don't want to  
have to manage individual connections for server farms.


Currently the server is running in a dedicated subnet  
(192.168.100.0/24) and the machines that will require access are  
running in two other subnets (192.168.0.0/24 & 192.168.254.0/24-ESX).   
The client machines are ESX Server, Mac OS X, & Linux.  From what I've  
been able to gather, I should be able to set specific permissions (in  
CIDR syntax with the @ prefix) in the sharenfs value.  I've tried a  
dozen different variants with no success.


The one that I think should work is :

sharenfs=rw=@192.168.0.0/24:@192.168.254.0/24,root=@192.168.254.0/24

giving access to the client machines as well as giving root access to  
the ESX servers.  Every connection attempt returns permission denied  
to the client.  Trying with just a single subnet returns the same error.


sharenfs=rw=@192.168.254.0/24,root=@192.168.254.0/24

I've tried all of the following variants (and many others) with no  
success :


sharenfs=on
sharenfs=rw
sharenfs=rw,anon=0
sharenfs=rw=@192.168.0.0/16

I did check to make sure that the nfs server is running  :-)

Everything looks fine from the sharemgr perspective:
sharemgr show -vx zfs


  
changed="true">

  
  


  
  shareopts-nfs="sec=sys,r...@192.168.0.0/24:@192.168.254.0/24,ro...@192.168.254.0 
/24"/>


  


From the client side of the house it looks fine:
showmount -e 192.168.100.113
Exports list on 192.168.100.113:
/n01p01/nfs01  @192.168.254.0/24 @192.168.0.0/24

Time to file a bug report? Or is there already one for this issue?  
Searching "nfs subnet" on defect.opensolaris.org returns nothing.


Any ideas appreciated,

Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Update: sharenfs settings ignored

2009-04-20 Thread erik.ableson
Crossposted since I think there may be zfs folks that are new to using  
the direct integration with nfs and may be confounded by this as I was.


The problem as outlined exists as long as the client machine is not  
referenced in any kind of name resolution service.  It turns out that  
if I can do a reverse lookup from the DNS server for the client IP  
address, nfs connections are permitted, or if the IP address is listed  
in /etc/hosts.  It doesn't matter what name you give it, just that the  
address resolves to a name, and then access is permitted.


Can anyone explain this behaviour?  It's manageable as long as you  
know this is the case, but it strikes me as a non-obvious dependency  
since the subnet declaration should be sufficient to permit access (or  
at least it would appear to be the case from reading the documentation).
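
A quick way to check and work around it in the meantime (the client address  
here is just an example):

getent hosts 192.168.254.21                  (does the client IP reverse-resolve?)
echo "192.168.254.21 esx01" >> /etc/hosts    (any name will do, per the above)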


Cheers,

Erik

Begin forwarded message:


From: "erik.ableson" 
Date: 17 April 2009 13:15:21 CEST
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] sharenfs settings ignored

Hi there,

I'm working on a new OS 2008.11 setup here and running into a few  
issues with the nfs integration.  Notably, it appears that subnet  
values attributed to sharenfs are ignored and gives back a  
permission denied for all connection attempts. I have another  
environment where permission is assigned by FQDN which works fine,  
but I don't want to have to manage individual connections for server  
farms.


Currently the server is running in a dedicated subnet  
(192.168.100.0/24) and the machines that will require access are  
running in two other subnets (192.168.0.0/24 & 192.168.254.0/24- 
ESX).  The client machines are ESX Server, Mac OS X, & Linux.  From  
what I've been able to gather, I should be able to set specific  
permissions in CIDR syntax with the @ prefix) in the sharenfs  
value.  I've tried a dozen different variants with no success.


The one that I think should work is :

sharenfs=...@192.168.0.0/24:@192.168.254.0/24,ro...@192.168.254.0/24

giving access to the client machines as well as giving root access  
to the ESX servers.  Every connection attempt returns permission  
denied to the client.  Trying with just a single subnet returns the  
same error.


sharenfs=...@192.168.254.0/24,ro...@192.168.254.0/24

I've tried all of the following variants (and many others) with no  
success :


sharenfs=on
sharenfs=rw
sharenfs=rw,anon=0
sharenfs=...@192.168.0.0/16

I did check tp make sure that the nfs server is running,  :-)

Everything looks fine from the sharemgr perspective:
sharemgr show -vx zfs


 
(sharemgr XML output trimmed; the relevant share options attribute reads:)
shareopts-nfs="sec=sys,r...@192.168.0.0/24:@192.168.254.0/24,ro...@192.168.254.0/24"


From the client side of the house it looks fine:
showmount -e 192.168.100.113
Exports list on 192.168.100.113:
/n01p01/nfs01  @192.168.254.0/24 @192.168.0.0/24

Time to file a bug report? Or is there already one for this issue?  
Searching "nfs subnet" on defect.opensolaris.org returns nothing.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS 15K drives as L2ARC

2009-05-06 Thread erik.ableson


On 7 May 09, at 04:03, Adam Leventhal wrote:

>>> After all this discussion, I am not sure if anyone adequately answered the
>>> original poster's question as to whether a 2540 with SAS 15K drives would
>>> provide substantial synchronous write throughput improvement when used as
>>> a L2ARC device.
>> 
>> I was under the impression that the L2ARC was to speed up reads, as it
>> allows things to be cached on something faster than disks (usually MLC
>> SSDs). Offloading the ZIL is what handles synchronous writes, isn't it?
>> 
>> How would adding an L2ARC speed up writes?
> 
> You're absolutely right. The L2ARC is for accelerating reads only and will
> not affect write performance.


With the small caveat that if the bulk of your read traffic is being  
served by the L2ARC, there is much less contention for  
access to the slower physical disks, freeing them up for write  
activity. No speed increase in the technical sense over and above the  
capabilities of the disks, but it should have an impact on real-world  
IO activity.
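
For reference, a cache device is simply added to the pool (pool and device  
names here are hypothetical):

zpool add tank cache c4t2d0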


Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] surprisingly poor performance

2009-07-03 Thread erik.ableson
This is something that I've run into as well across various installs  
very similar to the one described (PE2950 backed by an MD1000).  I  
find that overall the write performance across NFS is absolutely  
horrible on 2008.11 and 2009.06.  Worse, I use iSCSI under 2008.11 and  
it's just fine with near wire speeds in most cases, but under 2009.06  
I can't even format a VMFS volume from ESX without hitting a timeout.   
Throughput over the iSCSI connection is mostly around 64K/s with 1  
operation per second.


I'm downgrading my new server back to 2008.11 until I can find a way  
to ensure decent performance since this is really a showstopper. But  
in the meantime I've completely given up on NFS as a primary data  
store - strictly used for templates and iso images and stuff which I  
copy up via scp since it's literally 10 times faster than over NFS.


I have a 2008.11 OpenSolaris server with an MD1000 using 7 mirror  
vdevs. The networking is 4 GbE split into two trunked connections.


Locally, I get 460 MB/s write and 1 GB/s read so raw disk performance  
is not a problem. When I use iSCSI I get wire speed in both directions  
on the GbE from ESX and other clients. However when I use NFS, write  
performance is limited to about 2 MB/s. Read performance is close to  
wire speed.
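
(For comparison, a local sequential test of that sort can be done with plain  
dd - path and size here are just an example, and writing zeros will flatter  
the numbers if compression is enabled on the dataset:)

dd if=/dev/zero of=/tank/test/bigfile bs=1048576 count=8192    (write)
dd if=/tank/test/bigfile of=/dev/null bs=1048576               (read back)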


I'm using a pretty vanilla configuration, using only atime=off and  
sharenfs=anon=0.


I've looked at various tuning guides for NFS with and without ZFS but  
I haven't found anything that seems to address this type of issue.


Anyone have some tuning tips for this issue? Other than adding an SSD  
as a write log or disabling the ZIL.. (although from James' experience  
this too seems to have a limited impact).


Cheers,

Erik
On 3 July 09, at 08:39, James Lever wrote:

While this was running, I was looking at the output of zpool iostat  
fastdata 10 to see how it was going and was surprised to see the  
seemingly low IOPS.


jam...@scalzi:~$ zpool iostat fastdata 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
fastdata    10.0G  2.02T      0    312    268  3.89M
fastdata    10.0G  2.02T      0    818      0  3.20M
fastdata    10.0G  2.02T      0    811      0  3.17M
fastdata    10.0G  2.02T      0    860      0  3.27M

Strangely, when I added a second SSD as a second slog, it made no  
difference to the write operations.


I'm not sure where to go from here, these results are appalling  
(about 3x the time of the old system with 8x 10kRPM spindles) even  
with two Enterprise SSDs as separate log devices.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS, ZFS & ESX

2009-07-07 Thread erik.ableson
OK - I'm at my wit's end here as I've looked everywhere to find some  
means of tuning NFS performance with ESX into returning something  
acceptable using osol 2008.11.  I've eliminated everything but the NFS  
portion of the equation and am looking for some pointers in the right  
direction.


Configuration: PE2950 dual Xeon, 32GB RAM with an MD1000 using a  
zpool of 7 mirror vdevs. ESX 3.5 and 4.0.  Pretty much a vanilla  
install across the board, no additional software other than the  
Adaptec StorMan to manage the disks.


local performance via dd - 463MB/s write, 1GB/s read (8GB file)
iSCSI performance - 90MB/s write, 120MB/s read (800MB file from a VM)
NFS performance - 1.4MB/s write, 20MB/s read (800MB file from the  
Service Console, transfer of an 8GB file via the datastore browser)


I just found the tool latencytop which points the finger at the ZIL  
(tip of the hat to Lejun Zhu).  Ref: <http://www.infrageeks.com/zfs/nfsd.png> & <http://www.infrageeks.com/zfs/fsflush.png>.  Log file: <http://www.infrageeks.com/zfs/latencytop.log>


Now I can understand that there is a performance hit associated with  
this feature of ZFS for ensuring data integrity, but this drastic a  
difference makes no sense whatsoever. The pool is capable of handling  
natively (at worst) 120*7 IOPS and I'm not even seeing enough to  
saturate a USB thumb drive. This still doesn't answer why the read  
performance is so bad either.  According to latencytop, the culprit  
would be genunix`cv_timedwait_sig rpcmod`svc


From my searching it appears that there's no async setting for the  
osol nfsd, and ESX does not offer any mount controls to force an async  
connection.  Other than putting in an SSD as a ZIL (which still  
strikes me as overkill for basic NFS services) I'm looking for any  
information that can bring me up to at least reasonable throughput.


Would a dedicated 15K SAS drive help the situation by moving the ZIL  
traffic off to a dedicated device? Significantly? This is the sort of  
thing that I don't want to do without some reasonable assurance that  
it will help since you can't remove a ZIL device from a pool at the  
moment.
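
For what it's worth, adding one when the time comes is a one-liner (pool and  
device names here are hypothetical) - with exactly the caveat above that it  
can't be removed again afterwards:

zpool add tank log c3t5d0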


Hints and tips appreciated,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [nfs-discuss] NFS, ZFS & ESX

2009-07-08 Thread erik.ableson

Comments inline.

On 7 July 09, at 19:36, Dai Ngo wrote:

> Without any tuning, the default TCP window size and send buffer size for NFS
> connections is around 48KB which is not very optimal for bulk transfer.
> However the 1.4MB/s write seems to indicate something else is seriously wrong.


My sentiment as well.

> iSCSI performance was good, so the network connection seems to be OK
> (assuming it's 1GbE).


Yup - I'm running at wire speed on the iSCSI connections.


> What do your mount options look like?


Unfortunately, ESX doesn't give any controls over mount options

> I don't know what the datastore browser does for copying files, but have you
> tried the vanilla 'cp' command?


The datastore browser copy command is just a wrapper for cp from what  
I can gather. All types of copy operations to the NFS volume, even  
from other machines, top out at this speed.  The NFS/iSCSI connections  
are on a separate physical network so I can't easily plug anything  
into it for testing other mount options from another machine or OS.  
I'll try from another VM to see if I can't force a mount with the  
async option to see if that helps any.
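
(From a Linux VM the test mount would be something along these lines - the  
server name and export path are placeholders, and which client-side options  
actually help is exactly what I want to find out:)

mount -t nfs -o vers=3,tcp nfsserver:/n01p01/nfs01 /mnt/test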


> You can also try NFS performance using tmpfs, instead of ZFS, to make sure
> the NIC, protocol stack, and NFS are not the culprit.


From what I can observe, it appears that the sync commands issued  
over the NFS stack are slowing down the process, even with a  
reasonable number of disks in the pool.


What I was hoping for was the same behavior (albeit slightly risky) of  
having writes cached to RAM and then dumped out to disk in an optimal  
manner, as per the local behavior where you see the flush-to-disk  
operations happening on a regular cycle. I think that this would be  
doable with an async mount, but I can't set this on the server side  
where it would be picked up by the servers automatically.


Erik


erik.ableson wrote:
OK - I'm at my wit's end here as I've looked everywhere to find  
some means of tuning NFS performance with ESX into returning  
something acceptable using osol 2008.11.  I've eliminated  
everything but the NFS portion of the equation and am looking for  
some pointers in the right direction.


Configuration: PE2950 bi pro Xeon, 32Gb RAM with an MD1000 using a  
zpool of 7 mirror vdevs. ESX 3.5 and 4.0.  Pretty much a vanilla  
install across the board, no additional software other than the  
Adaptec StorMan to manage the disks.


local performance via dd - 463MB/s write, 1GB/s read (8Gb file)
iSCSI performance - 90MB/s write, 120MB/s read (800Mb file from a VM)
NFS performance - 1.4MB/s write, 20MB/s read (800Mb file from the  
Service Console, transfer of a 8Gb file via the datastore browser)


I just found the tool latencytop which points the finger at the ZIL  
(tip of the hat to Lejun Zhu).  Ref: <http://www.infrageeks.com/zfs/nfsd.png 
> & <http://www.infrageeks.com/zfs/fsflush.png>.  Log file: <http://www.infrageeks.com/zfs/latencytop.log 
>


Now I can understand that there is a performance hit associated  
with this feature of ZFS for ensuring data integrity, but this  
drastic a difference makes no sense whatsoever. The pool is capable  
of handling natively (at worst) 120*7 IOPS and I'm not even seeing  
enough to saturate a USB thumb drive. This still doesn't answer why  
the read performance is so bad either.  According to latencytop,  
the culprit would be genunix`cv_timedwait_sig rpcmod`svc


From my searching it appears that there's no async setting for the  
osol nfsd, and ESX does not offer any mount controls to force an  
async connection.  Other than putting in an SSD as a ZIL (which  
still strikes me as overkill for basic NFS services) I'm looking  
for any information that can bring me up to at least reasonable  
throughput.


Would a dedicated 15K SAS drive help the situation by moving the  
ZIL traffic off to a dedicated device? Significantly? This is the  
sort of thing that I don't want to do without some reasonable  
assurance that it will help since you can't remove a ZIL device  
from a pool at the moment.


Hints and tips appreciated,

Erik
___
nfs-discuss mailing list
nfs-disc...@opensolaris.org




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help with setting up ZFS

2009-07-27 Thread erik.ableson
The zfs send command generates a stream (a full one, or a differential  
between two selected snapshots) that you can send to anything you'd  
like.  The catch of course is that then you have a collection of files  
on your Linux box that are pretty much useless since you can't mount  
them or read the contents in any meaningful way.  If you're running a  
Linux server as the destination the easiest solution is to create a  
virtual machine running the same revision of OpenSolaris as the server  
and use that as a destination.


It doesn't necessarily need a publicly exposed IP address - you can  
get the source to send the differential file to the Linux box and then  
have the VM "import" the file using a recv command to integrate the  
contents into a local ZFS filesystem. I think that VirtualBox lets you  
access shared folders so you could write a script to check for new  
files and then use the recv command to process them. The trick as  
always for this kind of thing is determining that the file is complete  
before attempting to import it.


There are some good examples in the ZFS Administration Guide (p187) for  
handling remote transfers.

zfs send tank/ci...@today | ssh newsys zfs recv sandbox/res...@today

For a staged approach you could pipe the output to a compressed file  
and send that over to the Linux box.
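
Roughly, with made-up dataset and file names (and gzip standing in for  
whatever compressor you prefer):

zfs send tank/data@today | gzip > /backup/data-today.zfs.gz
scp /backup/data-today.zfs.gz linuxbox:/backups/

and later, from the OpenSolaris VM that can see that folder:

gunzip -c /backups/data-today.zfs.gz | zfs recv sandbox/data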


Combined with a key exchange between the two systems you don't need to  
keep passwords in your scripts either.


Cheers,

Erik

On 27 July 09, at 11:15, Brian wrote:

The ZFS send/receive command can presumably only send the filesystem  
to another OpenSolaris OS, right?  Is there any way to send it to  
a normal Linux distribution (ext3)?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread erik.ableson
You're running into the same problem I had with 2009.06: they have  
"corrected" a bug where the iSCSI target prior to 2009.06 didn't fully  
honor the SCSI sync commands issued by the initiator.


Some background :

Discussion:
http://opensolaris.org/jive/thread.jspa?messageID=388492

"corrected bug"
http://bugs.opensolaris.org/view_bug.do?bug_id=6770534

The upshot is that unless you have an SSD (or other high speed  
dedicated device) attached as a ZIL (or slog) on 2009.06, you won't see  
anywhere near the local-speed performance that the storage is capable  
of, since you're forcing individual transactions all the way down to  
disk and back up before moving on to the next SCSI block command.


This iSCSI performance profile is currently specific to 2009.06 and  
does not occur on 2008.11.  As a stopgap (since I don't have a budget  
for SSDs right now) I'm keeping my production servers on 2008.11  
(taking into account the additional potential risk, but these are  
machines with battery backed SAS cards in a conditioned data center).  
These machines are serving up iSCSI to ESX 3.5 and ESX 4 servers.


For my freewheeling home use where everything gets tried, crashed,  
patched and put back together with baling twine (and is backed up  
elsewhere...) I've mounted a 1GB RAM disk which is attached to the  
pool as a ZIL, and you see the performance run in cycles where the ZIL  
loads up to saturation, flushes out to disk and keeps going. I did  
write a script to regularly dd the ram disk device out to a file so  
that I can recreate it with the appropriate signature if I have to  
reboot the osol box. This is used with the GlobalSAN initiator on OS X  
as well as various Windows and Linux machines, physical and VM.


Assuming this is a test system that you're playing with and you can  
destroy the pool with impunity, and you don't have an SSD lying around  
to test with, try the following:


ramdiskadm -a slog 2g (or whatever size you can manage reasonably with  
the available physical RAM - try "vmstat 1 2" to determine available  
memory)

zpool add <poolname> log /dev/ramdisk/slog

If you want to perhaps reuse the slog later (ram disks are not  
preserved over reboot) write the slog volume out to disk and dump it  
back in after restarting.

 dd if=/dev/ramdisk/slog of=/root/slog.dd

All of the above assumes that you are not doing this stuff against  
rpool.  I think that attaching a volatile log device to your boot pool  
would result in a machine that can't mount the root zfs volume.


It's easiest to monitor from the Mac (I find), so try your test again  
with the Activity Monitor showing network traffic and you'll see that  
it goes to a wire speed ceiling while it's filling up the ZIL; once  
it's saturated your traffic will drop to near nothing, and then pick  
up again after a few seconds. If you don't saturate the ZIL you'll see  
continuous full-speed data transfer.


Cheers,

Erik

On 4 August 09, at 15:57, Charles Baker wrote:


My testing has shown some serious problems with the
iSCSI implementation for OpenSolaris.

I setup a VMware vSphere 4 box with RAID 10
direct-attached storage and 3 virtual machines:
- OpenSolaris 2009.06 (snv_111b) running 64-bit
- CentOS 5.3 x64 (ran yum update)
- Ubuntu Server 9.04 x64 (ran apt-get upgrade)

I gave each virtual 2 GB of RAM, a 32 GB drive and
setup a 16 GB iSCSI target on each (the two Linux vms
used iSCSI Enterprise Target 0.4.16 with blockio).
VMware Tools was installed on each. No tuning was
done on any of the operating systems.

I ran two tests for write performance - one one the
server itself and one from my Mac connected via
Gigabit (mtu of 1500) iSCSI connection using
globalSAN’s latest initiator.

Here’s what I used on the servers:
time dd if=/dev/zero of=/root/testfile bs=1048576k
count=4
and the Mac OS with the iSCSI connected drive
(formatted with GPT / Mac OS Extended journaled):
time dd if=/dev/zero of=/Volumes/test/testfile
bs=1048576k count=4

The results were very interesting (all calculations
using 1 MB = 1,048,576 bytes)

For OpenSolaris, the local write performance averaged
86 MB/s. I turned on lzjb compression for rpool (zfs
set compression=lzjb rpool) and it went up to 414
MB/s since I’m writing zeros). The average
performance via iSCSI was an abysmal 16 MB/s (even
with compression turned on - with it off, 13 MB/s).

For CentOS (ext3), local write performance averaged
141 MB/s. iSCSI performance was 78 MB/s (almost as
fast as local ZFS performance on the OpenSolaris
server when compression was turned off).

Ubuntu Server (ext4) had 150 MB/s for the local
write. iSCSI performance averaged 80 MB/s.

One of the main differences between the three virtual
machines was that the iSCSI target on the Linux
machines used partitions with no file system. On
OpenSolaris, the iSCSI target created sits on top of
ZFS. That creates a lot of overhead (although you do
get some great features).

Since all the virtual machines were connected to the
same switc

Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread erik.ableson

On 7 August 09, at 02:03, Stephen Green wrote:

> I used a 2GB ram disk (the machine has 12GB of RAM) and this jumped  
> the backup up to somewhere between 18-40MB/s, which means that I'm  
> only a couple of hours away from finishing my backup.  This is, as  
> far as I can tell, magic (since I started this message nearly 10GB  
> of data have been transferred, when it took from 6am this morning to  
> get to 20GB.)
> 
> The transfer speed drops like crazy when the write to disk happens,  
> but it jumps right back up afterwards.
> 
>> If you want to perhaps reuse the slog later (ram disks are not  
>> preserved over reboot) write the slog volume out to disk and dump  
>> it back in after restarting.
>> 
>> dd if=/dev/ramdisk/slog of=/root/slog.dd
> 
> Now my only question is:  what do I do when it's done?  If I reboot  
> and the ram disk disappears, will my tank be dead? Or will it just  
> continue without the slog?  I realize that I'm probably totally  
> boned if the system crashes, so I'm copying off the stuff that I  
> really care about to another pool (the Mac's already been backed up  
> to a USB drive.)
> 
> Have I meddled in the affairs of wizards?  Is ZFS subtle and quick  
> to anger?


You have a number of options to preserve the current state of affairs  
and be able to reboot the OpenSolaris server if required.


The absolute safest bet would be the following, but the resilvering  
will take a while before you'll be able to shut down:


create a file of the same size as the ramdisk on the rpool volume
replace the ramdisk slog with the 2G file (zpool replace <poolname>  
/dev/ramdisk/slog /root/slogtemp)

wait for the resilver/replacement operation to run its course
reboot
create a new ramdisk (same size, as always)
replace the file slog with the newly created ramdisk

If your machine reboots unexpectedly things are a little dicier, but  
you should still be able to get things back online.  If you did a dump  
of the ramdisk via dd to a file, it should contain the correct  
signature and be recognized by ZFS.  There are no guarantees about the  
state of the data, since if there was anything actively in use on  
the ramdisk when it stopped you'll lose data, and I'm not sure how the  
pool will deal with this.  But in a pinch, you should be able to  
either replace the missing ramdisk device with the dd file copy of the  
ramdisk (make a copy first, just in case), or mount a new ramdisk,  
dd the contents of the file back to the device and then import the pool.
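
Very roughly, and with the usual caveats about anything that was in flight  
when the machine went down (pool name here is hypothetical):

ramdiskadm -a slog 2g                       (same size as before)
dd if=/root/slog.dd of=/dev/ramdisk/slog    (put the saved image back)
zpool import tank                           (then check zpool status)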


Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss