Re: [zfs-discuss] Recovering ZFS stops after syseventconfd can't fork

2009-12-22 Thread Anton B. Rang
Something over 8000 sounds vaguely like the default maximum process count. What does 'ulimit -a' show? I don't know why you're seeing so many zfsdle processes, though — sounds like a bug to me.
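
A quick way to compare the two numbers (a sketch, assuming a Solaris-style shell; the interesting ulimit line is the maximum user process count):

    # show per-user resource limits, including the process limit
    ulimit -a
    # count the zfsdle processes currently running
    ps -e | grep -c zfsdle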

Re: [zfs-discuss] zfs usedbysnapshots is not equal to "zfs list -t snapshot" for a files

2009-11-04 Thread Anton B. Rang
I believe that space shared between multiple snapshots is not assigned to any of the snapshots. So if you have a 100 GB file and take two snapshots, then delete it, the space used won't show up in the snapshot list, but will show up in the 'usedbysnapshots' property.
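
To see the two figures being compared, a sketch assuming a hypothetical dataset named tank/fs:

    # each snapshot's 'used' column counts only space unique to that snapshot
    zfs list -t snapshot -o name,used -r tank/fs
    # 'usedbysnapshots' counts everything that would be freed if all snapshots were destroyed
    zfs get usedbysnapshots tank/fs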

Re: [zfs-discuss] Books on File Systems and File System Programming

2009-08-16 Thread Anton B. Rang
There aren't many good books on file system design. The "VAX/VMS Internals and Data Structures" book by Goldenberg covers a fair amount of the RMS file system design along with its rationale. There is also a "VMS File System Internals" book which I haven't yet read. Apple's early Inside Macinto

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Anton B. Rang
> Fsck can only repair known faults; known > discrepancies in the meta data. > Since ZFS doesn't have such known discrepancies, > there's nothing to repair. I'm rather tired of hearing this mantra. If ZFS detects an error in part of its data structures, then there is clearly something to repair.

Re: [zfs-discuss] Problems with '..' on ZFS pool

2009-01-29 Thread Anton B. Rang
That bug has been in Solaris forever. :-(

Re: [zfs-discuss] write cache and cache flush

2009-01-29 Thread Anton B. Rang
If all write caches are truly disabled, then disabling the cache flush won't affect the safety of your data. It will change your performance characteristics, almost certainly for the worse.

Re: [zfs-discuss] Problem with snapshot

2009-01-29 Thread Anton B. Rang
Snapshots are not on a per-pool basis but a per-file-system basis. Thus, when you took a snapshot of "testpol", you didn't actually snapshot the pool; rather, you took a snapshot of the top level file system (which has an implicit name matching that of the pool). Thus, you haven't actually aff
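
To cover every file system in the pool rather than just the top-level one, the recursive flag is what's wanted; a sketch using the poster's pool name and a hypothetical snapshot name:

    # snapshots only the top-level file system
    zfs snapshot testpol@before
    # snapshots testpol and every descendant file system, atomically
    zfs snapshot -r testpol@before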

Re: [zfs-discuss] zfs null pointer deref,

2009-01-20 Thread Anton B. Rang
Sigh. Richard points out in private email that automatic savecore functionality is disabled in OpenSolaris; you need to manually set up a dump device and save core files if you want them. However, the stack may be sufficient to ID the bug.
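
A sketch of the manual setup being described; the dump device path is illustrative (it could equally be a swap slice):

    # point the dump subsystem at a dedicated dump device
    dumpadm -d /dev/zvol/dsk/rpool/dump
    # show the current configuration (dump device, savecore directory, on/off)
    dumpadm
    # after the next panic and reboot, pull the crash dump into the savecore directory
    savecore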

Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Anton B. Rang
>The user DEFINITELY isn't expecting 5 x 10^11 bytes, or what you meant to say 500,000,000,000 bytes, they're expecting 500GB. You know, 536,870,912,000 bytes. But even if the drive mfg's calculated it correctly, they wouldn't even be getting that due to filesystem overhead. I doubt there
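
The numbers being argued about, worked out:

    500 GB as drive vendors count it:   500 x 10^9 = 500,000,000,000 bytes
    500 GB as the user expects (GiB):   500 x 2^30 = 536,870,912,000 bytes
    difference: roughly 7.4%, before any filesystem overhead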

Re: [zfs-discuss] zfs null pointer deref,

2009-01-19 Thread Anton B. Rang
If you've got enough space on /var, and you had a dump partition configured, you should find a bunch of "vmcore.[n]" files in /var/crash by now. The system normally dumps the kernel core into the dump partition (which can be the swap partition) and then copies it into /var/crash on the next suc

Re: [zfs-discuss] zfs panic

2009-01-19 Thread Anton B. Rang
Looks like a corrupted pool -- you appear to have a mirror block pointer with no valid children. From the dump, you could probably determine which file is bad, but I doubt you could delete it; you might need to recreate your pool.

Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread Anton B. Rang
For SCSI disks (including FC), you would use the FUA bit on the read command. For SATA disks ... does anyone care? ;-)

Re: [zfs-discuss] ZFS and aging

2008-12-15 Thread Anton B. Rang
Typically you want to do something like this: Write 1,000,000 files of varying length. Randomly select and remove 500,000 of those files. Repeat (a) creating files, and (b) randomly removing files, until your file system is full enough for your test, or you run out of time. That's a pretty
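
A minimal sketch of such an aging run in ksh, assuming a scratch file system mounted at /pool/aging; the file count, size range, and 80% fill target are illustrative:

    #!/usr/bin/ksh
    # Aging sketch: create files of varying length, randomly delete about half,
    # and repeat until the file system reaches the target fill level.
    FS=/pool/aging
    count=100000                     # files created per pass

    fill_pct() {
        df -k "$FS" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
    }

    pass=0
    while [ "$(fill_pct)" -lt 80 ]; do
        i=0
        while [ $i -lt $count ]; do          # (a) create files of varying length
            blocks=$((RANDOM % 128 + 1))     # 8 KB .. 1 MB, pseudo-random
            dd if=/dev/urandom of="$FS/f.$pass.$i" bs=8k count=$blocks 2>/dev/null
            i=$((i + 1))
        done
        for f in "$FS"/f.*; do               # (b) remove roughly half of the files
            [ $((RANDOM % 2)) -eq 0 ] && rm -f "$f"
        done
        pass=$((pass + 1))
    done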

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-13 Thread Anton B. Rang
Some RAID systems compare checksums on reads, though this is usually only for RAID-4 configurations (e.g. DataDirect) because of the performance hit otherwise. End-to-end checksums are not yet common. The SCSI committee recently ratified T10 DIF, which allows either an operating system or appli

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-13 Thread Anton B. Rang
I wasn't joking, though as is well known, the plural of anecdote is not data. Both UFS and ZFS, in common with all file system, have design flaws and bugs. To lose an entire UFS file system (barring the loss of the entire underlying storage) requires a great deal of corruption; there are multipl

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-10 Thread Anton B. Rang
>It sounds like you have access to a source of information that the >rest of us don't have access to. I think if you read the archives of this mailing list, and compare it to the discussions on the other Solaris mailing lists re UFS, it's a reasonable conclusion.

Re: [zfs-discuss] ZPool and Filesystem Sizing - Best Practices?

2008-11-26 Thread Anton B. Rang
>If there is a zfs implementation bug it could perhaps be more risky >to have five pools rather than one. Kind of goes both ways. You're perhaps 5 times as likely to wind up with a damaged pool, but if that ever happens, there's only 1/5 as much data to restore.

Re: [zfs-discuss] Lost space in empty pool (no snapshots)

2008-11-11 Thread Anton B. Rang
The "deferred free" indicates that these blocks are supposed to be freed at a future time. A quick glance at the code would seem to indicate that this is supposed to happen when the next transaction group is pushed. Apparently it's not happening on your system ... presumably a bug.

[zfs-discuss] SATA/SAS (Re: Quantifying ZFS reliability)

2008-10-05 Thread Anton B. Rang
Erik: > > (2) a SAS drive has better throughput and IOPs than a SATA drive Richard: > Disagree. We proved that the transport layer protocol has no bearing > on throughput or iops. Several vendors offer drives which are > identical in all respects except for transport layer protocol: SAS or > SA

Re: [zfs-discuss] Terabyte scrub

2008-09-04 Thread Anton B. Rang
If you're using a mirror, and each disk manages 50 MB/second (unlikely if it's a single disk doing a lot of seeks, but you might do better using a hardware array for each half of the mirror), simple math says that scanning 1 TB would take roughly 20,000 seconds, or 5 hours. So your speed under S
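
The back-of-the-envelope arithmetic:

    1 TB / (50 MB/s) = 1,000,000 MB / 50 MB/s = 20,000 s, i.e. about 5.5 hours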

Re: [zfs-discuss] pulling disks was: ZFS hangs/freezes after disk

2008-08-28 Thread Anton B. Rang
Many mid-range/high-end RAID controllers work by having a small timeout on individual disk I/O operations. If the disk doesn't respond quickly, they'll issue an I/O to the redundant disk(s) to get the data back to the host in a reasonable time. Often they'll change parameters on the disk to limi

Re: [zfs-discuss] corrupt zfs stream? "checksum mismatch"

2008-08-13 Thread Anton B. Rang
There is an explicit check in ZFS for the checksum, as you deduced. I suspect that by disabling this check you could recover much, if not all, of your data. You could probably do this with mdb by 'simply' writing a NOP over the branch in dmu_recv_stream. It appears that 'zfs send' was designed

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-12 Thread Anton B. Rang
Reed-Solomon could correct multiple-bit errors, but an effective Reed-Solomon code for 128K blocks of data would be very slow if implemented in software (and, for that matter, take a lot of hardware to implement). A multi-bit Hamming code would be simpler, but I suspect that undetected multi-bit

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-11 Thread Anton B. Rang
That brings up another interesting idea. ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits. If 20-odd bits of that were a Hamming code, you'd have something slightly stronger than SECDED, and ZFS could correct any single-bit errors encountered. This could be done without ch
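
The "20-odd bits" follows from the usual Hamming bound. For a 128 KB block:

    data bits:    k = 131,072 bytes x 8 = 2^20 = 1,048,576
    parity bits:  smallest r with 2^r >= k + r + 1  gives r = 21  (2^21 = 2,097,152)
    SECDED:       one extra overall parity bit, so 22 bits total,
                  leaving about 106 of the 128 checksum bits for error detection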

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Anton B. Rang
> How would you describe the difference between the file system > checking utility and zpool scrub? Is zpool scrub lacking in its > verification of the data? To answer the second question first, yes, zpool scrub is lacking, at least to the best of my knowledge (I haven't looked at the ZFS source

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
> As others have explained, if ZFS does not have a > config with data redundancy - there is not much that > can be learned - except that it "just broke". Plenty can be learned by just looking at the pool. Unfortunately ZFS currently doesn't have tools which make that easy; as I understand it, zdb

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
> From the ZFS Administration Guide, Chapter 11, Data Repair section: > Given that the fsck utility is designed to repair known pathologies > specific to individual file systems, writing such a utility for a file > system with no known pathologies is impossible. That's a fallacy (and is incorrect

Re: [zfs-discuss] kernel panic - was it zfs related?

2008-07-15 Thread Anton B. Rang
The stack trace makes it clear that it was ZFS that crashed. (The _cmntrap stack frame indicates that a trap happened; in this case it's an access to bad memory by the kernel. The previous stack frame indicates that ZFS was active.) Now, it may not have been ZFS which caused the panic -- another

Re: [zfs-discuss] Raid-Z with N^2+1 disks

2008-07-15 Thread Anton B. Rang
One nit ... the parity computation is 'in the noise' as far as the CPU goes, but it tends to flush the CPU caches (or rather, replace useful cached data with parity), which affects application performance. Modern CPU architectures (including x86/SPARC) provide instructions which allow data to b

Re: [zfs-discuss] Cannot share RW, "Permission Denied" with sharenfs in ZFS

2008-07-15 Thread Anton B. Rang
My first hunch would be to unmount the tank pool from /tank, and check the permissions of the /tank directory. You'll see behavior like this if the directory on which an NFS-exported file system will be mounted is not world-readable before the mount.
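
A sketch of the check being suggested, using the poster's pool name:

    # unmount so the underlying mount-point directory (not the ZFS file system) is visible
    zfs unmount tank
    ls -ld /tank          # must be readable/searchable by the NFS clients' users
    chmod 755 /tank       # if it isn't
    zfs mount tank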

Re: [zfs-discuss] confusion and frustration with zpool

2008-07-09 Thread Anton B. Rang
Also worth noting is that the "enterprise-class" drives have protection from heavy load that the "consumer-class" drives don't. In particular, there's no temperature sensor on the voice coil for the consumer drives, which means that under heavy seek load (constant i/o), the drive will eventually

Re: [zfs-discuss] getting inodeno for zfs from vnode in vfs kernel

2008-06-22 Thread Anton B. Rang
If you really need the inode number, you should use the semi-public interface to retrieve it and call VOP_GETATTR. This is what the rest of the kernel does when it needs attributes of a vnode. See for example http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/syscall/stat

Re: [zfs-discuss] Repairing known bad disk blocks before zfs

2008-04-16 Thread Anton B. Rang
If you don't do background scrubbing, you don't know about bad blocks in advance. If you're running RAID-Z, this means you'll lose data if a block is unreadable and another device goes bad. This is the point of scrubbing, it lets you repair the problem while you still have redundancy. :-) Wheth
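
The scrub itself, with a hypothetical pool name:

    zpool scrub tank          # read and verify every allocated block, repairing from redundancy
    zpool status -v tank      # shows scrub progress and any errors found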

Re: [zfs-discuss] ZFS and multipath with iSCSI

2008-04-06 Thread Anton B. Rang
> ZFS will handle out of order writes due to it > transactional > nature. Individual writes can be re-ordered safely. > When the transaction > commits it will wait for all writes and flush them; > then write a > new uberblock with the new transaction group number > and flush that. Sure -- the ques

Re: [zfs-discuss] (no subject)

2008-03-14 Thread Anton B. Rang
> But each shows a Max_Payload_Size of 128 bytes in both the DevCap and > DevCtl registers. Clearly they are accepting 256-byte payloads, else I > wouldn't notice the big perf improvement when reading data from the > disks. Right -- you'd see errors instead. > Could it be possible that (1) an err

[zfs-discuss] Max_Payload_Size (was Re: 7-disk raidz achieves 430 MB/s reads and...)

2008-03-13 Thread Anton B. Rang
Be careful of changing the Max_Payload_Size parameter. It needs to match, and be supported, between all PCI-E components which might communicate with each other. You can tell what values are supported by reading the Device Capabilities Register and checking the Max_Payload_Size Supported bits.

Re: [zfs-discuss] want to intercept all IO on ZFS file sytem

2008-03-12 Thread Anton B. Rang
> i want to intercept IO on ZFS at vnode layer, > i changed vnodeops pointer for zfs in vfs frame work > but i only get IO for creating new file but i dont > get for read,lookup,write,changing setattribute > etc. can somebody explain why ? Which vnodeops pointer did you change? ZFS, unlike most f

Re: [zfs-discuss] path-name encodings

2008-03-05 Thread Anton B. Rang
> > In general, they don't. Command-line utilities just use the sequence > > of bytes entered by the user. > > Obviously that depends on the application. A command-line utility that > interprets an normal xml file containing filenames know the characters > but not the bytes. The same goes for com

Re: [zfs-discuss] path-name encodings

2008-03-05 Thread Anton B. Rang
> Do you happen to know where programs in (Open)Solaris look when they > want to know how to encode text to be used in a filename? Is it > LC_CTYPE? In general, they don't. Command-line utilities just use the sequence of bytes entered by the user. GUI-based software does as well, but the encodin

Re: [zfs-discuss] path-name encodings

2008-02-28 Thread Anton B. Rang
> OK, thanks. I still haven't got any answer to my original question, > though. I.e., is there some way to know what text the > filename is, or do I have to make a more or less wild guess what > encoding the program that created the file used? You have to guess. As far as I know, Apple's HFS (and

Re: [zfs-discuss] The old problem with tar, zfs, nfs and zil

2008-02-25 Thread Anton B. Rang
For Linux NFS service, it's a option in /etc/exports. The default for "modern" (post-1.0.1) NFS utilities is "sync", which means that data and metadata will be written to the disk whenever NFS requires it (generally upon an NFS COMMIT operation). This is the same as Solaris with UFS, or with Z
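
For reference, the export option in question; a hypothetical /etc/exports line:

    # "sync" (the post-1.0.1 default) commits data when the NFS protocol requires it;
    # "async" lets the server acknowledge writes before they reach stable storage.
    /export/home  192.168.0.0/24(rw,sync)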

Re: [zfs-discuss] 100% random writes coming out as 50/50 reads/writes

2008-02-14 Thread Anton B. Rang
> Create a pool [ ... ] > Write a 100GB file to the filesystem [ ... ] > Run I/O against that file, doing 100% random writes with an 8K block size. Did you set the record size of the filesystem to 8K? If not, each 8K write will first read 128K, then write 128K. Anton
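
The property in question: it only affects blocks written after it is set, so it has to be set before the 100GB test file is created. Dataset name is hypothetical:

    zfs set recordsize=8k tank/testfs
    zfs get recordsize tank/testfs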

Re: [zfs-discuss] OpenSolaris, ZFS and Hardware RAID,

2008-02-10 Thread Anton B. Rang
> Careful here.  If your workload is unpredictable, RAID 6 (and RAID 5) > for that matter will break down under highly randomized write loads.  Oh? What precisely do you mean by "break down"? RAID 5's write performance is well-understood and it's used successfully in many installations for rand

Re: [zfs-discuss] ZFS+ config for 8 drives, mostly reads

2008-02-05 Thread Anton B. Rang
> -I'm under the impression that ZFS+(ZFS2) is similar > to RAID6, so for the initial 8x500GB two drives would > be sucked into parity so I'd have a 3TB volume with > the ability to lose two discs, no? "RAIDZ2" is the term you're looking for; and yes, you'd wind up with 3 TB of usable space. > -
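
A sketch of that layout with hypothetical device names; 8 x 500 GB in raidz2 leaves (8 - 2) x 500 GB = 3 TB of usable space:

    zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0
    zpool list tank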

Re: [zfs-discuss] Memory Corruption

2008-02-03 Thread Anton B. Rang
>is there anything beyond scrub that can verify the integrity of a pool? Probably not. scrub will, at least, traverse all of the metadata involved in the tree of blocks. I don't believe, however, that it checks integrity beyond that (e.g. of the space maps). If you know that your machine had

Re: [zfs-discuss] ZFS raidz small IO write performance compared to raid

2008-02-01 Thread Anton B. Rang
For small random I/O operations I would expect a substantial performance penalty for ZFS. The reason is that RAID-Z is more akin to RAID-3 than RAID-5; each read and write operation touches all of the drives. RAID-5 allows multiple I/O operations to proceed in parallel since each read and write

Re: [zfs-discuss] Block Pointer Rewrite status -also,

2008-01-29 Thread Anton B. Rang
To upgrade a zpool, use the 'zpool upgrade' command. Easy, isn't it? :-)
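
For completeness, a sketch with a hypothetical pool name:

    zpool upgrade -v      # list the on-disk versions this build supports
    zpool upgrade tank    # upgrade one pool
    zpool upgrade -a      # or upgrade every pool on the system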

Re: [zfs-discuss] ATA UDMA data parity error

2008-01-17 Thread Anton B. Rang
Definitely a hardware problem (possibly compounded by a bug). Some key phrases and routines: ATA UDMA data parity error This one actually looks like a misnomer. At least, I'd normally expect "data parity error" not to crash the system! (It should result in a retry or EIO.) PCI(-X) Expre

Re: [zfs-discuss] zfs comparison

2008-01-17 Thread Anton B. Rang
> Pardon my ignorance, but is ZFS with compression safe to use in a > production environment? I'd say, as safe as ZFS in general. ZFS has been well-tested by Sun, but it's not as mature as UFS, say. There is not yet a fsck equivalent for ZFS, so if a bug results in damage to your ZFS data pool

Re: [zfs-discuss] hardware for zfs home storage

2008-01-14 Thread Anton B. Rang
OK, this isn't even vaguely ZFS-related, but at least with Mac OS X 10.5 and 10.5.1, be aware that network volumes are unsupported because they don't work right. :-) For instance, describes one case -- if the Samba destinati

Re: [zfs-discuss] copy on write related query

2008-01-06 Thread Anton B. Rang
> Does copy-on-write happen every time when any data block of ZFS is getting > modified? Yes. (Data block or meta-data block, with the sole exception of the set of überblocks.) > Also where exactly COWed data written I'm not quite sure what you're asking here. Data, whether newly written or

Re: [zfs-discuss] Nice chassis for ZFS server

2007-12-13 Thread Anton B. Rang
> I could use a little clarification on how these unrecoverable disk errors > behave -- or maybe a lot, depending on one's point of view. > > So, when one of these "once in around ten (or 100) terabytes read" events > occurs, my understanding is that a read error is returned by the drive, > and th

Re: [zfs-discuss] Backup in general (was "Does ZFS handle a SATA II '

2007-12-10 Thread Anton B. Rang
> Tape drives and tapes seem to be just too expensive. Am I out of date here? No, I don't think so. The problem is that the low-end tape market has mostly vanished as CDs/DVDs/disks get cheaper -- not that it should, because tape is much more reliable -- so the cost of entry is pretty high. I u

Re: [zfs-discuss] Yager on ZFS

2007-12-07 Thread Anton B. Rang
> NOTHING anton listed takes the place of ZFS That's not surprising, since I didn't list any file systems. Here's a few file systems, and some of their distinguishing features. None of them do exactly what ZFS does. ZFS doesn't do what they do, either. QFS: Very, very fast. Supports segregat

Re: [zfs-discuss] Yager on ZFS

2007-12-07 Thread Anton B. Rang
> There are a category of errors that are > not caused by firmware, or any type of software. The > hardware just doesn't write or read the correct bit value this time > around. With out a checksum there's no way for the firmware to know, and > next time it very well may write or read the correct b

Re: [zfs-discuss] Seperate ZIL

2007-12-07 Thread Anton B. Rang
> 10K RPM SCSI disks will get (best case) 350 to 400 IOPS. Remember, > the main issue with legacy SCSI is that (SCSI) commands are sent > 8-bits wide at 5Mbits/Sec - for backwards compatibility. This is true for really old SCSI configurations, but if you're buying a modern disk and controller

Re: [zfs-discuss] mixing raidz1 and raidz2 in same pool

2007-12-07 Thread Anton B. Rang
There won't be a performance hit beyond that of RAIDZ2 vs. RAIDZ. But you'll wind up with a pool with fundamentally single-disk-failure tolerance, so I'm not sure it's worth it (at least until there's a mechanism for replacing the remaining raidz1 vdevs with raidz2).

Re: [zfs-discuss] Odd prioritisation issues.

2007-12-07 Thread Anton B. Rang
> I was under the impression that real-time processes essentially trump all > others, and I'm surprised by this behaviour; I had a dozen or so RT-processes > sat waiting for disc for about 20s. Process priorities on Solaris affect CPU scheduling, but not (currently) I/O scheduling nor memory usag

Re: [zfs-discuss] Yager on ZFS

2007-12-05 Thread Anton B. Rang
> what are you terming as "ZFS' incremental risk reduction"? I'm not Bill, but I'll try to explain. Compare a system using ZFS to one using another file system -- say, UFS, XFS, or ext3. Consider which situations may lead to data loss in each case, and the probability of each such situation.

Re: [zfs-discuss] ZFS write time performance question

2007-12-05 Thread Anton B. Rang
This might have been affected by the cache flush issue -- if the 3310 flushes its NVRAM cache to disk on SYNCHRONIZE CACHE commands, then ZFS is penalizing itself. I don't know whether the 3310 firmware has been updated to support the SYNC_NV bit. It wasn't obvious on Sun's site where to downl

Re: [zfs-discuss] zfs mirroring question

2007-12-05 Thread Anton B. Rang
The file systems are striped between the two mirrors. (If your disks are A, B, C, D then a single file's blocks would reside on disks A+B, then C+D, then A+B again.) If you lose A and B, or C and D, you lose the whole pool. (Hence if you have two power supplies, for instance, you'd probably w
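
The layout described corresponds to a pool built roughly like this (device names are hypothetical); putting each half of every mirror behind a different power supply or controller is what the parenthetical is getting at:

    # two top-level mirror vdevs; ZFS stripes file data across them
    zpool create tank mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0
    zpool status tank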

Re: [zfs-discuss] I screwed up my zpool

2007-12-03 Thread Anton B. Rang
> 2007-11-07.14:15:19 zpool create -f tank raidz2 [ ... ] > 2007-12-03.14:42:28 zpool add tank c4t0d0 > > c4t0d0 is not part of raidz2. How can I fix this? Back up your data; destroy the pool; and re-create it. > Ideally I would like to create another zpool with c4t0d0 plus some more disks > sinc

Re: [zfs-discuss] why are these three ZFS caches using so much kmem?

2007-12-03 Thread Anton B. Rang
> Got an issue which is rather annoying to me - three of my > ZFS caches are regularly using nearly 1/2 of the 1.09Gb of > allocated kmem in my system. I think this is just the ARC; you can limit its size using: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC
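
The tuning referred to is an /etc/system setting; the 512 MB cap below is purely illustrative, and a reboot is needed for it to take effect:

    * /etc/system: cap the ZFS ARC at 512 MB
    set zfs:zfs_arc_max = 0x20000000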

Re: [zfs-discuss] ZFS metadata

2007-12-03 Thread Anton B. Rang
My guess is that you're looking for the on-disk format document: http://www.opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf If you're more interested in the user-visible metadata (e.g. the settings for pools and filesystems), then you'd probably have to dig through the various manu

Re: [zfs-discuss] Best stripe-size in array for ZFS mail storage?

2007-12-01 Thread Anton B. Rang
> That depends upon exactly what effect turning off the > ZFS cache-flush mechanism has. The only difference is that ZFS won't send a SYNCHRONIZE CACHE command at the end of a transaction group (or ZIL write). It doesn't change the actual read or write commands (which are always sent as ordinary
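
For reference, "turning off the ZFS cache-flush mechanism" usually means this /etc/system tunable, which is only safe when every write cache in the path is non-volatile:

    * /etc/system: stop ZFS from issuing SYNCHRONIZE CACHE commands
    set zfs:zfs_nocacheflush = 1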

Re: [zfs-discuss] Problem with sharing multiple zfs file systems

2007-11-26 Thread Anton B. Rang
Given that it will be some time before NFSv4 support, let alone NFSv4 support for mount point crossing, in most client operating systems ... what obstacles are in the way of constructing an NFSv3 server which would 'do the right thing' transparently to clients so long as the file systems involve

Re: [zfs-discuss] ZFS snapshot GUI

2007-11-20 Thread Anton B. Rang
How does the ability to set a snapshot schedule for a particular *file* or *folder* interact with the fact that ZFS snapshots are on a per-filesystem basis? This seems a poor fit. If I choose to snapshot my "Important Documents" folder every 5 minutes, that's implicitly creating snapshots of m

Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and

2007-11-18 Thread Anton B. Rang
Hint: Bill was already writing file system code when I was in elementary school. ;-) Seriously...it's rather sad to see serious and useful discussions derailed by thin skins and zealotry. Bill's kind enough to share some of his real-world experience and observations in the old tradition of sen

Re: [zfs-discuss] read/write NFS block size and ZFS

2007-11-16 Thread Anton B. Rang
If you're running over NFS, the ZFS block size most likely won't have a measurable impact on your performance. Unless you've got multiple gigabit ethernet interfaces, the network will generally be the bottleneck rather than your disks, and NFS does enough caching at both client & server end to

[zfs-discuss] Macs & compatibility (was Re: Yager on ZFS)

2007-11-15 Thread Anton B. Rang
This is clearly off-topic :-) but perhaps worth correcting -- >Long-time MAC users must be getting used to having their entire world >disrupted and having to re-buy all their software. This is at least the >second complete flag-day (no forward or backwards compatibility) change >they've been th

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread Anton B. Rang
> When you have a striped storage device under a > file system, then the database or file system's view > of contiguous data is not contiguous on the media. Right. That's a good reason to use fairly large stripes. (The primary limiting factor for stripe size is efficient parallel access; using

Re: [zfs-discuss] ZFS + DB + default blocksize

2007-11-12 Thread Anton B. Rang
Yes. Blocks are compressed individually, so a smaller block size will (on average) lead to less compression. (Assuming that your data is compressible at all, that is.)

Re: [zfs-discuss] Error: "Volume size exceeds limit for this system"

2007-11-08 Thread Anton B. Rang
The comment in the header file where this error is defined says: /* volume is too large for 32-bit system */ So it does look like it's a 32-bit CPU issue. Odd, since file systems don't normally have any sort of dependence on the CPU type. Anton

Re: [zfs-discuss] Partial ZFS Space Recovery

2007-11-06 Thread Anton B. Rang
Sorry, you can't do that. The snapshots are taken on a per-filesystem basis; there is no way to remove just the data in one directory.

Re: [zfs-discuss] minimum physical memory requirement?

2007-10-24 Thread Anton B. Rang
256 MB is really tight for ZFS. You can try it. FreeBSD suggests a minimum of 1 GB at http://wiki.freebsd.org/ZFSTuningGuide .

Re: [zfs-discuss] Due to 128KB limit in ZFS it can't

2007-10-24 Thread Anton B. Rang
See the QFS documentation: http://docs.sun.com/source/817-4091-10/chapter8.html#59255 (Steps 1 through 4 would apply to any file system which can issue multi-megabyte I/O requests.)

Re: [zfs-discuss] XFS_IOC_FSGETXATTR & XFS_IOC_RESVSP64 like

2007-10-12 Thread Anton B. Rang
openat() isn't really what he wants. These aren't user-level xattrs, they're ones which affect the file system, or more specifically, a particular file. I don't think the particular xattrs in question (or analogous ones) exist for ZFS at this point.

Re: [zfs-discuss] Fileserver performance tests

2007-10-09 Thread Anton B. Rang
Do you have compression turned on? If so, dd'ing from /dev/zero isn't very useful as a benchmark. (I don't recall if all-zero blocks are always detected if checksumming is turned on, but I seem to recall that they are, even if compression is off.)

Re: [zfs-discuss] ZFS file system is crashing my system

2007-10-08 Thread Anton B. Rang
I didn't see an exact match in the bug database, but http://bugs.opensolaris.org/view_bug.do?bug_id=6328538 looks possible. (The line number doesn't quite match, but the call chain does.) Someone else reported this last month: http://www.opensolaris.org/jive/thread.jspa?messageID=155834 but t

Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Anton B. Rang
> 5) DMA straight from user buffer to disk avoiding a copy. This is what the "direct" in "direct i/o" has historically meant. :-) > line has been that 5) won't help latency much and > latency is here I think the game is currently played. Now the > disconnect might be because people might feel th

Re: [zfs-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread Anton B. Rang
> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of > 28911.68 years This should, of course, set off one's common-sense alert. > it is 91 times more likely to fail and this system will contain data > that I don't want to risk losing If you don't want to risk losing data, you ne

Re: [zfs-discuss] pool is full and cant delete files

2007-09-09 Thread Anton B. Rang
At least three alternatives -- 1. If you don't have the latest patches installed, apply them. There have been bugs in this area which have been fixed. 2. If you still can't remove files with the latest patches, and you have a service contract with Sun, open a service request to get help. 3. A

Re: [zfs-discuss] Single SAN Lun presented to 4 Hosts

2007-08-27 Thread Anton B. Rang
> Host w continuously has a UFS mounted with read/write > access. > Host w writes to the file f/ff/fff. > Host w ceases to touch anything under f. > Three hours later, host r mounts the file system read-only, > reads f/ff/fff, and unmounts the file system. This would probably work for a non-journa

Re: [zfs-discuss] Single SAN Lun presented to 4 Hosts

2007-08-25 Thread Anton B. Rang
> Originally, we tried using our tape backup software > to read the oracle flash recovery area (oracle raw > device on a seperate set of san disks), however our > backup software has a known issue with the the > particular version of ORacle we are using. So one option is to get the backup vendor

Re: [zfs-discuss] ZFS/RaidZ newbie questions

2007-07-26 Thread Anton B. Rang
> First, does RaidZ support disks of multiple sizes, or must each RaidZ set > consist of equal sized disks? Each RAID-Z set must be constructed from equal-sized storage. While it's possible to mix disks of different sizes, either you lose the capacity of the larger disks, or you have to partitio

Re: [zfs-discuss] Does iSCSI target support SCSI-3 PGR reservation ?

2007-07-26 Thread Anton B. Rang
A quick look through the source would seem to indicate that the PERSISTENT RESERVE commands are not supported by the Solaris ISCSI target at all. http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_spc.c

Re: [zfs-discuss] ZFS with HDS TrueCopy and EMC SRDF

2007-07-26 Thread Anton B. Rang
> I'd implement this via LD_PRELOAD library [ ... ] > > There's a problem with sync-on-close anyway - mmap for file I/O. Who > guarantees you no file contents are being modified after the close() ? The latter is actually a good argument for doing this (if it is necessary) in the file system, rat

Re: [zfs-discuss] pool analysis

2007-07-11 Thread Anton B. Rang
> Are Netapp using some kind of block checksumming? They provide an option for it, I'm not sure how often it's used. > If Netapp doesn't do something like [ZFS checksums], that would > explain why there's frequently trouble reconstructing, and point up a > major ZFS advantage. Actually, the real

[zfs-discuss] Re: ZFS Scalability/performance

2007-06-23 Thread Anton B. Rang
> Nothing sucks more than your "redundant" disk array > losing more disks than it can support and you lose all your data > anyway. You'd be better off doing a giant non-parity stripe and dumping to > tape on a regular basis. ;) Anyone who isn't dumping to tape (or some other reliable and [b]off-s

[zfs-discuss] Re: ZFS Scalability/performance

2007-06-23 Thread Anton B. Rang
> Oliver Schinagl wrote: > > zo basically, what you are saying is that on FBSD there's no performane > > issue, whereas on solaris there (can be if write caches aren't enabled) > > Solaris plays it safe by default. You can, of course, override that safety. FreeBSD plays it safe too. It's just t

[zfs-discuss] Re: Mac OS X 10.5 read-only support for ZFS

2007-06-17 Thread Anton B. Rang
Here's one possible reason that a read-only ZFS would be useful: DVD-ROM distribution. Sector errors on DVD are not uncommon. Writing a DVD in ZFS format with duplicated data blocks would help protect against that problem, at the cost of 50% or so disk space. That sounds like a lot, but with Bl
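
The duplicated-data-blocks idea maps onto the existing copies property; a sketch with a hypothetical dataset used to master the image:

    # keep two copies of every data block, even on a single-device pool
    zfs create -o copies=2 tank/dvdmaster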

[zfs-discuss] Re: Re: Re: ZFS Apple WWDC Keynote Absence

2007-06-17 Thread Anton B. Rang
>And the posts related to leopard handed out at wwdc 07 seems to >indicate that zfs is not yet fully implemented, which might be the >real reason that zfs isn't the default fs. I suspect there are two other strong reasons why it's not the default. 1. ZFS is a new and immature file system. HFS+ h

[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.

2007-05-24 Thread Anton B. Rang
Richard wrote: > Any system which provides a single view of data (eg. a persistent storage > device) must have at least one single point of failure. Why? Consider this simple case: A two-drive mirrored array. Use two dual-ported drives, two controllers, two power supplies, arranged roughly as fo

[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.

2007-05-23 Thread Anton B. Rang
> If you've got the internal system bandwidth to drive all drives then RAID-Z > is definitely > superior to HW RAID-5. Same with mirroring. You'll need twice as much I/O bandwidth as with a hardware controller, plus the redundancy, since the reconstruction is done by the host. For instance, to

[zfs-discuss] Re: cow performance penatly

2007-04-26 Thread Anton B. Rang
> I wonder if any one have idea about the performance loss caused by COW > in ZFS? If you have to read old data out before write it to some other > place, it involve disk seek. Since all I/O in ZFS is cached, this actually isn't that bad; the seek happens eventually, but it's not an "extra" seek.

[zfs-discuss] Re: Re: [nfs-discuss] Multi-tera, small-file filesystems

2007-04-23 Thread Anton B. Rang
However, the MTTR is likely to be 1/8 the time

[zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)

2007-04-20 Thread Anton B. Rang
> You need exactly the same bandwidth as with any other > classical backup solution - it doesn't matter how at the end you need > to copy all those data (differential) out of the box regardless if it's > a tape or a disk. Sure. However, it's somewhat cheaper to buy 100 MB/sec of local-attached ta

[zfs-discuss] Re: Re: Help me understand ZFS caching

2007-04-20 Thread Anton B. Rang
> So if someone has a real world workload where having the ability to purposely > not cache user > data would be a win, please let me know. Multimedia streaming is an obvious one. For databases, it depends on the application, but in general the database will do a better job of selecting which d

[zfs-discuss] Re: Re: Preferred backup mechanism for ZFS?

2007-04-20 Thread Anton B. Rang
To clarify, there are at least two issues with remote replication vs. backups in my mind. (Feel free to joke about the state of my mind! ;-) The first, which as you point out can be alleviated with snapshots, is the ability to "go back" in time. If an accident wipes out a file, the missing file
