Re: [zfs-discuss] zfs usedbysnapshots is not equal to "zfs list -t snapshot" for a files

2009-11-04 Thread Anton B. Rang
I believe that space shared between multiple snapshots is not assigned to any of the snapshots. So if you have a 100 GB file and take two snapshots, then delete it, the space used won't show up in the snapshot list, but will show up in the 'usedbysnapshots' property.
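
A quick way to see the distinction on a live system (the dataset name tank/data is hypothetical):

  zfs get usedbysnapshots tank/data     # space that destroying all snapshots would free
  zfs list -o space tank/data           # the USEDSNAP column shows the same figure (recent builds)
  zfs list -r -t snapshot tank/data     # per-snapshot USED counts only blocks unique to that snapshot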

Re: [zfs-discuss] Recovering ZFS stops after syseventconfd can't fork

2009-12-22 Thread Anton B. Rang
Something over 8000 sounds vaguely like the default maximum process count. What does 'ulimit -a' show? I don't know why you're seeing so many zfsdle processes, though — sounds like a bug to me.
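
A couple of standard Solaris commands for checking the limits in question (the kernel variable names are the usual ones; values vary by system):

  ulimit -a                       # per-shell resource limits, including max user processes
  echo "maxuprc/D" | mdb -k       # kernel limit on processes per non-root user
  echo "max_nprocs/D" | mdb -k    # system-wide process table size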

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
> From the ZFS Administration Guide, Chapter 11, Data Repair section: > Given that the fsck utility is designed to repair known pathologies > specific to individual file systems, writing such a utility for a file > system with no known pathologies is impossible. That's a fallacy (and is incorrect

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
> As others have explained, if ZFS does not have a > config with data redundancy - there is not much that > can be learned - except that it "just broke". Plenty can be learned by just looking at the pool. Unfortunately ZFS currently doesn't have tools which make that easy; as I understand it, zdb

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Anton B. Rang
> How would you describe the difference between the file system > checking utility and zpool scrub? Is zpool scrub lacking in its > verification of the data? To answer the second question first, yes, zpool scrub is lacking, at least to the best of my knowledge (I haven't looked at the ZFS source

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-11 Thread Anton B. Rang
That brings up another interesting idea. ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits. If 20-odd bits of that were a Hamming code, you'd have something slightly stronger than SECDED, and ZFS could correct any single-bit errors encountered. This could be done without ch

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-12 Thread Anton B. Rang
Reed-Solomon could correct multiple-bit errors, but an effective Reed-Solomon code for 128K blocks of data would be very slow if implemented in software (and, for that matter, take a lot of hardware to implement). A multi-bit Hamming code would be simpler, but I suspect that undetected multi-bit

Re: [zfs-discuss] corrupt zfs stream? "checksum mismatch"

2008-08-13 Thread Anton B. Rang
There is an explicit check in ZFS for the checksum, as you deduced. I suspect that by disabling this check you could recover much, if not all, of your data. You could probably do this with mdb by 'simply' writing a NOP over the branch in dmu_recv_stream. It appears that 'zfs send' was designed

Re: [zfs-discuss] pulling disks was: ZFS hangs/freezes after disk

2008-08-28 Thread Anton B. Rang
Many mid-range/high-end RAID controllers work by having a small timeout on individual disk I/O operations. If the disk doesn't respond quickly, they'll issue an I/O to the redundant disk(s) to get the data back to the host in a reasonable time. Often they'll change parameters on the disk to limi

Re: [zfs-discuss] Terabyte scrub

2008-09-04 Thread Anton B. Rang
If you're using a mirror, and each disk manages 50 MB/second (unlikely if it's a single disk doing a lot of seeks, but you might do better using a hardware array for each half of the mirror), simple math says that scanning 1 TB would take roughly 20,000 seconds, or 5 hours. So your speed under S
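
Spelling out the arithmetic (using the 50 MB/second per-disk figure above):

  1 TB / 50 MB/s = 1,000,000 MB / 50 MB/s = 20,000 s, i.e. about 5.5 hours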

[zfs-discuss] SATA/SAS (Re: Quantifying ZFS reliability)

2008-10-05 Thread Anton B. Rang
Erik: > > (2) a SAS drive has better throughput and IOPs than a SATA drive Richard: > Disagree. We proved that the transport layer protocol has no bearing > on throughput or iops. Several vendors offer drives which are > identical in all respects except for transport layer protocol: SAS or > SA

Re: [zfs-discuss] Lost space in empty pool (no snapshots)

2008-11-11 Thread Anton B. Rang
The "deferred free" indicates that these blocks are supposed to be freed at a future time. A quick glance at the code would seem to indicate that this is supposed to happen when the next transaction group is pushed. Apparently it's not happening on your system ... presumably a bug. -- This mess

Re: [zfs-discuss] ZPool and Filesystem Sizing - Best Practices?

2008-11-26 Thread Anton B. Rang
>If there is a zfs implementation bug it could perhaps be more risky >to have five pools rather than one. Kind of goes both ways. You're perhaps 5 times as likely to wind up with a damaged pool, but if that ever happens, there's only 1/5 as much data to restore.

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-10 Thread Anton B. Rang
>It sounds like you have access to a source of information that the >rest of us don't have access to. I think if you read the archives of this mailing list, and compare it to the discussions on the other Solaris mailing lists re UFS, it's a reasonable conclusion.

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-13 Thread Anton B. Rang
I wasn't joking, though as is well known, the plural of anecdote is not data. Both UFS and ZFS, in common with all file system, have design flaws and bugs. To lose an entire UFS file system (barring the loss of the entire underlying storage) requires a great deal of corruption; there are multipl

Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-13 Thread Anton B. Rang
Some RAID systems compare checksums on reads, though this is usually only for RAID-4 configurations (e.g. DataDirect) because of the performance hit otherwise. End-to-end checksums are not yet common. The SCSI committee recently ratified T10 DIF, which allows either an operating system or appli

Re: [zfs-discuss] ZFS and aging

2008-12-15 Thread Anton B. Rang
Typically you want to do something like this: Write 1,000,000 files of varying length. Randomly select and remove 500,000 of those files. Repeat (a) creating files, and (b) randomly removing files, until your file system is full enough for your test, or you run out of time. That's a pretty
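
A rough sketch of such an aging pass in ksh or bash (sizes, counts and the /tank/aging path are made up; a real run would also record timings):

  cd /tank/aging
  i=0
  while [ $i -lt 1000000 ]; do
      mkfile $(( (RANDOM % 128 + 1) * 1024 )) f.$i   # files of varying length, 1 KB to 128 KB
      i=$(( i + 1 ))
  done
  ls | nawk 'BEGIN { srand() } rand() < 0.5' | xargs rm   # remove roughly half the files at random
  # repeat the create/remove passes until the filesystem is as full and fragmented as the test needs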

Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread Anton B. Rang
For SCSI disks (including FC), you would use the FUA bit on the read command. For SATA disks ... does anyone care? ;-)

Re: [zfs-discuss] zfs panic

2009-01-19 Thread Anton B. Rang
Looks like a corrupted pool -- you appear to have a mirror block pointer with no valid children. From the dump, you could probably determine which file is bad, but I doubt you could delete it; you might need to recreate your pool.

Re: [zfs-discuss] zfs null pointer deref,

2009-01-19 Thread Anton B. Rang
If you've got enough space on /var, and you had a dump partition configured, you should find a bunch of "vmcore.[n]" files in /var/crash by now. The system normally dumps the kernel core into the dump partition (which can be the swap partition) and then copies it into /var/crash on the next suc
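
The standard commands for setting this up and collecting the dump (the zvol path is just an example):

  dumpadm                                 # show the current dump device and savecore directory
  dumpadm -d /dev/zvol/dsk/rpool/dump     # example: dedicate a zvol as the dump device
  savecore                                # after a panic, writes unix.N and vmcore.N into /var/crash/<hostname>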

Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Anton B. Rang
> The user DEFINITELY isn't expecting 5 x 10^11 bytes, or what you meant to say 500 x 10^9 bytes, they're expecting 500GB. You know, 536,870,912,000 bytes. But even if the drive mfg's calculated it correctly, they wouldn't even be getting that due to filesystem overhead. I doubt there
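
For reference, the two readings of "500 GB" work out to:

  500 x 10^9 bytes = 500,000,000,000   (decimal gigabytes, what the drive vendor sells)
  500 x 2^30 bytes = 536,870,912,000   (binary "gigabytes", what many users expect)
  ratio ~= 1.074, i.e. about a 7% difference before any filesystem overhead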

Re: [zfs-discuss] zfs null pointer deref,

2009-01-20 Thread Anton B. Rang
Sigh. Richard points out in private email that automatic savecore functionality is disabled in OpenSolaris; you need to manually set up a dump device and save core files if you want them. However, the stack may be sufficient to ID the bug.

Re: [zfs-discuss] Problem with snapshot

2009-01-29 Thread Anton B. Rang
Snapshots are not on a per-pool basis but a per-file-system basis. Thus, when you took a snapshot of "testpol", you didn't actually snapshot the pool; rather, you took a snapshot of the top level file system (which has an implicit name matching that of the pool). Thus, you haven't actually aff

Re: [zfs-discuss] write cache and cache flush

2009-01-29 Thread Anton B. Rang
If all write caches are truly disabled, then disabling the cache flush won't affect the safety of your data. It will change your performance characteristics, almost certainly for the worse.

Re: [zfs-discuss] Problems with '..' on ZFS pool

2009-01-29 Thread Anton B. Rang
That bug has been in Solaris forever. :-(

Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Anton B. Rang
> Fsck can only repair known faults; known > discrepancies in the meta data. > Since ZFS doesn't have such known discrepancies, > there's nothing to repair. I'm rather tired of hearing this mantra. If ZFS detects an error in part of its data structures, then there is clearly something to repair.

Re: [zfs-discuss] Books on File Systems and File System Programming

2009-08-16 Thread Anton B. Rang
There aren't many good books on file system design. The "VAX/VMS Internals and Data Structures" book by Goldenberg covers a fair amount of the RMS file system design along with its rationale. There is also a "VMS File System Internals" book which I haven't yet read. Apple's early Inside Macinto

[zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-26 Thread Anton B. Rang
> 1. How stable is ZFS? It's a new file system; there will be bugs. It appears to be well-tested, though. There are a few known issues; for instance, a write failure can panic the system under some circumstances. UFS has known issues too > 2. Recommended config. Above, I have a fairly

[zfs-discuss] Re: high density SAS

2007-01-26 Thread Anton B. Rang
> > How badly can you mess up a JBOD? > > Two words: vibration, cooling. Three more: power, signal quality. I've seen even individual drive cases with bad enough signal quality to cause bit errors.

[zfs-discuss] Re: hot spares - in standby?

2007-02-02 Thread Anton B. Rang
> Often, the spare is up and running but for whatever reason you'll have a > bad block on it and you'll die during the reconstruct. Shouldn't SCSI/ATA block sparing handle this? Reconstruction should be purely a matter of writing, so "bit rot" shouldn't be an issue; or are there cases I'm not

[zfs-discuss] Re: ZFS panic on B54

2007-02-02 Thread Anton B. Rang
The affected DIMM? Did you have memory errors before this? The message you posted looked like ZFS encountered an error writing to the drive (which could, admittedly, have been caused by bad memory).

[zfs-discuss] Re: ZFS and question on repetative data migrating to it efficiently...

2007-02-02 Thread Anton B. Rang
In general, your backup software should handle making incremental dumps, even from a split mirror. What are you using to write data to tape? Are you simply dumping the whole file system, rather than using standard backup software? ZFS snapshots use a pure copy-on-write model. If you have a block
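
If the backup software can consume a raw stream, the snapshot-based incremental looks roughly like this (dataset and snapshot names are hypothetical):

  zfs snapshot tank/data@monday
  # ... a day of changes ...
  zfs snapshot tank/data@tuesday
  zfs send -i tank/data@monday tank/data@tuesday > /backup/data.mon-tue   # only blocks changed since @monday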

[zfs-discuss] Re: Best Practises => Keep Pool Below 80%?

2007-02-13 Thread Anton B. Rang
The space management algorithms in many file systems don't always perform well when they can't find a free block of the desired size. There's often a "cliff" where on average, once the file system is too full, performance drops off exponentially. UFS deals with this by reserving space explicitly

[zfs-discuss] Re: Google paper on disk reliability

2007-02-20 Thread Anton B. Rang
It turns out that even rather poor prediction accuracy is good enough to make a big difference (10x) in the failure probability of a RAID system. See Gordon Hughes & Joseph Murray, "Reliability and Security of RAID Storage Systems and D2D Archives Using SATA Disk Drives", ACM Transactions on Sto

[zfs-discuss] Re: ZFS checksum error detection

2007-03-16 Thread Anton B. Rang
It's possible (if unlikely) that you are only getting checksum errors on metadata. Since ZFS always internally mirrors its metadata, even on non-redundant pools, it can recover from metadata corruption which does not affect all copies. (If there is only one LUN, the mirroring happens at differe

[zfs-discuss] Re: user id mapping of exported fs

2007-03-18 Thread Anton B. Rang
That's really an NFS question, not a ZFS one — ZFS simply uses whatever UID the NFS server passes through to it. That said, Solaris doesn't offer this functionality, as far as I know. Perhaps NFSv4 domains could be used to achieve something similar

[zfs-discuss] Re: Proposal: ZFS hotplug support and autoconfiguration

2007-03-21 Thread Anton B. Rang
A couple of questions/comments -- Why is the REMOVED state not persistent? It seems that, if ZFS knows that an administrator pulled a disk deliberately, that's still useful information after a reboot. Changing the state to FAULTED is non-intuitive, at least to me. What happens with autoreplace

[zfs-discuss] Re: Re: Proposal: ZFS hotplug supportandautoconfiguration

2007-03-22 Thread Anton B. Rang
> > Consider a server [with] three drives, A, B, and C, in which A and B are > > mirrored and > > C is not. Pull out A, B, and C, and re-insert them as A, C, and B. If > > B is slow to come up for some reason, ZFS will see "C" in place of > > "B", and happily reformat it into a mirror of "A". (Or

[zfs-discuss] Re: ZFS and UFS performance

2007-03-28 Thread Anton B. Rang
> According to Bug Database bug 6382683 is in 1-Dispatched state, what does > that mean? Roughly speaking, the bug has been entered into the database, but no developer has been assigned to it. (State 3 indicates that a developer or team has agreed that it's a bug; it sounds likely that this bug

[zfs-discuss] Re: How big a write to a regular file is atomic?

2007-03-28 Thread Anton B. Rang
It's not defined by POSIX (or Solaris). You can rely on being able to atomically write a single disk block (512 bytes); anything larger than that is risky. Oh, and it has to be 512-byte aligned. File systems with overwrite semantics (UFS, QFS, etc.) will never guarantee atomicity for more than
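
Purely as an illustration of the 512-byte case (file names are made up), a single aligned block overwrite looks like this:

  dd if=sector.bin of=datafile bs=512 count=1 seek=20 conv=notrunc
  # one 512-byte write at offset 20*512, without truncating the rest of the file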

[zfs-discuss] Re: How big a write to a regular file is atomic?

2007-03-28 Thread Anton B. Rang
I should probably clarify my answer. All file systems provide writes by default which are atomic with respect to readers of the file. That's a POSIX requirement. In other words, if you're writing ABC, there's no possibility that a reader might see ABD (if D was previously contained in the file)

[zfs-discuss] Re: Re[2]: 6410 expansion shelf

2007-03-29 Thread Anton B. Rang
However, even with sequential writes, a large I/O size makes a huge difference in throughput. Ask the QFS folks about data capture applications. ;-) (This is less true with ATA disks that tend to have less buffering and much less sophisticated architectures. I'm not aware of any dual-processor A

[zfs-discuss] Re: How big a write to a regular file is atomic?

2007-04-02 Thread Anton B. Rang
> > All file systems provide writes by default which are > > atomic with respect to readers of the file. > > Surely, only in the absence of a crash - otherwise, > POSIX would require implementation of transactional > write semantics in all file systems. Or is that what > you meant by the last sent

[zfs-discuss] Re: Size taken by a zfs symlink

2007-04-02 Thread Anton B. Rang
It's hard to say precisely, but asymptotically you should see one znode & one directory entry (plus a bit of associated tree overhead) for "smaller" symlinks (56 bytes?) and an additional data block of 512 or 1024 bytes for larger symlinks.

[zfs-discuss] Re: Gzip compression for ZFS

2007-04-05 Thread Anton B. Rang
> Assuming that you may pick a specific compression algorithm, > most algorithms can have different levels/percentages of > deflations/inflations which affects the time to compress > and/or inflate wrt the CPU capacity. Yes? I'm not sure what your point is. Are you suggesting that, rather than

[zfs-discuss] Re: Something like spare sectors...

2007-04-06 Thread Anton B. Rang
> This sounds a lot like: > > 6417779 ZFS: I/O failure (write on ...) -- need to > reallocate writes > > Which would allow us to retry write failures on > alternate vdevs. Of course, if there's only one vdev, the write should be retried to a different block on the original vdev ... right?

[zfs-discuss] Re: Poor man's backup by attaching/detaching mirrordrives on a _striped_ pool?

2007-04-10 Thread Anton B. Rang
> You'd want to export them, not detach them. But you can't export just one branch of the mirror, can you? > Off the top of my head (i.e. untested): > > - zpool create tank mirror > - zpool export tank But this will unmount all the file systems, right?

[zfs-discuss] Re: Poor man's backup by attaching/detaching mirror

2007-04-10 Thread Anton B. Rang
> How would you access the data on that device? Presumably, zpool import. This is basically what everyone does today with mirrors, isn't it? :-)

[zfs-discuss] Re: ZFS improvements

2007-04-10 Thread Anton B. Rang
>> please stop crashing the kernel. > > This is: > > 6322646 ZFS should gracefully handle all devices failing (when writing) That's only one cause of panics. At least two of gino's panics appear due to corrupted space maps, for instance. I think there may also still be a case where a failure t

[zfs-discuss] Re: ZFS improvements

2007-04-10 Thread Anton B. Rang
> Without understanding the underlying pathology it's impossible to "fix" a ZFS > pool. Sorry, but I have to disagree with this. The goal of fsck is not to bring a file system into the state it "should" be in had no errors occurred. The goal, rather, is to bring a file system to a self-consist

[zfs-discuss] Re: Benchmarking

2007-04-12 Thread Anton B. Rang
> I time mkfile'ing a 1 gb file on ufs and copying it [...] then did the same > thing on each zfs partition. Then I took snapshots, copied files, more > snapshots, keeping timings all the way. [ ... ] > > Is this a sufficient, valid test? If your applications do that -- manipulate large files, p

[zfs-discuss] Re: Update/append of compressed files

2007-04-17 Thread Anton B. Rang
Remember that ZFS is a copy-on-write file system. ZFS, much like UFS, uses indirect blocks to point to file contents. However, unlike UFS (which supports only 8K and 1K blocks, and 1K blocks only at the end of a file), the underlying stored data blocks can be of different sizes. An uncompressed

[zfs-discuss] Re: Outdated FAQ entry

2007-04-17 Thread Anton B. Rang
There are still some cases of corrupted pools that cause panics at boot (see some of the threads from the past few weeks), so the FAQ probably needs to stay for now.

[zfs-discuss] Re: Testing of UFS, VxFS and ZFS

2007-04-17 Thread Anton B. Rang
> Second, VDBench is great for testing raw block i/o devices. > I think a tool that does file system testing will get you > better data. OTOH, shouldn't a tool that measures raw device performance be reasonable to reflect Oracle performance when configured for raw devices? I don't know the curre

[zfs-discuss] Re: LZO compression?

2007-04-18 Thread Anton B. Rang
For what it's worth, at a previous job I actually ported LZO to an OpenFirmware implementation. It's very small, doesn't rely on the standard libraries, and would be trivial to get running in a kernel. (Licensing might be an issue, of course.)

[zfs-discuss] Re: Re: zfs boot image conversion kit is posted

2007-04-19 Thread Anton B. Rang
A virtual machine can be thought of as a physical machine with the hardware removed. ;-) To set up a VMware virtual machine, for instance, you'd just (a) start with a VM with a blank disk, (b) install OpenSolaris, (c) configure as desired. I think this is all the original poster is suggesting.

[zfs-discuss] Re: Preferred backup mechanism for ZFS?

2007-04-20 Thread Anton B. Rang
> Initially I wanted a way to do a dump to tape like ufsdump. I > don't know if this makes sense anymore because the tape market is > crashing slowly. It makes sense if you need to keep backups for more than a handful of years (think regulatory requirements or scientific data), or if cost is im

[zfs-discuss] Re: Bottlenecks in building a system

2007-04-20 Thread Anton B. Rang
If you're using this for multimedia, do some serious testing first. ZFS tends to have "bursty" write behaviour, and the worst-case latency can be measured in seconds. This has been improved a bit in recent builds but it still seems to "stall" periodically. (QFS works extremely well for streamin

[zfs-discuss] Re: Multi-tera, small-file filesystems

2007-04-20 Thread Anton B. Rang
You should definitely worry about the number of files when it comes to backup & management. It will also make a big difference in space overhead. A ZFS filesystem with 2^35 files will have a minimum of 2^44 bytes of overhead just for the file nodes, which is about 16 TB. If it takes about 20 ms
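
Working those numbers out (512 bytes per file node as above; treating the post's 20 ms figure as a per-file backup cost is my assumption):

  2^35 files x 512 bytes/file  = 2^44 bytes   ~= 16 TiB just for file nodes
  2^35 files x 0.02 s/file    ~= 6.9 x 10^8 s ~= 21 years to touch every file once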

[zfs-discuss] Re: Help me understand ZFS caching

2007-04-20 Thread Anton B. Rang
ZFS uses caching heavily as well; much more so, in fact, than UFS. Copy-on-write and direct i/o are not related. As you say, data gets written first, then the metadata which points to it, but this isn't anything like direct I/O. In particular, direct I/O avoids caching the data, instead transfe

[zfs-discuss] Re: Re: Preferred backup mechanism for ZFS?

2007-04-20 Thread Anton B. Rang
To clarify, there are at least two issues with remote replication vs. backups in my mind. (Feel free to joke about the state of my mind! ;-) The first, which as you point out can be alleviated with snapshots, is the ability to "go back" in time. If an accident wipes out a file, the missing file

[zfs-discuss] Re: Re: Help me understand ZFS caching

2007-04-20 Thread Anton B. Rang
> So if someone has a real world workload where having the ability to purposely > not cache user > data would be a win, please let me know. Multimedia streaming is an obvious one. For databases, it depends on the application, but in general the database will do a better job of selecting which d

[zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)

2007-04-20 Thread Anton B. Rang
> You need exactly the same bandwidth as with any other > classical backup solution - it doesn't matter how at the end you need > to copy all those data (differential) out of the box regardless if it's > a tape or a disk. Sure. However, it's somewhat cheaper to buy 100 MB/sec of local-attached ta

[zfs-discuss] Re: Re: [nfs-discuss] Multi-tera, small-file filesystems

2007-04-23 Thread Anton B. Rang
However, the MTTR is likely to be 1/8 the time

[zfs-discuss] Re: cow performance penatly

2007-04-26 Thread Anton B. Rang
> I wonder if any one have idea about the performance loss caused by COW > in ZFS? If you have to read old data out before write it to some other > place, it involve disk seek. Since all I/O in ZFS is cached, this actually isn't that bad; the seek happens eventually, but it's not an "extra" seek.

[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.

2007-05-23 Thread Anton B. Rang
> If you've got the internal system bandwidth to drive all drives then RAID-Z > is definitely > superior to HW RAID-5. Same with mirroring. You'll need twice as much I/O bandwidth as with a hardware controller, plus the redundancy, since the reconstruction is done by the host. For instance, to

[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.

2007-05-24 Thread Anton B. Rang
Richard wrote: > Any system which provides a single view of data (eg. a persistent storage > device) must have at least one single point of failure. Why? Consider this simple case: A two-drive mirrored array. Use two dual-ported drives, two controllers, two power supplies, arranged roughly as fo

[zfs-discuss] Re: Re: Re: ZFS Apple WWDC Keynote Absence

2007-06-17 Thread Anton B. Rang
>And the posts related to leopard handed out at wwdc 07 seems to >indicate that zfs is not yet fully implemented, which might be the >real reason that zfs isn't the default fs. I suspect there are two other strong reasons why it's not the default. 1. ZFS is a new and immature file system. HFS+ h

[zfs-discuss] Re: Mac OS X 10.5 read-only support for ZFS

2007-06-17 Thread Anton B. Rang
Here's one possible reason that a read-only ZFS would be useful: DVD-ROM distribution. Sector errors on DVD are not uncommon. Writing a DVD in ZFS format with duplicated data blocks would help protect against that problem, at the cost of 50% or so disk space. That sounds like a lot, but with Bl

[zfs-discuss] Re: ZFS Scalability/performance

2007-06-23 Thread Anton B. Rang
> Oliver Schinagl wrote: > > zo basically, what you are saying is that on FBSD there's no performane > > issue, whereas on solaris there (can be if write caches aren't enabled) > > Solaris plays it safe by default. You can, of course, override that safety. FreeBSD plays it safe too. It's just t

[zfs-discuss] Re: ZFS Scalability/performance

2007-06-23 Thread Anton B. Rang
> Nothing sucks more than your "redundant" disk array > losing more disks than it can support and you lose all your data > anyway. You'd be better off doing a giant non-parity stripe and dumping to > tape on a regular basis. ;) Anyone who isn't dumping to tape (or some other reliable and off-s

Re: [zfs-discuss] pool analysis

2007-07-11 Thread Anton B. Rang
> Are Netapp using some kind of block checksumming? They provide an option for it, I'm not sure how often it's used. > If Netapp doesn't do something like [ZFS checksums], that would > explain why there's frequently trouble reconstructing, and point up a > major ZFS advantage. Actually, the real

Re: [zfs-discuss] ZFS/RaidZ newbie questions

2007-07-26 Thread Anton B. Rang
> First, does RaidZ support disks of multiple sizes, or must each RaidZ set > consist of equal sized disks? Each RAID-Z set must be constructed from equal-sized storage. While it's possible to mix disks of different sizes, either you lose the capacity of the larger disks, or you have to partitio

Re: [zfs-discuss] ZFS with HDS TrueCopy and EMC SRDF

2007-07-26 Thread Anton B. Rang
> I'd implement this via LD_PRELOAD library [ ... ] > > There's a problem with sync-on-close anyway - mmap for file I/O. Who > guarantees you no file contents are being modified after the close() ? The latter is actually a good argument for doing this (if it is necessary) in the file system, rat

Re: [zfs-discuss] Does iSCSI target support SCSI-3 PGR reservation ?

2007-07-26 Thread Anton B. Rang
A quick look through the source would seem to indicate that the PERSISTENT RESERVE commands are not supported by the Solaris ISCSI target at all. http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_spc.c

Re: [zfs-discuss] Single SAN Lun presented to 4 Hosts

2007-08-25 Thread Anton B. Rang
> Originally, we tried using our tape backup software > to read the oracle flash recovery area (oracle raw > device on a seperate set of san disks), however our > backup software has a known issue with the the > particular version of ORacle we are using. So one option is to get the backup vendor

Re: [zfs-discuss] Single SAN Lun presented to 4 Hosts

2007-08-27 Thread Anton B. Rang
> Host w continuously has a UFS mounted with read/write > access. > Host w writes to the file f/ff/fff. > Host w ceases to touch anything under f. > Three hours later, host r mounts the file system read-only, > reads f/ff/fff, and unmounts the file system. This would probably work for a non-journa

Re: [zfs-discuss] pool is full and cant delete files

2007-09-09 Thread Anton B. Rang
At least three alternatives -- 1. If you don't have the latest patches installed, apply them. There have been bugs in this area which have been fixed. 2. If you still can't remove files with the latest patches, and you have a service contract with Sun, open a service request to get help. 3. A

Re: [zfs-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread Anton B. Rang
> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of > 28911.68 years This should, of course, set off one's common-sense alert. > it is 91 times more likely to fail and this system will contain data > that I don't want to risk losing If you don't want to risk losing data, you ne

Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Anton B. Rang
> 5) DMA straight from user buffer to disk avoiding a copy. This is what the "direct" in "direct i/o" has historically meant. :-) > line has been that 5) won't help latency much and > latency is here I think the game is currently played. Now the > disconnect might be because people might feel th

Re: [zfs-discuss] ZFS file system is crashing my system

2007-10-08 Thread Anton B. Rang
I didn't see an exact match in the bug database, but http://bugs.opensolaris.org/view_bug.do?bug_id=6328538 looks possible. (The line number doesn't quite match, but the call chain does.) Someone else reported this last month: http://www.opensolaris.org/jive/thread.jspa?messageID=155834 but t

Re: [zfs-discuss] Fileserver performance tests

2007-10-09 Thread Anton B. Rang
Do you have compression turned on? If so, dd'ing from /dev/zero isn't very useful as a benchmark. (I don't recall if all-zero blocks are always detected if checksumming is turned on, but I seem to recall that they are, even if compression is off.)
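
A quick way to see the pitfall on a compressed dataset (paths are made up; note that /dev/urandom itself can be the bottleneck, so pre-generating an incompressible file is fairer):

  zfs set compression=on tank/bench
  dd if=/dev/zero of=/tank/bench/zeros bs=128k count=8192      # ~1 GB of zeros: compresses away, barely touches the disks
  dd if=/dev/urandom of=/tank/bench/random bs=128k count=8192  # incompressible data gives a more honest write number
  zfs get compressratio tank/bench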

Re: [zfs-discuss] XFS_IOC_FSGETXATTR & XFS_IOC_RESVSP64 like

2007-10-12 Thread Anton B. Rang
openat() isn't really what he wants. These aren't user-level xattrs, they're ones which affect the file system, or more specifically, a particular file. I don't think the particular xattrs in question (or analogous ones) exist for ZFS at this point.

Re: [zfs-discuss] Due to 128KB limit in ZFS it can't

2007-10-24 Thread Anton B. Rang
See the QFS documentation: http://docs.sun.com/source/817-4091-10/chapter8.html#59255 (Steps 1 through 4 would apply to any file system which can issue multi-megabyte I/O requests.)

Re: [zfs-discuss] minimum physical memory requirement?

2007-10-24 Thread Anton B. Rang
256 MB is really tight for ZFS. You can try it. FreeBSD suggests a minimum of 1 GB at http://wiki.freebsd.org/ZFSTuningGuide .

Re: [zfs-discuss] Partial ZFS Space Recovery

2007-11-06 Thread Anton B. Rang
Sorry, you can't do that. The snapshots are taken on a per-filesystem basis; there is no way to remove just the data in one directory.

Re: [zfs-discuss] Error: "Volume size exceeds limit for this system"

2007-11-08 Thread Anton B. Rang
The comment in the header file where this error is defined says: /* volume is too large for 32-bit system */ So it does look like it's a 32-bit CPU issue. Odd, since file systems don't normally have any sort of dependence on the CPU type

Re: [zfs-discuss] ZFS + DB + default blocksize

2007-11-12 Thread Anton B. Rang
Yes. Blocks are compressed individually, so a smaller block size will (on average) lead to less compression. (Assuming that your data is compressible at all, that is.)
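
One way to see the effect for your own data (dataset names are hypothetical; the ratio depends entirely on the data):

  zfs create -o compression=on -o recordsize=128k tank/c128
  zfs create -o compression=on -o recordsize=8k tank/c8
  # copy the same sample data into both, then compare:
  zfs get compressratio tank/c128 tank/c8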

Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-15 Thread Anton B. Rang
> When you have a striped storage device under a > file system, then the database or file system's view > of contiguous data is not contiguous on the media. Right. That's a good reason to use fairly large stripes. (The primary limiting factor for stripe size is efficient parallel access; using

[zfs-discuss] Macs & compatibility (was Re: Yager on ZFS)

2007-11-15 Thread Anton B. Rang
This is clearly off-topic :-) but perhaps worth correcting -- >Long-time MAC users must be getting used to having their entire world >disrupted and having to re-buy all their software. This is at least the >second complete flag-day (no forward or backwards compatibility) change >they've been th

Re: [zfs-discuss] read/write NFS block size and ZFS

2007-11-16 Thread Anton B. Rang
If you're running over NFS, the ZFS block size most likely won't have a measurable impact on your performance. Unless you've got multiple gigabit ethernet interfaces, the network will generally be the bottleneck rather than your disks, and NFS does enough caching at both client & server end to

Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and

2007-11-18 Thread Anton B. Rang
Hint: Bill was already writing file system code when I was in elementary school. ;-) Seriously...it's rather sad to see serious and useful discussions derailed by thin skins and zealotry. Bill's kind enough to share some of his real-world experience and observations in the old tradition of sen

Re: [zfs-discuss] ZFS snapshot GUI

2007-11-20 Thread Anton B. Rang
How does the ability to set a snapshot schedule for a particular *file* or *folder* interact with the fact that ZFS snapshots are on a per-filesystem basis? This seems a poor fit. If I choose to snapshot my "Important Documents" folder every 5 minutes, that's implicitly creating snapshots of m

Re: [zfs-discuss] Problem with sharing multiple zfs file systems

2007-11-26 Thread Anton B. Rang
Given that it will be some time before NFSv4 support, let alone NFSv4 support for mount point crossing, in most client operating systems ... what obstacles are in the way of constructing an NFSv3 server which would 'do the right thing' transparently to clients so long as the file systems involve

Re: [zfs-discuss] Best stripe-size in array for ZFS mail storage?

2007-12-01 Thread Anton B. Rang
> That depends upon exactly what effect turning off the > ZFS cache-flush mechanism has. The only difference is that ZFS won't send a SYNCHRONIZE CACHE command at the end of a transaction group (or ZIL write). It doesn't change the actual read or write commands (which are always sent as ordinary
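
For reference, the usual way to turn this off on Solaris of that era is the zfs_nocacheflush tunable from the ZFS Evil Tuning Guide; only set it if every device in the pool has a nonvolatile (or deliberately disabled) write cache:

  # in /etc/system; takes effect at the next reboot
  set zfs:zfs_nocacheflush = 1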

Re: [zfs-discuss] ZFS metadata

2007-12-03 Thread Anton B. Rang
My guess is that you're looking for the on-disk format document: http://www.opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf If you're more interested in the user-visible metadata (e.g. the settings for pools and filesystems), then you'd probably have to dig through the various manu

Re: [zfs-discuss] why are these three ZFS caches using so much kmem?

2007-12-03 Thread Anton B. Rang
> Got an issue which is rather annoying to me - three of my > ZFS caches are regularly using nearly 1/2 of the 1.09Gb of > allocated kmem in my system. I think this is just the ARC; you can limit its size using: http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC
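
The tuning behind that link boils down to one line in /etc/system (the 512 MB value here is only an example):

  # cap the ARC at 512 MB; takes effect after a reboot
  set zfs:zfs_arc_max = 0x20000000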

Re: [zfs-discuss] I screwed up my zpool

2007-12-03 Thread Anton B. Rang
> 2007-11-07.14:15:19 zpool create -f tank raidz2 [ ... ] > 2007-12-03.14:42:28 zpool add tank c4t0d0 > > c4t0d0 is not part of raidz2. How can I fix this? Back up your data; destroy the pool; and re-create it. > Ideally I would like to create another zpool with c4t0d0 plus some more disks > sinc

Re: [zfs-discuss] zfs mirroring question

2007-12-05 Thread Anton B. Rang
The file systems are striped between the two mirrors. (If your disks are A, B, C, D then a single file's blocks would reside on disks A+B, then C+D, then A+B again.) If you lose A and B, or C and D, you lose the whole pool. (Hence if you have two power supplies, for instance, you'd probably w
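
The layout described corresponds to a pool created along these lines (device names are examples):

  zpool create tank mirror c0t0d0 c0t1d0 mirror c1t0d0 c1t1d0
  zpool status tank    # two mirror vdevs; ZFS stripes writes across them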
