[zfs-discuss] Re: Preferred backup mechanism for ZFS?
> Initially I wanted a way to do a dump to tape like ufsdump. I don't
> know if this makes sense anymore because the tape market is crashing slowly.

It makes sense if you need to keep backups for more than a handful of years (think regulatory requirements or scientific data), or if cost is important. Storing tape is much cheaper than keeping disks running. (Storing disks isn't practical over long periods of time; not only does the signal on the media degrade, but so do some components.)

> People just don't backup 300MB per night anymore. We are looking at
> terabytes of data and I don't know how to backup a terabyte a night.

If you're actually generating a terabyte per day of data, I'm impressed. :-) Tape seems a reasonable way to back that up, in any case. A T10000 stores 500 GB on each tape and runs at 120 MB/sec, so a terabyte would take roughly 2.5 hours to back up with a single tape drive. LTO-4 is in the same ballpark. Of course, that assumes your disk system can keep up.

The SAM-QFS approach of continuous archiving makes a lot of sense here, since it effectively lets backups run continuously. I don't know how much Sun can say about the work going on to add SAM to ZFS.

> Or a really big question that I guess I have to ask: do we even care anymore?

If we're serious about disaster recovery, we do. In particular, remote replication is NOT a substitute for backups.

Anton
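For reference, a quick sanity check of that throughput estimate (assuming the drive really streams at its rated 120 MB/s and ignoring load, position and rewind time):

  # 1 TB at ~120 MB/s, single drive
  $ echo 'scale=1; (1000000 / 120) / 3600' | bc
  2.3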
[zfs-discuss] Re: Bottlenecks in building a system
If you're using this for multimedia, do some serious testing first. ZFS tends to have "bursty" write behaviour, and the worst-case latency can be measured in seconds. This has been improved a bit in recent builds but it still seems to "stall" periodically. (QFS works extremely well for streaming, as evidenced in recent Sun press releases, but I'm not sure what the cost is these days.)
[zfs-discuss] Re: Multi-tera, small-file filesystems
You should definitely worry about the number of files when it comes to backup and management. It will also make a big difference in space overhead. A ZFS filesystem with 2^35 files will have a minimum of 2^44 bytes of overhead just for the file nodes, which is about 16 TB. If it takes about 20 ms of overhead to back up a file (2 seeks), then 2^35 files will take 21 years to back up. ;-)

I'm guessing you didn't really mean 2^35, though. (If you did, you're likely to need a system along the lines of DARPA's HPCS program.)

Anton
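For those following along, the arithmetic behind those numbers (the 512 bytes of metadata per file is a rough lower bound, not an exact ZFS figure):

  $ echo '2^35 * 512 / 2^40' | bc             # file-node overhead in TiB -> 16
  $ echo '2^35 * 0.02 / 86400 / 365' | bc -l  # serial backup time in years -> ~21.8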
Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random
Hi,

>> 2. After going through the zfs-bootification, Solaris complains on reboot
>>    that /etc/dfs/sharetab is missing. Somehow this seems to have fallen
>>    through the cracks of the find command. Well, touching /etc/dfs/sharetab
>>    just fixes the issue.
>
> This is unrelated to ZFS boot issues, and sounds like this bug:
>
>   6542481 No sharetab after BFU from snv_55
>
> It's fixed in build 62.

Hmm, that doesn't fit what I saw:

- Upgraded from snv_61 to snv_62
- snv_62 booted with no problems (other than the t_optmgmt bug)
- Then migrated to ZFS boot
- Now the sharetab issue shows up.

So why did the sharetab issue only show up after the ZFSification of the boot process?

Best regards,
   Constantin

--
Constantin Gonzalez, Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://www.sun.de/   http://blogs.sun.com/constantin/
Re: [zfs-discuss] Preferred backup mechanism for ZFS?
Hi Wee,

> I run a setup of SAM-FS for our main file server and we loved the
> backup/restore parts that you described.

That is great to hear.

> The main concerns I have with SAM fronting the entire conversation is data
> integrity. Unlike ZFS, SAMFS does not do end to end checksumming.

My initial reaction is that the world has got by for a long time without file systems that can do this... so I don't see the absence of this as a big deal. On the other hand, it is hard to argue against a feature that improves data integrity, so I will not.

Anyway, SAM-FS has been enhanced in this respect: in SAM-FS 4.6 you can enable the following...

   "If required, you can enable data verification for archive copies. This
   feature checks for data corruption on any data that is copied to secondary
   and/or tertiary media. The data verification process performs a
   read-after-write verification test, and records a confirmation of data
   validity in the metadata properties for that file. An ssum option is used
   to mark files and directories as needing to be verified. Child directories
   inherit the data verification properties of their parent. The normal
   checksum method is employed to verify copies written to tape or disk
   archive. Use the ssum -e command to set data verification for a file or
   directory. This forces the generation and use of checksums for archiving
   and staging, and prevents the release of the file until all archive copies
   have been created and their checksums verified. Only a superuser can set
   this attribute on a file or directory."

This is taken from the Sun StorageTek SAM Archive Configuration and Administration Guide, Version 4, Update 6 (SAM-FS 4.6 was released April 6th). You can get all the SAM-FS 4.6 docs from here:

http://www.sun.com/products-n-solutions/hardware/docs/Software/Storage_Software/Sun_SAM-FS_and_Sun_SAM-QFS_Software/index.html

This checksum model is different from ZFS's and is more like the way a backup product verifies its backups.

> We have considered the setup you proposed (samfs copy1 -> ZFS) but you will
> run into problem with fs-cache. Being only a copy, ZFS probably do not need
> much caching but will win the battle for memory due to the way its cache is
> managed. Unless there is a visible memory shortfall, ZFS will starve (sorry
> guys) samfs from memory it could use as cache. Also, ZFS's data integrity
> feature is limited by the use of 2nd hand data.

I don't know enough about how ZFS manages memory other than what I have seen on this alias (I just joined a couple of weeks ago), which seems to indicate it is a memory hog... as is VxFS, so we are in good company. I am not against keeping data in memory so long as it has also been written somewhere non-volatile as well, so that data is not lost if the lights go out... and applications don't fight for memory to run. I recall stories from years ago where VxFS hogged so much memory on a Sun Cluster node that the cluster services stalled and the cluster failed over!

I need to go read some white papers on this... but I assume that something like direct I/O (which UFS, VxFS and QFS all have) is in the plans for ZFS, so we don't end up double-buffering data for apps like databases? That is just ugly.

Rgds

Tim
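As a rough sketch of what that looks like in practice (illustrative only; the path is made up and the exact options should be checked against the SAM-FS 4.6 man pages):

  # Require checksummed, verified archive copies for everything under this tree
  ssum -e /sam1/projects/critical

  # sls -D shows per-file archive and checksum status, so you can confirm
  # copies were verified before a file becomes releasable
  sls -D /sam1/projects/critical/results.dat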
Re: [zfs-discuss] Bottlenecks in building a system
Adam Lindsay wrote:
> In asking about ZFS performance in streaming IO situations, discussion
> quite quickly turned to potential bottlenecks. By coincidence, I was
> wondering about the same thing.
>
> Richard Elling said:
>
>> We know that channels, controllers, memory, network, and CPU bottlenecks
>> can and will impact actual performance, at least for large configs.
>> Modeling these bottlenecks is possible, but will require more work in
>> the tool. If you know the hardware topology, you can do a
>> back-of-the-napkin analysis, too.
>
> Well, I'm normally a Mac guy, so speccing server hardware is a bit of
> a revelation for me. I'm trying to come up with a ZFS storage server
> for a networked multimedia research project which hopefully has enough
> oomph to be a nice resource that outlasts the (2-year) project, but
> without breaking the bank.
>
> Does anyone have a clue as to where the bottlenecks are going to be
> with this:
>
> 16x hot swap SATAII hard drives (plus an internal boot drive)
> Tyan S2895 (K8WE) motherboard
> Dual GigE (integral nVidia ports)
> 2x Areca 8-port PCIe (8-lane) RAID drivers
> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
> 8 GiB RAM

I'm putting together a similarly specified machine (Quad-FX with 8GB RAM), but with fewer drives. If there are any specific tests you want me to run on it while it's still on my bench, drop me a line.

Ian
Re[4]: [zfs-discuss] Preferred backup mechanism for ZFS?
Hello Wee,

Friday, April 20, 2007, 5:20:00 AM, you wrote:

WYT> On 4/20/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:
>> You can limit how much memory zfs can use for its caching.
>>
WYT> Indeed, but that memory will still be locked. How can you tell the
WYT> system to be "flexible" with the caching?

It shouldn't be locked, but in reality it can be.

WYT> I deem that archiving will not present a cache challenge but we will
WYT> want zfs to prefetch (or do whatever magic) when staging in files. We
WYT> do not want to limit ZFS's cache but we want to tell the system to
WYT> prefer SAMFS's cache to ZFS's.

I don't know how SAM-FS works (I've never used it) but I'm surprised that you started paging to swap - or perhaps you meant something else. If QFS uses the standard page cache, perhaps increasing segmap would also help. It's still static, however.

By limiting the ZFS ARC you are not disabling prefetching or any other features.

First I would be most interested in what exactly happened when your server started to crawl, because you can't "swap out" the page cache or the ARC cache...

--
Best regards,
 Robert                    mailto:[EMAIL PROTECTED]
                           http://milek.blogspot.com
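For reference, capping the ARC at the time looks something like this (a sketch; the zfs_arc_max tunable only exists in sufficiently recent builds, and on older ones the equivalent trick was poking arc.c_max with mdb, which is unsupported):

  * /etc/system -- cap the ZFS ARC at 1 GiB (takes effect after a reboot)
  set zfs:zfs_arc_max = 0x40000000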
Re[2]: [zfs-discuss] Permanently removing vdevs from a pool
Hello George,

Friday, April 20, 2007, 7:37:52 AM, you wrote:

GW> This is a high priority for us and is actively being worked.
GW> Vague enough for you. :-) Sorry I can't give you anything more exact
GW> than that.

Can you at least give us the feature list being developed? Some answers to questions like:

1. Evacuating a vdev, resulting in a smaller pool, for all raid configs - ?
2. Adding a new vdev and rewriting all existing data to the new, larger stripe - ?
3. Expanding stripe width for raid-z1 and raid-z2 - ?
4. Live conversion between different raid kinds on the same disk set - ?
5. Live data migration from one disk set to another - ?
   [If 1 works it should be simple: first force-add the new disks, even if
   with a different redundancy scheme, then evacuate the old disks. This also
   partly solves 5, but you need different disks.]
6. Rewriting data in a dataset (not the entire pool) after changing some
   parameters like compression, encryption, ditto blocks, ... so they also
   affect already-written data. This should work both pool-wise and
   dataset-wise - ?
7. De-fragmentation of a pool - ?
8. Anything else?

--
Best regards,
 Robert                    mailto:[EMAIL PROTECTED]
                           http://milek.blogspot.com
Re: [zfs-discuss] Re: Preferred backup mechanism for ZFS?
Hello Anton,

Friday, April 20, 2007, 9:02:12 AM, you wrote:

>> Initially I wanted a way to do a dump to tape like ufsdump. I
>> don't know if this makes sense anymore because the tape market is
>> crashing slowly.

ABR> It makes sense if you need to keep backups for more than a
ABR> handful of years (think regulatory requirements or scientific
ABR> data), or if cost is important. Storing tape is much cheaper than
ABR> keeping disks running. (Storing disks isn't practical over long
ABR> periods of time; not only does the signal on the media degrade,
ABR> but so do some components.)

>> People just don't backup 300MB per night anymore. We
>> are looking at terabytes of data and I don't know how
>> to backup a terabyte a night.

ABR> If you're actually generating a terabyte per day of data, I'm impressed. :-)

ABR> Tape seems a reasonable way to back that up, in any case. A
ABR> T10000 stores 500 GB on each tape and runs at 120 MB/sec, so a
ABR> terabyte would take roughly 2.5 hours to back up with a single
ABR> tape drive. LTO-4 is in the same ballpark. Of course, that
ABR> assumes your disk system can keep up.

ABR> The SAM-QFS approach of continuous archiving makes a lot of
ABR> sense here since it effectively lets backups run continuously. I
ABR> don't know how much Sun can say about the work going on to add SAM to ZFS.

>> Or a really big question that I guess I have to ask: do we even care anymore?

ABR> If we're serious about disaster recovery, we do.

ABR> In particular, remote replication is NOT a substitute for backups.

I can't entirely agree - it really depends.

If you do remote replication and also provide snapshotting, it works extremely well, and your "restore" would be MUCH more efficient than from tape. Then if your primary array is down you just switch to the secondary - depending on the environment, that could be all you need. With tapes, not only will you have to wait for the restore, you also need a working array so you have a place to restore to. Of course, if you need to take your backups off-site then that's different.

I'm really disappointed that our attempt at adding ZFS async replication hasn't worked out. We'll have to settle for 'while [ 1 ]; do snapshot; zfs send -i | zfs recv; sleep 10s; done' ...

--
Best regards,
 Robert                    mailto:[EMAIL PROTECTED]
                           http://milek.blogspot.com
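Spelled out a little, that poor-man's replication loop looks roughly like this (a sketch only; the pool, dataset and host names are made up, it assumes an initial full send has already been done and that the receiving dataset is never modified locally - on builds that have it you would add -F to the receive to roll back any such changes):

  #!/bin/sh
  # Naive continuous ZFS replication: snapshot, send the increment, repeat.
  DS=mypool/data
  RHOST=backuphost
  RDS=backup/data

  prev=repl-0         # @repl-0 already exists on both sides
  i=1
  while :; do
          cur=repl-$i
          zfs snapshot $DS@$cur
          zfs send -i $DS@$prev $DS@$cur | ssh $RHOST zfs receive $RDS
          zfs destroy $DS@$prev    # keep only what the next increment needs
          prev=$cur
          i=`expr $i + 1`
          sleep 10
  done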
[zfs-discuss] Help me understand ZFS caching
I have a few questions regarding ZFS, and would appreciate it if someone could enlighten me as I work my way through.

First, write cache. If I look at traditional UFS / VxFS type file systems, they normally cache metadata to RAM before flushing it to disk. This helps increase their perceived write performance (perceived in the sense that if a power outage occurs, data loss can occur). ZFS, on the other hand, performs copy-on-write to ensure that the disk is always consistent. I see this as sort of being equivalent to using a directio option. I understand that the data is written first, then the pointers are updated, but if I were to use the directio analogy, would this be correct? If that is the case, then is it true that ZFS really does not use a write cache at all? And if it does, then how is it used?

Read cache. Any of us that have started using or benchmarking ZFS have seen its voracious appetite for memory, an appetite that is fully shared with VxFS for example, as I am not singling out ZFS (I'm rather a fan). On reboot of my T2000 test server (32GB RAM) I see that the ARC cache max size is set to 30.88GB - a sizeable piece of memory. Now, is all that cache space only for read cache? (given my assumption regarding write cache)

Tunable parameters: I know that the philosophy of ZFS is that you should never have to tune your file system, but might I suggest that tuning the FS is not always a bad thing. You can't expect a FS to be all things for all people. If there are variables that can be modified to provide different performance characteristics and profiles, then I would contend that it could strengthen ZFS and lead to wider adoption and acceptance if you could, for example, limit the amount of memory used by items like the cache without messing with c_max / c_min directly in the kernel.

-Tony
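For anyone wanting to check those numbers on their own box, one way to inspect the ARC's current size and limits looks like this (a sketch; the ::arc dcmd and the arcstats kstat are only present in sufficiently recent builds, and the exact fields vary by build):

  # echo ::arc | mdb -k          # prints size, c, c_min, c_max, among others
  # kstat -m zfs -n arcstats     # newer builds export the same data as kstats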
Re: [zfs-discuss] zpool status -v
On Apr 19, 2007, at 12:50 PM, Ricardo Correia wrote:

> eric kustarz wrote:
>> Two reasons:
>> 1) cluttered the output (as the path name is variable length). We could
>>    perhaps add another flag (-V or -vv or something) to display the ranges.
>> 2) i wasn't convinced that output was useful, especially to most
>>    users/admins.
>>
>> If we did provide the range information, how would you actually use that
>> information? or would providing the number of checksum errors per file be
>> what you're really looking for?
>
> I agree that the current display is more appropriate as a default. But yes,
> I think adding a -vv flag to show the range output would be useful. It
> seems interesting from an observability standpoint, since I could easily
> tell how much damage the file got.
>
> Simply telling the number of checksum errors per file would be useful too,
> but not as useful, since each checksum error can be between 512 bytes and
> 128 KB.

I agree it would be interesting (especially for us developers). What i'm curious about is (and anyone can answer): what action would you take (or not take) based on this additional information?

>> ps: could you send me the 'zpool status -v' output for curiosity's sake
>
> Sure :)

thanks...
eric
Re: Re[2]: [zfs-discuss] Re: storage type for ZFS
> Has an analysis of the most common storage systems been done on how they
> treat the SYNC_NV bit and whether any additional tweaking is needed? Would
> such an analysis be publicly available?

I am not aware of any analysis and would love to see it done (i'm sure any vendors who are lurking on this list that support SYNC_NV would surely want to speak up now).

Because not every vendor supports SYNC_NV, our solution is to first see if SYNC_NV is supported and, if not, then provide a config file (as a short-term necessity) in which you can hardcode certain products to act as if they support SYNC_NV (in which case we would not send a flush of the cache). If the SYNC_NV bit is not supported and the config file is not updated for the device, then we do what we do today.

But if anyone knows for certain whether a particular device supports SYNC_NV, please post...

eric
[zfs-discuss] Re: Help me understand ZFS caching
Let me elaborate slightly on the reason I ask these questions.

I am performing some simple benchmarking, during which a file is created by sequentially writing 64k blocks until the 100GB file is created. I am seeing, and this is exactly the same as with VxFS, large pauses while the system reclaims the memory that it has consumed. I assume that since ZFS (back to the write cache question) is copy-on-write and is not write-caching anything (correct me if I am wrong), it is instead using memory for my read cache. Also, since I have 32GB of memory, the reclaim periods are quite long while it frees this memory - basically rendering my volume unusable until that memory is reclaimed.

With VxFS I was able to tune the file system with write_throttle, and this allowed me to find a balance whereby the system writes crazy fast, then reclaims memory, and repeats that cycle. I guess I could modify c_max in the kernel to provide the same type of result, but this is not a supported tuning practice - and thus I do not want to do that. I am simply trying to determine where ZFS is different, where it is the same, and how I can modify its default behaviours (or if I ever will be able to).

Also, FYI, I'm testing on Solaris 10 11/06 (all testing must be performed on production versions of Solaris), but if there are changes in Nevada that will show me different results, I would be interested in those as an aside.
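For anyone who wants to reproduce the symptom, the write pattern described is essentially the following (a sketch, not the actual benchmark tool; the target path is made up):

  # Sequentially write a 100 GiB file in 64 KB blocks and time it
  ptime dd if=/dev/zero of=/tank/test/bigfile bs=64k count=1638400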
[zfs-discuss] Re: Help me understand ZFS caching
ZFS uses caching heavily as well; much more so, in fact, than UFS. Copy-on-write and direct i/o are not related. As you say, data gets written first, then the metadata which points to it, but this isn't anything like direct I/O. In particular, direct I/O avoids caching the data, instead transferring it directly to/from user buffers, while ZFS-style copy-on-write caches all data. ZFS does not have direct I/O at all right now. One key difference between UFS & ZFS is that ZFS flushes the drive's write cache at key points. (It does this rather than using ordered commands, even on SCSI disks, which to me is a little disappointing.) This guarantees that the data is on-disk before the associated metadata. UFS relies on keeping the write cache disabled to ensure that its journal is written to disk before its metadata, again with the goal of keeping the file system consistent at all times. I agree with you on tuning. It's clearly desirable that the "out-of-box" settings for a file system work well for "general purpose" loads; but there are almost always applications which require a different strategy. This is much of why UFS/QFS/VxFS added direct i/o, and it's why VxFS (which focuses heavily on database) added quick i/o. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Help me understand ZFS caching
Anton & Roch,

Thank you for helping me understand this. I didn't want to make too many assumptions that were unfounded and then incorrectly relay that information back to clients. So if I might just repeat your statements, so my slow mind is sure it understands (and Roch, yes, your assumption is correct that I am referencing the file system cache, not the disk cache):

A. Copy-on-write exists solely to ensure on-disk data integrity, and as Anton pointed out it is completely different from direct I/O.

B. ZFS still avails itself of a file system cache, and therefore it is possible that data can be lost if it hasn't been written to disk and the server fails.

C. The write throttling issue is known and being looked at - when it will be fixed, we don't know. I'll add myself to the notification list as an interested party :)

Now to another question related to Anton's post. You mention that direct I/O does not exist in ZFS at this point. Are there plans to support direct I/O, or any functionality that will simulate direct I/O or some other non-caching ability suitable for critical systems such as databases, if the client still wanted to deploy on filesystems?
[zfs-discuss] Re: Re: Preferred backup mechanism for ZFS?
To clarify, there are at least two issues with remote replication vs. backups in my mind. (Feel free to joke about the state of my mind! ;-) The first, which as you point out can be alleviated with snapshots, is the ability to "go back" in time. If an accident wipes out a file, the missing file will shortly be deleted on the remote end. Snapshots help you here ... as long as you can keep sufficient space online. If your turnover is 1 TB/day and you require the ability to go back to the end of any week in the past year, that's 52 TB. The second is protection against file system failures. If a bug in file system code, or damage to the metadata structures on disk, results in the master being unreadable, then it could easily be replicated to the remote system. (Consider a bug which manifests itself only when 10^9 files have been created; both file systems will shortly fail.) Keeping backups in a file system independent manner (e.g. tar format, netbackup format, etc.) protects against this. If you're not concerned about the latter, and you can afford to keep all of your backups on rotating rust (and have sufficient CPU & I/O bandwidth at the remote site to scrub those backups), and have sufficient bandwidth to actually move data between sites (for 1 TB/day, assuming continuous modification, that's 11 MB/second if data is never rewritten during the day, or potentially much more in a real environment) then remote replication could work. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: [nfs-discuss] NFSd and dtrace
On Apr 18, 2007, at 9:33 PM, Robert Milkowski wrote:

> Hello Robert,
>
> Thursday, April 19, 2007, 1:57:38 AM, you wrote:
>
> RM> Hello nfs-discuss,
> RM>
> RM> Does anyone have a dtrace script (or any other means) to track which
> RM> files are open/read/write (ops and bytes) by nfsd? To make things
> RM> little bit harder lets assume that local storage is on zfs, nfsd
> RM> server using nfsv3 and system is S10U3.
> RM>
> RM> The script would distinguish between cache read and disk read.
> RM>
> RM> So something like:
> RM>
> RM> ./nfsd_file.d
> RM>
> RM> CLIENT_IP  OPERATION  BYTES  TYPE     FILE
> RM> X.X.X.X    READ3      32768  logical  /nfs/d1000/fileA
> RM> ...
> RM>
> RM> and something like:
> RM>
> RM> ./nfsd_file_summ.d 100s
> RM>
> RM> CLIENT_IP  OPERATION  OPS  BYTES  TYPE      FILE
> RM> X.X.X.X    READ3      230  5MB    logical   /nfs/d1000/fileA
> RM> X.X.X.X    READ3      15   1MB    physical  /nfs/d1000/fileA
> RM> ...
>
> Looks like vopstat and rfileio from the DTrace toolkit are what I'm
> looking for (with some modifications)

very cool, would you mind posting your dscript when you get it working?

eric
Re: [zfs-discuss] Re: Re: Preferred backup mechanism for ZFS?
Hello Anton,

Friday, April 20, 2007, 3:54:52 PM, you wrote:

ABR> To clarify, there are at least two issues with remote replication vs.
ABR> backups in my mind. (Feel free to joke about the state of my mind! ;-)

ABR> The first, which as you point out can be alleviated with snapshots, is
ABR> the ability to "go back" in time. If an accident wipes out a file, the
ABR> missing file will shortly be deleted on the remote end. Snapshots help
ABR> you here ... as long as you can keep sufficient space online. If your
ABR> turnover is 1 TB/day and you require the ability to go back to the end
ABR> of any week in the past year, that's 52 TB.

It really depends. With ZFS snapshots, for a snapshot to consume 1 TB you would have to delete 1 TB of files or modify 1 TB of existing file data (or some combination adding up to 1 TB). There certainly are such workloads. But if you just add new data (append to files, or write new files), then snapshots consume practically no extra storage. In that case it works perfectly.

ABR> The second is protection against file system failures. If a bug in file
ABR> system code, or damage to the metadata structures on disk, results in
ABR> the master being unreadable, then it could easily be replicated to the
ABR> remote system. (Consider a bug which manifests itself only when 10^9
ABR> files have been created; both file systems will shortly fail.) Keeping
ABR> backups in a file system independent manner (e.g. tar format, netbackup
ABR> format, etc.) protects against this.

Let's say I agree. :)

ABR> If you're not concerned about the latter, and you can afford to keep all
ABR> of your backups on rotating rust (and have sufficient CPU & I/O
ABR> bandwidth at the remote site to scrub those backups), and have
ABR> sufficient bandwidth to actually move data between sites (for 1 TB/day,
ABR> assuming continuous modification, that's 11 MB/second if data is never
ABR> rewritten during the day, or potentially much more in a real
ABR> environment) then remote replication could work.

You need exactly the same bandwidth as with any other classical backup solution - in the end you need to copy all of that (differential) data out of the box, regardless of whether it goes to tape or disk. However, instead of doing the backup during the night, which you do so there is limited impact on production performance, with replication you can do it continuously, 24x7. The actual performance impact should be minimal, as you should get most of the data from memory without touching the disks much on the sending side. That also means you actually need much less throughput to the remote side. Also, with frequent enough snapshotting you effectively have a backup every 30 minutes or every hour.

--
Best regards,
 Robert                    mailto:[EMAIL PROTECTED]
                           http://milek.blogspot.com
Re[4]: [zfs-discuss] Re: storage type for ZFS
Hello eric,

Friday, April 20, 2007, 3:36:20 PM, you wrote:

>> Has an analysis of the most common storage systems been done on how they
>> treat the SYNC_NV bit and whether any additional tweaking is needed?
>> Would such an analysis be publicly available?

ek> I am not aware of any analysis and would love to see it done (i'm
ek> sure any vendors who are lurking on this list that support SYNC_NV
ek> would surely want to speak up now).

ek> Because not every vendor supports SYNC_NV, our solution is to first
ek> see if SYNC_NV is supported and, if not, then provide a config file
ek> (as a short-term necessity) in which you can hardcode certain
ek> products to act as if they support SYNC_NV (in which case we would
ek> not send a flush of the cache). If the SYNC_NV bit is not supported
ek> and the config file is not updated for the device, then we do what
ek> we do today.

ek> But if anyone knows for certain whether a particular device supports
ek> SYNC_NV, please post...

Why a config file and not a property for a pool? A pool can have disks from different arrays :)

A useful thing would be to be able to keep that config file in the pool itself, so if one exports/imports it on a different server... you get the idea.

--
Best regards,
 Robert                    mailto:[EMAIL PROTECTED]
                           http://milek.blogspot.com
Re: [zfs-discuss] Help me understand ZFS caching
Tony Galway writes:

> I have a few questions regarding ZFS, and would appreciate it if someone
> could enlighten me as I work my way through.
>
> First, write cache.

We often use "write cache" to designate the cache present at the disk level; let's call this the "disk write cache". Most filesystems will also cache information in host memory; let's call that the "FS cache". I think your questions are more about FS cache behavior for different types of loads.

> If I look at traditional UFS / VxFS type file systems, they normally
> cache metadata to RAM before flushing it to disk. This helps increase
> their perceived write performance (perceived in the sense that if a
> power outage occurs, data loss can occur).

Correct, and applications can influence this behavior with O_DSYNC, fsync()...

> ZFS, on the other hand, performs copy-on-write to ensure that the disk
> is always consistent. I see this as sort of being equivalent to using
> a directio option. I understand that the data is written first, then
> the pointers are updated, but if I were to use the directio analogy,
> would this be correct?

As pointed out by Anton, that's a no here. COW ensures that ZFS is always consistent, but it's not really related to application consistency (that's the job of O_DSYNC, fsync)... So ZFS caches data on writes like most filesystems.

> If that is the case, then is it true that ZFS really does not use a
> write cache at all? And if it does, then how is it used?

You write to the cache, and every 5 seconds all the dirty data is shipped to disk in a transaction group. On low memory we also will not wait for the 5-second clock and will issue a txg sooner.

The problem you and many others face is the lack of write throttling. This is being worked on and should, I hope, be fixed soon. The perception that ZFS is RAM hungry will have to be reevaluated at that time. See:

  6429205 each zpool needs to monitor its throughput and throttle heavy writers

> Read cache.
>
> Any of us that have started using or benchmarking ZFS have seen its
> voracious appetite for memory, an appetite that is fully shared with
> VxFS for example, as I am not singling out ZFS (I'm rather a fan). On
> reboot of my T2000 test server (32GB RAM) I see that the ARC cache max
> size is set to 30.88GB - a sizeable piece of memory.
>
> Now, is all that cache space only for read cache? (given my assumption
> regarding write cache)
>
> Tunable parameters:
>
> I know that the philosophy of ZFS is that you should never have to
> tune your file system, but might I suggest that tuning the FS is not
> always a bad thing. You can't expect a FS to be all things for all
> people. If there are variables that can be modified to provide
> different performance characteristics and profiles, then I would
> contend that it could strengthen ZFS and lead to wider adoption and
> acceptance if you could, for example, limit the amount of memory used
> by items like the cache without messing with c_max / c_min directly in
> the kernel.

Once we have write throttling, we will be better equipped to see whether the ARC's dynamic adjustment works or not. I believe most problems will go away and there will be less demand for such a tunable...

On to your next mail...

> -Tony
Re: [zfs-discuss] Bottlenecks in building a system
Richard Elling wrote:
>> Does anyone have a clue as to where the bottlenecks are going to be
>> with this:
>>
>> 16x hot swap SATAII hard drives (plus an internal boot drive)
>
> Be sure to check the actual bandwidth of the drives when installed in the
> final location. We have been doing some studies on the impact of vibration
> on performance and reliability. If your enclosure does not dampen
> vibrations, then you should see reduced performance, and it will be obvious
> for streaming workloads. There was a thread about this a year or so ago
> regarding thumpers, but since then we've seen it in a number of other
> systems, too. There have also been industry papers on this topic.

Okay, we have a number of the chassis installed here from the same source, but none seem to share the high-throughput workflow, so that's one thing to quiz the integrator on.

>> Tyan S2895 (K8WE) motherboard
>> Dual GigE (integral nVidia ports)
>
> All I can add to the existing NIC comments in this thread is that Neptune
> kicks ass. The GbE version is:
> http://www.sun.com/products/networking/ethernet/sunx8quadgigethernet/index.xml
> ... but know that I don't set pricing :-0

Oh, man, I didn't need to know about that NIC. Actually, it's something to shoot for.

>> 2x Areca 8-port PCIe (8-lane) RAID drivers
>
> I think this is overkill.

I'm getting convinced of that. With the additional comments in this thread, I'm now seriously considering replacing these PCIe cards with Supermicro's PCI-X cards, and switching over to a different Tyan board:

- 2x SuperMicro AOC-SAT2-MV8 PCI-X SATA2 interfaces
- Tyan S2892 (K8SE) motherboard, which ditches nVidia for:
- Dual GigE (integral Broadcom ports)

>> 2x AMD Opteron 275 CPUs (2.2GHz, dual core)
>
> This should be a good choice. For high networking loads, you can burn a lot
> of cycles handling the NICs. For example, using Opterons to drive the dual
> 10GbE version of Neptune can pretty much consume a significant number of
> cores. I don't think your workload will come close to this, however.

No, but it's something to shoot for. :)

>> 8 GiB RAM
>
> I recommend ECC memory, not the cheap stuff... but I'm a RAS guy.

So noted.

> Pretty much any SAS/SATA controller will work ok. You'll be media speed
> bound, not I/O channel bound.

Okay, that message is coming through.

> RAM as a cache presumes two things: prefetching and data re-use. Most
> likely, you won't have re-use, and prefetching only makes sense when the
> disk subsystem is approximately the same speed as the network. Personally,
> I'd start at 2-4 GBytes and expand as needed (this is easily measured).

I'll start with 4 GBytes, because I like to deploy services in containers, and so will need some elbow room.

Many thanks to all in this thread: my spec has certainly evolved, and I hope the machine has gotten cheaper in the process, with little sacrifice in theoretical performance.

adam
Re: [zfs-discuss] zfs send/receive question
Hi,

Krzys wrote:
> Ok, so the -F option is not in U3. Is there any way to replicate a file
> system and not have it mounted automatically? So that when I do zfs
> send/receive it won't be mounted and changes won't be made, and further
> replications remain possible? What I did notice was that if I do zfs
> send/receive right one after another, I am able to replicate all my snaps,
> but when I wait a day or even a few hours I get a notice that the file
> system has changed - and that is because it was mounted, and I guess
> because of that I am not able to send any more incremental snaps... any
> idea what I could do while I am waiting for -F?

This should work:

  zfs unmount pool/filesystem
  zfs rollback (latest snapshot)
  zfs send ... | zfs receive
  zfs mount pool/filesystem

Better yet: assuming you don't actually want to use the filesystem you replicate to, but just use it as a sink for backup purposes, you can mark it unmountable, then just send stuff to it:

  zfs set canmount=off pool/filesystem
  zfs rollback (latest snapshot, one last time)

Then, whenever you want to access the receiving filesystem, clone it.

Hope this helps,
   Constantin

--
Constantin Gonzalez, Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering
http://www.sun.de/   http://blogs.sun.com/constantin/
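Put together, a backup-sink setup along those lines might look like this (a sketch with made-up pool and snapshot names; it assumes an initial full send of @monday has already been received, and zfs receive -F, where available, makes the rollback step unnecessary):

  # One-time setup: the receiving filesystem is never mounted
  zfs set canmount=off backup/replica
  zfs rollback backup/replica@monday

  # Each backup run: send the next increment into the unmounted sink
  zfs snapshot mypool/data@tuesday
  zfs send -i mypool/data@monday mypool/data@tuesday | zfs receive backup/replica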
Re: [zfs-discuss] Re: Help me understand ZFS caching
On Apr 20, 2007, at 10:47 AM, Anton B. Rang wrote:

> ZFS uses caching heavily as well; much more so, in fact, than UFS.
>
> Copy-on-write and direct i/o are not related. As you say, data gets written
> first, then the metadata which points to it, but this isn't anything like
> direct I/O. In particular, direct I/O avoids caching the data, instead
> transferring it directly to/from user buffers, while ZFS-style
> copy-on-write caches all data. ZFS does not have direct I/O at all right
> now.

Your context is correct, but i'd be careful with "direct I/O", as i think it's an overloaded term; most people don't understand what it does - just that it got them good performance (somehow). Roch has a blog on this:

http://blogs.sun.com/roch/entry/zfs_and_directio

But you are correct that ZFS does not have the ability for the user to say "don't cache user data for this filesystem" (which is one part of direct I/O). I've talked to some database people and they aren't convinced having this feature would be a win. So if someone has a real-world workload where having the ability to purposely not cache user data would be a win, please let me know.

eric
Re: [zfs-discuss] Re: Bottlenecks in building a system
Anton B. Rang wrote:
> If you're using this for multimedia, do some serious testing first. ZFS
> tends to have "bursty" write behaviour, and the worst-case latency can be
> measured in seconds. This has been improved a bit in recent builds but it
> still seems to "stall" periodically.

I had wondered about that, after reading some old threads. For the high-performance stuff, the machine is mostly to be marked as experimental and will spend most of its time being "tested". I'm watching Tony Galway's current thread most closely, as well.

adam
[zfs-discuss] Re[2]: [nfs-discuss] NFSd and dtrace
Hello eric,

Friday, April 20, 2007, 4:01:46 PM, you wrote:

ek> On Apr 18, 2007, at 9:33 PM, Robert Milkowski wrote:
>> RM> Does anyone have a dtrace script (or any other means) to track which
>> RM> files are open/read/write (ops and bytes) by nfsd? [...]
>> RM>
>> RM> Looks like vopstat and rfileio from the DTrace toolkit are what I'm
>> RM> looking for (with some modifications)

ek> very cool, would you mind posting your dscript when you get it working?

Those scripts are from the DTraceToolkit! I've just made some simple modifications, like a parameterized frequency, a total summary, ... As I see it, Brendan hooks into the proper VOP operations.

The question, however, is why, if I want to use the dtrace io provider with zfs + nfsd, I don't get file names from args[2]->fi_pathname.

Perhaps fsinfo::: could help, but it's not in current s10 - I hope it will be in U4, as it looks like it works with zfs (without manually looking into vnodes, etc.):

  bash-3.00# dtrace -n fsinfo::fop_read:read'{trace(args[0]->fi_pathname);trace(arg1);}' | grep -v unknown
  dtrace: description 'fsinfo::fop_read:read' matched 1 probe
  CPU     ID               FUNCTION:NAME
    0  65495               fop_read:read   /usr/bin/cat                        8
    0  65495               fop_read:read   /usr/bin/cat                       52
    0  65495               fop_read:read   /usr/bin/cat                      224
    0  65495               fop_read:read   /usr/bin/cat                       17
    0  65495               fop_read:read   /lib/ld.so.1                       52
    0  65495               fop_read:read   /lib/ld.so.1                      160
    0  65495               fop_read:read   /home/milek/hs_err_pid23665.log  8192
    0  65495               fop_read:read   /home/milek/hs_err_pid23665.log  3777
    0  65495               fop_read:read   /home/milek/hs_err_pid23665.log     0
  ^C
  bash-3.00#

/home is on zfs. Looks like fsinfo::: should work properly with nfsd + zfs!

--
Best regards,
 Robert                    mailto:[EMAIL PROTECTED]
                           http://milek.blogspot.com
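A minimal per-file aggregation along those lines, for builds that do have the fsinfo provider, might look like this (a sketch; it keys on the nfsd execname rather than the client IP, which takes more work to dig out, and it does not separate cache hits from physical disk reads):

  #!/usr/sbin/dtrace -s
  /* Bytes read per file through nfsd threads; summary printed on exit */
  fsinfo:::read
  /execname == "nfsd"/
  {
          @bytes[args[0]->fi_pathname] = sum(arg1);
  }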
Re: [zfs-discuss] Bottlenecks in building a system
Hi, hope you don't mind if I make some portions of your email public in a reply--I hadn't seen it come through on the list at all, so it's no duplicate to me. Johansen wrote: > Adam: > > Sorry if this is a duplicate, I had issues sending e-mail this morning. > > Based upon your CPU choices, I think you shouldn't have a problem > saturating a GigE link with a pair of Operton 275's. Just as a point of > comparison, Sun sells a server with 48 SATA disks and 4 GigE ports: > > http://www.sun.com/servers/x64/x4500/specs.xml > > You have fewer disks, and nearly as much CPU power as the x4500. I > think you have plenty of CPU in your system. > > Your RAID controllers have as many SATA ports as the SATA cards in the > x4500, and you seem to have the same ratio of disks to controllers. I'm well aware of the Thumper, and it's fair to say it was an inspiration, just without two-thirds of the capacity or any of the serious redundancy. I also used the X4500 as a guide for > I suspect that if you have a bottleneck in your system, it would be due > to the available bandwidth on the PCI bus. Mm. yeah, it's what I was worried about, too (mostly through ignorance of the issues), which is why I was hoping HyperTransport and PCIe were going to give that data enough room on the bus. But after others expressed the opinion that the Areca PCIe cards were overkill, I'm now looking to putting some PCI-X cards on a different (probably slower) motherboard. > Caching isn't going to be a huge help for writes, unless there's another > thread reading simultaneoulsy from the same file. > > Prefetch will definitely use the additional RAM to try to boost the > performance of sequential reads. However, in the interest of full > disclosure, there is a pathology that we've seen where the number of > sequential readers exceeds the available space in the cache. In this > situation, sometimes the competeing prefetches for the different streams > will cause more temporally favorable data to be evicted from the cache > and performance will drop. The workaround right now is just to disable > prefetch. We're looking into more comprehensive solutions. Interesting. So noted. I will expect to have to test thoroughly. >> I understand I'm not going to get terribly far in thought experiment >> mode, but I want to be able to spec a box that balances cheap with >> utility over time. > > If that's the case, I'm sure you could get by just fine with the pair of > 275's. Thanks, adam ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Help me understand ZFS caching
Tony Galway writes: > Anton & Roch, > > Thank you for helping me understand this. I didn't want to make too many assumptions that were unfounded and then incorrectly relay that information back to clients. > > So if I might just repeat your statements, so my slow mind is sure it > understands, and Roch, yes your assumption is correct that I am referencing > File System Cache, not disk cache. > > A. Copy-on-write exists solely to ensure on disk data integrity, and as Anton pointed out it is completely different than DirectIO. I would say 'ensure pool integrity' but you get the idea. > > b. ZFS still avail's itself of a file system cache, and therefore, it is possible that data can be lost if it hasn't been written to disk and the server fails. Yep. > > c. The write throttling issue is known, and being looked at - when it is fixed we don't know? I'll add myself to the notification list as an interested party :) Yep. > > Now to another question related to Anton's post. You mention that directIO > does not exist in ZFS at this point. Are their plan's to support DirectIO; > any functionality that will simulate directIO or some other non-caching > ability suitable for critical systems such as databases if the client still > wanted to deploy on filesystems. > here Anton and I disagree on this. I believe that ZFS design would not gain much performance from something we'd call directio. See: http://blogs.sun.com/roch/entry/zfs_and_directio -r > > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] problem mounting one of the zfs file system during boot
hello everyone,

I have a strange issue and I am not sure why it is happening.

  syncing file systems... done
  rebooting...

  SC Alert: Host System has Reset
  Probing system devices
  Probing memory
  Probing I/O buses

  Sun Fire V240, No Keyboard
  Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
  OpenBoot 4.22.19, 8192 MB memory installed, Serial #65031515.
  Ethernet address 0:3:ba:e0:4d:5b, Host ID: 83e04d5b.

  Rebooting with command: boot
  Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a  File and args:
  SunOS Release 5.10 Version Generic_125100-05 64-bit
  Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
  Use is subject to license terms.
  Hardware watchdog enabled
  Hostname: chrysek
  /kernel/drv/sparcv9/zpool symbol avl_add multiply defined
  /kernel/drv/sparcv9/zpool symbol assfail3 multiply defined
  WARNING: kstat_create('unix', 0, 'dmu_buf_impl_t'): namespace collision
  mypool2/d3 uncorrectable error
  checking ufs filesystems
  /dev/rdsk/c1t0d0s7: is logging.

When my system is booting it complains about mypool2/d3 ("mypool2/d3 uncorrectable error"), but once the system is up and I do:

  [11:31:01] [EMAIL PROTECTED]: /root > mount /d/d3
  [11:31:06] [EMAIL PROTECTED]: /root > df -k /d/d3
  Filesystem           1k-blocks      Used  Available Use% Mounted on
  mypool2/d3           648755898 179354764  469401134  28% /d/d3

there is no problem: no errors, no complaints. So a manual mount works just fine, while the mount during ZFS boot does not.

This is the entry in vfstab that I have:

  mypool2/d3     mypool2/d3     /d/d3     zfs     2     yes     logging

Is there anything wrong with what I am doing? As I said, a manual mount works just fine, but during boot it complains about mounting it.

  [11:35:08] [EMAIL PROTECTED]: /root > zfs list
  NAME                  USED  AVAIL  REFER  MOUNTPOINT
  mypool                272G  2.12G  24.5K  /mypool
  mypool/d              271G  2.12G   143G  /d/d2
  mypool/[EMAIL PROTECTED]  3.72G      -   123G  -
  mypool/[EMAIL PROTECTED]  22.3G      -   156G  -
  mypool/[EMAIL PROTECTED]  23.3G      -   161G  -
  mypool/[EMAIL PROTECTED]  16.1G      -   172G  -
  mypool/[EMAIL PROTECTED]  13.8G      -   168G  -
  mypool/[EMAIL PROTECTED]  15.7G      -   168G  -
  mypool2               489G   448G    52K  /mypool2
  mypool2/d3            171G   448G   171G  legacy

Regards,

Chris
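One thing worth checking (a guess, not a confirmed diagnosis): for a legacy ZFS mount, the vfstab fields for the device to fsck, the fsck pass and the mount options are normally "-", since there is no fsck for ZFS and "logging" is a UFS option. With a numeric fsck pass, the boot-time filesystem check may be what is tripping over the dataset. Something along these lines:

  #device         device          mount   FS      fsck    mount   mount
  #to mount       to fsck         point   type    pass    at boot options
  mypool2/d3      -               /d/d3   zfs     -       yes     -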
Re: [zfs-discuss] zfs send/receive question
It does not work. I did try to remove every snap, and I ended up destroying that pool altogether and had to resend it all.

My goal is to use zfs send/receive for backup purposes to a big storage system that I have, and to keep snaps. I don't care whether the file system is mounted or not, but I want the ability to send changes to it every month and update it with the current incremental snaps. But because the receiving file system is being mounted, even though I don't go there, something on it changes - like access times or something - and that prevents me from sending incremental zfs snaps... :(

So that -F option will work, but it's a few months away from what I understand... and I would like to just do zfs send/receive now and keep updating it monthly or even daily.

Regards,

Chris

On Fri, 20 Apr 2007, Constantin Gonzalez wrote:

> Hi,
>
> Krzys wrote:
>> [...]
>
> This should work:
>
>   zfs unmount pool/filesystem
>   zfs rollback (latest snapshot)
>   zfs send ... | zfs receive
>   zfs mount pool/filesystem
>
> Better yet: assuming you don't actually want to use the filesystem you
> replicate to, but just use it as a sink for backup purposes, you can mark
> it unmountable, then just send stuff to it:
>
>   zfs set canmount=off pool/filesystem
>   zfs rollback (latest snapshot, one last time)
>
> Then, whenever you want to access the receiving filesystem, clone it.
>
> Hope this helps,
>    Constantin
Re: [zfs-discuss] Re: Help me understand ZFS caching
Tony Galway writes: > Let me elaborate slightly on the reason I ask these questions. > > I am performing some simple benchmarking, and during this a file is > created by sequentially writing 64k blocks until the 100Gb file is > created. I am seeing, and this is the exact same as VxFS, large pauses > while the system reclaims the memory that it has consumed. > > I assume that since ZFS (back to the write cache question) is > copy-on-write and is not write caching anything (correct me if I am > wrong), it is instead using memory for my read-cache. Also, since I > have 32Gb of memory the reclaim periods are quite long while it frees > this memory - basically rendering my volume unusable until that memory > is reclaimed. > > With VxFS I was able to tune the file system with write_throttle, and > this allowed me to find a balance basically whereby the system writes > crazy fast, and then reclaims memory, and repeats that cycle. > > I guess I could modify c_max in the kernel, to provide the same type > of result, but this is not a supported tuning practice - and thus I do > not want to do that. > > I am simply trying to determine where ZFS is different, the same, and > where how I can modify its default behaviours (or if I ever will). > > Also, FYI, I'm testing on Solaris 10 11/06 (All testing must be > performed in production versions of Solaris) but if there are changes > in Nevada that will show me different results, I would be interested > in those as an aside. > Today, a txg sync can take a very long time for this type of workload. A first goal of write throttling will be to at least bound the sync times. The amount of dirty memory (not quickly reclaimable) will then be limited and ARC should be much better at adjusting itself. A second goal will be to keep sync times close to 5 seconds further limiting the RAM consumption. > > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive question
Ok, so the -F option is not in U3. Is there any way to replicate a file system and not have it mounted automatically? So that when I do zfs send/receive it won't be mounted and changes won't be made, and further replications remain possible?

What I did notice was that if I do zfs send/receive right one after another, I am able to replicate all my snaps, but when I wait a day or even a few hours I get a notice that the file system has changed - and that is because it was mounted, and I guess because of that I am not able to send any more incremental snaps... any idea what I could do while I am waiting for -F?

Thank you.

Chris

On Tue, 17 Apr 2007, Nicholas Lee wrote:

> On 4/17/07, Krzys <[EMAIL PROTECTED]> wrote:
>> and when I did try to run that last command I got the following error:
>>
>> [16:26:00] [EMAIL PROTECTED]: /root > zfs send -i mypool/[EMAIL PROTECTED]
>> mypool/[EMAIL PROTECTED] | zfs receive mypool2/[EMAIL PROTECTED]
>> cannot receive: destination has been modified since most recent snapshot
>>
>> is there any way to do such replication by zfs send/receive and avoid such
>> an error message? Is there any way to force the file system not to be
>> mounted? Is there any way to make it maybe a read-only partition and then,
>> when it's needed, maybe make it live or whatever?
>
> Check the -F option to zfs receive. This automatically rolls back the
> target.
>
> Nicholas
Re: [zfs-discuss] Re[2]: [nfs-discuss] NFSd and dtrace
Hello Robert,

Friday, April 20, 2007, 4:54:33 PM, you wrote:

RM> Perhaps fsinfo::: could help, but it's not in current s10 - I hope it
RM> will be in U4, as it looks like it works with zfs (without manually
RM> looking into vnodes, etc.)

Well, it's already in s10! (122641) I missed that... :)

--
Best regards,
 Robert                    mailto:[EMAIL PROTECTED]
                           http://milek.blogspot.com
Re: [zfs-discuss] Experience with Promise Tech. arrays/jbod's?
Thanks to all for the helpful comments and questions. [EMAIL PROTECTED] said: > Isn't MPXIO support by HBA and hard drive identification (not by the > enclosure)? At least I don't see how the enclosure should matter, as long as > it has 2 active paths. So if you add the drive vendor info into /kernel/drv/ > scsi_vhci.conf it should work. If the enclosure is JBOD, then yes, the drives would be the targets of MPXIO. But for a RAID enclosure, it's the RAID controller which speaks SCSI, adds and removes LUN's, etc. The three different arrays I've used have all had settings where you specify what kind of alternate-path "protocol" to speak to the various hosts involved. [EMAIL PROTECTED] said: > In a so called symmetric mode it should work as you described. But many entry > level and midsize arrays aren't actually symmetric and they have to be > treated specifically. This matches my limited experience. What Sun calls "asymmetric" seems to match what some array vendors call "active/active with LUN affinity" (or "LUN ownership"). MPXIO "knows" about such asymmetric arrays, but some arrays don't speak the right protocol (T10 ALUA), and there's so far no way to manually tell MPXIO to do the asymmetric thing with them. For example, our low-end HDS array looks to MPXIO as if it's symmetric, since both controllers show their configured LUN's all the time. But only one controller can do I/O to a given LUN at one time, and the array takes a long while to swap ownership between controllers, so MPXIO's default round-robin load balancing yields terrible performance. The workaround is to manually set load-balancing to "none" and hope MPXIO uses the controller that you were wanting to be primary. And some people wonder why I prefer NAS over SAN...(:-). Regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
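For reference, the kind of scsi_vhci.conf tweaking alluded to above looks roughly like this (a sketch; the vendor/product string and option value are illustrative and must match your array's actual inquiry data exactly, and declaring a genuinely asymmetric array as symmetric will cause exactly the performance problem described):

  # /kernel/drv/scsi_vhci.conf
  load-balance="none";

  # Claim a third-party array as symmetric (only if it truly is):
  device-type-scsi-options-list =
          "HITACHI DF600F", "symmetric-option";
  symmetric-option = 0x1000000;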
Re: [zfs-discuss] Preferred backup mechanism for ZFS?
Tim Thomas wrote: I don't know enough about how ZFS manages memory other than what I have seen on this alias (I just joined a couple of weeks ago), which seems to indicate it is a memory hog... as is VxFS, so we are in good company. I am not against keeping data in memory so long as it has also been written somewhere non-volatile, so that data is not lost if the lights go out... and applications don't have to fight for memory to run. I recall stories from years ago where VxFS hogged so much memory on a Sun Cluster node that the Cluster services stalled and the cluster failed over! Even after many years, I can still get mileage from this one :-) http://www.sun.com/blueprints/0400/ram-vxfs.pdf ZFS behaves differently, however, so the symptoms and prescriptions are slightly different. I need to go read some white papers on this... but I assume that something like direct I/O (which UFS, VxFS and QFS all have) is in the plans for ZFS so we don't end up double-buffering data for apps like databases? That is just ugly. Before you get very far down this path: it gets regularly rehashed here, and Roch Bourbonnais and Bob Sneed wrote some good blogs on the topic. Especially: http://blogs.sun.com/bobs/entry/one_i_o_two_i http://blogs.sun.com/roch/entry/zfs_and_directio -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Permanently removing vdevs from a pool
On 4/20/07, George Wilson <[EMAIL PROTECTED]> wrote: This is a high priority for us and is actively being worked. Vague enough for you. :-) Sorry I can't give you anything more exact than that. Hi George, If ZFS is supposed to be part of "open"solaris, then why can't the community get additional details? It really seems like much of the development and design of ZFS goes on behind closed doors, and the community as a whole is involved after the fact (Eric Schrock has requested feedback from list members, which is awesome!). This makes it difficult for folks to contribute, and to offer suggestions (or code) that would better ZFS as a whole. Is there a reason that more of the ZFS development discussions aren't occurring in public? Thanks, - Ryan -- UNIX Administrator http://prefetch.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: zfs boot image conversion kit is posted
> Now the original question by MC I believe was about providing VMware and/or Xen image with guest OS being snv_62 with / as zfs. This is true. I'm not sure what Jim meant about the host system needing to support zfs. Maybe you're on a different page, Jim :) > I will setup a VM image that can be downloaded (I hope to get it done tomorrow, but if not definitely by early next week) and played with by anyone who is interested. That would be golden, Brian. Let me know if you can't get suitable hosting for it! This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Help me understand ZFS caching
> So if someone has a real world workload where having the ability to purposely > not cache user > data would be a win, please let me know. Multimedia streaming is an obvious one. For databases, it depends on the application, but in general the database will do a better job of selecting which data to keep in memory than the file system can. (Of course, some low-end databases rely on the file system for this.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
We are having a really tough time accepting the performance of the ZFS and NFS interaction. I have tried so many different ways of making it work (even setting zil_disable to 1) and I'm still nowhere near the performance of a standard NFS-mounted UFS filesystem - insanely slow, especially on file rewrites. We have been combing the message boards and it looks like there was a lot of talk about this ZFS+NFS interaction back in November and before, but since then I have not seen much. It seems the only fix up to that date was to disable the ZIL; is that still the case? Did anyone ever get closure on this? We are running Solaris 10 (SPARC), latest patched 11/06 release, connecting directly via FC to a 6120 with 2 RAID-5 volumes, serving over a bge interface (gigabit). Tried raidz, mirror and stripe with no noticeable difference in speed. The clients connecting to this machine are HP-UX 11i and OS X 10.4.9, and they both show the same performance characteristics. Any insight would be appreciated - we really like ZFS compared to any filesystem we have EVER worked on and don't want to revert if at all possible! TIA, Andy Lubel ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: zfs boot image conversion kit is posted
On Fri, Apr 20, 2007 at 12:25:30PM -0700, MC wrote: > > > I will setup a VM image that can be downloaded (I hope to get it done > tomorrow, but if not definitely by early next week) and played with > by anyone who is interested. > > That would be golden, Brian. Let me know if you can't get suitable hosting > for it! I have somewhere I can put it, thanks for the offer though. :) I'm not going to get it done today. What I will do, however, is upload the tarball of the patched b62 DVD image (once it's done compressing) for anyone who wants to snag it, probably a little later today. I'll probably turn it back into an ISO come Monday as well. I'll let you all know when it's up. -brian -- "Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)
> You need exactly the same bandwidth as with any other > classical backup solution - it doesn't matter how at the end you need > to copy all those data (differential) out of the box regardless if it's > a tape or a disk. Sure. However, it's somewhat cheaper to buy 100 MB/sec of local-attached tape than 100 MB/sec of long-distance networking. (The pedant in me points out that you also need to move the tape to the remote site, which isn't entirely free) Anton This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Preferred backup mechanism for ZFS?
On April 20, 2007 9:54:07 AM +0100 Tim Thomas <[EMAIL PROTECTED]> wrote: My initial reaction is that the world has got by without file systems that can do [end-to-end data integrity] for a long time...so I don't see the absence of this as a big deal. How about My initial reaction is that the world has got by without [email|cellphone| other technology] for a long time ... so not a big deal. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Bottlenecks in building a system
Adam: > Hi, hope you don't mind if I make some portions of your email public in > a reply--I hadn't seen it come through on the list at all, so it's no > duplicate to me. I don't mind at all. I had hoped to avoid sending the list a duplicate e-mail, although it looks like my first post never made it here. > > I suspect that if you have a bottleneck in your system, it would be due > > to the available bandwidth on the PCI bus. > > Mm. yeah, it's what I was worried about, too (mostly through ignorance > of the issues), which is why I was hoping HyperTransport and PCIe were > going to give that data enough room on the bus. > But after others expressed the opinion that the Areca PCIe cards were > overkill, I'm now looking at putting some PCI-X cards on a different > (probably slower) motherboard. I dug up a copy of the S2895 block diagram and asked Bill Moore about it. He said that you should be able to get about 700 MB/s off of each of the PCI-X channels and that you only need about 100 MB/s to saturate a GigE link. He also observed that the RAID card you were using was unnecessary and would probably hamper performance. He recommended non-RAID SATA cards based upon the Marvell chipset. Here's the e-mail trail on this list where he discusses Marvell SATA cards in a bit more detail: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html It sounds like if getting disk -> network is the concern, you'll have plenty of bandwidth, assuming you have a reasonable controller card. > > Caching isn't going to be a huge help for writes, unless there's another > > thread reading simultaneously from the same file. > > > > Prefetch will definitely use the additional RAM to try to boost the > > performance of sequential reads. However, in the interest of full > > disclosure, there is a pathology that we've seen where the number of > > sequential readers exceeds the available space in the cache. In this > > situation, sometimes the competing prefetches for the different streams > > will cause more temporally favorable data to be evicted from the cache > > and performance will drop. The workaround right now is just to disable > > prefetch. We're looking into more comprehensive solutions. > > Interesting. So noted. I will expect to have to test thoroughly. If you run across this problem and are willing to let me debug on your system, shoot me an e-mail. We've only seen this in a couple of situations and it was combined with another problem where we were seeing excessive overhead for kcopyout. It's unlikely, but possible that you'll hit this. -K ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
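In case it helps, the prefetch workaround mentioned above was, on builds of that era, a single /etc/system tunable (a sketch; it is an unstable tuning knob, so treat it as a diagnostic setting rather than something to leave in place):

  * /etc/system: disable ZFS file-level prefetch (reboot required)
  set zfs:zfs_prefetch_disable = 1

Setting it back to 0 (or removing the line) and rebooting restores the default behaviour.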
Re: [zfs-discuss] Re: Help me understand ZFS caching
Tony: > Now to another question related to Anton's post. You mention that > directIO does not exist in ZFS at this point. Are their plan's to > support DirectIO; any functionality that will simulate directIO or > some other non-caching ability suitable for critical systems such as > databases if the client still wanted to deploy on filesystems. I would describe DirectIO as the ability to map the application's buffers directly for disk DMAs. You need to disable the filesystem's cache to do this correctly. Having the cache disabled is an implementation requirement for this feature. Based upon this definition, are you seeking the ability to disable the filesystem's cache or the ability to directly map application buffers for DMA? -j ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Permanently removing vdevs from a pool
Knowing that this is a planned feature and the ZFS team is actively working on it answers my question more than expected. Thanks. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: zfs boot image conversion kit is posted
Good deal. We'll have a race to build a VM image, then :) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
When you say rewrites, can you give more detail? For example, are you rewriting in 8K chunks, random sizes, etc? The reason I ask is because ZFS will, by default, use 128K blocks for large files. If you then rewrite a small chunk at a time, ZFS is forced to read 128K, modify the small chunk you're changing, and then write 128K. Obviously, this has adverse effects on performance. :) If your typical workload has a preferred block size that it uses, you might try setting the recordsize property in ZFS to match - that should help. If you're completely rewriting the file, then I can't imagine why it would be slow. The only thing I can think of is the forced sync that NFS does on file close. But if you set zil_disable in /etc/system and reboot, you shouldn't see poor performance in that case. Other folks have had good success with NFS/ZFS performance (while others have not). If it's possible, could you characterize your workload in a bit more detail? --Bill On Fri, Apr 20, 2007 at 04:07:44PM -0400, Andy Lubel wrote: > > We are having a really tough time accepting the performance with ZFS > and NFS interaction. I have tried so many different ways trying to > make it work (even zfs set:zil_disable 1) and I'm still no where near > the performance of using a standard NFS mounted UFS filesystem - > insanely slow; especially on file rewrites. > > We have been combing the message boards and it looks like there was a > lot of talk about this interaction of zfs+nfs back in november and > before but since i have not seen much. It seems the only fix up to > that date was to disable zil, is that still the case? Did anyone ever > get closure on this? > > We are running solaris 10 (SPARC) .latest patched 11/06 release > connecting directly via FC to a 6120 with 2 raid 5 volumes over a bge > interface (gigabit). tried raidz, mirror and stripe with no > negligible difference in speed. the clients connecting to this > machine are HP-UX 11i and OS X 10.4.9 and they both have corresponding > performance characteristics. > > Any insight would be appreciated - we really like zfs compared to any > filesystem we have EVER worked on and dont want to revert if at all > possible! > > > TIA, > > Andy Lubel > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
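To make those two knobs concrete, a minimal sketch (the dataset name is a placeholder; recordsize only affects files written after the change, and disabling the ZIL trades away NFS crash-consistency, so treat it strictly as a test setting):

  # match the ZFS recordsize to the application's dominant write size
  # (8K is just an example value)
  zfs set recordsize=8k pool/fs

  # /etc/system entry to disable the ZIL for testing (reboot required):
  #   set zfs:zil_disable = 1

If the small-rewrite theory is right, the recordsize change alone should show up clearly in the rewrite numbers.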
Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
[EMAIL PROTECTED] said: > We have been combing the message boards and it looks like there was a lot of > talk about this interaction of zfs+nfs back in november and before but since > i have not seen much. It seems the only fix up to that date was to disable > zil, is that still the case? Did anyone ever get closure on this? There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS learns to do that itself. See: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html Regards, Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
Marion Hakanson wrote: [EMAIL PROTECTED] said: We have been combing the message boards and it looks like there was a lot of talk about this interaction of zfs+nfs back in november and before but since i have not seen much. It seems the only fix up to that date was to disable zil, is that still the case? Did anyone ever get closure on this? There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS learns to do that itself. See: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html The 6120 isn't the same as a 6130/6140/6540. The instructions referenced above won't work on a T3/T3+/6120/6320 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
RE: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
Yeah, I saw that post about the other arrays, but none for this EOL'd hunk of metal. I have some 6130's, but hopefully by the time they are implemented we will have retired this NFS stuff and stepped into zvol iSCSI targets. Thanks anyway... back to the drawing board on how to resolve this! -Andy -Original Message- From: [EMAIL PROTECTED] on behalf of Torrey McMahon Sent: Fri 4/20/2007 6:00 PM To: Marion Hakanson Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4) Marion Hakanson wrote: > [EMAIL PROTECTED] said: > >> We have been combing the message boards and it looks like there was a lot of >> talk about this interaction of zfs+nfs back in november and before but since >> i have not seen much. It seems the only fix up to that date was to disable >> zil, is that still the case? Did anyone ever get closure on this? >> > > There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS > learns to do that itself. See: > http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html > > The 6120 isn't the same as a 6130/6140/6540. The instructions referenced above won't work on a T3/T3+/6120/6320 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Preferred backup mechanism for ZFS?
My initial reaction is that the world has got by without [email|cellphone| other technology] for a long time ... so not a big deal. Well, I did say I viewed it as an indefensible position :-) Now shall we debate if the world is a better place because of cell phones :-P ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
[EMAIL PROTECTED] said: > The 6120 isn't the same as a 6130/61340/6540. The instructions referenced > above won't work on a T3/T3+/6120/6320 Sigh. I can't keep up (:-). Thanks for the correction. Marion ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
RE: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
I'm not sure about the workload, but I did configure the volumes with the block size in mind... it didn't seem to do much. It could be because I'm basically layering ZFS RAID on top of HW RAID, and I just don't know the equation to define a smarter block size. It seems like if I have 2 arrays with 64k stripes put together, 128k would be ideal for my ZFS datasets, but again... my logic isn't infinite when it comes to this fun stuff ;) The 6120 has 2 volumes, each with a 64k stripe size. I then raidz'ed the 2 volumes and tried both 64k and 128k; I do get a bit of a performance gain on rewrite at 128k. These are dd tests, by the way:

*this one is local, and works just great:

bash-3.00# date ; uname -a
Thu Apr 19 21:11:22 EDT 2007
SunOS yuryaku 5.10 Generic_125100-04 sun4u sparc SUNW,Sun-Fire-V210
bash-3.00# df -k
Filesystem            kbytes      used      avail  capacity  Mounted on
...
se6120             697761792        26  666303904        1%  /pool/se6120
se6120/rfs-v10      31457280   9710895   21746384       31%  /pool/se6120/rfs-v10
bash-3.00# time dd if=/dev/zero of=/pool/se6120/rfs-v10/rw-test-1.loo bs=8192 count=131072
131072+0 records in
131072+0 records out

real    0m13.783s
real    0m14.136s
user    0m0.331s
sys     0m9.947s

*this one is from an HP-UX 11i system mounted to the V210 listed above:

onyx:/rfs># date ; uname -a
Thu Apr 19 21:15:02 EDT 2007
HP-UX onyx B.11.11 U 9000/800 1196424606 unlimited-user license
onyx:/rfs># bdf
Filesystem                        kbytes      used     avail  %used  Mounted on
...
yuryaku.sol:/pool/se6120/rfs-v10  31457280  9710896  21746384    31%  /rfs/v10
onyx:/rfs># time dd if=/dev/zero of=/rfs/v10/rw-test-2.loo bs=8192 count=131072
131072+0 records in
131072+0 records out

real    1m2.25s
real    0m29.02s
real    0m50.49s
user    0m0.30s
sys     0m8.16s

*my 6120 tidbits of interest:

6120 Release 3.2.6 Mon Feb 5 02:26:22 MST 2007 (xxx.xxx.xxx.xxx)
Copyright (C) 1997-2006 Sun Microsystems, Inc. All Rights Reserved.
daikakuji:/:<1>vol mode
volume    mounted    cache        mirror
v1        yes        writebehind  off
v2        yes        writebehind  off
daikakuji:/:<5>vol list
volume    capacity     raid   data       standby
v1        340.851 GB   5      u1d01-06   u1d07
v2        340.851 GB   5      u1d08-13   u1d14
daikakuji:/:<6>sys list
controller     : 2.5
blocksize      : 64k
cache          : auto
mirror         : auto
mp_support     : none
naca           : off
rd_ahead       : off
recon_rate     : med
sys memsize    : 256 MBytes
cache memsize  : 1024 MBytes
fc_topology    : auto
fc_speed       : 2Gb
disk_scrubber  : on
ondg           : befit

Am I missing something? As for the RW test, I will tinker some more and paste the results soonish.

Thanks in advance,

Andy Lubel

-Original Message- From: Bill Moore [mailto:[EMAIL PROTECTED] Sent: Fri 4/20/2007 5:13 PM To: Andy Lubel Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4) When you say rewrites, can you give more detail? For example, are you rewriting in 8K chunks, random sizes, etc? The reason I ask is because ZFS will, by default, use 128K blocks for large files. If you then rewrite a small chunk at a time, ZFS is forced to read 128K, modify the small chunk you're changing, and then write 128K. Obviously, this has adverse effects on performance. :) If your typical workload has a preferred block size that it uses, you might try setting the recordsize property in ZFS to match - that should help. If you're completely rewriting the file, then I can't imagine why it would be slow. The only thing I can think of is the forced sync that NFS does on a file closed. But if you set zil_disable in /etc/system and reboot, you shouldn't see poor performance in that case. Other folks have had good success with NFS/ZFS performance (while other have not).
If it's possible, could you characterize your workload in a bit more detail? --Bill On Fri, Apr 20, 2007 at 04:07:44PM -0400, Andy Lubel wrote: > > We are having a really tough time accepting the performance with ZFS > and NFS interaction. I have tried so many different ways trying to > make it work (even zfs set:zil_disable 1) and I'm still no where near > the performance of using a standard NFS mounted UFS filesystem - > insanely slow; especially on file rewrites. > > We have been combing the message boards and it looks like there was a > lot of talk about this interaction of zfs+nfs back in november and > before but since i have not seen much. It seems the only fix up to > that date was to disable zil, is that still the case? Did anyone ever > get closure on this? > > We are running solaris 10 (SPARC) .latest patched 11/06 release > connecting directly via FC to a 6120 with 2 raid 5 volumes over a bge > interface (gigabit). tried rai
Re: [zfs-discuss] Permanently removing vdevs from a pool
Matty wrote: On 4/20/07, George Wilson <[EMAIL PROTECTED]> wrote: This is a high priority for us and is actively being worked. Vague enough for you. :-) Sorry I can't give you anything more exact that that. Hi George, If ZFS is supposed to be part of "open"solaris, then why can't the community get additional details? If really seems like much of the development and design of ZFS goes on behind closed doors, and the community as a whole is involved after the fact (Eric Shrock has requested feedback from list members, which is awesome!). This makes it difficult for folks to contribute, and to offer suggestions (or code) that would better ZFS as a whole. Is there a reason that more of the ZFS development discussions aren't occuring in public? I can't speak for the zpool-shrink work, but I've been inspired (partly by this post) to start collecting more input from the community regarding the use of zfs as a root file system. I've just started a blog (http://blogs.sun.com/lalt/) and I'll be putting out posts there and on this alias regarding some of the design issues that need to get resolved. So watch for it. Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)
Anton B. Rang wrote: >>You need exactly the same bandwidth as with any other >>classical backup solution - it doesn't matter how at the end you need >>to copy all those data (differential) out of the box regardless if it's >>a tape or a disk. >> >> > >Sure. However, it's somewhat cheaper to buy 100 MB/sec of local-attached tape >than 100 MB/sec of long-distance networking. (The pedant in me points out >that you also need to move the tape to the remote site, which isn't entirely >free) > > > But a tape in a van is a very high bandwidth connection :) Ian ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)
But a tape in a van is a very high bandwidth connection :) Australia used to get its usenet feed on FedExed 9-tracks. --lyndon The two most common elements in the universe are Hydrogen and stupidity. -- Harlan Ellison ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Preferred backup mechanism for ZFS?
On 20-Apr-07, at 5:54 AM, Tim Thomas wrote: Hi Wee I run a setup of SAM-FS for our main file server and we loved the backup/restore parts that you described. That is great to hear. The main concerns I have with SAM fronting the entire conversation is data integrity. Unlike ZFS, SAMFS does not do end to end checksumming. My initial reaction is that the world has got by without file systems that can do this for a long time. Indeed. Progress is one-way like that... ..so I don't see the absence of this as a big deal. Except that it dilutes the ZFS promise to, "we can keep your data safe... unless we have to restore any of it from a backup." It's a big deal if you want the integrity promise to extend beyond the pool itself! Or so it seems to me. --T Rgds Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Bottlenecks in building a system
[EMAIL PROTECTED] wrote: I suspect that if you have a bottleneck in your system, it would be due to the available bandwidth on the PCI bus. Mm. yeah, it's what I was worried about, too (mostly through ignorance of the issues), which is why I was hoping HyperTransport and PCIe were going to give that data enough room on the bus. But after others expressed the opinion that the Areca PCIe cards were overkill, I'm now looking at putting some PCI-X cards on a different (probably slower) motherboard. I dug up a copy of the S2895 block diagram and asked Bill Moore about it. He said that you should be able to get about 700 MB/s off of each of the PCI-X channels and that you only need about 100 MB/s to saturate a GigE link. He also observed that the RAID card you were using was unnecessary and would probably hamper performance. He recommended non-RAID SATA cards based upon the Marvell chipset. Here's the e-mail trail on this list where he discusses Marvell SATA cards in a bit more detail: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html It sounds like if getting disk -> network is the concern, you'll have plenty of bandwidth, assuming you have a reasonable controller card. Well, if that isn't from the horse's mouth, I don't know what is. Elsewhere in the thread, I mention that I'm trying to go for a simpler system (well, less dependent upon PCIe) in favour of the S2892, which has the added benefit of having a NIC that is less maligned in the community. From what I can tell of the block diagram, it looks like the PCI-X subsystem is similar enough (except that it's shared with the NIC). It's sounding like a safe compromise to me, to use the Marvell chips on the oft-cited SuperMicro cards. Caching isn't going to be a huge help for writes, unless there's another thread reading simultaneously from the same file. Prefetch will definitely use the additional RAM to try to boost the performance of sequential reads. However, in the interest of full disclosure, there is a pathology that we've seen where the number of sequential readers exceeds the available space in the cache. In this situation, sometimes the competing prefetches for the different streams will cause more temporally favorable data to be evicted from the cache and performance will drop. The workaround right now is just to disable prefetch. We're looking into more comprehensive solutions. Interesting. So noted. I will expect to have to test thoroughly. If you run across this problem and are willing to let me debug on your system, shoot me an e-mail. We've only seen this in a couple of situations and it was combined with another problem where we were seeing excessive overhead for kcopyout. It's unlikely, but possible that you'll hit this. That's one heck of an offer. I'd have no problem with this, nor with taking requests for particular benchmarks from the community. It's essentially a research machine, and if it can help others out, I'm all for it. Now time to check on the project budget... :) thanks, adam ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: solaris - ata over ethernet - zfs - HPC
A new driver was released recently that has support for ZFS; check the Coraid website for details. We have a Coraid at work that we are testing and hope to (eventually) put on our production network. We're running Solaris 9, so I'm not sure how comparable our results are with your situation. Anyway, we have ours configured with 4 RAID-5 volumes across 12 disks. The main reason it's not being used in production yet is that we have yet to get anywhere close to their advertised throughput. We're getting around 30 MB/sec reads and around 23 MB/sec writes. The Coraid is attached via a cross-over cable to a V240. Last I knew, the Coraid development team was aware of throughput issues on Solaris and was working to improve their drivers. We have not yet tested with jumbo frames; I would expect that to improve things somewhat. -Andrew This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
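For anyone comparing numbers, a quick sequential-throughput sanity check along the lines of the dd tests elsewhere in this digest (paths and sizes are just placeholders; use a file considerably larger than RAM so the read pass isn't served from cache):

  # write about 1 GB in 128K records, then read it back
  time dd if=/dev/zero of=/coraid/test/ddtest.out bs=131072 count=8192
  time dd if=/coraid/test/ddtest.out of=/dev/null bs=131072

Comparing those numbers with and without jumbo frames should show fairly directly how much the Ethernet framing overhead is costing.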
Re: [zfs-discuss] Bandwidth requirements (was Re: Preferred backup mechanism for ZFS?)
On Sat, Apr 21, 2007 at 11:14:02AM +1200, Ian Collins wrote: > > >Sure. However, it's somewhat cheaper to buy 100 MB/sec of local-attached > >tape than 100 MB/sec of long-distance networking. (The pedant in me points > >out that you also need to move the tape to the remote site, which isn't > >entirely free) > > > But a tape in a van is a very high bandwidth connection :) What's the old quote? "Never underestimate the bandwidth of a stationwagon full of tapes." ;) -brian -- "Perl can be fast and elegant as much as J2EE can be fast and elegant. In the hands of a skilled artisan, it can and does happen; it's just that most of the shit out there is built by people who'd be better suited to making sure that my burger is cooked thoroughly." -- Jonathan Patschke ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
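Just to put a rough number on the old joke, a back-of-the-envelope sketch (all figures assumed, nothing measured):

  500 tapes x 500 GB/tape = 250 TB in the wagon
  250 TB delivered over a 4-hour drive
    = 250,000,000 MB / 14,400 s
    = roughly 17,000 MB/s sustained

versus roughly 100 MB/s for a saturated GigE link. The latency, of course, is measured in hours.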
Re: [zfs-discuss] Re: Re: zfs boot image conversion kit is posted
Remember that Solaris Express can only be distributed by authorized parties. On 20/04/07, MC <[EMAIL PROTECTED]> wrote: > Now the original question by MC I believe was about providing VMware and/or Xen image with guest OS being snv_62 with / as zfs. This is true. I'm not sure what Jim meant about the host system needing to support zfs. Maybe you're on a different page, Jim :) > I will setup a VM image that can be downloaded (I hope to get it done tomorrow, but if not definitely by early next week) and played with by anyone who is interested. That would be golden, Brian. Let me know if you can't get suitable hosting for it! This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- "Less is only more where more is no good." --Frank Lloyd Wright Shawn Walker, Software and Systems Analyst [EMAIL PROTECTED] - http://binarycrusader.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on the desktop
On Tue, 2007-04-17 at 17:25 -0500, Shawn Walker wrote: > > I would think the average person would want > > to have access to 1000s of DVDs / CDs within > > a small box versus taking up the full wall. > > This is already being done now, and most of the companies doing it are > being sued like crazy :) The legal entanglements seem to specifically be around hard-disk-based DVD jukeboxes. But it's not completely hopeless -- one of them recently won a first round in court: http://www.kaleidescape.com/company/pr/PR-20070329-DVDCCA.html - Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Generic "filesystem code" list/community for opensolaris ?
> Hi, > > > so far, discussing filesystem code via opensolaris > means a certain > "specialization", in the sense that we do have: > > zfs-discuss > ufs-discuss > fuse-discuss > > Likewise, there are ZFS, NFS and UFS communities > (though I can't quite > figure out if we have nfs-discuss ?). > > What's not there is a generic "FS thingies not in > either of these". I.e. a > forum with the purpose of talking filesystem code in > general (how to port > a *BSD filesystem, for example), or to contribute and > discuss community > filesystem patches or early-access code. > > Internally, we've been having a fs-interest mailing > list for such a > purpose for decades - why no generic "FS forum" on > OpenSolaris.org ? > > There's more filesystems in the world than just ZFS, > NFS and UFS. We do > have the legacy stuff, but there's also SMB/CIFS, > NTFS, Linux-things, etc. > etc. etc.; I think these alone will never be > high-volume enough to warrant > communities or even discussion lists of their own, > but combined there's > surely enough to fill one mailing list ? > > Why _not_ have a > "[EMAIL PROTECTED]", and a fs > community > that deals with anything that's not [NUZ]FS ? > > Thanks for some thoughts on this, > FrankH. > > == > == > No good can come from selling your freedom, not for > all gold of the world, > for the value of this heavenly gift exceeds that of > any fortune on earth. > == > == > ___ > ufs-discuss mailing list > [EMAIL PROTECTED] > Hi Frank, I'm about to discuss/announce some changes that are coming within the next few months into ONNV. My understanding is that ufs-discuss is the right place to talk about generic file system issues. As I was scanning through old threads, I found this one. Are there plans to create a "file system code" list/community? Is the ufs-discuss alias still the right place at present to discuss VFS-level changes? Thanks, Rich This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 6410 expansion shelf
On Thu, Mar 22, 2007 at 01:21:04PM -0700, Frank Cusack wrote: > Does anyone have a 6140 expansion shelf that they can hook directly to > a host? Just wondering if this configuration works. Previously I thought the expansion connector was proprietary but now I see it's just fibre channel. The 6140 controller unit has either 2GB or 4GB cache. Does the 6140 expansion shelf have cache as well, or is the cache in the controller unit used for all expansion shelves? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS+NFS on storedge 6120 (sun t4)
Welcome to the club, Andy... I tried several times to attract the attention of the community to the dramatic performance degradation (about 3x) of the NFS/ZFS combination vs. NFS/UFS - without any result: http://www.opensolaris.org/jive/thread.jspa?messageID=98592 [1], http://www.opensolaris.org/jive/thread.jspa?threadID=24015 [2]. Just look at the two graphs in my posting dated August 2006 (http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html) to see how bad the situation was; unfortunately, it hasn't changed much recently: http://photos1.blogger.com/blogger/7591/428/1600/sfs.1.png I don't think the storage array is the source of the problems you reported. It's somewhere else... -- leon This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss