using freebsd-update to update jails and their host
I have an 8.0 host system with a few jails (using ezjail) that I am gearing up to update to 8.2. I have used freebsd-update a few times in the past to upgrade a system between releases, but how would I go about using it to also upgrade the jails made with ezjail? I would obviously need to point freebsd-update at /basejail as its root, which I assume isn't too hard, but what about having it merge the new/changed /etc files in the individual jails?

I've also discovered the "ezjail-admin install -h file://" option, which installs a basejail using the host system as its source. Am I right in thinking I could also use this by first upgrading my host and then running this command to overwrite /basejail with the updated files from the host, bringing them into sync? I still don't know how I would then fix the /etc under each individual jail, though.

- Sincerely,
Dan Naumov
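For reference, a minimal sketch of the first approach, assuming ezjail's default basejail location of /usr/jails/basejail (that path is an assumption, adjust to your layout); freebsd-update(8) can operate on an alternate root via its -b option:

freebsd-update -b /usr/jails/basejail -r 8.2-RELEASE upgrade
freebsd-update -b /usr/jails/basejail install   # rerun install until it reports nothing left to do

This still says nothing about merging /etc inside each individual jail, which remains the open question above.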
ZFS on top of GELI
Hello list. I am evaluating options for my new upcoming storage system, where for various reasons the data will be stored on 2 x 2tb SATA disk in a mirror and has to be encrypted (a 40gb Intel SSD will be used for the system disk). Right now I am considering the options of FreeBSD with GELI+ZFS and Debian Linux with MDRAID and cryptofs. Has anyone here made any benchmarks regarding how much of a performance hit is caused by using 2 geli devices as vdevs for a ZFS mirror pool in FreeBSD (a similar configuration is described here: http://blog.experimentalworks.net/2008/03/setting-up-an-encrypted-zfs-with-freebsd/)? Some direct comparisons using bonnie++ or similar, showing the number differences of "this is read/write/IOPS on top of a ZFS mirror and this is read/write/IOPS on top of a ZFS mirror using GELI" would be nice. I am mostly interested in benchmarks on lower end hardware, the system is an Atom 330 which is currently using Windows 2008 server with TrueCrypt in a non-raid configuration and with that setup, I am getting roughly 55mb/s reads and writes when using TrueCrypt (nonencrypted it's around 115mb/s). Thanks. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
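In case someone wants to reproduce the comparison being requested, a rough sketch of the GELI+ZFS mirror setup and a bonnie++ run (device names, key file paths and sizes are placeholders, not taken from the post above):

geli init -s 4096 -K /etc/keys/disk0.key /dev/ad4
geli init -s 4096 -K /etc/keys/disk1.key /dev/ad6
geli attach -k /etc/keys/disk0.key /dev/ad4
geli attach -k /etc/keys/disk1.key /dev/ad6
zpool create tank mirror ad4.eli ad6.eli

# same invocation with and without the .eli layer; -s should be at least twice the RAM size
bonnie++ -d /tank -s 4096 -n 0 -u root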
Re: ZFS on top of GELI
On Sun, Jan 10, 2010 at 6:12 PM, Damian Gerow wrote: > Dan Naumov wrote: > : I am mostly interested in benchmarks on lower end hardware, the system > : is an Atom 330 which is currently using Windows 2008 server with > : TrueCrypt in a non-raid configuration and with that setup, I am > : getting roughly 55mb/s reads and writes when using TrueCrypt > : (nonencrypted it's around 115mb/s). > > I've been using GELI-backed vdevs for some time now -- since 7.2-ish > timeframes. I've never benchmarked it, but I was running on relatively > low-end hardware. A few things to take into consideration: > > 1) Make sure the individual drives are encrypted -- especially if they're > >=1TB. This is less a performance thing and more a "make sure your > encryption actually encrypts properly" thing. > 2) Seriously consider using the new AHCI driver. I've been using it in a > few places, and it's quite stable, and there is a marked performance > improvement - 10-15% on the hardware I've got. > 3) Take a look at the VIA platform, as a replacement for the Atom. I was > running on an EPIA-SN 1800 (1.8GHz), and didn't have any real troubles > with the encryption aspect of the rig (4x1TB drives). Actually, if you > get performance numbers privately comparing the Atom to a VIA (Nano or > otherwise), can you post them to the list? I'm curious to see if the > on-chip encryption actually makes a difference. > 4) Since you're asking for benchmarks, probably best if you post the > specific bonnie command you want run -- that way, it's tailored to your > use-case, and you'll get consistant, comparable results. Yes, this is what I was basically considering: new AHCI driver => 40gb Intel SSD => UFS2 with Softupdates for the system installation new AHCI driver => 2 x 2tb disks, each fully encrypted with geli => 2 geli vdevs for a ZFS mirror for important data The reason I am considering the new AHCI driver is to get NCQ support now and TRIM support for the SSD later when it gets implemented, although if the performance difference right now is already 10-15%, that's a reason good enough on it's own. On a semi-related note, is it still recommended to use softupdates or is GJournal a better choice today? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
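A quick note on trying the new ahci(4) driver mentioned above; on 8.0 this amounts to loading the module at boot:

# /boot/loader.conf
ahci_load="YES"

Disks then attach through CAM as adaX rather than adX, so anything referencing device names directly (such as /etc/fstab, or a pool built on adX names) should be checked before rebooting; using labels sidesteps the renaming.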
Re: ZFS on top of GELI
On Sun, Jan 10, 2010 at 8:46 PM, Damian Gerow wrote:
> Dan Naumov wrote:
> : Yes, this is what I was basically considering:
> :
> : new AHCI driver => 40gb Intel SSD => UFS2 with Softupdates for the
> : system installation
> : new AHCI driver => 2 x 2tb disks, each fully encrypted with geli => 2
> : geli vdevs for a ZFS mirror for important data
>
> If performance is an issue, you may want to consider carving off a partition
> on that SSD, geli-fying it, and using it as a ZIL device. You'll probably
> see a marked performance improvement with such a setup.

That is true, but using a single device for a dedicated ZIL is a huge no-no. Since it is an intent log, it is used to reconstruct the pool after, for example, a power failure; should such an event occur at the same time as the ZIL provider dies, you lose the entire pool, because there is no way to recover it. So if the ZIL gets put "elsewhere", that elsewhere really should be a mirror, and sadly I don't see myself affording 2 SSDs for my setup :)

- Sincerely,
Dan Naumov
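For anyone who does want a separate log device despite the concern above, a mirrored one avoids the single point of failure; a hypothetical sketch (the partition names are made up):

zpool add tank log mirror ada1p3.eli ada2p3.eli   # mirrored SLOG, both halves on encrypted partitions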
bin/115406: [patch] gpt(8) GPT MBR hangs award BIOS on boot
I have a few questions about this PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin 1) Is this bug now officially fixed as of 8.0-RELEASE? Ie, can I expect to set up a completely GPT-based system using an Intel D945GCLF2 board and not have the installation crap out on me later? 2) The very last entry into the PR states the following: "The problem has been addressed in gart(8) and gpt(8) is obsolete, so no follow-up is to be expected at this time. Close the PR to reflect this." What exactly is "gart" and where do I find it's manpage, http://www.freebsd.org/cgi/man.cgi comes up with nothing? Also, does this mean that GPT is _NOT_ in fact fixed regarding this bug? Thanks. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI
On Mon, Jan 11, 2010 at 7:30 PM, Pete French wrote: >> GELI+ZFS and Debian Linux with MDRAID and cryptofs. Has anyone here >> made any benchmarks regarding how much of a performance hit is caused >> by using 2 geli devices as vdevs for a ZFS mirror pool in FreeBSD (a > > I havent done it directly on the same boxes, but I have two systems > with idenitical drives, each with a ZFS mirror pool, one wth GELI, and > one without. Simple read test shows no overhead in using GELI at all. > > I would recommend using the new AHCI driver though - greatly > improves throughput. How fast is the CPU in the system showing no overhead? Having no noticable overhead whatsoever sounds extremely unlikely unless you are actually using it on something like a very modern dualcore or better. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI
2010/1/12 Rafał Jackiewicz :
> Two hdd Seagate ES2, Intel Atom 330 (2x1.6GHz), 2GB RAM:
>
> geli:
> geli init -s 4096 -K /etc/keys/ad4s2.key /dev/ad4s2
> geli init -s 4096 -K /etc/keys/ad6s2.key /dev/ad6s2
>
> zfs:
> zpool create data01 ad4s2.eli
>
> df -h:
> dev/ad6s2.eli.journal   857G   8.0K   788G   0%   /data02
> data01                  850G   128K   850G   0%   /data01
>
> srebrny# dd if=/dev/zero of=/data01/test bs=1M count=500
> 500+0 records in
> 500+0 records out
> 524288000 bytes transferred in 8.802691 secs (59559969 bytes/sec)
> srebrny# dd if=/dev/zero of=/data02/test bs=1M count=500
> 500+0 records in
> 500+0 records out
> 524288000 bytes transferred in 20.090274 secs (26096608 bytes/sec)
>
> Rafal Jackiewicz

Thanks, could you do the same, but using 2 .eli vdevs mirrored together in a ZFS mirror?

- Sincerely,
Dan Naumov
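For clarity, the configuration being requested would look roughly like this, reusing the .eli providers from the quoted message:

zpool create data01 mirror ad4s2.eli ad6s2.eli
dd if=/dev/zero of=/data01/test bs=1M count=500   # then the same write test against the mirror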
Re: ZFS on top of GELI
On Tue, Jan 12, 2010 at 1:29 AM, K. Macy wrote: >>> >>> If performance is an issue, you may want to consider carving off a partition >>> on that SSD, geli-fying it, and using it as a ZIL device. You'll probably >>> see a marked performance improvement with such a setup. >> >> That is true, but using a single device for a dedicated ZIL is a huge >> no-no, considering it's an intent log, it's used to reconstruct the >> pool in case of a power failure for example, should such an event >> occur at the same time as a ZIL provider dies, you lose the entire >> pool because there is no way to recover it, so if ZIL gets put >> "elsewhere", that elsewhere really should be a mirror and sadly I >> don't see myself affording to use 2 SSDs for my setup :) >> > > This is false. The ZIL is used for journalling synchronous writes. If > your ZIL is lost you will lose the data that was written to the ZIL, > but not yet written to the file system proper. Barring disk > corruption, the file system is always consistent. > > -Kip Ok, lets assume we have a dedicated ZIL on a single non-redundant disk. This disk dies. How do you remove the dedicated ZIL from the pool or replace it with a new one? Solaris ZFS documentation indicates that this is possible for dedicated L2ARC - you can remove a dedicated l2arc from a pool at any time you wish and should some IO fail on the l2arc, the system will gracefully continue to run, reverting said IO to be processed by the actual default built-in ZIL on the disks of the pool. However the capability to remove dedicated ZIL or gracefully handle the death of a non-redundant dedicated ZIL vdev does not currently exist in Solaris/OpenSolaris at all. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
installing FreeBSD 8 on SSDs and UFS2 - partition alignment, block sizes, what does one need to know?
For my upcoming storage system, the OS install is going to be on an 80gb Intel SSD and for various reasons I am now pretty convinced to stick with UFS2 for the root partition (the actual data pool will be ZFS using traditional SATA disks). I am probably going to use GPT partitioning and have the SSD host the swap, boot, root and a few other partitions.

What do I need to know in regards to partition alignment and filesystem block sizes to get the best performance out of the Intel SSDs? Thanks.

- Sincerely,
Dan Naumov
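A minimal sketch of one common approach on 8.0, assuming the usual advice of starting partitions on 1MiB boundaries (2048 sectors of 512 bytes) so they line up with the SSD's internal pages and erase blocks; the device name and sizes are placeholders:

gpart create -s gpt ada0
gpart add -b 34 -s 128 -t freebsd-boot ada0
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0
gpart add -b 2048 -s 8388608 -t freebsd-swap ada0   # 4GB swap, starting at the 1MiB boundary
gpart add -b 8390656 -t freebsd-ufs ada0            # root; the next offset is again a multiple of 2048
newfs -U /dev/ada0p3                                # UFS2 with soft updates

The default UFS2 block/fragment sizes are generally fine; the partition start alignment tends to matter more than the newfs block size.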
Re: ZFS on top of GELI
2010/1/12 Rafał Jackiewicz :
>>Thanks, could you do the same, but using 2 .eli vdevs mirrored
>>together in a zfs mirror?
>>
>>- Sincerely,
>>Dan Naumov
>
> Hi,
>
> Proc: Intel Atom 330 (2x1.6Ghz) - 1 package(s) x 2 core(s) x 2 HTT threads
> Chipset: Intel 82945G
> Sys: 8.0-RELEASE FreeBSD 8.0-RELEASE #0
> empty file: /boot/loader.conf
> Hdd:
> ad4: 953869MB at ata2-master SATA150
> ad6: 953869MB at ata3-master SATA150
> Geli:
> geli init -s 4096 -K /etc/keys/ad4s2.key /dev/ad4s2
> geli init -s 4096 -K /etc/keys/ad6s2.key /dev/ad6s2
>
> Results:
>
> *** single drive              write MB/s   read MB/s
> eli.journal.ufs2                      23           14
> eli.zfs                               19           36
>
> *** mirror                    write MB/s   read MB/s
> mirror.eli.journal.ufs2               23           16
> eli.zfs                               31           40
> zfs                                   83           79
>
> *** degraded mirror           write MB/s   read MB/s
> mirror.eli.journal.ufs2               16            9
> eli.zfs                               56           40
> zfs                                   86           71

Thanks a lot for your numbers, the relevant part for me was this:

*** mirror                    write MB/s   read MB/s
eli.zfs                               31           40
zfs                                   83           79

*** degraded mirror           write MB/s   read MB/s
eli.zfs                               56           40
zfs                                   86           71

31 mb/s writes and 40 mb/s reads is something I guess I could potentially live with. I am guessing the main problem of stacking ZFS on top of geli like this is that writing to a mirror requires double the CPU work: all written data has to be encrypted twice (once for each disk), instead of being encrypted once and then written to both disks, as would be the case if the crypto sat on top of ZFS rather than ZFS sitting on top of the crypto.

I now have to reevaluate my planned use of an SSD, though. I was planning to use a 40gb partition on an Intel 80GB X25-M G2 as a dedicated L2ARC device for a ZFS mirror of 2 x 2tb disks. However, these numbers make it quite obvious that I would already be CPU-starved at 40-50mb/s throughput on the encrypted ZFS mirror, so adding an L2ARC SSD, while improving latency, would do really nothing for actual disk read speeds, considering the L2ARC itself would also have to sit on top of a GELI device.

- Sincerely,
Dan Naumov
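One simple way to confirm the CPU-starvation theory while such a benchmark runs is to watch the geli kernel threads and per-provider disk load directly (standard tools, nothing specific to this setup):

top -SH   # shows the per-provider g_eli worker threads and how much CPU they consume
gstat     # shows %busy and throughput per GEOM provider, the .eli devices included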
RE: bin/115406: [patch] gpt(8) GPT MBR hangs award BIOS on boot
> 1) Is this bug now officially fixed as of 8.0-RELEASE? Ie, can I
> expect to set up a completely GPT-based system using an Intel
> D945GCLF2 board and not have the installation crap out on me later?
>
> 2) The very last entry into the PR states the following:
> "The problem has been addressed in gart(8) and gpt(8) is obsolete, so
> no follow-up is to be expected at this time. Close the PR to reflect
> this."

Hello list. Referring to PR http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin: I have now been battling with trying to set up a FreeBSD 8.0 system using GPT on an Intel D945GCLF2 board for over 24 hours, and it looks to me that the problem is not resolved. If I do a traditional installation using sysinstall / MBR, everything works. But if I use GPT and do a manual installation, doing everything right, the way it's supposed to be done, the BIOS refuses to boot off the disk. I have verified that I am doing everything right by employing the exact same installation method with GPT inside a VMware Player virtual machine, where everything works as expected, and I have been using an installation script in both cases to ensure that this is definitely not a user error :)

Reading the original PR, it can be seen that a (supposed) fix to gpart was committed to stable/8 back on Aug 27. Is it possible that this somehow didn't make it into 8.0-RELEASE, or is this a case of the fix being there but not actually solving the problem?

Reading the discussion on the forums at http://forums.freebsd.org/showthread.php?t=4680 I see that a 7.2-RELEASE user solved his exact same problem by editing the actual PMBR (resulting in the "bootable" flag (0x80) being set and the start of the partition being set to the beginning of the disk (0x010100)) and applying it to his disk with dd. Can anyone point me towards an explanation of how to edit and apply my own PMBR to my disk to see if it helps? Thanks.

Sincerely,
Dan Naumov
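For anyone wanting to experiment along those lines, a rough (and risky) sketch of what hand-editing the PMBR involves; the device name is a placeholder and a backup of the first sector is mandatory. The boot flag of the first partition entry lives at byte offset 446 of sector 0, and 0x80 marks it active:

dd if=/dev/ad4 of=/root/pmbr.bak bs=512 count=1                   # back up the current PMBR
printf '\200' | dd of=/root/pmbr.bak bs=1 seek=446 conv=notrunc   # set the active/bootable flag (0x80)
dd if=/root/pmbr.bak of=/dev/ad4 bs=512 count=1                   # write the edited sector back (may need kern.geom.debugflags=16 if the disk is in use)

As comes up later in these threads, on 8.0-RELEASE running "fdisk -a -1 DISKNAME" sets the same flag without any hand-editing.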
8.0-RELEASE / gpart / GPT / marking a partition as "active"
It seems that quite a few BIOSes have serious issues booting off disks using GPT partitioning when no partition present is marked as "active". See http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin for a prime example.

In 8.0-RELEASE, using gpart, setting a slice as "active" in MBR partitioning mode is trivial, i.e.:

gpart set -a active -i 1 DISKNAME

However, trying to do the same thing with GPT partitioning yields no results:

gpart set -a active -i 1 DISKNAME
gpart: attrib 'active': Device not configured

As a result of this issue, I can configure and make a successful install using GPT in 8.0, but I cannot boot off it using my Intel D945GCLF2 board.

I have found this discussion from about a month ago: http://www.mail-archive.com/freebsd-stable@freebsd.org/msg106918.html where Robert mentions that "gpart set -a active -i 1" is no longer needed in 8-STABLE, because the pmbr will be marked as active during the installation of the bootcode. Is there anything I can do to achieve the same result in 8.0-RELEASE, or is installing from a snapshot of 8-STABLE my only option? Thanks.

- Sincerely,
Dan Naumov
Re: 8.0-RELEASE / gpart / GPT / marking a partition as "active"
On 1/19/2010 12:11 PM, Dan Naumov wrote: > It seems that quite a few BIOSes have serious issues booting off disks > using GPT partitioning when no partition present is marked as > "active". See http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin > for a prime example. > > In 8.0-RELEASE, using gpart, setting a slice as "active" in MBR > partitioning mode is trivial, ie: > > gpart set -a active -i 1 DISKNAME > > However, trying to do the same thing with GPT partitioning yields no results: > > gpart set -a active -i 1 DISKNAME > gpart: attrib 'active': Device not configured > > As a result of this issue, I can configure and make a succesfull > install using GPT in 8.0, but I cannot boot off it using my Intel > D945GCLF2 board. > > I have found this discussion from about a month ago: > http://www.mail-archive.com/freebsd-stable@freebsd.org/msg106918.html > where Robert mentions that "gpart set -a active -i 1" is no longer > needed in 8-STABLE, because the pmbr will be marked as active during > the installation of the bootcode. Is there anything I can do to > archieve the same result in 8.0-RELEASE or is installing from a > snapshop of 8-STABLE my only option? > After using gpart to create the GPT (and thus the PMBR and its > bootcode), why not simply use "fdisk -a -1 DISKNAME" to set the PMBR > partition active? According to the fdisk output, the partition flag did change from 0 to 80. Can the "fdisk: Class not found" error showing up at the very end of the procedure of doing "fdisk -a -1 DISKNAME" be safely ignored? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Loader, MBR and the boot process
I recently found a nifty "FreeBSD ZFS root installation script" and have been reworking it a bit to suit my needs better, including changing it from GPT to MBR partitioning. However, I was stumped: even though I had done everything right (or so I thought), the system would get stuck at the loader and refuse to go anywhere. After trying over a dozen different things, it dawned on me to change the partition order inside the slice. I had 1) swap 2) freebsd-zfs, and for the test I got rid of swap altogether and gave the entire slice to the freebsd-zfs partition. Suddenly my problem went away and the system booted just fine.

So it seems that the loader requires the partition containing the files vital to booting to be the first partition in the slice, and that "swap first, then the rest" doesn't work. The thing is, I am absolutely positive that in the past I've had sysinstall-created installs using MBR partitioning with swap as the first partition inside the slice, and it all worked dandy. Has this changed at some point?

Oh, and for the curious, the installation script is here: http://jago.pp.fi/zfsmbrv1-works.sh

- Sincerely,
Dan Naumov
Re: Loader, MBR and the boot process
On Fri, Jan 22, 2010 at 6:12 AM, Thomas K. wrote: > On Fri, Jan 22, 2010 at 05:57:23AM +0200, Dan Naumov wrote: > > Hi, > >> I recently found a nifty "FreeBSD ZFS root installation script" and >> been reworking it a bit to suit my needs better, including changing it >> from GPT to MBR partitioning. However, I was stumped, even though I >> had done everything right (or so I thought), the system would get >> stuck at Loader and refuse to go anywhere. After trying over a dozen > > probably this line is the cause: > > dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s1a skip=1 seek=1024 > > Unless by "swap first" you meant the on-disk location, and not the > partition letter. If swap is partition "a", you're writing the loader > into swapspace. > > > Regards, > Thomas At first you made me feel silly, but then I decided to double-check, I uncommented the swap line in the partitioning part again, ensured I was writing the bootloader to "${TARGETDISK}"s1b and ran the script. Same problem, hangs at loader. Again, if I comment out the swap, giving the entire slice to ZFS and then write the bootloader to "${TARGETDISK}"s1a, run the script, everything works. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Loader, MBR and the boot process
On Fri, Jan 22, 2010 at 6:49 AM, Dan Naumov wrote: > On Fri, Jan 22, 2010 at 6:12 AM, Thomas K. wrote: >> On Fri, Jan 22, 2010 at 05:57:23AM +0200, Dan Naumov wrote: >> >> Hi, >> >>> I recently found a nifty "FreeBSD ZFS root installation script" and >>> been reworking it a bit to suit my needs better, including changing it >>> from GPT to MBR partitioning. However, I was stumped, even though I >>> had done everything right (or so I thought), the system would get >>> stuck at Loader and refuse to go anywhere. After trying over a dozen >> >> probably this line is the cause: >> >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s1a skip=1 seek=1024 >> >> Unless by "swap first" you meant the on-disk location, and not the >> partition letter. If swap is partition "a", you're writing the loader >> into swapspace. >> >> >> Regards, >> Thomas > > At first you made me feel silly, but then I decided to double-check, I > uncommented the swap line in the partitioning part again, ensured I > was writing the bootloader to "${TARGETDISK}"s1b and ran the script. > Same problem, hangs at loader. Again, if I comment out the swap, > giving the entire slice to ZFS and then write the bootloader to > "${TARGETDISK}"s1a, run the script, everything works. I have also just tested creating 2 slices, like this: gpart create -s mbr "${TARGETDISK}" gpart add -s 3G -t freebsd "${TARGETDISK}" gpart create -s BSD "${TARGETDISK}"s1 gpart add -t freebsd-swap "${TARGETDISK}"s1 gpart add -t freebsd "${TARGETDISK}" gpart create -s BSD "${TARGETDISK}"s2 gpart add -t freebsd-zfs "${TARGETDISK}"s2 gpart set -a active -i 2 "${TARGETDISK}" gpart bootcode -b /mnt2/boot/boot0 "${TARGETDISK}" and later: dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2 count=1 dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2a skip=1 seek=1024 Putting the swap into it's own slice and then putting FreeBSD into it's own slice worked fine. So why the hell can't they both coexist in 1 slice if the swap comes first? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
posting coding bounties, appropriate money amounts?
Hello I am curious about posting some coding bounties, my current interest revolves around improving the ZVOL functionality in FreeBSD: fixing the known ZVOL SWAP reliability/stability problems as well as making ZVOLs work as a dumpon device (as is already the case in OpenSolaris) for crash dumps. I am a private individual and not some huge Fortune 100 and while I am not exactly rich, I am willing to put some of my personal money towards this. I am curious though, what would be the best way to approach this: directly approaching committer(s) with the know-how-and-why of the areas involved or through the FreeBSD Foundation? And how would one go about calculating the appropriate amount of money for such a thing? Thanks. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Loader, MBR and the boot process
On Sun, Jan 24, 2010 at 5:29 PM, John wrote: > On Fri, Jan 22, 2010 at 07:02:53AM +0200, Dan Naumov wrote: >> On Fri, Jan 22, 2010 at 6:49 AM, Dan Naumov wrote: >> > On Fri, Jan 22, 2010 at 6:12 AM, Thomas K. wrote: >> >> On Fri, Jan 22, 2010 at 05:57:23AM +0200, Dan Naumov wrote: >> >> >> >> Hi, >> >> >> >>> I recently found a nifty "FreeBSD ZFS root installation script" and >> >>> been reworking it a bit to suit my needs better, including changing it >> >>> from GPT to MBR partitioning. However, I was stumped, even though I >> >>> had done everything right (or so I thought), the system would get >> >>> stuck at Loader and refuse to go anywhere. After trying over a dozen >> >> >> >> probably this line is the cause: >> >> >> >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s1a skip=1 seek=1024 >> >> >> >> Unless by "swap first" you meant the on-disk location, and not the >> >> partition letter. If swap is partition "a", you're writing the loader >> >> into swapspace. >> >> >> >> >> >> Regards, >> >> Thomas >> > >> > At first you made me feel silly, but then I decided to double-check, I >> > uncommented the swap line in the partitioning part again, ensured I >> > was writing the bootloader to "${TARGETDISK}"s1b and ran the script. >> > Same problem, hangs at loader. Again, if I comment out the swap, >> > giving the entire slice to ZFS and then write the bootloader to >> > "${TARGETDISK}"s1a, run the script, everything works. >> >> I have also just tested creating 2 slices, like this: >> >> gpart create -s mbr "${TARGETDISK}" >> gpart add -s 3G -t freebsd "${TARGETDISK}" >> gpart create -s BSD "${TARGETDISK}"s1 >> gpart add -t freebsd-swap "${TARGETDISK}"s1 >> >> gpart add -t freebsd "${TARGETDISK}" >> gpart create -s BSD "${TARGETDISK}"s2 >> gpart add -t freebsd-zfs "${TARGETDISK}"s2 >> >> gpart set -a active -i 2 "${TARGETDISK}" >> gpart bootcode -b /mnt2/boot/boot0 "${TARGETDISK}" >> >> >> and later: >> >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2 count=1 >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2a skip=1 seek=1024 >> >> >> Putting the swap into it's own slice and then putting FreeBSD into >> it's own slice worked fine. So why the hell can't they both coexist in >> 1 slice if the swap comes first? > > I know what the answer to this USED to be, but I don't know if it is > still true (obviously, I think so, I or wouldn't waste your time). > > The filesystem code is all carefully written to avoid the very > first few sector of the partition. That's because the partition > table is there for the first filesystem of the slice (or disk). > That's a tiny amout of space wasted, because it's also skipped on > all the other filesystems even though there's not actually anything > there, but it was a small inefficency, even in the 70's. > > Swap does not behave that way. SWAP will begin right at the slice > boundry, with 0 offset. As long as it's not the first partition, no > harm, no foul. If it IS the first partition, you just nuked your partition > table. As long as SWAP owns the slice, again, no harm, no foul, but > if there were filesystems BEHIND it, you just lost 'em. > > That's the way it always used to be, and I think it still is. SWAP can > only be first if it is the ONLY thing using that slice (disk), otherwise, > you need a filesystem first to protect the partition table. 
> --
>
> John Lind
> j...@starfire.mn.org

This explanation does sound logical, but holy crap, if this is the case you'd think there would be bells, whistles and huge red-label warnings in EVERY FreeBSD installation / partitioning guide out there, warning people never to put swap first unless it gets a dedicated slice. The warnings were nowhere to be seen, and a lot of hair first greyed and was then lost while I tried to figure out why my system would install but wouldn't boot.

- Sincerely,
Dan Naumov
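To spell the takeaway out as a layout, a sketch of the arrangement that works within a single MBR slice (the device name and sizes are placeholders, mirroring the script quoted above): the filesystem partition goes first, so its reserved leading sectors protect the bsdlabel, and swap goes last:

gpart create -s BSD ad4s1
gpart add -s 295G -t freebsd-zfs ad4s1   # "a" partition first; its unused leading sectors shield the label
gpart add -t freebsd-swap ad4s1          # swap last, so its offset-0 writes never land on the label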
8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
1 sec =6.278 msec
Short backward:  400 iter in   2.233714 sec =   5.584 msec
Seq outer:      2048 iter in   0.427523 sec =   0.209 msec
Seq inner:      2048 iter in   0.341185 sec =   0.167 msec
Transfer rates:
    outside:  102400 kbytes in   1.516305 sec =   67533 kbytes/sec
    middle:   102400 kbytes in   1.351877 sec =   75747 kbytes/sec
    inside:   102400 kbytes in   2.090069 sec =   48994 kbytes/sec
===

The exact same disks, on the exact same machine, are well capable of 65+ mb/s throughput (tested with ATTO multiple times) with different block sizes using Windows 2008 Server and NTFS. So what would be the cause of these very low Bonnie result numbers in my case? Should I try some other benchmark and if so, with what parameters?

- Sincerely,
Dan Naumov
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
test2 bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes transferred in 143.878615 secs (29851325 bytes/sec) This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the bonnie results. It also sadly seems to confirm the very slow speed :( The disks are attached to a 4-port Sil3124 controller and again, my Windows benchmarks showing 65mb/s+ were done on exact same machine, with same disks attached to the same controller. Only difference was that in Windows the disks weren't in a mirror configuration but were tested individually. I do understand that a mirror setup offers roughly the same write speed as individual disk, while the read speed usually varies from "equal to individual disk speed" to "nearly the throughput of both disks combined" depending on the implementation, but there is no obvious reason I am seeing why my setup offers both read and write speeds roughly 1/3 to 1/2 of what the individual disks are capable of. Dmesg shows: atapci0: port 0x1000-0x100f mem 0x90108000-0x9010807f,0x9010-0x90107fff irq 21 at device 0.0 on pci4 ad8: 1907729MB at ata4-master SATA300 ad10: 1907729MB at ata5-master SATA300 I do recall also testing an alternative configuration in the past, where I would boot off an UFS disk and have the ZFS mirror consist of 2 discs directly. The bonnie numbers in that case were in line with my expectations, I was seeing 65-70mb/s. Note: again, exact same hardware, exact same disks attached to the exact same controller. In my knowledge, Solaris/OpenSolaris has an issue where they have to automatically disable disk cache if ZFS is used on top of partitions instead of raw disks, but to my knowledge (I recall reading this from multiple reputable sources) this issue does not affect FreeBSD. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 7:42 PM, Dan Naumov wrote: > On Sun, Jan 24, 2010 at 7:05 PM, Jason Edwards wrote: >> Hi Dan, >> >> I read on FreeBSD mailinglist you had some performance issues with ZFS. >> Perhaps i can help you with that. >> >> You seem to be running a single mirror, which means you won't have any speed >> benefit regarding writes, and usually RAID1 implementations offer little to >> no acceleration to read requests also; some even just read from the master >> disk and don't touch the 'slave' mirrored disk unless when writing. ZFS is >> alot more modern however, although i did not test performance of its mirror >> implementation. >> >> But, benchmarking I/O can be tricky: >> >> 1) you use bonnie, but bonnie's tests are performed without a 'cooldown' >> period between the tests; meaning that when test 2 starts, data from test 1 >> is still being processed. For single disks and simple I/O this is not so >> bad, but for large write-back buffers and more complex I/O buffering, this >> may be inappropriate. I had patched bonnie some time in the past, but if you >> just want a MB/s number you can use DD for that. >> >> 2) The diskinfo tiny benchmark is single queue only i assume, meaning that >> it would not scale well or at all on RAID-arrays. Actual filesystems on >> RAID-arrays use multiple-queue; meaning it would not read one sector at a >> time, but read 8 blocks (of 16KiB) "ahead"; this is called read-ahead and >> for traditional UFS filesystems its controlled by the sysctl vfs.read_max >> variable. ZFS works differently though, but you still need a "real" >> benchmark. >> >> 3) You need low-latency hardware; in particular, no PCI controller should be >> used. Only PCI-express based controllers or chipset-integrated Serial ATA >> cotrollers have proper performance. PCI can hurt performance very badly, and >> has high interrupt CPU usage. Generally you should avoid PCI. PCI-express is >> fine though, its a completely different interface that is in many ways the >> opposite of what PCI was. >> >> 4) Testing actual realistic I/O performance (in IOps) is very difficult. But >> testing sequential performance should be alot easier. You may try using dd >> for this. >> >> >> For example, you can use dd on raw devices: >> >> dd if=/dev/ad4 of=/dev/null bs=1M count=1000 >> >> I will explain each parameter: >> >> if=/dev/ad4 is the input file, the "read source" >> >> of=/dev/null is the output file, the "write destination". /dev/null means it >> just goes no-where; so this is a read-only benchmark >> >> bs=1M is the blocksize, howmuch data to transfer per time. default is 512 or >> the sector size; but that's very slow. A value between 64KiB and 1024KiB is >> appropriate. bs=1M will select 1MiB or 1024KiB. >> >> count=1000 means transfer 1000 pieces, and with bs=1M that means 1000 * 1MiB >> = 1000MiB. >> >> >> >> This example was raw reading sequentially from the start of the device >> /dev/ad4. If you want to test RAIDs, you need to work at the filesystem >> level. You can use dd for that too: >> >> dd if=/dev/zero of=/path/to/ZFS/mount/zerofile.000 bs=1M count=2000 >> >> This command will read from /dev/zero (all zeroes) and write to a file on >> ZFS-mounted filesystem, it will create the file "zerofile.000" and write >> 2000MiB of zeroes to that file. >> So this command tests write-performance of the ZFS-mounted filesystem. To >> test read performance, you need to clear caches first by unmounting that >> filesystem and re-mounting it again. 
This would free up memory containing >> parts of the filesystem as cached (reported in top as "Inact(ive)" instead >> of "Free"). >> >> Please do make sure you double-check a dd command before running it, and run >> as normal user instead of root. A wrong dd command may write to the wrong >> destination and do things you don't want. The only real thing you need to >> check is the write destination (of=). That's where dd is going to write >> to, so make sure its the target you intended. A common mistake made by >> myself was to write dd of=... if=... (starting with of instead of if) and >> thus actually doing something the other way around than what i was meant to >> do. This can be disastrous if you work with live data, so be careful! ;-) >> >> Hope any of this was helpful. During the dd benchmark, you can of course >> open a
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 8:12 PM, Bob Friesenhahn wrote:
> On Sun, 24 Jan 2010, Dan Naumov wrote:
>>
>> This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and
>> 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the
>> bonnie results. It also sadly seems to confirm the very slow speed :(
>> The disks are attached to a 4-port Sil3124 controller and again, my
>> Windows benchmarks showing 65mb/s+ were done on exact same machine,
>> with same disks attached to the same controller. Only difference was
>> that in Windows the disks weren't in a mirror configuration but were
>> tested individually. I do understand that a mirror setup offers
>> roughly the same write speed as individual disk, while the read speed
>> usually varies from "equal to individual disk speed" to "nearly the
>> throughput of both disks combined" depending on the implementation,
>> but there is no obvious reason I am seeing why my setup offers both
>> read and write speeds roughly 1/3 to 1/2 of what the individual disks
>> are capable of. Dmesg shows:
>
> There is a mistatement in the above in that a "mirror setup offers roughly
> the same write speed as individual disk". It is possible for a mirror setup
> to offer a similar write speed to an individual disk, but it is also quite
> possible to get 1/2 (or even 1/3) the speed. ZFS writes to a mirror pair
> requires two independent writes. If these writes go down independent I/O
> paths, then there is hardly any overhead from the 2nd write. If the writes
> go through a bandwidth-limited shared path then they will contend for that
> bandwidth and you will see much less write performance.
>
> As a simple test, you can temporarily remove the mirror device from the pool
> and see if the write performance dramatically improves. Before doing that,
> it is useful to see the output of 'iostat -x 30' while under heavy write
> load to see if one device shows a much higher svc_t value than the other.

Ow, ow, WHOA:

atombsd# zpool offline tank ad8s1a

[j...@atombsd ~]$ dd if=/dev/zero of=/home/jago/test3 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 16.826016 secs (63814382 bytes/sec)

Offlining one half of the mirror bumps DD write speed from 28mb/s to 64mb/s! Let's see how Bonnie results change:

Mirror with both parts attached:

              ---Sequential Output---- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         8192 18235 46.7 23137 19.9 13927 13.6 24818 49.3 44919 17.3 134.3  2.1

Mirror with 1 half offline:

              ---Sequential Output---- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         1024 22888 58.0 41832 35.1 22764 22.0 26775 52.3 54233 18.3 166.0  1.6

Ok, the Bonnie results have improved, but only very little.

- Sincerely,
Dan Naumov
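For completeness, after a test like this the offlined half goes back in with zpool online and the mirror resilvers; zpool status shows the progress:

zpool online tank ad8s1a
zpool status tank    # wait for the resilver to finish before trusting the redundancy again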
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 8:34 PM, Jason Edwards wrote: >> ZFS writes to a mirror pair >> requires two independent writes. If these writes go down independent I/O >> paths, then there is hardly any overhead from the 2nd write. If the >> writes >> go through a bandwidth-limited shared path then they will contend for that >> bandwidth and you will see much less write performance. > > What he said may confirm my suspicion on PCI. So if you could try the same > with "real" Serial ATA via chipset or PCI-e controller you can confirm this > story. I would be very interested. :P > > Kind regards, > Jason This wouldn't explain why ZFS mirror on 2 disks directly, on the exact same controller (with the OS running off a separate disks) results in "expected" performance, while having the OS run off/on a ZFS mirror running on top of MBR-partitioned disks, on the same controller, results in very low speed. - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 11:53 PM, Alexander Motin wrote:
> Dan Naumov wrote:
>> This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and
>> 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the
>> bonnie results. It also sadly seems to confirm the very slow speed :(
>> The disks are attached to a 4-port Sil3124 controller and again, my
>> Windows benchmarks showing 65mb/s+ were done on exact same machine,
>> with same disks attached to the same controller. Only difference was
>> that in Windows the disks weren't in a mirror configuration but were
>> tested individually. I do understand that a mirror setup offers
>> roughly the same write speed as individual disk, while the read speed
>> usually varies from "equal to individual disk speed" to "nearly the
>> throughput of both disks combined" depending on the implementation,
>> but there is no obvious reason I am seeing why my setup offers both
>> read and write speeds roughly 1/3 to 1/2 of what the individual disks
>> are capable of. Dmesg shows:
>>
>> atapci0: port 0x1000-0x100f mem
>> 0x90108000-0x9010807f,0x9010-0x90107fff irq 21 at device 0.0 on
>> pci4
>> ad8: 1907729MB at ata4-master SATA300
>> ad10: 1907729MB at ata5-master SATA300
>
> 8.0-RELEASE, and especially 8-STABLE provide alternative, much more
> functional driver for this controller, named siis(4). If your SiI3124
> card installed into proper bus (PCI-X or PCIe x4/x8), it can be really
> fast (up to 1GB/s was measured).
>
> --
> Alexander Motin

Sadly, it seems that utilizing the new siis driver doesn't do much good.

Before utilizing siis:

iozone -s 4096M -r 512 -i0 -i1

                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         4194304     512   28796   28766   51610   50695

After enabling siis in loader.conf (and ensuring the disks show up as ada):

iozone -s 4096M -r 512 -i0 -i1

                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         4194304     512   28781   28897   47214   50540

I've checked with the manufacturer and it seems that the Sil3124 in this NAS is indeed a PCI card. More info on the card in question is available at http://green-pcs.co.uk/2009/01/28/tranquil-bbs2-those-pci-cards/ - I have the card described later on the page, the one with 4 SATA ports and no eSATA. Alright, so it being PCI is probably a bottleneck in some ways, but that still doesn't explain performance THAT bad, considering that the same hardware, same disks and same disk controller push over 65mb/s in both reads and writes in Win2008. And again, I am pretty sure that I had "close to expected" results when I was booting a UFS FreeBSD installation off an SSD (attached directly to a SATA port on the motherboard) while running the same kinds of benchmarks with Bonnie and DD on a ZFS mirror made directly on top of the 2 raw disks.

- Sincerely,
Dan Naumov
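For reference, "enabling siis in loader.conf" above amounts to loading the driver module at boot, after which the disks attach via CAM as adaX:

# /boot/loader.conf
siis_load="YES"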
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 2:14 AM, Dan Naumov wrote: > On Sun, Jan 24, 2010 at 11:53 PM, Alexander Motin wrote: >> Dan Naumov wrote: >>> This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and >>> 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the >>> bonnie results. It also sadly seems to confirm the very slow speed :( >>> The disks are attached to a 4-port Sil3124 controller and again, my >>> Windows benchmarks showing 65mb/s+ were done on exact same machine, >>> with same disks attached to the same controller. Only difference was >>> that in Windows the disks weren't in a mirror configuration but were >>> tested individually. I do understand that a mirror setup offers >>> roughly the same write speed as individual disk, while the read speed >>> usually varies from "equal to individual disk speed" to "nearly the >>> throughput of both disks combined" depending on the implementation, >>> but there is no obvious reason I am seeing why my setup offers both >>> read and write speeds roughly 1/3 to 1/2 of what the individual disks >>> are capable of. Dmesg shows: >>> >>> atapci0: port 0x1000-0x100f mem >>> 0x90108000-0x9010807f,0x9010-0x90107fff irq 21 at device 0.0 on >>> pci4 >>> ad8: 1907729MB at ata4-master SATA300 >>> ad10: 1907729MB at ata5-master SATA300 >> >> 8.0-RELEASE, and especially 8-STABLE provide alternative, much more >> functional driver for this controller, named siis(4). If your SiI3124 >> card installed into proper bus (PCI-X or PCIe x4/x8), it can be really >> fast (up to 1GB/s was measured). >> >> -- >> Alexander Motin > > Sadly, it seems that utilizing the new siis driver doesn't do much good: > > Before utilizing siis: > > iozone -s 4096M -r 512 -i0 -i1 > random > random bkwd record stride > KB reclen write rewrite read reread read > write read rewrite read fwrite frewrite fread freread > 4194304 512 28796 28766 51610 50695 > > After enabling siis in loader.conf (and ensuring the disks show up as ada): > > iozone -s 4096M -r 512 -i0 -i1 > > random > random bkwd record stride > KB reclen write rewrite read reread read > write read rewrite read fwrite frewrite fread freread > 4194304 512 28781 28897 47214 50540 Just to add to the numbers above, exact same benchmark, on 1 disk (detached 2nd disk from the mirror) while using the siis driver: random randombkwd record stride KB reclen write rewritereadrereadread writeread rewrite read fwrite frewrite fread freread 4194304 512 57760 563716886774047 - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 7:33 AM, Bob Friesenhahn wrote:
> On Mon, 25 Jan 2010, Dan Naumov wrote:
>>
>> I've checked with the manufacturer and it seems that the Sil3124 in
>> this NAS is indeed a PCI card. More info on the card in question is
>> available at
>> http://green-pcs.co.uk/2009/01/28/tranquil-bbs2-those-pci-cards/
>> I have the card described later on the page, the one with 4 SATA ports
>> and no eSATA. Alright, so it being PCI is probably a bottleneck in
>> some ways, but that still doesn't explain the performance THAT bad,
>> considering that same hardware, same disks, same disk controller push
>> over 65mb/s in both reads and writes in Win2008. And again, I am
>> pretty sure that I've had "close to expected" results when I was
>
> The slow PCI bus and this card look like the bottleneck to me. Remember that
> your Win2008 tests were with just one disk, your zfs performance with just
> one disk was similar to Win2008, and your zfs performance with a mirror was
> just under 1/2 that.
>
> I don't think that your performance results are necessarily out of line for
> the hardware you are using.
>
> On an old Sun SPARC workstation with retrofitted 15K RPM drives on Ultra-160
> SCSI channel, I see a zfs mirror write performance of 67,317KB/second and a
> read performance of 124,347KB/second. The drives themselves are capable of
> 100MB/second range performance. Similar to yourself, I see 1/2 the write
> performance due to bandwidth limitations.
>
> Bob

There is a lot of very sweet irony in my particular situation. Initially I was planning to use a single X25-M 80gb SSD in the motherboard SATA port for the actual OS installation, as well as to dedicate 50gb of it to becoming a designated L2ARC vdev for my ZFS mirrors. The SSD attached to the motherboard port would be recognized only as a SATA150 device for some reason, but I was still seeing 150mb/s throughput and sub-0.1 ms latencies on that disk, simply because of how crazy good the X25-Ms are. However, I ended up having very bad issues with the Icydock 2,5" to 3,5" converter jacket I was using to fit the SSD into the system: it would randomly drop write IO under heavy load due to bad connectors. Having finally figured out why my OS installations on the SSD kept going belly up while applying updates, I decided to move the SSD to my desktop and use it there instead, additionally thinking that perhaps my idea of the SSD was crazy overkill for what I need this system to do. Ironically, now that I am seeing how horrible the performance is when operating on the mirror through this PCI card, I realize that my idea was actually pretty bloody brilliant, I just didn't really know why at the time.

An L2ARC device on the motherboard port would really help me with random read IO, but to work around the utterly poor write performance I would also need a dedicated SLOG ZIL device. The catch is that while L2ARC devices can be removed from the pool at will (should the device up and die all of a sudden), dedicated ZILs cannot, and currently a "missing" ZIL device will render the pool it belongs to unable to import and therefore inaccessible. There is some work happening in Solaris to implement removing SLOGs from a pool, but that work hasn't found its way into FreeBSD yet.

- Sincerely,
Dan Naumov
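To illustrate the asymmetry described above with the actual commands: a cache (L2ARC) device can be added and removed freely, whereas removing a log device was not supported by the ZFS versions of that era (the device name is a placeholder):

zpool add tank cache ada1p2    # add an L2ARC device
zpool remove tank ada1p2       # removing a cache device is always allowed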
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 9:34 AM, Dan Naumov wrote: > On Mon, Jan 25, 2010 at 7:33 AM, Bob Friesenhahn > wrote: >> On Mon, 25 Jan 2010, Dan Naumov wrote: >>> >>> I've checked with the manufacturer and it seems that the Sil3124 in >>> this NAS is indeed a PCI card. More info on the card in question is >>> available at >>> http://green-pcs.co.uk/2009/01/28/tranquil-bbs2-those-pci-cards/ >>> I have the card described later on the page, the one with 4 SATA ports >>> and no eSATA. Alright, so it being PCI is probably a bottleneck in >>> some ways, but that still doesn't explain the performance THAT bad, >>> considering that same hardware, same disks, same disk controller push >>> over 65mb/s in both reads and writes in Win2008. And agian, I am >>> pretty sure that I've had "close to expected" results when I was >> >> The slow PCI bus and this card look like the bottleneck to me. Remember that >> your Win2008 tests were with just one disk, your zfs performance with just >> one disk was similar to Win2008, and your zfs performance with a mirror was >> just under 1/2 that. >> >> I don't think that your performance results are necessarily out of line for >> the hardware you are using. >> >> On an old Sun SPARC workstation with retrofitted 15K RPM drives on Ultra-160 >> SCSI channel, I see a zfs mirror write performance of 67,317KB/second and a >> read performance of 124,347KB/second. The drives themselves are capable of >> 100MB/second range performance. Similar to yourself, I see 1/2 the write >> performance due to bandwidth limitations. >> >> Bob > > There is lots of very sweet irony in my particular situiation. > Initially I was planning to use a single X25-M 80gb SSD in the > motherboard sata port for the actual OS installation as well as to > dedicate 50gb of it to a become a designaed L2ARC vdev for my ZFS > mirrors. The SSD attached to the motherboard port would be recognized > only as a SATA150 device for some reason, but I was still seeing > 150mb/s throughput and sub 0.1 ms latencies on that disk simply > because of how crazy good the X25-M's are. However I ended up having > very bad issues with the Icydock 2,5" to 3,5" converter jacket I was > using to keep/fit the SSD in the system and it would randomly drop > write IO on heavy load due to bad connectors. Having finally figured > out the cause of my OS installations to the SSD going belly up during > applying updates, I decided to move the SSD to my desktop and use it > there instead, additionally thinking that my perhaps my idea of the > SSD was crazy overkill for what I need the system to do. Ironically > now that I am seeing how horrible the performance is when I am > operating on the mirror through this PCI card, I realize that > actually, my idea was pretty bloody brilliant, I just didn't really > know why at the time. > > An L2ARC device on the motherboard port would really help me with > random read IO, but to work around the utterly poor write performance, > I would also need a dedicaled SLOG ZIL device. The catch is that while > L2ARC devices and be removed from the pool at will (should the device > up and die all of a sudden), the dedicated ZILs cannot and currently a > "missing" ZIL device will render the pool it's included in be unable > to import and become inaccessible. There is some work happening in > Solaris to implement removing SLOGs from a pool, but that work hasn't > yet found it's way in FreeBSD yet. 
>
>
> - Sincerely,
> Dan Naumov

OK, final question: if/when I go about adding more disks to the system and want redundancy, am I right in thinking that a ZFS pool made of a disk1+disk2 mirror plus a disk3+disk4 mirror (a la RAID10) would completely murder my write and read performance, pushing it even further below the 28mb/s / 50mb/s I am currently seeing with 2 disks on that PCI controller? And that in order to have the least negative impact, I should simply run 2 independent mirrors in 2 independent pools (with the 5th disk slot in the NAS given to a non-redundant single disk running off the one available SATA port on the motherboard)?

- Sincerely,
Dan Naumov
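For reference, the two layouts being compared look like this in zpool terms (disk names are placeholders):

zpool create tank mirror da0 da1 mirror da2 da3   # one pool striped across two mirrors (RAID10-style)

zpool create tank1 mirror da0 da1                 # versus two fully independent pools
zpool create tank2 mirror da2 da3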
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 7:40 PM, Alexander Motin wrote: > Artem Belevich wrote: >> aoc-sat2-mv8 was somewhat slower compared to ICH9 or LSI1068 >> controllers when I tried it with 6 and 8 disks. >> I think the problem is that MV8 only does 32K per transfer and that >> does seem to matter when you have 8 drives hooked up to it. I don't >> have hard numbers, but peak throughput of MV8 with 8-disk raidz2 was >> noticeably lower than that of LSI1068 in the same configuration. Both >> LSI1068 and MV2 were on the same PCI-X bus. It could be a driver >> limitation. The driver for Marvel SATA controllers in NetBSD seems a >> bit more advanced compared to what's in FreeBSD. > > I also wouldn't recommend to use Marvell 88SXx0xx controllers now. While > potentially they are interesting, lack of documentation and numerous > hardware bugs make existing FreeBSD driver very limited there. > >> I wish intel would make cheap multi-port PCIe SATA card based on their >> AHCI controllers. > > Indeed. Intel on-board AHCI SATA controllers are fastest from all I have > tested. Unluckily, they are not producing discrete versions. :( > > Now, if discrete solution is really needed, I would still recommend > SiI3124, but with proper PCI-X 64bit/133MHz bus or built-in PCIe x8 > bridge. They are fast and have good new siis driver. > >> On Mon, Jan 25, 2010 at 3:29 AM, Pete French >> wrote: >>>> I like to use pci-x with aoc-sat2-mv8 cards or pci-e cardsthat way you >>>> get a lot more bandwidth.. >>> I would goalong with that - I have precisely the same controller, with >>> a pair of eSATA drives, running ZFS mirrored. But I get a nice 100 >>> meg/second out of them if I try. My controller is, however on PCI-X, not >>> PCI. It's a shame PCI-X appears to have gone the way of the dinosaur :-( > > -- > Alexander Motin Alexander, since you seem to be experienced in the area, what do you think of these 2 for use in a FreeBSD8 ZFS NAS: http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 8:32 PM, Alexander Motin wrote: > Dan Naumov wrote: >> Alexander, since you seem to be experienced in the area, what do you >> think of these 2 for use in a FreeBSD8 ZFS NAS: >> >> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H >> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y > > Unluckily I haven't yet touched Atom family close yet, so I can't say > about it's performance. But higher desktop level (even bit old) ICH9R > chipset there is IMHO a good option. It is MUCH better then ICH7, often > used with previous Atoms. If I had nice small Mini-ITX case with 6 drive > bays, I would definitely look for some board like that to build home > storage. > > -- > Alexander Motin CPU-performance-wise, I am not really worried. The current system is an Atom 330 and even that is a bit overkill for what I do with it and from what I am seeing, the new Atom D510 used on those boards is a tiny bit faster. What I want and care about for this system are reliability, stability, low power use, quietness and fast disk read/write speeds. I've been hearing some praise of ICH9R and 6 native SATA ports should be enough for my needs. AFAIK, the Intel 82574L network cards included on those are also very well supported? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: immense delayed write to file system (ZFS and UFS2), performance issues
> You're welcome. I just feel as bad for you as for everyone else who > has bought these obviously Windoze optimized harddrives. Unfortunately > neither wdidle3 nor an updated firmware is available or functioning on > the latest models in the Green series. At least that's what I've read > from other people having this issue. WD only claims they don't support > Linux and they probably have never heard of FreeBSD. This discussion made me have a look at my 2TB WD Green disks. One of them is from May 2009 and looks pretty reasonable: Device Model: WDC WD20EADS-00R6B0 Serial Number: WD-WCAVY0301430 Firmware Version: 01.00A01 9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 5253 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 55 And another is very recent, from December 2009, and does look a bit worrying in comparison: Device Model: WDC WD20EADS-32R6B0 Serial Number: WD-WCAVY1611513 Firmware Version: 01.00A01 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 136 193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5908 The disks are the exact same model and appear to be on the same firmware. Should I be worried that the newer disk has, in just 136 hours, reached a Load_Cycle_Count more than a hundred times that of the disk that's 5253 hours old? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
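For anyone who wants to pull the same numbers off their own drives, smartmontools (sysutils/smartmontools in ports) is enough; a minimal example, assuming the disks show up as ad4 and ad6 (adjust the device names to your system):

# identity block: model, serial and firmware version
smartctl -i /dev/ad4

# just the two attributes discussed above
smartctl -A /dev/ad4 | egrep 'Power_On_Hours|Load_Cycle_Count'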
RE: immense delayed write to file system (ZFS and UFS2), performance issues
Can anyone confirm that using the WDIDLE3 utility on the 2TB WD20EADS discs will not cause any issues if these disks are part of a ZFS mirror pool? I do have backups of data, but I would rather not spend the time rebuilding the entire system and restoring enormous amounts of data over a 100mbit network unless I absolutely have to :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: immense delayed write to file system (ZFS and UFS2), performance issues
Thank you, thank you, thank you! Now I neither have to worry about premature death of my disks, nor do I have to endure the loud clicking noises (I have a NAS with these in my living room)! If either of you (or both) want me to Paypal you money for a beer, send me details offlist :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: immense delayed write to file system (ZFS and UFS2), performance issues
>I have a WD2003FYPS sitting in a system, to be used for testing. Bought it >just before this thread started, and here's what it looks like right now: > > 9 Power_On_Hours 0x0032 100 100 000Old_age Always - > 508 >193 Load_Cycle_Count 0x0032 200 200 000Old_age Always - >2710 > >This drive is sitting, unused, with no filesystem, and I've performed >approximately zero writes to the disk. > >Having a script kick off and write to a disk will help so long as that >disk is writable; if it's being used as a hot spare in a raidz array, it's >not going to help much. I wouldn't worry in your particular case. A value of 2710 in 508 hours is a rate of 5,33/hour. At this rate, it's going to take you 56285 hours or 2345 days to reach 300,000 and most disks will likely function past 400,000 (over 600,000 all bets are off though). The people who need(ed) to worry were people like me, who were seeing the rate increase at a rate of 43+ per hour. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
booting off GPT partitions
Hey I was under the impression that everyone and their dog is using GPT partitioning in FreeBSD these days, including for boot drives and that I was just being unlucky with my current NAS motherboard (Intel D945GCLF2) having supposedly shaky support for GPT boot. But right now I am having an email exchange with Supermicro support (whom I contacted since I am pondering their X7SPA-H board for a new system), who are telling me that booting off GPT requires UEFI BIOS, which is supposedly a very new thing and that for example NONE of their current motherboards have support for this. Am I misunderstanding something or is the Supermicro support tech misguided? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: one more load-cycle-count problem
>Any further ideas how to get rid of this "feature"? You have several options. 1) The most "clean" solution is probably using the WDIDLE3 utility on your drives to disable automatic parking or, in cases where it's not possible to completely disable it, you can adjust it to 5 minutes, which essentially solves the problem. Note that going this route will probably involve rebuilding your entire array from scratch, because applying WDIDLE3 to a disk is likely to very slightly affect disk geometry, but just enough for hardware RAID or ZFS or whatever to bark at you and refuse to continue using the drive in an existing pool (the affected disk can become very slightly smaller in capacity). Back up your data, apply WDIDLE3 to all disks, recreate the pool, restore backups. This will also void your warranty if used on the new WD drives, although it will still work just fine. 2) A less clean solution would be to set up a script that polls the SMART data of all disks affected by the problem every 8-9 seconds and have this script launch on boot. This will keep the affected drives just busy enough to not park their heads. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
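A rough sketch of option 2, assuming smartmontools is installed and that the affected drives are ad4 and ad6 (hypothetical names); polling SMART every 8 seconds keeps the heads from parking, at the cost of never letting the drives idle:

#!/bin/sh
# keep WD Green heads from parking by touching SMART data regularly;
# start this from /etc/rc.local or a small rc.d script
while true; do
    for disk in /dev/ad4 /dev/ad6; do
        smartctl -A ${disk} > /dev/null 2>&1
    done
    sleep 8
done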
Re: one more load-cycle-count problem
2010/2/8 Gerrit Kühn : > On Mon, 8 Feb 2010 15:43:46 +0200 Dan Naumov wrote > about RE: one more load-cycle-count problem: > > DN> >Any further ideas how to get rid of this "feature"? > > DN> 1) The most "clean" solution is probably using the WDIDLE3 utility on > DN> your drives to disable automatic parking or in cases where its not > DN> possible to complete disable it, you can adjust it to 5 minutes, which > DN> essentially solves the problem. Note that going this route will > DN> probably involve rebuilding your entire array from scratch, because > DN> applying WDIDLE3 to the disk is likely to very slightly affect disk > DN> geometry, but just enough for hardware raid or ZFS or whatever to bark > DN> at you and refuse to continue using the drive in an existing pool (the > DN> affected disk can become very slightly smaller in capacity). Backup > DN> data, apply WDIDLE3 to all disks. Recreate the pool, restore backups. > DN> This will also void your warranty if used on the new WD drives, > DN> although it will still work just fine. > > Thanks for the warning. How on earth can a tool to set the idle time > affect the disk geometry?! WDIDLE3 changes the drive firmware. This is also how WD can detect you've used it on your disk and void your warranty accordingly :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
booting off a ZFS pool consisting of multiple striped mirror vdevs
Hello, I have successfully tested and used a "full ZFS install" of FreeBSD 8.0 on both single disk and mirror disk configurations using both MBR and GPT partitioning. AFAIK, with the more recent -CURRENT and -STABLE it is also possible to boot off a root filesystem located on raidz/raidz2 pools. But what about booting off pools consisting of multiple striped mirror or raidz vdevs? Like this: Assume each disk looks like half of a traditional ZFS mirror root configuration using GPT: 1: freebsd-boot 2: freebsd-swap 3: freebsd-zfs |disk1+disk2| + |disk3+disk4| + |disk5+disk6| My logic tells me that while booting off any of the 6 disks, the boot0 and boot1 stages should obviously work fine, but what about the boot2 stage? Can it properly handle booting off a root filesystem that's striped across 3 mirror vdevs or is booting off a single mirror vdev the best that one can do right now? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
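For reference, this is roughly what the layout described above looks like in commands; a sketch only, with hypothetical device names (ad4, ad6, ... ad14), and on older gpart the sizes may need to be given in sectors rather than with unit suffixes:

# per-disk preparation, repeated for each of the 6 disks:
gpart create -s gpt ad4
gpart add -t freebsd-boot -s 128k ad4
gpart add -t freebsd-swap -s 2g ad4
gpart add -t freebsd-zfs ad4
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad4

# the pool itself, striped across three mirror vdevs:
zpool create tank mirror ad4p3 ad6p3 mirror ad8p3 ad10p3 mirror ad12p3 ad14p3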
managing ZFS automatic mounts - FreeBSD deviates from Solaris?
Hello. From the SUN ZFS Administration Guide: http://docs.sun.com/app/docs/doc/819-5461/gaztn?a=view "If ZFS is currently managing the file system but it is currently unmounted, and the mountpoint property is changed, the file system remains unmounted." This does not seem to be the case in FreeBSD (8.0-RELEASE): = zfs get mounted tank/home NAME PROPERTY VALUE SOURCE tank/home mounted no - zfs set mountpoint=/mnt/home tank/home zfs get mounted tank/home NAME PROPERTY VALUE SOURCE tank/home mounted no - = This might not look like a serious issue at first, until you try doing an installation of FreeBSD from FIXIT, trying to set up multiple filesystems and their mountpoints at the very end of the installation process. For example, if you set the mountpoint of your poolname/rootfs/usr to /usr as one of the finishing touches to the system installation, it will immediately mount the filesystem, instantly breaking your FIXIT environment and you cannot proceed any further. Is this a known issue and/or should I submit a PR? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: managing ZFS automatic mounts - FreeBSD deviates from Solaris?
On Sun, Feb 14, 2010 at 2:24 AM, Dan Naumov wrote: > Hello > > From the SUN ZFS Administration Guide: > http://docs.sun.com/app/docs/doc/819-5461/gaztn?a=view > > "If ZFS is currently managing the file system but it is currently > unmounted, and the mountpoint property is changed, the file system > remains unmounted." > > This does not seem to be the case in FreeBSD (8.0-RELEASE): > > = > zfs get mounted tank/home > NAME PROPERTY VALUE SOURCE > tank/home mounted no - > > zfs set mountpoint=/mnt/home tank/home > > zfs get mounted tank/home > NAME PROPERTY VALUE SOURCE > tank/home mounted no - > = > > This might not look like a serious issue at first, until you try doing > an installation of FreeBSD from FIXIT, trying to setup multiple > filesystems and their mountpoints at the very end of the installation > process. For example if you set the mountpoint of your > poolname/rootfs/usr to /usr as one of the finishing touches to the > system installation, it will immideately mount the filesystem, > instantly breaking your FIXIT environment and you cannot proceed any > further. Is this a known issue and/or should I submit a PR? Oops, I managed to screw up my previous email. My point was to show that "mounted" changes to YES after changing the mountpoint property :) - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
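One way to sidestep the FIXIT breakage described above is to keep the pool imported under an alternate root while doing the install, so that changing mountpoints mounts filesystems under that prefix instead of over the live FIXIT environment; a sketch only, assuming a pool named tank:

# import (or create) the pool with an altroot
zpool import -R /mnt tank

# mountpoint changes now take effect under /mnt, not /
zfs set mountpoint=/usr tank/rootfs/usr    # ends up mounted at /mnt/usr

# export before rebooting into the installed system
zpool export tank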
RE: hardware for home use large storage
> On Sun, 14 Feb 2010, Dan Langille wrote: >> After creating three different system configurations (Athena, >> Supermicro, and HP), my configuration of choice is this Supermicro >> setup: >> >> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >> 2. SuperMicro 5046A $750 (+$43 shipping) >> 3. LSI SAS 3081E-R $235 >> 4. SATA cables $60 >> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >> 6. Xeon W3520 $310 You do realise how much of a massive overkill this is and how much you are overspending? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille wrote: > Dan Naumov wrote: >>> >>> On Sun, 14 Feb 2010, Dan Langille wrote: >>>> >>>> After creating three different system configurations (Athena, >>>> Supermicro, and HP), my configuration of choice is this Supermicro >>>> setup: >>>> >>>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >>>> 2. SuperMicro 5046A $750 (+$43 shipping) >>>> 3. LSI SAS 3081E-R $235 >>>> 4. SATA cables $60 >>>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >>>> 6. Xeon W3520 $310 >> >> You do realise how much of a massive overkill this is and how much you >> are overspending? > > > I appreciate the comments and feedback. I'd also appreciate alternative > suggestions in addition to what you have contributed so far. Spec out the > box you would build. == Case: Fractal Design Define R2 - 89 euro: http://www.fractal-design.com/?view=product&prod=32 Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro: http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H PSU: Corsair 400CX 80+ - 59 euro: http://www.corsair.com/products/cx/default.aspx RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro == Total: ~435 euro The motherboard has 6 native AHCI-capable ports on ICH9R controller and you have a PCI-E slot free if you want to add an additional controller card. Feel free to blow the money you've saved on crazy fast SATA disks and if your system workload is going to have a lot of random reads, then spend 200 euro on a 80gb Intel X25-M for use as a dedicated L2ARC device for your pool. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
On Mon, Feb 15, 2010 at 12:42 AM, Dan Naumov wrote: > On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille wrote: >> Dan Naumov wrote: >>>> >>>> On Sun, 14 Feb 2010, Dan Langille wrote: >>>>> >>>>> After creating three different system configurations (Athena, >>>>> Supermicro, and HP), my configuration of choice is this Supermicro >>>>> setup: >>>>> >>>>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >>>>> 2. SuperMicro 5046A $750 (+$43 shipping) >>>>> 3. LSI SAS 3081E-R $235 >>>>> 4. SATA cables $60 >>>>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >>>>> 6. Xeon W3520 $310 >>> >>> You do realise how much of a massive overkill this is and how much you >>> are overspending? >> >> >> I appreciate the comments and feedback. I'd also appreciate alternative >> suggestions in addition to what you have contributed so far. Spec out the >> box you would build. > > == > Case: Fractal Design Define R2 - 89 euro: > http://www.fractal-design.com/?view=product&prod=32 > > Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro: > http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H > > PSU: Corsair 400CX 80+ - 59 euro: > http://www.corsair.com/products/cx/default.aspx > > RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro > == > Total: ~435 euro > > The motherboard has 6 native AHCI-capable ports on ICH9R controller > and you have a PCI-E slot free if you want to add an additional > controller card. Feel free to blow the money you've saved on crazy > fast SATA disks and if your system workload is going to have a lot of > random reads, then spend 200 euro on a 80gb Intel X25-M for use as a > dedicated L2ARC device for your pool. And to expand a bit, if you want that crazy performance without blowing silly amounts of money: Get a dock for holding 2 x 2,5" disks in a single 5,25" slot and put it at the top, in the only 5,25" bay of the case. Now add an additional PCI-E SATA controller card, like the often mentioned PCIE SIL3124. Now you have 2 x 2,5" disk slots and 8 x 3,5" disk slots, with 6 native SATA ports on the motherboard and more ports on the controller card. Now get 2 x 80gb Intel SSDs and put them into the dock. Now partition each of them in the following fashion: 1: swap: 4-5gb 2: freebsd-zfs: ~10-15gb for root filesystem 3: freebsd-zfs: rest of the disk: dedicated L2ARC vdev GMirror your SSD swap partitions. Make a ZFS mirror pool out of your SSD root filesystem partitions Build your big ZFS pool however you like out of the mechanical disks you have. Add the 2 x ~60gb partitions as dedicated independant L2ARC devices for your SATA disk ZFS pool. Now you have redundant swap, redundant and FAST root filesystem and your ZFS pool of SATA disks has 120gb worth of L2ARC space on the SSDs. The L2ARC vdevs dont need to be redundant, because should an IO error occur while reading off L2ARC, the IO is deferred to the "real" data location on the pool of your SATA disks. You can also remove your L2ARC vdevs from your pool at will, on a live pool. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
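For the record, the SSD layout just described translates into roughly the following commands; this is a sketch with hypothetical device names (ad4/ad6 for the SSDs, ad8-ad14 for the mechanical disks), and the boot partitions and bootcode are left out for brevity:

# partition each SSD the same way (shown for ad4, repeat for ad6):
gpart create -s gpt ad4
gpart add -t freebsd-swap -s 4g ad4
gpart add -t freebsd-zfs -s 15g ad4
gpart add -t freebsd-zfs ad4

# mirrored swap via gmirror (load geom_mirror at boot via loader.conf):
gmirror label -b round-robin swap ad4p1 ad6p1
swapon /dev/mirror/swap

# mirrored root pool on the second partitions:
zpool create zroot mirror ad4p2 ad6p2

# big pool on the mechanical disks, with both leftover SSD partitions
# added as independent L2ARC (cache) devices:
zpool create tank mirror ad8 ad10 mirror ad12 ad14
zpool add tank cache ad4p3 ad6p3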
Re: hardware for home use large storage
>> PSU: Corsair 400CX 80+ - 59 euro - > >> http://www.corsair.com/products/cx/default.aspx > > http://www.newegg.com/Product/Product.aspx?Item=N82E16817139008 for $50 > > Is that sufficient power up to 10 SATA HDD and an optical drive? Disk power use varies from about 8 watts/disk for "green" disks to 20 watts/disk for really power-hungry ones. So yes. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: hardware for home use large storage
> I had a feeling someone would bring up L2ARC/cache devices. This gives > me the opportunity to ask something that's been on my mind for quite > some time now: > > Aside from the capacity different (e.g. 40GB vs. 1GB), is there a > benefit to using a dedicated RAM disk (e.g. md(4)) to a pool for > L2ARC/cache? The ZFS documentation explicitly states that cache > device content is considered volatile. Using a ramdisk as an L2ARC vdev doesn't make any sense at all. If you have RAM to spare, it should be used by regular ARC. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
On Mon, Feb 15, 2010 at 7:14 PM, Dan Langille wrote: > Dan Naumov wrote: >> >> On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille wrote: >>> >>> Dan Naumov wrote: >>>>> >>>>> On Sun, 14 Feb 2010, Dan Langille wrote: >>>>>> >>>>>> After creating three different system configurations (Athena, >>>>>> Supermicro, and HP), my configuration of choice is this Supermicro >>>>>> setup: >>>>>> >>>>>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >>>>>> 2. SuperMicro 5046A $750 (+$43 shipping) >>>>>> 3. LSI SAS 3081E-R $235 >>>>>> 4. SATA cables $60 >>>>>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >>>>>> 6. Xeon W3520 $310 >>>> >>>> You do realise how much of a massive overkill this is and how much you >>>> are overspending? >>> >>> I appreciate the comments and feedback. I'd also appreciate alternative >>> suggestions in addition to what you have contributed so far. Spec out >>> the >>> box you would build. >> >> == >> Case: Fractal Design Define R2 - 89 euro: >> http://www.fractal-design.com/?view=product&prod=32 >> >> Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro: >> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H >> >> PSU: Corsair 400CX 80+ - 59 euro: >> http://www.corsair.com/products/cx/default.aspx >> >> RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro >> == >> Total: ~435 euro >> >> The motherboard has 6 native AHCI-capable ports on ICH9R controller >> and you have a PCI-E slot free if you want to add an additional >> controller card. Feel free to blow the money you've saved on crazy >> fast SATA disks and if your system workload is going to have a lot of >> random reads, then spend 200 euro on a 80gb Intel X25-M for use as a >> dedicated L2ARC device for your pool. > > Based on the Fractal Design case mentioned above, I was told about Lian Lia > cases, which I think are great. As a result, I've gone with a tower case > without hot-swap. The parts are listed at and reproduced below: > > http://dan.langille.org/2010/02/15/a-full-tower-case/ > > 1. LIAN LI PC-A71F Black Aluminum ATX Full Tower Computer Case $240 (from > mwave) > 2. Antec EarthWatts EA650 650W PSU $80 > 3. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) > 4. Intel S3200SHV LGA 775 Intel 3200 m/b $200 > 5. Intel Core2 Quad Q9400 CPU $190 > 6. SATA cables $22 > 7. Supermicro LSI MegaRAID 8 Port SAS RAID Controller $118 > 8. Kingston ValueRAM 4GB (2 x 2GB) 240-Pin DDR2 SDRAM ECC $97 > > Total cost is about $1020 with shipping. Plus HDD. > > No purchases yet, but the above is what appeals to me now. A C2Q CPU makes little sense right now from a performance POV. For the price of that C2Q CPU + LGA775 board you can get an i5 750 CPU and a 1156 socket motherboard that will run circles around that C2Q. You would lose the ECC though, since that requires the more expensive 1366 socket CPUs and boards. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
>> A C2Q CPU makes little sense right now from a performance POV. For the >> price of that C2Q CPU + LGA775 board you can get an i5 750 CPU and a 1156 >> socket motherboard that will run circles around that C2Q. You would lose >> the ECC though, since that requires the more expensive 1366 socket CPUs >> and boards. >> >> - Sincerely, >> Dan Naumov > > Hi, > > Do have test about this? I'm not really impressed with the i5 series. > > Regards, > Andras There: http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3634&p=10 The i5 750, which is a 180 euro CPU, beats Q9650 C2Q, which is a 300 euro CPU. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: booting off a ZFS pool consisting of multiple striped mirror vdevs
> I don't know, but I plan to test that scenario in a few days. > > Matt Please share the results when you're done, I am really curious :) > It *should* work... I made changes a while back that allow for multiple > vdevs to attach to the root. In this case you should have 3 mirror > vdevs attached to the root, so as long as the BIOS can enumerate all of > the drives, we should find all of the vdevs and build the tree > correctly. It should be simple enough to test in qemu, except that the > BIOS in qemu is a little broken and might not id all of the drives. > > robert. If booting of a stripe of 3 mirrors should work assuming no BIOS bugs, can you explain why is booting off simple stripes (of any number of disks) currently unsupported? I haven't tested that myself, but everywhere I look seems to indicate that booting off a simple stripe doesn't work or is that "everywhere" also out of date after your changes? :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: booting off a ZFS pool consisting of multiple striped mirror vdevs
On Fri, Feb 19, 2010 at 1:03 AM, Matt Reimer wrote: > On Thu, Feb 18, 2010 at 10:57 AM, Matt Reimer wrote: >> >> On Tue, Feb 16, 2010 at 12:38 AM, Dan Naumov wrote: >>> >>> > I don't know, but I plan to test that scenario in a few days. >>> > >>> > Matt >>> >>> Please share the results when you're done, I am really curious :) >> >> Booting from a stripe of two raidz vdevs works: >> FreeBSD/i386 boot >> Default: doom:/boot/zfsloader >> boot: status >> pool: doom >> config: >> NAME STATE >> doom ONLINE >> raidz1 ONLINE >> label/doom-0 ONLINE >> label/doom-1 ONLINE >> label/doom-2 ONLINE >> raidz1 ONLINE >> label/doom-3 ONLINE >> label/doom-4 ONLINE >> label/doom-5 ONLINE >> I'd guess a stripe of mirrors would work fine too. If I get a chance I'll >> test that combo. > > A stripe of three-way mirrors works: > FreeBSD/i386 boot > Default: mithril:/boot/zfsloader > boot: status > pool: mithril > config: > NAME STATE > mithril ONLINE > mirror ONLINE > label/mithril-0 ONLINE > label/mithril-1 ONLINE > label/mithril-2 ONLINE > mirror ONLINE > label/mithril-3 ONLINE > label/mithril-4 ONLINE > label/mithril-5 ONLINE > Matt A stripe of 3-way mirrors, whoa. Out of curiosity, what is the system used for? I am not doubting that there exist some uses/workloads for a system that uses 6 disks with 2 disks worth of usable space, but that's a bit of an unusual configuration. What are your system/disc specs and what kind of performance are you seeing from the pool? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance (fixed)
Hello folks. A few weeks ago, there was a discussion started by me regarding abysmal read/write performance using a ZFS mirror on 8.0-RELEASE. I was using an Atom 330 system with 2GB RAM and it was pointed out to me that my problem was most likely having both disks attached to a PCI SIL3124 controller; switching to the new AHCI drivers didn't help one bit. To reiterate, here are the Bonnie and DD numbers I got on that system: === Atom 330 / 2gb ram / Intel board + PCI SIL3124 ---Sequential Output ---Sequential Input-- --Random-- -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU 8192 21041 53.5 22644 19.4 13724 12.8 25321 48.5 43110 14.0 143.2 3.3 dd if=/dev/zero of=/root/test1 bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes transferred in 143.878615 secs (29851325 bytes/sec) (28,4 mb/s) === Since then, I switched the exact same disks to a different system: Atom D510 / 4gb ram / Supermicro X7SPA-H / ICH9R controller (native). Here are the updated results: ---Sequential Output ---Sequential Input-- --Random-- -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU 8192 30057 68.7 50965 36.4 27236 21.3 33317 58.0 53051 14.3 172.4 3.2 dd if=/dev/zero of=/root/test1 bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes transferred in 54.977978 secs (78121594 bytes/sec) (74,5 mb/s) === Write performance now seems to have increased by a factor of 2 to 3 and is now definitely in line with the expected performance of the disks in question (cheap 2TB WD20EADS with 32mb cache). Thanks to everyone who has offered help and tips! - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
8.0 on new hardware and a few errors, should I be worried?
Hello I've very recently finished installing 8.0-RELEASE on some new hardware and I noticed a few error messages that make me a bit uneasy. This is a snip from my dmesg: -- acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of fee0, 1000 (3) failed acpi0: reservation of 0, a (3) failed acpi0: reservation of 10, bf60 (3) failed -- What do these mean and should I worry about it? The full DMESG can be viewed here: http://jago.pp.fi/temp/dmesg.txt Additionally, while building a whole bunch of ports on this new system (about 30 or so, samba, ncftp, portaudit, bash, the usual suspects), I noticed the following in my logs during the build process: -- Feb 27 21:24:01 atombsd kernel: pid 38846 (try), uid 0: exited on signal 10 (core dumped) Feb 27 22:17:49 atombsd kernel: pid 89665 (conftest), uid 0: exited on signal 6 (core dumped) -- All ports seem to have built and installed succesfully. Again, what do these mean and should I worry about it? :) Thanks! - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
powerd on 8.0, is it considered safe?
Hello Is powerd finally considered stable and safe to use on 8.0? At least on 7.2, it consistently caused panics when used on Atom systems with Hyper-Threading enabled, but I recall that Attilio Rao was looking into it. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: powerd on 8.0, is it considered safe?
Okay, now I am baffled. Up until this point, I wasn't using powerd on this new Atom D510 system. I ran sysctl and noticed that dev.cpu.0.freq: is actually 1249 and doesn't change no matter what kind of load the system is under. If I boot to BIOS, under BIOS CPU is shown as 1,66 Ghz. Okayy... I guess this explains why my buildworld and buildkernel took over 5 hours if by default, it gets stuck at 1249 Mhz for no obvious reason. I enabled powerd and now according to dev.cpu.0.freq:, the system is permanently stuck at 1666 Mhz, regardless of whether the system is under load or not. atombsd# uname -a FreeBSD atombsd.localdomain 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan 5 21:11:58 UTC 2010 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 atombsd# kenv | grep smbios.planar.product smbios.planar.product="X7SPA-H" atombsd# sysctl dev.cpu dev.est dev.cpufreq dev.p4tcc debug.cpufreq kern.timecounter dev.cpu.0.%desc: ACPI CPU dev.cpu.0.%driver: cpu dev.cpu.0.%location: handle=\_PR_.P001 dev.cpu.0.%pnpinfo: _HID=none _UID=0 dev.cpu.0.%parent: acpi0 dev.cpu.0.freq: 1666 dev.cpu.0.freq_levels: 1666/-1 1457/-1 1249/-1 1041/-1 833/-1 624/-1 416/-1 208/-1 dev.cpu.0.cx_supported: C1/0 dev.cpu.0.cx_lowest: C1 dev.cpu.0.cx_usage: 100.00% last 500us dev.cpu.1.%desc: ACPI CPU dev.cpu.1.%driver: cpu dev.cpu.1.%location: handle=\_PR_.P002 dev.cpu.1.%pnpinfo: _HID=none _UID=0 dev.cpu.1.%parent: acpi0 dev.cpu.1.cx_supported: C1/0 dev.cpu.1.cx_lowest: C1 dev.cpu.1.cx_usage: 100.00% last 500us dev.cpu.2.%desc: ACPI CPU dev.cpu.2.%driver: cpu dev.cpu.2.%location: handle=\_PR_.P003 dev.cpu.2.%pnpinfo: _HID=none _UID=0 dev.cpu.2.%parent: acpi0 dev.cpu.2.cx_supported: C1/0 dev.cpu.2.cx_lowest: C1 dev.cpu.2.cx_usage: 100.00% last 500us dev.cpu.3.%desc: ACPI CPU dev.cpu.3.%driver: cpu dev.cpu.3.%location: handle=\_PR_.P004 dev.cpu.3.%pnpinfo: _HID=none _UID=0 dev.cpu.3.%parent: acpi0 dev.cpu.3.cx_supported: C1/0 dev.cpu.3.cx_lowest: C1 dev.cpu.3.cx_usage: 100.00% last 500us sysctl: unknown oid 'dev.est' Right. So how do I investigate why does the CPU get stuck at 1249 Mhz after boot by default when not using powerd and why it gets stuck at 1666 Mhz with powerd enabled and doesn't scale back down when IDLE? Out of curiosity, I stopped powerd but the CPU remained at 1666 Mhz. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: powerd on 8.0, is it considered safe?
>Up until this point, I wasn't using powerd on this new Atom D510 >system. I ran sysctl and noticed that dev.cpu.0.freq: is actually 1249 >and doesn't change no matter what kind of load the system is under. If >I boot to BIOS, under BIOS CPU is shown as 1,66 Ghz. Okayy... I guess >this explains why my buildworld and buildkernel took over 5 hours if >by default, it gets stuck at 1249 Mhz for no obvious reason. I enabled >powerd and now according to dev.cpu.0.freq:, the system is permanently >stuck at 1666 Mhz, regardless of whether the system is under load or >not. OK, a reboot somehow fixed the powerd issue: 1) Disabled powerd 2) Rebooted 3) Upon bootup, checked dev.cpu.0.freq - it's stuck at 1249 (should be 1666 by default) 4) Enabled and started powerd - the CPU scales correctly according to load. There is some bug somewhere though, because something puts my CPU at 1249 MHz upon boot with powerd disabled and it gets stuck there; this shouldn't happen. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: powerd on 8.0, is it considered safe?
OK, now I feel a bit stupid. The second half of my PR at http://www.freebsd.org/cgi/query-pr.cgi?pr=144551 (anything related to powerd behaviour) can be ignored. For testing purposes, I started powerd in the foreground and observed its behaviour. It works exactly as advertised: apparently the very act of issuing a "sysctl -a | grep dev.cpu.0.freq" command uses up a high % of CPU time for a fraction of a second, resulting in confusing output, so I was always getting the highest CPU frequency state as the output. Testing powerd in the foreground, however, shows correct behaviour: the CPU is downclocked both before and after issuing that command :) Still doesn't explain why the system boots up at 1249 MHz, but that's not that big of an issue at this point, now that I see powerd is behaving correctly. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
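For anyone wanting to repeat the test, this is roughly how it was done; powerd -v stays in the foreground and prints every frequency transition, and the rc.conf line is just the usual way of enabling it permanently:

# watch transitions live (Ctrl-C to stop):
powerd -v

# enable it normally afterwards:
echo 'powerd_enable="YES"' >> /etc/rc.conf
/etc/rc.d/powerd start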
Samba read speed performance tuning
On a FreeBSD 8.0-RELEASE/amd64 system with a Supermicro X7SPA-H board using an Intel gigabit NIC with the em driver, running on top of a ZFS mirror, I was seeing a strange issue. Local reads and writes to the pool easily saturate the disks with roughly 75mb/s throughput, which is roughly the best these drives can do. However, working with Samba, writes to a share could easily pull off 75mb/s and saturate the disks, but reads off a share were resulting in a rather pathetic 18mb/s throughput. I found a thread on the FreeBSD forums (http://forums.freebsd.org/showthread.php?t=9187) and followed the suggested advice. I rebuilt Samba with AIO support, kldloaded the aio module and made the following changes to my smb.conf. From: socket options=TCP_NODELAY To: socket options=SO_RCVBUF=131072 SO_SNDBUF=131072 TCP_NODELAY min receivefile size=16384 use sendfile=true aio read size = 16384 aio write size = 16384 aio write behind = true dns proxy = no This showed a very welcome improvement in read speed: I went from 18mb/s to 48mb/s. The write speed remained unchanged and was still saturating the disks. Next I tried the suggested sysctl tunables: atombsd# sysctl net.inet.tcp.delayed_ack=0 net.inet.tcp.delayed_ack: 1 -> 0 atombsd# sysctl net.inet.tcp.path_mtu_discovery=0 net.inet.tcp.path_mtu_discovery: 1 -> 0 atombsd# sysctl net.inet.tcp.recvbuf_inc=524288 net.inet.tcp.recvbuf_inc: 16384 -> 524288 atombsd# sysctl net.inet.tcp.recvbuf_max=16777216 net.inet.tcp.recvbuf_max: 262144 -> 16777216 atombsd# sysctl net.inet.tcp.sendbuf_inc=524288 net.inet.tcp.sendbuf_inc: 8192 -> 524288 atombsd# sysctl net.inet.tcp.sendbuf_max=16777216 net.inet.tcp.sendbuf_max: 262144 -> 16777216 atombsd# sysctl net.inet.tcp.sendspace=65536 net.inet.tcp.sendspace: 32768 -> 65536 atombsd# sysctl net.inet.udp.maxdgram=57344 net.inet.udp.maxdgram: 9216 -> 57344 atombsd# sysctl net.inet.udp.recvspace=65536 net.inet.udp.recvspace: 42080 -> 65536 atombsd# sysctl net.local.stream.recvspace=65536 net.local.stream.recvspace: 8192 -> 65536 atombsd# sysctl net.local.stream.sendspace=65536 net.local.stream.sendspace: 8192 -> 65536 This improved the read speed a further tiny bit: now I went from 48mb/s to 54mb/s. That's as far as I've gotten, however; I can't figure out how to increase Samba read speed any further. Any ideas? ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Samba read speed performance tuning
On Fri, Mar 19, 2010 at 11:14 PM, Dan Naumov wrote: > On a FreeBSD 8.0-RELEASE/amd64 system with a Supermicro X7SPA-H board > using an Intel gigabit nic with the em driver, running on top of a ZFS > mirror, I was seeing a strange issue. Local reads and writes to the > pool easily saturate the disks with roughly 75mb/s throughput, which > is roughly the best these drives can do. However, working with Samba, > writes to a share could easily pull off 75mb/s and saturate the disks, > but reads off a share were resulting in rather pathetic 18mb/s > throughput. > > I found a threadon the FreeBSD forums > (http://forums.freebsd.org/showthread.php?t=9187) and followed the > suggested advice. I rebuilt Samba with AIO support, kldloaded the aio > module and made the following changes to my smb.conf > > From: > socket options=TCP_NODELAY > > To: > socket options=SO_RCVBUF=131072 SO_SNDBUF=131072 TCP_NODELAY > min receivefile size=16384 > use sendfile=true > aio read size = 16384 > aio write size = 16384 > aio write behind = true > dns proxy = no[/CODE] > > This showed a very welcome improvement in read speed, I went from > 18mb/s to 48mb/s. The write speed remained unchanged and was still > saturating the disks. Now I tried the suggested sysctl tunables: > > atombsd# sysctl net.inet.tcp.delayed_ack=0 > net.inet.tcp.delayed_ack: 1 -> 0 > > atombsd# sysctl net.inet.tcp.path_mtu_discovery=0 > net.inet.tcp.path_mtu_discovery: 1 -> 0 > > atombsd# sysctl net.inet.tcp.recvbuf_inc=524288 > net.inet.tcp.recvbuf_inc: 16384 -> 524288 > > atombsd# sysctl net.inet.tcp.recvbuf_max=16777216 > net.inet.tcp.recvbuf_max: 262144 -> 16777216 > > atombsd# sysctl net.inet.tcp.sendbuf_inc=524288 > net.inet.tcp.sendbuf_inc: 8192 -> 524288 > > atombsd# sysctl net.inet.tcp.sendbuf_max=16777216 > net.inet.tcp.sendbuf_max: 262144 -> 16777216 > > atombsd# sysctl net.inet.tcp.sendspace=65536 > net.inet.tcp.sendspace: 32768 -> 65536 > > atombsd# sysctl net.inet.udp.maxdgram=57344 > net.inet.udp.maxdgram: 9216 -> 57344 > > atombsd# sysctl net.inet.udp.recvspace=65536 > net.inet.udp.recvspace: 42080 -> 65536 > > atombsd# sysctl net.local.stream.recvspace=65536 > net.local.stream.recvspace: 8192 -> 65536 > > atombsd# sysctl net.local.stream.sendspace=65536 > net.local.stream.sendspace: 8192 -> 65536 > > This improved the read speeds a further tiny bit, now I went from > 48mb/s to 54mb/s. This is it however, I can't figure out how to > increase Samba read speed any further. Any ideas? Oh my god... Why did noone tell me how much of an enormous performance boost vfs.zfs.prefetch_disable=0 (aka actually enabling prefetch) is. My local reads off the mirror pool jumped from 75mb/s to 96mb/s (ie. they are now nearly 25% faster than reading off an individual disk) and reads off a Samba share skyrocketed from 50mb/s to 90mb/s. By default, FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and on any amd64 systems with less than 4GB of avaiable memory. My system is amd64 with 4gb ram, but integrated video eats some of that, so the autotuning disabled the prefetch. I had read up on it and a fair amount of people seemed to have performance issues caused by having prefetch enabled and get better results with it turned off, in my case however, it seems that enabling it gave a really solid boost to performance. - Sincerely Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
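For anyone wanting to force the same behaviour on a system where the auto-tuning disables prefetch, note that this is a loader tunable rather than a runtime sysctl, so it belongs in /boot/loader.conf and takes effect on the next boot:

# /boot/loader.conf
vfs.zfs.prefetch_disable="0"    # 0 = prefetch enabled, 1 = disabled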
Re: Samba read speed performance tuning
On Sat, Mar 20, 2010 at 3:49 AM, Gary Gatten wrote: > It MAY make a big diff, but make sure during your tests you use unique files > or flush the cache or you'll me testing cache speed and not disk speed. Yeah I did make sure to use unique files for testing the effects of prefetch. This is Atom D510 / Supermicro X75SPA-H / 4Gb Ram with 2 x slow 2tb WD Green (WD20EADS) disks with 32mb cache in a ZFS mirror after enabling prefetch.: Code: bonnie -s 8192 ---Sequential Output ---Sequential Input-- --Random-- -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- MachineMB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU 8192 29065 68.9 52027 39.8 39636 33.3 54057 95.4 105335 34.6 174.1 7.9 DD read: dd if=/dev/urandom of=test2 bs=1M count=8192 dd if=test2 of=/dev/zero bs=1M 8589934592 bytes transferred in 76.031399 secs (112978779 bytes/sec) (107,74mb/s) Individual disks read capability: 75mb/s Reading off a mirror of 2 disks with prefetch disabled: 60mb/s Reading off a mirror of 2 disks with prefetch enabled: 107mb/s - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: Can't boot after make installworld
The ZFS bootloader has been changed in 8-STABLE compared to 8.0-RELEASE. Reinstall your boot blocks. P.S: "LOADER_ZFS_SUPPORT=YES" is also deprecated in 8-STABLE, not to mention that you have it in the wrong place, for 8.0, it goes into make.conf, not src.conf. Is there any particular reason you are upgrading from a production release to a development branch of the OS? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
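For a GPT-partitioned ZFS root like the setups discussed earlier in this thread, reinstalling the boot blocks after the upgrade would look roughly like this (assuming the freebsd-boot partition is index 1 on ad4; MBR-based installs use a different procedure):

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad4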
Re: Can't boot after make installworld
> I've read that FreeBSD kernel supports 3D acceleration in ATI R7xx > chipset and as I own motherboard with HD3300 built-in I thought that I > would give it a try. I upgraded to see if there is any progress with > ¿zfs? I don't really know if it's zfs related, but at certain load, my > system crashes, and reboots. It happens only when using bonnie++ to > benchmark I/O. And I'm a little bit to lazy to prepare my system for > coredumps - I don't have swap slice for crashdumps, because I wanted > to simplify adding drives to my raidz1 configuration. Could anyone > tell me what's needed, besides having swap to produce good crashdump? As of right now, even if you don't care about capability to take crash dumps, it is highly recommended to still use traditional swap partitions even if your system is otherwise fully ZFS. There are know stability problems involving using a ZVOL as a swap device. These issues are being worked on, but this is still the situation as of now. > At first I didn't knew that I am upgrading to bleeding edge/developer > branch of FreeBSD. I'll come straight out with it, 8.0-STABLE sounds > more stable than 8.0-RELEASE-p2, which I was running before upgrade ;) > I'm a little confused with FreeBSD release cycle at first I compared > it with Debian release cycle, because I'm most familiar to it, and I > used it a lot before using FreeBSD. Debian development is more > one-dimensional - unstable/testing/stable/oldstable whereas FreeBSD > has two stable branches - 8.0 and 7.2 which are actively developed. > But still I am confused with FreeBSD naming and it's relation with > tags which are used in standard-supfile. We have something like this: > 9.0-CURRENT -> tag=. > 8.0-STABLE -> tag=RELENG_8 > 8.0-RELEASE-p2 -> tag=RELENG_8_0 ? (btw what does p2 mean?) > If someone patient could explain it to me I'd be grateful. 9-CURRENT: the real crazyland 8-STABLE: a dev branch, from which 8.0 was tagged and eventually 8.1 will be RELENG_8_0: 8.0-RELEASE + latest critical security and reliability updates (8.0 is up to patchset #2, hence -p2) Same line of thinking applies to 7-STABLE, 7.3-RELEASE and so on. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Can't boot after make installworld
On Mon, Mar 22, 2010 at 10:41 PM, Krzysztof Dajka wrote: > I've read that FreeBSD kernel supports 3D acceleration in ATI R7xx > chipset and as I own motherboard with HD3300 built-in I thought that I > would give it a try. I upgraded to see if there is any progress with > ¿zfs? I don't really know if it's zfs related, but at certain load, my > system crashes, and reboots. It happens only when using bonnie++ to > benchmark I/O. If you can consistently panic your 8.0 system with just bonnie++ alone, something is really really wrong. Are you using an amd64 system with 2gb ram or more or is this i386 + 1-2gb ram? Amd64 systems with 2gb ram or more don't really usually require any tuning whatsoever (except for tweaking performance for a specific workload), but if this is i386, tuning will be generally required to archieve stability. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
ZFS on top of GELI / Intel Atom 330 system
Is there anyone here using ZFS on top of a GELI-encrypted provider on hardware which could be considered "slow" by today's standards? What are the performance implications of doing this? The reason I am asking is that I am in the process of building a small home NAS/webserver, starting with a single disk (intending to expand as the need arises) on the following hardware: http://www.tranquilpc-shop.co.uk/acatalog/BAREBONE_SERVERS.html This is essentially an Intel Atom 330 1.6 GHz dual-core on an Intel D945GCLF2-based board with 2GB RAM; the first disk I am going to use is a 1.5TB Western Digital Caviar Green. I had someone run a few openssl crypto benchmarks (to unscientifically assess the maximum possible GELI performance) on a machine running FreeBSD on nearly the same hardware and it seems the CPU would become the bottleneck at roughly 200 MB/s throughput when using 128 bit Blowfish, 70 MB/s when using AES128 and 55 MB/s when using AES256. This, on its own, is definitely enough for my needs (especially in the case of using Blowfish), but what are the performance implications of using ZFS on top of a GELI-encrypted provider? Also, feel free to criticize my planned filesystem layout for the first disk of this system; the idea behind /mnt/sysbackup is to take a snapshot of the FreeBSD installation and its settings before doing potentially hazardous things like upgrading to a new -RELEASE: ad1s1 (freebsd system slice): ad1s1a => 128bit Blowfish ad1s1a.eli, 4GB, swap; ad1s1b, 128GB, ufs2+s, /; ad1s1c, 128GB, ufs2+s, noauto, /mnt/sysbackup; ad1s2 => 128bit Blowfish ad1s2.eli, zpool, /home and /mnt/data1. Thanks for your input. - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
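For what it's worth, the encrypted part of that layout would be set up roughly as follows; a sketch only, with the Blowfish key length taken from the plan above and the pool/filesystem names invented for the example:

# swap slice: a one-time key is generated at every boot when the .eli
# device is listed in /etc/fstab, e.g.:
# /dev/ad1s1a.eli  none  swap  sw  0  0

# data slice: interactive passphrase, then ZFS on top of the .eli provider
geli init -e blowfish -l 128 /dev/ad1s2
geli attach /dev/ad1s2
zpool create tank /dev/ad1s2.eli
zfs create -o mountpoint=/home tank/home
zfs create -o mountpoint=/mnt/data1 tank/data1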
Re: ZFS on top of GELI / Intel Atom 330 system
Ouch, that does indeed sounds quite slow, especially considering that a dual core Athlon 6400 is pretty fast CPU. Have you done any comparison benchmarks between UFS2 with Softupdates and ZFS on the same system? What are the read/write numbers like? Have you done any investigating regarding possible causes of ZFS working so slow on your system? Just wondering if its an ATA chipset problem, a drive problem, a ZFS problem or what... - Dan Naumov On Fri, May 29, 2009 at 12:10 PM, Pete French wrote: >> Is there anyone here using ZFS on top of a GELI-encrypted provider on >> hardware which could be considered "slow" by today's standards? What > > I run a mirrored zpool on top of a pair of 1TB SATA drives - they are > only 7200 rpm so pretty dog slow as far as I'm concerned. The > CPOU is a dual core Athlon 6400, and I am running amd64. The performance > is not brilliant - about 25 meg/second writing a file, and about > 53 meg/second reading it. > > It's a bit dissapointing really - thats a lot slower that I expected > when I built it, especially the write speed. > > -pete. > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
Thank you for your numbers, now I know what to expect when I get my new machine, since our system specs look identical. So basically on this system: unencrypted ZFS read: ~70 MB/s per disk 128bit Blowfish GELI/ZFS write: 35 MB/s per disk 128bit Blowfish GELI/ZFS read: 24 MB/s per disk I am curious what part of GELI is so inefficient to cause such a dramatic slowdown. In comparison, my home desktop is a C2D E6600 2,4 Ghz, 4GB RAM, Intel DP35DP, 1 x 1,5TB Seagate Barracuda - Windows Vista x64 SP1 Read/Write on an unencrypted NTFS partition: ~85 MB/s Read/Write on a Truecrypt AES-encrypted NTFS partition: ~65 MB/s As you can see, the performance drop is noticeable, but not anywhere nearly as dramatic. - Dan Naumov > I have a zpool mirror on top of two 128bit GELI blowfish devices with > Sectorsize 4096, my system is a D945GCLF2 with 2GB RAM and a Intel Arom > 330 1.6 Ghz dualcore. The two disks are a WDC WD10EADS and a WD10EACS > (5400rpm). The system is running 8.0-CURRENT amd64. I have set > kern.geom.eli.threads=3. > > This is far from a real benchmarks but: > > Using dd with bs=4m I get 35 MByte/s writing to the mirror (writing 35 > MByte/s to each disk) and 48 MByte/s reading from the mirror (reading > with 24 MByte/s from each disk). > > My experience is that ZFS is not much of an overhead and will not > degrade the performance as much as the encryption, so GELI is the > limiting factor. Using ZFS without GELI on this system gives way higher > read and write numbers, like reading with 70 MByte/s per disk etc. > > greetings, > philipp ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
Now that I have evaluated the numbers and my needs a bit, I am really confused about what appropriate course of action for me would be. 1) Use ZFS without GELI and hope that zfs-crypto get implemented in Solaris and ported to FreeBSD "soon" and that when it does, it won't come with such a dramatic performance decrease as GELI/ZFS seems to result in. 2) Go ahead with the original plan of using GELI/ZFS and grind my teeth at the 24 MB/s read speed off a single disk. >> So basically on this system: >> >> unencrypted ZFS read: ~70 MB/s per disk >> >> 128bit Blowfish GELI/ZFS write: 35 MB/s per disk >> 128bit Blowfish GELI/ZFS read: 24 MB/s per disk > I'm in the same spot as you, planning to build a home NAS. I have > settled for graid5/geli but haven't yet decided if I would benefit most > from a dual core CPU at 3+ GHz or a quad core at 2.6. Budget is a concern... Our difference is that my hardware is already ordered and Intel Atom 330 + D945GCLF2 + 2GB ram is what it's going to have :) - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
Pardon my ignorance, but what do these numbers mean and what information is deductible from them? - Dan Naumov > I don't mean to take this off-topic wrt -stable but just > for fun, I built a -current kernel with dtrace and did: > > geli onetime gzero > ./hotkernel & > dd if=/dev/zero of=/dev/gzero.eli bs=1m count=1024 > killall dtrace > geli detach gzero > > The hot spots: > [snip stuff under 0.3%] > kernel`g_eli_crypto_run 50 0.3% > kernel`_mtx_assert 56 0.3% > kernel`SHA256_Final 58 0.3% > kernel`rijndael_encrypt 72 0.4% > kernel`_mtx_unlock_flags 74 0.4% > kernel`rijndael128_encrypt 74 0.4% > kernel`copyout 92 0.5% > kernel`_mtx_lock_flags 93 0.5% > kernel`bzero 114 0.6% > kernel`spinlock_exit 240 1.3% > kernel`bcopy 325 1.7% > kernel`sched_idletd 810 4.3% > kernel`swcr_process 1126 6.0% > kernel`SHA256_Transform 1178 6.3% > kernel`rijndaelEncrypt 5574 29.7% > kernel`acpi_cpu_c1 8383 44.6% > > I had to build crypto and geom_eli into the kernel to get proper > symbols. > > References: > http://wiki.freebsd.org/DTrace > http://www.brendangregg.com/DTrace/hotkernel > > --Emil ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
ZFS NAS configuration question
Hey, I am not entirely sure if this question belongs here or on another list, so feel free to direct me elsewhere :) Anyways, I am trying to figure out the best way to configure a NAS system I will soon get my hands on: it's a Tranquil BBS2 ( http://www.tranquilpc-shop.co.uk/acatalog/BAREBONE_SERVERS.html ), which has 5 SATA ports. Due to budget constraints, I have to start small, either with a single 1,5 TB drive or at most a small 500 GB system drive + a 1,5 TB drive to get started with ZFS. What I am looking for is a configuration that would offer the maximum possible storage, while having at least _some_ redundancy and the possibility to grow the storage pool without having to reload the entire setup. Using ZFS root right now seems to involve a fair bit of trickery (you need to make an .ISO snapshot of -STABLE, burn it, boot from it, install from within a fixit environment, boot into your ZFS root and then make and install world again to fix the permissions). To top that off, even when/if you do it right, your entire disk doesn't go to ZFS anyway, because you still need swap and a /boot that are non-ZFS, so you have to install ZFS onto a slice and not the entire disk, and even SUN discourages doing that. Additionally, there seems to be at least one reported case of a system failing to boot after having done installworld on a ZFS root: the installworld process removes the old libc, tries to install a new one and, due to failing to apply some flags which ZFS doesn't support, leaves it uninstalled, leaving the system in an unusable state. This can be worked around, but gotchas like this and the amount of work involved in getting the whole thing running make me really lean towards having a smaller traditional UFS2 system disk for FreeBSD itself. So, this leaves me with 1 SATA port used for a FreeBSD disk and 4 SATA ports available for tinkering with ZFS. What would make the most sense if I am starting with 1 disk for ZFS and eventually plan on having 4 and want to maximise storage, yet have SOME redundancy in case of a disk failure? Am I stuck with 2 x 2 disk mirrors or is there some 3+1 configuration possible? Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS NAS configuration question
Is the idea behind leaving 1GB unused on each disk to work around the problem of potentially being unable to replace a failed device in a ZFS pool because the 1TB replacement you bought actually has a lower sector count than your previous 1TB drive (since the replacement device has to be either exactly the same size as or bigger than the old device)? - Dan Naumov On Sat, May 30, 2009 at 10:06 PM, Louis Mamakos wrote: > I built a system recently with 5 drives and ZFS. I'm not booting off a ZFS > root, though it does mount a ZFS file system once the system has booted from > a UFS file system. Rather than dedicate drives, I simply partitioned each > of the drives into a 1G partition ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
I am pretty sure that adding more disks wouldn't solve anything in this case; only a faster CPU or a faster crypto system would. When you are capable of 70 MB/s reads on a single unencrypted disk, but only 24 MB/s reads off the same disk while encrypted, your disk speed isn't the problem. - Dan Naumov On Sun, May 31, 2009 at 5:29 PM, Ronald Klop wrote: > On Fri, 29 May 2009 13:34:57 +0200, Dan Naumov wrote: > >> Now that I have evaluated the numbers and my needs a bit, I am really >> confused about what appropriate course of action for me would be. >> >> 1) Use ZFS without GELI and hope that zfs-crypto get implemented in >> Solaris and ported to FreeBSD "soon" and that when it does, it won't >> come with such a dramatic performance decrease as GELI/ZFS seems to >> result in. >> 2) Go ahead with the original plan of using GELI/ZFS and grind my >> teeth at the 24 MB/s read speed off a single disk. > > 3) Add extra disks. It will speed up reading. One disk extra will about > double the read speed. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
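A quick way to confirm that the ceiling is the CPU rather than the disk is to repeat Emil's earlier gzero test: gzero discards every write, so pushing data through a one-time GELI layer on top of it measures raw encryption throughput with no disk in the path. A sketch, assuming the geom_zero module is available:

kldload geom_zero                                   # provides /dev/gzero, which discards all writes
geli onetime gzero                                  # throwaway one-time key, no passphrase involved
dd if=/dev/zero of=/dev/gzero.eli bs=1m count=1024  # the reported rate is roughly the crypto ceiling
geli detach gzero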
Re: ZFS on top of GELI / Intel Atom 330 system
Hi. Since you are suggesting 2 x 8GB USB for a root partition, what is your experience with read/write speed and lifetime expectation of modern USB sticks under FreeBSD, and why 2 of them, a GEOM mirror? - Dan Naumov > Hi Dan, > > everybody has different needs, but what exactly are you doing with 128GB > of / ? What I did is the following: > > 2GB CF card + CF to ATA adapter (today, I would use 2x8GB USB sticks, > CF2ATA adapters suck, but then again, which Mobo has internal USB ports?) > > Filesystem 1024-blocks Used Avail Capacity Mounted on > /dev/ad0a 507630 139740 327280 30% / > /dev/ad0d 1453102 1292296 44558 97% /usr > /dev/md0 253678 16 233368 0% /tmp > > /usr is quite crowded, but I just need to clean up some ports again. > /var, /usr/src, /home, /usr/obj, /usr/ports are all on the GELI+ZFS > pool. If /usr turns out to be to small, I can also move /usr/local > there. That way booting and single user involves trusty old UFS only. > > I also do regular dumps from the UFS filesystems to the ZFS tank, but > there's really no sacred data under / or /usr that I would miss if the > system crashed (all configuration changes are tracked using mercurial). > > Anyway, my point is to use the full disks for GELI+ZFS whenever > possible. This makes it more easy to replace faulty disks or grow ZFS > pools. The FreeBSD base system, I would put somewhere else. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
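For the "why 2 of them" part, the usual answer is indeed gmirror(8). A minimal sketch, assuming the sticks appear as da0 and da1 and leaving out the partitioning and boot blocks a real bootable root would also need:

kldload geom_mirror                                # or geom_mirror_load="YES" in /boot/loader.conf
gmirror label -v -b round-robin usbroot /dev/da0 /dev/da1
newfs -U /dev/mirror/usbroot
mount /dev/mirror/usbroot /mnt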
Re: ZFS NAS configuration question
A USB root partition for booting off UFS is something I have considered. I have looked around, and all the "install FreeBSD onto USB stick" guides seem to involve a lot of manual work from a fixit environment; does sysinstall not recognise USB drives as a valid disk device to partition/label/install FreeBSD on? If I do go with a USB boot/root, what things should I absolutely keep on it and which are "safe" to move to a ZFS pool? The idea is that in case my ZFS configuration goes bonkers for some reason, I still have a fully workable singleuser configuration to boot from for recovery. I haven't really used USB flash for many years, but I remember that when they first started appearing on the shelves, they became well known for their horrible reliability (a stick would die within a year of use, etc). Have they improved to the point of being good enough to host a root partition on, without having to set up some crazy GEOM mirror using 2 of them? - Dan Naumov 2009/6/2 Gerrit Kühn > On Sat, 30 May 2009 21:41:36 +0300 Dan Naumov wrote > about ZFS NAS configuration question: > > DN> So, this leaves me with 1 SATA port used for a FreeBSD disk and 4 SATA > DN> ports available for tinketing with ZFS. > > Do you have a USB port available to boot from? A conventional USB stick (I > use 4 GB or 8GB these days, but smaller ones would certainly also do) is > enough to hold the base system on UFS, and you can give the whole of your > disks to ZFS without having to bother with booting from them. > > > cu > Gerrit > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS NAS configuration question
This reminds me. I was reading the release and upgrade notes of OpenSolaris 2009.06 and noted one thing about upgrading from a previous version to the new one: When you pick the "upgrade OS" option in the OpenSolaris installer, it checks whether you are using a ZFS root partition and, if you are, it intelligently suggests taking a current snapshot of the root filesystem. After you finish the upgrade and reboot, the boot menu offers you the option of booting the newly upgraded version of the OS or alternatively _booting from the snapshot taken by the upgrade installation procedure_. Reading that made me pause for a second and go "WOW": this is how UNIX system upgrades should be done. Any hope of us lowly users ever seeing something like this implemented in FreeBSD? :) - Dan Naumov On Tue, Jun 2, 2009 at 9:47 PM, Zaphod Beeblebrox wrote: > > > The system boots from a pair of drives in a gmirror. Mot because you can't > boot from ZFS, but because it's just so darn stable (and it predates the use > of ZFS). > > Really there are two camps here --- booting from ZFS is the use of ZFS as > the machine's own filesystem. This is one goal of ZFS that is somewhat > imperfect on FreeBSD at the momment. ZFS file servers are another goal > where booting from ZFS is not really required and only marginally > beneficial. > > > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
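Plain ZFS snapshots already give a crude, manual approximation of this on FreeBSD, minus the loader integration OpenSolaris provides. A sketch, assuming a hypothetical root filesystem named tank/root:

zfs snapshot tank/root@pre-upgrade     # take the safety snapshot before upgrading
# ...perform the upgrade, reboot, test...
zfs rollback tank/root@pre-upgrade     # only if the upgraded system misbehaves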
Re: ZFS NAS configuration question
A little more info for the (perhaps) curious: Managing Multiple Boot Environments: http://dlc.sun.com/osol/docs/content/2009.06/getstart/bootenv.html#bootenvmgr Introduction to Boot Environments: http://dlc.sun.com/osol/docs/content/2009.06/snapupgrade/index.html - Dan Naumov On Tue, Jun 2, 2009 at 10:39 PM, Dan Naumov wrote: > > This reminds me. I was reading the release and upgrade notes of OpenSolaris > 2009.6 and noted one thing about upgrading from a previous version to the new > one:: > > When you pick the "upgrade OS" option in the OpenSolaris installer, it will > check if you are using a ZFS root partition and if you do, it intelligently > suggests to take a current snapshot of the root filesystem. After you finish > the upgrade and do a reboot, the boot menu offers you the option of booting > the new upgraded version of the OS or alternatively _booting from the > snapshot taken by the upgrade installation procedure_. > > Reading that made me pause for a second and made me go "WOW", this is how > UNIX system upgrades should be done. Any hope of us lowly users ever seeing > something like this implemented in FreeBSD? :) > > - Dan Naumov > > > > > > On Tue, Jun 2, 2009 at 9:47 PM, Zaphod Beeblebrox wrote: >> >> >> The system boots from a pair of drives in a gmirror. Mot because you can't >> boot from ZFS, but because it's just so darn stable (and it predates the use >> of ZFS). >> >> Really there are two camps here --- booting from ZFS is the use of ZFS as >> the machine's own filesystem. This is one goal of ZFS that is somewhat >> imperfect on FreeBSD at the momment. ZFS file servers are another goal >> where booting from ZFS is not really required and only marginally beneficial. >> >> > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS NAS configuration question
Anyone else think that this combined with freebsd-update integration and a simplistic menu GUI for choosing the preferred boot environment would make an _awesome_ addition to the base system? :) - Dan Naumov On Wed, Jun 3, 2009 at 5:42 PM, Philipp Wuensche wrote: > I wrote a script implementing the most useful features of the solaris > live upgrade, the only thing missing is selecting a boot-environment > from the loader and freebsd-update support as I write the script on a > system running current. I use this on all my freebsd-zfs boxes and it is > extremely useful! > > http://anonsvn.h3q.com/projects/freebsd-patches/wiki/manageBE > > greetings, > philipp ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
sponsoring ZFS development on FreeBSD
Hello Kip, since your name comes up wherever I look regarding ZFS development on FreeBSD, I thought to send this mail to you directly as well. My question concerns sponsoring the FreeBSD project and ZFS development in particular. I know I am just a relatively poor person so I can't contribute much (maybe on the order of 20-30 euro a month), but I keep seeing FreeBSD core team members mention "we value donations of all sizes", so what the hell :) Anyways, in the past I have directed my donations to The FreeBSD Foundation. If I want to ensure that as much of my money as possible goes directly to benefit the development of ZFS support on FreeBSD, should I continue donating to the foundation or should I be sending donations directly to specific developers? Thank you - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
gptzfsboot and RELENG_7
Hello list Any ideas if gptzfsboot is going to be MFC'ed into RELENG_7 anytime soon? I am going to be building a NAS soon and I would like to have a "full ZFS" system without having to resort to running 8-CURRENT :) Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: gptzfsboot and RELENG_7
Several posts made to this list AFTER the zfs v13 MFC to RELENG_7 indicated that even after that MFC, you still needed gptzfsboot from 8-CURRENT to be able to boot from a full ZFS system. Is this not the case? I have a 7.2-STABLE built on May 30 and I do not have gptzfsboot in my /boot, only gptboot. I didn't make any changes to the stock Makefiles and used GENERIC kernel config. Do I need to adjust some options for gptzfsboot to get built? - Dan Naumov >> > > 5/25/09 - last month > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: /boot/loader and RELENG_7 (WAS: gptzfsboot and RELENG_7)
Ah, so there is still a (small) piece of 8-CURRENT needed to have a working 7-STABLE zfs boot configuration? I am getting really confused now: if I add LOADER_ZFS_SUPPORT=yes to my /etc/make.conf, the RELENG_7 system will be built with zfs boot support, but I still need the actual /boot/loader from 8-CURRENT? Is that getting MFC'ed into RELENG_7 anytime soon? Where are all make.conf options documented, by the way? Neither /usr/share/examples/etc/make.conf nor "man make.conf" makes any reference to the LOADER_ZFS_SUPPORT option. - Dan Naumov On Mon, Jun 8, 2009 at 7:49 PM, Alberto Villa wrote: > On Monday 08 June 2009 17:44:40 Dan Naumov wrote: >> Several posts made to this list AFTER the zfs v13 MFC to RELENG_7 >> indicated that even after that MFC, you still needed gptzfsboot from >> 8-CURRENT to be able to boot from a full ZFS system. Is this not the >> case? I have a 7.2-STABLE built on May 30 and I do not have gptzfsboot >> in my /boot, only gptboot. I didn't make any changes to the stock >> Makefiles and used GENERIC kernel config. Do I need to adjust some >> options for gptzfsboot to get built? > > no, it's /boot/loader from 8-current which is needed (the one shared on this > list works perfectly for me) > to build your system with zfs boot support just add LOADER_ZFS_SUPPORT=yes > to /etc/make.conf > -- > Alberto Villa ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
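Going by Alberto's note, rebuilding just the boot bits after setting the option would look roughly like the following; whether the RELENG_7 sources of a given date actually honor LOADER_ZFS_SUPPORT is exactly the open question here, so treat this as a sketch rather than a recipe:

echo 'LOADER_ZFS_SUPPORT=yes' >> /etc/make.conf
cd /usr/src/sys/boot
make obj && make depend && make && make install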
trouble building a "make release" snapshot of 7.2-STABLE
So first I cvsupped the entire cvs repository (sans ports) using the following supfile: === *default host=ftp13.FreeBSD.org *default base=/var/db *default prefix=/backup/ncvs *default release=cvs *default delete use-rel-suffix compress src-all doc-all cvsroot-all === Then I cd /usr/src/release and do: === make release RELEASETAG=RELENG_7 TARGET_ARCH=amd64 TARGET=amd64 BUILDNAME=7.2-STABLE CHROOTDIR=/backup/releng CVSROOT=/backup/ncvs NODOC=yes NOPORTS=yes NOPORTREADMES=yes MAKE_ISOS=yes NO_FLOPPIES=yes LOCAL_PATCHES=/root/zfs-libstand-loader-patch === However, the process bombs out on me within 5 seconds with the following: === -- >>> Installing everything -- cd /usr/src; make -f Makefile.inc1 install ===> share/info (install) install -o root -g wheel -m 444 dir-tmpl /backup/releng/usr/share/info/dir install:No such file or directory *** Error code 1 Stop in /usr/src/share/info. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src/release. === And... === agathon# which install /usr/bin/install === Any ideas? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Issues with gjournal (heaaaaaaaaaaalp!)
You need to mount your /dev/ad6s1d.journal as /usr and not /dev/ad6s1d, because this is the new device provided to you by GEOM. - Dan Naumov On Thu, Jun 11, 2009 at 5:50 AM, Garrett Cooper wrote: > On Wed, Jun 10, 2009 at 7:44 PM, Garrett Cooper wrote: >> Hi Pawel, ATA, and Stable folks, >> >> This time when I did a reinstall I took the bullet and tried to >> use gjournaling instead of softupdates. The unfortunate thing is that >> I can't seem to get it to work. >> >> Here's the procedure that I'm trying to follow (based off of [1]): >> - sysinstall from scratch with a minimal distribution. This creates >> /usr // /dev/ad6s1d as UFS2 with softupdates disabled. >> - Pull latest stable sources. Rebuild kernel (with `options >> GEOM_JOURNAL'), world, install kernel, then world after reboot. >> - gjournal label -f ad6s1d ad6s2d >> - mount /dev/ad6s1d /usr # That works (I think...), but prints out the >> error message below: >> >> GEOM_JOURNAL: [flush] Error while writing data (error=1) >> ad6s2d[WRITE(offset=512, length=6656)] >> >> gjournal status says: >> Name Status Components >> ad6s1d.journal N/A ad6s1d >> ad6s2d >> >> Some issues I noticed: >> >> - GJOURNAL ROOT (something) loops infinitely if the device can't be >> found; this should probably time out and panic / exit if a device >> becomes unavailable (depends on fstab values in the final 2 fields no >> doubt). I did this by accident when I forgot to add iir statically to >> the kernel. >> - The LiveCD doesn't fully support gjournal (userland's there, kernel >> support isn't). Kind of annoying and counterproductive... >> - Existing journal partitions disappeared when I upgraded by accident >> from 7.2-RELEASE to 8-CURRENT (silly me copied my srcs.sup file from >> my server with label=.). That was weird... >> - When I use gjournal label with an existing filesystem I _must_ use -f. >> >> Any help with this endeavor would be more than appreciated, as I want >> to enable this functionality before I move on to installing X11, as >> nvidia-driver frequently hardlocks the desktop (or has in the past). >> >> Thanks, >> -Garrett >> >> [1] >> http://www.freebsd.org/doc/en_US.ISO8859-1/articles/gjournal-desktop/article.html > > And to answer another potential question, I've tried mounting both > with -o rw,async and with -o rw. > Thanks! > -Garrett > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
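Roughly, reusing Garrett's device names (data on ad6s1d, dedicated journal on ad6s2d) and assuming GEOM_JOURNAL is in the kernel or loaded as a module:

gjournal label -f ad6s1d ad6s2d            # data provider plus a separate journal provider
tunefs -n disable /dev/ad6s1d.journal      # keep softupdates off; gjournal takes over that role
mount -o async /dev/ad6s1d.journal /usr    # mount the .journal device, not ad6s1d itself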
Does this disk/filesystem layout look sane to you?
Hello list. I just wanted to have an extra pair (or a dozen) of eyes look this configuration over before I commit to it (tested it in VMWare just in case, it works, so I am considering doing this on real hardware soon). I drew a nice diagram: http://www.pastebin.ca/1460089 Since it doesn't show on the diagram, let me clarify that the geom mirror consumers as well as the vdevs for ZFS RAIDZ are going to be partitions (raw disk => full disk slice => swap partition | mirror provider partition | zfs vdev partition | unused). Is there any actual downside to having a 5-way mirror vs a 2-way or a 3-way one? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Does this disk/filesystem layout look sane to you?
The main reason for NOT using zfs directly on raw disks is the fact that you cannot replace a vdev in a pool with a smaller one, only with one of equal size or bigger. This leads to a problem if you are a regular Joe User (and not a company buying certified hardware from a specific vendor) and want to replace one of the disks in your pool. The new 2tb disk you buy can very often actually be a few sectors smaller than the disk you are trying to replace, and this in turn will lead to zfs not accepting the new disk as a replacement, because it's smaller (no matter how small the difference). Using zfs on partitions instead and keeping a few gb unused on each disk leaves us with some room to play with and lets us avoid this issue. - Dan Naumov On Mon, Jun 15, 2009 at 5:16 AM, Freddie Cash wrote: > I don't know for sure if it's the same on FreeBSD, but on Solaris, ZFS will > disable the onboard disk cache if the vdevs are not whole disks. IOW, if > you use slices, partitions, or files, the onboard disk cache is disabled. > This can lead to poor write performance. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
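A sketch of that approach on one disk, with a deliberately made-up partition size that stops a couple of gigabytes short of a 2tb drive (the device name and sector count below are hypothetical; size it for your own disks):

gpart create -s GPT ad4
gpart add -t freebsd-zfs -s 3902834864 ad4    # hypothetical sector count, roughly 2 GB short of the raw disk
zpool create tank ad4p1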
Re: Does this disk/filesystem layout look sane to you?
If this is true, some magic has been done to the FreeBSD port of ZFS, because according to Sun documentation it is definitely not supposed to be possible. - Dan Naumov On Mon, Jun 15, 2009 at 10:48 AM, Pete French wrote: >> The new 2tb disk you buy can very often be actually a few sectors >> smaller then the disk you are trying to replace, this in turn will >> lead to zfs not accepting the new disk as a replacement, because it's >> smaller (no matter how small). > > Heh - you are in for a pleasent surprise my friend! ;-) If you actually > try this in practice you will find ZFS *does* accept a smaller drive as > a replacement. Preseumably to cope with the natural variability in sector > size that you describe. > > Surprised me too the first time I saw it... > > -pete. > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Does this disk/filesystem layout look sane to you?
Haven't had time to test (stuck at work), but I will trust your word :) Well, this sounds nice and sensible. I am curious, though, whether there are any numbers regarding how much "actual" drive sizes vary in the real world when it comes to disks of the same manufacturer/model/size. I guess this probably varies from manufacturer to manufacturer, but some average estimates would be nice, just so that one could evaluate whether this 64k barrier is enough. - Dan Naumov On Mon, Jun 15, 2009 at 11:35 AM, Pete French wrote: >> If this is true, some magic has been done to the FreeBSD port of ZFS, >> because according to SUN documentation is is definitely not supposed >> to be possible. > > I just tried it again to make sure I wasn't imagining things - you > can give it a shot yourself using mdconfig to create some drives. It > will let me drop in a replacement up to about 64k smaller than the original > with no problems. Below that and it refuses saying the drive is too > small. > > -pete. > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
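Pete's mdconfig experiment is easy to reproduce; a sketch using swap-backed memory disks, with the replacement made 32 KB smaller than the original:

mdconfig -a -t swap -s 1048576k -u 1
mdconfig -a -t swap -s 1048576k -u 2
zpool create test mirror md1 md2
mdconfig -a -t swap -s 1048544k -u 3    # 32 KB smaller than the disk it will replace
zpool replace test md2 md3              # see whether ZFS accepts the slightly smaller "drive"
zpool status test
zpool destroy test                      # clean up afterwards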
ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
I am wondering if the numbers I am seeing are something expected or if something is broken somewhere. Output of bonnie -s 1024:

on UFS2 + SoftUpdates:
               ---Sequential Output            ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine  MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec  %CPU    /sec %CPU
       1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 23603.8 243.3

on ZFS:
               ---Sequential Output            ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine  MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec  %CPU    /sec %CPU
       1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8 94595  28.0   102.2  1.2

atom# cat /boot/loader.conf
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="96M"

The test isn't completely fair in that the UFS2 test is done on a partition that resides on the first 16gb of a 2tb disk, while the zfs test is done on the enormous 1,9tb zfs pool that comes after that partition (same disk). Can this difference in layout make up for the huge difference in performance, or is there something else in play? The system is an Intel Atom 330 dualcore, 2gb ram, Western Digital Green 2tb disk. Also, what would be another good way to get good numbers for comparing the performance of UFS2 vs ZFS on the same system? Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
All the ZFS tuning guides for FreeBSD (including one on the FreeBSD ZFS wiki) have recommended values between 64M and 128M to improve stability, so that's what I went with. How much of my max kmem is it safe to give to ZFS? - Dan Naumov On Thu, Jun 18, 2009 at 2:51 AM, Ronald Klop wrote: > Isn't 96M for ARC really small? > Mine is 860M. > vfs.zfs.arc_max: 860072960 > kstat.zfs.misc.arcstats.size: 657383376 > > I think the UFS2 cache is much bigger which makes a difference in your test. > > Ronald. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
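For comparison, a loader.conf closer to Ronald's numbers might look like the following on a 2gb machine; how much of kmem can safely go to the ARC is exactly the open question, so these values are only an assumption to experiment with, not a recommendation:

vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="512M"    # larger than 96M, while still leaving kmem headroom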
ufs2 / softupdates / ZFS / disk write cache
I have the following setup: a single consumer-grade 2tb SATA disk, a Western Digital Green (model WDC WD20EADS-00R6B0). This disk is set up like this: a 16gb root partition with UFS2 + softupdates, containing mostly static things (/bin /boot /etc /root /sbin /usr /var and such), and a 1,9tb non-redundant zfs pool on top of a slice, which hosts things like /DATA, /home, /usr/local, /var/log and such. What should I do to ensure (as much as possible) filesystem consistency of the root filesystem in the case of a power loss? I know there have been a lot of discussions on the subject of consumer-level disks literally lying about the state of files in transit (disks telling the system that files have been written to disk while in reality they are still in the disk's write cache), throwing softupdates off balance (since softupdates assumes the disks don't lie about such things) and sometimes resulting in severe data loss in the case of a system power loss during heavy disk IO. One of the solutions often brought up on the mailing lists is disabling the actual disk write cache by adding hw.ata.wc=0 to /boot/loader.conf. FreeBSD 4.3 actually even had this setting by default, but it was apparently reverted because some people reported a write performance regression on the order of becoming 4-6 times slower. So what should I do in my case? Should I disable the disk write cache via the hw.ata.wc tunable? As far as I know, ZFS has a write cache of its own, and since the ufs2 root filesystem in my case is mostly static data, I am guessing I "shouldn't" notice that big of a performance hit. Or am I completely in the wrong here, and setting hw.ata.wc=0 is going to adversely affect the write performance on both the root partition AND the zfs pool despite zfs using its own write cache? Another thing I have been pondering: I do have 2gb of space left unused on the system (currently being used as swap, I have 2 swap slices, one 1gb at the very beginning of the disk, the other being 2gb at the end), which I could turn into a GJOURNAL for the root filesystem... Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
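For reference, the write-cache knob being discussed is a single loader tunable; the conservative setting would be:

hw.ata.wc="0"    # in /boot/loader.conf: disable the drive's write cache, safer for softupdates but slower writes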
Re: Zpool on raw disk and weird GEOM complaint
On Mon, Jun 29, 2009 at 12:43 PM, Patrick M. Hausen wrote: > Hi, all, > > I have a system with 12 S-ATA disks attached that I set up > as a raidz2: > > %zpool status zfs > pool: zfs > state: ONLINE > scrub: scrub in progress for 0h5m, 7.56% done, 1h3m to go > config: > > NAME STATE READ WRITE CKSUM > zfs ONLINE 0 0 0 > raidz2 ONLINE 0 0 0 > da0 ONLINE 0 0 0 > da1 ONLINE 0 0 0 > da2 ONLINE 0 0 0 > da3 ONLINE 0 0 0 > da4 ONLINE 0 0 0 > da5 ONLINE 0 0 0 > da6 ONLINE 0 0 0 > da7 ONLINE 0 0 0 > da8 ONLINE 0 0 0 > da9 ONLINE 0 0 0 > da10 ONLINE 0 0 0 > da11 ONLINE 0 0 0 > > errors: No known data errors I can't address your issue at hand, but I would point out that having a raidz/raidz2 consisting of more than 9 vdevs is a BAD IDEA (tm). All SUN documentation recommends using groups from 3 to 9 vdevs in size. There are known cases where using more vdevs than recommended causes performance degradation and more importantly, parity computation problems which can result in crashes and potential data loss. In your case, I would have the pool built as a group of 2 x 6-disk raidz. Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
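For the record, the suggested 2 x 6-disk layout would be created along these lines; note that this means destroying and re-creating the existing pool, so it is shown purely as an illustration:

zpool destroy zfs                        # destroys all data in the pool - backups first
zpool create zfs \
    raidz da0 da1 da2 da3 da4 da5 \
    raidz da6 da7 da8 da9 da10 da11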
Re: mergemaster merge left/right
Speaking of mergemaster, it would be really really nice to have "freebsd-update install" get the following functionality/options from mergemaster: -i Automatically install any files that do not exist in the destination directory. -F If the files differ only by VCS Id ($FreeBSD) install the new file. This would help avoid having to manually approve installation of hundreds of files in /etc when you upgrade to new releases using freebsd-update. - Sincerely, Dan Naumov On Fri, Jul 3, 2009 at 11:51 AM, Dominic Fandrey wrote: > I'd really like mergemaster to tell me whether the left > or the right side is the new file. > > # $FreeBSD: src/etc/devd.conf,v 1.38. | # $FreeBSD: src/etc/devd.conf,v 1.38. > > Like this I have no idea which one to pick. > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
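For reference, the equivalent invocation on the mergemaster side is simply:

mergemaster -iF    # -i: install files missing from /etc, -F: auto-install files differing only by the $FreeBSD$ id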
ZFS and df weirdness
Hello list. I have a single 2tb disk used on a 7.2-release/amd64 system, with a small part of it given to UFS and most of the disk given to a single "simple" zfs pool with several filesystems and no redundancy. I've noticed a really weird thing about what "df" reports as the "total space" of one of my filesystems:

atom# zpool list
NAME   SIZE    USED   AVAIL   CAP   HEALTH   ALTROOT
tank   1.80T   294G   1.51T   15%   ONLINE   -

atom# zfs list
NAME              USED    AVAIL   REFER   MOUNTPOINT
tank              294G    1.48T   18K     none
tank/DATA         292G    1.48T   292G    /DATA
tank/home         216K    1.48T   21K     /home
tank/home/jago    132K    1.48T   132K    /home/jago
tank/home/karni   62K     1.48T   62K     /home/karni
tank/usr          1.33G   1.48T   18K     none
tank/usr/local    455M    1.48T   455M    /usr/local
tank/usr/obj      18K     1.48T   18K     /usr/obj
tank/usr/ports    412M    1.48T   412M    /usr/ports
tank/usr/src      495M    1.48T   495M    /usr/src
tank/var          320K    1.48T   18K     none
tank/var/log      302K    1.48T   302K    /var/log

atom# df
Filesystem        1K-blocks    Used        Avail        Capacity   Mounted on
/dev/ad12s1a      16244334     1032310     13912478     7%         /
devfs             1            1           0            100%       /dev
linprocfs         4            4           0            100%       /usr/compat/linux/proc
tank/DATA         1897835904   306397056   1591438848   16%        /DATA
tank/home         1591438848   0           1591438848   0%         /home
tank/home/jago    1591438976   128         1591438848   0%         /home/jago
tank/home/karni   1591438848   0           1591438848   0%         /home/karni
tank/usr/local    1591905024   466176      1591438848   0%         /usr/local
tank/usr/obj      1591438848   0           1591438848   0%         /usr/obj
tank/usr/ports    1591860864   422016      1591438848   0%         /usr/ports
tank/usr/src      1591945600   506752      1591438848   0%         /usr/src
tank/var/log      1591439104   256         1591438848   0%         /var/log

atom# df -h
Filesystem        Size   Used   Avail   Capacity   Mounted on
/dev/ad12s1a      15G    1.0G   13G     7%         /
devfs             1.0K   1.0K   0B      100%       /dev
linprocfs         4.0K   4.0K   0B      100%       /usr/compat/linux/proc
tank/DATA         1.8T   292G   1.5T    16%        /DATA
tank/home         1.5T   0B     1.5T    0%         /home
tank/home/jago    1.5T   128K   1.5T    0%         /home/jago
tank/home/karni   1.5T   0B     1.5T    0%         /home/karni
tank/usr/local    1.5T   455M   1.5T    0%         /usr/local
tank/usr/obj      1.5T   0B     1.5T    0%         /usr/obj
tank/usr/ports    1.5T   412M   1.5T    0%         /usr/ports
tank/usr/src      1.5T   495M   1.5T    0%         /usr/src
tank/var/log      1.5T   256K   1.5T    0%         /var/log

Considering that every single filesystem is part of the exact same pool, with no custom options whatsoever used during filesystem creation (except for mountpoints), why is the size of tank/DATA 1.8T while the others are 1.5T? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS and df weirdness
On Sun, Jul 5, 2009 at 2:26 AM, Freddie Cash wrote: > > > On Sat, Jul 4, 2009 at 2:55 PM, Dan Naumov wrote: >> >> Hello list. >> >> I have a single 2tb disk used on a 7.2-release/amd64 system with a >> small part of it given to UFS and most of the disk given to a single >> "simple" zfs pool with several filesystems without redundancy. I've >> noticed a really weird thing regarding what "df" reports regarding the >> "total space" of one of my filesystems: >> >> atom# df -h >> Filesystem Size Used Avail Capacity Mounted on >> /dev/ad12s1a 15G 1.0G 13G 7% / >> devfs 1.0K 1.0K 0B 100% /dev >> linprocfs 4.0K 4.0K 0B 100% /usr/compat/linux/proc >> tank/DATA 1.8T 292G 1.5T 16% /DATA >> tank/home 1.5T 0B 1.5T 0% /home >> tank/home/jago 1.5T 128K 1.5T 0% /home/jago >> tank/home/karni 1.5T 0B 1.5T 0% /home/karni >> tank/usr/local 1.5T 455M 1.5T 0% /usr/local >> tank/usr/obj 1.5T 0B 1.5T 0% /usr/obj >> tank/usr/ports 1.5T 412M 1.5T 0% /usr/ports >> tank/usr/src 1.5T 495M 1.5T 0% /usr/src >> tank/var/log 1.5T 256K 1.5T 0% /var/log >> >> Considering that every single filesystem is part of the exact same >> pool, with no custom options whatsoever used during filesystem >> creation (except for mountpoints), why is the size of tank/DATA 1.8T >> while the others are 1.5T? > > Did you set a reservation for any of the other filesystems? Reserved space > is not listed in the "general" pool. no custom options whatsoever were used during filesystem creation (except for mountpoints). - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
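One quick way to double-check that from the ZFS side, rather than trusting df, is to ask for the relevant properties directly; any non-default value here would explain the differing df sizes:

zfs get -r quota,reservation tank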
Re: bug in ufs?
2009/7/5 Marat N.Afanasyev : > hello! > > i have a strange problem with writing data to my ufs2+su filesystem. > > 1. i made a 1T gpt partition on my storage server, and formatted it: > newfs -U -m 0 -o time -i 32768 /dev/da1p3a > > 2. i tried to move data from other servers to this filesystem, total size of > files is slightly less than 1T > > 3. i encountered a 'No space left on device' while i still have 11G of free > space and about 13 million free inodes on the filesystem: > > #df -ih > Filesystem Size Used Avail Capacity iused ifree %iused Mounted > on > /dev/da1p3a 1.0T 1.0T 11G 99% 20397465 13363173 60% > /mnt/45_114 > > all i want to know is whether this is a bug or a feature? and if such a > behavior is well-known, where can i read about it? By default, a part of a filesystem is reserved; the amount reserved has historically varied between 5-8%. This is adjustable; see the "-m" switch to tunefs. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
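For an existing filesystem the reserve can be inspected and changed with tunefs(8) on the unmounted (or read-only mounted) device; a sketch reusing the device name from the original post:

tunefs -p /dev/da1p3a    # print current tuning, including the minimum free space percentage
tunefs -m 5 /dev/da1p3a  # example: set the reserve to 5%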
Re: What is /boot/kernel/*.symbols?
On Mon, Jul 6, 2009 at 11:34 AM, Dimitry Andric wrote: > On 2009-07-06 09:42, Patrick M. Hausen wrote: >>> #define ROOT_DEFAULT_SIZE 512 >> >> IMHO it is not. If you install a kernel with *.symbols present >> twice (i.e. kernel and kernel.old contain symbol files), your >> root partition will be > 95% full. > > I'm not sure how you arrive at this number; even with -CURRENT (on i386, > with all debug symbols), I could store about 4 complete kernels on such > a filesystem: > > $ du -hs /boot/kernel* > 122M /boot/kernel > 122M /boot/kernel.20090629a > 121M /boot/kernel.20090630a > 122M /boot/kernel.20090702a > 121M /boot/kernel.20090703a > > All other files on my root filesystem use up an additional ~25 MiB, so > in practice, it would be limited to 3 kernels, with more than enough > breathing room. atom# uname -a FreeBSD atom.localdomain 7.2-RELEASE-p1 FreeBSD 7.2-RELEASE-p1 #0: Tue Jun 9 18:02:21 UTC 2009 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 atom# du -hs /boot/kernel* 205M /boot/kernel This is on a stock 7.2-release/amd64 updated to -p1 with freebsd-update; 2 kernels are the maximum that would fit into the default 512mb partition size for /, which is a bit too tight for my liking. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
7.2-release/amd64: panic, spin lock held too long
I just got a panic followed by a reboot a few seconds after running "portsnap update"; /var/log/messages shows the following: Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock 1) held by 0xff00017d8370 (tid 100054) too long Jul 7 03:49:38 atom kernel: panic: spin lock held too long /var/crash looks empty. This is a system running official 7.2-p1 binaries, since I am using freebsd-update to keep up with the patches (just updated to -p2 after this panic), running with very low load, mostly serving files to my home network over Samba and running a few irssi instances in a screen. What do I need to do to catch more information if/when this happens again? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
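As for catching more information, the usual first step is making sure the kernel can write a crash dump at panic time; a sketch of the rc.conf bits, assuming a swap partition at least as large as RAM is available:

dumpdev="AUTO"          # in /etc/rc.conf: or an explicit swap device such as /dev/ad12s1b
dumpdir="/var/crash"    # where savecore(8) will place the dump on the next boot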
Re: 7.2-release/amd64: panic, spin lock held too long
On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote: > 2009/7/7 Dan Naumov : >> I just got a panic following by a reboot a few seconds after running >> "portsnap update", /var/log/messages shows the following: >> >> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel >> Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock >> 1) held by 0xff00017d8370 (tid 100054) too long >> Jul 7 03:49:38 atom kernel: panic: spin lock held too long > > That's a known bug, affecting -CURRENT as well. > The cpustop IPI is handled though an NMI, which means it could > interrupt a CPU in any moment, even while holding a spinlock, > violating one well known FreeBSD rule. > That means that the cpu can stop itself while the thread was holding > the sched lock spinlock and not releasing it (there is no way, modulo > highly hackish, to fix that). > In the while hardclock() wants to schedule something else to run and > got stuck on the thread lock. > > Ideal fix would involve not using a NMI for serving the cpustop while > having a cheap way (not making the common path too hard) to tell > hardclock() to avoid scheduling while cpustop is in flight. > > Thanks, > Attilio Any idea if a fix is being worked on, and how unlucky does one have to be to run into this issue? Should I expect it to happen again? Is it basically completely random? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: FreeBSD 8.0-BETA1 Available
On Tue, Jul 7, 2009 at 3:33 AM, Ken Smith wrote: > Be careful if you have SCSI drives, more USB disks than just the memory > stick, etc - make sure /dev/da0 (or whatever you use) is the memory > stick. Using this image for livefs based rescue mode is known to not > work, that is one of the things still being worked on. Hey, just wanted a small clarification: does livefs based rescue mode mean "fixit environment" or not? I would like to do some configuration testing with 8.0-beta1, but applying the configuration pretty much requires working in FIXIT, since sysinstall isn't exactly up to the task. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: What is /boot/kernel/*.symbols?
On Tue, Jul 7, 2009 at 7:09 PM, Rick C. Petty wrote: > On Tue, Jul 07, 2009 at 11:24:51AM +0200, Ruben de Groot wrote: >> On Mon, Jul 06, 2009 at 04:20:45PM -0500, Rick C. Petty typed: >> > On Mon, Jul 06, 2009 at 11:39:04AM +0200, Ruben de Groot wrote: >> > > On Mon, Jul 06, 2009 at 10:46:50AM +0200, Dimitry Andric typed: >> > > > >> > > > Right, so it's a lot bigger on amd64. I guess those 64-bit pointers >> > > > aren't entirely free. :) >> > > >> > > I'm not sure where the size difference comes from. I have some sparc64 >> > > systems running -current with symbols and the size of /boot/kernel is >> > > more comparable to i386, even with the 8-byte pointer size: >> > >> > Um, probably there are a lot of devices on amd64 that aren't available for >> > sparc64? >> >> Yes, That's probably it. > > It was just a theory; I don't have sparc64. What's your output of > "ls -1 /boot/kernel | wc"? > > -- Rick C. Petty atom# uname -a FreeBSD atom.localdomain 7.2-RELEASE-p2 FreeBSD 7.2-RELEASE-p2 #0: Wed Jun 24 00:14:35 UTC 2009 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 atom# ls -1 /boot/kernel | wc 1011 1011 15243 - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS: drive replacement performance
On Wed, Jul 8, 2009 at 1:32 AM, Freddie Cash wrote: > On Tue, Jul 7, 2009 at 3:26 PM, Mahlon E. Smith wrote: > >> On Tue, Jul 07, 2009, Freddie Cash wrote: >> > >> > This is why we've started using glabel(8) to label our drives, and then >> add >> > the labels to the pool: >> > # zpool create store raidz1 label/disk01 label/disk02 label/disk03 >> > >> > That way, it does matter where the kernel detects the drives or what the >> > physical device node is called, GEOM picks up the label, and ZFS uses the >> > label. >> >> Ah, slick. I'll definitely be doing that moving forward. Wonder if I >> could do it piecemeal now via a shell game, labeling and replacing each >> individual drive? Will put that on my "try it" list. Not to derail this discussion, but can anyone explain if the actual glabel metadata is protected in any way? If I use glabel to label a disk and then create a pool using /dev/label/disklabel, won't ZFS eventually overwrite the glabel metadata in the last sector since the disk in its entirety is given to the pool? Or is every filesystem used by FreeBSD (ufs, zfs, etc) hardcoded to ignore the last few sectors of any disk and/or partition and not write data to it to avoid such issues? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
glabel metadata protection (WAS: ZFS: drive replacement performance)
>> Not to derail this discussion, but can anyone explain if the actual >> glabel metadata is protected in any way? If I use glabel to label a >> disk and then create a pool using /dev/label/disklabel, won't ZFS >> eventually overwrite the glabel metadata in the last sector since the >> disk in it's entirety is given to the pool? Or is every filesystem >> used by FreeBSD (ufs, zfs, etc) hardcoded to ignore the last few >> sectors of any disk and/or partition and not write data to it to avoid >> such issues? > > Disks labeled with glabel lose their last sector to the label. It is not > accessible by ZFS. Disks with bsdlabel partition tables are at risk due to > the brain dead decision to allow partitions to overlap the first sector, > but modern designs like glabel avoid this mistake. > > -- Brooks So what happens if I were to do the following (for the sake of example): gpart create -s GPT /dev/ad1 glabel label -v disk01 /dev/ad1 gpart add -b 1 -s -t freebsd-zfs /dev/ad1 Does "gpart add" automatically somehow recognize that the last sector of the disk contains the glabel metadata and automatically re-adjust this command to make the freebsd-zfs partition take "entire disk minus last sector"? I can understand the logic of the metadata being protected if I do a "gpart add -b 1 -s -t freebsd-zfs /dev/label/disk01", since gpart will have to go through the actual label first, but what actually happens if I issue a gpart command directly against the /dev/ device? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
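The ordering described as understandable above, i.e. partitioning inside the label rather than labeling a disk that has already been handed out whole, would look roughly like this (disk01 and ad1 as in the example; older gpart versions may also require explicit -b/-s values, which are omitted here):

glabel label -v disk01 /dev/ad1              # glabel takes the last sector of ad1 for its metadata
gpart create -s GPT /dev/label/disk01        # partition the labeled provider, which is one sector smaller
gpart add -t freebsd-zfs /dev/label/disk01
zpool create tank /dev/label/disk01p1        # ZFS never touches the sector holding the label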
Re: 7.2-release/amd64: panic, spin lock held too long
On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote: > 2009/7/7 Dan Naumov : >> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote: >>> 2009/7/7 Dan Naumov : >>>> I just got a panic following by a reboot a few seconds after running >>>> "portsnap update", /var/log/messages shows the following: >>>> >>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel >>>> Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock >>>> 1) held by 0xff00017d8370 (tid 100054) too long >>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long >>> >>> That's a known bug, affecting -CURRENT as well. >>> The cpustop IPI is handled though an NMI, which means it could >>> interrupt a CPU in any moment, even while holding a spinlock, >>> violating one well known FreeBSD rule. >>> That means that the cpu can stop itself while the thread was holding >>> the sched lock spinlock and not releasing it (there is no way, modulo >>> highly hackish, to fix that). >>> In the while hardclock() wants to schedule something else to run and >>> got stuck on the thread lock. >>> >>> Ideal fix would involve not using a NMI for serving the cpustop while >>> having a cheap way (not making the common path too hard) to tell >>> hardclock() to avoid scheduling while cpustop is in flight. >>> >>> Thanks, >>> Attilio >> >> Any idea if a fix is being worked on and how unlucky must one be to >> run into this issue, should I expect it to happen again? Is it >> basically completely random? > > I'd like to work on that issue before BETA3 (and backport to > STABLE_7), I'm just time-constrained right now. > it is completely random. > > Thanks, > Attilio Ok, this is getting pretty bad, 23 hours later, I get the same kind of panic, the only difference is that instead of "portsnap update", this was triggered by "portsnap cron" which I have running between 3 and 4 am every day: Jul 8 03:03:49 atom kernel: ssppiinn lloocckk 00xxffffffff8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h ehledl db yb y 0x0xfff0f1081735339760e 0( t(itdi d 1016070)5 )t otoo ol olnogng Jul 8 03:03:49 atom kernel: p Jul 8 03:03:49 atom kernel: anic: spin lock held too long Jul 8 03:03:49 atom kernel: cpuid = 0 Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 7.2-release/amd64: panic, spin lock held too long
On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov wrote: > On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote: >> 2009/7/7 Dan Naumov : >>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote: >>>> 2009/7/7 Dan Naumov : >>>>> I just got a panic following by a reboot a few seconds after running >>>>> "portsnap update", /var/log/messages shows the following: >>>>> >>>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel >>>>> Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock >>>>> 1) held by 0xff00017d8370 (tid 100054) too long >>>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long >>>> >>>> That's a known bug, affecting -CURRENT as well. >>>> The cpustop IPI is handled though an NMI, which means it could >>>> interrupt a CPU in any moment, even while holding a spinlock, >>>> violating one well known FreeBSD rule. >>>> That means that the cpu can stop itself while the thread was holding >>>> the sched lock spinlock and not releasing it (there is no way, modulo >>>> highly hackish, to fix that). >>>> In the while hardclock() wants to schedule something else to run and >>>> got stuck on the thread lock. >>>> >>>> Ideal fix would involve not using a NMI for serving the cpustop while >>>> having a cheap way (not making the common path too hard) to tell >>>> hardclock() to avoid scheduling while cpustop is in flight. >>>> >>>> Thanks, >>>> Attilio >>> >>> Any idea if a fix is being worked on and how unlucky must one be to >>> run into this issue, should I expect it to happen again? Is it >>> basically completely random? >> >> I'd like to work on that issue before BETA3 (and backport to >> STABLE_7), I'm just time-constrained right now. >> it is completely random. >> >> Thanks, >> Attilio > > Ok, this is getting pretty bad, 23 hours later, I get the same kind of > panic, the only difference is that instead of "portsnap update", this > was triggered by "portsnap cron" which I have running between 3 and 4 > am every day: > > Jul 8 03:03:49 atom kernel: ssppiinn lloocckk > 00xx8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h > ehledl db yb y 0x0xfff0f1081735339760e 0( t(itdi d > 1016070)5 )t otoo ol olnogng > Jul 8 03:03:49 atom kernel: p > Jul 8 03:03:49 atom kernel: anic: spin lock held too long > Jul 8 03:03:49 atom kernel: cpuid = 0 > Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s I have now tried repeating the problem by running "stress --cpu 8 --io 8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose" which pushed system load into the 15.50 ballpark and simultaneously running "portsnap fetch" and "portsnap update" but I couldn't manually trigger the panic, it seems that this problem is indeed random (although it baffles me why is it specifically portsnap triggering it). I have now disabled powerd to check whether that makes any difference to system stability. The only other things running on the system are: sshd, ntpd, smartd, smbd/nmdb and a few instances of irssi in screens. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"