using freebsd-update to update jails and their host
I have an 8.0 host system with a few jails (using ezjail) that I am gearing up to update to 8.2. I have used freebsd-update a few times in the past to upgrade a system between releases, but how would I go about using it to also upgrade the jails made with ezjail? I would obviously need to point freebsd-update at /basejail as its root, which I assume isn't too hard, but what about having it merge the new/changed /etc files in the individual jails?

I've also discovered the "ezjail-admin install -h file://" option, which installs a basejail using the host system as its source. Am I right in thinking I could also use this by first upgrading my host and then running this command to overwrite /basejail with the updated files from the host, bringing them into sync? I still don't know how I would then fix the /etc under each individual jail, though.

- Sincerely,
Dan Naumov
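For reference, a minimal sketch of the first approach, assuming ezjail's default basejail location of /usr/jails/basejail (that path is an assumption, adjust to your layout); freebsd-update(8) can operate on an alternate root via its -b option:

freebsd-update -b /usr/jails/basejail -r 8.2-RELEASE upgrade
freebsd-update -b /usr/jails/basejail install   # rerun install until it reports nothing left to do

This still says nothing about merging /etc inside each individual jail, which remains the open question above.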
ZFS on top of GELI
Hello list. I am evaluating options for my new upcoming storage system, where for various reasons the data will be stored on 2 x 2tb SATA disk in a mirror and has to be encrypted (a 40gb Intel SSD will be used for the system disk). Right now I am considering the options of FreeBSD with GELI+ZFS and Debian Linux with MDRAID and cryptofs. Has anyone here made any benchmarks regarding how much of a performance hit is caused by using 2 geli devices as vdevs for a ZFS mirror pool in FreeBSD (a similar configuration is described here: http://blog.experimentalworks.net/2008/03/setting-up-an-encrypted-zfs-with-freebsd/)? Some direct comparisons using bonnie++ or similar, showing the number differences of "this is read/write/IOPS on top of a ZFS mirror and this is read/write/IOPS on top of a ZFS mirror using GELI" would be nice. I am mostly interested in benchmarks on lower end hardware, the system is an Atom 330 which is currently using Windows 2008 server with TrueCrypt in a non-raid configuration and with that setup, I am getting roughly 55mb/s reads and writes when using TrueCrypt (nonencrypted it's around 115mb/s). Thanks. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
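In case someone wants to reproduce the comparison being requested, a rough sketch of the GELI+ZFS mirror setup and a bonnie++ run (device names, key file paths and sizes are placeholders, not taken from the post above):

geli init -s 4096 -K /etc/keys/disk0.key /dev/ad4
geli init -s 4096 -K /etc/keys/disk1.key /dev/ad6
geli attach -k /etc/keys/disk0.key /dev/ad4
geli attach -k /etc/keys/disk1.key /dev/ad6
zpool create tank mirror ad4.eli ad6.eli

# same invocation with and without the .eli layer; -s should be at least twice the RAM size
bonnie++ -d /tank -s 4096 -n 0 -u root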
Re: ZFS on top of GELI
On Sun, Jan 10, 2010 at 6:12 PM, Damian Gerow wrote: > Dan Naumov wrote: > : I am mostly interested in benchmarks on lower end hardware, the system > : is an Atom 330 which is currently using Windows 2008 server with > : TrueCrypt in a non-raid configuration and with that setup, I am > : getting roughly 55mb/s reads and writes when using TrueCrypt > : (nonencrypted it's around 115mb/s). > > I've been using GELI-backed vdevs for some time now -- since 7.2-ish > timeframes. I've never benchmarked it, but I was running on relatively > low-end hardware. A few things to take into consideration: > > 1) Make sure the individual drives are encrypted -- especially if they're > >=1TB. This is less a performance thing and more a "make sure your > encryption actually encrypts properly" thing. > 2) Seriously consider using the new AHCI driver. I've been using it in a > few places, and it's quite stable, and there is a marked performance > improvement - 10-15% on the hardware I've got. > 3) Take a look at the VIA platform, as a replacement for the Atom. I was > running on an EPIA-SN 1800 (1.8GHz), and didn't have any real troubles > with the encryption aspect of the rig (4x1TB drives). Actually, if you > get performance numbers privately comparing the Atom to a VIA (Nano or > otherwise), can you post them to the list? I'm curious to see if the > on-chip encryption actually makes a difference. > 4) Since you're asking for benchmarks, probably best if you post the > specific bonnie command you want run -- that way, it's tailored to your > use-case, and you'll get consistant, comparable results. Yes, this is what I was basically considering: new AHCI driver => 40gb Intel SSD => UFS2 with Softupdates for the system installation new AHCI driver => 2 x 2tb disks, each fully encrypted with geli => 2 geli vdevs for a ZFS mirror for important data The reason I am considering the new AHCI driver is to get NCQ support now and TRIM support for the SSD later when it gets implemented, although if the performance difference right now is already 10-15%, that's a reason good enough on it's own. On a semi-related note, is it still recommended to use softupdates or is GJournal a better choice today? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
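A quick note on trying the new ahci(4) driver mentioned above; on 8.0 this amounts to loading the module at boot:

# /boot/loader.conf
ahci_load="YES"

Disks then attach through CAM as adaX rather than adX, so anything referencing device names directly (such as /etc/fstab, or a pool built on adX names) should be checked before rebooting; using labels sidesteps the renaming.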
Re: ZFS on top of GELI
On Sun, Jan 10, 2010 at 8:46 PM, Damian Gerow wrote:
> Dan Naumov wrote:
> : Yes, this is what I was basically considering:
> :
> : new AHCI driver => 40gb Intel SSD => UFS2 with Softupdates for the
> : system installation
> : new AHCI driver => 2 x 2tb disks, each fully encrypted with geli => 2
> : geli vdevs for a ZFS mirror for important data
>
> If performance is an issue, you may want to consider carving off a partition
> on that SSD, geli-fying it, and using it as a ZIL device. You'll probably
> see a marked performance improvement with such a setup.

That is true, but using a single device for a dedicated ZIL is a huge no-no. Since it is an intent log, it is used to reconstruct the pool after, for example, a power failure; should such an event occur at the same time as the ZIL provider dies, you lose the entire pool, because there is no way to recover it. So if the ZIL gets put "elsewhere", that elsewhere really should be a mirror, and sadly I don't see myself affording 2 SSDs for my setup :)

- Sincerely,
Dan Naumov
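For anyone who does want a separate log device despite the concern above, a mirrored one avoids the single point of failure; a hypothetical sketch (the partition names are made up):

zpool add tank log mirror ada1p3.eli ada2p3.eli   # mirrored SLOG, both halves on encrypted partitions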
bin/115406: [patch] gpt(8) GPT MBR hangs award BIOS on boot
I have a few questions about this PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin 1) Is this bug now officially fixed as of 8.0-RELEASE? Ie, can I expect to set up a completely GPT-based system using an Intel D945GCLF2 board and not have the installation crap out on me later? 2) The very last entry into the PR states the following: "The problem has been addressed in gart(8) and gpt(8) is obsolete, so no follow-up is to be expected at this time. Close the PR to reflect this." What exactly is "gart" and where do I find it's manpage, http://www.freebsd.org/cgi/man.cgi comes up with nothing? Also, does this mean that GPT is _NOT_ in fact fixed regarding this bug? Thanks. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI
On Mon, Jan 11, 2010 at 7:30 PM, Pete French wrote: >> GELI+ZFS and Debian Linux with MDRAID and cryptofs. Has anyone here >> made any benchmarks regarding how much of a performance hit is caused >> by using 2 geli devices as vdevs for a ZFS mirror pool in FreeBSD (a > > I havent done it directly on the same boxes, but I have two systems > with idenitical drives, each with a ZFS mirror pool, one wth GELI, and > one without. Simple read test shows no overhead in using GELI at all. > > I would recommend using the new AHCI driver though - greatly > improves throughput. How fast is the CPU in the system showing no overhead? Having no noticable overhead whatsoever sounds extremely unlikely unless you are actually using it on something like a very modern dualcore or better. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI
2010/1/12 Rafał Jackiewicz :
> Two hdd Seagate ES2, Intel Atom 330 (2x1.6GHz), 2GB RAM:
>
> geli:
> geli init -s 4096 -K /etc/keys/ad4s2.key /dev/ad4s2
> geli init -s 4096 -K /etc/keys/ad6s2.key /dev/ad6s2
>
> zfs:
> zpool create data01 ad4s2.eli
>
> df -h:
> dev/ad6s2.eli.journal   857G   8.0K   788G   0%   /data02
> data01                  850G   128K   850G   0%   /data01
>
> srebrny# dd if=/dev/zero of=/data01/test bs=1M count=500
> 500+0 records in
> 500+0 records out
> 524288000 bytes transferred in 8.802691 secs (59559969 bytes/sec)
> srebrny# dd if=/dev/zero of=/data02/test bs=1M count=500
> 500+0 records in
> 500+0 records out
> 524288000 bytes transferred in 20.090274 secs (26096608 bytes/sec)
>
> Rafal Jackiewicz

Thanks, could you do the same, but using 2 .eli vdevs mirrored together in a ZFS mirror?

- Sincerely,
Dan Naumov
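For clarity, the configuration being requested would look roughly like this, reusing the .eli providers from the quoted message:

zpool create data01 mirror ad4s2.eli ad6s2.eli
dd if=/dev/zero of=/data01/test bs=1M count=500   # then the same write test against the mirror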
Re: ZFS on top of GELI
On Tue, Jan 12, 2010 at 1:29 AM, K. Macy wrote: >>> >>> If performance is an issue, you may want to consider carving off a partition >>> on that SSD, geli-fying it, and using it as a ZIL device. You'll probably >>> see a marked performance improvement with such a setup. >> >> That is true, but using a single device for a dedicated ZIL is a huge >> no-no, considering it's an intent log, it's used to reconstruct the >> pool in case of a power failure for example, should such an event >> occur at the same time as a ZIL provider dies, you lose the entire >> pool because there is no way to recover it, so if ZIL gets put >> "elsewhere", that elsewhere really should be a mirror and sadly I >> don't see myself affording to use 2 SSDs for my setup :) >> > > This is false. The ZIL is used for journalling synchronous writes. If > your ZIL is lost you will lose the data that was written to the ZIL, > but not yet written to the file system proper. Barring disk > corruption, the file system is always consistent. > > -Kip Ok, lets assume we have a dedicated ZIL on a single non-redundant disk. This disk dies. How do you remove the dedicated ZIL from the pool or replace it with a new one? Solaris ZFS documentation indicates that this is possible for dedicated L2ARC - you can remove a dedicated l2arc from a pool at any time you wish and should some IO fail on the l2arc, the system will gracefully continue to run, reverting said IO to be processed by the actual default built-in ZIL on the disks of the pool. However the capability to remove dedicated ZIL or gracefully handle the death of a non-redundant dedicated ZIL vdev does not currently exist in Solaris/OpenSolaris at all. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
installing FreeBSD 8 on SSDs and UFS2 - partition alignment, block sizes, what does one need to know?
For my upcoming storage system, the OS install is going to be on an 80gb Intel SSD and for various reasons I am now pretty convinced to stick with UFS2 for the root partition (the actual data pool will be ZFS using traditional SATA disks). I am probably going to use GPT partitioning and have the SSD host the swap, boot, root and a few other partitions.

What do I need to know in regards to partition alignment and filesystem block sizes to get the best performance out of the Intel SSDs? Thanks.

- Sincerely,
Dan Naumov
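A minimal sketch of one common approach on 8.0, assuming the usual advice of starting partitions on 1MiB boundaries (2048 sectors of 512 bytes) so they line up with the SSD's internal pages and erase blocks; the device name and sizes are placeholders:

gpart create -s gpt ada0
gpart add -b 34 -s 128 -t freebsd-boot ada0
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0
gpart add -b 2048 -s 8388608 -t freebsd-swap ada0   # 4GB swap, starting at the 1MiB boundary
gpart add -b 8390656 -t freebsd-ufs ada0            # root; the next offset is again a multiple of 2048
newfs -U /dev/ada0p3                                # UFS2 with soft updates

The default UFS2 block/fragment sizes are generally fine; the partition start alignment tends to matter more than the newfs block size.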
Re: ZFS on top of GELI
2010/1/12 Rafał Jackiewicz :
>>Thanks, could you do the same, but using 2 .eli vdevs mirrored
>>together in a zfs mirror?
>>
>>- Sincerely,
>>Dan Naumov
>
> Hi,
>
> Proc: Intel Atom 330 (2x1.6Ghz) - 1 package(s) x 2 core(s) x 2 HTT threads
> Chipset: Intel 82945G
> Sys: 8.0-RELEASE FreeBSD 8.0-RELEASE #0
> empty file: /boot/loader.conf
> Hdd:
> ad4: 953869MB at ata2-master SATA150
> ad6: 953869MB at ata3-master SATA150
> Geli:
> geli init -s 4096 -K /etc/keys/ad4s2.key /dev/ad4s2
> geli init -s 4096 -K /etc/keys/ad6s2.key /dev/ad6s2
>
> Results:
>
> *** single drive              write MB/s   read MB/s
> eli.journal.ufs2                      23           14
> eli.zfs                               19           36
>
> *** mirror                    write MB/s   read MB/s
> mirror.eli.journal.ufs2               23           16
> eli.zfs                               31           40
> zfs                                   83           79
>
> *** degraded mirror           write MB/s   read MB/s
> mirror.eli.journal.ufs2               16            9
> eli.zfs                               56           40
> zfs                                   86           71

Thanks a lot for your numbers, the relevant part for me was this:

*** mirror                    write MB/s   read MB/s
eli.zfs                               31           40
zfs                                   83           79

*** degraded mirror           write MB/s   read MB/s
eli.zfs                               56           40
zfs                                   86           71

31 mb/s writes and 40 mb/s reads is something I guess I could potentially live with. I am guessing the main problem of stacking ZFS on top of geli like this is that writing to a mirror requires double the CPU work: all written data has to be encrypted twice (once for each disk), instead of being encrypted once and then written to both disks, as would be the case if the crypto sat on top of ZFS rather than ZFS sitting on top of the crypto.

I now have to reevaluate my planned use of an SSD, though. I was planning to use a 40gb partition on an Intel 80GB X25-M G2 as a dedicated L2ARC device for a ZFS mirror of 2 x 2tb disks. However, these numbers make it quite obvious that I would already be CPU-starved at 40-50mb/s throughput on the encrypted ZFS mirror, so adding an L2ARC SSD, while improving latency, would do really nothing for actual disk read speeds, considering the L2ARC itself would also have to sit on top of a GELI device.

- Sincerely,
Dan Naumov
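One simple way to confirm the CPU-starvation theory while such a benchmark runs is to watch the geli kernel threads and per-provider disk load directly (standard tools, nothing specific to this setup):

top -SH   # shows the per-provider g_eli worker threads and how much CPU they consume
gstat     # shows %busy and throughput per GEOM provider, the .eli devices included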
RE: bin/115406: [patch] gpt(8) GPT MBR hangs award BIOS on boot
> 1) Is this bug now officially fixed as of 8.0-RELEASE? Ie, can I
> expect to set up a completely GPT-based system using an Intel
> D945GCLF2 board and not have the installation crap out on me later?
>
> 2) The very last entry into the PR states the following:
> "The problem has been addressed in gart(8) and gpt(8) is obsolete, so
> no follow-up is to be expected at this time. Close the PR to reflect
> this."

Hello list. Referring to PR http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin: I have now been battling with trying to set up a FreeBSD 8.0 system using GPT on an Intel D945GCLF2 board for over 24 hours, and it looks to me that the problem is not resolved. If I do a traditional installation using sysinstall / MBR, everything works. But if I use GPT and do a manual installation, doing everything right, the way it's supposed to be done, the BIOS refuses to boot off the disk. I have verified that I am doing everything right by employing the exact same installation method with GPT inside a VMware Player virtual machine, where everything works as expected, and I have been using an installation script in both cases to ensure that this is definitely not a user error :)

Reading the original PR, it can be seen that a (supposed) fix to gpart was committed to stable/8 back on Aug 27. Is it possible that this somehow didn't make it into 8.0-RELEASE, or is this a case of the fix being there but not actually solving the problem?

Reading the discussion on the forums at http://forums.freebsd.org/showthread.php?t=4680 I see that a 7.2-RELEASE user solved his exact same problem by editing the actual PMBR (resulting in the "bootable" flag (0x80) being set and the start of the partition being set to the beginning of the disk (0x010100)) and applying it to his disk with dd. Can anyone point me towards an explanation of how to edit and apply my own PMBR to my disk to see if it helps? Thanks.

Sincerely,
Dan Naumov
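For anyone wanting to experiment along those lines, a rough (and risky) sketch of what hand-editing the PMBR involves; the device name is a placeholder and a backup of the first sector is mandatory. The boot flag of the first partition entry lives at byte offset 446 of sector 0, and 0x80 marks it active:

dd if=/dev/ad4 of=/root/pmbr.bak bs=512 count=1                   # back up the current PMBR
printf '\200' | dd of=/root/pmbr.bak bs=1 seek=446 conv=notrunc   # set the active/bootable flag (0x80)
dd if=/root/pmbr.bak of=/dev/ad4 bs=512 count=1                   # write the edited sector back (may need kern.geom.debugflags=16 if the disk is in use)

As comes up later in these threads, on 8.0-RELEASE running "fdisk -a -1 DISKNAME" sets the same flag without any hand-editing.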
8.0-RELEASE / gpart / GPT / marking a partition as "active"
It seems that quite a few BIOSes have serious issues booting off disks using GPT partitioning when no partition present is marked as "active". See http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin for a prime example.

In 8.0-RELEASE, using gpart, setting a slice as "active" in MBR partitioning mode is trivial, i.e.:

gpart set -a active -i 1 DISKNAME

However, trying to do the same thing with GPT partitioning yields no results:

gpart set -a active -i 1 DISKNAME
gpart: attrib 'active': Device not configured

As a result of this issue, I can configure and make a successful install using GPT in 8.0, but I cannot boot off it using my Intel D945GCLF2 board.

I have found this discussion from about a month ago: http://www.mail-archive.com/freebsd-stable@freebsd.org/msg106918.html where Robert mentions that "gpart set -a active -i 1" is no longer needed in 8-STABLE, because the pmbr will be marked as active during the installation of the bootcode. Is there anything I can do to achieve the same result in 8.0-RELEASE, or is installing from a snapshot of 8-STABLE my only option? Thanks.

- Sincerely,
Dan Naumov
Re: 8.0-RELEASE / gpart / GPT / marking a partition as "active"
On 1/19/2010 12:11 PM, Dan Naumov wrote: > It seems that quite a few BIOSes have serious issues booting off disks > using GPT partitioning when no partition present is marked as > "active". See http://www.freebsd.org/cgi/query-pr.cgi?pr=115406&cat=bin > for a prime example. > > In 8.0-RELEASE, using gpart, setting a slice as "active" in MBR > partitioning mode is trivial, ie: > > gpart set -a active -i 1 DISKNAME > > However, trying to do the same thing with GPT partitioning yields no results: > > gpart set -a active -i 1 DISKNAME > gpart: attrib 'active': Device not configured > > As a result of this issue, I can configure and make a succesfull > install using GPT in 8.0, but I cannot boot off it using my Intel > D945GCLF2 board. > > I have found this discussion from about a month ago: > http://www.mail-archive.com/freebsd-stable@freebsd.org/msg106918.html > where Robert mentions that "gpart set -a active -i 1" is no longer > needed in 8-STABLE, because the pmbr will be marked as active during > the installation of the bootcode. Is there anything I can do to > archieve the same result in 8.0-RELEASE or is installing from a > snapshop of 8-STABLE my only option? > After using gpart to create the GPT (and thus the PMBR and its > bootcode), why not simply use "fdisk -a -1 DISKNAME" to set the PMBR > partition active? According to the fdisk output, the partition flag did change from 0 to 80. Can the "fdisk: Class not found" error showing up at the very end of the procedure of doing "fdisk -a -1 DISKNAME" be safely ignored? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Loader, MBR and the boot process
I recently found a nifty "FreeBSD ZFS root installation script" and have been reworking it a bit to suit my needs better, including changing it from GPT to MBR partitioning. However, I was stumped: even though I had done everything right (or so I thought), the system would get stuck at the loader and refuse to go anywhere. After trying over a dozen different things, it dawned on me to change the partition order inside the slice. I had 1) swap 2) freebsd-zfs, and for the test I got rid of swap altogether and gave the entire slice to the freebsd-zfs partition. Suddenly my problem went away and the system booted just fine.

So it seems that the loader requires the partition containing the files vital to booting to be the first partition in the slice, and that "swap first, then the rest" doesn't work. The thing is, I am absolutely positive that in the past I've had sysinstall-created installs using MBR partitioning with swap as the first partition inside the slice, and it all worked dandy. Has this changed at some point?

Oh, and for the curious, the installation script is here: http://jago.pp.fi/zfsmbrv1-works.sh

- Sincerely,
Dan Naumov
Re: Loader, MBR and the boot process
On Fri, Jan 22, 2010 at 6:12 AM, Thomas K. wrote: > On Fri, Jan 22, 2010 at 05:57:23AM +0200, Dan Naumov wrote: > > Hi, > >> I recently found a nifty "FreeBSD ZFS root installation script" and >> been reworking it a bit to suit my needs better, including changing it >> from GPT to MBR partitioning. However, I was stumped, even though I >> had done everything right (or so I thought), the system would get >> stuck at Loader and refuse to go anywhere. After trying over a dozen > > probably this line is the cause: > > dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s1a skip=1 seek=1024 > > Unless by "swap first" you meant the on-disk location, and not the > partition letter. If swap is partition "a", you're writing the loader > into swapspace. > > > Regards, > Thomas At first you made me feel silly, but then I decided to double-check, I uncommented the swap line in the partitioning part again, ensured I was writing the bootloader to "${TARGETDISK}"s1b and ran the script. Same problem, hangs at loader. Again, if I comment out the swap, giving the entire slice to ZFS and then write the bootloader to "${TARGETDISK}"s1a, run the script, everything works. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Loader, MBR and the boot process
On Fri, Jan 22, 2010 at 6:49 AM, Dan Naumov wrote: > On Fri, Jan 22, 2010 at 6:12 AM, Thomas K. wrote: >> On Fri, Jan 22, 2010 at 05:57:23AM +0200, Dan Naumov wrote: >> >> Hi, >> >>> I recently found a nifty "FreeBSD ZFS root installation script" and >>> been reworking it a bit to suit my needs better, including changing it >>> from GPT to MBR partitioning. However, I was stumped, even though I >>> had done everything right (or so I thought), the system would get >>> stuck at Loader and refuse to go anywhere. After trying over a dozen >> >> probably this line is the cause: >> >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s1a skip=1 seek=1024 >> >> Unless by "swap first" you meant the on-disk location, and not the >> partition letter. If swap is partition "a", you're writing the loader >> into swapspace. >> >> >> Regards, >> Thomas > > At first you made me feel silly, but then I decided to double-check, I > uncommented the swap line in the partitioning part again, ensured I > was writing the bootloader to "${TARGETDISK}"s1b and ran the script. > Same problem, hangs at loader. Again, if I comment out the swap, > giving the entire slice to ZFS and then write the bootloader to > "${TARGETDISK}"s1a, run the script, everything works. I have also just tested creating 2 slices, like this: gpart create -s mbr "${TARGETDISK}" gpart add -s 3G -t freebsd "${TARGETDISK}" gpart create -s BSD "${TARGETDISK}"s1 gpart add -t freebsd-swap "${TARGETDISK}"s1 gpart add -t freebsd "${TARGETDISK}" gpart create -s BSD "${TARGETDISK}"s2 gpart add -t freebsd-zfs "${TARGETDISK}"s2 gpart set -a active -i 2 "${TARGETDISK}" gpart bootcode -b /mnt2/boot/boot0 "${TARGETDISK}" and later: dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2 count=1 dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2a skip=1 seek=1024 Putting the swap into it's own slice and then putting FreeBSD into it's own slice worked fine. So why the hell can't they both coexist in 1 slice if the swap comes first? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
posting coding bounties, appropriate money amounts?
Hello I am curious about posting some coding bounties, my current interest revolves around improving the ZVOL functionality in FreeBSD: fixing the known ZVOL SWAP reliability/stability problems as well as making ZVOLs work as a dumpon device (as is already the case in OpenSolaris) for crash dumps. I am a private individual and not some huge Fortune 100 and while I am not exactly rich, I am willing to put some of my personal money towards this. I am curious though, what would be the best way to approach this: directly approaching committer(s) with the know-how-and-why of the areas involved or through the FreeBSD Foundation? And how would one go about calculating the appropriate amount of money for such a thing? Thanks. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Loader, MBR and the boot process
On Sun, Jan 24, 2010 at 5:29 PM, John wrote: > On Fri, Jan 22, 2010 at 07:02:53AM +0200, Dan Naumov wrote: >> On Fri, Jan 22, 2010 at 6:49 AM, Dan Naumov wrote: >> > On Fri, Jan 22, 2010 at 6:12 AM, Thomas K. wrote: >> >> On Fri, Jan 22, 2010 at 05:57:23AM +0200, Dan Naumov wrote: >> >> >> >> Hi, >> >> >> >>> I recently found a nifty "FreeBSD ZFS root installation script" and >> >>> been reworking it a bit to suit my needs better, including changing it >> >>> from GPT to MBR partitioning. However, I was stumped, even though I >> >>> had done everything right (or so I thought), the system would get >> >>> stuck at Loader and refuse to go anywhere. After trying over a dozen >> >> >> >> probably this line is the cause: >> >> >> >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s1a skip=1 seek=1024 >> >> >> >> Unless by "swap first" you meant the on-disk location, and not the >> >> partition letter. If swap is partition "a", you're writing the loader >> >> into swapspace. >> >> >> >> >> >> Regards, >> >> Thomas >> > >> > At first you made me feel silly, but then I decided to double-check, I >> > uncommented the swap line in the partitioning part again, ensured I >> > was writing the bootloader to "${TARGETDISK}"s1b and ran the script. >> > Same problem, hangs at loader. Again, if I comment out the swap, >> > giving the entire slice to ZFS and then write the bootloader to >> > "${TARGETDISK}"s1a, run the script, everything works. >> >> I have also just tested creating 2 slices, like this: >> >> gpart create -s mbr "${TARGETDISK}" >> gpart add -s 3G -t freebsd "${TARGETDISK}" >> gpart create -s BSD "${TARGETDISK}"s1 >> gpart add -t freebsd-swap "${TARGETDISK}"s1 >> >> gpart add -t freebsd "${TARGETDISK}" >> gpart create -s BSD "${TARGETDISK}"s2 >> gpart add -t freebsd-zfs "${TARGETDISK}"s2 >> >> gpart set -a active -i 2 "${TARGETDISK}" >> gpart bootcode -b /mnt2/boot/boot0 "${TARGETDISK}" >> >> >> and later: >> >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2 count=1 >> dd if=/mnt2/boot/zfsboot of=/dev/"${TARGETDISK}"s2a skip=1 seek=1024 >> >> >> Putting the swap into it's own slice and then putting FreeBSD into >> it's own slice worked fine. So why the hell can't they both coexist in >> 1 slice if the swap comes first? > > I know what the answer to this USED to be, but I don't know if it is > still true (obviously, I think so, I or wouldn't waste your time). > > The filesystem code is all carefully written to avoid the very > first few sector of the partition. That's because the partition > table is there for the first filesystem of the slice (or disk). > That's a tiny amout of space wasted, because it's also skipped on > all the other filesystems even though there's not actually anything > there, but it was a small inefficency, even in the 70's. > > Swap does not behave that way. SWAP will begin right at the slice > boundry, with 0 offset. As long as it's not the first partition, no > harm, no foul. If it IS the first partition, you just nuked your partition > table. As long as SWAP owns the slice, again, no harm, no foul, but > if there were filesystems BEHIND it, you just lost 'em. > > That's the way it always used to be, and I think it still is. SWAP can > only be first if it is the ONLY thing using that slice (disk), otherwise, > you need a filesystem first to protect the partition table. 
> --
>
> John Lind
> j...@starfire.mn.org

This explanation does sound logical, but holy crap, if this is the case you'd think there would be bells, whistles and huge red-label warnings in EVERY FreeBSD installation / partitioning guide out there, warning people never to put swap first unless it gets a dedicated slice. The warnings were nowhere to be seen, and a lot of hair first greyed and was then lost while I tried to figure out why my system would install but wouldn't boot.

- Sincerely,
Dan Naumov
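To spell the takeaway out as a layout, a sketch of the arrangement that works within a single MBR slice (the device name and sizes are placeholders, mirroring the script quoted above): the filesystem partition goes first, so its reserved leading sectors protect the bsdlabel, and swap goes last:

gpart create -s BSD ad4s1
gpart add -s 295G -t freebsd-zfs ad4s1   # "a" partition first; its unused leading sectors shield the label
gpart add -t freebsd-swap ad4s1          # swap last, so its offset-0 writes never land on the label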
8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
1 sec =6.278 msec
Short backward:  400 iter in   2.233714 sec =   5.584 msec
Seq outer:      2048 iter in   0.427523 sec =   0.209 msec
Seq inner:      2048 iter in   0.341185 sec =   0.167 msec
Transfer rates:
    outside:  102400 kbytes in   1.516305 sec =   67533 kbytes/sec
    middle:   102400 kbytes in   1.351877 sec =   75747 kbytes/sec
    inside:   102400 kbytes in   2.090069 sec =   48994 kbytes/sec
===

The exact same disks, on the exact same machine, are well capable of 65+ mb/s throughput (tested with ATTO multiple times) with different block sizes using Windows 2008 Server and NTFS. So what would be the cause of these very low Bonnie result numbers in my case? Should I try some other benchmark and if so, with what parameters?

- Sincerely,
Dan Naumov
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
test2 bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes transferred in 143.878615 secs (29851325 bytes/sec) This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the bonnie results. It also sadly seems to confirm the very slow speed :( The disks are attached to a 4-port Sil3124 controller and again, my Windows benchmarks showing 65mb/s+ were done on exact same machine, with same disks attached to the same controller. Only difference was that in Windows the disks weren't in a mirror configuration but were tested individually. I do understand that a mirror setup offers roughly the same write speed as individual disk, while the read speed usually varies from "equal to individual disk speed" to "nearly the throughput of both disks combined" depending on the implementation, but there is no obvious reason I am seeing why my setup offers both read and write speeds roughly 1/3 to 1/2 of what the individual disks are capable of. Dmesg shows: atapci0: port 0x1000-0x100f mem 0x90108000-0x9010807f,0x9010-0x90107fff irq 21 at device 0.0 on pci4 ad8: 1907729MB at ata4-master SATA300 ad10: 1907729MB at ata5-master SATA300 I do recall also testing an alternative configuration in the past, where I would boot off an UFS disk and have the ZFS mirror consist of 2 discs directly. The bonnie numbers in that case were in line with my expectations, I was seeing 65-70mb/s. Note: again, exact same hardware, exact same disks attached to the exact same controller. In my knowledge, Solaris/OpenSolaris has an issue where they have to automatically disable disk cache if ZFS is used on top of partitions instead of raw disks, but to my knowledge (I recall reading this from multiple reputable sources) this issue does not affect FreeBSD. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 7:42 PM, Dan Naumov wrote: > On Sun, Jan 24, 2010 at 7:05 PM, Jason Edwards wrote: >> Hi Dan, >> >> I read on FreeBSD mailinglist you had some performance issues with ZFS. >> Perhaps i can help you with that. >> >> You seem to be running a single mirror, which means you won't have any speed >> benefit regarding writes, and usually RAID1 implementations offer little to >> no acceleration to read requests also; some even just read from the master >> disk and don't touch the 'slave' mirrored disk unless when writing. ZFS is >> alot more modern however, although i did not test performance of its mirror >> implementation. >> >> But, benchmarking I/O can be tricky: >> >> 1) you use bonnie, but bonnie's tests are performed without a 'cooldown' >> period between the tests; meaning that when test 2 starts, data from test 1 >> is still being processed. For single disks and simple I/O this is not so >> bad, but for large write-back buffers and more complex I/O buffering, this >> may be inappropriate. I had patched bonnie some time in the past, but if you >> just want a MB/s number you can use DD for that. >> >> 2) The diskinfo tiny benchmark is single queue only i assume, meaning that >> it would not scale well or at all on RAID-arrays. Actual filesystems on >> RAID-arrays use multiple-queue; meaning it would not read one sector at a >> time, but read 8 blocks (of 16KiB) "ahead"; this is called read-ahead and >> for traditional UFS filesystems its controlled by the sysctl vfs.read_max >> variable. ZFS works differently though, but you still need a "real" >> benchmark. >> >> 3) You need low-latency hardware; in particular, no PCI controller should be >> used. Only PCI-express based controllers or chipset-integrated Serial ATA >> cotrollers have proper performance. PCI can hurt performance very badly, and >> has high interrupt CPU usage. Generally you should avoid PCI. PCI-express is >> fine though, its a completely different interface that is in many ways the >> opposite of what PCI was. >> >> 4) Testing actual realistic I/O performance (in IOps) is very difficult. But >> testing sequential performance should be alot easier. You may try using dd >> for this. >> >> >> For example, you can use dd on raw devices: >> >> dd if=/dev/ad4 of=/dev/null bs=1M count=1000 >> >> I will explain each parameter: >> >> if=/dev/ad4 is the input file, the "read source" >> >> of=/dev/null is the output file, the "write destination". /dev/null means it >> just goes no-where; so this is a read-only benchmark >> >> bs=1M is the blocksize, howmuch data to transfer per time. default is 512 or >> the sector size; but that's very slow. A value between 64KiB and 1024KiB is >> appropriate. bs=1M will select 1MiB or 1024KiB. >> >> count=1000 means transfer 1000 pieces, and with bs=1M that means 1000 * 1MiB >> = 1000MiB. >> >> >> >> This example was raw reading sequentially from the start of the device >> /dev/ad4. If you want to test RAIDs, you need to work at the filesystem >> level. You can use dd for that too: >> >> dd if=/dev/zero of=/path/to/ZFS/mount/zerofile.000 bs=1M count=2000 >> >> This command will read from /dev/zero (all zeroes) and write to a file on >> ZFS-mounted filesystem, it will create the file "zerofile.000" and write >> 2000MiB of zeroes to that file. >> So this command tests write-performance of the ZFS-mounted filesystem. To >> test read performance, you need to clear caches first by unmounting that >> filesystem and re-mounting it again. 
This would free up memory containing >> parts of the filesystem as cached (reported in top as "Inact(ive)" instead >> of "Free"). >> >> Please do make sure you double-check a dd command before running it, and run >> as normal user instead of root. A wrong dd command may write to the wrong >> destination and do things you don't want. The only real thing you need to >> check is the write destination (of=). That's where dd is going to write >> to, so make sure its the target you intended. A common mistake made by >> myself was to write dd of=... if=... (starting with of instead of if) and >> thus actually doing something the other way around than what i was meant to >> do. This can be disastrous if you work with live data, so be careful! ;-) >> >> Hope any of this was helpful. During the dd benchmark, you can of course >> open a
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 8:12 PM, Bob Friesenhahn wrote:
> On Sun, 24 Jan 2010, Dan Naumov wrote:
>>
>> This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and
>> 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the
>> bonnie results. It also sadly seems to confirm the very slow speed :(
>> The disks are attached to a 4-port Sil3124 controller and again, my
>> Windows benchmarks showing 65mb/s+ were done on exact same machine,
>> with same disks attached to the same controller. Only difference was
>> that in Windows the disks weren't in a mirror configuration but were
>> tested individually. I do understand that a mirror setup offers
>> roughly the same write speed as individual disk, while the read speed
>> usually varies from "equal to individual disk speed" to "nearly the
>> throughput of both disks combined" depending on the implementation,
>> but there is no obvious reason I am seeing why my setup offers both
>> read and write speeds roughly 1/3 to 1/2 of what the individual disks
>> are capable of. Dmesg shows:
>
> There is a mistatement in the above in that a "mirror setup offers roughly
> the same write speed as individual disk". It is possible for a mirror setup
> to offer a similar write speed to an individual disk, but it is also quite
> possible to get 1/2 (or even 1/3) the speed. ZFS writes to a mirror pair
> requires two independent writes. If these writes go down independent I/O
> paths, then there is hardly any overhead from the 2nd write. If the writes
> go through a bandwidth-limited shared path then they will contend for that
> bandwidth and you will see much less write performance.
>
> As a simple test, you can temporarily remove the mirror device from the pool
> and see if the write performance dramatically improves. Before doing that,
> it is useful to see the output of 'iostat -x 30' while under heavy write
> load to see if one device shows a much higher svc_t value than the other.

Ow, ow, WHOA:

atombsd# zpool offline tank ad8s1a

[j...@atombsd ~]$ dd if=/dev/zero of=/home/jago/test3 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 16.826016 secs (63814382 bytes/sec)

Offlining one half of the mirror bumps DD write speed from 28mb/s to 64mb/s! Let's see how Bonnie results change:

Mirror with both parts attached:

              ---Sequential Output---- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         8192 18235 46.7 23137 19.9 13927 13.6 24818 49.3 44919 17.3 134.3  2.1

Mirror with 1 half offline:

              ---Sequential Output---- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         1024 22888 58.0 41832 35.1 22764 22.0 26775 52.3 54233 18.3 166.0  1.6

Ok, the Bonnie results have improved, but only very little.

- Sincerely,
Dan Naumov
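For completeness, after a test like this the offlined half goes back in with zpool online and the mirror resilvers; zpool status shows the progress:

zpool online tank ad8s1a
zpool status tank    # wait for the resilver to finish before trusting the redundancy again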
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 8:34 PM, Jason Edwards wrote: >> ZFS writes to a mirror pair >> requires two independent writes. If these writes go down independent I/O >> paths, then there is hardly any overhead from the 2nd write. If the >> writes >> go through a bandwidth-limited shared path then they will contend for that >> bandwidth and you will see much less write performance. > > What he said may confirm my suspicion on PCI. So if you could try the same > with "real" Serial ATA via chipset or PCI-e controller you can confirm this > story. I would be very interested. :P > > Kind regards, > Jason This wouldn't explain why ZFS mirror on 2 disks directly, on the exact same controller (with the OS running off a separate disks) results in "expected" performance, while having the OS run off/on a ZFS mirror running on top of MBR-partitioned disks, on the same controller, results in very low speed. - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Sun, Jan 24, 2010 at 11:53 PM, Alexander Motin wrote:
> Dan Naumov wrote:
>> This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and
>> 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the
>> bonnie results. It also sadly seems to confirm the very slow speed :(
>> The disks are attached to a 4-port Sil3124 controller and again, my
>> Windows benchmarks showing 65mb/s+ were done on exact same machine,
>> with same disks attached to the same controller. Only difference was
>> that in Windows the disks weren't in a mirror configuration but were
>> tested individually. I do understand that a mirror setup offers
>> roughly the same write speed as individual disk, while the read speed
>> usually varies from "equal to individual disk speed" to "nearly the
>> throughput of both disks combined" depending on the implementation,
>> but there is no obvious reason I am seeing why my setup offers both
>> read and write speeds roughly 1/3 to 1/2 of what the individual disks
>> are capable of. Dmesg shows:
>>
>> atapci0: port 0x1000-0x100f mem
>> 0x90108000-0x9010807f,0x9010-0x90107fff irq 21 at device 0.0 on
>> pci4
>> ad8: 1907729MB at ata4-master SATA300
>> ad10: 1907729MB at ata5-master SATA300
>
> 8.0-RELEASE, and especially 8-STABLE provide alternative, much more
> functional driver for this controller, named siis(4). If your SiI3124
> card installed into proper bus (PCI-X or PCIe x4/x8), it can be really
> fast (up to 1GB/s was measured).
>
> --
> Alexander Motin

Sadly, it seems that utilizing the new siis driver doesn't do much good.

Before utilizing siis:

iozone -s 4096M -r 512 -i0 -i1

                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         4194304     512   28796   28766   51610   50695

After enabling siis in loader.conf (and ensuring the disks show up as ada):

iozone -s 4096M -r 512 -i0 -i1

                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
         4194304     512   28781   28897   47214   50540

I've checked with the manufacturer and it seems that the Sil3124 in this NAS is indeed a PCI card. More info on the card in question is available at http://green-pcs.co.uk/2009/01/28/tranquil-bbs2-those-pci-cards/ - I have the card described later on the page, the one with 4 SATA ports and no eSATA. Alright, so it being PCI is probably a bottleneck in some ways, but that still doesn't explain performance THAT bad, considering that the same hardware, same disks and same disk controller push over 65mb/s in both reads and writes in Win2008. And again, I am pretty sure that I had "close to expected" results when I was booting a UFS FreeBSD installation off an SSD (attached directly to a SATA port on the motherboard) while running the same kinds of benchmarks with Bonnie and DD on a ZFS mirror made directly on top of the 2 raw disks.

- Sincerely,
Dan Naumov
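For reference, "enabling siis in loader.conf" above amounts to loading the driver module at boot, after which the disks attach via CAM as adaX:

# /boot/loader.conf
siis_load="YES"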
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 2:14 AM, Dan Naumov wrote: > On Sun, Jan 24, 2010 at 11:53 PM, Alexander Motin wrote: >> Dan Naumov wrote: >>> This works out to 1GB in 36,2 seconds / 28,2mb/s in the first test and >>> 4GB in 143.8 seconds / 28,4mb/s and somewhat consistent with the >>> bonnie results. It also sadly seems to confirm the very slow speed :( >>> The disks are attached to a 4-port Sil3124 controller and again, my >>> Windows benchmarks showing 65mb/s+ were done on exact same machine, >>> with same disks attached to the same controller. Only difference was >>> that in Windows the disks weren't in a mirror configuration but were >>> tested individually. I do understand that a mirror setup offers >>> roughly the same write speed as individual disk, while the read speed >>> usually varies from "equal to individual disk speed" to "nearly the >>> throughput of both disks combined" depending on the implementation, >>> but there is no obvious reason I am seeing why my setup offers both >>> read and write speeds roughly 1/3 to 1/2 of what the individual disks >>> are capable of. Dmesg shows: >>> >>> atapci0: port 0x1000-0x100f mem >>> 0x90108000-0x9010807f,0x9010-0x90107fff irq 21 at device 0.0 on >>> pci4 >>> ad8: 1907729MB at ata4-master SATA300 >>> ad10: 1907729MB at ata5-master SATA300 >> >> 8.0-RELEASE, and especially 8-STABLE provide alternative, much more >> functional driver for this controller, named siis(4). If your SiI3124 >> card installed into proper bus (PCI-X or PCIe x4/x8), it can be really >> fast (up to 1GB/s was measured). >> >> -- >> Alexander Motin > > Sadly, it seems that utilizing the new siis driver doesn't do much good: > > Before utilizing siis: > > iozone -s 4096M -r 512 -i0 -i1 > random > random bkwd record stride > KB reclen write rewrite read reread read > write read rewrite read fwrite frewrite fread freread > 4194304 512 28796 28766 51610 50695 > > After enabling siis in loader.conf (and ensuring the disks show up as ada): > > iozone -s 4096M -r 512 -i0 -i1 > > random > random bkwd record stride > KB reclen write rewrite read reread read > write read rewrite read fwrite frewrite fread freread > 4194304 512 28781 28897 47214 50540 Just to add to the numbers above, exact same benchmark, on 1 disk (detached 2nd disk from the mirror) while using the siis driver: random randombkwd record stride KB reclen write rewritereadrereadread writeread rewrite read fwrite frewrite fread freread 4194304 512 57760 563716886774047 - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 7:33 AM, Bob Friesenhahn wrote:
> On Mon, 25 Jan 2010, Dan Naumov wrote:
>>
>> I've checked with the manufacturer and it seems that the Sil3124 in
>> this NAS is indeed a PCI card. More info on the card in question is
>> available at
>> http://green-pcs.co.uk/2009/01/28/tranquil-bbs2-those-pci-cards/
>> I have the card described later on the page, the one with 4 SATA ports
>> and no eSATA. Alright, so it being PCI is probably a bottleneck in
>> some ways, but that still doesn't explain the performance THAT bad,
>> considering that same hardware, same disks, same disk controller push
>> over 65mb/s in both reads and writes in Win2008. And again, I am
>> pretty sure that I've had "close to expected" results when I was
>
> The slow PCI bus and this card look like the bottleneck to me. Remember that
> your Win2008 tests were with just one disk, your zfs performance with just
> one disk was similar to Win2008, and your zfs performance with a mirror was
> just under 1/2 that.
>
> I don't think that your performance results are necessarily out of line for
> the hardware you are using.
>
> On an old Sun SPARC workstation with retrofitted 15K RPM drives on Ultra-160
> SCSI channel, I see a zfs mirror write performance of 67,317KB/second and a
> read performance of 124,347KB/second. The drives themselves are capable of
> 100MB/second range performance. Similar to yourself, I see 1/2 the write
> performance due to bandwidth limitations.
>
> Bob

There is a lot of very sweet irony in my particular situation. Initially I was planning to use a single X25-M 80gb SSD in the motherboard SATA port for the actual OS installation, as well as to dedicate 50gb of it to becoming a designated L2ARC vdev for my ZFS mirrors. The SSD attached to the motherboard port would be recognized only as a SATA150 device for some reason, but I was still seeing 150mb/s throughput and sub-0.1 ms latencies on that disk, simply because of how crazy good the X25-Ms are. However, I ended up having very bad issues with the Icydock 2,5" to 3,5" converter jacket I was using to fit the SSD into the system: it would randomly drop write IO under heavy load due to bad connectors. Having finally figured out why my OS installations on the SSD kept going belly up while applying updates, I decided to move the SSD to my desktop and use it there instead, additionally thinking that perhaps my idea of the SSD was crazy overkill for what I need this system to do. Ironically, now that I am seeing how horrible the performance is when operating on the mirror through this PCI card, I realize that my idea was actually pretty bloody brilliant, I just didn't really know why at the time.

An L2ARC device on the motherboard port would really help me with random read IO, but to work around the utterly poor write performance I would also need a dedicated SLOG ZIL device. The catch is that while L2ARC devices can be removed from the pool at will (should the device up and die all of a sudden), dedicated ZILs cannot, and currently a "missing" ZIL device will render the pool it belongs to unable to import and therefore inaccessible. There is some work happening in Solaris to implement removing SLOGs from a pool, but that work hasn't found its way into FreeBSD yet.

- Sincerely,
Dan Naumov
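To illustrate the asymmetry described above with the actual commands: a cache (L2ARC) device can be added and removed freely, whereas removing a log device was not supported by the ZFS versions of that era (the device name is a placeholder):

zpool add tank cache ada1p2    # add an L2ARC device
zpool remove tank ada1p2       # removing a cache device is always allowed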
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 9:34 AM, Dan Naumov wrote: > On Mon, Jan 25, 2010 at 7:33 AM, Bob Friesenhahn > wrote: >> On Mon, 25 Jan 2010, Dan Naumov wrote: >>> >>> I've checked with the manufacturer and it seems that the Sil3124 in >>> this NAS is indeed a PCI card. More info on the card in question is >>> available at >>> http://green-pcs.co.uk/2009/01/28/tranquil-bbs2-those-pci-cards/ >>> I have the card described later on the page, the one with 4 SATA ports >>> and no eSATA. Alright, so it being PCI is probably a bottleneck in >>> some ways, but that still doesn't explain the performance THAT bad, >>> considering that same hardware, same disks, same disk controller push >>> over 65mb/s in both reads and writes in Win2008. And agian, I am >>> pretty sure that I've had "close to expected" results when I was >> >> The slow PCI bus and this card look like the bottleneck to me. Remember that >> your Win2008 tests were with just one disk, your zfs performance with just >> one disk was similar to Win2008, and your zfs performance with a mirror was >> just under 1/2 that. >> >> I don't think that your performance results are necessarily out of line for >> the hardware you are using. >> >> On an old Sun SPARC workstation with retrofitted 15K RPM drives on Ultra-160 >> SCSI channel, I see a zfs mirror write performance of 67,317KB/second and a >> read performance of 124,347KB/second. The drives themselves are capable of >> 100MB/second range performance. Similar to yourself, I see 1/2 the write >> performance due to bandwidth limitations. >> >> Bob > > There is lots of very sweet irony in my particular situiation. > Initially I was planning to use a single X25-M 80gb SSD in the > motherboard sata port for the actual OS installation as well as to > dedicate 50gb of it to a become a designaed L2ARC vdev for my ZFS > mirrors. The SSD attached to the motherboard port would be recognized > only as a SATA150 device for some reason, but I was still seeing > 150mb/s throughput and sub 0.1 ms latencies on that disk simply > because of how crazy good the X25-M's are. However I ended up having > very bad issues with the Icydock 2,5" to 3,5" converter jacket I was > using to keep/fit the SSD in the system and it would randomly drop > write IO on heavy load due to bad connectors. Having finally figured > out the cause of my OS installations to the SSD going belly up during > applying updates, I decided to move the SSD to my desktop and use it > there instead, additionally thinking that my perhaps my idea of the > SSD was crazy overkill for what I need the system to do. Ironically > now that I am seeing how horrible the performance is when I am > operating on the mirror through this PCI card, I realize that > actually, my idea was pretty bloody brilliant, I just didn't really > know why at the time. > > An L2ARC device on the motherboard port would really help me with > random read IO, but to work around the utterly poor write performance, > I would also need a dedicaled SLOG ZIL device. The catch is that while > L2ARC devices and be removed from the pool at will (should the device > up and die all of a sudden), the dedicated ZILs cannot and currently a > "missing" ZIL device will render the pool it's included in be unable > to import and become inaccessible. There is some work happening in > Solaris to implement removing SLOGs from a pool, but that work hasn't > yet found it's way in FreeBSD yet. 
>
>
> - Sincerely,
> Dan Naumov

OK, final question: if/when I go about adding more disks to the system and want redundancy, am I right in thinking that a ZFS pool made of a disk1+disk2 mirror plus a disk3+disk4 mirror (a la RAID10) would completely murder my write and read performance, pushing it even further below the 28mb/s / 50mb/s I am currently seeing with 2 disks on that PCI controller? And that in order to have the least negative impact, I should simply run 2 independent mirrors in 2 independent pools (with the 5th disk slot in the NAS given to a non-redundant single disk running off the one available SATA port on the motherboard)?

- Sincerely,
Dan Naumov
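For reference, the two layouts being compared look like this in zpool terms (disk names are placeholders):

zpool create tank mirror da0 da1 mirror da2 da3   # one pool striped across two mirrors (RAID10-style)

zpool create tank1 mirror da0 da1                 # versus two fully independent pools
zpool create tank2 mirror da2 da3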
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 7:40 PM, Alexander Motin wrote: > Artem Belevich wrote: >> aoc-sat2-mv8 was somewhat slower compared to ICH9 or LSI1068 >> controllers when I tried it with 6 and 8 disks. >> I think the problem is that MV8 only does 32K per transfer and that >> does seem to matter when you have 8 drives hooked up to it. I don't >> have hard numbers, but peak throughput of MV8 with 8-disk raidz2 was >> noticeably lower than that of LSI1068 in the same configuration. Both >> LSI1068 and MV2 were on the same PCI-X bus. It could be a driver >> limitation. The driver for Marvel SATA controllers in NetBSD seems a >> bit more advanced compared to what's in FreeBSD. > > I also wouldn't recommend to use Marvell 88SXx0xx controllers now. While > potentially they are interesting, lack of documentation and numerous > hardware bugs make existing FreeBSD driver very limited there. > >> I wish intel would make cheap multi-port PCIe SATA card based on their >> AHCI controllers. > > Indeed. Intel on-board AHCI SATA controllers are fastest from all I have > tested. Unluckily, they are not producing discrete versions. :( > > Now, if discrete solution is really needed, I would still recommend > SiI3124, but with proper PCI-X 64bit/133MHz bus or built-in PCIe x8 > bridge. They are fast and have good new siis driver. > >> On Mon, Jan 25, 2010 at 3:29 AM, Pete French >> wrote: >>>> I like to use pci-x with aoc-sat2-mv8 cards or pci-e cardsthat way you >>>> get a lot more bandwidth.. >>> I would goalong with that - I have precisely the same controller, with >>> a pair of eSATA drives, running ZFS mirrored. But I get a nice 100 >>> meg/second out of them if I try. My controller is, however on PCI-X, not >>> PCI. It's a shame PCI-X appears to have gone the way of the dinosaur :-( > > -- > Alexander Motin Alexander, since you seem to be experienced in the area, what do you think of these 2 for use in a FreeBSD8 ZFS NAS: http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance
On Mon, Jan 25, 2010 at 8:32 PM, Alexander Motin wrote: > Dan Naumov wrote: >> Alexander, since you seem to be experienced in the area, what do you >> think of these 2 for use in a FreeBSD8 ZFS NAS: >> >> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H >> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H&IPMI=Y > > Unluckily I haven't yet touched Atom family close yet, so I can't say > about it's performance. But higher desktop level (even bit old) ICH9R > chipset there is IMHO a good option. It is MUCH better then ICH7, often > used with previous Atoms. If I had nice small Mini-ITX case with 6 drive > bays, I would definitely look for some board like that to build home > storage. > > -- > Alexander Motin CPU-performance-wise, I am not really worried. The current system is an Atom 330 and even that is a bit overkill for what I do with it and from what I am seeing, the new Atom D510 used on those boards is a tiny bit faster. What I want and care about for this system are reliability, stability, low power use, quietness and fast disk read/write speeds. I've been hearing some praise of ICH9R and 6 native SATA ports should be enough for my needs. AFAIK, the Intel 82574L network cards included on those are also very well supported? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: immense delayed write to file system (ZFS and UFS2), performance issues
> You're welcome. I just feel as bad for you as for everyone else who > has bought these obviously Windoze optimized harddrives. Unfortunately > neither wdidle3 nor an updated firmware is available or functioning on > the latest models in the Green series. At least that's what I've read > from other people having this issue. WD only claims they don't support > Linux and they probably have never heard of FreeBSD. This discussion made me have a look at my 2TB WD Green disks. One of them is from May 2009 and looks pretty reasonable: Device Model: WDC WD20EADS-00R6B0 Serial Number: WD-WCAVY0301430 Firmware Version: 01.00A01 9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 5253 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 55 And another is very recent, from December 2009, and does look a bit worrying in comparison: Device Model: WDC WD20EADS-32R6B0 Serial Number: WD-WCAVY1611513 Firmware Version: 01.00A01 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 136 193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5908 The disks are the exact same model and appear to be on the same firmware. Should I be worried that the newer disk has, in just 136 hours, reached a Load_Cycle_Count more than a hundred times that of the disk that's 5253 hours old? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
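For anyone who wants to pull the same numbers off their own drives, smartmontools (sysutils/smartmontools in ports) is enough; a minimal example, assuming the disks show up as ad4 and ad6 (adjust the device names to your system):

# identity block: model, serial and firmware version
smartctl -i /dev/ad4

# just the two attributes discussed above
smartctl -A /dev/ad4 | egrep 'Power_On_Hours|Load_Cycle_Count'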
RE: immense delayed write to file system (ZFS and UFS2), performance issues
Can anyone confirm that using the WDIDLE3 utility on the 2TB WD20EADS discs will not cause any issues if these disks are part of a ZFS mirror pool? I do have backups of data, but I would rather not spend the time rebuilding the entire system and restoring enormous amounts of data over a 100mbit network unless I absolutely have to :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: immense delayed write to file system (ZFS and UFS2), performance issues
Thank you, thank you, thank you! Now I neither have to worry about premature death of my disks, nor do I have to endure the loud clicking noises (I have a NAS with these in my living room)! If either of you (or both) want me to Paypal you money for a beer, send me details offlist :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: immense delayed write to file system (ZFS and UFS2), performance issues
>I have a WD2003FYPS sitting in a system, to be used for testing. Bought it >just before this thread started, and here's what it looks like right now: > > 9 Power_On_Hours 0x0032 100 100 000Old_age Always - > 508 >193 Load_Cycle_Count 0x0032 200 200 000Old_age Always - >2710 > >This drive is sitting, unused, with no filesystem, and I've performed >approximately zero writes to the disk. > >Having a script kick off and write to a disk will help so long as that >disk is writable; if it's being used as a hot spare in a raidz array, it's >not going to help much. I wouldn't worry in your particular case. A value of 2710 in 508 hours is a rate of 5,33/hour. At this rate, it's going to take you 56285 hours or 2345 days to reach 300,000 and most disks will likely function past 400,000 (over 600,000 all bets are off though). The people who need(ed) to worry were people like me, who were seeing the rate increase at a rate of 43+ per hour. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
booting off GPT partitions
Hey I was under the impression that everyone and their dog is using GPT partitioning in FreeBSD these days, including for boot drives and that I was just being unlucky with my current NAS motherboard (Intel D945GCLF2) having supposedly shaky support for GPT boot. But right now I am having an email exchange with Supermicro support (whom I contacted since I am pondering their X7SPA-H board for a new system), who are telling me that booting off GPT requires UEFI BIOS, which is supposedly a very new thing and that for example NONE of their current motherboards have support for this. Am I misunderstanding something or is the Supermicro support tech misguided? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: one more load-cycle-count problem
>Any further ideas how to get rid of this "feature"? You have several options. 1) The most "clean" solution is probably using the WDIDLE3 utility on your drives to disable automatic parking or, in cases where it's not possible to completely disable it, you can adjust it to 5 minutes, which essentially solves the problem. Note that going this route will probably involve rebuilding your entire array from scratch, because applying WDIDLE3 to a disk is likely to very slightly affect disk geometry, but just enough for hardware RAID or ZFS or whatever to bark at you and refuse to continue using the drive in an existing pool (the affected disk can become very slightly smaller in capacity). Back up your data, apply WDIDLE3 to all disks, recreate the pool, restore backups. This will also void your warranty if used on the new WD drives, although it will still work just fine. 2) A less clean solution would be to set up a script that polls the SMART data of all disks affected by the problem every 8-9 seconds and have this script launch on boot. This will keep the affected drives just busy enough to not park their heads. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
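A rough sketch of option 2, assuming smartmontools is installed and that the affected drives are ad4 and ad6 (hypothetical names); polling SMART every 8 seconds keeps the heads from parking, at the cost of never letting the drives idle:

#!/bin/sh
# keep WD Green heads from parking by touching SMART data regularly;
# start this from /etc/rc.local or a small rc.d script
while true; do
    for disk in /dev/ad4 /dev/ad6; do
        smartctl -A ${disk} > /dev/null 2>&1
    done
    sleep 8
done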
Re: one more load-cycle-count problem
2010/2/8 Gerrit Kühn : > On Mon, 8 Feb 2010 15:43:46 +0200 Dan Naumov wrote > about RE: one more load-cycle-count problem: > > DN> >Any further ideas how to get rid of this "feature"? > > DN> 1) The most "clean" solution is probably using the WDIDLE3 utility on > DN> your drives to disable automatic parking or in cases where its not > DN> possible to complete disable it, you can adjust it to 5 minutes, which > DN> essentially solves the problem. Note that going this route will > DN> probably involve rebuilding your entire array from scratch, because > DN> applying WDIDLE3 to the disk is likely to very slightly affect disk > DN> geometry, but just enough for hardware raid or ZFS or whatever to bark > DN> at you and refuse to continue using the drive in an existing pool (the > DN> affected disk can become very slightly smaller in capacity). Backup > DN> data, apply WDIDLE3 to all disks. Recreate the pool, restore backups. > DN> This will also void your warranty if used on the new WD drives, > DN> although it will still work just fine. > > Thanks for the warning. How on earth can a tool to set the idle time > affect the disk geometry?! WDIDLE3 changes the drive firmware. This is also how WD can detect you've used it on your disk and void your warranty accordingly :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
booting off a ZFS pool consisting of multiple striped mirror vdevs
Hello, I have successfully tested and used a "full ZFS install" of FreeBSD 8.0 on both single disk and mirror disk configurations using both MBR and GPT partitioning. AFAIK, with the more recent -CURRENT and -STABLE it is also possible to boot off a root filesystem located on raidz/raidz2 pools. But what about booting off pools consisting of multiple striped mirror or raidz vdevs? Like this: Assume each disk looks like half of a traditional ZFS mirror root configuration using GPT: 1: freebsd-boot 2: freebsd-swap 3: freebsd-zfs |disk1+disk2| + |disk3+disk4| + |disk5+disk6| My logic tells me that while booting off any of the 6 disks, the boot0 and boot1 stages should obviously work fine, but what about the boot2 stage? Can it properly handle booting off a root filesystem that's striped across 3 mirror vdevs or is booting off a single mirror vdev the best that one can do right now? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
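For reference, this is roughly what the layout described above looks like in commands; a sketch only, with hypothetical device names (ad4, ad6, ... ad14), and on older gpart the sizes may need to be given in sectors rather than with unit suffixes:

# per-disk preparation, repeated for each of the 6 disks:
gpart create -s gpt ad4
gpart add -t freebsd-boot -s 128k ad4
gpart add -t freebsd-swap -s 2g ad4
gpart add -t freebsd-zfs ad4
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad4

# the pool itself, striped across three mirror vdevs:
zpool create tank mirror ad4p3 ad6p3 mirror ad8p3 ad10p3 mirror ad12p3 ad14p3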
managing ZFS automatic mounts - FreeBSD deviates from Solaris?
Hello. From the SUN ZFS Administration Guide: http://docs.sun.com/app/docs/doc/819-5461/gaztn?a=view "If ZFS is currently managing the file system but it is currently unmounted, and the mountpoint property is changed, the file system remains unmounted." This does not seem to be the case in FreeBSD (8.0-RELEASE): = zfs get mounted tank/home NAME PROPERTY VALUE SOURCE tank/home mounted no - zfs set mountpoint=/mnt/home tank/home zfs get mounted tank/home NAME PROPERTY VALUE SOURCE tank/home mounted no - = This might not look like a serious issue at first, until you try doing an installation of FreeBSD from FIXIT, trying to set up multiple filesystems and their mountpoints at the very end of the installation process. For example, if you set the mountpoint of your poolname/rootfs/usr to /usr as one of the finishing touches to the system installation, it will immediately mount the filesystem, instantly breaking your FIXIT environment and you cannot proceed any further. Is this a known issue and/or should I submit a PR? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: managing ZFS automatic mounts - FreeBSD deviates from Solaris?
On Sun, Feb 14, 2010 at 2:24 AM, Dan Naumov wrote: > Hello > > From the SUN ZFS Administration Guide: > http://docs.sun.com/app/docs/doc/819-5461/gaztn?a=view > > "If ZFS is currently managing the file system but it is currently > unmounted, and the mountpoint property is changed, the file system > remains unmounted." > > This does not seem to be the case in FreeBSD (8.0-RELEASE): > > = > zfs get mounted tank/home > NAME PROPERTY VALUE SOURCE > tank/home mounted no - > > zfs set mountpoint=/mnt/home tank/home > > zfs get mounted tank/home > NAME PROPERTY VALUE SOURCE > tank/home mounted no - > = > > This might not look like a serious issue at first, until you try doing > an installation of FreeBSD from FIXIT, trying to setup multiple > filesystems and their mountpoints at the very end of the installation > process. For example if you set the mountpoint of your > poolname/rootfs/usr to /usr as one of the finishing touches to the > system installation, it will immideately mount the filesystem, > instantly breaking your FIXIT environment and you cannot proceed any > further. Is this a known issue and/or should I submit a PR? Oops, I managed to screw up my previous email. My point was to show that "mounted" changes to YES after changing the mountpoint property :) - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
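One way to sidestep the FIXIT breakage described above is to keep the pool imported under an alternate root while doing the install, so that changing mountpoints mounts filesystems under that prefix instead of over the live FIXIT environment; a sketch only, assuming a pool named tank:

# import (or create) the pool with an altroot
zpool import -R /mnt tank

# mountpoint changes now take effect under /mnt, not /
zfs set mountpoint=/usr tank/rootfs/usr    # ends up mounted at /mnt/usr

# export before rebooting into the installed system
zpool export tank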
RE: hardware for home use large storage
> On Sun, 14 Feb 2010, Dan Langille wrote: >> After creating three different system configurations (Athena, >> Supermicro, and HP), my configuration of choice is this Supermicro >> setup: >> >> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >> 2. SuperMicro 5046A $750 (+$43 shipping) >> 3. LSI SAS 3081E-R $235 >> 4. SATA cables $60 >> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >> 6. Xeon W3520 $310 You do realise how much of a massive overkill this is and how much you are overspending? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille wrote: > Dan Naumov wrote: >>> >>> On Sun, 14 Feb 2010, Dan Langille wrote: >>>> >>>> After creating three different system configurations (Athena, >>>> Supermicro, and HP), my configuration of choice is this Supermicro >>>> setup: >>>> >>>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >>>> 2. SuperMicro 5046A $750 (+$43 shipping) >>>> 3. LSI SAS 3081E-R $235 >>>> 4. SATA cables $60 >>>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >>>> 6. Xeon W3520 $310 >> >> You do realise how much of a massive overkill this is and how much you >> are overspending? > > > I appreciate the comments and feedback. I'd also appreciate alternative > suggestions in addition to what you have contributed so far. Spec out the > box you would build. == Case: Fractal Design Define R2 - 89 euro: http://www.fractal-design.com/?view=product&prod=32 Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro: http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H PSU: Corsair 400CX 80+ - 59 euro: http://www.corsair.com/products/cx/default.aspx RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro == Total: ~435 euro The motherboard has 6 native AHCI-capable ports on ICH9R controller and you have a PCI-E slot free if you want to add an additional controller card. Feel free to blow the money you've saved on crazy fast SATA disks and if your system workload is going to have a lot of random reads, then spend 200 euro on a 80gb Intel X25-M for use as a dedicated L2ARC device for your pool. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
On Mon, Feb 15, 2010 at 12:42 AM, Dan Naumov wrote: > On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille wrote: >> Dan Naumov wrote: >>>> >>>> On Sun, 14 Feb 2010, Dan Langille wrote: >>>>> >>>>> After creating three different system configurations (Athena, >>>>> Supermicro, and HP), my configuration of choice is this Supermicro >>>>> setup: >>>>> >>>>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >>>>> 2. SuperMicro 5046A $750 (+$43 shipping) >>>>> 3. LSI SAS 3081E-R $235 >>>>> 4. SATA cables $60 >>>>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >>>>> 6. Xeon W3520 $310 >>> >>> You do realise how much of a massive overkill this is and how much you >>> are overspending? >> >> >> I appreciate the comments and feedback. I'd also appreciate alternative >> suggestions in addition to what you have contributed so far. Spec out the >> box you would build. > > == > Case: Fractal Design Define R2 - 89 euro: > http://www.fractal-design.com/?view=product&prod=32 > > Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro: > http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H > > PSU: Corsair 400CX 80+ - 59 euro: > http://www.corsair.com/products/cx/default.aspx > > RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro > == > Total: ~435 euro > > The motherboard has 6 native AHCI-capable ports on ICH9R controller > and you have a PCI-E slot free if you want to add an additional > controller card. Feel free to blow the money you've saved on crazy > fast SATA disks and if your system workload is going to have a lot of > random reads, then spend 200 euro on a 80gb Intel X25-M for use as a > dedicated L2ARC device for your pool. And to expand a bit, if you want that crazy performance without blowing silly amounts of money: Get a dock for holding 2 x 2,5" disks in a single 5,25" slot and put it at the top, in the only 5,25" bay of the case. Now add an additional PCI-E SATA controller card, like the often mentioned PCIE SIL3124. Now you have 2 x 2,5" disk slots and 8 x 3,5" disk slots, with 6 native SATA ports on the motherboard and more ports on the controller card. Now get 2 x 80gb Intel SSDs and put them into the dock. Now partition each of them in the following fashion: 1: swap: 4-5gb 2: freebsd-zfs: ~10-15gb for root filesystem 3: freebsd-zfs: rest of the disk: dedicated L2ARC vdev GMirror your SSD swap partitions. Make a ZFS mirror pool out of your SSD root filesystem partitions Build your big ZFS pool however you like out of the mechanical disks you have. Add the 2 x ~60gb partitions as dedicated independant L2ARC devices for your SATA disk ZFS pool. Now you have redundant swap, redundant and FAST root filesystem and your ZFS pool of SATA disks has 120gb worth of L2ARC space on the SSDs. The L2ARC vdevs dont need to be redundant, because should an IO error occur while reading off L2ARC, the IO is deferred to the "real" data location on the pool of your SATA disks. You can also remove your L2ARC vdevs from your pool at will, on a live pool. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
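For the record, the SSD layout just described translates into roughly the following commands; this is a sketch with hypothetical device names (ad4/ad6 for the SSDs, ad8-ad14 for the mechanical disks), and the boot partitions and bootcode are left out for brevity:

# partition each SSD the same way (shown for ad4, repeat for ad6):
gpart create -s gpt ad4
gpart add -t freebsd-swap -s 4g ad4
gpart add -t freebsd-zfs -s 15g ad4
gpart add -t freebsd-zfs ad4

# mirrored swap via gmirror (load geom_mirror at boot via loader.conf):
gmirror label -b round-robin swap ad4p1 ad6p1
swapon /dev/mirror/swap

# mirrored root pool on the second partitions:
zpool create zroot mirror ad4p2 ad6p2

# big pool on the mechanical disks, with both leftover SSD partitions
# added as independent L2ARC (cache) devices:
zpool create tank mirror ad8 ad10 mirror ad12 ad14
zpool add tank cache ad4p3 ad6p3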
Re: hardware for home use large storage
>> PSU: Corsair 400CX 80+ - 59 euro - > >> http://www.corsair.com/products/cx/default.aspx > > http://www.newegg.com/Product/Product.aspx?Item=N82E16817139008 for $50 > > Is that sufficient power up to 10 SATA HDD and an optical drive? Disk power use varies from about 8 watts/disk for "green" disks to 20 watts/disk for really power-hungry ones. So yes. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: hardware for home use large storage
> I had a feeling someone would bring up L2ARC/cache devices. This gives > me the opportunity to ask something that's been on my mind for quite > some time now: > > Aside from the capacity different (e.g. 40GB vs. 1GB), is there a > benefit to using a dedicated RAM disk (e.g. md(4)) to a pool for > L2ARC/cache? The ZFS documentation explicitly states that cache > device content is considered volatile. Using a ramdisk as an L2ARC vdev doesn't make any sense at all. If you have RAM to spare, it should be used by regular ARC. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
On Mon, Feb 15, 2010 at 7:14 PM, Dan Langille wrote: > Dan Naumov wrote: >> >> On Sun, Feb 14, 2010 at 11:38 PM, Dan Langille wrote: >>> >>> Dan Naumov wrote: >>>>> >>>>> On Sun, 14 Feb 2010, Dan Langille wrote: >>>>>> >>>>>> After creating three different system configurations (Athena, >>>>>> Supermicro, and HP), my configuration of choice is this Supermicro >>>>>> setup: >>>>>> >>>>>> 1. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) >>>>>> 2. SuperMicro 5046A $750 (+$43 shipping) >>>>>> 3. LSI SAS 3081E-R $235 >>>>>> 4. SATA cables $60 >>>>>> 5. Crucial 3×2G ECC DDR3-1333 $191 (+ $6 shipping) >>>>>> 6. Xeon W3520 $310 >>>> >>>> You do realise how much of a massive overkill this is and how much you >>>> are overspending? >>> >>> I appreciate the comments and feedback. I'd also appreciate alternative >>> suggestions in addition to what you have contributed so far. Spec out >>> the >>> box you would build. >> >> == >> Case: Fractal Design Define R2 - 89 euro: >> http://www.fractal-design.com/?view=product&prod=32 >> >> Mobo/CPU: Supermicro X7SPA-H / Atom D510 - 180-220 euro: >> http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA.cfm?typ=H >> >> PSU: Corsair 400CX 80+ - 59 euro: >> http://www.corsair.com/products/cx/default.aspx >> >> RAM: Corsair 2x2GB, DDR2 800MHz SO-DIMM, CL5 - 85 euro >> == >> Total: ~435 euro >> >> The motherboard has 6 native AHCI-capable ports on ICH9R controller >> and you have a PCI-E slot free if you want to add an additional >> controller card. Feel free to blow the money you've saved on crazy >> fast SATA disks and if your system workload is going to have a lot of >> random reads, then spend 200 euro on a 80gb Intel X25-M for use as a >> dedicated L2ARC device for your pool. > > Based on the Fractal Design case mentioned above, I was told about Lian Lia > cases, which I think are great. As a result, I've gone with a tower case > without hot-swap. The parts are listed at and reproduced below: > > http://dan.langille.org/2010/02/15/a-full-tower-case/ > > 1. LIAN LI PC-A71F Black Aluminum ATX Full Tower Computer Case $240 (from > mwave) > 2. Antec EarthWatts EA650 650W PSU $80 > 3. Samsung SATA CD/DVD Burner $20 (+ $8 shipping) > 4. Intel S3200SHV LGA 775 Intel 3200 m/b $200 > 5. Intel Core2 Quad Q9400 CPU $190 > 6. SATA cables $22 > 7. Supermicro LSI MegaRAID 8 Port SAS RAID Controller $118 > 8. Kingston ValueRAM 4GB (2 x 2GB) 240-Pin DDR2 SDRAM ECC $97 > > Total cost is about $1020 with shipping. Plus HDD. > > No purchases yet, but the above is what appeals to me now. A C2Q CPU makes little sense right now from a performance POV. For the price of that C2Q CPU + LGA775 board you can get an i5 750 CPU and a 1156 socket motherboard that will run circles around that C2Q. You would lose the ECC though, since that requires the more expensive 1366 socket CPUs and boards. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: hardware for home use large storage
>> A C2Q CPU makes little sense right now from a performance POV. For the >> price of that C2Q CPU + LGA775 board you can get an i5 750 CPU and a 1156 >> socket motherboard that will run circles around that C2Q. You would lose >> the ECC though, since that requires the more expensive 1366 socket CPUs >> and boards. >> >> - Sincerely, >> Dan Naumov > > Hi, > > Do have test about this? I'm not really impressed with the i5 series. > > Regards, > Andras There: http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=3634&p=10 The i5 750, which is a 180 euro CPU, beats Q9650 C2Q, which is a 300 euro CPU. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: booting off a ZFS pool consisting of multiple striped mirror vdevs
> I don't know, but I plan to test that scenario in a few days. > > Matt Please share the results when you're done, I am really curious :) > It *should* work... I made changes a while back that allow for multiple > vdevs to attach to the root. In this case you should have 3 mirror > vdevs attached to the root, so as long as the BIOS can enumerate all of > the drives, we should find all of the vdevs and build the tree > correctly. It should be simple enough to test in qemu, except that the > BIOS in qemu is a little broken and might not id all of the drives. > > robert. If booting of a stripe of 3 mirrors should work assuming no BIOS bugs, can you explain why is booting off simple stripes (of any number of disks) currently unsupported? I haven't tested that myself, but everywhere I look seems to indicate that booting off a simple stripe doesn't work or is that "everywhere" also out of date after your changes? :) - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: booting off a ZFS pool consisting of multiple striped mirror vdevs
On Fri, Feb 19, 2010 at 1:03 AM, Matt Reimer wrote: > On Thu, Feb 18, 2010 at 10:57 AM, Matt Reimer wrote: >> >> On Tue, Feb 16, 2010 at 12:38 AM, Dan Naumov wrote: >>> >>> > I don't know, but I plan to test that scenario in a few days. >>> > >>> > Matt >>> >>> Please share the results when you're done, I am really curious :) >> >> Booting from a stripe of two raidz vdevs works: >> FreeBSD/i386 boot >> Default: doom:/boot/zfsloader >> boot: status >> pool: doom >> config: >> NAME STATE >> doom ONLINE >> raidz1 ONLINE >> label/doom-0 ONLINE >> label/doom-1 ONLINE >> label/doom-2 ONLINE >> raidz1 ONLINE >> label/doom-3 ONLINE >> label/doom-4 ONLINE >> label/doom-5 ONLINE >> I'd guess a stripe of mirrors would work fine too. If I get a chance I'll >> test that combo. > > A stripe of three-way mirrors works: > FreeBSD/i386 boot > Default: mithril:/boot/zfsloader > boot: status > pool: mithril > config: > NAME STATE > mithril ONLINE > mirror ONLINE > label/mithril-0 ONLINE > label/mithril-1 ONLINE > label/mithril-2 ONLINE > mirror ONLINE > label/mithril-3 ONLINE > label/mithril-4 ONLINE > label/mithril-5 ONLINE > Matt A stripe of 3-way mirrors, whoa. Out of curiosity, what is the system used for? I am not doubting that there exist some uses/workloads for a system that uses 6 disks with 2 disks worth of usable space, but that's a bit of an unusual configuration. What are your system/disc specs and what kind of performance are you seeing from the pool? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 8.0-RELEASE/amd64 - full ZFS install - low read and write disk performance (fixed)
Hello folks. A few weeks ago, there was a discussion started by me regarding abysmal read/write performance using a ZFS mirror on 8.0-RELEASE. I was using an Atom 330 system with 2GB RAM and it was pointed out to me that my problem was most likely having both disks attached to a PCI SIL3124 controller; switching to the new AHCI drivers didn't help one bit. To reiterate, here are the Bonnie and DD numbers I got on that system: === Atom 330 / 2gb ram / Intel board + PCI SIL3124 ---Sequential Output ---Sequential Input-- --Random-- -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU 8192 21041 53.5 22644 19.4 13724 12.8 25321 48.5 43110 14.0 143.2 3.3 dd if=/dev/zero of=/root/test1 bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes transferred in 143.878615 secs (29851325 bytes/sec) (28,4 mb/s) === Since then, I switched the exact same disks to a different system: Atom D510 / 4gb ram / Supermicro X7SPA-H / ICH9R controller (native). Here are the updated results: ---Sequential Output ---Sequential Input-- --Random-- -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU 8192 30057 68.7 50965 36.4 27236 21.3 33317 58.0 53051 14.3 172.4 3.2 dd if=/dev/zero of=/root/test1 bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes transferred in 54.977978 secs (78121594 bytes/sec) (74,5 mb/s) === Write performance now seems to have increased by a factor of 2 to 3 and is now definitely in line with the expected performance of the disks in question (cheap 2TB WD20EADS with 32mb cache). Thanks to everyone who has offered help and tips! - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
8.0 on new hardware and a few errors, should I be worried?
Hello I've very recently finished installing 8.0-RELEASE on some new hardware and I noticed a few error messages that make me a bit uneasy. This is a snip from my dmesg: -- acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of fee0, 1000 (3) failed acpi0: reservation of 0, a (3) failed acpi0: reservation of 10, bf60 (3) failed -- What do these mean and should I worry about it? The full DMESG can be viewed here: http://jago.pp.fi/temp/dmesg.txt Additionally, while building a whole bunch of ports on this new system (about 30 or so, samba, ncftp, portaudit, bash, the usual suspects), I noticed the following in my logs during the build process: -- Feb 27 21:24:01 atombsd kernel: pid 38846 (try), uid 0: exited on signal 10 (core dumped) Feb 27 22:17:49 atombsd kernel: pid 89665 (conftest), uid 0: exited on signal 6 (core dumped) -- All ports seem to have built and installed succesfully. Again, what do these mean and should I worry about it? :) Thanks! - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
powerd on 8.0, is it considered safe?
Hello Is powerd finally considered stable and safe to use on 8.0? At least on 7.2, it consistently caused panics when used on Atom systems with Hyper-Threading enabled, but I recall that Attilio Rao was looking into it. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: powerd on 8.0, is it considered safe?
Okay, now I am baffled. Up until this point, I wasn't using powerd on this new Atom D510 system. I ran sysctl and noticed that dev.cpu.0.freq: is actually 1249 and doesn't change no matter what kind of load the system is under. If I boot to BIOS, under BIOS CPU is shown as 1,66 Ghz. Okayy... I guess this explains why my buildworld and buildkernel took over 5 hours if by default, it gets stuck at 1249 Mhz for no obvious reason. I enabled powerd and now according to dev.cpu.0.freq:, the system is permanently stuck at 1666 Mhz, regardless of whether the system is under load or not. atombsd# uname -a FreeBSD atombsd.localdomain 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan 5 21:11:58 UTC 2010 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 atombsd# kenv | grep smbios.planar.product smbios.planar.product="X7SPA-H" atombsd# sysctl dev.cpu dev.est dev.cpufreq dev.p4tcc debug.cpufreq kern.timecounter dev.cpu.0.%desc: ACPI CPU dev.cpu.0.%driver: cpu dev.cpu.0.%location: handle=\_PR_.P001 dev.cpu.0.%pnpinfo: _HID=none _UID=0 dev.cpu.0.%parent: acpi0 dev.cpu.0.freq: 1666 dev.cpu.0.freq_levels: 1666/-1 1457/-1 1249/-1 1041/-1 833/-1 624/-1 416/-1 208/-1 dev.cpu.0.cx_supported: C1/0 dev.cpu.0.cx_lowest: C1 dev.cpu.0.cx_usage: 100.00% last 500us dev.cpu.1.%desc: ACPI CPU dev.cpu.1.%driver: cpu dev.cpu.1.%location: handle=\_PR_.P002 dev.cpu.1.%pnpinfo: _HID=none _UID=0 dev.cpu.1.%parent: acpi0 dev.cpu.1.cx_supported: C1/0 dev.cpu.1.cx_lowest: C1 dev.cpu.1.cx_usage: 100.00% last 500us dev.cpu.2.%desc: ACPI CPU dev.cpu.2.%driver: cpu dev.cpu.2.%location: handle=\_PR_.P003 dev.cpu.2.%pnpinfo: _HID=none _UID=0 dev.cpu.2.%parent: acpi0 dev.cpu.2.cx_supported: C1/0 dev.cpu.2.cx_lowest: C1 dev.cpu.2.cx_usage: 100.00% last 500us dev.cpu.3.%desc: ACPI CPU dev.cpu.3.%driver: cpu dev.cpu.3.%location: handle=\_PR_.P004 dev.cpu.3.%pnpinfo: _HID=none _UID=0 dev.cpu.3.%parent: acpi0 dev.cpu.3.cx_supported: C1/0 dev.cpu.3.cx_lowest: C1 dev.cpu.3.cx_usage: 100.00% last 500us sysctl: unknown oid 'dev.est' Right. So how do I investigate why does the CPU get stuck at 1249 Mhz after boot by default when not using powerd and why it gets stuck at 1666 Mhz with powerd enabled and doesn't scale back down when IDLE? Out of curiosity, I stopped powerd but the CPU remained at 1666 Mhz. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: powerd on 8.0, is it considered safe?
>Up until this point, I wasn't using powerd on this new Atom D510 >system. I ran sysctl and noticed that dev.cpu.0.freq: is actually 1249 >and doesn't change no matter what kind of load the system is under. If >I boot to BIOS, under BIOS CPU is shown as 1,66 Ghz. Okayy... I guess >this explains why my buildworld and buildkernel took over 5 hours if >by default, it gets stuck at 1249 Mhz for no obvious reason. I enabled >powerd and now according to dev.cpu.0.freq:, the system is permanently >stuck at 1666 Mhz, regardless of whether the system is under load or >not. OK, a reboot somehow fixed the powerd issue: 1) Disabled powerd 2) Rebooted 3) Upon bootup, checked dev.cpu.0.freq - it's stuck at 1249 (should be 1666 by default) 4) Enabled and started powerd - the CPU scales correctly according to load. There is some bug somewhere though, because something puts my CPU at 1249 MHz upon boot with powerd disabled and it gets stuck there; this shouldn't happen. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: powerd on 8.0, is it considered safe?
OK, now I feel a bit stupid. The second half of my PR at http://www.freebsd.org/cgi/query-pr.cgi?pr=144551 (anything related to powerd behaviour) can be ignored. For testing purposes, I started powerd in the foreground and observed its behaviour. It works exactly as advertised: apparently the very act of issuing a "sysctl -a | grep dev.cpu.0.freq" command uses up a high % of CPU time for a fraction of a second, resulting in confusing output, so I was always getting the highest CPU frequency state as the output. Testing powerd in the foreground, however, shows correct behaviour: the CPU is downclocked both before and after issuing that command :) Still doesn't explain why the system boots up at 1249 MHz, but that's not that big of an issue at this point, now that I see powerd is behaving correctly. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
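For anyone wanting to repeat the test, this is roughly how it was done; powerd -v stays in the foreground and prints every frequency transition, and the rc.conf line is just the usual way of enabling it permanently:

# watch transitions live (Ctrl-C to stop):
powerd -v

# enable it normally afterwards:
echo 'powerd_enable="YES"' >> /etc/rc.conf
/etc/rc.d/powerd start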
Samba read speed performance tuning
On a FreeBSD 8.0-RELEASE/amd64 system with a Supermicro X7SPA-H board using an Intel gigabit NIC with the em driver, running on top of a ZFS mirror, I was seeing a strange issue. Local reads and writes to the pool easily saturate the disks with roughly 75mb/s throughput, which is roughly the best these drives can do. However, working with Samba, writes to a share could easily pull off 75mb/s and saturate the disks, but reads off a share were resulting in a rather pathetic 18mb/s throughput. I found a thread on the FreeBSD forums (http://forums.freebsd.org/showthread.php?t=9187) and followed the suggested advice. I rebuilt Samba with AIO support, kldloaded the aio module and made the following changes to my smb.conf. From: socket options=TCP_NODELAY To: socket options=SO_RCVBUF=131072 SO_SNDBUF=131072 TCP_NODELAY min receivefile size=16384 use sendfile=true aio read size = 16384 aio write size = 16384 aio write behind = true dns proxy = no This showed a very welcome improvement in read speed: I went from 18mb/s to 48mb/s. The write speed remained unchanged and was still saturating the disks. Next I tried the suggested sysctl tunables: atombsd# sysctl net.inet.tcp.delayed_ack=0 net.inet.tcp.delayed_ack: 1 -> 0 atombsd# sysctl net.inet.tcp.path_mtu_discovery=0 net.inet.tcp.path_mtu_discovery: 1 -> 0 atombsd# sysctl net.inet.tcp.recvbuf_inc=524288 net.inet.tcp.recvbuf_inc: 16384 -> 524288 atombsd# sysctl net.inet.tcp.recvbuf_max=16777216 net.inet.tcp.recvbuf_max: 262144 -> 16777216 atombsd# sysctl net.inet.tcp.sendbuf_inc=524288 net.inet.tcp.sendbuf_inc: 8192 -> 524288 atombsd# sysctl net.inet.tcp.sendbuf_max=16777216 net.inet.tcp.sendbuf_max: 262144 -> 16777216 atombsd# sysctl net.inet.tcp.sendspace=65536 net.inet.tcp.sendspace: 32768 -> 65536 atombsd# sysctl net.inet.udp.maxdgram=57344 net.inet.udp.maxdgram: 9216 -> 57344 atombsd# sysctl net.inet.udp.recvspace=65536 net.inet.udp.recvspace: 42080 -> 65536 atombsd# sysctl net.local.stream.recvspace=65536 net.local.stream.recvspace: 8192 -> 65536 atombsd# sysctl net.local.stream.sendspace=65536 net.local.stream.sendspace: 8192 -> 65536 This improved the read speed a further tiny bit: now I went from 48mb/s to 54mb/s. That's as far as I've gotten, however; I can't figure out how to increase Samba read speed any further. Any ideas? ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Samba read speed performance tuning
On Fri, Mar 19, 2010 at 11:14 PM, Dan Naumov wrote: > On a FreeBSD 8.0-RELEASE/amd64 system with a Supermicro X7SPA-H board > using an Intel gigabit nic with the em driver, running on top of a ZFS > mirror, I was seeing a strange issue. Local reads and writes to the > pool easily saturate the disks with roughly 75mb/s throughput, which > is roughly the best these drives can do. However, working with Samba, > writes to a share could easily pull off 75mb/s and saturate the disks, > but reads off a share were resulting in rather pathetic 18mb/s > throughput. > > I found a threadon the FreeBSD forums > (http://forums.freebsd.org/showthread.php?t=9187) and followed the > suggested advice. I rebuilt Samba with AIO support, kldloaded the aio > module and made the following changes to my smb.conf > > From: > socket options=TCP_NODELAY > > To: > socket options=SO_RCVBUF=131072 SO_SNDBUF=131072 TCP_NODELAY > min receivefile size=16384 > use sendfile=true > aio read size = 16384 > aio write size = 16384 > aio write behind = true > dns proxy = no[/CODE] > > This showed a very welcome improvement in read speed, I went from > 18mb/s to 48mb/s. The write speed remained unchanged and was still > saturating the disks. Now I tried the suggested sysctl tunables: > > atombsd# sysctl net.inet.tcp.delayed_ack=0 > net.inet.tcp.delayed_ack: 1 -> 0 > > atombsd# sysctl net.inet.tcp.path_mtu_discovery=0 > net.inet.tcp.path_mtu_discovery: 1 -> 0 > > atombsd# sysctl net.inet.tcp.recvbuf_inc=524288 > net.inet.tcp.recvbuf_inc: 16384 -> 524288 > > atombsd# sysctl net.inet.tcp.recvbuf_max=16777216 > net.inet.tcp.recvbuf_max: 262144 -> 16777216 > > atombsd# sysctl net.inet.tcp.sendbuf_inc=524288 > net.inet.tcp.sendbuf_inc: 8192 -> 524288 > > atombsd# sysctl net.inet.tcp.sendbuf_max=16777216 > net.inet.tcp.sendbuf_max: 262144 -> 16777216 > > atombsd# sysctl net.inet.tcp.sendspace=65536 > net.inet.tcp.sendspace: 32768 -> 65536 > > atombsd# sysctl net.inet.udp.maxdgram=57344 > net.inet.udp.maxdgram: 9216 -> 57344 > > atombsd# sysctl net.inet.udp.recvspace=65536 > net.inet.udp.recvspace: 42080 -> 65536 > > atombsd# sysctl net.local.stream.recvspace=65536 > net.local.stream.recvspace: 8192 -> 65536 > > atombsd# sysctl net.local.stream.sendspace=65536 > net.local.stream.sendspace: 8192 -> 65536 > > This improved the read speeds a further tiny bit, now I went from > 48mb/s to 54mb/s. This is it however, I can't figure out how to > increase Samba read speed any further. Any ideas? Oh my god... Why did noone tell me how much of an enormous performance boost vfs.zfs.prefetch_disable=0 (aka actually enabling prefetch) is. My local reads off the mirror pool jumped from 75mb/s to 96mb/s (ie. they are now nearly 25% faster than reading off an individual disk) and reads off a Samba share skyrocketed from 50mb/s to 90mb/s. By default, FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and on any amd64 systems with less than 4GB of avaiable memory. My system is amd64 with 4gb ram, but integrated video eats some of that, so the autotuning disabled the prefetch. I had read up on it and a fair amount of people seemed to have performance issues caused by having prefetch enabled and get better results with it turned off, in my case however, it seems that enabling it gave a really solid boost to performance. - Sincerely Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
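For anyone wanting to force the same behaviour on a system where the auto-tuning disables prefetch, note that this is a loader tunable rather than a runtime sysctl, so it belongs in /boot/loader.conf and takes effect on the next boot:

# /boot/loader.conf
vfs.zfs.prefetch_disable="0"    # 0 = prefetch enabled, 1 = disabled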
Re: Samba read speed performance tuning
On Sat, Mar 20, 2010 at 3:49 AM, Gary Gatten wrote: > It MAY make a big diff, but make sure during your tests you use unique files > or flush the cache or you'll me testing cache speed and not disk speed. Yeah I did make sure to use unique files for testing the effects of prefetch. This is Atom D510 / Supermicro X75SPA-H / 4Gb Ram with 2 x slow 2tb WD Green (WD20EADS) disks with 32mb cache in a ZFS mirror after enabling prefetch.: Code: bonnie -s 8192 ---Sequential Output ---Sequential Input-- --Random-- -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- MachineMB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU 8192 29065 68.9 52027 39.8 39636 33.3 54057 95.4 105335 34.6 174.1 7.9 DD read: dd if=/dev/urandom of=test2 bs=1M count=8192 dd if=test2 of=/dev/zero bs=1M 8589934592 bytes transferred in 76.031399 secs (112978779 bytes/sec) (107,74mb/s) Individual disks read capability: 75mb/s Reading off a mirror of 2 disks with prefetch disabled: 60mb/s Reading off a mirror of 2 disks with prefetch enabled: 107mb/s - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
RE: Can't boot after make installworld
The ZFS bootloader has been changed in 8-STABLE compared to 8.0-RELEASE. Reinstall your boot blocks. P.S: "LOADER_ZFS_SUPPORT=YES" is also deprecated in 8-STABLE, not to mention that you have it in the wrong place, for 8.0, it goes into make.conf, not src.conf. Is there any particular reason you are upgrading from a production release to a development branch of the OS? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
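For a GPT-partitioned ZFS root like the setups discussed earlier in this thread, reinstalling the boot blocks after the upgrade would look roughly like this (assuming the freebsd-boot partition is index 1 on ad4; MBR-based installs use a different procedure):

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad4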
Re: Can't boot after make installworld
> I've read that FreeBSD kernel supports 3D acceleration in ATI R7xx > chipset and as I own motherboard with HD3300 built-in I thought that I > would give it a try. I upgraded to see if there is any progress with > ¿zfs? I don't really know if it's zfs related, but at certain load, my > system crashes, and reboots. It happens only when using bonnie++ to > benchmark I/O. And I'm a little bit to lazy to prepare my system for > coredumps - I don't have swap slice for crashdumps, because I wanted > to simplify adding drives to my raidz1 configuration. Could anyone > tell me what's needed, besides having swap to produce good crashdump? As of right now, even if you don't care about capability to take crash dumps, it is highly recommended to still use traditional swap partitions even if your system is otherwise fully ZFS. There are know stability problems involving using a ZVOL as a swap device. These issues are being worked on, but this is still the situation as of now. > At first I didn't knew that I am upgrading to bleeding edge/developer > branch of FreeBSD. I'll come straight out with it, 8.0-STABLE sounds > more stable than 8.0-RELEASE-p2, which I was running before upgrade ;) > I'm a little confused with FreeBSD release cycle at first I compared > it with Debian release cycle, because I'm most familiar to it, and I > used it a lot before using FreeBSD. Debian development is more > one-dimensional - unstable/testing/stable/oldstable whereas FreeBSD > has two stable branches - 8.0 and 7.2 which are actively developed. > But still I am confused with FreeBSD naming and it's relation with > tags which are used in standard-supfile. We have something like this: > 9.0-CURRENT -> tag=. > 8.0-STABLE -> tag=RELENG_8 > 8.0-RELEASE-p2 -> tag=RELENG_8_0 ? (btw what does p2 mean?) > If someone patient could explain it to me I'd be grateful. 9-CURRENT: the real crazyland 8-STABLE: a dev branch, from which 8.0 was tagged and eventually 8.1 will be RELENG_8_0: 8.0-RELEASE + latest critical security and reliability updates (8.0 is up to patchset #2, hence -p2) Same line of thinking applies to 7-STABLE, 7.3-RELEASE and so on. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Can't boot after make installworld
On Mon, Mar 22, 2010 at 10:41 PM, Krzysztof Dajka wrote: > I've read that FreeBSD kernel supports 3D acceleration in ATI R7xx > chipset and as I own motherboard with HD3300 built-in I thought that I > would give it a try. I upgraded to see if there is any progress with > ¿zfs? I don't really know if it's zfs related, but at certain load, my > system crashes, and reboots. It happens only when using bonnie++ to > benchmark I/O. If you can consistently panic your 8.0 system with just bonnie++ alone, something is really really wrong. Are you using an amd64 system with 2gb ram or more or is this i386 + 1-2gb ram? Amd64 systems with 2gb ram or more don't really usually require any tuning whatsoever (except for tweaking performance for a specific workload), but if this is i386, tuning will be generally required to archieve stability. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
ZFS on top of GELI / Intel Atom 330 system
Is there anyone here using ZFS on top of a GELI-encrypted provider on hardware which could be considered "slow" by today's standards? What are the performance implications of doing this? The reason I am asking is that I am in the process of building a small home NAS/webserver, starting with a single disk (intending to expand as the need arises) on the following hardware: http://www.tranquilpc-shop.co.uk/acatalog/BAREBONE_SERVERS.html This is essentially an Intel Atom 330 1.6 GHz dual-core on an Intel D945GCLF2-based board with 2GB RAM; the first disk I am going to use is a 1.5TB Western Digital Caviar Green. I had someone run a few openssl crypto benchmarks (to unscientifically assess the maximum possible GELI performance) on a machine running FreeBSD on nearly the same hardware and it seems the CPU would become the bottleneck at roughly 200 MB/s throughput when using 128 bit Blowfish, 70 MB/s when using AES128 and 55 MB/s when using AES256. This, on its own, is definitely enough for my needs (especially in the case of using Blowfish), but what are the performance implications of using ZFS on top of a GELI-encrypted provider? Also, feel free to criticize my planned filesystem layout for the first disk of this system; the idea behind /mnt/sysbackup is to take a snapshot of the FreeBSD installation and its settings before doing potentially hazardous things like upgrading to a new -RELEASE: ad1s1 (freebsd system slice): ad1s1a => 128bit Blowfish ad1s1a.eli, 4GB, swap; ad1s1b, 128GB, ufs2+s, /; ad1s1c, 128GB, ufs2+s, noauto, /mnt/sysbackup; ad1s2 => 128bit Blowfish ad1s2.eli, zpool, /home and /mnt/data1. Thanks for your input. - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
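For what it's worth, the encrypted part of that layout would be set up roughly as follows; a sketch only, with the Blowfish key length taken from the plan above and the pool/filesystem names invented for the example:

# swap slice: a one-time key is generated at every boot when the .eli
# device is listed in /etc/fstab, e.g.:
# /dev/ad1s1a.eli  none  swap  sw  0  0

# data slice: interactive passphrase, then ZFS on top of the .eli provider
geli init -e blowfish -l 128 /dev/ad1s2
geli attach /dev/ad1s2
zpool create tank /dev/ad1s2.eli
zfs create -o mountpoint=/home tank/home
zfs create -o mountpoint=/mnt/data1 tank/data1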
Re: ZFS on top of GELI / Intel Atom 330 system
Ouch, that does indeed sounds quite slow, especially considering that a dual core Athlon 6400 is pretty fast CPU. Have you done any comparison benchmarks between UFS2 with Softupdates and ZFS on the same system? What are the read/write numbers like? Have you done any investigating regarding possible causes of ZFS working so slow on your system? Just wondering if its an ATA chipset problem, a drive problem, a ZFS problem or what... - Dan Naumov On Fri, May 29, 2009 at 12:10 PM, Pete French wrote: >> Is there anyone here using ZFS on top of a GELI-encrypted provider on >> hardware which could be considered "slow" by today's standards? What > > I run a mirrored zpool on top of a pair of 1TB SATA drives - they are > only 7200 rpm so pretty dog slow as far as I'm concerned. The > CPOU is a dual core Athlon 6400, and I am running amd64. The performance > is not brilliant - about 25 meg/second writing a file, and about > 53 meg/second reading it. > > It's a bit dissapointing really - thats a lot slower that I expected > when I built it, especially the write speed. > > -pete. > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
Thank you for your numbers, now I know what to expect when I get my new machine, since our system specs look identical. So basically on this system: unencrypted ZFS read: ~70 MB/s per disk 128bit Blowfish GELI/ZFS write: 35 MB/s per disk 128bit Blowfish GELI/ZFS read: 24 MB/s per disk I am curious what part of GELI is so inefficient to cause such a dramatic slowdown. In comparison, my home desktop is a C2D E6600 2,4 Ghz, 4GB RAM, Intel DP35DP, 1 x 1,5TB Seagate Barracuda - Windows Vista x64 SP1 Read/Write on an unencrypted NTFS partition: ~85 MB/s Read/Write on a Truecrypt AES-encrypted NTFS partition: ~65 MB/s As you can see, the performance drop is noticeable, but not anywhere nearly as dramatic. - Dan Naumov > I have a zpool mirror on top of two 128bit GELI blowfish devices with > Sectorsize 4096, my system is a D945GCLF2 with 2GB RAM and a Intel Arom > 330 1.6 Ghz dualcore. The two disks are a WDC WD10EADS and a WD10EACS > (5400rpm). The system is running 8.0-CURRENT amd64. I have set > kern.geom.eli.threads=3. > > This is far from a real benchmarks but: > > Using dd with bs=4m I get 35 MByte/s writing to the mirror (writing 35 > MByte/s to each disk) and 48 MByte/s reading from the mirror (reading > with 24 MByte/s from each disk). > > My experience is that ZFS is not much of an overhead and will not > degrade the performance as much as the encryption, so GELI is the > limiting factor. Using ZFS without GELI on this system gives way higher > read and write numbers, like reading with 70 MByte/s per disk etc. > > greetings, > philipp ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
Now that I have evaluated the numbers and my needs a bit, I am really confused about what appropriate course of action for me would be. 1) Use ZFS without GELI and hope that zfs-crypto get implemented in Solaris and ported to FreeBSD "soon" and that when it does, it won't come with such a dramatic performance decrease as GELI/ZFS seems to result in. 2) Go ahead with the original plan of using GELI/ZFS and grind my teeth at the 24 MB/s read speed off a single disk. >> So basically on this system: >> >> unencrypted ZFS read: ~70 MB/s per disk >> >> 128bit Blowfish GELI/ZFS write: 35 MB/s per disk >> 128bit Blowfish GELI/ZFS read: 24 MB/s per disk > I'm in the same spot as you, planning to build a home NAS. I have > settled for graid5/geli but haven't yet decided if I would benefit most > from a dual core CPU at 3+ GHz or a quad core at 2.6. Budget is a concern... Our difference is that my hardware is already ordered and Intel Atom 330 + D945GCLF2 + 2GB ram is what it's going to have :) - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
Pardon my ignorance, but what do these numbers mean and what information is deductible from them? - Dan Naumov > I don't mean to take this off-topic wrt -stable but just > for fun, I built a -current kernel with dtrace and did: > > geli onetime gzero > ./hotkernel & > dd if=/dev/zero of=/dev/gzero.eli bs=1m count=1024 > killall dtrace > geli detach gzero > > The hot spots: > [snip stuff under 0.3%] > kernel`g_eli_crypto_run 50 0.3% > kernel`_mtx_assert 56 0.3% > kernel`SHA256_Final 58 0.3% > kernel`rijndael_encrypt 72 0.4% > kernel`_mtx_unlock_flags 74 0.4% > kernel`rijndael128_encrypt 74 0.4% > kernel`copyout 92 0.5% > kernel`_mtx_lock_flags 93 0.5% > kernel`bzero 114 0.6% > kernel`spinlock_exit 240 1.3% > kernel`bcopy 325 1.7% > kernel`sched_idletd 810 4.3% > kernel`swcr_process 1126 6.0% > kernel`SHA256_Transform 1178 6.3% > kernel`rijndaelEncrypt 5574 29.7% > kernel`acpi_cpu_c1 8383 44.6% > > I had to build crypto and geom_eli into the kernel to get proper > symbols. > > References: > http://wiki.freebsd.org/DTrace > http://www.brendangregg.com/DTrace/hotkernel > > --Emil ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
ZFS NAS configuration question
Hey, I am not entirely sure if this question belongs here or on another list, so feel free to direct me elsewhere :) Anyways, I am trying to figure out the best way to configure a NAS system I will soon get my hands on: it's a Tranquil BBS2 ( http://www.tranquilpc-shop.co.uk/acatalog/BAREBONE_SERVERS.html ), which has 5 SATA ports. Due to budget constraints, I have to start small, either with a single 1,5 TB drive or at most a small 500 GB system drive + a 1,5 TB drive to get started with ZFS. What I am looking for is a configuration that would offer the maximum possible storage, while having at least _some_ redundancy and the possibility to grow the storage pool without having to reload the entire setup. Using ZFS root right now seems to involve a fair bit of trickery (you need to make an .ISO snapshot of -STABLE, burn it, boot from it, install from within a fixit environment, boot into your ZFS root and then make and install world again to fix the permissions). To top that off, even when/if you do it right, your entire disk doesn't go to ZFS anyway, because you still need swap and a /boot that are non-ZFS, so you have to install ZFS onto a slice and not the entire disk, and even SUN discourages doing that. Additionally, there seems to be at least one reported case of a system failing to boot after having done installworld on a ZFS root: the installworld process removes the old libc, tries to install a new one and, due to failing to apply some flags which ZFS doesn't support, leaves it uninstalled, leaving the system in an unusable state. This can be worked around, but gotchas like this and the amount of work involved in getting the whole thing running make me really lean towards having a smaller traditional UFS2 system disk for FreeBSD itself. So, this leaves me with 1 SATA port used for a FreeBSD disk and 4 SATA ports available for tinkering with ZFS. What would make the most sense if I am starting with 1 disk for ZFS and eventually plan on having 4 and want to maximise storage, yet have SOME redundancy in case of a disk failure? Am I stuck with 2 x 2 disk mirrors or is there some 3+1 configuration possible? Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS NAS configuration question
Is the idea behind leaving 1GB unused on each disk to work around the problem of potentially being unable to replace a failed device in a ZFS pool because the 1TB replacement you bought actually has a lower sector count than your previous 1TB drive (since the replacement device has to be either exactly the same size as or bigger than the old device)? - Dan Naumov On Sat, May 30, 2009 at 10:06 PM, Louis Mamakos wrote: > I built a system recently with 5 drives and ZFS. I'm not booting off a ZFS > root, though it does mount a ZFS file system once the system has booted from > a UFS file system. Rather than dedicate drives, I simply partitioned each > of the drives into a 1G partition ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS on top of GELI / Intel Atom 330 system
I am pretty sure that adding more disks wouldn't solve anything in this case; only a faster CPU or a faster crypto system would. When you are capable of 70 MB/s reads on a single unencrypted disk, but only 24 MB/s reads off the same disk while encrypted, your disk speed isn't the problem. - Dan Naumov On Sun, May 31, 2009 at 5:29 PM, Ronald Klop wrote: > On Fri, 29 May 2009 13:34:57 +0200, Dan Naumov wrote: > >> Now that I have evaluated the numbers and my needs a bit, I am really >> confused about what appropriate course of action for me would be. >> >> 1) Use ZFS without GELI and hope that zfs-crypto get implemented in >> Solaris and ported to FreeBSD "soon" and that when it does, it won't >> come with such a dramatic performance decrease as GELI/ZFS seems to >> result in. >> 2) Go ahead with the original plan of using GELI/ZFS and grind my >> teeth at the 24 MB/s read speed off a single disk. > > 3) Add extra disks. It will speed up reading. One disk extra will about > double the read speed. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
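A quick way to confirm that the ceiling is the CPU rather than the disk is to repeat Emil's earlier gzero test: gzero discards every write, so pushing data through a one-time GELI layer on top of it measures raw encryption throughput with no disk in the path. A sketch, assuming the geom_zero module is available:

kldload geom_zero                                   # provides /dev/gzero, which discards all writes
geli onetime gzero                                  # throwaway one-time key, no passphrase involved
dd if=/dev/zero of=/dev/gzero.eli bs=1m count=1024  # the reported rate is roughly the crypto ceiling
geli detach gzero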
Re: ZFS on top of GELI / Intel Atom 330 system
Hi. Since you are suggesting 2 x 8GB USB for a root partition, what is your experience with read/write speed and lifetime expectation of modern USB sticks under FreeBSD, and why 2 of them, a GEOM mirror? - Dan Naumov > Hi Dan, > > everybody has different needs, but what exactly are you doing with 128GB > of / ? What I did is the following: > > 2GB CF card + CF to ATA adapter (today, I would use 2x8GB USB sticks, > CF2ATA adapters suck, but then again, which Mobo has internal USB ports?) > > Filesystem 1024-blocks Used Avail Capacity Mounted on > /dev/ad0a 507630 139740 327280 30% / > /dev/ad0d 1453102 1292296 44558 97% /usr > /dev/md0 253678 16 233368 0% /tmp > > /usr is quite crowded, but I just need to clean up some ports again. > /var, /usr/src, /home, /usr/obj, /usr/ports are all on the GELI+ZFS > pool. If /usr turns out to be to small, I can also move /usr/local > there. That way booting and single user involves trusty old UFS only. > > I also do regular dumps from the UFS filesystems to the ZFS tank, but > there's really no sacred data under / or /usr that I would miss if the > system crashed (all configuration changes are tracked using mercurial). > > Anyway, my point is to use the full disks for GELI+ZFS whenever > possible. This makes it more easy to replace faulty disks or grow ZFS > pools. The FreeBSD base system, I would put somewhere else. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
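For the "why 2 of them" part, the usual answer is indeed gmirror(8). A minimal sketch, assuming the sticks appear as da0 and da1 and leaving out the partitioning and boot blocks a real bootable root would also need:

kldload geom_mirror                                # or geom_mirror_load="YES" in /boot/loader.conf
gmirror label -v -b round-robin usbroot /dev/da0 /dev/da1
newfs -U /dev/mirror/usbroot
mount /dev/mirror/usbroot /mnt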
Re: ZFS NAS configuration question
A USB root partition for booting off UFS is something I have considered. I have looked around, and all the "install FreeBSD onto USB stick" guides seem to involve a lot of manual work from a fixit environment; does sysinstall not recognise USB drives as a valid disk device to partition/label/install FreeBSD on? If I do go with a USB boot/root, what things should I absolutely keep on it and which are "safe" to move to a ZFS pool? The idea is that in case my ZFS configuration goes bonkers for some reason, I still have a fully workable singleuser configuration to boot from for recovery. I haven't really used USB flash for many years, but I remember that when they first started appearing on the shelves, they became well known for their horrible reliability (a stick would die within a year of use, etc). Have they improved to the point of being good enough to host a root partition on, without having to set up some crazy GEOM mirror using 2 of them? - Dan Naumov 2009/6/2 Gerrit Kühn > On Sat, 30 May 2009 21:41:36 +0300 Dan Naumov wrote > about ZFS NAS configuration question: > > DN> So, this leaves me with 1 SATA port used for a FreeBSD disk and 4 SATA > DN> ports available for tinketing with ZFS. > > Do you have a USB port available to boot from? A conventional USB stick (I > use 4 GB or 8GB these days, but smaller ones would certainly also do) is > enough to hold the base system on UFS, and you can give the whole of your > disks to ZFS without having to bother with booting from them. > > > cu > Gerrit > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS NAS configuration question
This reminds me. I was reading the release and upgrade notes of OpenSolaris 2009.06 and noted one thing about upgrading from a previous version to the new one: When you pick the "upgrade OS" option in the OpenSolaris installer, it checks whether you are using a ZFS root partition and, if you are, it intelligently suggests taking a current snapshot of the root filesystem. After you finish the upgrade and reboot, the boot menu offers you the option of booting the newly upgraded version of the OS or alternatively _booting from the snapshot taken by the upgrade installation procedure_. Reading that made me pause for a second and go "WOW": this is how UNIX system upgrades should be done. Any hope of us lowly users ever seeing something like this implemented in FreeBSD? :) - Dan Naumov On Tue, Jun 2, 2009 at 9:47 PM, Zaphod Beeblebrox wrote: > > > The system boots from a pair of drives in a gmirror. Mot because you can't > boot from ZFS, but because it's just so darn stable (and it predates the use > of ZFS). > > Really there are two camps here --- booting from ZFS is the use of ZFS as > the machine's own filesystem. This is one goal of ZFS that is somewhat > imperfect on FreeBSD at the momment. ZFS file servers are another goal > where booting from ZFS is not really required and only marginally > beneficial. > > > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
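Plain ZFS snapshots already give a crude, manual approximation of this on FreeBSD, minus the loader integration OpenSolaris provides. A sketch, assuming a hypothetical root filesystem named tank/root:

zfs snapshot tank/root@pre-upgrade     # take the safety snapshot before upgrading
# ...perform the upgrade, reboot, test...
zfs rollback tank/root@pre-upgrade     # only if the upgraded system misbehaves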
Re: ZFS NAS configuration question
A little more info for the (perhaps) curious: Managing Multiple Boot Environments: http://dlc.sun.com/osol/docs/content/2009.06/getstart/bootenv.html#bootenvmgr Introduction to Boot Environments: http://dlc.sun.com/osol/docs/content/2009.06/snapupgrade/index.html - Dan Naumov On Tue, Jun 2, 2009 at 10:39 PM, Dan Naumov wrote: > > This reminds me. I was reading the release and upgrade notes of OpenSolaris > 2009.6 and noted one thing about upgrading from a previous version to the new > one:: > > When you pick the "upgrade OS" option in the OpenSolaris installer, it will > check if you are using a ZFS root partition and if you do, it intelligently > suggests to take a current snapshot of the root filesystem. After you finish > the upgrade and do a reboot, the boot menu offers you the option of booting > the new upgraded version of the OS or alternatively _booting from the > snapshot taken by the upgrade installation procedure_. > > Reading that made me pause for a second and made me go "WOW", this is how > UNIX system upgrades should be done. Any hope of us lowly users ever seeing > something like this implemented in FreeBSD? :) > > - Dan Naumov > > > > > > On Tue, Jun 2, 2009 at 9:47 PM, Zaphod Beeblebrox wrote: >> >> >> The system boots from a pair of drives in a gmirror. Mot because you can't >> boot from ZFS, but because it's just so darn stable (and it predates the use >> of ZFS). >> >> Really there are two camps here --- booting from ZFS is the use of ZFS as >> the machine's own filesystem. This is one goal of ZFS that is somewhat >> imperfect on FreeBSD at the momment. ZFS file servers are another goal >> where booting from ZFS is not really required and only marginally beneficial. >> >> > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS NAS configuration question
Anyone else think that this combined with freebsd-update integration and a simplistic menu GUI for choosing the preferred boot environment would make an _awesome_ addition to the base system? :) - Dan Naumov On Wed, Jun 3, 2009 at 5:42 PM, Philipp Wuensche wrote: > I wrote a script implementing the most useful features of the solaris > live upgrade, the only thing missing is selecting a boot-environment > from the loader and freebsd-update support as I write the script on a > system running current. I use this on all my freebsd-zfs boxes and it is > extremely useful! > > http://anonsvn.h3q.com/projects/freebsd-patches/wiki/manageBE > > greetings, > philipp ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
sponsoring ZFS development on FreeBSD
Hello Kip, since your name comes up wherever I look regarding ZFS development on FreeBSD, I thought to send this mail to you directly as well. My question concerns sponsoring the FreeBSD project and ZFS development in particular. I know I am just a relatively poor person so I can't contribute much (maybe on the order of 20-30 euro a month), but I keep seeing FreeBSD core team members mention "we value donations of all sizes", so what the hell :) Anyways, in the past I have directed my donations to The FreeBSD Foundation. If I want to ensure that as much of my money as possible goes directly to benefit the development of ZFS support on FreeBSD, should I continue donating to the foundation or should I be sending donations directly to specific developers? Thank you - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
gptzfsboot and RELENG_7
Hello list Any ideas if gptzfsboot is going to be MFC'ed into RELENG_7 anytime soon? I am going to be building a NAS soon and I would like to have a "full ZFS" system without having to resort to running 8-CURRENT :) Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: gptzfsboot and RELENG_7
Several posts made to this list AFTER the zfs v13 MFC to RELENG_7 indicated that even after that MFC, you still needed gptzfsboot from 8-CURRENT to be able to boot from a full ZFS system. Is this not the case? I have a 7.2-STABLE built on May 30 and I do not have gptzfsboot in my /boot, only gptboot. I didn't make any changes to the stock Makefiles and used GENERIC kernel config. Do I need to adjust some options for gptzfsboot to get built? - Dan Naumov >> > > 5/25/09 - last month > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: /boot/loader and RELENG_7 (WAS: gptzfsboot and RELENG_7)
Ah, so there is still a (small) piece of 8-CURRENT needed to have a working 7-STABLE zfs boot configuration? I am getting really confused now: if I add LOADER_ZFS_SUPPORT=yes to my /etc/make.conf, the RELENG_7 system will be built with zfs boot support, but I still need the actual /boot/loader from 8-CURRENT? Is that getting MFC'ed into RELENG_7 anytime soon? Where are all make.conf options documented, by the way? Neither /usr/share/examples/etc/make.conf nor "man make.conf" makes any reference to the LOADER_ZFS_SUPPORT option. - Dan Naumov On Mon, Jun 8, 2009 at 7:49 PM, Alberto Villa wrote: > On Monday 08 June 2009 17:44:40 Dan Naumov wrote: >> Several posts made to this list AFTER the zfs v13 MFC to RELENG_7 >> indicated that even after that MFC, you still needed gptzfsboot from >> 8-CURRENT to be able to boot from a full ZFS system. Is this not the >> case? I have a 7.2-STABLE built on May 30 and I do not have gptzfsboot >> in my /boot, only gptboot. I didn't make any changes to the stock >> Makefiles and used GENERIC kernel config. Do I need to adjust some >> options for gptzfsboot to get built? > > no, it's /boot/loader from 8-current which is needed (the one shared on this > list works perfectly for me) > to build your system with zfs boot support just add LOADER_ZFS_SUPPORT=yes > to /etc/make.conf > -- > Alberto Villa ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
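Going by Alberto's note, rebuilding just the boot bits after setting the option would look roughly like the following; whether the RELENG_7 sources of a given date actually honor LOADER_ZFS_SUPPORT is exactly the open question here, so treat this as a sketch rather than a recipe:

echo 'LOADER_ZFS_SUPPORT=yes' >> /etc/make.conf
cd /usr/src/sys/boot
make obj && make depend && make && make install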
trouble building a "make release" snapshot of 7.2-STABLE
So first I cvsupped the entire cvs repository (sans ports) using the following supfile: === *default host=ftp13.FreeBSD.org *default base=/var/db *default prefix=/backup/ncvs *default release=cvs *default delete use-rel-suffix compress src-all doc-all cvsroot-all === Then I cd /usr/src/release and do: === make release RELEASETAG=RELENG_7 TARGET_ARCH=amd64 TARGET=amd64 BUILDNAME=7.2-STABLE CHROOTDIR=/backup/releng CVSROOT=/backup/ncvs NODOC=yes NOPORTS=yes NOPORTREADMES=yes MAKE_ISOS=yes NO_FLOPPIES=yes LOCAL_PATCHES=/root/zfs-libstand-loader-patch === However, the process bombs out on me within 5 seconds with the following: === -- >>> Installing everything -- cd /usr/src; make -f Makefile.inc1 install ===> share/info (install) install -o root -g wheel -m 444 dir-tmpl /backup/releng/usr/share/info/dir install:No such file or directory *** Error code 1 Stop in /usr/src/share/info. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src. *** Error code 1 Stop in /usr/src/release. === And... === agathon# which install /usr/bin/install === Any ideas? - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Issues with gjournal (heaaaaaaaaaaalp!)
You need to mount your /dev/ad6s1d.journal as /usr and not /dev/ad6s1d, because this is the new device provided to you by GEOM. - Dan Naumov On Thu, Jun 11, 2009 at 5:50 AM, Garrett Cooper wrote: > On Wed, Jun 10, 2009 at 7:44 PM, Garrett Cooper wrote: >> Hi Pawel, ATA, and Stable folks, >> >> This time when I did a reinstall I took the bullet and tried to >> use gjournaling instead of softupdates. The unfortunate thing is that >> I can't seem to get it to work. >> >> Here's the procedure that I'm trying to follow (based off of [1]): >> - sysinstall from scratch with a minimal distribution. This creates >> /usr // /dev/ad6s1d as UFS2 with softupdates disabled. >> - Pull latest stable sources. Rebuild kernel (with `options >> GEOM_JOURNAL'), world, install kernel, then world after reboot. >> - gjournal label -f ad6s1d ad6s2d >> - mount /dev/ad6s1d /usr # That works (I think...), but prints out the >> error message below: >> >> GEOM_JOURNAL: [flush] Error while writing data (error=1) >> ad6s2d[WRITE(offset=512, length=6656)] >> >> gjournal status says: >> Name Status Components >> ad6s1d.journal N/A ad6s1d >> ad6s2d >> >> Some issues I noticed: >> >> - GJOURNAL ROOT (something) loops infinitely if the device can't be >> found; this should probably time out and panic / exit if a device >> becomes unavailable (depends on fstab values in the final 2 fields no >> doubt). I did this by accident when I forgot to add iir statically to >> the kernel. >> - The LiveCD doesn't fully support gjournal (userland's there, kernel >> support isn't). Kind of annoying and counterproductive... >> - Existing journal partitions disappeared when I upgraded by accident >> from 7.2-RELEASE to 8-CURRENT (silly me copied my srcs.sup file from >> my server with label=.). That was weird... >> - When I use gjournal label with an existing filesystem I _must_ use -f. >> >> Any help with this endeavor would be more than appreciated, as I want >> to enable this functionality before I move on to installing X11, as >> nvidia-driver frequently hardlocks the desktop (or has in the past). >> >> Thanks, >> -Garrett >> >> [1] >> http://www.freebsd.org/doc/en_US.ISO8859-1/articles/gjournal-desktop/article.html > > And to answer another potential question, I've tried mounting both > with -o rw,async and with -o rw. > Thanks! > -Garrett > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
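Roughly, reusing Garrett's device names (data on ad6s1d, dedicated journal on ad6s2d) and assuming GEOM_JOURNAL is in the kernel or loaded as a module:

gjournal label -f ad6s1d ad6s2d            # data provider plus a separate journal provider
tunefs -n disable /dev/ad6s1d.journal      # keep softupdates off; gjournal takes over that role
mount -o async /dev/ad6s1d.journal /usr    # mount the .journal device, not ad6s1d itself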
Does this disk/filesystem layout look sane to you?
Hello list. I just wanted to have an extra pair (or a dozen) of eyes look this configuration over before I commit to it (tested it in VMWare just in case, it works, so I am considering doing this on real hardware soon). I drew a nice diagram: http://www.pastebin.ca/1460089 Since it doesn't show on the diagram, let me clarify that the geom mirror consumers as well as the vdevs for ZFS RAIDZ are going to be partitions (raw disk => full disk slice => swap partition | mirror provider partition | zfs vdev partition | unused). Is there any actual downside to having a 5-way mirror vs a 2-way or a 3-way one? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Does this disk/filesystem layout look sane to you?
The main reason for NOT using zfs directly on raw disks is the fact that you cannot replace a vdev in a pool with a smaller one, only with one of equal size or bigger. This leads to a problem if you are a regular Joe User (and not a company buying certified hardware from a specific vendor) and want to replace one of the disks in your pool. The new 2tb disk you buy can very often actually be a few sectors smaller than the disk you are trying to replace, and this in turn will lead to zfs not accepting the new disk as a replacement, because it's smaller (no matter how small the difference). Using zfs on partitions instead and keeping a few gb unused on each disk leaves us with some room to play with and lets us avoid this issue. - Dan Naumov On Mon, Jun 15, 2009 at 5:16 AM, Freddie Cash wrote: > I don't know for sure if it's the same on FreeBSD, but on Solaris, ZFS will > disable the onboard disk cache if the vdevs are not whole disks. IOW, if > you use slices, partitions, or files, the onboard disk cache is disabled. > This can lead to poor write performance. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
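A sketch of that approach on one disk, with a deliberately made-up partition size that stops a couple of gigabytes short of a 2tb drive (the device name and sector count below are hypothetical; size it for your own disks):

gpart create -s GPT ad4
gpart add -t freebsd-zfs -s 3902834864 ad4    # hypothetical sector count, roughly 2 GB short of the raw disk
zpool create tank ad4p1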
Re: Does this disk/filesystem layout look sane to you?
If this is true, some magic has been done to the FreeBSD port of ZFS, because according to Sun documentation it is definitely not supposed to be possible. - Dan Naumov On Mon, Jun 15, 2009 at 10:48 AM, Pete French wrote: >> The new 2tb disk you buy can very often be actually a few sectors >> smaller then the disk you are trying to replace, this in turn will >> lead to zfs not accepting the new disk as a replacement, because it's >> smaller (no matter how small). > > Heh - you are in for a pleasent surprise my friend! ;-) If you actually > try this in practice you will find ZFS *does* accept a smaller drive as > a replacement. Preseumably to cope with the natural variability in sector > size that you describe. > > Surprised me too the first time I saw it... > > -pete. > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Does this disk/filesystem layout look sane to you?
Haven't had time to test (stuck at work), but I will trust your word :) Well, this sounds nice and sensible. I am curious, though, whether there are any numbers regarding how much "actual" drive sizes vary in the real world when it comes to disks of the same manufacturer/model/size. I guess this probably varies from manufacturer to manufacturer, but some average estimates would be nice, just so that one could evaluate whether this 64k barrier is enough. - Dan Naumov On Mon, Jun 15, 2009 at 11:35 AM, Pete French wrote: >> If this is true, some magic has been done to the FreeBSD port of ZFS, >> because according to SUN documentation is is definitely not supposed >> to be possible. > > I just tried it again to make sure I wasn't imagining things - you > can give it a shot yourself using mdconfig to create some drives. It > will let me drop in a replacement up to about 64k smaller than the original > with no problems. Below that and it refuses saying the drive is too > small. > > -pete. > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
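Pete's mdconfig experiment is easy to reproduce; a sketch using swap-backed memory disks, with the replacement made 32 KB smaller than the original:

mdconfig -a -t swap -s 1048576k -u 1
mdconfig -a -t swap -s 1048576k -u 2
zpool create test mirror md1 md2
mdconfig -a -t swap -s 1048544k -u 3    # 32 KB smaller than the disk it will replace
zpool replace test md2 md3              # see whether ZFS accepts the slightly smaller "drive"
zpool status test
zpool destroy test                      # clean up afterwards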
ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
I am wondering if the numbers I am seeing are something expected or if something is broken somewhere. Output of bonnie -s 1024:

on UFS2 + SoftUpdates:
               ---Sequential Output            ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine  MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec  %CPU    /sec %CPU
       1024 56431 94.5 88407 38.9 77357 53.3 64042 98.6 644511 98.6 23603.8 243.3

on ZFS:
               ---Sequential Output            ---Sequential Input-- --Random--
            -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine  MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec  %CPU    /sec %CPU
       1024 22591 53.7 45602 35.1 14770 13.2 45007 83.8 94595  28.0   102.2  1.2

atom# cat /boot/loader.conf
vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="96M"

The test isn't completely fair in that the UFS2 test is done on a partition that resides on the first 16gb of a 2tb disk, while the zfs test is done on the enormous 1,9tb zfs pool that comes after that partition (same disk). Can this difference in layout make up for the huge difference in performance, or is there something else in play? The system is an Intel Atom 330 dualcore, 2gb ram, Western Digital Green 2tb disk. Also, what would be another good way to get good numbers for comparing the performance of UFS2 vs ZFS on the same system? Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS performance on 7.2-release/amd64 low compared to UFS2 + SoftUpdates
All the ZFS tuning guides for FreeBSD (including one on the FreeBSD ZFS wiki) have recommended values between 64M and 128M to improve stability, so that's what I went with. How much of my max kmem is it safe to give to ZFS? - Dan Naumov On Thu, Jun 18, 2009 at 2:51 AM, Ronald Klop wrote: > Isn't 96M for ARC really small? > Mine is 860M. > vfs.zfs.arc_max: 860072960 > kstat.zfs.misc.arcstats.size: 657383376 > > I think the UFS2 cache is much bigger which makes a difference in your test. > > Ronald. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
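For comparison, a loader.conf closer to Ronald's numbers might look like the following on a 2gb machine; how much of kmem can safely go to the ARC is exactly the open question, so these values are only an assumption to experiment with, not a recommendation:

vm.kmem_size="1024M"
vm.kmem_size_max="1024M"
vfs.zfs.arc_max="512M"    # larger than 96M, while still leaving kmem headroom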
ufs2 / softupdates / ZFS / disk write cache
I have the following setup: a single consumer-grade 2tb SATA disk, a Western Digital Green (model WDC WD20EADS-00R6B0). This disk is set up like this: a 16gb root partition with UFS2 + softupdates, containing mostly static things (/bin /boot /etc /root /sbin /usr /var and such), and a 1,9tb non-redundant zfs pool on top of a slice, which hosts things like /DATA, /home, /usr/local, /var/log and such. What should I do to ensure (as much as possible) filesystem consistency of the root filesystem in the case of a power loss? I know there have been a lot of discussions on the subject of consumer-level disks literally lying about the state of files in transit (disks telling the system that files have been written to disk while in reality they are still in the disk's write cache), throwing softupdates off balance (since softupdates assumes the disks don't lie about such things) and sometimes resulting in severe data loss in the case of a system power loss during heavy disk IO. One of the solutions often brought up on the mailing lists is disabling the actual disk write cache by adding hw.ata.wc=0 to /boot/loader.conf. FreeBSD 4.3 actually even had this setting by default, but it was apparently reverted because some people reported a write performance regression on the order of becoming 4-6 times slower. So what should I do in my case? Should I disable the disk write cache via the hw.ata.wc tunable? As far as I know, ZFS has a write cache of its own, and since the ufs2 root filesystem in my case is mostly static data, I am guessing I "shouldn't" notice that big of a performance hit. Or am I completely in the wrong here, and setting hw.ata.wc=0 is going to adversely affect the write performance on both the root partition AND the zfs pool despite zfs using its own write cache? Another thing I have been pondering: I do have 2gb of space left unused on the system (currently being used as swap, I have 2 swap slices, one 1gb at the very beginning of the disk, the other being 2gb at the end), which I could turn into a GJOURNAL for the root filesystem... Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
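For reference, the write-cache knob being discussed is a single loader tunable; the conservative setting would be:

hw.ata.wc="0"    # in /boot/loader.conf: disable the drive's write cache, safer for softupdates but slower writes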
Re: Zpool on raw disk and weird GEOM complaint
On Mon, Jun 29, 2009 at 12:43 PM, Patrick M. Hausen wrote: > Hi, all, > > I have a system with 12 S-ATA disks attached that I set up > as a raidz2: > > %zpool status zfs > pool: zfs > state: ONLINE > scrub: scrub in progress for 0h5m, 7.56% done, 1h3m to go > config: > > NAME STATE READ WRITE CKSUM > zfs ONLINE 0 0 0 > raidz2 ONLINE 0 0 0 > da0 ONLINE 0 0 0 > da1 ONLINE 0 0 0 > da2 ONLINE 0 0 0 > da3 ONLINE 0 0 0 > da4 ONLINE 0 0 0 > da5 ONLINE 0 0 0 > da6 ONLINE 0 0 0 > da7 ONLINE 0 0 0 > da8 ONLINE 0 0 0 > da9 ONLINE 0 0 0 > da10 ONLINE 0 0 0 > da11 ONLINE 0 0 0 > > errors: No known data errors I can't address your issue at hand, but I would point out that having a raidz/raidz2 consisting of more than 9 vdevs is a BAD IDEA (tm). All SUN documentation recommends using groups from 3 to 9 vdevs in size. There are known cases where using more vdevs than recommended causes performance degradation and more importantly, parity computation problems which can result in crashes and potential data loss. In your case, I would have the pool built as a group of 2 x 6-disk raidz. Sincerely, - Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
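For the record, the suggested 2 x 6-disk layout would be created along these lines; note that this means destroying and re-creating the existing pool, so it is shown purely as an illustration:

zpool destroy zfs                        # destroys all data in the pool - backups first
zpool create zfs \
    raidz da0 da1 da2 da3 da4 da5 \
    raidz da6 da7 da8 da9 da10 da11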
Re: mergemaster merge left/right
Speaking of mergemaster, it would be really really nice to have "freebsd-update install" get the following functionality/options from mergemaster: -i Automatically install any files that do not exist in the destination directory. -F If the files differ only by VCS Id ($FreeBSD) install the new file. This would help avoid having to manually approve installation of hundreds of files in /etc when you upgrade to new releases using freebsd-update. - Sincerely, Dan Naumov On Fri, Jul 3, 2009 at 11:51 AM, Dominic Fandrey wrote: > I'd really like mergemaster to tell me whether the left > or the right side is the new file. > > # $FreeBSD: src/etc/devd.conf,v 1.38. | # $FreeBSD: src/etc/devd.conf,v 1.38. > > Like this I have no idea which one to pick. > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
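For reference, the equivalent invocation on the mergemaster side is simply:

mergemaster -iF    # -i: install files missing from /etc, -F: auto-install files differing only by the $FreeBSD$ id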
ZFS and df weirdness
Hello list. I have a single 2tb disk used on a 7.2-release/amd64 system, with a small part of it given to UFS and most of the disk given to a single "simple" zfs pool with several filesystems and no redundancy. I've noticed a really weird thing about what "df" reports as the "total space" of one of my filesystems:

atom# zpool list
NAME   SIZE    USED   AVAIL   CAP   HEALTH   ALTROOT
tank   1.80T   294G   1.51T   15%   ONLINE   -

atom# zfs list
NAME              USED    AVAIL   REFER   MOUNTPOINT
tank              294G    1.48T   18K     none
tank/DATA         292G    1.48T   292G    /DATA
tank/home         216K    1.48T   21K     /home
tank/home/jago    132K    1.48T   132K    /home/jago
tank/home/karni   62K     1.48T   62K     /home/karni
tank/usr          1.33G   1.48T   18K     none
tank/usr/local    455M    1.48T   455M    /usr/local
tank/usr/obj      18K     1.48T   18K     /usr/obj
tank/usr/ports    412M    1.48T   412M    /usr/ports
tank/usr/src      495M    1.48T   495M    /usr/src
tank/var          320K    1.48T   18K     none
tank/var/log      302K    1.48T   302K    /var/log

atom# df
Filesystem        1K-blocks    Used        Avail        Capacity   Mounted on
/dev/ad12s1a      16244334     1032310     13912478     7%         /
devfs             1            1           0            100%       /dev
linprocfs         4            4           0            100%       /usr/compat/linux/proc
tank/DATA         1897835904   306397056   1591438848   16%        /DATA
tank/home         1591438848   0           1591438848   0%         /home
tank/home/jago    1591438976   128         1591438848   0%         /home/jago
tank/home/karni   1591438848   0           1591438848   0%         /home/karni
tank/usr/local    1591905024   466176      1591438848   0%         /usr/local
tank/usr/obj      1591438848   0           1591438848   0%         /usr/obj
tank/usr/ports    1591860864   422016      1591438848   0%         /usr/ports
tank/usr/src      1591945600   506752      1591438848   0%         /usr/src
tank/var/log      1591439104   256         1591438848   0%         /var/log

atom# df -h
Filesystem        Size   Used   Avail   Capacity   Mounted on
/dev/ad12s1a      15G    1.0G   13G     7%         /
devfs             1.0K   1.0K   0B      100%       /dev
linprocfs         4.0K   4.0K   0B      100%       /usr/compat/linux/proc
tank/DATA         1.8T   292G   1.5T    16%        /DATA
tank/home         1.5T   0B     1.5T    0%         /home
tank/home/jago    1.5T   128K   1.5T    0%         /home/jago
tank/home/karni   1.5T   0B     1.5T    0%         /home/karni
tank/usr/local    1.5T   455M   1.5T    0%         /usr/local
tank/usr/obj      1.5T   0B     1.5T    0%         /usr/obj
tank/usr/ports    1.5T   412M   1.5T    0%         /usr/ports
tank/usr/src      1.5T   495M   1.5T    0%         /usr/src
tank/var/log      1.5T   256K   1.5T    0%         /var/log

Considering that every single filesystem is part of the exact same pool, with no custom options whatsoever used during filesystem creation (except for mountpoints), why is the size of tank/DATA 1.8T while the others are 1.5T? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS and df weirdness
On Sun, Jul 5, 2009 at 2:26 AM, Freddie Cash wrote: > > > On Sat, Jul 4, 2009 at 2:55 PM, Dan Naumov wrote: >> >> Hello list. >> >> I have a single 2tb disk used on a 7.2-release/amd64 system with a >> small part of it given to UFS and most of the disk given to a single >> "simple" zfs pool with several filesystems without redundancy. I've >> noticed a really weird thing regarding what "df" reports regarding the >> "total space" of one of my filesystems: >> >> atom# df -h >> Filesystem Size Used Avail Capacity Mounted on >> /dev/ad12s1a 15G 1.0G 13G 7% / >> devfs 1.0K 1.0K 0B 100% /dev >> linprocfs 4.0K 4.0K 0B 100% /usr/compat/linux/proc >> tank/DATA 1.8T 292G 1.5T 16% /DATA >> tank/home 1.5T 0B 1.5T 0% /home >> tank/home/jago 1.5T 128K 1.5T 0% /home/jago >> tank/home/karni 1.5T 0B 1.5T 0% /home/karni >> tank/usr/local 1.5T 455M 1.5T 0% /usr/local >> tank/usr/obj 1.5T 0B 1.5T 0% /usr/obj >> tank/usr/ports 1.5T 412M 1.5T 0% /usr/ports >> tank/usr/src 1.5T 495M 1.5T 0% /usr/src >> tank/var/log 1.5T 256K 1.5T 0% /var/log >> >> Considering that every single filesystem is part of the exact same >> pool, with no custom options whatsoever used during filesystem >> creation (except for mountpoints), why is the size of tank/DATA 1.8T >> while the others are 1.5T? > > Did you set a reservation for any of the other filesystems? Reserved space > is not listed in the "general" pool. no custom options whatsoever were used during filesystem creation (except for mountpoints). - Dan ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
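One quick way to double-check that from the ZFS side, rather than trusting df, is to ask for the relevant properties directly; any non-default value here would explain the differing df sizes:

zfs get -r quota,reservation tank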
Re: bug in ufs?
2009/7/5 Marat N.Afanasyev : > hello! > > i have a strange problem with writing data to my ufs2+su filesystem. > > 1. i made a 1T gpt partition on my storage server, and formatted it: > newfs -U -m 0 -o time -i 32768 /dev/da1p3a > > 2. i tried to move data from other servers to this filesystem, total size of > files is slightly less than 1T > > 3. i encountered a 'No space left on device' while i still have 11G of free > space and about 13 million free inodes on the filesystem: > > #df -ih > Filesystem Size Used Avail Capacity iused ifree %iused Mounted > on > /dev/da1p3a 1.0T 1.0T 11G 99% 20397465 13363173 60% > /mnt/45_114 > > all i want to know is whether this is a bug or a feature? and if such a > behavior is well-known, where can i read about it? By default, a part of a filesystem is reserved; the amount reserved has historically varied between 5-8%. This is adjustable; see the "-m" switch to tunefs. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
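For an existing filesystem the reserve can be inspected and changed with tunefs(8) on the unmounted (or read-only mounted) device; a sketch reusing the device name from the original post:

tunefs -p /dev/da1p3a    # print current tuning, including the minimum free space percentage
tunefs -m 5 /dev/da1p3a  # example: set the reserve to 5%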
Re: What is /boot/kernel/*.symbols?
On Mon, Jul 6, 2009 at 11:34 AM, Dimitry Andric wrote: > On 2009-07-06 09:42, Patrick M. Hausen wrote: >>> #define ROOT_DEFAULT_SIZE 512 >> >> IMHO it is not. If you install a kernel with *.symbols present >> twice (i.e. kernel and kernel.old contain symbol files), your >> root partition will be > 95% full. > > I'm not sure how you arrive at this number; even with -CURRENT (on i386, > with all debug symbols), I could store about 4 complete kernels on such > a filesystem: > > $ du -hs /boot/kernel* > 122M /boot/kernel > 122M /boot/kernel.20090629a > 121M /boot/kernel.20090630a > 122M /boot/kernel.20090702a > 121M /boot/kernel.20090703a > > All other files on my root filesystem use up an additional ~25 MiB, so > in practice, it would be limited to 3 kernels, with more than enough > breathing room. atom# uname -a FreeBSD atom.localdomain 7.2-RELEASE-p1 FreeBSD 7.2-RELEASE-p1 #0: Tue Jun 9 18:02:21 UTC 2009 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 atom# du -hs /boot/kernel* 205M /boot/kernel This is on a stock 7.2-release/amd64 updated to -p1 with freebsd-update; 2 kernels are the maximum that would fit into the default 512mb partition size for /, which is a bit too tight for my liking. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
7.2-release/amd64: panic, spin lock held too long
I just got a panic followed by a reboot a few seconds after running "portsnap update"; /var/log/messages shows the following: Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock 1) held by 0xff00017d8370 (tid 100054) too long Jul 7 03:49:38 atom kernel: panic: spin lock held too long /var/crash looks empty. This is a system running official 7.2-p1 binaries, since I am using freebsd-update to keep up with the patches (just updated to -p2 after this panic), running with very low load, mostly serving files to my home network over Samba and running a few irssi instances in a screen. What do I need to do to catch more information if/when this happens again? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
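As for catching more information, the usual first step is making sure the kernel can write a crash dump at panic time; a sketch of the rc.conf bits, assuming a swap partition at least as large as RAM is available:

dumpdev="AUTO"          # in /etc/rc.conf: or an explicit swap device such as /dev/ad12s1b
dumpdir="/var/crash"    # where savecore(8) will place the dump on the next boot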
Re: 7.2-release/amd64: panic, spin lock held too long
On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote: > 2009/7/7 Dan Naumov : >> I just got a panic following by a reboot a few seconds after running >> "portsnap update", /var/log/messages shows the following: >> >> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel >> Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock >> 1) held by 0xff00017d8370 (tid 100054) too long >> Jul 7 03:49:38 atom kernel: panic: spin lock held too long > > That's a known bug, affecting -CURRENT as well. > The cpustop IPI is handled though an NMI, which means it could > interrupt a CPU in any moment, even while holding a spinlock, > violating one well known FreeBSD rule. > That means that the cpu can stop itself while the thread was holding > the sched lock spinlock and not releasing it (there is no way, modulo > highly hackish, to fix that). > In the while hardclock() wants to schedule something else to run and > got stuck on the thread lock. > > Ideal fix would involve not using a NMI for serving the cpustop while > having a cheap way (not making the common path too hard) to tell > hardclock() to avoid scheduling while cpustop is in flight. > > Thanks, > Attilio Any idea if a fix is being worked on, and how unlucky does one have to be to run into this issue? Should I expect it to happen again? Is it basically completely random? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: FreeBSD 8.0-BETA1 Available
On Tue, Jul 7, 2009 at 3:33 AM, Ken Smith wrote: > Be careful if you have SCSI drives, more USB disks than just the memory > stick, etc - make sure /dev/da0 (or whatever you use) is the memory > stick. Using this image for livefs based rescue mode is known to not > work, that is one of the things still being worked on. Hey, just wanted a small clarification: does livefs based rescue mode mean "fixit environment" or not? I would like to do some configuration testing with 8.0-beta1, but applying the configuration pretty much requires working in FIXIT, since sysinstall isn't exactly up to the task. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: What is /boot/kernel/*.symbols?
On Tue, Jul 7, 2009 at 7:09 PM, Rick C. Petty wrote: > On Tue, Jul 07, 2009 at 11:24:51AM +0200, Ruben de Groot wrote: >> On Mon, Jul 06, 2009 at 04:20:45PM -0500, Rick C. Petty typed: >> > On Mon, Jul 06, 2009 at 11:39:04AM +0200, Ruben de Groot wrote: >> > > On Mon, Jul 06, 2009 at 10:46:50AM +0200, Dimitry Andric typed: >> > > > >> > > > Right, so it's a lot bigger on amd64. I guess those 64-bit pointers >> > > > aren't entirely free. :) >> > > >> > > I'm not sure where the size difference comes from. I have some sparc64 >> > > systems running -current with symbols and the size of /boot/kernel is >> > > more comparable to i386, even with the 8-byte pointer size: >> > >> > Um, probably there are a lot of devices on amd64 that aren't available for >> > sparc64? >> >> Yes, That's probably it. > > It was just a theory; I don't have sparc64. What's your output of > "ls -1 /boot/kernel | wc"? > > -- Rick C. Petty atom# uname -a FreeBSD atom.localdomain 7.2-RELEASE-p2 FreeBSD 7.2-RELEASE-p2 #0: Wed Jun 24 00:14:35 UTC 2009 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 atom# ls -1 /boot/kernel | wc 1011 1011 15243 - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: ZFS: drive replacement performance
On Wed, Jul 8, 2009 at 1:32 AM, Freddie Cash wrote: > On Tue, Jul 7, 2009 at 3:26 PM, Mahlon E. Smith wrote: > >> On Tue, Jul 07, 2009, Freddie Cash wrote: >> > >> > This is why we've started using glabel(8) to label our drives, and then >> add >> > the labels to the pool: >> > # zpool create store raidz1 label/disk01 label/disk02 label/disk03 >> > >> > That way, it does matter where the kernel detects the drives or what the >> > physical device node is called, GEOM picks up the label, and ZFS uses the >> > label. >> >> Ah, slick. I'll definitely be doing that moving forward. Wonder if I >> could do it piecemeal now via a shell game, labeling and replacing each >> individual drive? Will put that on my "try it" list. Not to derail this discussion, but can anyone explain if the actual glabel metadata is protected in any way? If I use glabel to label a disk and then create a pool using /dev/label/disklabel, won't ZFS eventually overwrite the glabel metadata in the last sector since the disk in its entirety is given to the pool? Or is every filesystem used by FreeBSD (ufs, zfs, etc) hardcoded to ignore the last few sectors of any disk and/or partition and not write data to it to avoid such issues? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
glabel metadata protection (WAS: ZFS: drive replacement performance)
>> Not to derail this discussion, but can anyone explain if the actual >> glabel metadata is protected in any way? If I use glabel to label a >> disk and then create a pool using /dev/label/disklabel, won't ZFS >> eventually overwrite the glabel metadata in the last sector since the >> disk in it's entirety is given to the pool? Or is every filesystem >> used by FreeBSD (ufs, zfs, etc) hardcoded to ignore the last few >> sectors of any disk and/or partition and not write data to it to avoid >> such issues? > > Disks labeled with glabel lose their last sector to the label. It is not > accessible by ZFS. Disks with bsdlabel partition tables are at risk due to > the brain dead decision to allow partitions to overlap the first sector, > but modern designs like glabel avoid this mistake. > > -- Brooks So what happens if I were to do the following (for the sake of example): gpart create -s GPT /dev/ad1 glabel label -v disk01 /dev/ad1 gpart add -b 1 -s -t freebsd-zfs /dev/ad1 Does "gpart add" automatically somehow recognize that the last sector of the disk contains the glabel metadata and automatically re-adjust this command to make the freebsd-zfs partition take "entire disk minus last sector"? I can understand the logic of the metadata being protected if I do a "gpart add -b 1 -s -t freebsd-zfs /dev/label/disk01", since gpart will have to go through the actual label first, but what actually happens if I issue a gpart command directly against the /dev/ device? - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
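The ordering described as understandable above, i.e. partitioning inside the label rather than labeling a disk that has already been handed out whole, would look roughly like this (disk01 and ad1 as in the example; older gpart versions may also require explicit -b/-s values, which are omitted here):

glabel label -v disk01 /dev/ad1              # glabel takes the last sector of ad1 for its metadata
gpart create -s GPT /dev/label/disk01        # partition the labeled provider, which is one sector smaller
gpart add -t freebsd-zfs /dev/label/disk01
zpool create tank /dev/label/disk01p1        # ZFS never touches the sector holding the label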
Re: 7.2-release/amd64: panic, spin lock held too long
On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote: > 2009/7/7 Dan Naumov : >> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote: >>> 2009/7/7 Dan Naumov : >>>> I just got a panic following by a reboot a few seconds after running >>>> "portsnap update", /var/log/messages shows the following: >>>> >>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel >>>> Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock >>>> 1) held by 0xff00017d8370 (tid 100054) too long >>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long >>> >>> That's a known bug, affecting -CURRENT as well. >>> The cpustop IPI is handled though an NMI, which means it could >>> interrupt a CPU in any moment, even while holding a spinlock, >>> violating one well known FreeBSD rule. >>> That means that the cpu can stop itself while the thread was holding >>> the sched lock spinlock and not releasing it (there is no way, modulo >>> highly hackish, to fix that). >>> In the while hardclock() wants to schedule something else to run and >>> got stuck on the thread lock. >>> >>> Ideal fix would involve not using a NMI for serving the cpustop while >>> having a cheap way (not making the common path too hard) to tell >>> hardclock() to avoid scheduling while cpustop is in flight. >>> >>> Thanks, >>> Attilio >> >> Any idea if a fix is being worked on and how unlucky must one be to >> run into this issue, should I expect it to happen again? Is it >> basically completely random? > > I'd like to work on that issue before BETA3 (and backport to > STABLE_7), I'm just time-constrained right now. > it is completely random. > > Thanks, > Attilio Ok, this is getting pretty bad, 23 hours later, I get the same kind of panic, the only difference is that instead of "portsnap update", this was triggered by "portsnap cron" which I have running between 3 and 4 am every day: Jul 8 03:03:49 atom kernel: ssppiinn lloocckk 00xxffffffff8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h ehledl db yb y 0x0xfff0f1081735339760e 0( t(itdi d 1016070)5 )t otoo ol olnogng Jul 8 03:03:49 atom kernel: p Jul 8 03:03:49 atom kernel: anic: spin lock held too long Jul 8 03:03:49 atom kernel: cpuid = 0 Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: 7.2-release/amd64: panic, spin lock held too long
On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov wrote: > On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao wrote: >> 2009/7/7 Dan Naumov : >>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao wrote: >>>> 2009/7/7 Dan Naumov : >>>>> I just got a panic following by a reboot a few seconds after running >>>>> "portsnap update", /var/log/messages shows the following: >>>>> >>>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel >>>>> Jul 7 03:49:38 atom kernel: spin lock 0x80b3edc0 (sched lock >>>>> 1) held by 0xff00017d8370 (tid 100054) too long >>>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long >>>> >>>> That's a known bug, affecting -CURRENT as well. >>>> The cpustop IPI is handled though an NMI, which means it could >>>> interrupt a CPU in any moment, even while holding a spinlock, >>>> violating one well known FreeBSD rule. >>>> That means that the cpu can stop itself while the thread was holding >>>> the sched lock spinlock and not releasing it (there is no way, modulo >>>> highly hackish, to fix that). >>>> In the while hardclock() wants to schedule something else to run and >>>> got stuck on the thread lock. >>>> >>>> Ideal fix would involve not using a NMI for serving the cpustop while >>>> having a cheap way (not making the common path too hard) to tell >>>> hardclock() to avoid scheduling while cpustop is in flight. >>>> >>>> Thanks, >>>> Attilio >>> >>> Any idea if a fix is being worked on and how unlucky must one be to >>> run into this issue, should I expect it to happen again? Is it >>> basically completely random? >> >> I'd like to work on that issue before BETA3 (and backport to >> STABLE_7), I'm just time-constrained right now. >> it is completely random. >> >> Thanks, >> Attilio > > Ok, this is getting pretty bad, 23 hours later, I get the same kind of > panic, the only difference is that instead of "portsnap update", this > was triggered by "portsnap cron" which I have running between 3 and 4 > am every day: > > Jul 8 03:03:49 atom kernel: ssppiinn lloocckk > 00xx8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h > ehledl db yb y 0x0xfff0f1081735339760e 0( t(itdi d > 1016070)5 )t otoo ol olnogng > Jul 8 03:03:49 atom kernel: p > Jul 8 03:03:49 atom kernel: anic: spin lock held too long > Jul 8 03:03:49 atom kernel: cpuid = 0 > Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s I have now tried repeating the problem by running "stress --cpu 8 --io 8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose" which pushed system load into the 15.50 ballpark and simultaneously running "portsnap fetch" and "portsnap update" but I couldn't manually trigger the panic, it seems that this problem is indeed random (although it baffles me why is it specifically portsnap triggering it). I have now disabled powerd to check whether that makes any difference to system stability. The only other things running on the system are: sshd, ntpd, smartd, smbd/nmdb and a few instances of irssi in screens. - Sincerely, Dan Naumov ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"