Specifying root mount options on diskless boot.

2011-01-01 Thread Peter Jeremy
[I'm not sure if -stable is the best list for this but anyway...]

I'm trying to convert an old laptop running FreeBSD 8.0 into a diskless
client (since its internal HDD is growing bad spots faster than I can
repair them).  I have it pxebooting nicely and running with an NFS root
but it then reports locking problems: devd, syslogd, moused (and maybe
others) lock their PID file to protect against multiple instances.
Unfortunately, these daemons all start before statd/lockd and so the
locking fails and reports "operation not supported".

It's not practical to reorder the startup sequence to make lockd start
early enough (I've tried).

Since the filesystem is reserved for this client, there's no real need
to forward lock requests across the wire and so specifying "nolockd"
would be another solution.  Looking through sys/nfsclient/bootp_subr.c,
DHCP option 130 should allow NFS mount options to be specified (though
it's not clear that the relevant code path is actually followed because
I don't see the associated printf()s anywhere on the console).  After
getting isc-dhcpd to forward this option (made more difficult because
its documentation is incorrect), it still doesn't work.
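
For reference, the dhcpd.conf fragment I ended up with looked roughly
like the following.  Treat it as a sketch rather than a verified config:
the option name is one I invented, the MAC/IP details are made up, and
as far as I can tell dhcp-parameter-request-list has to be set on the
server side to push an option the client never asks for:

# define DHCP option 130 ("nfs-mount-options" is an arbitrary name)
option nfs-mount-options code 130 = text;

host laptop {
	hardware ethernet 00:11:22:33:44:55;
	fixed-address 192.168.123.50;
	option root-path "192.168.123.200:/tank/m3";
	option nfs-mount-options "nolockd";
	# push option 130 even though pxeboot never requests it
	option dhcp-parameter-request-list 1, 3, 6, 12, 15, 17, 130;
}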

Understanding all this isn't helped by kenv(8) reporting three different
sets of root filesystem options:
boot.nfsroot.path="/tank/m3"
boot.nfsroot.server="192.168.123.200"
dhcp.option-130="nolockd"
dhcp.root-path="192.168.123.200:/tank/m3"
vfs.root.mountfrom="nfs:server:/tank/m3"
vfs.root.mountfrom.options="rw,tcp,nolockd"

And the console also reports conflicting root definitions:
Trying to mount root from nfs:server:/tank/m3
NFS ROOT: 192.168.123.200:/tank/m3

Working through all these:
boot.nfsroot.* appears to be initialised by sys/boot/i386/libi386/pxe.c
but, whilst nfsclient/nfs_diskless.c can parse boot.nfsroot.options,
there's no code to initialise that kenv name in pxe.c

dhcp.* appears to be initialised by lib/libstand/bootp.c - which does
include code to populate boot.nfsroot.options (using vendor specific
DHCP option 20) but this code is not compiled in.  Further study
of bootp.c shows that it's possible to initialise arbitrary kenvs
using DHCP options 246-254 - but the DHCPDISCOVER packets do not
request these options so they don't work without special DHCP server
configuration (to forward options that aren't requested).
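
If anyone wants to experiment with that path, the server side would
presumably look something like this - again only a sketch: the option
name is invented and the "name=value" payload format is just my reading
of bootp.c, so verify it before relying on it:

option kenv-opt-246 code 246 = text;

subnet 192.168.123.0 netmask 255.255.255.0 {
	option kenv-opt-246 "boot.nfsroot.options=nolockd";
	# again, force the unrequested option into the reply
	option dhcp-parameter-request-list 1, 3, 6, 17, 246;
}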

vfs.root.* is parsed out of /etc/fstab but, other than being
reported in the console message above, it doesn't appear to be
used in this environment (it looks like the root entry can be
commented out of /etc/fstab without problem).

My final solution was to specify 'boot.nfsroot.options="nolockd"' in
loader.conf - and this seems to actually work.
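
For anyone else hitting this, the loader.conf entry is literally just
the following (the single "nolockd" value is the only thing I have
verified; I assume further options can be added comma-separated, the
same way they show up in vfs.root.mountfrom.options):

# /boot/loader.conf on the NFS-exported root
boot.nfsroot.options="nolockd"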

It seems rather unfortunate that FreeBSD has code to allow NFS root
mount options to be specified via DHCP (admittedly in several
incompatible ways) but none of them actually works.  A quick look at
-current suggests that the situation there remains equally broken.

Has anyone else tried to use any of this?  And would anyone be interested
in trying to make it actually work?

-- 
Peter Jeremy




Re: /libexec/ld-elf.so.1: Cannot execute objects on /

2011-01-01 Thread Miroslav Lachman

John Baldwin wrote:

On Saturday, December 25, 2010 6:43:25 am Miroslav Lachman wrote:

John Baldwin wrote:

On Saturday, December 11, 2010 11:51:41 am Miroslav Lachman wrote:

Miroslav Lachman wrote:

Garrett Cooper wrote:

2010/4/20 Miroslav Lachman<000.f...@quip.cz>:

I have a large storage partition (/vol0) mounted as noexec and nosuid.
Then one directory from this partition is mounted by nullfs as "exec
and suid", so anything on it can be executed.

The directory contains a full installation of a jail. The jail is
running fine, but some ports (PHP for example) cannot be compiled
inside the jail; the build fails with the message:

/libexec/ld-elf.so.1: Cannot execute objects on /

The same applies to executing apxs:

root@rainnew ~/# /usr/local/sbin/apxs -q MPM_NAME
/libexec/ld-elf.so.1: Cannot execute objects on /

apxs:Error: Sorry, no shared object support for Apache
apxs:Error: available under your platform. Make sure
apxs:Error: the Apache module mod_so is compiled into
apxs:Error: your server binary '/usr/local/sbin/httpd'.

(it should return "prefork")

So I think there is some bug in checking the mountpoint options, where
the check is made on the "parent" of the nullfs instead of the nullfs
target mountpoint.

It is on 6.4-RELEASE i386 GENERIC. I did not test it on another release.

This is list of related mount points:

/dev/mirror/gm0s2d on /vol0 (ufs, local, noexec, nosuid, soft-updates)
/vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local)
/usr/ports on /vol0/jail/rain_new/usr/ports (nullfs, local)
devfs on /vol0/jail/rain_new/dev (devfs, local)

If I change the /vol0 options to (ufs, local, soft-updates), the above
error is gone and apxs / compilation works fine.

Can somebody look at this problem?


Can you please provide output from ktrace / truss for the issue?


I did
# ktrace /usr/local/sbin/apxs -q MPM_NAME

The output is here http://freebsd.quip.cz/ld-elf/ktrace.out

Let me know if you need something else.

Thank you for your interest!


The problem is still there in FreeBSD 8.1-RELEASE amd64 GENERIC (and in
7.x).

Can somebody say if this is a bug or an expected "feature"?


I think this is the expected behavior, as nullfs is simply re-exposing /vol0
and it shouldn't be able to create a more privileged mount than the underlying
mount (e.g. a read/write nullfs mount of a read-only filesystem would
not make the underlying files read/write).  It can be used to provide less
privilege (e.g. a read-only nullfs mount of a read/write filesystem does not
allow writes via the nullfs layer).

That said, I'm not sure exactly where the permission check is failing.
execve() only checks MNT_NOEXEC on the "upper" vnode's mountpoint (i.e. the
nullfs mountpoint) and the VOP_ACCESS(.., V_EXEC) check does not look at
MNT_NOEXEC either.

I do think there might be bugs in that a nullfs mount that specifies noexec or
nosuid might not enforce the noexec or nosuid bits if the underlying mount
point does not have them set (from what I can see).


Thank you for your explanation. Then it is strange that there is a bug
that allows execution on an originally non-executable mountpoint.
It should be mentioned in the BUGS section of the mount_nullfs man page.

It would be useful if the 'mount' output showed inherited options for nullfs.

If the parent is:
/dev/mirror/gm0s2d on /vol0 (ufs, local, noexec, nosuid, soft-updates)

then the nullfs line would be:
/vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local, noexec,
nosuid)

instead of just
/vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local)


Then I can understand what the expected behavior is, but our current state
is only half working if I can execute scripts and binaries and run a jail
on it, but can't execute "apxs -q MPM_NAME" and a few other things.


Hmm, so I was a bit mistaken.  The kernel is not failing to exec the binary.
Instead, rtld is reporting the error here:

static Obj_Entry *
do_load_object(int fd, const char *name, char *path, struct stat *sbp,
    int flags)
{
    Obj_Entry *obj;
    struct statfs fs;

    /*
     * but first, make sure that environment variables haven't been
     * used to circumvent the noexec flag on a filesystem.
     */
    if (dangerous_ld_env) {
        if (fstatfs(fd, &fs) != 0) {
            _rtld_error("Cannot fstatfs \"%s\"", path);
            return NULL;
        }
        if (fs.f_flags & MNT_NOEXEC) {
            _rtld_error("Cannot execute objects on %s\n", fs.f_mntonname);
            return NULL;
        }
    }

I wonder if the fstatfs is falling down to the original mount rather than
being caught by nullfs.

Hmm, nullfs' statfs method returns the flags for the underlying mount, not
the flags for the nullfs mount.  This is possibly broken, but it is the
behavior nullfs has always had and the behavior it still has on other BSDs.
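
A quick way to see the pass-through from userland, without going
anywhere near apxs, might be something like this (untested, and the
libutil version is from memory, so adjust to taste):

# /vol0 is the noexec UFS mount, /vol0/jail/rain_new the nullfs mount
cp /lib/libutil.so.* /vol0/jail/rain_new/tmp/
# LD_PRELOAD sets dangerous_ld_env and the preloaded object lives on
# the nullfs mount, so rtld does the fstatfs() shown above:
env LD_PRELOAD=/vol0/jail/rain_new/tmp/libutil.so.8 /bin/ls
# expected: ld-elf.so.1: Cannot execute objects on ... (the underlying mount)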


I am sorry, I am not a programmer, so the code doesn't tell me much.
Does it mean "we must leave it in the current state" (for compatibility
with other BSDs), or can it be fixed in the future?


I can't

Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks

2011-01-01 Thread Damien Fleuriot
This is a home machine so I am afraid I won't have backups in place, if
only because I just won't have another machine with as much disk space.


The data is nothing critically important anyway, movies, music mostly.


My objective here is getting more used to ZFS and seeing how the
performance turns out.

I remember getting rather average performance on v14 but Jean-Yves
reported good performance boosts from upgrading to v15.

Will try this out when the disks arrive :)
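
Roughly what I have in mind for the migration, in case someone spots a
problem with it (device names made up, nothing tested yet):

# create the new 5-disk raidz2 pool from the new drives
zpool create tank2 raidz2 da1 da2 da3 da4 da5
# copy everything over
zfs snapshot -r tank1@migrate
zfs send -R tank1@migrate | zfs recv -dF tank2
# free the old drives
zpool destroy tank1
# a raidz vdev can't be widened disk by disk, so the old drives can
# only go back in as spares or as a second raidz vdev
zpool add tank2 spare da6 da7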


Thanks for the pointers guys.

On 12/30/10 6:49 PM, Ronald Klop wrote:
> On Thu, 30 Dec 2010 12:40:00 +0100, Damien Fleuriot  wrote:
> 
>> Hello list,
>>
>>
>>
>> I currently have a ZFS zraid1 with 4x 1.5TB drives.
>> The system is a zfs-only FreeBSD 8.1 with zfs version 14.
>>
>> I am concerned that in the event a drive fails, I won't be able to
>> repair the disks in time before another actually fails.
>>
>>
>>
>>
>> I wish to reinstall the OS on a dedicated drive (possibly SSD, doesn't
>> matter, likely UFS) and dedicate the 1.5tb disks to storage only.
>>
>> I have ordered 5x new drives and would like to create a new zraid2
>> mirrored pool.
>>
>> Then I plan on moving data from pool1 to pool2, removing drives from
>> pool1 and adding them to pool2.
>>
>>
>>
>> My questions are as follows:
>>
>> With a total of 9x 1.5TB drives, should I be using zraid3 instead of
>> zraid2 ? I will not be able to add any more drives so unnecessary parity
>> drives = less storage room.
>>
>> What are the steps for properly removing my drives from the zraid1 pool
>> and inserting them in the zraid2 pool ?
>>
>>
>> Regards,
>>
>>
>> dfl
> 
> Make sure you have spare drives so you can swap in a new one quickly and
> have off-line backups in case disaster strikes. Extra backups are always
> nice. Disks are not the only parts which can break and damage your data.
> 
> Ronald.


Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks

2011-01-01 Thread Jean-Yves Avenard
On 2 January 2011 02:11, Damien Fleuriot  wrote:

> I remember getting rather average performance on v14 but Jean-Yves
> reported good performance boosts from upgrading to v15.

that was v28 :)

I saw no major difference between v14 and v15.

JY


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Attila Nagy

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz



I've used this:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
on an amd64 server with 8 GB RAM, acting as a file server for
ftp/http/rsync, with the content mounted read-only via nullfs in
jails, and the daemons using sendfile (ftp and http).


The effects can be seen here:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
The exact moment of the switch can be seen in zfs_mem-week.png, where
the L2 ARC was discarded.


What I see:
- increased CPU load
- decreased L2 ARC hit rate and decreased SSD (ad[46]) activity, and
therefore increased hard disk load (IOPS graph)


Maybe I could accept the higher system load as normal, because a lot of
things changed between v15 and v28 (although I was hoping that with the
same feature set it would require less CPU), but the L2ARC hit rate
dropping so radically seems to be a major issue somewhere.
As you can see from the memory stats, I have enough kernel memory to 
hold the L2 headers, so the L2 devices got filled up to their maximum 
capacity.


Any ideas on what could cause these? I haven't upgraded the pool version 
and nothing was changed in the pool or in the file system.


Thanks,


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Artem Belevich
On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:
> What I see:
> - increased CPU load
> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
> hard disk load (IOPS graph)
>
...
> Any ideas on what could cause these? I haven't upgraded the pool version and
> nothing was changed in the pool or in the file system.


The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.
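
If you want to watch the warm-up progress, the raw counters should be
enough (sysctl names from memory - double-check them on your box):

# L2ARC hit/miss counters and current size
sysctl kstat.zfs.misc.arcstats.l2_hits \
       kstat.zfs.misc.arcstats.l2_misses \
       kstat.zfs.misc.arcstats.l2_size
# or sample one of them periodically to see the trend
while true; do sysctl -n kstat.zfs.misc.arcstats.l2_hits; sleep 60; done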

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.

--Artem


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Attila Nagy

 On 01/01/2011 08:09 PM, Artem Belevich wrote:

On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)


...

Any ideas on what could cause these? I haven't upgraded the pool version and
nothing was changed in the pool or in the file system.

The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.


Sadly no, but I remember seeing increasing hit rates as the cache grew;
that's why I only wrote the email after one and a half days.

Currently it's at the same level as it was right after the reboot...

We'll see after a few days.


Re: bge driver regression in 7.4-PRERELEASE, Tyan S4881

2011-01-01 Thread Michael L. Squires



On Thu, 30 Dec 2010, Jeremy Chadwick wrote:


Please provide output from the following command, as root:

pciconf -lbvc

And only include the bge1 and bge0 devices in your output.  Thanks.



This is the output, as root, using the kernel with the 10/7/2010 bge code
(which works for me).  I can provide the same output with the 7.4-PRERELEASE
kernel if you want that.

OS is compiled as amd64.

bge0@pci0:17:2:0:   class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
vendor = 'Broadcom Corporation'
device = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
class  = network
subclass   = ethernet
bar   [10] = type Memory, range 64, base 0xd011, size 65536, enabled
bar   [18] = type Memory, range 64, base 0xd010, size 65536, enabled
cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
cap 01[48] = powerspec 2  supports D0 D3  current D0
cap 03[50] = VPD
cap 05[58] = MSI supports 8 messages, 64 bit
bge1@pci0:17:2:1:   class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
vendor = 'Broadcom Corporation'
device = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
class  = network
subclass   = ethernet
bar   [10] = type Memory, range 64, base 0xd013, size 65536, enabled
bar   [18] = type Memory, range 64, base 0xd012, size 65536, enabled
cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
cap 01[48] = powerspec 2  supports D0 D3  current D0
cap 03[50] = VPD
cap 05[58] = MSI supports 8 messages, 64 bit

This is a hobby system supporting a home server,
so it's not "mission-critical" and my current hack is working properly.

Thanks to both of you for your assistance.

Mike Squires
mi...@siralan.org
UN*X at home
since 1986


tmpfs runs out of space on 8.2pre-release, zfs related?

2011-01-01 Thread miyamoto moesasji
In setting up tmpfs (so not tmpmfs) on a machine that is using ZFS
(pool v15, zfs v4) on 8.2-PRERELEASE, I run out of space on the tmpfs when
copying a ~4.6 GB file from the ZFS filesystem to the memory disk.
This machine has 8 GB of memory backed by swap on the hard disk,
so I expected the file to copy to memory without problems.

Below is what happens in detail: upon rebooting the machine, the tmpfs has
8 GB available, as can be seen below:
---
h...@pulsarx4:~/ > df -hi /tmp

Filesystem    Size    Used   Avail Capacity iused ifree %iused  Mounted on
tmpfs         8.2G    12K     8.2G     0%      19   39M    0%   /tmp

---

Subsequently, copying the ~4.6 GB file from a location in the ZFS pool to
the memory filesystem fails with a "No space left on device" message:
---
h...@pulsarx4:~/ > cp ~/temp/large.iso /tmp/large_file

cp: /tmp/large_file: No space left on device

---

After this the tmpfs has shrunk to just 2.7G, obviously much less than
the 8.2G available before the copy operation. At the same time there
are still free inodes left, so that does not appear to be the problem.
Output of df after the copy:
---
h...@pulsarx4:~/ > df -hi /tmp

Filesystem    Size    Used   Avail Capacity iused ifree %iused  Mounted on
tmpfs         2.7G    2.7G    1.4M   100%      19  6.4k    0%   /tmp

---

A quick search shows the following bug report for Solaris:
http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=e4ae9c32983000ef651e38edbba1?bug_id=6804661
This appears closely related, as here I also try to copy a file larger
than 50% of memory to the tmpfs, and the way to reproduce it appears
identical to what I did here.

As it might help spot the problem: below the information on the zfs
ARC size obtained from the output of zfs-stats. This gives:

Before the copy:
---
System Memory Statistics:

Physical Memory:8161.74M

Kernel Memory:  511.64M

DATA:   94.27%  482.31M

TEXT:   5.73%   29.33M


ARC Size:

Current Size (arcsize): 5.88%   404.38M

Target Size (Adaptive, c):  100.00% 6874.44M

Min Size (Hard Limit, c_min):   12.50%  859.31M

Max Size (High Water, c_max):   ~8:1    6874.44M

---

After the copy:

---
System Memory Statistics:
Physical Memory:8161.74M
Kernel Memory:  3326.98M
DATA:   99.12%  3297.65M
TEXT:   0.88%   29.33M

ARC Size:
Current Size (arcsize): 46.99%  3230.55M
Target Size (Adaptive, c):  100.00% 6874.44M
Min Size (Hard Limit, c_min):   12.50%  859.31M
Max Size (High Water, c_max):   ~8:1    6874.44M
---

Unfortunately I have difficulty interpreting this any further, so
suggestions on how to prevent this behavior (or how to troubleshoot it
further) would be appreciated, as my feeling is that this should not
happen.
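
The only workaround I can think of so far is to stop tmpfs from sizing
itself from "available" memory and to cap the ARC, along these lines
(untested; I am not sure the size suffix is accepted on 8.2, so it may
need to be spelled out in bytes):

# /etc/fstab - give /tmp a fixed size instead of "whatever is free"
tmpfs	/tmp	tmpfs	rw,mode=1777,size=6g	0	0

# /boot/loader.conf - keep the ARC from eating the memory tmpfs expects
vfs.zfs.arc_max="4G"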


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread J. Hellenthal

On 01/01/2011 13:18, Attila Nagy wrote:
>  On 12/16/2010 01:44 PM, Martin Matuska wrote:
>> Link to the patch:
>>
>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>>
>>
>>
> I've used this:
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
> 
> on a server with amd64, 8 G RAM, acting as a file server on
> ftp/http/rsync, the content being read only mounted with nullfs in
> jails, and the daemons use sendfile (ftp and http).
> 
> The effects can be seen here:
> http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
> the exact moment of the switch can be seen on zfs_mem-week.png, where
> the L2 ARC has been discarded.
> 
> What I see:
> - increased CPU load
> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
> hard disk load (IOPS graph)
> 
> Maybe I could accept the higher system load as normal, because there
> were a lot of things changed between v15 and v28 (but I was hoping if I
> use the same feature set, it will require less CPU), but dropping the
> L2ARC hit rate so radically seems to be a major issue somewhere.
> As you can see from the memory stats, I have enough kernel memory to
> hold the L2 headers, so the L2 devices got filled up to their maximum
> capacity.
> 
> Any ideas on what could cause these? I haven't upgraded the pool version
> and nothing was changed in the pool or in the file system.
> 

Running arc_summary.pl[1] -p4 should print a summary of your L2ARC, and
in that section you should also notice a high number of "SPA Mismatch"
events.  Mine usually grew to around 172k before I would notice a crash,
and I could reliably trigger this while a scrub was running.

Whatever is causing this needs urgent attention!

I emailed mm@ privately off-list when I noticed this going on but have
not received any feedback as of yet.

[1] http://bit.ly/fdRiYT

-- 

Regards,

 jhell,v
 JJH48-ARIN