Re: libkvm: consumers of kvm_getprocs for non-live kernels?

2010-11-11 Thread Robert Watson

On Wed, 10 Nov 2010, Ulrich Spörlein wrote:

I have this cleanup of libkvm sitting in my tree and it needs a little bit 
of testing, especially the function kvm_proclist, which is only called from 
kvm_deadprocs which is only called from kvm_getprocs when kd is not ALIVE.


The only consumer in our tree that I can make out is *probably* kgdb, as 
ps(1), top(1), w(1), pkill(1), fstat(1), systat(1), pmcstat(8) and bsnmpd 
don't really work on coredumps.


But, the kgdb file gnu/usr.bin/binutils/gdb/kvm-fbsd.c, where kvm_getprocs 
is probably called on a dead kernel, is not even used during build!


So I guess I'm staring at dead code here, any kvm people around that can 
clue me in?


Even if those tools aren't using kvm properly, they should be.  ps(1) at least 
used to work quite well on coredumps, and perhaps still does?
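
For reference, a consumer that reaches the non-live path through kvm_getprocs could look roughly like the following. This is a minimal sketch, not code from any tool in the tree: the function name list_dump_procs is made up for illustration, and the non-FreeBSD branch exists only so the fragment compiles elsewhere.

```c
#include <stdio.h>

#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <fcntl.h>
#include <kvm.h>
#include <limits.h>
#endif

/*
 * Print pid and command name for every process recorded in a crash dump.
 * Returns the number of processes, or -1 on failure (and always -1 on
 * systems without libkvm).  list_dump_procs is a hypothetical name.
 */
int
list_dump_procs(const char *kernel, const char *vmcore)
{
#ifdef __FreeBSD__
	char errbuf[_POSIX2_LINE_MAX];
	struct kinfo_proc *kp;
	kvm_t *kd;
	int cnt, i;

	/* Passing a vmcore file instead of /dev/mem gives a non-live kd. */
	kd = kvm_openfiles(kernel, vmcore, NULL, O_RDONLY, errbuf);
	if (kd == NULL) {
		fprintf(stderr, "kvm_openfiles: %s\n", errbuf);
		return (-1);
	}
	kp = kvm_getprocs(kd, KERN_PROC_PROC, 0, &cnt);
	if (kp == NULL) {
		fprintf(stderr, "kvm_getprocs: %s\n", kvm_geterr(kd));
		kvm_close(kd);
		return (-1);
	}
	for (i = 0; i < cnt; i++)
		printf("%5d %s\n", (int)kp[i].ki_pid, kp[i].ki_comm);
	kvm_close(kd);
	return (cnt);
#else
	(void)kernel;
	(void)vmcore;
	return (-1);
#endif
}
```

On FreeBSD this needs to be linked with -lkvm; calling it with a kernel image and a matching vmcore file would exercise the kvm_deadprocs/kvm_proclist path discussed above.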


Stas has ongoing work on a libprocstat, you might want to give him a ping. 
I'm not sure if he plans to refactor some of those existing tools to use that 
library or not, but crashdump support is a key goal of it.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Re: Creating an LVM-backed FreeBSD DomU in a Linux Dom0

2010-12-28 Thread Robert Watson


On Tue, 28 Dec 2010, Avleen Vig wrote:

After searching high and low and not finding exactly what I wanted (although 
Adrian Chadd's documents came close), I decided to document a lengthy but 
worthwhile procedure:


How to install a FreeBSD DomU guest in a Linux Dom0 Xen host, from scratch, 
with LVM-backed storage (rather than file based), and without the need to 
rely on random kernels and ISO[1]


http://bit.ly/dVhfFe

Hopefully people find it useful :-)


FYI, we now have a xen(4) man page, which will ship in 8.2.  It's not tutorial 
material like your document, but is useful reference material.  I'd like it 
very much if we could get something more along the lines of what you've 
created into the FreeBSD Handbook.


Robert




I haven't yet broached configuring inside the Xen host. Again there is
scattered documentation available. I'll try to bring it together next.

[1] I gave serious thought to uploading my own stuff along with the
other similar things available already, but in the end I thought it
better if people try out how to do it, given that the amount of work
will be almost the same, or even slightly less building it yourself.
Plus there are the usual security and availability concerns.. :)

--
Avleen Vig
Systems Administrator
Personal: www.silverwraith.com


Re: Why does printf(9) hang network?

2011-02-05 Thread Robert Watson


On Sat, 5 Feb 2011, dieter...@engineer.com wrote:

Why would doing a printf(9) in a device driver (usb, firewire, probably 
others) cause an obscenely long lockout on 
/usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)  ?


Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause 
the problem, but printfs from device drivers do.


Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The 
device drivers and printf don't even know that sb_sx exists.


I can't speak to the details of your situation, but one possible explanation 
might be: printf runs at the speed of the console, which for serial consoles 
can be extremely slow.  Device driver interrupt threads can preempt other 
threads, possibly while those threads hold locks.  That causes them to hold 
the locks for much longer, as the threads may not get rescheduled for some 
period (for example, until the device driver is done doing a printf), leading 
other threads waiting for that lock to wait significantly longer.  Especially 
the case if the other thread was spinning adaptively, in which case it will 
then yield since the holder of the lock effectively yielded.


You might try forcing all the various threads to run on different CPUs using 
cpuset and see if the variance goes down.  You can also use KTR + schedgraph 
to explore the specific scheduling going on, although be aware that KTR 
can also noticeably perturb scheduling itself.


In general, things shouldn't call kernel printf in steady state operation; if 
they need to log something, they should use log(9) or similar.  printf is 
primarily a tool for printing out device probe information, and for debugging 
purposes: it is not intended to be fast.


Robert



135  int
136  sblock(struct sockbuf *sb, int flags)
137  {
138
139          KASSERT((flags & SBL_VALID) == flags,
140              ("sblock: flags invalid (0x%x)", flags));
141
142          if (flags & SBL_WAIT) {
143                  if ((sb->sb_flags & SB_NOINTR) ||
144                      (flags & SBL_NOINTR)) {
145                          sx_xlock(&sb->sb_sx);
146                          return (0);
147                  }
148                  return (sx_xlock_sig(&sb->sb_sx));
149          } else {
150                  if (sx_try_xlock(&sb->sb_sx) == 0)
151                          return (EWOULDBLOCK);
152                  return (0);
153          }
154  }

More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093




Re: Analyzing wired memory?

2011-02-08 Thread Robert Watson


On Tue, 8 Feb 2011, Alan Cox wrote:


On Tue, Feb 8, 2011 at 6:20 AM, Ivan Voras  wrote:

Is it possible to track by some way what kernel system, process or thread 
has wired memory? (including "data exists but needs code to extract it")



No.


I'd like to analyze a system where there is a lot of memory wired but not 
accounted for in the output of vmstat -m and vmstat -z. There are no user 
processes which would lock memory themselves.


Any pointers?


Have you accounted for the buffer cache?


John and I have occasionally talked about making procstat -v work on the 
kernel; conceivably it could also export a wired page count for mappings where 
it makes sense.  Ideally procstat would drill in a bit and allow you to see 
things at least at the granularity of "this page range was allocated to UMA".


Robert


Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...

2011-02-11 Thread Robert Watson


On Fri, 11 Feb 2011, Alexander Leidinger wrote:

during the last GSoC various FEATURE macros were added to the system. 
Before committing them, I would like to get some review (like if macro is in 
the correct file, and for those FEATURES where the description was not taken 
from NOTES if the description is OK).


If nobody complains, I would like to commit this in 1-2 weeks. If you need 
more time to review, just tell me.


Here is the list of affected files (for those impatient ones which do not 
want to look at the attached patch before noticing that they are not 
interested to look at it):


The additions for security/audit and security/mac both seem reasonable; I've 
been meaning to add them myself for quite a bit.  There's then some code in 
libc that can learn to use this as well, at least for MAC.


The one comment I'd make is that the MAC case should indicate that "The MAC 
Framework" is supported, rather than mandatory access controls being present 
-- the presence of the framework doesn't imply the presence of mandatory 
access control policies.


Robert


Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...

2011-02-11 Thread Robert Watson

On Fri, 11 Feb 2011, Ilya Bakulin wrote:

When I was beginning this GSoC work, I primarily thought about unifying the 
way to determine if particular feature exists in the kernel. Of course there 
should be at least one way to check if the feature is available or not (by 
definition: if I may use some functionality, then the feature is present, 
otherwise... Oh, no, maybe I have no permissions to use it? Or something is 
terribly wrong with the system configuration? Or?...), but it is better to have 
a sort of unified way to get this information without looking for files in 
/dev, parsing `kldstat -v`, etc.


One of the nice things about this is that when a conditionally compiled 
feature introduces a new system call, there can be forward (rather than 
backward) compatibility benefits.  If login(1) had checked for the Audit 
feature before trying audit system calls when we introduced it in 6.x, it 
would have avoided a few people shooting their feet off in the (officially 
unsupported) case where following a kernel and userspace roll-forward, a 
kernel roll-back was required to restore stability.  While we don't support it 
(you shouldn't run a new userspace with an old kernel), the failure mode would 
have been improved.
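
The kind of guard described here can be sketched with feature_present(3) from FreeBSD's libc. This is only an illustration under stated assumptions: the feature name "audit" is assumed rather than taken from the patch under review, audit_usable is a made-up helper, and the stub in the non-FreeBSD branch is there only so the fragment builds elsewhere.

```c
#include <stdio.h>
#include <unistd.h>

#ifndef __FreeBSD__
/* Stand-in so the sketch builds elsewhere: reports every feature absent. */
static int
feature_present(const char *feature)
{
	(void)feature;
	return (0);
}
#endif

/*
 * Return 1 if it is reasonable to attempt audit system calls, 0 if the
 * running kernel does not advertise the feature.  Hypothetical helper.
 */
int
audit_usable(void)
{
	/* "audit" is an assumed name; check the FEATURE() macro in use. */
	if (!feature_present("audit")) {
		fprintf(stderr, "kernel lacks audit support; skipping\n");
		return (0);
	}
	return (1);
}
```

A program in the position of login(1) would call such a helper once at startup and skip its audit calls when it returns 0, rather than discovering the missing system call the hard way.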


More abstractly: for a feature like MAC, testing for the presence of the 
framework is functionally fairly different from exercising the feature, as most 
instances of exercising it work only based on modules loaded by the framework, 
which is a different goal.  Right now, libc offers a mac_present API, which 
back-ends into manually testing a system call.  I'd rather it backended into a 
common feature test framework.


In many cases, it is of course desirable to test for a feature by using it -- 
a much more pragmatic approach, and generally one preferred in the world of 
autoconf, etc...


Robert


Re: ixgbe DMA question

2011-02-11 Thread Robert Watson


On Fri, 11 Feb 2011, Santosh Rao Gururajan wrote:

I have a host machine with 2 ixgbe NICs. I am trying to pass the frames from 
one NIC to the other with the lowest possible overhead to the host (high 
speed bridge). I am wondering if I can do a rx-ring to tx-ring DMA copy 
without creating a mbuf on the host. Is that possible? What are the risks?


The only real risk is the simple matter of programming, I think.  There's no 
reason not to do it except that it involves modifying device drivers, memory 
models, etc.  If you do what you describe, and you decide you do want to pass 
some frames up the stack, you can always hook up mbufs and use the external 
storage free routine to return the memory to the ring.  Jeff Roberson has been 
circulating some patches that eliminate the mbuf<->cluster relationship in its 
current form, instead preferring variable size mbufs, and I can't help but 
wonder if with such a patch, that wouldn't be simpler than what you propose, 
offering many of the same performance benefits while making the device driver 
changes smaller and still allowing you to direct some packets up the stack if 
desired.
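
The "external storage free routine" idea can be modelled in plain C, outside the kernel. Everything below is hypothetical (rx_ring, pkt, and the function names are invented for this sketch and are not the mbuf or ixgbe APIs): a packet header borrows a ring buffer without copying, and freeing the packet runs a callback that hands the slot back to the ring.

```c
#include <stddef.h>

#define RING_SLOTS 4
#define SLOT_BYTES 2048

/* Hypothetical receive ring: fixed buffers plus a stack of free slots. */
struct rx_ring {
	unsigned char bufs[RING_SLOTS][SLOT_BYTES];
	int free_slots[RING_SLOTS];
	int nfree;
};

/* Hypothetical stand-in for an mbuf pointing at external storage. */
struct pkt {
	unsigned char *data;
	void (*ext_free)(struct rx_ring *, int);	/* run on pkt_free() */
	struct rx_ring *ring;
	int slot;
};

void
ring_init(struct rx_ring *r)
{
	int i;

	for (i = 0; i < RING_SLOTS; i++)
		r->free_slots[i] = i;
	r->nfree = RING_SLOTS;
}

/* The "free routine": return a borrowed slot to the ring. */
static void
ring_return_slot(struct rx_ring *r, int slot)
{
	r->free_slots[r->nfree++] = slot;
}

/* Lend a ring buffer to a packet without copying; -1 if none are free. */
int
pkt_attach(struct rx_ring *r, struct pkt *p)
{
	if (r->nfree == 0)
		return (-1);
	p->slot = r->free_slots[--r->nfree];
	p->data = r->bufs[p->slot];
	p->ring = r;
	p->ext_free = ring_return_slot;
	return (0);
}

/* Drop the packet; the callback gives the storage back to the ring. */
void
pkt_free(struct pkt *p)
{
	if (p->ext_free != NULL)
		p->ext_free(p->ring, p->slot);
	p->data = NULL;
	p->ext_free = NULL;
}
```

In a real driver the ring memory is DMA-mapped and the callback must be safe to run from whatever context the stack frees the packet in, which is a large part of the programming effort mentioned above.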


Robert


Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...

2011-02-12 Thread Robert Watson


On Sat, 12 Feb 2011, Alexander Leidinger wrote:


On Sat, 12 Feb 2011 00:52:48 +0000 (GMT) Robert Watson wrote:

The one comment I'd make is that the MAC case should indicate that "The MAC 
Framework" is supported, rather than mandatory access controls being 
present -- the presence of the framework doesn't imply the presence of 
mandatory access control policies.


Does
FEATURE(mac, "Mandatory Access Control Framework support");
look better?

Alternatively/additionally we could use mac_framework as the name of the 
feature.


The above seems fine -- while I've been moving to names like mac_framework.h, 
it's still "options MAC" and "security/mac", etc, and I think that "mac" is the 
most consistent option.


Robert


Re: Prebind from OpenBSD

2011-03-27 Thread Robert Watson

On Sat, 26 Mar 2011, Jesse Smith wrote:

I'm interested in working on the "Port prebind from OpenBSD" project 
mentioned on the FreeBSD Ideas page. ( 
http://wiki.freebsd.org/IdeasPage#head-d28cdd95ca1755d5afe63d653cb4926d4bdc99de 
)


There isn't much to go on from the project description and I'm curious what 
FreeBSD devs are looking for specifically. For example, should the entire 
ldconfig program be ported from OpenBSD (it looks like it's close enough to 
FreeBSD's to make that suitable), or should just the prebind code be merged 
into FreeBSD's ldconfig?


Once the project is complete, who should the work be submitted to? Has 
anyone else worked on this and made any progress?


Hi Jesse:

I think the intent of the ideas list entry is more a research project than a 
direct-to-commit project: the question is whether prebinding of some sort 
would observably help performance for important FreeBSD applications or, for 
example, the boot process.  If so, then certainly the OpenBSD prebinding code 
is a possible model -- Mac OS X also has prebinding, of course, and it's done 
quite differently (and probably less reusably from our perspective as they use 
Mach-O rather than ELF); however, there might be interesting ideas as well.


I think therefore I'd structure a project along the following lines: first, 
you want to establish to what extent synchronous waiting on linkage at 
run-time is a significant problem.  It could be that some combination of hwpmc 
and DTrace would be the right tools for this.  I'd especially pay attention to 
boot time, since we know that quite a lot of executing takes place then as 
part of rc.d.  I'd also investigate large applications like Firefox, Chrome, 
KDE, Gnome, etc.  KDE already integrates prebinding tricks in its design, but 
I don't think the others do.


Next, I'd dig a bit more into the areas where it's hurting performance -- can 
you add up all the time spent waiting and cut 10 seconds from boot, or 5 
seconds from Firefox startup?  Or is the best win going to be .2 seconds in 
Firefox?  Does the OpenBSD optimisation actually address the problem we're 
experiencing?  Perhaps perform some experiments with prebinding-like 
behaviour, working up to an implementation.


It's worth remembering that prebinding comes with some baggage as well, of 
course.  Perhaps less relevant in the world of 64-bit address spaces, but 
there are some design trade-offs in this department...


Robert


Re: GSoC

2011-03-27 Thread Robert Watson


On Fri, 25 Mar 2011, Dudinskyi Olexandr wrote:

My name is Dudinskyi Oleksandr. I am a student of National aviation 
university, Ukraine. I want to participate in GSoC 2011 with your 
organization.


My project: Disk device error counters, iostat -e.

I think this project is very necessary in the FreeBSD system.  I am now making 
a plan to develop this project.


What can you say about the idea of my project?  And what about the favor of 
this project?


My mentor: Andriy Gapon.


Hi Dudinskyi:

It's a little hard to tell from your description exactly what it is you are 
proposing to do.  Could you flesh out the idea some for us, so that we can 
give you feedback?  What is the nature of the problem you want to solve? 
What software changes do you anticipate making?  How will you test your 
changes?


Robert

Re: GSoC

2011-04-02 Thread Robert Watson


On Fri, 1 Apr 2011, Oleksandr Dudinskyi wrote:

I would like to describe my plan of action more specifically. One of the main 
tasks is to find the places where errors are registered, then analyze the 
errors by type, separate out the errors related to disks, and modify the 
output format. There are different types of errors, such as soft, hard, 
transport, device not ready, recoverable, and others. Currently there is a 
problem with reporting, and most error logs are built as individual files. 
Changes are needed in the kernel to provide a database that processes 
information from several sources. The current kernel can't report which 
specific operations had errors, which further compounds the consistency 
problem. Error reporting by drivers requires a change. Systematizing the 
format for recording errors is also a priority, so that we know what error we 
got and where it occurred.


Hi Oleksandr:

This sounds like a potentially interesting project, but it remains a bit 
abstract to me, which makes me worry about it as a GSoC project.  Strong 
proposals typically have a well-defined and easily characterised objective 
(1-2 sentences), and 3-4 intermediate deliverables.  I worry that what you've 
described may be a bit too researchy for a summer project, but I'm willing to 
be convinced otherwise!  Could you flesh out in a bit more detail how what you 
have in mind would work: are there new daemons? system calls? will you reuse 
existing logging or error-handling infrastructure? what is the namespace for 
errors? how will it affect current operations?  We don't need perfect answers 
to these questions yet, but a slightly more worked out example might help 
resolve my concerns.


Thanks!

Robert


Re: Include file search path

2011-04-02 Thread Robert Watson

On Wed, 30 Mar 2011, Warner Losh wrote:


On Mar 30, 2011, at 9:23 AM, Dimitry Andric wrote:
This is a rather nasty hack, though.  If we can make it work, we should 
probably try using --sysroot instead, or alternatively, -nostdinc and 
adding include dirs by hand.  The same for executable and library search 
paths, although I am not sure if there is a way to completely reset those 
with the current options.


I'm pretty sure that the origins of this hack predate the --sysroot feature 
in gcc.  It works in -current and has for years, so nobody has cared enough 
to even contemplate changing it.


If you can make the sysroot feature work, that would be great, since that 
would allow us to skip the compiler building phase if we were building using 
external compilers.  I have some patches to make that work, but this very 
problem is what I'd worked my way up to.  It works well if you are building 
current on current, but not so well if you are mixing versions (you can mix 
architectures if you are using the xdev feature I put in a while ago, but 
even that has one or two niggles I need to iron out).


Count me as another eager consumer awaiting a nice answer to the general 
cross-compile problem.  I'm really looking for three things:


(1) A bit more intelligence from our build framework regarding not rebuilding
the toolchain quite so many times!  I'd like to be able to do a buildworld
with TARGET_ARCH with significantly improved performance.  Perhaps we can
do this already, in which case a pointer would be welcome.

(2) Working clang/LLVM cross-compile of FreeBSD.  This seems like a basic
requirement to adopt clang/LLVM, and as far as I'm aware that's not yet a
resolved issue?

(3) Making it easy to plug in, first, an external gcc, and second, an
external clang/LLVM.  One worrying point for me on the last one is that we
can't yet build the whole kernel with clang/LLVM, at least for i386/amd64,
so I guess you need both external gcc *and* external clang/LLVM?

We (Cambridge) are currently bringing up FreeBSD on a new soft-core 64-bit 
MIPS platform.  We're already using a non-base gcc for our boot loader work, 
and plan to move to using clang/LLVM later in the year.  The base system seems 
a bit short on detail when it comes to the above, currently.


Robert


Re: Mount_nfs question

2011-05-31 Thread Robert Watson


On Mon, 30 May 2011, Mark Saad wrote:


So I am stumped on this one.  I want to know the IP of each NFS server
that is providing each NFS export.  I am running 7.4-RELEASE.
When I run "mount -t nfs" I see something like this:

VIP-01:/export/source   on /mnt/src
VIP-02:/export/target   on /mnt/target
VIP-01:/export/logs     on /mnt/logs
VIP-02:/export/package  on /mnt/pkg

The issue is I use a load-balanced NFS server from Isilon, so VIP-01 could 
be any one of a group of IPs.  I am trying to track down a network 
congestion issue and I can't find a way to match the output of lsof and 
netstat to the output of mount -t nfs.  Does anyone have any ideas how I 
could track this down?  Is there a way to run mount and have it show the IP 
and not the name of the source server?


Unfortunately, there's not a good answer to this question.  nfsstat(1) should 
have a mode that can iterate down active mount points displaying statistics 
and connection information for each, but doesn't.  NFS sockets generally don't 
appear in sockstat(1) either.  However, they should appear in netstat(1), so 
you can at least identify the sockets open to various NFS server IP addresses 
(especially if they are TCP mounts).
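
To match netstat(1) output against a name such as VIP-01, it can help to list every address the name currently resolves to. The following is a small sketch using getaddrinfo(3); print_addresses is a made-up helper, and note the limitation: it shows what the resolver returns now, not which address a given mount actually connected to.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/*
 * Print each numeric address that "host" resolves to.  Returns the number
 * of addresses printed, or -1 if the lookup fails.  Hypothetical helper.
 */
int
print_addresses(const char *host)
{
	struct addrinfo hints, *res, *ai;
	char buf[128];
	int n = 0;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* IPv4 and IPv6 */
	hints.ai_socktype = SOCK_STREAM;	/* avoid per-socktype duplicates */
	if (getaddrinfo(host, NULL, &hints, &res) != 0)
		return (-1);
	for (ai = res; ai != NULL; ai = ai->ai_next) {
		if (getnameinfo(ai->ai_addr, ai->ai_addrlen, buf, sizeof(buf),
		    NULL, 0, NI_NUMERICHOST) == 0) {
			printf("%s -> %s\n", host, buf);
			n++;
		}
	}
	freeaddrinfo(res);
	return (n);
}
```

The printed addresses can then be compared by hand with the foreign addresses of the TCP connections to the NFS port in the netstat output.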


Enhancing nfsstat(1) to display more detailed information would, I think, be a 
very useful task for someone to get up to (and perhaps should appear on our 
ideas list).  Something that would be nice to have, in support of this, is a 
way for file systems to provide extended status via a system call that queries 
mountpoints, both "portable" information that spans file systems, and file 
system-specific data.  Morally, similar to nmount(2) but for statistics rather 
than setting things.  The "easier" route is to add new sysctls that dump 
per-mountpoint state directly from NFS, but given how much other information 
we'd like to export, it would be great to have a more general mechanism.


(The more adventurous can, with a fairly high degree of safety, use kgdb on 
/dev/mem (read-only) to walk the NFS stack's mount tables, but that's not much 
fun.)


Robert


Re: compiler warnings (was: Re: [rfc] a few kern.mk and bsd.sys.mk related changes)

2011-05-31 Thread Robert Watson


On Tue, 31 May 2011, Alexander Best wrote:


On Mon May 30 11, Dieter BSD wrote:

Chris writes:

Ports need attention. The warnings I get there are frightening.


I find it comforting that they're just that: warnings.

How do they frighten you?


High quality code does not have any warnings.

The most frightening thing is the attitude that "They're just warnings, so 
I'll ignore them."  Most compiler warnings should be fatal errors. And a 
lot of the warnings that require a -Wwhatever should be on by default.


please keep in mind that -Wfoo does reflect the ideas of the GNU people 
regarding *proper* code. the warnings themselves are sometimes wrong, 
because they complain about perfectly correct code. so -Wfoo should not be 
considered a code verifier, but in fact what it is: a warning flag. 
sometimes it's correct and indeed reports wrong code, sometimes it is 
completely wrong.


And, it's also worth remembering that warnings change over time, as the 
compiler changes.  One of the known issues building with clang is that large 
quantities of "warning-free code" under gcc are in fact rife with warnings 
under clang, including the gcc source code itself.  In general, my hope is 
that we can get the FreeBSD base warning-free for a useful set of warnings, 
and on the whole, this is the case.  Pretty much the entire kernel is compiled 
with quite a large number of warning classes enabled, and -Werror set, for 
example.


(One of the other tensions, of course, is the locally maintained vs externally 
maintained tension: fixing warnings in other people's code is useful only if 
you can get them to accept the fixes back -- maintaining large numbers of 
patch sets over time is not sustainable for non-trivial quantities of code, if 
you're tracking the upstream vendor.  Ports is the worst possible case, where 
maintaining local patches is quite expensive.  In the FreeBSD base we can do 
a lot better, since we can use revision control and automatic merging to help 
us, but it's still an overhead that has to be reasoned about carefully.)


Robert

Re: sizeof(function pointer)

2011-06-05 Thread Robert Watson


On Tue, 31 May 2011, m...@freebsd.org wrote:

I am looking into potentially MFC'ing r212367 and related, that adds drains 
to sbufs.  The reason for MFC is that several pieces of new code in CURRENT 
are using the drain functionality and it would make MFCing those changes 
much easier.


The problem is that r212367 added a pointer to a drain function in the sbuf 
(it replaced a pointer to void).  The C standard doesn't guarantee that a 
void * and a function pointer have the same size, though it's true on amd64, 
i386 and I believe PPC.  What I'm wondering is, though not guaranteed by the 
standard, is it *practically* true that sizeof(void *) == 
sizeof(int(*)(void)), such that an MFC won't break binary compatibility for 
any supported architecture?  (The standard does guarantee, though not in 
words, that all function pointers have the same size, since it guarantees 
that pointers to functions can be cast to other pointers to functions and 
back without changing the value).


I think you're OK for MFC purposes, but that in general, we shouldn't assume 
that they are the same size.  I.e., we should use a function pointer type 
where we mean a function pointer type, and never write code that casts a 
function pointer to a regular pointer.  (Which the change is fine with respect 
to, I believe).
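
As an illustration of that rule, the sketch below stores a callback in a function-pointer slot rather than in a void *, relying only on the guarantee quoted above that a function pointer survives a cast to another function pointer type and back. The names (holder, drain_fn, count_drain, and so on) are invented for the example and are not the sbuf API.

```c
#include <stddef.h>

typedef void (*generic_fn)(void);	/* generic function-pointer slot */
typedef int (*drain_fn)(void *arg, const char *data, int len);

/* Hypothetical structure holding a callback and its argument. */
struct holder {
	generic_fn fn;	/* not "void *": the two kinds may differ in size */
	void *arg;
};

/* Example callback: add the length of each chunk to a counter. */
int
count_drain(void *arg, const char *data, int len)
{
	(void)data;
	*(int *)arg += len;
	return (len);
}

void
holder_set(struct holder *h, drain_fn f, void *arg)
{
	h->fn = (generic_fn)f;	/* function pointer to function pointer */
	h->arg = arg;
}

int
holder_call(struct holder *h, const char *data, int len)
{
	drain_fn f = (drain_fn)h->fn;	/* cast back before calling */

	return (f(h->arg, data, len));
}

/* 1 if the two pointer kinds happen to have the same size here. */
int
ptr_sizes_match(void)
{
	return (sizeof(void *) == sizeof(int (*)(void)));
}
```

ptr_sizes_match() only reports what holds on the platform at hand; the rest of the code does not depend on its answer, which is the point.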


I'm doing some research on an experimental architecture where certain types of 
function pointers are 256-bit.  This has some interesting consequences; we 
haven't yet gotten to investigating C language extensions/compatibility, but 
that will follow in the next year or so.  (We also have 256-bit data 
references, similar to pointers, for use in some environments, which will also 
prove interesting.  I'm not yet convinced we'll try to use a general pointer 
type for them, but perhaps instead extend the language to have a qualified 
type of some sort).


Robert


Re: FreeBSD I/OAT (QuickData now?) driver

2011-06-11 Thread Robert Watson


On Mon, 6 Jun 2011, grarpamp wrote:

I know we've got polling. And probably MSI-X in a couple drivers. Pretty 
sure there is still one CPU doing the interrupt work? And none of the 
multiple queue thread spreading tech exists?


Actually, with most recent 10gbps cards, and even 1gbps cards, we process 
inbound data with as many CPUs as the hardware has MSI-X enabled input and 
output queues.  So "a couple" understates things significantly.



   * Through PF_RING, expose the RX queues to the userland so that
the application can spawn one thread per queue hence avoid using
semaphores at all.


I'm probably a bit out of date, but last I checked, PF_RING still implied 
copying, albeit into shared memory buffers.  We support shared memory between 
the kernel and userspace for BPF and have done for quite a while.  However, 
right now a single shared memory buffer is shared for all receive queues on a 
NIC.  We have a Google Summer of Code student working on this actively right 
now -- my hope is that by the end of the summer we'll have a pretty functional 
system that allows different shared memory buffers to be used for different 
input queues.  In particular, applications will be able to query the set of 
queues available, detect CPU affinity for them, and bind particular shared 
memory rings to particular queues.  It's worth observing that for many types 
of high-performance analysis, BPF's packet filtering and truncation support is 
quite helpful, and if you're going to use multiple hardware threads per input 
queue anyway, you actually get a nice split this way (as long as those threads 
share L2 caches).


Luigi's work on mapping receive rings straight into userspace looks quite 
interesting, but I'm pretty behind currently, so haven't had a chance to read 
his NetMap paper.  The direct mapping of rings approach is what a number of 
high-performance FreeBSD shops have been doing for a while, but none had 
generalised it sufficiently to merge into our base stack.  I hope to see this 
happen in the next year.


Robert


Re: priv_check() question

2011-07-05 Thread Robert Watson


On Sun, 3 Jul 2011, exorcistkiller wrote:

Hi! I am taking a FreeBSD course this summer and I'm doing a homework. A new 
system call uidkill() is to be added. uidkill(uid_t uid, int signum) sends 
signal specified by signum to all processes owned by uid, excluding the 
calling process itself.


I'm almost done, however I get stuck with priv_check(). If the calling 
process is trying to send signal to processes owned by others, permission 
should be denied. My implementation simply uses an if (p->p_ucred->cr_uid == 
ksi.ksi_uid) to deny it, however priv_check() is required. My question is: 
what privilege a process should have to send signal to processes owned by 
others? PRIV_SIGNAL_DIFFCRED?


The right way to think about "privileges" in FreeBSD is that they exempt 
subjects (usually processes) from normal access control rules -- typically as 
a result of a root uid.  The access control rules for signalling are captured 
by p_cansignal() and cr_cansignal(), depending on whether the "subject" is a 
process or a cached credential.  Processes have access to slightly greater 
rights than raw credentials due to additional context -- for example, 
information about parent-child relationships.  These functions then invoke 
further privilege checks if required, perhaps overriding the normal 
requirement that uids match, etc.  kill() implements a couple of broadcast 
modes for signals -- you may want to look at the implementation there to see 
how this is done.
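
The relationship between the normal rule and the privilege can be shown with a deliberately tiny userspace model. This is not kernel code: cred_model, model_priv_check and model_cansignal are invented names, and the real cr_cansignal() considers far more than a uid comparison. The has_signal_priv flag stands in for whatever privilege check grants the exemption; mapping it to PRIV_SIGNAL_DIFFCRED is an assumption of this sketch.

```c
#include <sys/types.h>
#include <errno.h>

/* Hypothetical, heavily simplified stand-in for a credential. */
struct cred_model {
	uid_t uid;
	int has_signal_priv;	/* would the privilege check succeed? */
};

/* Stand-in for a privilege check: 0 if exempt, EPERM otherwise. */
int
model_priv_check(const struct cred_model *c)
{
	return (c->has_signal_priv ? 0 : EPERM);
}

/* May "subject" signal a process running with "target" credentials? */
int
model_cansignal(const struct cred_model *subject,
    const struct cred_model *target)
{
	/* Normal access control rule: the uids match. */
	if (subject->uid == target->uid)
		return (0);
	/* Rule not satisfied: only a privilege can exempt the subject. */
	return (model_priv_check(subject));
}
```

The shape to notice is that the privilege is consulted only after the ordinary rule has failed, so a new system call would normally call the existing signalling checks for each target process rather than comparing uids itself.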


Robert



Re: Add setacl system call?

2011-07-25 Thread Robert Watson


On Sun, 24 Jul 2011, exorcistkiller wrote:

Hi, I'm working on a course project in which I need to add 3 system calls. 
One of which is setacl(char *name, int type, int idnum, int perms), which 
set acl for a file specified by name. I used newfs as in 
ftp://ftp.tw.freebsd.org/pub/FreeBSD/FreeBSD-current/src/sbin/newfs/ to make 
this new filesystem, named myfs (which really is UFS2) and mounted it.


My question is:
1) where to start with?
2) Is this filesystem actually a userland UFS and I can use functions in
libufs(3)?
3) What about functions in ufs_acl.c? Should the acls be stored on the
extended attributes blocks? Does FreeBSD 8.2 support it?

I know I'm asking stupid questions, but a small hint might help me a lot. 
Thank you so much..


Hi... er.. exorcistkiller...  (*)

This being FreeBSD, you may want to start with the existing programmer 
documentation, which should prove quite useful given your goals.  Try acl(3) 
for userspace, and acl(9) for the kernel.


You are doing this in the context of a course, so the constraints may be 
somewhat artificial.  However, normally my advice to someone wanting to add a 
new ACL implementation to FreeBSD would be to start with our existing 
implementation, which supports both POSIX.1e and NFSv4 ACLs (and is extensible 
to new ACL types without changing the current APIs (much)).  For example, if I 
were going to teach our native system call API about AFS ACLs, I'd start by 
perusing the above man pages and code, including:


  src/bin/*acl* # Commands for manipulating ACLs
  src/lib/libc/posix1e  # Library routines
  src/sys/kern/*acl*# File system-independent code
  src/sys/sys/acl.h # File system-independent header

As you've already found, ufs_acl.c contains the implementation for UFS; ZFS, 
NFS, etc, have similar-looking files with markedly different contents.  In 
general, if something looks file system-independent, we try to put it in the 
centralised files in kern, rather than replicate the code across file systems. 
Roughly half the code in the kern directory has to do with calls *into* the 
file system, and the other half is a library of routines called *by* the file 
system.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Finding symlink information in MAC Framework

2011-07-25 Thread Robert Watson


On Fri, 15 Jul 2011, s wrote:

I am trying to get some information related to the symlink which is being 
accessed by the user in MAC Framework. Currently I managed to get the 
uid/gid of the owner of the symlink that is being read, but now I need to 
get the same information about the target, that the symlink points to.


static int
samplemac_vnode_check_link(struct ucred *cred, struct vnode *vp,
    struct label *vplabel)
{
	struct vattr vap;
	int error;

	/* Fetch the attributes of the symlink vnode itself. */
	error = VOP_GETATTR(vp, &vap, cred);
	if (error)
		return (error);

	/* Log the owner of any symlink not owned by root. */
	if (vap.va_uid != 0)
		log(LOG_NOTICE, "stub_vnode_check_readlink: uid: %i, gid: %i\n",
		    vap.va_uid, vap.va_gid);

	return (0);
}

And I have no idea how could I do that. Where should I look for that info? 
And what way would be the fastest?


Hi Jakub:

Could you say a bit more about what you're trying to accomplish?  The reason 
it's hard to express what you're trying to do (inspect the target of a symlink 
during a read of the symlink) is that it's not really a coherent concept in 
terms of kernel implementation.  At the point where the access control check 
on readlink is occurring, the string hasn't yet been read from the link, and 
even if it had, you couldn't look up the target object as you're already 
holding locks relating to lookup and read of the symlink itself.  Even if you 
could, there's also a risk of recursion: the symlink could point straight back 
to where you are, etc.  The readlink check is mid-lookup and triggering an 
entirely fresh lookup from there might be quite awkward for a number of such 
reasons.


In general, however, this is not an issue for the policies we've encountered 
thus far: they almost all care only about authorising path segment lookups (in 
which case readlink is just another segment in evaluation), or absolute paths 
to objects reconstructed during the actual operation on the target object, 
etc.  Hence my wondering what you're trying to accomplish -- the first 
question, really, is "is what you're trying to express actually safely 
expressible in a fine-grained, multiprocessing kernel?"


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Issue with 'Unknown Error: -512'

2011-07-25 Thread Robert Watson


On Mon, 18 Jul 2011, Andriy Gapon wrote:

In recent branches (confirmed with 224119) builds compiled with clang 
happen to throw 'Unknown error: -512' in a lot of places, making the system 
unusable. (Untested on gcc compiled systems). Originally I thought the 
problem was with specific programs, then I narrowed it down to file I/O, 
and now I've narrowed it down to open() with O_TRUNC. Without O_TRUNC there 
seems to be no issues whatsoever. With O_TRUNC on open() it fails with that 
'Unknown error: -512' every other time you run the program. Common issues, 
portsnap is affected, making it impossible to fetch/extract ports. As well 
as redirecting output in shells eg `echo 'hi' > test` fails every other 
try. You have the same issue with text editors like `edit` where it fails 
every other save. There are no issues with `echo 'hi' >> test` as there is 
no O_TRUNC, it only seems to be an O_TRUNC error.


Any tips? Otherwise I'll be looking into this today myself.


Just a hint that you could try using DTrace syscall and fbt providers to see 
where in kernel (if in kernel) that -512 return value originates.


Jon Anderson spotted that here during some Capsicum work -- initially we were 
concerned it was a local patch, but it sounds like it might be less local.  I 
think he saw it on calls to open(2) as well, and I couldn't help but wonder 
(given its recent arrival) if it was an outcome of the change to break falloc 
into two parts, leading to some or another problematic handling of file 
descriptor numbers.  I.e., it's not so much that -512 is being returned, as a 
number that's a bad file descriptor.  (Although now having seen 512 twice on 
two different machines, that particular explanation seems less credible). 
Perhaps this is indeed unrelated to Capsicum, and triggered by a clang bug or 
something else.


I've CC'd Jon, maybe he has gained further insight since we chatted.

Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Kernel timers infrastructure

2011-07-25 Thread Robert Watson


On Mon, 25 Jul 2011, Filippo Sironi wrote:

I'm working on a university project that's based on FreeBSD and I'm 
currently hacking the kernel... but I'm a complete newbie. My question is: 
what if I have to call a certain function 10 times per second? I've seen a 
bit of code regarding callout_* functions but I can't get through them. Is 
there anyone who can help me?


Hi Filippo:

I'm not sure if you've found the callout(9) man page yet, but it talks about 
the KPI in some detail.  The basic idea, though, is that you describe a 
regular "callout" using a function pointer, an opaque data pointer, and how 
long until it should be invoked.  In its more complex incantations, you can 
also specify locks for it to acquire, etc.  The key aspect of the API that 
some people find confusing is that the time interval is described in ticks of 
length 1/hz seconds.  Unless software really wants one invocation per tick 
(generally unlikely), you will want to pass in some constant times/divided by 
hz so that it's appropriately scaled.
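
To make the scaling concrete: for the "ten times per second" case in the question, the interval handed to the callout would be hz / 10 ticks. The helper below is a userland sketch of just that arithmetic. The function name is made up for illustration, and hz is passed as a parameter here, whereas in the kernel it is a global.

```c
#include <assert.h>

/*
 * Convert "calls per second" into a callout interval in ticks, where
 * one tick lasts 1/hz seconds.  Hypothetical helper for illustration.
 * Rounds down, but never returns less than one tick.
 */
int
ticks_per_call(int hz, int calls_per_second)
{
	int ticks;

	assert(hz > 0 && calls_per_second > 0);
	ticks = hz / calls_per_second;
	return (ticks > 0 ? ticks : 1);
}
```

With hz at 1000 this works out to 100 ticks between invocations. A callout fires once per arming, so a periodic handler would normally re-arm itself with the same interval each time it runs; callout(9) has the details.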


You can find two fairly straightforward examples in kern/uipc_domain.c, which 
are respectively the "fast" and "slow" timers.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Add setacl system call?

2011-07-28 Thread Robert Watson


On Mon, 25 Jul 2011, exorcistkiller wrote:

Another question while I'm reading the code. In ufs_acl.c, in static int 
ufs_getacl_posix1e(struct vop_getacl_args *ap), you commented: As part of 
the ACL is stored in the inode, and the rest in an EA, assemble both into a 
final ACL product. From what I learned from Kirk's book, ACLs are supposed 
be stored in extended attributes blocks. So what do you mean by "part of the 
ACL is stores in the inode"? I know extended attributes blocks data can be 
addressed by inode, but how to get ACL directly from the inode?


POSIX.1e ACLs are defined as an extension to permissions: additional user 
entries, group entries, and a mask entry supplement the existing owner, group 
and other permission bits.  Both the APIs and command line tools allow the 
portions of the ACL representing permission bits to be directly manipulated. 
For the purpose of the UFS implementation (and I suspect this to be common in 
other implementations as well), we keep the owner/group/other bits (or 
sometimes the mask bits) in the existing inode permissions field.  All 
additional entries are stored in the extended attribute.  This has some nice 
properties, not least:


(1) stat(2) on the file still only needs look at the inode, not the extended
attributes, making it faster.
(2) chmod(2) can be implemented by writing out only the inode, also faster.
(3) Files without extended ACLs don't need extended attributes stored.

The inclusion of a "mask" field in POSIX.1e is motivated similarly: it is what 
allows stat(2) and chmod(2) to not touch extended ACL fields.


This is what the comment means by part of the ACL being stored in the inode, 
and part in the extended attribute: any areas of an ACL that are actually 
permission mask entries go in the existing mode bits in the inode for 
efficiency reasons.
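
A simplified userland model of that split may help; it is not the ufs_acl.c code. The entry tags and function names below are hypothetical, loosely following POSIX.1e terminology, and the model assumes that the group bits in the mode hold the mask whenever a mask entry is present.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ACL entry tags, loosely after POSIX.1e. */
enum model_tag {
	TAG_USER_OBJ,	/* file owner */
	TAG_USER,	/* additional named user */
	TAG_GROUP_OBJ,	/* file group */
	TAG_GROUP,	/* additional named group */
	TAG_MASK,	/* mask entry */
	TAG_OTHER	/* everyone else */
};

struct model_entry {
	enum model_tag	tag;
	unsigned	id;	/* uid/gid for TAG_USER/TAG_GROUP */
	unsigned	perm;	/* rwx as three bits, 0..7 */
};

/*
 * Compute the nine mode bits that would live in the inode for a given
 * ACL.  Owner and other come straight from their entries; the group
 * bits hold the mask when one is present, else the group entry.
 */
unsigned
model_acl_to_mode(const struct model_entry *acl, size_t n)
{
	unsigned owner = 0, group = 0, other = 0, mask = 0;
	int have_mask = 0;
	size_t i;

	assert(acl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		switch (acl[i].tag) {
		case TAG_USER_OBJ:  owner = acl[i].perm & 7; break;
		case TAG_GROUP_OBJ: group = acl[i].perm & 7; break;
		case TAG_OTHER:     other = acl[i].perm & 7; break;
		case TAG_MASK:      mask = acl[i].perm & 7; have_mask = 1; break;
		default:            break;	/* named entries: EA only */
		}
	}
	return ((owner << 6) | ((have_mask ? mask : group) << 3) | other);
}

/* Count the named entries that would need extended attribute storage. */
size_t
model_acl_extended_count(const struct model_entry *acl, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (acl[i].tag == TAG_USER || acl[i].tag == TAG_GROUP)
			count++;
	return (count);
}
```

In this model a file with only owner, group and other entries has an extended count of zero, which corresponds to property (3) above: nothing needs to be written to an extended attribute at all.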


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: HTT vs SMT in x86 SMP topology reporting

2011-07-28 Thread Robert Watson


On Tue, 26 Jul 2011, Andriy Gapon wrote:

Can anybody explain to me why our _x86_ SMP topology discovery and reporting 
code sometimes reports "HTT" and sometimes "SMT"? As in FreeBSD/SMP: %d 
package(s) x %d core(s) x %d HTT threads vs FreeBSD/SMP: %d package(s) x %d 
core(s) x %d SMT threads


As I understand, and quoting Wikipedia (I know, I know), SMT stands for 
simultaneous multithreading and is a generic term for a particular kind of 
hardware multithreading: 
http://en.wikipedia.org/wiki/Simultaneous_multithreading


The only known (to me) implementation of SMT for x86 is Intel's 
Hyper-Threading Technology aka HTT aka HT Technology aka hyperthreading: 
http://en.wikipedia.org/wiki/Hyper-threading 
http://software.intel.com/en-us/articles/intel-hyper-threading-technology-your-questions-answered/?wapkw=%28Intel+Hyper-Threading+Technology%29


Several MIPS platforms we run on support SMT.  Typically this means a set of 
"weaker" threads sharing a single core, usually context switching as a result 
of memory access stalls in other threads, and perhaps sharing particularly 
expensive CPU features, such as a TLB.  They sometimes come with 
high-performance message-passing facilities between threads, or even between 
cores, to supplement shared memory and IPIs.


It may be that HTT is, among other things, a trademark of Intel.

Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: MAC Framework, Socket information

2011-07-29 Thread Robert Watson


On Thu, 28 Jul 2011, s wrote:

I need to get some info about the socket being created by the user. What I 
want to do is log all TCP/UDP outgoing connections that are being made. I 
*need* to get the local and remote address, as well as the local and remote 
port. I managed to get all of the remote data, but this is useless to me, if 
I haven't got the local port. Here is what I have already written:


Most MAC Framework entry points are invoked before operations of interest, 
rather than after, because they are intended to perform access control on 
operations.  I think the closest you may be able to get given current entry 
points is logging when the first operation is performed on the connected 
socket: i.e., read, write, sendfile, etc, since it will be established at that 
point (some caution required: you can invoke system calls on sockets before 
and during connect()).


However, I can't help but wonder: would you be better-served by using the 
kernel's audit facilities to track events like socket connection?  Are you 
blending access control and logging in your module, or is this really just 
about logging?


Robert




static int slog_socket_check_connect(struct ucred *cred,
    struct socket *socket, struct label *socketlabel,
    struct sockaddr *sockaddr)
{
	if (sockaddr->sa_family == AF_INET) {
		struct sockaddr_in sa;

		log(LOG_SECURITY | LOG_DEBUG,
		    "Somebody made a socket: %d:%d (%d)\n",
		    cred->cr_ruid,
		    ntohs(((struct sockaddr_in *)sockaddr)->sin_port),
		    ntohs(((struct in_endpoints *)sockaddr)->ie_lport));
	}
	return 0;
}

--
Pozdrawiam,
Jakub 'samu' Szafrański
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Re: MIPS toolchain

2011-07-31 Thread Robert Watson


On Fri, 29 Jul 2011, James Jones wrote:


Does anyone have a prebuilt MIPS tool chain?


For FreeBSD-related MIPS work, I generally use the FreeBSD "toolchain" target 
followed by the "buildenv" environment, but that requires first building a 
cross-toolchain using TARGET_ARCH and TARGET.  However, the result is a pretty 
sane compiler, linker, etc, setup for the MIPS of your choice (we tend to use 
mips64eb).


We also use the MIPS-provided SDE toolchain for Linux at the CL, but that 
appears to be out of maintenance, and I haven't found its bug density to be 
any lower, really, than the even more ageing FreeBSD versions of the tools. 
In fact, there are some toolchain bugs I'm running into that manifest only in 
the SDE toolchain and not the FreeBSD toolchain.  (Mind you, Philip has 
commented that in building Uboot for MIPS, he's found FreeBSD bugs that don't 
appear in the SDE toolchain, so mileage varies).


We're greatly looking forward to MIPS support for LLVM, which currently 
appears very premature indeed.  Someone from MIPS appears to be contributing 
to it, however, and we (cl.cam.ac.uk) hope to provide some implementation 
support for that effort in the immediate future.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Capsicum project: Ideas needed

2011-08-10 Thread Robert Watson


On Thu, 4 Aug 2011, Lars Engels wrote:


I just stumbled upon this rather outdated thread...

On Fri, 8 Jul 2011 15:09:52 +0400, Ilya Bakulin wrote: [...]

wget curl links/lynx
This is Ports software, we may try to modify it and even send patches to 
upstream, or maintain our local patches. I wanted to focus on base system 
components during GSoC, but it doesn't hurt to try to capsicumize these 
tools either.


fetch(1) is similar to wget and curl and is part of the base system, so 
would this be a candidate?


I'd think fetch would be quite a good candidate -- most of its work is done as 
a pipeline between a socket and a file, and sandboxing the gubbins that sits 
in the middle of that pipeline would be quite beneficial.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: Dynamic kernel module linking problem

2011-08-26 Thread Robert Watson


On Fri, 26 Aug 2011, Monthadar Al Jaberi wrote:

I have written a dynamic loadable module using DECLARE_MODULE in 
FreeBSD-Current.


And I want to iterate through the ifnet list using following code snippet:


If this is on a recent version of FreeBSD (8.x and later), then you probably 
mean to be using V_ifnet, and you should include if_var.h rather than using an 
extern in order to ensure virtualisation is handled properly.


Robert



extern struct ifnethead ifnet;
...
struct ifnet *ifp, *ifp_temp;
TAILQ_FOREACH_SAFE(ifp, &ifnet, if_link, ifp_temp) {
printf("%s\n", ifp->if_dname);
}

Compilation is fine, but when I load the module I get the following error:

...
/sbin/kldload -v module.ko
link_elf: symbol ifnet undefined
...

What am I doing wrong? Shouldn't kernel be able to link it on its own?

Grateful for any advice.
--
//Monthadar Al Jaberi
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"




Re: TIME_WAIT Assassination in FreeBSD???

2011-09-05 Thread Robert Watson

On Sat, 3 Sep 2011, Jarrod Lee Petz wrote:

3. Does FreeBSD handle this situation? How? I can't seem to find much info 
on TIME_WAIT assassination in FreeBSD is mentioned in RFC 6056


I'm not familiar with the RFC side here, but I can confirm that FreeBSD will 
recycle TIMEWAIT connections more quickly than specified when load is very 
high.  This is done on the basis of allocated space; the sysctl:


  net.inet.tcp.maxtcptw

Instructs the stack regarding how much state to retain -- this is implemented 
by adjusting the allocation limit on the tcptw zone.  On my system, it seems 
to auto-tune to about 5000 connections, a value derived from the global limit 
on the number of sockets on the box I'm looking at -- your mileage may vary.


The resource limit case can occur in tcp_twstart(), when uma_zalloc() returns 
NULL on failing to allocate new TIMEWAIT state for a connection.  At that 
point, it forces an early scan of TIMEWAIT connections (which normally happens 
on 2msl intervals) with a 'reuse' argument of 1, authorising premature reuse. 
Without too close an analysis, it appears on face value to implement LRU: we 
reuse storage held by the connection that has been in TIMEWAIT the longest.
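
The behaviour described above can be modelled in userland roughly as follows. This is a sketch of the idea only, not the tcp_twstart() code: the pool, its tiny fixed limit and all names are hypothetical, and a linear scan stands in for whatever ordering the kernel actually maintains.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAXTW 4	/* stand-in for the zone allocation limit */

struct model_tw {
	int	conn_id;	/* which connection holds this slot */
	long	entered;	/* time the connection entered TIMEWAIT */
	int	in_use;
};

struct model_twpool {
	struct model_tw	slots[MODEL_MAXTW];
};

/* Returns a free slot, or NULL when the limit has been reached. */
static struct model_tw *
model_tw_alloc(struct model_twpool *pool)
{
	size_t i;

	for (i = 0; i < MODEL_MAXTW; i++)
		if (!pool->slots[i].in_use)
			return (&pool->slots[i]);
	return (NULL);
}

/* Finds the slot that has been in TIMEWAIT the longest. */
static struct model_tw *
model_tw_oldest(struct model_twpool *pool)
{
	struct model_tw *oldest = NULL;
	size_t i;

	for (i = 0; i < MODEL_MAXTW; i++)
		if (pool->slots[i].in_use &&
		    (oldest == NULL || pool->slots[i].entered < oldest->entered))
			oldest = &pool->slots[i];
	return (oldest);
}

/*
 * Put a connection into TIMEWAIT.  Returns the id of a connection
 * whose state was recycled early, or -1 if none was evicted.
 */
int
model_tw_start(struct model_twpool *pool, int conn_id, long now)
{
	struct model_tw *tw;
	int evicted = -1;

	assert(pool != NULL);
	tw = model_tw_alloc(pool);
	if (tw == NULL) {
		/* Allocation failed: reuse the longest-waiting entry. */
		tw = model_tw_oldest(pool);
		evicted = tw->conn_id;
	}
	tw->conn_id = conn_id;
	tw->entered = now;
	tw->in_use = 1;
	return (evicted);
}
```

Under light load the eviction path is never taken and entries simply age out; only when the limit is hit does the oldest entry lose its remaining TIMEWAIT time.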


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: buf_ring(9) API precisions

2011-09-15 Thread Robert Watson

On Thu, 15 Sep 2011, K. Macy wrote:

Why are you making an MD guess, the amount of padding to fit the size of a 
cache line, in MI API ? Strangely enough, you did not make this assumption 
in, say r205488 (picked randomly).


It has been several years, and I haven't done any work in svn in over a 
year, I don't remember. I probably meant to refine it in a later iteration.


If you would like to send me a patch addressing this I'd be more than happy 
to apply it if appropriate. Otherwise, I will deal with it some time after 9 
settles.


Thanks for pointing this out.


I'm not sure if gcc (and friends) allow __aligned(CACHE_LINE_SIZE) to be used 
on individual elements of a struct (causing appropriate padding to be added), 
but that may be one option here.  Of course, that introduces a further 
alignment requirement on the struct itself, so a moderate amount of care would 
need to be used.
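
For what it's worth, GCC and Clang do accept the aligned attribute on individual struct members. The sketch below uses the raw attribute instead of the kernel's __aligned() macro so that it builds in userland. The struct is a hypothetical ring header, not the actual buf_ring layout, and the 64-byte line size is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration; the real value is MD. */
#define MODEL_CACHE_LINE_SIZE 64
#define MODEL_ALIGNED(x) __attribute__((__aligned__(x)))

/*
 * Hypothetical ring header: the consumer indices are pushed onto their
 * own cache line by aligning the first of them, in place of a
 * hand-sized padding array after the producer indices.
 */
struct model_ring {
	volatile uint32_t	prod_head;
	volatile uint32_t	prod_tail;
	volatile uint32_t	cons_head MODEL_ALIGNED(MODEL_CACHE_LINE_SIZE);
	volatile uint32_t	cons_tail;
};

/* Returns nonzero if p meets the struct's alignment requirement. */
int
model_ring_is_aligned(const void *p)
{
	assert(p != NULL);
	return (((uintptr_t)p % MODEL_CACHE_LINE_SIZE) == 0);
}
```

The member attribute raises the alignment of the whole struct, so its size is rounded up to a multiple of the line size as well. That is the extra requirement mentioned above: storage obtained from an allocator that only guarantees pointer alignment would not automatically satisfy it.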


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson


On Mon, 16 Jan 2012, Julian Elischer wrote:


On 1/16/12 3:32 PM, William Bentley wrote:
I also echo John's sentiments here. Very excellent points made here. Thank 
you for voicing your opinion. I was beginning to think I was the only one 
who felt this way.

[...]

We seem to have lost our way around the release of FreeBSD 7. I am all in 
favor of new features but not at the risk of stability and proper life 
cycle management.


Are me and John the only people that feel this way or are we among the 
minority?


It pretty much boils down to one thing..  man power..


I disagree.  Resourcing is an issue, but it is not *the* issue.  The real 
issue here is a failure by the release engineering team (which includes me) to 
concurrently perform major and minor releases.  Given that minor releases run 
like clockwork in most cases, this is disappointing.  In the past, there have 
been a lot of good technical and structural obstacles to trying to do 
clockwork releases for both major and minor releases:


- Tight synchronisation of the ports and base release schedule means that the
  base release schedule limits ports productivity.

- Long freezes forced on us by poor revision control support for branching.

None of these really apply any longer -- and in as much as they do, they 
should be addressed.  In particular, I think there's a growing feeling that 
ports should be conducting its own releases out of lockstep with the base 
tree, producing package sets as a primary product at regular intervals 
regardless of the base release schedule.  Likewise, long freezes enforced by 
expensive branching operation in CVS no longer apply due to use of Subversion 
-- it's not perfect, but it's workable.


There's no way to satisfy everyone with any particular maintenance schedule 
and release cycle.  However, it seems clear that the current model with minor 
releases spaced at a year is satisfying no one.  It's easy to point at a 
developer<->user divide, but I think that misses the point: most developers 
are users.  A big gap between development branch and shipped features hurts 
the commercial users of FreeBSD that pay for so much of its development, since 
it forces them to support diverging local development and shipping products -- 
ISPs, etc.  There is no incentive for year-long gaps in minor releases.


My view is therefore that we have a "social" -- which is to say structural -- 
problem.  Regardless of ".0" releases, we should be forcing out minor 
releases, which are morally similar to "service packs" in the vocabulary of 
other vendors: device driver improvements, new CPU support, steady if 
conservative feature development, etc, required to keep older major releases 
viable on contemporary hardware and with contemporary applications.  One known 
problem is using a single "head" release engineer in steering all releases. 
I think this is a mistake, as it makes the whole project's release schedule 
subject to individual unavailability, burnout, etc, as well as increasing the 
risks associated with low bus factor.  I'd like to see us move to a model 
where new release engineers are mentored in from the developer community for 
point releases, ensuring that we increase our expertise, share knowledge about 
release engineering in the broader community, and get new eyes on the process 
which can lead more readily to process improvements.  The role of the "head" 
release engineer shouldn't be hands-on production of every release, but rather, 
steering of the overall team.


I'd like to see this begin with 8.3, drawing a per-release lead from the 
developer community, and continue with a fixed schedule release of 8.4.  Yes, 
more staffing is needed, but first, what is needed is an improvement in model.


On a related note, the security team owns the "freebsd-update" mechanism, 
largely for historical reasons (Colin wrote it), but this is actually a bit 
backwards from how you would expect things to run, as we now use 
freebsd-update for upgrades, which are almost never engineered by the security 
team.  Not sure what the fix is there, but it seems related -- perhaps what is 
really called for is breaking out our .0 release engineering entirely from .x 
engineering, with freebsd-update being in the latter.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson

On Wed, 18 Jan 2012, Andriy Gapon wrote:


on 18/01/2012 02:16 Igor Mozolevsky said the following:
Seriously, WTF is the point of having a PR system that allows patches to be 
submitted??! When I submit a patch I fix *your* code (not yours personally, 
but you get my gist).


Let me pretend that I don't get it.  It is as much your code as it is mine 
if you are a user of FreeBSD.  I just happen to have a commit bit at this 
point in time.


No other project requires a non-committer to be so ridiculously persistent 
in order to get a patch through.


There are about 5000 open PRs for FreeBSD base system, maybe more. There are 
only a few dozens of active FreeBSD developers.  Maybe less for any given 
particular point in time (as opposed to a period of time). And dealing with 
PRs is not always exciting. Need I continue?


P.S. Using GNATS for the PR database doesn't help either, in some technical 
ways.


The structural problem around the PR system for the base system is that there 
isn't a whole lot of incentive for most developers to use it.  I think we can 
reasonably categorise developers into three classes -- some move between or 
span them, of course:


(1) Volunteers.  Due to childhood trauma, they have a desperate urge to write
operating systems.  Not much incentive to do PRs here, as most refer to
versions of FreeBSD before their time, aren't great characterisations,
rarely come with patches, and when they do, the patches are out of date,
don't apply, have the wrong style, solve the wrong problem, etc.  A
sweeping generalisation, but you see what I mean.  The only exceptions
here are our dedicated team of bugmeisters, who get enormous respect from
me, but they are a tiny minority.

(2) Employees.  They work at a company using FreeBSD as a product, and
effectively deliver their own CompanyBSD as a further product to their own
internal customers -- to be put on a web service frontline, to ship as the
foundation of an appliance, etc.  The key phrase here is "internal
customers" -- they have their own bug report database, which they respond
to in a timely way due to the incentives of the workplace, but also
because they are relevant bug reports for their product goals.

(3) Authors of upstream code.  They don't even work on FreeBSD, but their code
ends up in FreeBSD, so they also have their own bug report databases, fix
bugs, and eventually the fixes trickle into FreeBSD.

With the above, the incentives to handle PRs are very weak -- and it's 
compounded by gnats being terrible for both submitters and handlers of bug 
reports.  Contrast this with ports, where the PR database is a key part of the 
workflow.


However, and I am being entirely honest when I say this: FreeBSD works anyway. 
So somehow, we end up with a pretty good OS despite largely ignoring our bug 
report database.  Why?  Well, for (1) it's because volunteers have a strong 
sense of ownership of the code they've written and care about, (2) there's a 
significant internal QA and bug management effort at downstream companies from 
FreeBSD, whose improvements are frequently upstreamed by committers on staff, 
and (3) occurs independently of bugs in our bug report database.


Don't get me wrong: it's a problem that the PR database goes so unloved.  But 
it's a symptom of the construction of *extremely large* volunteer projects in 
which the incentives are not aligned for dealing with PRs most of the time. 
If you want to see something similarly sad, try counting dropped patches on 
the linux-kernel list.  Someone once ported the entire FreeBSD kernel audit 
framework and OpenBSM to Linux, posted on the list saying "here are my 
patches", never heard anything back, and went away.  You can moralise in 
various ways and for various parties in that relationship, but at heart, 
that's pretty similar to a lot of the patches in the PR database; you'll find 
similar stuff in every open source project of scale.  I submitted patches to 
fix several bugs in KDE a decade or so ago .. after five years, the reports 
were closed as "out of date".  Yet large open source products *do* work, and 
become the foundations for amazing things.


I think shifting away from Gnats would help as it would make it easier for 
developers to find bugs they care about, users to submit higher-quality 
reports, and so on.  Gnats makes it really hard to manage reports in a useful 
way.


Another possibility is to get some combination of {The FreeBSD Foundation, iX 
Systems, ...} to trawl the bug report database in a more official capacity. 
The problem there is that this will be a high burn-out job.  I'll bring it up 
at the next Foundation board meeting, especially after a bumper year of 
fund-raising, and see what we can do.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "f

Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson


On Tue, 17 Jan 2012, Andriy Gapon wrote:


on 17/01/2012 00:28 John Kozubik said the following:

we going to run RELEASE software ONLY


My opinion: you've put yourself in a box that is not very compatible with 
the current FreeBSD release strategy.  With your scale and restrictions you 
probably should just use the FreeBSD source and roll your own releases from 
a stable branch of interest (including testing, etc).  Or have your own 
"branch" where you could cherry-pick interesting changes from any FreeBSD 
branches.  Tools like e.g. git and mercurial make it easy.  Of course, this 
strategy is not as easy as trying to persuade the rest of FreeBSD 
community/project/thing to change its ways, but perhaps a little bit more 
realistic.  You can bond with similarly minded organizations to share 
costs/work/etc.  It's a community-driven project after all.


Suppose for a moment we get the .x release process fixed: we start cutting 
regular point releases from -STABLE on a 6-month cycle (just a strawman). 
freebsd-update's update and upgrade features actually make tracking -STABLE at 
release engineered time slices plausible.


One reason that's true is that between 5.x and 6.x, the FreeBSD Project 
underwent a substantive change in our approach to binary interfaces.  In 4.x 
and before, the letters "ABI" rarely hit the mailing lists.  In 6.x and later, 
it's a key topic discussed whenever merges to -STABLE come up.  We now really 
care about keeping applications running as the OS moves under them.  We also 
build packages to better-defined ABIs -- not perfectly, but OK.


I think John gets a lot of what he wants if we just fix our release cycle.

Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson


On Tue, 17 Jan 2012, Doug Barton wrote:

The other thing I think has been missing (as several have pointed out in 
this thread already) is any sort of planning for what should be in the next 
release. The current time-based release schedule is (in large part) a 
reaction to the problems we had in getting 5.0 out the door. However I think 
the pendulum has swung *way* too far in the wrong direction, such that we 
are now afraid to put *any* kind of plan in place for fear that it will 
cause the release schedule to slip. Aside from the obvious folly in that 
(lack of) plan, it fails to take into account the fact that the release 
schedules already slip, often comically far out into the future, and that 
the results are often worse than they would have been otherwise.


Agreed entirely.  There's been an over-swing caused by the diagnosis "it's 
like herding cats" into "cats can't be herded, so why try?".  Projects like 
FreeBSD don't work if there's no consensus on interesting problems to solve, 
directions to run in, etc.  The history of FreeBSD is also full of examples of 
successful collaborative development in which developers decide, together, on 
a direction and run that way.  Sure, it's not the same as "we are paying you 
to do X", but I think many FreeBSD developers like the idea that they are 
working on something larger than just their own micro-project, and would 
subscribe (and contribute) to a sensible plan.  In fact, I think we'd find 
that if we were a bit more forthcoming about our plans, we'd have an easier 
time soliciting contributions from people less involved in the project, as it 
would be more obvious how they could get involved.


It strikes me that the first basic plan would be a release schedule, however. 
:-)


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: SMP: protocol control block protection for a multithreaded process (ex: udp).

2012-05-29 Thread Robert Watson


On Tue, 29 May 2012, vasanth rao naik sabavat wrote:


In case of a Multicore cpu system running a multithreaded process.

For protocol control blocks there is no protection provided in the FreeBSD 
9. For example, udp_close() and udp_send() access the inp before taking the 
lock. Couldn't this cause the inp inconsistency on a multithreaded process 
running on multicore cpu system?


Say, If the two threads of a process are concurrently executing socket send 
and socket close say on a udp connection (this can happen in case of poorly 
written user code.). udp_close() will access the inp on one cpu and 
udp_send() will access the inp on another cpu. it is possible that 
udp_close() gets the locks first and free's the inp before udp_send() has a 
chance to run?


Am I missing anything?


The life cycle here is complicated and there is some subtlety.  The simple 
answer to your question is that udp_abort() and udp_close() don't free the 
inpcb -- that occurs in udp_detach(), which is called only when the reference 
count on the socket hits 0, which can't happen while udp_send() is in flight, 
as the caller owns a reference maintaining the stability of the socket.


Take a look at the comment at the top of uipc_socket.c for more detailed 
coverage of socket life cycles; for UDP, inpcbs are around for the entire 
life cycle of the socket, so it is always safe to follow so->so_pcb if you 
hold a valid socket reference (either borrowed from a process's file 
descriptor, or held).  For TCP, things are more complex.
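
The reference-counting argument above can be illustrated with a small userspace model. This is only a sketch: the toy_* types and functions are hypothetical stand-ins, not the kernel's struct socket, struct inpcb, or udp_detach(), and the real code has locking that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct inpcb and struct socket; not the kernel types. */
struct toy_pcb {
	bool freed;
};

struct toy_socket {
	int refs;		/* references held on the socket */
	struct toy_pcb *pcb;	/* plays the role of so->so_pcb */
};

/* Take a reference, as a system call does while it uses the socket. */
static void
toy_hold(struct toy_socket *so)
{
	so->refs++;
}

/* Drop a reference; only the last release tears down the pcb ("detach"). */
static void
toy_release(struct toy_socket *so)
{
	assert(so->refs > 0);
	if (--so->refs == 0)
		so->pcb->freed = true;	/* models what udp_detach() would do */
}

/* Returns true if the pcb was still valid while "sending". */
static bool
toy_udp_send(const struct toy_socket *so)
{
	return (!so->pcb->freed);
}
```

In this model, a close() from another thread only drops the file descriptor's reference; as long as the sender still holds its own reference, the pcb stays valid, and it is released only when the last reference goes away.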


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: SMP: protocol control block protection for a multithreaded process (ex: udp).

2012-05-29 Thread Robert Watson


On Tue, 29 May 2012, vasanth rao naik sabavat wrote:


Can somebody please reply to this email.

basically, can udp_detach() and udp_send() execute simultaneously for a 
process with multiple threads? if yes, then inp reference in udp_send() will 
be stale if udp_detach() free's the inp?


You are confusing application-level close() with an actual close in the socket 
implementation.  The socket will remain allocated as long as there are 
consumers using it, which is ensured through a reference count on the socket, 
regardless of close().  That isn't to say that there aren't bugs -- this stuff 
is pretty complex -- but the life cycle and synchronisation models around 
sockets should prevent the scenario you are describing from occurring.


Robert



Thanks,
Vasanth




___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: sysctl filesystem ?

2012-06-26 Thread Robert Watson

On Tue, 26 Jun 2012, Chris Rees wrote:


as well as we don't depend of /proc for normal operation we shouldn't for

say /proc/sysctl


improvements are welcome, better documentation is welcome, changes to

what is OK - isn't.

/proc/sysctl might be useful.  Just because Linux uses it doesn't make it a 
bad idea.


One of the problems we've encountered with synthetic file systems is that 
off-the-shelf file system tools (e.g., cp, dd, cat) make simplistic (but not 
unreasonable) assumptions about the static content of files.  This comes up 
frequently with procfs-like systems where the size of, say, memory map data 
can be considerably larger than the perhaps 128-byte, 256-byte, or even 8k 
buffers that might exist in a stock file access tool.  Unless we change all of 
those tools to use buffers much bigger than they currently do, which even 
suggests changing the C library buffer defaults for FILE *, that places an 
onus on the file system to provide persisting snapshots of data until it's 
sure that a user process is done -- e.g., over many system calls.
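
A toy model may make the snapshot problem concrete. The sketch below assumes a hypothetical synthetic file whose 16 bytes of content are regenerated from kernel state on every read; synth_read() and read_with_small_buffer() are illustrative names, not a real procfs interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical synthetic file: 16 bytes of content, regenerated from
 * 'state' on every read rather than stored anywhere.
 */
static size_t
synth_read(char state, size_t off, char *buf, size_t len)
{
	const size_t total = 16;

	if (off >= total)
		return (0);
	if (len > total - off)
		len = total - off;
	memset(buf, state, len);
	return (len);
}

/*
 * Read the whole "file" through an 8-byte buffer.  The underlying state
 * changes after the first read, as kernel state may between system calls.
 * 'out' must have room for at least 17 bytes.
 */
static void
read_with_small_buffer(char *out)
{
	char state = 'A';
	char chunk[8];
	size_t off = 0, n;

	while ((n = synth_read(state, off, chunk, sizeof(chunk))) > 0) {
		memcpy(out + off, chunk, n);
		off += n;
		state = 'B';	/* state moves on between reads */
	}
	out[off] = '\0';
}
```

The reader ends up with half of the old content followed by half of the new, a combination that never existed at any single point in time. Avoiding that requires either a buffer large enough for one read or a file system that pins a snapshot across reads.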


sysctl is not immune to the requirement of atomicity, but it has explicit 
control over it: sysctl is a single system call, rather than an unbounded 
open-read-seek-repeat-etc cycle, and has been carefully crafted to provide 
this and other MIB-like properties, such as a basic data type model so that 
command line tools know how to render content rather than having to guess 
and/or get it wrong.  sysctl has some file-system like properties, but on the 
whole, it's not a file system -- it's much more like an SNMP MIB.


While you can map anything into anything (including Turing machines), I think 
the sysctl command line tool and API, despite its limitations, is a better 
match for accessing this sort of monitoring and control data than the POSIX 
file API, and would recommend against trying to move to a sysctl file system.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: projects/armv6 merged to HEAD

2012-08-17 Thread Robert Watson


On Thu, 16 Aug 2012, Oleksandr Tymoshenko wrote:

projects/armv6 branch was merged to HEAD and should be considered dead now. 
This patch is a result of a joint effort by many people. Including but not 
limited to:


Amazing work -- many thanks are due to everyone who was involved!

Robert



 Grzegorz Bernacki (gber@)
 Aleksander Dutkowski
 Ben R. Gray (bgray@)
 Olivier Houchard (cognet@)
 Rafal Jaworowski (raj@) and Semihalf team
 Tim Kientzle (kientzle@)
 Jakub Wojciech Klama (jceel@)
 Ian Lepore
 Warner Losh (imp@)
 Damjan Marion (dmarion@)
 Lukasz Plachno
 Stanislav Sedov (stas@)
 Mark Tinguely
 Andrew Turner (andrew@)

Thanks to all, who contributed by submitting code,
testing and giving valuable advices.

Code drop includes following parts:

- General ARMv6/ARMv7 kernel bits (pmap, cache,
   assembler routines, etc...)
- ARM SMP support
- VFP/Neon support
- ARM Generic Interrupt Controller driver
- Improved thread-local storage for cpus >=ARMv6
- Two new values for TARGET_ARCH: armv6 and armv6eb
- Driver for SMSC LAN95XX and LAN8710A ethernet controllers
- Marvell MV78x60 support (multiuser, ARMADA XP kernel config)
- TI OMAP4 and AM335x support (multiuser, no GPU or graphics
   support, kernel configs for Pandaboard and Beaglebone)
- LPC32x0 support (multiuser, frame buffer works with SSD1289
   LCD controller. Embedded Artists EA3250 kernel config)
- Barebone Nvidia Tegra2 support (timers, interrupts and UART.
   No kernel config)

Hope now that the code is in trunk it will get more attention
and love from developers.

Happy hacking

--
gonzo

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: syslog(3) issues

2012-09-03 Thread Robert Watson

On Mon, 3 Sep 2012, Attilio Rao wrote:

I was trying to use syslog(3) in a port application that uses threading , 
having all of them at the LOG_CRIT level. What I see is that when the 
logging gets massive (1000 entries) I cannot find some items within the 
/var/log/messages (I know because I started stamping also some sort of 
message ID in order to see what is going on). The missing items are in the 
order of 25% of what should really be there.


Someone has a good idea on where I can start verifying for my syslogd 
system? I have really 0 experience with syslogd and maybe I could be missing 
something obvious.


syslog(3)/syslogd(8) use datagram sockets for both local and networked 
logging, and it is possible for those datagram sockets to fill and drop 
messages.  I'm not sure if we have per-socket counters that can easily be 
queried by syslogd, but if we do, it might be beneficial to have syslogd wake 
up once a second and check to see if the counters have changed -- if they 
have, inject a log message indicating how many messages were dropped in the 
last $epsilon.  If we don't have counters along those lines, it might make 
sense to add them.  We might also find that it is appropriate to tune up the 
limits if they no longer seem sensible in the current world order -- they may 
have late 1980s/early 1990s values (or they may not).
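
The "check the counters periodically" idea could look roughly like the sketch below. Everything here is hypothetical: it assumes some per-socket drop counter is available to syslogd, which (as noted above) may not exist today, and drop_monitor_poll() is an illustrative name rather than anything in syslogd.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical monitor state: the last drop-counter value we saw. */
struct drop_monitor {
	uint64_t last_seen;
};

/*
 * Compare the current value of a (hypothetical) per-socket drop counter
 * with the value seen at the previous poll.  If it has grown, format a
 * notice into 'buf'; otherwise leave 'buf' empty.  Returns the number of
 * newly dropped messages.
 */
static uint64_t
drop_monitor_poll(struct drop_monitor *m, uint64_t current, char *buf,
    size_t buflen)
{
	uint64_t delta = current - m->last_seen;

	assert(buflen > 0);
	m->last_seen = current;
	if (delta > 0)
		snprintf(buf, buflen,
		    "syslogd: %llu messages dropped since last check",
		    (unsigned long long)delta);
	else
		buf[0] = '\0';
	return (delta);
}
```

A daemon would call something like this from a once-per-second timer and inject the formatted notice into the log stream whenever the return value is non-zero.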


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: No bus_space_read_8 on x86 ?

2012-10-12 Thread Robert Watson


On Fri, 12 Oct 2012, John Baldwin wrote:

I believe it was because bus reads weren't guaranteed to be atomic on 
i386. don't know if that's still the case or a concern, but it was an 
intentional omission.

True.  If you are on a 32-bit system you can read the two 4 byte values 
and then build a 64-bit value.  For 64-bit platforms we should offer 
bus_read_8() however.


I believe there is still no way to perform a 64-bit read on a i386 (or at 
least without messing with SSE instructions), but if you have to read a 
64-bit register, you are stuck with doing two 32-bit reads and 
concatenating them. I figure we may as well provide an implementation for 
those who have to do that as well as the implementation for 64-bit.


I think the problem though is that the way you should glue those two 32-bit 
reads together is device dependent.  I don't think you can provide a 
completely device-neutral bus_read_8() on i386.  We should certainly have it 
on 64-bit platforms, but I think drivers that want to work on 32-bit 
platforms need to explicitly merge the two words themselves.


Indeed -- and on non-x86, where there are uncached direct map segments, and 
TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit have quite 
different effects in terms of atomicity.  Where uncached I/Os are being used, 
those differences may affect semantics significantly -- e.g., if your device 
has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you two halves 
of two different 64-bit values, rather than two halves of the same value.  As 
device drivers depend on those atomicity semantics, we should (at the busspace 
level) offer only the exactly expected semantics, rather than trying to patch 
things up.  If a device driver accessing 64-bit fields wants to support doing 
it using two 32-bit reads, it can figure out how to splice it together 
following bus_space_read_region_4().
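
The "two halves of two different values" effect can be shown with a toy register model. This is a userspace sketch only: struct toy_reg and the toy_read_* functions are hypothetical and merely mimic a device register that changes after each access; they are not the bus_space API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 64-bit device register that changes after every access (a counter). */
struct toy_reg {
	uint64_t value;
};

/* One 32-bit access: word 0 is the low half, word 1 the high half. */
static uint32_t
toy_read_4(struct toy_reg *r, int word)
{
	uint32_t v = (uint32_t)(r->value >> (32 * word));

	r->value += 1;	/* the device moves on after each bus access */
	return (v);
}

/* One 64-bit access: both halves come from the same register value. */
static uint64_t
toy_read_8(struct toy_reg *r)
{
	uint64_t v = r->value;

	r->value += 1;
	return (v);
}

/* Two 32-bit accesses, low word first, spliced together afterwards. */
static uint64_t
toy_read_8_split(struct toy_reg *r)
{
	uint64_t lo = toy_read_4(r, 0);
	uint64_t hi = toy_read_4(r, 1);

	return ((hi << 32) | lo);
}
```

Starting from 0x00000000ffffffff, the single 64-bit read returns that value, while the split read returns 0x00000001ffffffff: the low half of the old value joined to the high half of the next one, a number the register never held.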


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: No bus_space_read_8 on x86 ?

2012-10-13 Thread Robert Watson


On Fri, 12 Oct 2012, Carl Delsey wrote:

Indeed -- and on non-x86, where there are uncached direct map segments, and 
TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit have quite 
different effects in terms of atomicity. Where uncached I/Os are being 
used, those differences may affect semantics significantly -- e.g., if your 
device has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you 
two halves of two different 64-bit values, rather than two halves of the 
same value.  As device drivers depend on those atomicity semantics, we 
should (at the busspace level) offer only the exactly expected semantics, 
rather than trying to patch things up.  If a device driver accessing 64-bit 
fields wants to support doing it using two 32-bit reads, it can figure out 
how to splice it together following bus_space_read_region_4().

I wouldn't make any default behaviour for bus_space_read_8 on i386, just 
amd64. My assumption (which may be unjustified) is that by far the most 
common implementations to read a 64-bit register on i386 would be to read the 
lower 4 bytes first, followed by the upper 4 bytes (or vice versa) and then 
stitch them together.  I think we should provide helper functions for these 
two cases, otherwise I fear our code base will be littered with multiple 
independent implementations of this.


Some driver writer who wants to take advantage of these helper functions 
would do something like

#ifdef i386
#define bus_space_read_8 bus_space_read_8_lower_first
#endif

otherwise, using bus_space_read_8 won't compile for i386 builds.
If these implementations won't work for their case, they are free to write 
their own implementation or take whatever action is necessary.


I guess my question is, are these cases common enough that it is worth 
helping developers by providing functions that do the double read and shifts 
for them, or do we leave them to deal with it on their own at the risk of 
possibly some duplicated code.


I was thinking we might suggest to developers that they use a KPI that 
specifically captures the underlying semantics, so it's clear they understand 
them.  Untested example:


uint64_t v;

/*
 * On 32-bit systems, read the 64-bit statistic using two 32-bit
 * reads.
 *
 * XXX: This will sometimes lead to a race.
 *
 * XXX: Gosh, I wonder if some word-swapping is needed in the merge?
 */
#ifndef __LP64__
bus_space_read_region_4(space, handle, offset, (uint32_t *)&v, 2);
#else
v = bus_space_read_8(space, handle, offset);
#endif

The potential need to word swap, however, suggests that you may be right about 
the error-prone nature of manual merging.
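
To make the word-swapping concern concrete, here is a small sketch comparing an explicit shift-based merge with what reading two words straight into a uint64_t amounts to. The function names are hypothetical helpers for illustration, not an existing KPI; words[0] is taken to be the word read from the lower device offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Explicit merge, low word at the lower offset: same result on any host. */
static uint64_t
merge_low_word_first(const uint32_t words[2])
{
	return (((uint64_t)words[1] << 32) | words[0]);
}

/* Explicit merge, high word at the lower offset: same result on any host. */
static uint64_t
merge_high_word_first(const uint32_t words[2])
{
	return (((uint64_t)words[0] << 32) | words[1]);
}

/*
 * What storing two 32-bit words through a (uint32_t *)&v cast amounts to:
 * the words land in memory order, so the resulting 64-bit value depends on
 * the host's byte order.
 */
static uint64_t
merge_by_memory_layout(const uint32_t words[2])
{
	uint64_t v;

	memcpy(&v, words, sizeof(v));
	return (v);
}
```

On a little-endian host the memory-layout merge matches the low-word-first result; on a big-endian host it matches the high-word-first result. A driver that spells out the shifts states which one the device actually requires.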


Robert

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: A question about creating a system call

2012-11-08 Thread Robert Watson

Hi Dave:

This wiki page may be of value:

http://wiki.freebsd.org/AddingAuditEvents

Robert N M Watson
Computer Laboratory
University of Cambridge

On Thu, 8 Nov 2012, dave jones wrote:


Hello,

I know how to create system calls, but I'm a bit confused about
sys/kern/syscalls.master file explained. For example, if I have a
foo system call, following code is added:

532 AUE_NULLSTD { int foo(char *str); }

The question is in column two AUE_NULL, can I replace it with AUE_FOO?
How to determine the system call should be audit or not? Thank you.

Regards,
Dave.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: KVERIFY for non-debug invariants?

2012-12-06 Thread Robert Watson


On Wed, 5 Dec 2012, Vijay Singh wrote:

All. KASSERT() is a really neat way of expressing invariants when INVARIANTS 
is defined. However for regular, non-INVARIANTS code folks have the typical 
if() panic() combos, or private macros. Would a KVERIFY() that does this in 
non-INVARIANTS code make sense?


I'd certainly be fine with something like this.  It might be worth posting to 
arch@ with a code example, as hackers@ has a subset of the potentially 
interested audience.  INVARIANTS has got a bit heavier-weight over the years 
-- the main thing I run into in higher-performance scenarios is its additional 
UMA debugging, which causes a global lock to be acquired during sanity checks. 
It might be worth our pondering adding a new configure option for particularly 
slow invariant tests -- e.g., INVARIANTS_SLOW ... or maybe just 
INVARIANTS_UMA.  However, that's a different issue.


(I sort of feel that things labeled "assert" should be something we can turn 
on in production... so maybe INVARIANTS/KASSERT mission-creep is the issue.)
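
As a starting point for discussion, a KVERIFY along the lines proposed might look like the userspace sketch below. It is hypothetical throughout: KVERIFY does not exist in the tree, toy_panic() merely records the failure instead of calling panic(9), the macros take a plain string rather than KASSERT's parenthesised format arguments, and TOY_INVARIANTS stands in for the real INVARIANTS option.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace stand-in for panic(9): record the failure so the sketch can
 * be exercised without aborting the process.
 */
static int toy_panic_count;
static char toy_panic_msg[128];

static void
toy_panic(const char *msg)
{
	toy_panic_count++;
	snprintf(toy_panic_msg, sizeof(toy_panic_msg), "%s", msg);
}

/* Hypothetical KVERIFY: the check is compiled in unconditionally. */
#define	KVERIFY(expr, msg) do {					\
	if (!(expr))						\
		toy_panic(msg);					\
} while (0)

/* KASSERT-like macro that vanishes unless TOY_INVARIANTS is defined. */
#ifdef TOY_INVARIANTS
#define	TOY_KASSERT(expr, msg)	KVERIFY(expr, msg)
#else
#define	TOY_KASSERT(expr, msg)	do { } while (0)
#endif
```

The only difference between the two macros is whether the test survives in a non-debug build, which is the distinction the proposal is after.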


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: question in sosend_generic()

2013-06-08 Thread Robert Watson


On Fri, 7 Jun 2013, vasanth rao naik sabavat wrote:

When sending data out of the socket I don't see in the code where the sb_cc 
is incremented.


sb_cc reflects data appended to the socket buffer; sosend_generic() is 
responsible for arranging copying in and performing flow control, but the 
protocol's own pru_send() routine performs the append.  E.g., tcp_usr_send() 
performs sbappendstream() which actually adds it to the socket buffer. 
Notice that not all protocols actually use the send socket buffer -- for 
example, UNIX domain sockets directly cross-deliver to the receiving socket's 
receive buffer.


Is the socket send performed in the same thread of execution or the data is 
copied on to the socket send buffer and a different thread then sends the 
data out of the socket?


Protocols provide their own implementations to handle data moving down the 
stack, so the specifics are protocol-dependent.  In TCP, socket buffer append 
occurs synchronously in the same thread as part of the pru_send() downcall 
from the socket layer.  When data leaves the send socket buffer is quite a 
different question.  For TCP, data may be sent immediately if the various 
windows allow immediate transmit of the data (e.g., flow control, congestion 
control) ... or it may remain enqueued in the send socket buffer until an ACK 
is received that indicates the receiver is ready for more data (e.g., growing 
window size, ACK clocking, etc).  In the steady send state (e.g., filling the 
window) I would expect to see data sent (and later removed) from the socket 
buffer only in an asynchronous context.  Typically, ACK processing occurs in 
one of two threads: device driver interrupt handling (i.e., in the ithread) or 
in the netisr thread for encapsulated or looped back traffic.


Because, I see a call to sbwait(&so->so_snd) in the sosend_generic and I 
don't understand who would wake up this thread?


sbwait() implements blocking for flow/congestion control: when the socket 
buffer fills, the sending thread must wait for space to open up.  Space 
becomes available as a result of successful transmit -- e.g., the sbtruncate() 
of the sending socket buffer when a TCP ACK has been received.  So the thread 
that triggers the wakeup will usually be the ithread or netisr.  In the case 
of UNIX domain sockets, it's actually the receiving thread that triggers the 
wakeup directly.
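
The interplay between appending, waiting, and ACK-driven wakeups can be modelled in a few lines. This is a single-threaded toy, not the kernel code: struct toy_sockbuf and the toy_* functions are hypothetical, and the "wait" is simulated by processing an ACK inline, where the real sender would sleep in sbwait() until another thread frees space.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sockbuf {
	size_t cc;	/* bytes currently queued, like sb_cc */
	size_t hiwat;	/* capacity, like sb_hiwat */
	unsigned waits;	/* how often the sender had to wait for space */
};

static size_t
toy_sbspace(const struct toy_sockbuf *sb)
{
	return (sb->hiwat - sb->cc);
}

/* Models ACK processing: acknowledged bytes leave the buffer. */
static void
toy_ack(struct toy_sockbuf *sb, size_t acked)
{
	if (acked > sb->cc)
		acked = sb->cc;
	sb->cc -= acked;
}

/*
 * Models a blocking send: append what fits, and when the buffer is full
 * wait until an ACK of 'ack_size' bytes opens up space.  Returns the
 * total number of bytes appended.
 */
static size_t
toy_sosend(struct toy_sockbuf *sb, size_t resid, size_t ack_size)
{
	size_t sent = 0;

	assert(sb->hiwat > 0 && ack_size > 0);
	while (resid > 0) {
		size_t space = toy_sbspace(sb);
		size_t n;

		if (space == 0) {
			sb->waits++;		/* where sbwait() would sleep */
			toy_ack(sb, ack_size);	/* the wakeup source */
			continue;
		}
		n = resid < space ? resid : space;
		sb->cc += n;			/* the append raises cc */
		sent += n;
		resid -= n;
	}
	return (sent);
}
```

With a 4096-byte buffer, 10000 bytes to send, and 2048 bytes acknowledged per wakeup, the sender waits three times and finishes with 3856 bytes still queued for later transmission.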



If the data is not copied on to the socket buffers then it should 
technically send all data out in the same thread of execution and future 
socket send calls should see that space is always fully available. In that 
case I dont see a reason why we need to wait on the socket send buffer. As 
there would no one who will actually wake you up.


There are some false assumptions here.  The sending thread will always append 
data [that fits] to the socket buffer, but may have to loop awaiting space for 
all data, depending on blocking/non-blocking status.  Space becomes available 
when the remote endpoint acknowledges receipt, perhaps via a TCP ACK.  You 
might never wake up if flow control from the remote endpoint doesn't find 
space becoming available, you've enabled blocking, and no timeout is set.  If 
you fear the recipient may block the sender, then you need to implement some 
timeout mechanism to decide how long you're willing to wait.
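
One way an application might bound the wait is the standard SO_SNDTIMEO socket option. The sketch below is only one option among several (non-blocking I/O with poll() is another), and set_send_timeout() is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Bound how long a blocking send on 'fd' may wait for socket buffer space.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
set_send_timeout(int fd, long seconds)
{
	struct timeval tv;

	assert(seconds >= 0);
	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)));
}
```

With the option set, a send that cannot make progress within the timeout is expected to fail with an error rather than block indefinitely, leaving the application to decide whether to retry or give up.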



   if (space < resid + clen &&
       (atomic || space < so->so_snd.sb_lowat || space < clen)) {
           if ((so->so_state & SS_NBIO) || (flags & MSG_NBIO)) {
                   SOCKBUF_UNLOCK(&so->so_snd);
                   error = EWOULDBLOCK;
                   goto release;
           }
           error = sbwait(&so->so_snd);
           SOCKBUF_UNLOCK(&so->so_snd);
           if (error)
                   goto release;
           goto restart;
   }

In the above code snippet, for a blocking socket if the space is not
available, then it may trigger a deadlock?


You can experience deadlocks between senders and receivers as a result of 
cyclic waits for constrained resources (e.g., buffers).  However, that is a 
property of application design, and applications that are killed will close 
their sockets, releasing resources.  Most application designers attempt to 
avoid deadlock in their designs by ensuring that there is a path to progress, 
even a slow one.


The deadlock you're suggesting in general does not exist -- it would be silly 
to wait for something that could never happen.  Instead, we wait for things 
that generally will happen (e.g., a TCP ACK) or a timeout, which would close 
the connection.  Notice that sbwait() is allowed to fail -- if the connection 
is severed due to a timeout or RST, then it returns immediately with an error.


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd

Re: a question regarding

2007-02-15 Thread Robert Watson

On Thu, 15 Feb 2007, Pascal Hofstee wrote:


On 1/31/07, Robert Watson <[EMAIL PROTECTED]> wrote:
If we do decide to go ahead with the ABI change, there are a number of 
other things that should be done simultaneously, such as changing the uid 
and gid fields to uid_t and gid_t.  I would very much like to see the ABI 
change happen, and the first step (breaking out kernel from user 
structures) has been done already as part of the MAC work.  The next step 
is to add routines that translate internal/external formats, which isn't 
hard, but requires a moderate pile of code to do (as well as great care 
:-).


Well .. i finally found some spare time to have a closer look at the 
"shm_segsz" issue ... and noticed there were actually a very limited number 
of direct uses of the shm_segsz struct member (26 lines in the entire 
/usr/src tree)


I have attached a patchset that should change shm_segsz to size_t. There 
were however 2 to 3 locations all regarding compat code (ibcs2, svr4 and 
COMPAT_43) where i opted to stay on the clear side and not touch anything, 
the rest was fairly straightforward as should be obvious from the diff. I 
checked to make sure no function prototypes changed anywhere.


Please have a look at the attached patch (available at 
http://callisto.offis.uni-oldenburg.de/shm_segsz-int2size_t.diff in case the 
attachment gets stripped off by the mailinglist software) and provide any 
feedback where appropriate.


Unfortunately, things are a bit more tricky.  The problem is not so much the 
API, where converting size_t/int is a relative non-event, rather, the ABI.  By 
changing the size of a field in a data structure, you may change the layout of 
the structure, and hence the offset of other fields.  This offset information 
is compiled into binaries that access the structure -- hence being part of the 
ABI.  On i386, the change from int to size_t doesn't modify the ABI, as both 
int and size_t are 32-bit.  However, on 64-bit platforms, int is 32-bit and 
size_t is 64-bit:


sledge:/tmp> uname -a
FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #898: Wed Feb 14 
14:20:16 UTC 2007 [EMAIL PROTECTED]:/h/src/sys/amd64/compile/SLEDGE 
amd64

sledge:/tmp> ./size_t
sizeof int: 4
sizeof size_t: 8

In practice, this means that all of the later fields in the data structure 
will be offset by 4 bytes.  This will affect any application that accesses 
later fields in the structure but isn't recompiled.  This is why DES and I 
have been discussing this change as requiring kernel compatibility code, which 
would provide new system calls working with the new layout, and retain old 
system calls working with the old layout.  So we'd need to provide a new 
shmctl() with the new structure, and an oshmctl() with the old layout.  While 
doing that, it makes sense to do all the other ABI-related things that we'd 
like to get out of the way, such as fixing the types in shm_perm.
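
The layout shift is easy to see with offsetof(). The structures below are simplified, hypothetical stand-ins with only three fields; the real struct shmid_ds has more members, so the exact offsets there will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real struct shmid_ds has more fields. */
struct toy_shmid_old {
	int	segsz;	/* old type of the size field */
	int	lpid;
	int	cpid;
};

struct toy_shmid_new {
	size_t	segsz;	/* size field widened to size_t */
	int	lpid;
	int	cpid;
};

/* How far 'lpid' moves when segsz changes from int to size_t. */
static size_t
lpid_shift(void)
{
	return (offsetof(struct toy_shmid_new, lpid) -
	    offsetof(struct toy_shmid_old, lpid));
}
```

On a platform where int and size_t are both 32-bit the shift is zero; where size_t is 64-bit, every field after segsz moves, and a binary compiled against the old layout reads the wrong bytes.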


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: nullfs and named pipes.

2007-02-16 Thread Robert Watson


On Thu, 15 Feb 2007, Kostik Belousov wrote:


On Thu, Feb 15, 2007 at 03:22:59PM +, Josef Karthauser wrote:

On Thu, Feb 15, 2007 at 02:57:50PM +0100, Jeremie Le Hen wrote:


Note that all processes within a jail can only interfere with processes 
from another jail or host as if they were on different machines.  This 
means they can communicate through PF_INET for instance but not PF_LOCAL.


You might think so!  However that's not what's going on here.

The named pipe/nullfs issue is nothing to do with jails.  It's just that 
nullfs is broken with respect to named pipes as I've previously reported. 
However with this patch:


cvs diff: Diffing .
Index: null_subr.c
===
RCS file: /home/ncvs/src/sys/fs/nullfs/null_subr.c,v
retrieving revision 1.48.2.1
diff -u -r1.48.2.1 null_subr.c
--- null_subr.c 13 Mar 2006 03:05:17 -  1.48.2.1
+++ null_subr.c 14 Feb 2007 00:02:28 -
@@ -235,6 +235,8 @@
xp->null_vnode = vp;
xp->null_lowervp = lowervp;
vp->v_type = lowervp->v_type;
+   if (vp->v_type == VSOCK || vp->v_type == VFIFO)
+   vp->v_un = lowervp->v_un;


I'm wondering is some reference counting needed there ?


Yes, I find this a bit worrying also, but I don't know enough about how nullfs 
works to reason about it.  What happens when a vnode in the bottom layer has 
its on-disk reference count drop to zero -- is the vnode in the top layer 
invalidated somehow?


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: a question regarding

2007-02-16 Thread Robert Watson

On Thu, 15 Feb 2007, Pascal Hofstee wrote:


On Thu, 2007-02-15 at 13:41 +, Robert Watson wrote:
Unfortunately, things are a bit more tricky.  The problem is not so much 
the API, where converting size_t/int is a relative non-event, rather, the 
ABI.  By changing the size of a field in a data structure, you may change 
the layout of the structure, and hence the offset of other fields.  This 
offset information is compiled into binaries that access the structure -- 
hence being part of the ABI.  On i386, the change from int to size_t 
doesn't modify the ABI, as both int and size_t are 32-bit.  However, on 
64-bit platforms, int is 32-bit and size_t is 64-bit:


sledge:/tmp> uname -a
FreeBSD sledge.freebsd.org 7.0-CURRENT FreeBSD 7.0-CURRENT #898: Wed Feb 14
14:20:16 UTC 2007 [EMAIL PROTECTED]:/h/src/sys/amd64/compile/SLEDGE
amd64
sledge:/tmp> ./size_t
sizeof int: 4
sizeof size_t: 8

In practice, this means that all of the later fields in the data structure 
will be offset by 4 bytes.  This will affect any application that accesses 
later fields in the structure but isn't recompiled.  This is why DES and I 
have been discussing this change as requiring kernel compatibility code, 
which would provide new system calls working with the new layout, and 
retain old system calls working with the old layout.  So we'd need to 
provide a new shmctl() with the new structure, and an oshmctl() with the 
old layout.  While doing that, it makes sense to do all the other 
ABI-related things that we'd like to get out of the way, such as fixing the 
types in shm_perm.


I understand ... i'll leave this up to you guys .. you have obviously a lot 
more hands on experience in these kinds of matters :)


Well -- don't let this discourage you from working on it, I'm just pointing 
out that there are some more details to work on before it will be done :-).
I'm happy to advise further as the work moves along, but unfortunately don't 
have time to do it myself at this point.  It's something I would very much 
like to see happen, though!


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: nullfs and named pipes.

2007-02-19 Thread Robert Watson


On Sun, 18 Feb 2007, Josef Karthauser wrote:


On Fri, Feb 16, 2007 at 04:36:56PM +0200, Kostik Belousov wrote:

   cvs diff: Diffing .
   Index: null_subr.c
   ===
   RCS file: /home/ncvs/src/sys/fs/nullfs/null_subr.c,v
   retrieving revision 1.48.2.1
   diff -u -r1.48.2.1 null_subr.c
   --- null_subr.c 13 Mar 2006 03:05:17 -  1.48.2.1
   +++ null_subr.c 14 Feb 2007 00:02:28 -
   @@ -235,6 +235,8 @@
xp->null_vnode = vp;
xp->null_lowervp = lowervp;
vp->v_type = lowervp->v_type;
   +   if (vp->v_type == VSOCK || vp->v_type == VFIFO)
   +   vp->v_un = lowervp->v_un;


I'm wondering is some reference counting needed there ?


Yes, I find this a bit worrying also, but I don't know enough about how 
nullfs works to reason about it.  What happens when a vnode in the bottom 
layer has its on-disk reference count drop to zero -- is the vnode in the 
top layer invalidated somehow?


Vnode reclamation from lower layer cannot do anything for corresponding 
nullfs vnode, but that vnode has reference from nullfs vnode. On the other 
hand, can forced unmount proceed for lower layer ?


Does anyone know of any reason why I can't commit this as it is, at least for now? 
It doesn't appear that it would break anything that works currently, and in 
its current form it at least fixes named pipe functionality for the kinds of 
cases that people would want to use it.


Well, the worry would be that you would be replacing a clean error on failure 
with an occasional panic, the normal symptom of a race condition.


I think I'm alright with the VFIFO case above, but I'm quite uncomfortable 
with the VSOCK case.  In particular, I suspect that if the socket is closed, 
v_un will be reset in the lower layer, but continue to be a stale pointer in 
the upper layer, leading to accessing free'd or re-allocated kernel memory 
resulting in much badness.  I've noticed tested this, but you might give it a 
try and see what happens.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: nullfs and named pipes.

2007-02-19 Thread Robert Watson

On Mon, 19 Feb 2007, Robert Watson wrote:


On Sun, 18 Feb 2007, Josef Karthauser wrote:

Well, the worry would be that you would be replacing a clean error on 
failure with an occasional panic, the normal symptom of a race condition.


I think I'm alright with the VFIFO case above, but I'm quite uncomfortable 
with the VSOCK case.  In particular, I suspect that if the socket is closed, 
v_un will be reset in the lower layer, but continue to be a stale pointer in 
the upper layer, leading to accessing free'd or re-allocated kernel memory 
resulting in much badness.  I've noticed tested this, but you might give it 
a try and see what happens.


Bad typing day.  Should read "not tested this".  In any case, you get the 
idea: the problem here is a potential coherency issue on contents of v_un 
between the two file system layers.
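
A toy userspace model of the problem, purely for illustration: the toy_* names
below are hypothetical stand-ins, not the real struct vnode or the nullfs code,
and the single v_socket field stands in for the v_un union.  It only shows how
a pointer copied once into the upper layer falls out of step when the lower
layer later resets its own copy.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; NOT the real struct vnode. */
struct toy_socket {
    int open;
};

struct toy_vnode {
    struct toy_socket *v_socket;    /* models one member of the v_un union */
};

/* Models the patch: the upper (nullfs) vnode copies the lower pointer once. */
static void
toy_null_alias(struct toy_vnode *upper, const struct toy_vnode *lower)
{
    upper->v_socket = lower->v_socket;
}

/* Models the lower layer closing the socket: only its own field is reset. */
static void
toy_lower_close(struct toy_vnode *lower)
{
    lower->v_socket->open = 0;
    lower->v_socket = NULL;
}

/* Returns 1 if both layers hold the same pointer, 0 if the copy is stale. */
static int
toy_layers_coherent(const struct toy_vnode *upper,
    const struct toy_vnode *lower)
{
    return (upper->v_socket == lower->v_socket);
}
```

In this model the layers agree right after toy_null_alias(), but after
toy_lower_close() the upper layer still holds the old pointer.  In the kernel
that pointer could refer to memory that has been freed or reused.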


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: nullfs and named pipes.

2007-02-19 Thread Robert Watson


On Mon, 19 Feb 2007, Robert Watson wrote:


On Mon, 19 Feb 2007, Robert Watson wrote:


On Sun, 18 Feb 2007, Josef Karthauser wrote:

Well, the worry would be that you would be replacing a clean error on 
failure with an occasional panic, the normal symptom of a race condition.


I think I'm alright with the VFIFO case above, but I'm quite uncomfortable 
with the VSOCK case.  In particular, I suspect that if the socket is 
closed, v_un will be reset in the lower layer, but continue to be a stale 
pointer in the upper layer, leading to accessing free'd or re-allocated 
kernel memory resulting in much badness.  I've noticed tested this, but you 
might give it a try and see what happens.


Bad typing day.  Should read "not tested this".  In any case, you get the 
idea: the problem here is a potential coherency issue on contents of v_un 
between the two file system layers.


For some reason I was thinking of v_fifoinfo as being stable after it is 
initialized, but in fact, it is not, as it can be free'd later.  Also, the 
layers could become out of sync following a reboot.  So in conclusion, I think 
the fifo part of the patch also suffers from the same problem.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Progress on scaling of FreeBSD on 8 CPU systems

2007-02-25 Thread Robert Watson

On Sun, 25 Feb 2007, Kris Kennaway wrote:


On Sat, Feb 24, 2007 at 10:00:35PM -0700, Coleman Kane wrote:

What does the performance curve look like for the in-CVS 7-CURRENT tree 
with 4BSD or ULE?  How do those stand up against the Linux SMP scheduler 
for scalability?  It would be nice to see the comparison displayed, to see 
what performance improvements the aforementioned patch realized.  This 
would likely make a nice graphic for the SMPng project page, BTW...


There are graphs of this on Jeff's blog, referenced in that URL. Fixing 
filedesc locking makes a HUGE difference.


I think the real message of all this is that our locking strategy is basically 
pretty reasonable for the paths exercised by this (and quite a few other) workloads, 
but our low-level scheduler and locking primitives need a lot of refinement. 
The next step here is to look at the impact of these changes (individually and 
together) with other hardware configurations and other workloads.  On the 
hardware side, I'd very much like to see measurements done on that rather 
nasty generation of Intel Xeon P4's where the costs of mutexes were 
astronomically out of proportion with other operation costs, which 
historically has heavily pessimized ULE due to the additional locking it had 
(don't know if this still applies).


It would be really great if we could find "workload owners" who would maintain 
easy-to-run benchmark configurations and also run them regularly on a fixed 
hardware configuration over a long time publishing results and testing 
patches.  Kris has done this for SQL benchmarks to great effect, giving a nice 
controlled testing environment for a host of performance-related patches, but 
SQL is not the be-all and end-all of application workloads, so having others 
do similar things with other benchmarks would be very helpful.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Progress on scaling of FreeBSD on 8 CPU systems

2007-02-25 Thread Robert Watson

On Sun, 25 Feb 2007, Martin Blapp wrote:

It would be really great if we could find "workload owners" who would 
maintain easy-to-run benchmark configurations and also run them regularly 
on a fixed hardware configuration over a long time publishing results and 
testing patches.  Kris has done this for SQL benchmarks to great effect,


I'm interested in such a workload test. At my job we run various other 
servers which have a classic virus/antispam environment.  Unfortunately, 
clamd does not behave very well on FreeBSD (see mails to freebsd-threads), and 
this happens even on 2-CPU systems.


I think it's not very difficult to make a scripted load test, with 
2/4/6/8/16/32 scans in parallel, with ULE or BSD scheduler.


As long as it is realistic and reproducible, it sounds good to me.

Btw: what is the best method to profile a threaded application to see where 
it spends the most CPU time ?


Try looking at system pmc support -- using system pmcs, you can profile a 
variety of factors (including CPU use, cache misses, etc) across the whole 
system (kernel and application), so it's a really neat tool.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: inconsistency in using vn_fullpath1()

2007-03-04 Thread Robert Watson


On Sun, 4 Mar 2007, Divacky Roman wrote:

I noticed that kern___getcwd() calls vn_fullpath1() with Giant held like 
this:


   mtx_lock(&Giant);
   FILEDESC_LOCK(fdp);
   error = vn_fullpath1(td, fdp->fd_cdir, fdp->fd_rdir, tmpbuf,
       &bp, buflen);
   FILEDESC_UNLOCK(fdp);
   mtx_unlock(&Giant);

on the other hand vn_fullpath() calls it without Giant held like this:

   FILEDESC_LOCK(fdp);
   error = vn_fullpath1(td, vn, fdp->fd_rdir, buf, retbuf, MAXPATHLEN);
   FILEDESC_UNLOCK(fdp);

I don't see much difference between the calls, so I wonder if holding Giant is 
necessary when calling vn_fullpath1(): either we do one unnecessary locking 
operation, or we lock insufficiently.


Thanks for explaining this to me and possibly fixing it.


I suspect that the Giant acquisition there is a conservative acquisition based 
on VFS not having been MPSAFE, and can be removed.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: investigation of Giant usage in kernel

2007-03-05 Thread Robert Watson

On Sun, 4 Mar 2007, Divacky Roman wrote:

I looked at where Giant is held in the kernel and I found these interesting 
things:


1) in fs/fifofs/fifo_vnops.c we lock Giant when calling soreceive()/sosend(). 
This is a band-aid for fixing a race that doesn't have to exist anymore, i.e. 
it needs some testing and can be removed.


Hmm.  I think that conclusion is a bit premature.  Per our conversation on 
IRC, the workaround was added back prior to a release due to our being unable 
to resolve a very difficult to debug race condition.  There is no evidence the 
race doesn't exist anymore: what is needed is testing to determine if it does 
or not.  The race condition occurred under high make -j load on SMP; FYI, make 
uses a fifo to implement a concurrency limiting token scheme in order to bound 
total simultaneous jobs despite many make instances running.
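
As a rough illustration of that kind of token scheme (a sketch only, not
make's actual code): the pool below is seeded with one byte per allowed job,
a worker reads a byte before starting a job, and writes it back when the job
finishes.  The token_pool type and the pool_* functions are hypothetical
names, and pipe(2) stands in here for the fifo that make uses.

```c
#include <unistd.h>

/* Hypothetical token pool: one byte in the pipe per free job slot. */
struct token_pool {
    int rfd;    /* read end: take a token */
    int wfd;    /* write end: return a token */
};

/*
 * Create the pool and seed it with 'slots' tokens.  'slots' is assumed to be
 * small enough to fit in the pipe buffer, so the writes do not block.
 */
static int
pool_init(struct token_pool *p, int slots)
{
    int fds[2];

    if (pipe(fds) != 0)
        return (-1);
    p->rfd = fds[0];
    p->wfd = fds[1];
    for (int i = 0; i < slots; i++) {
        if (write(p->wfd, "+", 1) != 1)
            return (-1);
    }
    return (0);
}

/* Take one token; blocks while all slots are in use (on a blocking fd). */
static int
pool_acquire(struct token_pool *p)
{
    char tok;

    return (read(p->rfd, &tok, 1) == 1 ? 0 : -1);
}

/* Give a token back once the job has finished. */
static int
pool_release(struct token_pool *p)
{
    return (write(p->wfd, "+", 1) == 1 ? 0 : -1);
}
```

Because every cooperating process shares the same descriptors, the number of
bytes in flight bounds the total number of simultaneous jobs, no matter how
many instances are running.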


I've CC'd Kris since I know that he was able to reproduce the problem, so 
might be able to provide advice on how to do so.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Fwd: user-space locks

2007-03-09 Thread Robert Watson


On Fri, 9 Mar 2007, Kip Macy wrote:

Do you think that the umtx KPI may have reached the appropriate level of 
maturity for writing up a man page? The KSE equivalent has had a substantive 
man page for quite some time. I would be more than happy to do any of the 
necessary technical copy-editing for the English.


At this point I think you may be the only person well acquainted with the 
KPI. Thanks.


During our threading discussion and code-reading session at the dev summit, 
the KSE man page was very helpful in understanding what was going on; having 
similar man pages for the libthr and umtx system calls would have been very 
helpful.  The interfaces are a lot less complicated, but man pages are very 
useful generally. :-)


Robert N M Watson
Computer Laboratory
University of Cambridge



   -Kip


Kip Macy wrote:

umtx


[EMAIL PROTECTED]:man -k umtx
umtx: nothing appropriate
[EMAIL PROTECTED]:

also if you use umtx I think you limit yourself to libthr.




On 3/9/07, Peter Holmes <[EMAIL PROTECTED]> wrote:

Does FreeBSD have anything similar to Futexes for
Linux?

Thanks,
Peter







Re: user-space locks

2007-03-09 Thread Robert Watson


On Sat, 10 Mar 2007, Vlad GALU wrote:


On 3/10/07, Kip Macy <[EMAIL PROTECTED]> wrote:

umtx


Is it safe/recommended to use spinlocks, like in jemalloc, for very small 
portions of code? I'm particularly interested in protecting writes to a 
couple of word sized ints on amd64, so the critical section wouldn't be 
longer than two assignments. Of course, I could use a lockless queue for my 
purposes, but I'm asking anyway.


I believe that the system malloc library is forced to use low level locking 
primitives because the pthread library depends on malloc.  I would suggest 
using the pthread mutex primitives where at all possible.  We might want to 
consider adding "adaptive" mutex support to the pthread libraries if we don't 
have it.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: One method to recover a lost root password

2007-03-17 Thread Robert Watson


On Fri, 16 Mar 2007, Garrett Cooper wrote:


David S. Madole wrote:

From Derekj Tourneo on Friday, March 16, 2007 4:46 PM

How I recovered a lost root password in FreeBSD

Luckily I did know one user name and it had no password. cgadmin going to 
the repair mode with CDROM/DVD option off the install menu, using the 
"live" CDROM filesystem gave me a root prompt Fixit#


I am confused why this topic came up on this list..


Because it is a way to hack BSD, obviously. :-)

(It's gotten less frequent, but it used to be that once every few months an 
e-mail would turn up on the list asking about how to hack FreeBSD systems...)


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Locking etc. (Long, boring, redundant, newbie questions)

2007-03-28 Thread Robert Watson

On Wed, 28 Mar 2007, Duane Whitty wrote:

I know this is trivial material but I believe I have finally come to an 
understanding about some things that I have been struggling with.  Maybe by 
putting this in words I can help some other newbies like myself who are 
hoping to eventually understand more advanced topics and contribute to 
FreeBSD. But please, correct me if you would on topics I still do not have 
correct. And thanks in advance for your help and patience!


We are working to improve our documentation in this area, but there are a 
number of issues relating to consistent use of terminology, some of which you 
are running into.


I have been reading non-stop as much as I can on synchronizing code 
execution.  In the FreeBSD docs, and in other docs, there is talk about 
mutexes that block and mutexes that sleep.  But this is not true, is it? 
What is really meant is that depending on the type of mutex a thread is 
trying to acquire, the thread will either spin or it will sleep waiting for 
the lock to become available. Am I correct so far?


We basically have two kinds of mutexes: mutexes that only spin, and mutexes 
that may also sleep.  The former are intended for use in synchronizing 
with fast interrupts and in the scheduler, and are called "spin mutexes".  The 
latter are intended for use in pretty much any other case, and are called 
"default mutexes".  In terms of implementation, the main behaviors are:


- Spin mutexes will never sleep, and hence can be used in a borrowed execution
  context during interrupt delivery, and likewise in the scheduler (such as in
  implementing sleep).  They disable interrupt delivery on the current CPU,
  and as such, are quite expensive to acquire on some architectures.

- Default mutexes may sleep, but by default are "adaptive", meaning that they
  will try spinning where it makes sense (i.e., when the holder of the lock is
  executing on another CPU).

Unless you are working in the scheduler or synchronizing in/with a fast 
interrupt handler, do not use spin mutexes.


Maybe this method of talking about mutexes happens because we don't 
manipulate lock structures directly, but rather use routines which acquire 
these locks for us in a consistent way.  So for instance when we call 
mtx_lock(&some_lock) and the lock is contested, our thread sleeps.  It gets 
put on a sleep queue waiting for the lock to become available so that we can 
safely access the kernel data structure which this mutex protects.  Is this 
accurate so far?


Yes, although for reasons of optimization, when contending a lock we may spin 
instead of sleeping if the thread holding the mutex is in the run state.  This 
avoids the overhead of putting the current thread to sleep and then waking it 
up later.  The benefits of this optimization are significant and easily 
measurable.
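
A userspace sketch of the spin-then-sleep idea, using pthread mutexes rather
than the kernel mutex code.  Note the simplification: the kernel spins only
while the lock holder is running on another CPU, which a userspace program
cannot observe, so this sketch substitutes a bounded number of trylock
attempts.  adaptive_lock() and max_spins are hypothetical names.

```c
#include <pthread.h>

/*
 * Try to take the mutex by spinning up to max_spins times, then fall back
 * to a blocking acquire.
 *
 * Returns 1 if the lock was obtained during the spin phase, 0 if it was
 * obtained by the blocking pthread_mutex_lock() call, and -1 on error.
 */
static int
adaptive_lock(pthread_mutex_t *m, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return (1);
    }
    return (pthread_mutex_lock(m) == 0 ? 0 : -1);
}
```

When the lock is held only briefly, the spin phase usually wins and the
thread avoids being put to sleep and woken up again; when the lock is held
for longer, the caller stops burning CPU after max_spins attempts.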


Along the same line as above, if we call mtx_lock_spin(&some_lock), and the 
lock is contested, our thread trying to acquire the lock spins.  This means 
we go into a tight loop monopolizing whichever CPU we are running on until 
the mutex becomes available.  But, if we spin for so long that we use up our 
quantum of time scheduled to us, a panic happens, because when we try to 
acquire a spin mutex, interrupts are turned off and so we can't do a context 
switch.  If a thread slept with interrupts disabled, then interrupts would 
stay disabled, which must not happen.


Pretty much.  We disable interrupts for the following reason: as spin mutexes 
may be acquired in fast interrupt handlers, they may be running on the stack 
of an existing thread, which may also hold locks.  As such, we can't allow the 
fast handler to acquire any locks that are either already held by the thread, 
or that might violate the lock order.  By restricting fast interrupt handlers 
to holding only spin locks, and by making spin locks disable interrupts, we 
prevent that deadlock scenario.  "Slow" interrupt handlers run in complete 
thread contexts, called ithreads, and are, as such, able to acquire default 
mutexes and sleep in the scheduler.  In FreeBSD 7.x, we have moved to a model 
in which device drivers can register both fast and threaded handlers, whereas 
in 6.x they had to pick one (and hence if they needed both, they had to pick 
the fast handler and use a task queue for threaded work).


I'm not very sure on this point, but is the above the reason why interrupt 
service routines, also known as Fast ISRs (?), use mtx_lock_spin() mutexes? 
They are supposed to be as fast as possible, and they don't context switch. 
As well, isn't it basically agreed upon that Fast ISRs are really the only 
place to use spin mutexes?  Maybe I'm way off here but it sure would be nice 
finally putting this one away.


Spin locks are, FYI, slower than default mutexes.  The reason is that they 
have to do more work: they not only perform an atomic operation/memory barrier 
to set the cross-CPU lock state, but they also have to disable interrupts 

Re: kevent and unix dgram socket problem

2007-04-03 Thread Robert Watson


On Tue, 3 Apr 2007, Jason Carroll wrote:


   // create the local address, bind & listen
   struct sockaddr_un addr;
   memset(&addr, 0, sizeof(addr));
   addr.sun_family = AF_LOCAL;
   strncpy(addr.sun_path, "usock", UN_PATH_LEN - 1);
   assert(bind(fd, (sockaddr*) &addr, sizeof(sockaddr_un)) == 0);
   assert(listen(fd, LISTENQ) == 0);


Try dropping the listen() call.  This is only required for stream sockets 
where you will then accept() new connections (returning new sockets). 
listen() should probably be returning an error, and apparently isn't.  What 
may be happening is that as of the point where listen() is called, future 
attempts to register kevents for "read" will be set up to detect whether 
accept() will return a socket or not, not whether there is data in the socket 
listen() has been called on.
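
A minimal sketch of the datagram setup without listen(), assuming a
hypothetical socket path supplied by the caller.  To keep it portable it uses
poll(2) as a stand-in for the kevent() read registration, so it shows the
socket setup rather than the kqueue behaviour itself; dgram_bind(),
dgram_send() and dgram_readable() are illustrative names.

```c
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a UNIX-domain address for the given path. */
static void
fill_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
}

/* Create a datagram socket bound to path.  Note: bind() only, no listen(). */
static int
dgram_bind(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return (-1);
    fill_addr(&addr, path);
    (void)unlink(path);    /* remove a stale socket file, if any */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return (-1);
    }
    return (fd);
}

/* Send one datagram to the socket bound at path. */
static int
dgram_send(const char *path, const void *buf, size_t len)
{
    struct sockaddr_un addr;
    ssize_t n;
    int fd;

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return (-1);
    fill_addr(&addr, path);
    n = sendto(fd, buf, len, 0, (struct sockaddr *)&addr, sizeof(addr));
    close(fd);
    return (n == (ssize_t)len ? 0 : -1);
}

/* Returns 1 if a datagram is waiting, 0 on timeout, -1 on error. */
static int
dgram_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return (-1);
    return ((n == 1 && (pfd.revents & POLLIN)) ? 1 : 0);
}
```

With this setup the bound socket reports readable once a datagram has been
queued on it, which is the event the original program wanted to wait for.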


I'll investigate adding a check so that an error would have been returned 
here.  This relates to another bug we have, in which if you register a kqueue 
event for "read" on a TCP socket before calling listen(), then the result is 
very different from what happens if you register the kqueue event after 
listen().


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Mac OS underlying FreeBSD - does it run Linux emulation?

2007-04-04 Thread Robert Watson


On Wed, 4 Apr 2007, Mike Meyer wrote:


In <[EMAIL PROTECTED]>, Christoph P. Kukulies <[EMAIL PROTECTED]> typed:

does  anyone know whether one can run Linux applications under the underlying
FreeBSD of the MAC OS (on an Intel Core Duo mini Mac)?


No, you can't. The "underlying" FreeBSD is userland code; not kernel code. 
The OSX kernel is based on Mach.


While it's true you can't run Linux binaries on Mac OS X, it's not for the 
reason you're suggesting, and your statement regarding FreeBSD kernel code in 
Mac OS X is simply incorrect.  The Mac OS X kernel, XNU, contains significant 
quantities of FreeBSD kernel source code, including a FreeBSD-derived VFS and 
network stack.  Other parts of the kernel, such as the scheduler and VM 
system, are derived from Mach.  While the FreeBSD-derived code has been 
significantly modified since it was originally forked, a lot of code moves 
backward and forward between the platforms: the FreeBSD audit subsystem is 
derived from the Mac OS X audit subsystem, and Mac OS X's smbfs and MAC 
Framework support are derived from FreeBSD.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Mac OS underlying FreeBSD - does it run Linux emulation?

2007-04-05 Thread Robert Watson


On Wed, 4 Apr 2007, Coleman Kane wrote:

While it's true you can't run Linux binaries on Mac OS X, it's not for the 
reason you're suggesting, and your statement regarding FreeBSD kernel code 
in Mac OS X is simply incorrect.  The Mac OS X kernel, XNU, contains 
significant quantities of FreeBSD kernel source code, including a 
FreeBSD-derived VFS and network stack.  Other parts of the kernel, such as 
the scheduler and VM system, are derived from Mach.  While the 
FreeBSD-derived code has been significantly modified since it was 
originally forked, a lot of code moves backward and forward between the 
platforms: the FreeBSD audit subsystem is derived from the Mac OS X audit 
subsystem, and Mac OS X's smbfs and MAC Framework support are derived from 
FreeBSD.


In addition to this, there have been examples of the Linux kernel hosted by 
Mach in the past (such as MkLinux). From my understanding, the only thing 
that prevents this from being realized is that nobody has sat down to 
actually write/port the code to do it.


I'm not familiar with the structural layout of MkLinux, but I would caution 
those looking at XNU to be aware that the kernel is a monolithic kernel, in 
which the BSD and IOKit parts run directly in the kernel address space managed 
by Mach, and not as tasks over Mach.  If MkLinux runs Linux in a task under 
the microkernel, then the structures are quite different.  Mach provides quite 
nice interfaces for implementing virtualization services, however, as Mach 
VM, thread, and task interfaces give applications a lot of control in setting 
up memory and trap handling -- much more so than the UNIX equivalents.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: msleep() on recursivly locked mutexes

2007-04-28 Thread Robert Watson

On Thu, 26 Apr 2007, Julian Elischer wrote:

Further the idea that holding a mutex "except for when we sleep" is a 
generally bright idea is also a bit odd to me.. If you hold a mutex and 
release it during sleep you probably should invalidate all assumptions you 
made during the period before you slept as whatever you were protecting has 
possibly been raped while you slept. I have seen too many instances where 
people just called msleep and dropped the mutex they held, picked it up 
again on wakeup, and then blithely continued on without checking what 
happened while they were asleep.


An interesting observation here is that FreeBSD 4.x and earlier were actually 
rife with exactly this sort of race condition, exercised only when under 
kernel memory pressure because sleeping occurred only then.  The explicit 
locking model we use now makes these races larger due to increased concurrency 
(preemption, parallelism, etc), but also makes our assertion model stronger.
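
The re-validation Julian describes can be sketched in userspace with a
condition variable, which (like msleep) drops the mutex while sleeping and
picks it up again on wakeup.  This is an analogy under that assumption, not
kernel code; the work_queue type and the queue_* functions are hypothetical.

```c
#include <pthread.h>

struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             items;    /* state protected by lock */
};

/*
 * Take one item and return how many are left.  The mutex is released while
 * we sleep in pthread_cond_wait(), so the queue may have changed by the time
 * we wake: the predicate is re-checked in a loop rather than assumed.
 */
static int
queue_take(struct work_queue *q)
{
    int left;

    pthread_mutex_lock(&q->lock);
    while (q->items == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    q->items--;
    left = q->items;
    pthread_mutex_unlock(&q->lock);
    return (left);
}

/* Add one item and wake a sleeper, if there is one. */
static void
queue_put(struct work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->items++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}
```

Writing the wait as "if (q->items == 0)" instead of a loop would be the
userspace version of blithely continuing after the sleep: another thread may
have taken the item between the wakeup and the mutex being reacquired.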


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: msleep() on recursivly locked mutexes

2007-04-28 Thread Robert Watson


On Fri, 27 Apr 2007, Julian Elischer wrote:

Basically you shouldn't have a recursed mutex FULL STOP. We have a couple of 
instances in the kernel where we allow a mutex to recurse, but they had to 
be hard fought, and the general rule is "Don't". If you are recursing on a 
mutex you need to switch to some other method of doing things. e.g. 
reference counts, turnstiles, whatever.. use the mutex to create these but 
don't hold the mutex for long enough to need to recurse on it. A mutex 
should generally lock, dash-in and work, unlock. We have some cases where 
that is not true, but we are trying to get rid of them, not add more.


Most of these instances have to do with legacy code and data structures that 
involve high levels of code recursion and reentrance.  This is frequently an 
unreliable way to organize code anyway, and often involves other bugs that are 
less visible.  Over time, it's my hope that we can eliminate quite a few 
sources of remaining lock recursion, but there are some tricky cases involving 
repeated callbacks between layers that make that harder.  For example, in the 
socket/network pcb relationship, there's a lack of clarity on which side 
drives the overlapping state machines present in both sets of data structures. 
Over time, we're migrating towards a model in which the socket infrastructure 
is more of a "library" in service to network protocols that will drive the 
actual transitions, but in the mean time, lock recursion is required.


For any significantly rewritten or new code, I would expect that recursion 
would be avoided in almost all cases.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Experiences with 7.0-CURRENT and vmware.

2007-05-10 Thread Robert Watson


On Thu, 10 May 2007, Darren Reed wrote:


I'm using FreeBSD 7.0-CURRENT under vmware and there are a few issues.


Generally speaking, I would suggest sending this post to current@, not 
hackers@, since your comments largely have to do with differences between 
-STABLE and -CURRENT, making it ideal fodder for the mailing list dedicated to 
-CURRENT development.


First, time. hint.hw.acpi.disabled="1" This appears to make _no_ difference 
to time keeping on FreeBSD 7 and nor does it seem to have any impact on ACPI 
being loaded.  Do I need to recompile a new kernel without it or is there a 
new way to disable ACPI?


Have you tried hint.acpi.0.disabled=1 instead?  This is what appears in 
acpi(4), and is what is used in various existing boot loader bits when I grep 
around.


I should add that FreeBSD 6, with the same setting, is no better and that I 
need to run ntpdate every 5-10 minutes via crontab in order to keep good 
time (timekeeping is *really* bad.)  In one instance, I was watching "zpool 
iostat 1" and it appeared like the rows were muching up at a rate of 2 a 
second for a minute or so. How do I disable TSC timekeeping?  (NetBSD has 
this disabled by default in their kernels.)  Or is there something else I 
must do?


kern.timecounter.hardware: ACPI-fast
kern.timecounter.choice: TSC(800) ACPI-fast(1000) i8254(0) dummy(-100)

I believe you can simply set kern.timecounter.hardware=ACPI-fast and it will 
do what you expect.  An interesting question is why it selects what is 
arguably the wrong one; a post to current@ might help resolve that.


Second, networking. Prior to FreeBSD-7, the driver to use inside vmware 
workstation was lnc.  It has worked and continues to work great.  No 
problemo. FreeBSD-7 uses the "em" driver.  To put it simply, it sucks in 
comparison.  When things really get bad I start seeing "em0: watchdog 
timeout" messages on the console.  I looked and I don't see a lnc driver 
anywhere.  Is there another alternative (le?) driver that I can use in place 
of em, if so, how?


Has VMware changed what network hardware they emulate, and/or does VMware 
offer options about what virtual hardware to expose?  The if_em driver is for 
Intel ethernet cards; historically VMware has exposed a Lance ethernet device 
supported by the lnc(4) device driver; now that driver has indeed been 
replaced with le(4).  But if if_em is probing, it suggests a VMware change 
rather than a FreeBSD change, which you may be able to revert by telling it to 
expose a Lance-style device as opposed to an Intel device.


There was recently a rather large overhaul of the if_em driver in 7.x--I 
suggest e-mailing Jack Vogel (jfv) who is both a FreeBSD committer and the 
Intel employee responsible for the if_em driver.  He may rightly point out 
that this isn't real hardware, rather, virtual hardware, and therefore not 
supported by Intel, but it might also be that the new version of the driver 
contains a bug, there's an ACPI issue of some sort, etc.


Apart from these two issues (which are very central ones :-(), I'm using 
FreeBSD in a 64bit vmware workstation environment quite successfully and ZFS 
is quite happy with all the kva :-) ZFS and zpools are working just as I 
expect, even if a bit slower due to vmware but I'm not cranking out 
benchmarks here.


Oh, and how do I fix ssh/rsh to do passwordless sessions? I'm trying to 
setup cron jobs to automate various tasks but there's this small hurdle 
called a password prompt that I can't seem to get rid of :-/


Generally speaking, this would be a discouraged configuration, but you will 
probably need to frob two settings: first, PermitEmptyPasswords in 
sshd_config, and second, force non-PAM validation by setting UsePAM to no. 
Instead of doing this, I would advise instead setting up an SSH key for the 
account, and not set a passphrase on the SSH key.  This doesn't require any 
changing of the global sshd configuration and should offer most of the same 
benefits.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: My PPP timer PR [nag]

2007-05-11 Thread Robert Watson


On Thu, 10 May 2007, Sergey Zaharchenko wrote:

bin/102747 has been sitting there for about 8 months, with no activity since 
it was assigned to brian@, all my mail to whom bounces [CC'd just in case].


The patch attached in the PR has been working for me since, so it not being 
fixed in the main tree isn't a problem with me. I just thought someone would 
benefit from it being committed...


In general, I would suggest sending this sort of thing to current@ and/or 
net@, but not [EMAIL PROTECTED]


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: SoC: Distributed Audit Daemon project

2007-05-28 Thread Robert Watson

On Sat, 26 May 2007, M. Warner Losh wrote:


In message: <[EMAIL PROTECTED]>
   Benjamin Lutz <[EMAIL PROTECTED]> writes:
: On Friday 25 May 2007 01:22:21 Alexey Mikhailov wrote:
: > [...]
: > 2. As I said before initial subject of this project was "Distributed
: > audit daemon". But after some discussions we had decided that this
: > project can be done in a more general manner. We can perform distributed
: > logging for any user-space app.
: > [...]
:
: This sounds very similar to syslogd. Is it feasible to make dlogd a drop-in
: replacement for syslogd, at least from a syslog-using-program point of view?

I suspect that it is dealing with different data streams.  syslog is for 
programs sending text voluntarily.  auditd is for pulling audit trails out 
of the kernel for which the 'target' programs have no knowledge that the 
audit trails are being generated, let alone any way to prevent it.


To possibly clarify a few points:

(1) A distributed audit daemon wouldn't eliminate the need for local daemons
that already manage log streams from various sources -- for example,
syslogd for syslog, auditd for audit, Apache generating its own log files,
etc.  The goal of the distributed audit/log daemon is to manage these log
files once log sources (such as auditd) are done with their logs.

(2) One of the trickiest parts of the design will be the interaction between
log sources and the audit daemon, so that log files can reliably change
hands from being managed by the source to the distributed log tool.  In
the event of a system crash/power loss/network partition/syslogd
crash/etc, we still want the log file to be picked up and synchronized.
Hence discussion of an explicit hand-off API rather than casually looking
in the same directory and hoping we get it right.

(3) Unlike syslogd's network logging support, the goal here is secure,
reliable, batched delivery.  We've talked a bit about live audit record
delivery for IDS, but up front what we actually want to do is push the
same sorts of reliability guarantees already present for local trail
management to the distributed case.  We are looking at pushing existing
trail files over the network in a spooled way rather than shipping
individual records as they are generated for this reason.

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: p_vmspace in syscall

2007-07-04 Thread Robert Watson


On Mon, 2 Jul 2007, Nicolas Cormier wrote:

I am trying to map some data allocated in kernel to a user process (via a 
syscall). I need the proc's vmspace, but the value of p_vmspace of the input 
proc argument is NULL ... How can I get a valid vmspace ?


When operating in a system call, the 'td' argument to the system call function 
is the current thread pointer.  You can follow td->td_proc to get to the 
current process (and therefore, its address space).  In general, I prefer 
mapping user pages into kernel instead of kernel pages into user space, as it 
reduces the chances of leakage of kernel data to user space, and there are 
some useful primitives for making this easier.  For example, take a look at 
the sf_buf infrastructure used for things like socket zero-copy send, which 
manages a temporary kernel mapping for a page.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: p_vmspace in syscall

2007-07-04 Thread Robert Watson


On Wed, 4 Jul 2007, Nicolas Cormier wrote:


On 7/4/07, Robert Watson <[EMAIL PROTECTED]> wrote:


On Mon, 2 Jul 2007, Nicolas Cormier wrote:

I am trying to map some data allocated in kernel to a user process (via a 
syscall). I need the proc's vmspace, but the value of p_vmspace of the 
input proc argument is NULL ... How can I get a valid vmspace ?


When operating in a system call, the 'td' argument to the system call 
function is the current thread pointer.  You can follow td->td_proc to get 
to the current process (and therefore, its address space).  In general, I 
prefer mapping user pages into kernel instead of kernel pages into user 
space, as it reduces the chances of leakage of kernel data to user space, 
and there are some useful primitives for making this easier.  For example, 
take a look at the sf_buf infrastructure used for things like socket 
zero-copy send, which manages a temporary kernel mapping for a page.


Yes Roman told me in private that I'm wrong with the first argument, I 
thought that it was a proc*...


For my module I try to create a simple interface of a network allocator: 
User code should look like this:


unsigned id;
void* data = netmalloc(host, size, &id);
memcpy(data, "toto", sizeof("toto"));
netdetach(data);

and later in another process:
void* data = netattach(host, id);
...
netfree(data);

netmalloc syscall does something like that:
- query distant host to allocate size
- receive an id from distant host
- malloc in kernel size
- map the buffer to user process (*)

netdetach syscall:
- send data to distant host

netattach syscall:
- get data from host
- malloc in kernel size
- map the buffer to user process (*)

* I already watch the function vm_pgmoveco
(http://fxr.watson.org/fxr/source/kern/kern_subr.c?v=RELENG62#L78)

I used pgmoveco as follow:

vm_map_t mapa = &proc->p_vmspace->vm_map;
size = round_page(size);
void* data = malloc(size,  M_NETMALLOC, M_WAITOK);
vm_offset_t addr = vm_map_min(mapa);
vm_map_find(mapa, NULL, 0, &addr, size, TRUE, VM_PROT_ALL,
VM_PROT_ALL, MAP_NOFAULT);
vm_pgmoveco(mapa, (vm_offset_t)data, addr);


With this I have a panic with vm_page_insert, I am not sure to understand 
the reason of this panic. I can't have multiple virtual pages on the same 
physical page ?


I think part of what you're running into here is a conceptual issue.  The 
pages allocated by malloc(9) belong to the kernel memory allocator, and are 
generally managed by the slab allocator.  While in principle you can map them 
into user space, you're going to have to set up a lot of book-keeping to 
properly free them again later, etc.  There are really two approaches you 
could be looking at:


(1) The user app allocates memory pages, perhaps using mmap() to map anonymous
memory or a file.  You then borrow those pages to use in-kernel, mapping
as required.

(2) Your kernel code allocates pages directly from the VM system, possibly
anonymous swap-backed pages from the page allocator, and maps them into
the kernel as required.

In either case, you'll need to think about address space limits, especially if 
the buffer is large -- the kernel address space on 32-bit systems is limited 
in size, since it shares the address space with a user application.  On 64-bit 
systems, this is not an issue.  You'll also need to make sure that the pages 
are both paged in and pinned in memory.  So before we talk about the details 
of the calls, we should think about how you plan to use the memory.


How much memory are we talking about -- enough to potentially run into kernel 
address space problems on 32-bit systems?  How long will the mappings persist 
-- do you map them into kernel for a brief period to fill them, and then leave 
them mapped into user space, or is this going to be a persistent shared 
mapping over a very long period of time?  Is the memory going to be pageable? 
How will it interact with things like mprotect(), msync(), etc?  What should 
happen if the pages are released by the process using munmap() or by mapping 
over the region with mmap()?  What should happen in a child process if a 
process forks after netattach() and the parent calls netdetach()?  What 
happens if the process calls send() using a source address in the memory 
region, and zero-copy sockets are enabled, which would normally lead the page 
to be "borrowed" from the user process?


The underlying point here is that there is a model by which VM is managed -- 
pages, pagers, memory objects, mappings, address spaces, etc.  We can't just 
talk about pages being shared or mapped, we need to think about what is to be 
accomplished, and how to map that into the abstractions that already exist. 
Memory comes in different flavours, and generally speaking, you don't want to 
use pages that come from malloc(9) for sharing with userspace, so we need to 
think about what kind of memory you do need.

Re: p_vmspace in syscall

2007-07-04 Thread Robert Watson


On Wed, 4 Jul 2007, Nicolas Cormier wrote:

Currently I'm just trying to play with kernel/modules/vm ... I'm a newbie in 
kernel development and I just want to make a little prototype of an 
in-kernel network allocator. To start I only need to map a page (1024 bytes) 
from kernel to user process. This memory will never be used by the kernel 
between the call of net(malloc/attach) and the call of net(detach/free). So 
user and kernel will never use this page at the same time.


The underlying point here is that there is a model by which VM is managed 
-- pages, pagers, memory objects, mappings, address spaces, etc.  We can't 
just talk about pages being shared or mapped, we need to think about what 
is to be accomplished, and how to map that into the abstractions that 
already exist. Memory comes in different flavours, and generally speaking, 
you don't want to use pages that come from malloc(9) for sharing with 
userspace, so we need to think about what kind of memory you do need.


Thank you for your answer. Right now, I just want to do it as easily as 
possible, I don't know if this kind of project could interest other persons 
? It is ok for me to work more on it later on, if there is any further 
interest in doing it.


What do you mean by a network allocator?  How do you plan to use these pages?

If you haven't already, you should look at the zero-copy socket code in 
uipc_cow.c.  The main criticism of this approach has been that it uses 
copy-on-write, leading to potential IPIs for VM shootdowns, etc.  An 
alternative, more along the lines of IO-Lite, would be to allow user space to 
explicitly abandon the page on send, then map a new page to replace it.  In 
which case you might consider a variation on the send system call that accepts 
only page-aligned arguments and has the effect of unmapping the pages that are 
sent.  In neither case, on the transmit side, does this require a 
modification to the kernel memory allocator.


The receive side has always been more tricky to deal with...

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: add closefrom() call

2007-07-06 Thread Robert Watson


On Fri, 6 Jul 2007, LI Xin wrote:


Joerg Sonnenberger wrote:


On Wed, Jul 04, 2007 at 08:27:49PM -0400, Ighighi Ighighi wrote:

The closefrom() call, available in Solaris, is present in NetBSD since 
version 3.0. It is implemented with the F_CLOSEM fcntl() available since 
version 2.0.


You could also add a system call like it was done in DragonFly. That might 
be even simpler to implement.


Here is my implementation for FreeBSD.  Some difference between my and 
DragonFly's implementation:


- closefrom(-1) would be no-op on DragonFly, my version would close all
open files (From my understanding of OpenSolaris's userland
implementation, this is Solaris's behavior).
- my version closefrom(very_big_fd) would result in EBADF.  I am not
very sure whether this is correct, but it does not hurt for applications
that thinks closefrom() would return void.

To RW:  I have not found a suitable audit event for this, should I create a 
new event?


Solaris side-steps this issue by simply auditing the individual close() system 
calls.  My preference would be that we implement this in user space also, 
which would likewise generate a series of audit events, one for each system 
call.  The procfs optimization they use (I wonder -- is it really an 
optimization?) won't work for us, however.  Do you think that there's a strong 
motivation to provide a closefrom(2) system call, rather than a closefrom(3) 
library call?  This would let us neatly avoid the question you've posed :-).


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: add closefrom() call

2007-07-06 Thread Robert Watson


On Fri, 6 Jul 2007, Julian Elischer wrote:


Ed Schouten wrote:

* LI Xin <[EMAIL PROTECTED]> wrote:
Here is my implementation for FreeBSD.  Some difference between my and 
DragonFly's implementation:


 - closefrom(-1) would be no-op on DragonFly, my version would close all 
open files (From my understanding of OpenSolaris's userland 
implementation, this is Solaris's behavior).
 - my version closefrom(very_big_fd) would result in EBADF.  I am not very 
sure whether this is correct, but it does not hurt for applications that 
thinks closefrom() would return void.


Wouldn't it be better to just implement it through fcntl() and implement 
closefrom() in libc?


that's a possibility but I personally think the huge difference in 
efficiency makes it worth putting it in the kernel. Quite a few programs I 
know of could really help their startup time with this as the first thing 
they do is "close the first 2000 file descriptors".


The Solaris implementation appears to implement two strategies:

(1) If procfs is mounted, list the fd directory to get a list of open fds,
then close those by number.

(2) If procfs is not mounted, query the number of open fds using the resource
limit interface, then sequentially close until the right number have been closed.

Hence my question as to whether there's actually a big benefit or not -- do we 
think closefrom() is a performance-critical function?
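For comparison, a user-space closefrom(3) in the sequential style of 
strategy (2) might look roughly like the sketch below.  It is an illustration 
only, not the Solaris code or LI Xin's patch: close_fd_range() and 
closefrom_userland() are hypothetical names, and the 1024 fallback is an 
arbitrary assumption.  Every close(2) it issues is a real system call, so 
each one is audited individually and the cost grows with the descriptor 
limit rather than with the number of open files.

```c
#include <limits.h>
#include <unistd.h>

/*
 * Close every descriptor in [lowfd, highfd).  Returns how many close(2)
 * calls succeeded; EBADF on unused slots is expected and ignored.
 */
int
close_fd_range(int lowfd, int highfd)
{
    int fd, closed = 0;

    if (lowfd < 0)
        lowfd = 0;      /* treat negative values as "close everything" */
    for (fd = lowfd; fd < highfd; fd++) {
        if (close(fd) == 0)
            closed++;
    }
    return (closed);
}

/*
 * Hypothetical closefrom(3)-style wrapper: use the per-process descriptor
 * limit as the upper bound of the sweep.
 */
int
closefrom_userland(int lowfd)
{
    long maxfd = sysconf(_SC_OPEN_MAX);

    if (maxfd < 0)
        maxfd = 1024;   /* arbitrary fallback, an assumption of this sketch */
    if (maxfd > INT_MAX)
        maxfd = INT_MAX;
    return (close_fd_range(lowfd, (int)maxfd));
}
```

With a large descriptor limit the sweep issues one system call per slot, 
which is the efficiency concern raised above in favour of a kernel 
implementation.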


Robert N M Watson
Computer Laboratory
University of Cambridge



Re: add closefrom() call

2007-07-10 Thread Robert Watson


On Fri, 6 Jul 2007, LI Xin wrote:

To RW:  I have not found a suitable audit event for this, should I create a 
new event?


BTW, I can add an AUE_CLOSEFROM event to OpenBSM.  This may require a little 
work by event consumers who will now need to know about an additional source 
of implicit closes.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: [panic]Fatal trap 12: page fault while in kernel mode

2007-08-02 Thread Robert Watson

On Tue, 31 Jul 2007, ytriffy wrote:

Trap 12 occurred when I rebooted PC. Sending you backtrace. My system: amd64 
3200+ Venice, MB ECS nForce4 A939,Samsung 250GB and WD 250 GB, 2 memory 
banks 512MB each, videocard: Geforce 6600gt 128MB, NIC on realtek chip, 
sound card cirrus logic cs4281. It's very unstable, crashes happen every 
day, so I'm hoping you would say why(any hints what hardware may cause it). 
How to repeat it? I don't know. It happened once during reboot process.


In general, you want to report this sort of bug using the send-pr interface, 
or the gnats web submission form.  In the past, I've seen quite a few bug 
reports sent to hackers@ get lost because many FreeBSD developers don't subscribe to 
the list.  You could also consider sending it to stable@, since that's the 
mailing list for discussing 6-STABLE development.  FYI, this looks like a 
NULL-pointer dereference in the VFS shutdown code.


Robert N M Watson
Computer Laboratory
University of Cambridge



[EMAIL PROTECTED] /var]# uname -a
FreeBSD freelanc.dubki.ru 6.2-STABLE-200706 FreeBSD 6.2-STABLE-200706 #1: Mon Jul 23 13:34:27 MSD 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUGGERKERN i386

[EMAIL PROTECTED] /usr/obj/usr/src/sys/DEBUGGERKERN]# kgdb kernel.debug /var/crash/vmcore.3
kgdb: kvm_nlist(_stopped_cpus):
kgdb: kvm_nlist(_stoppcbs):
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
<118>Jul 25 14:06:32 freelanc syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...6 5 3 1 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.


Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc058a4e0
stack pointer = 0x28:0xe9455c48
frame pointer = 0x28:0xe9455c58
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 44922 (reboot)
panic: from debugger
Uptime: 2h45m36s
Dumping 1022 MB (2 chunks)
chunk 0: 1MB (159 pages) ... ok
chunk 1: 1022MB (261600 pages) 1006 990 974 958 942 926 910 894 878 862
846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606 590 574
558 542 526 510 494 478 462 446 430 414 398 382 366 350 334 318 302 286
270 254 238 222 206 190 174 158 142 126 110 94 78 62 46 30 14

#0 doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:165
#1 0xc053d916 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc053dbdc in panic (fmt=0xc06f5278 "from debugger")
at /usr/src/sys/kern/kern_shutdown.c:565
#3 0xc045361d in db_panic (addr=-1067932448, have_addr=0, count=-1,
modif=0xe9455a74 "") at /usr/src/sys/ddb/db_command.c:438
#4 0xc04535b4 in db_command (last_cmdp=0xc0766784, cmd_table=0x0,
aux_cmd_tablep=0xc0728e90, aux_cmd_tablep_end=0xc0728e94)
at /usr/src/sys/ddb/db_command.c:350
#5 0xc045367c in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#6 0xc0455291 in db_trap (type=12, code=0) at
/usr/src/sys/ddb/db_main.c:222
#7 0xc0556a2b in kdb_trap (type=12, code=0, tf=0xe9455c08)
at /usr/src/sys/kern/subr_kdb.c:473
#8 0xc06cba6c in trap_fatal (frame=0xe9455c08, eva=4)
at /usr/src/sys/i386/i386/trap.c:828
#9 0xc06cb7d7 in trap_pfault (frame=0xe9455c08, usermode=0, eva=4)
at /usr/src/sys/i386/i386/trap.c:745
#10 0xc06cb3f1 in trap (frame=
{tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = -381330360, tf_esi =
-993547624, tf_ebp = -381330344, tf_isp = -381330380, tf_ebx = 0, tf_edx
= -992513384, tf_ecx = 4, tf_eax = -950651024, tf_trapno = 12, tf_err =
0, tf_eip = -1067932448, tf_cs = 32, tf_eflags = 590338, tf_esp = 0,
tf_ss = -992305712})
at /usr/src/sys/i386/i386/trap.c:435
#11 0xc06b8b1a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#12 0xc058a4e0 in cache_purgevfs (mp=0xc4d77298)
at /usr/src/sys/kern/vfs_cache.c:622
#13 0xc0591f29 in dounmount (mp=0xc4d77298, flags=524288, td=0xc62ce300)
at /usr/src/sys/kern/vfs_mount.c:1214
#14 0xc0597d0a in vfs_unmountall () at /usr/src/sys/kern/vfs_subr.c:2837
#15 0xc053d807 in boot (howto=0) at /usr/src/sys/kern/kern_shutdown.c:391
#16 0xc053d2a2 in reboot (td=0xc62ce300, uap=0xc7563770)
at /usr/src/sys/kern/kern_shutdown.c:169
#17 0xc06cbdbb in syscall (frame=
{tf_fs = 59, tf_es = 59, tf_ds = 59, tf_

Re: work praudit with tee & grep

2007-08-21 Thread Robert Watson


On Mon, 20 Aug 2007, sam wrote:


I am installed AUDIT
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/audit.html

# praudit /etc/auditpipe | grep "xxx"
&
# praudit /etc/auditpipe | tee file.log
&
# praudit /etc/auditpipe > file.log

this is not work
please help me


Vladimir,

Could you confirm that when you typed the command, you entered it as above 
instead of using /dev/auditpipe, the actual name of the audit device?  I think 
all the examples in the Handbook are correct, suggesting a transcription error 
either when you typed the command, or when you copied it to the e-mail.  If 
that's not it, could you be more specific about the failure mode?


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: work praudit with tee & grep

2007-08-21 Thread Robert Watson


On Tue, 21 Aug 2007, Eric Crist wrote:


thx this not working wite up buffer-pipe to 4096 bytes


Can I ask what is in the /etc/auditpipe file?


I believe what is meant is /dev/auditpipe, which provides a live event stream 
from the kernel's audit subsystem in FreeBSD 6.2 and later.  You can read more 
about the event audit facility here:


  http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/audit.html

The auditpipe(4) man page provides more detailed information on audit pipes, 
which, unlike the trail files in /var/audit, provide live streams in a lossy 
way, and allow applications to push filters into the kernel as to what events 
they are interested in hearing about.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: praudit parse with gnu grep

2007-08-28 Thread Robert Watson


On Wed, 22 Aug 2007, sam wrote:


Index: praudit.c
===
RCS file: /data/fbsd-cvs/ncvs/src/contrib/openbsm/bin/praudit/praudit.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 praudit.c
--- praudit.c16 Apr 2007 15:36:57 -1.1.1.3
+++ praudit.c21 Aug 2007 14:26:43 -
@@ -107,6 +107,7 @@
 free(buf);
 if (oneline)
 printf("\n");
+fflush(stdout);
 }
 return (0);
 }


my big thanks this patch is working


Vladimir,

I've merged this change into OpenBSM, and it will appear in the next release.
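For anyone wondering why the one-line change matters: stdio normally buffers 
stdout fully when it is a pipe or a file, so records sit in the buffer until 
it fills and a downstream grep or tee sees nothing for a long time.  The 
sketch below shows the same idea in isolation; emit_record() is a 
hypothetical helper written for this illustration, not a function in 
praudit.

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the patched loop: write one record and
 * flush immediately so a downstream reader sees it without delay.
 * Returns 0 on success, -1 on error.
 */
int
emit_record(FILE *out, const char *record)
{
    if (fputs(record, out) == EOF || fputc('\n', out) == EOF)
        return (-1);
    /* Without this, a fully buffered stream holds the data back. */
    return (fflush(out) == 0 ? 0 : -1);
}
```

The trade-off is one write(2) per record rather than one per full buffer, 
which seems acceptable for a tool that is meant to follow a live stream.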

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Exclusive binary files

2007-09-02 Thread Robert Watson


On Sun, 2 Sep 2007, Max Laier wrote:


On Saturday 01 September 2007, Klaus Schneider wrote:

Well, anybody know a way to make the FreeBSD run just binaries that I have 
compiled?


For example: a hacker gets access to a shell on my server and then tries to 
put exploit code there, but the machine doesn't have a compiler, so he tries 
to upload the compiled exploit... suppose that I can't mount the users' 
partition in "noexec" mode...


Does anybody know a solution for this?


IIRC csjp@ had some code to do this inside the MAC framework.  Storing 
hashes in extended attributes and only allowing execution of signed 
executables ... 
http://perforce.freebsd.org/fileLogView.cgi?FSPC=//depot/projects/trustedbsd/mac/sys/security/mac%5fchkexec/mac%5fchkexec.c 
... not sure what became of it, though.


I believe he also was able to verify other things, such as shared libraries, 
which for modern binaries is the obvious next step given that a fair chunk of 
code run in many programs isn't in the main program binary.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: How to get filename of an open file descriptor

2007-11-12 Thread Robert Watson

On Mon, 12 Nov 2007, Yuri wrote:

I am looking for functionality similar to Linux's /proc/<pid>/fd/. I 
need to know what is the file name of an open file descriptor.


/proc/<pid>/fd is missing on FreeBSD.

There's something called 'fdescfs'. In /dev/fd/ it shows the list of file 
descriptors. But they don't seem to be symbolic links to open files. And 
also it only shows FDs of the current process.


So why there's no /proc/<pid>/fd in FreeBSD? And how do I work around this? 
Or should I just invest time and write a kernel patch implementing 
/proc/<pid>/fd/?


You can give these patches a try:

  http://www.watson.org/~robert/freebsd/20071112-procstat.tgz

They reflect a work-in-progress procstat(1) tool, which inspects process state 
in various ways.  They are developed against 8-CURRENT, but likely still apply 
fairly easily to 7-STABLE.  They suffer various deficiencies, such as relying 
on the name cache in-kernel to generate file paths for mapped files and open 
file descriptors, so don't currently work with devfs nodes (for example). 
However, they may do what you need.  Any feedback would be most welcome.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: How to get filename of an open file descriptor

2007-11-12 Thread Robert Watson


On Mon, 12 Nov 2007, Yuri wrote:


Thank you for your response.

I attempted to compile procstat but procstat.h seems to be missing in tgz.


Yuri,

Indeed -- looks like I forgot to p4 add on my development box.  I've updated 
the tarball to now include procstat.h.  If there are any other problems, do 
let me know.


Robert N M Watson
Computer Laboratory
University of Cambridge



Yuri

Quoting Robert Watson <[EMAIL PROTECTED]>:


On Mon, 12 Nov 2007, Yuri wrote:


I am looking for functionality similar to Linux's /proc/<pid>/fd/. I
need to know what is the file name of an open file descriptor.

/proc/<pid>/fd is missing on FreeBSD.

There's something called 'fdescfs'. In /dev/fd/ it shows the list of file
descriptors. But they don't seem to be symbolic links to open files. And
also it only shows FDs of the current process.

So why there's no /proc/<pid>/fd in FreeBSD? And how do I work around this?
Or should I just invest time and write a kernel patch implementing
/proc/<pid>/fd/?


You can give these patches a try:

   http://www.watson.org/~robert/freebsd/20071112-procstat.tgz

They reflect a work-in-progress procstat(1) tool, which inspects process
state in various ways.  They are developed against 8-CURRENT, but likely
still apply fairly easily to 7-STABLE.  They suffer various deficiencies,
such as relying on the name cache in-kernel to generate file paths for
mapped files and open file descriptors, so don't currently work with devfs
nodes (for example).  However, they may do what you need.  Any feedback
would be most welcome.

Robert N M Watson
Computer Laboratory
University of Cambridge






Re: How to get filename of an open file descriptor

2007-11-12 Thread Robert Watson


On Mon, 12 Nov 2007, Yuri wrote:

I looked at the patch. It retrieves file description information through 
'sysctl' calls with proprietary keys.


Isn't it better architecturally to expose the same information through 
procfs interface? At least from the filesystem level and up standard tools 
like ls/cat will be able to show the same information instead of the 
specialized utility.


Over the last several years, we have been working to deprecate procfs as the 
official means of querying process information.  This has been for 
several reasons:


(1) procfs has been a major source of security vulnerabilities in every
operating system that implements it.  You need only look at the
vulnerability history of Solaris, Linux, and earlier versions of FreeBSD
to see the rather copious list of problems.  My belief is that this
derives from the fundamental misalignment of the concepts of processes and
files: their life cycles are very different, and there appear to be
particular problems relating to execve(), which may reflect a security
transition that has no logical equivalent revocation point for files.
Most of the vulnerabilities have to do with a failure to properly revoke
across execution of setuid binaries, and these vulnerabilities seem
remarkably persistent over time.

(2) procfs is an unstructured query mechanism--sysctl defines certain
atomicity properties, has a structured get/set model, and standardized
tools for querying simple data.  There are well-defined interfaces for
requesting the size of the data, etc.  Especially for objects that are
dynamic in nature, properly implementing buffering of potentially stateful
non-atomic queries in a synthetic file system is quite a mess.

(3) For non-human interpretation of data, such as monitoring programs,
visualization programs, debugging programs, etc, we can avoid marshaling
to text and then demarshaling all data on its way through the query
interface, which is a common source of bugs (especially when it comes to
parsing data that may be defined by untrusted processes, or even just
signed vs. unsigned data).

I agree there are real trade-offs being made here that can reasonably be 
debated, but procstat(1) is pretty consistent with our overall direction, and 
the reasons for the direction are relatively sound.
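To make point (3) a little more concrete, here is a small sketch of the 
signed vs. unsigned hazard when a numeric field travels as text.  It is a 
hypothetical illustration, not code from procfs, sysctl or procstat(1), and 
both function names are invented for it.  The same decimal string is accepted 
by a parser that knows the field is unsigned and rejected by one that assumes 
a signed int, so every consumer of a text interface has to rediscover type 
information that a binary structure would have carried with it.  The sketch 
assumes a 32-bit int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal field as a consumer that assumes a signed int.
 * Returns 0 on success, -1 if the text is malformed or out of range.
 */
int
parse_field_as_int(const char *text, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        return (-1);
    *out = (int)v;
    return (0);
}

/*
 * Parse the same field as an unsigned int.
 * Returns 0 on success, -1 if the text is malformed or out of range.
 */
int
parse_field_as_uint(const char *text, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (*text == '-')
        return (-1);    /* strtoul(3) would otherwise accept and negate it */
    errno = 0;
    v = strtoul(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE || v > UINT_MAX)
        return (-1);
    *out = (unsigned int)v;
    return (0);
}
```

Given the text "3000000000", the unsigned parser succeeds while the signed 
one reports an error; a less careful signed parser would instead produce a 
wrong value without any indication.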


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: How to get filename of an open file descriptor

2007-11-14 Thread Robert Watson

On Tue, 13 Nov 2007, Yuri wrote:


Thank you for letting me know about this new feature procstat.

But is there any workaround in 6.3? I need to port one package that needs to 
lookup file names by FDs to the current FreeBSD and need some solution now.


If the port uses a script to extract the data, a tool like lsof may do the 
trick.  However, I'm not sure there are any native APIs to query that data "as 
shipped" in 6.3.  Once I've had some reasonable feedback on procstat(1), I'll 
merge it into CVS and start it on the MFC route, but 6.3 is almost certainly 
too soon for it to ship as part of that release.  I don't know if there will 
be a 6.4 or not, but I would anticipate procstat(1) appearing in 7.1, and 
6-STABLE if there are requests.  procstat(1) mostly relies on existing 
sysctls, and adds two new ones for the purposes of exporting the file 
descriptor and VM information only, so it is a fairly straightforward MFC.


Robert N M Watson
Computer Laboratory
University of Cambridge



Yuri


Quoting Robert Watson <[EMAIL PROTECTED]>:



On Mon, 12 Nov 2007, Yuri wrote:


Thank you for your response.

I attempted to compile procstat but procstat.h seems to be missing in tgz.

Yuri,

Indeed -- looks like I forgot to p4 add on my development box.  I've updated
the tarball to now include procstat.h.  If there are any other problems, do
let me know.

Robert N M Watson
Computer Laboratory
University of Cambridge





Re: How to get filename of an open file descriptor

2007-11-14 Thread Robert Watson


On Wed, 14 Nov 2007, Skip Ford wrote:


Robert Watson wrote:

On Tue, 13 Nov 2007, Yuri wrote:


Thank you for letting me know about this new feature procstat.

But is there any workaround in 6.3? I need to port one package that needs 
to lookup file names by FDs to the current FreeBSD and need some solution 
now.


If the port uses a script to extract the data, a tool like lsof may do the 
trick.  However, I'm not sure there are any native APIs to query that data 
"as shipped" in 6.3.  Once I've had some reasonable feedback on 
procstat(1),


Well, the header file procstat.h is still missing from the tarball AFAICT so 
I don't know how many people are using it.


Whoops!  While you have obviously extracted or recreated the file, here's a 
URL for everyone else:


  http://www.watson.org/~robert/freebsd/20071115-procstat.tgz

Not sure what type of feedback you want, but I've been using it since you 
posted the link and it works as advertised.  I like being able to see the vm 
map without using procfs.


Yeah, that was pretty much the motivation.  I also plan to add the ability to 
dump signal handler disposition information.


I don't like having a procstat(1) utility along with a ps(1) utility. 
"procstat" seems short for process status as does "ps". Seems like 
procstat(1) should be a library with ps(1) the frontend, or ps(1) should be 
merged with procstat(1).


Plus, the name "procstat" sounds an awful lot like a certain part of the 
body that makes me uncomfortable in my chair.  Do you really want to spend 
the rest of your life asking people to see their procstat output? ;-)


You are more evil than previously understood. :-)

I agree regarding the duplication with ps(1) -- however, I'm generally of the 
opinion that ps(1) is overburdened as tools go, and that the goals are 
actually somewhat different--procstat(1) intentionally doesn't have the 
ability to generate a list of processes, for example, taking pids explicitly 
as the argument; likewise, historically ps(1) has not been interested in 
printing more than one line per process (although I think -h changed this). 
I'll do a bit more investigation as to how easily it can be wedged in, and do 
recognize the concern here.


But, it works fine and provides access to information that's not readily 
available by other means.


Thanks for the feedback (working fine is useful feedback),

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: A TrustedBSD "voluntary sandbox" policy.

2007-11-16 Thread Robert Watson


On Thu, 8 Nov 2007, Andrea Campi wrote:


On Wed, Nov 07, 2007 at 10:20:28PM -0500, [EMAIL PROTECTED] wrote:

I'm considering developing a policy/module for TrustedBSD loosely based on 
the systrace concept - A process loads a policy and then executes another 
program in a sandbox with fine grained control over what that program can 
do.

...
Please note that the 'policy' given on the command line is purely for the 
sake of example, no syntax or semantics have been decided upon.


Can't comment on the implementation or wider issues, but if you pursue this, 
please have a look at how MacOS Leopard does it (Seatbelt). Would be nice to 
converge on both syntax (a Schema dialect) and tools names / command line 
args--or if converging is not possible, at least know where and why and make 
a conscious decision.


FYI, Seatbelt is based on the Mac OS X port of the TrustedBSD MAC Framework, 
which while it has some significant changes (some now present in the 8-CURRENT 
branch of FreeBSD), may well be a good starting point.  Last I checked, the 
source for Seatbelt wasn't yet available, but there was hope it would be 
available in the near future.  A port of the policy to FreeBSD sounds like it 
would be very interesting to do, and might provide a nice starting point 
rather than having to write up a policy from scratch.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: How to get filename of an open file descriptor

2007-11-16 Thread Robert Watson


On Wed, 14 Nov 2007, Skip Ford wrote:

I agree regarding the duplication with ps(1) -- however, I'm generally of 
the opinion that ps(1) is overburdened as tools go, and that the goals are 
actually somewhat different--procstat(1) intentionally doesn't have the 
ability to generate a list of processes, for example, taking pids 
explicitly as the argument; likewise, historically ps(1) has not been 
interested in printing more than one line per process (although I think -h 
changed this). I'll do a bit more investigation as to how easily it can be 
wedged in, and do recognize the concern here.


I understand, and I sort of knew that from the beginning which is why I 
didn't provide feedback immediately.  I don't have a suggestion as to what I 
think should be done.


While procstat(1) currently takes a list of pids, I wouldn't be surprised if 
somebody adds code to list all processes, unless you block it.  I think it 
would be useful, especially since some of its options produce single-line 
per pid output, such as credentials.


The two utilities do provide different information, it's just a little odd 
to have two utilities with basically the same name.  But I can't think of a 
more appropriate name for procstat(1).


FWIW, it looks like on Solaris, there are a series of psig(1), pstack(1), 
ptree(1), etc, tools for similar sorts of per-process inspection purposes.  I 
think I prefer bundling it into a single tool, but it's certainly a similar 
idea.  Maybe I should just rename procstat(1) to pinfo(1) and be done with it?


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: How to get filename of an open file descriptor

2007-11-18 Thread Robert Watson

On Sun, 18 Nov 2007, Skip Ford wrote:


Thomas Hurst wrote:

* Skip Ford ([EMAIL PROTECTED]) wrote:

It would be interesting to know for sure, though, if Solaris uses 
hardlinks and, if so, what their utility is called.


Nope.  They *do* use hardlinks in that they have 32bit wrappers in /usr/bin 
etc which dispatch to the relevant architecture, but the commands 
themselves are all separate.


Indeed, and each utility is quite complex as compared to what ours would be 
if split.


I would just rename procstat(1) to pargs(1) then hardlink the others since 
ours are much less complex, but I'll take anything at this point.


As for the procstat(1) code itself, I've found one bug and have two 
suggestions:


1)  procstat_args() doesn't use a local variable and the buffer doesn't
get cleared between calls:

$ procstat -a 797
 PID ARGS
 797 audacious
$ procstat -a 795 797
 PID ARGS
 795 xterm -xtsessionID 11c0a801030001185368263000768
 797 audacious essionID 11c0a801030001185368263000768
$

Other option's functions are not similarly affected.

2)  I think it should handle requests for information about pid 0 instead of 
requiring at least pid 1 as it currently does.  Solaris suggests '/proc/*' 
to see all processes.  If we use `ps axopid=` then it aborts on the swapper 
(pid 0) immediately.


3)  Similarly, I think all of the sysctl(3) calls within the individual 
option functions (procstat_bin(), procstat_args(), etc.) should just go 
ahead and print the header and pid, then print any sysctl(3) error in the 
PID's row instead of erroring out.  We're either about to finish executing 
anyway if that was the only pid requested, or we're moving on to another pid 
that has nothing to do with the previous pid.  There's not really any reason 
to stop processing further pids.  This also affects attempting to list all 
pids since it currently stops processing pids as soon as one doesn't exist. 
A global error variable could just be incremented with every call and 
returned at process exit, that way it'd still be meaningful for single PIDs.
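
The approach suggested in point 3 might look roughly like the following
sketch.  It is not the procstat(1) source: the lookup callback stands in for
the per-pid sysctl(3) query, and every name here (lookup_fn, print_all,
fake_lookup) is hypothetical.  A failure is reported in that pid's row,
counted, and processing moves on to the next pid.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical query: fill buf for pid, return 0 or an errno value. */
typedef int (*lookup_fn)(int pid, char *buf, size_t buflen);

/*
 * Print one row per pid to fp.  On failure, print the error in that row and
 * keep going.  Returns the number of pids that failed, which the caller
 * can turn into the process exit status.
 */
int
print_all(FILE *fp, const int *pids, size_t npids, lookup_fn lookup)
{
	char buf[256];
	size_t i;
	int errors = 0;

	fprintf(fp, "%5s %s\n", "PID", "ARGS");
	for (i = 0; i < npids; i++) {
		int error = lookup(pids[i], buf, sizeof(buf));

		if (error != 0) {
			fprintf(fp, "%5d (%s)\n", pids[i], strerror(error));
			errors++;
			continue;	/* Unrelated pids are still processed. */
		}
		fprintf(fp, "%5d %s\n", pids[i], buf);
	}
	return (errors);
}

/* Stand-in lookup for demonstration only: even pids "exist". */
int
fake_lookup(int pid, char *buf, size_t buflen)
{
	if (pid % 2 != 0)
		return (ESRCH);
	snprintf(buf, buflen, "cmd%d", pid);
	return (0);
}
```

Calling print_all(stdout, pids, npids, fake_lookup) with pids 2, 3 and 4
prints three rows and returns 1, so a single bad pid no longer hides the
others.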


Actually, I think I've fixed all of the above in p4 with some changes 
yesterday; I'll do a new code drop for you to try:


  http://www.watson.org/~robert/freebsd/20071118-procstat.tgz

The kernel patch is identical, so you can just rebuild procstat.

Since this is a per-process tool, I think it needs to complete "procstat -c 
`ps axopid=`" if at all possible.


Yes, I agree.

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: How to get filename of an open file descriptor

2007-11-19 Thread Robert Watson

On Sun, 18 Nov 2007, Skip Ford wrote:


1)  procstat_args() doesn't use a local variable and the buffer doesn't
get cleared between calls:

$ procstat -a 797
PID ARGS
797 audacious
$ procstat -a 795 797
PID ARGS
795 xterm -xtsessionID 11c0a801030001185368263000768
797 audacious essionID 11c0a801030001185368263000768
$

Other option's functions are not similarly affected.


Indeed, it turned out I fixed another related bug but not this bug (that if 
there was no pathname returned, we would print the previous pathname).  The 
bug here is not so much the buffer handling, but rather, that the termination 
condition for the printing loop was wrong.  I coded it to look for a 
double-nul, but in fact, I just needed to loop through until I hit the limit 
of the data returned by sysctl.  So this should now also be fixed.  I'm 
going to hack a bit more on procstat today and then put up a new drop.
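
As an illustration of the termination fix described above, here is a minimal
userland sketch, not the actual procstat(1) source.  It assumes the argument
vector arrives as a run of NUL-terminated strings plus a byte count (as a
sysctl would report), and the helper name format_args is hypothetical.  The
loop stops at the reported length rather than hunting for a double NUL, so
stale bytes left in a reused buffer are never printed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: join the NUL-separated strings in buf[0..len) into
 * out, separated by single spaces.  Only the first len bytes are trusted;
 * anything beyond that may be left over from an earlier call.
 * Returns the number of arguments copied.
 */
int
format_args(const char *buf, size_t len, char *out, size_t outsize)
{
	size_t pos = 0, used = 0;
	int count = 0;

	if (outsize == 0)
		return (0);
	out[0] = '\0';
	while (pos < len) {
		size_t arglen = 0, need;

		/* Measure this argument without reading past len. */
		while (pos + arglen < len && buf[pos + arglen] != '\0')
			arglen++;
		need = arglen + (count > 0 ? 1 : 0);
		if (used + need + 1 > outsize)
			break;		/* Output full: stop cleanly. */
		if (count > 0)
			out[used++] = ' ';
		memcpy(out + used, buf + pos, arglen);
		used += arglen;
		out[used] = '\0';
		count++;
		pos += arglen + 1;	/* Skip the terminating NUL. */
	}
	return (count);
}
```

If a buffer that previously held "xterm\0-xtsessionID\0" is reused for
"audacious\0" and the length passed is the 10 bytes actually returned, the
output is just "audacious"; passing the old length of 19 reproduces the
"audacious essionID" symptom from the report.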


The main missing feature right now, from my perspective, is signal 
information, but are there other pieces of detailed process information we 
could usefully be displaying?  I'm not sure I want to get into teaching 
procinfo about generating stack traces, which is something the Solaris tools 
can do, but perhaps there are other things we could be displaying.


Although it occurs to me that, in many ways, it would be nice to be able to 
generate a kernel stack trace for each user thread--often when debugging a 
hung process, that's one of the pieces of information I'd really like to have, 
as just seeing a generic wchan sleep on a lock is not very useful.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: peak mbuf stat missing ... and needed ...

2007-11-22 Thread Robert Watson


On Mon, 19 Nov 2007, Juri Mianovich wrote:

I am sorry to repost, but I cannot get any answer on this from -net or 
-questions ... is there any answer to getting this stat ?  (see below)


Juri,

I recognize the importance of your point, and can shed a little light on why 
things are the way they are.  In FreeBSD 5, Bosko Milekic introduced MBUMA, a 
UMA-backed caching slab allocator for mbufs and related data structures 
implemented using extensions to UMA(9).  One of the properties of UMA is that 
it's possible to allocate packet storage from CPU-local caches rather than 
going to a central pool protected by central locks.  Almost all allocations 
occur this way in practice, and only intermittently return to the central 
allocator to either flush many freed packets back to the central cache, or pull 
more out; this occurs when there is an imbalance in allocation and freeing 
across CPUs, such as when a pipeline occurs in packet processing over a series 
of CPUs.  As a result, there is in fact no central tracking of how many mbufs 
are currently allocated -- the central zone knows about the number currently 
not present in the zone, but that just means they are in either a per-CPU 
cache or in use, not that they are actually allocated.


The notion of peak allocation is obviously a very important one for precisely 
the reasons you identify.  The question is how best to provide it without 
seriously impacting performance *or* providing one that is potentially quite 
inaccurate.  The "current" measure is based on taking a non-atomic snapshot of 
the global allocation stats and per-CPU stats, which means potentially it can 
be very slightly inconsistent.  We don't want to update the peak stat on every 
allocation, I think, as it would be a global measure, and involve dirtying 
global cache lines and so on.  Perhaps we could be maintaining that peak value 
whenever CPUs go back to the global pool from a per-CPU cache, since the right 
locks will be held anyway...  I don't see this being fixed for 6.3 or 7.0 
given their proximity, but I will investigate a fix for later releases.  Could 
you file a feature request PR on this, and forward me the PR receipt so I can 
take ownership of it?
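
To make the idea of maintaining the peak only on trips to the global pool
more concrete, here is a rough userland sketch in C.  It is not UMA code:
all names (zone_sketch, cache_sketch, BUCKET_SIZE) are hypothetical, and
locking is omitted.  The fast path touches no shared state; the peak is
updated only when a per-CPU cache refills from the global pool.  The cost is
that the peak counts items parked in per-CPU caches as well as items in use,
so it is an upper bound rather than an exact figure.

```c
#define BUCKET_SIZE 4	/* Items moved per trip to the global pool (assumed). */

struct zone_sketch {
	int out;	/* Items handed to per-CPU caches, in use or cached. */
	int peak;	/* Highest value of 'out' seen so far. */
};

struct cache_sketch {
	int cached;	/* Free items held locally by this CPU. */
};

/* Slow path: refill from the global pool; the zone lock would be held. */
void
cache_refill(struct zone_sketch *z, struct cache_sketch *c)
{
	z->out += BUCKET_SIZE;
	c->cached += BUCKET_SIZE;
	if (z->out > z->peak)
		z->peak = z->out;	/* Peak updated only on the slow path. */
}

/* Fast path: allocate from the local cache without touching the zone. */
void
item_alloc(struct zone_sketch *z, struct cache_sketch *c)
{
	if (c->cached == 0)
		cache_refill(z, c);
	c->cached--;
}

/* Free to the local cache; drain one bucket when the cache overfills. */
void
item_free(struct zone_sketch *z, struct cache_sketch *c)
{
	c->cached++;
	if (c->cached > BUCKET_SIZE) {
		c->cached -= BUCKET_SIZE;
		z->out -= BUCKET_SIZE;
	}
}
```

Allocating on one cache and freeing on another models the pipeline case
mentioned earlier: the allocating CPU keeps refilling, the freeing CPU keeps
draining, and the peak is adjusted only during those refills.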


Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge



-

FreeBSD 4.x, netstat -m:

70/4336/26624 mbufs in use (current/peak/max)

Never any doubt - if peak=max, I hit the limit.  Super
useful.  Furthermore, by watching the peak I can see
when I am getting close, rather than waiting for
denied requests to pile up after the fact.

FreeBSD 6.x, netstat -m:

524/826/1350 mbufs in use (current/cache/total)

So ... how do I see peak mbufs in FreeBSD 6.x ?

Thanks.



 



Re: a strange/stupid question

2007-11-24 Thread Robert Watson


On Sat, 24 Nov 2007, Aryeh Friedman wrote:

Where do I find the main() [and/or other entry point] for the kernel?  I 
tend to understand stuff better if I follow the flow of exec from the start.


Everyone else is suggesting points very early in the boot, but I think the 
point where the kernel gets interesting is mi_startup() in init_main.c.  The 
first thing you'll find there is that our kernel 
initialization is modular, where different modules (compiled in or loaded as 
klds) register an ordered set of boot events (see sys/kernel.h for the boot 
order).  You'll need to grep around the kernel to find the registration 
points for various subsystems.
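
The ordering idea can be sketched in userland C.  This is an illustration
only: the real kernel registers entries with the SYSINIT() macro and collects
them via linker sets, whereas here the table is a plain array, and the names
(init_entry, run_startup, record_step) and key values are made up for the
example.

```c
#include <stddef.h>
#include <stdlib.h>

/* One registered boot event: run func(arg) at (subsystem, order). */
struct init_entry {
	unsigned int subsystem;	/* Major ordering key. */
	unsigned int order;	/* Minor ordering key within a subsystem. */
	void (*func)(void *);
	void *arg;
};

/* Compare by subsystem first, then by order within the subsystem. */
int
init_entry_cmp(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Sort the registered entries, then call each one in turn. */
void
run_startup(struct init_entry *entries, size_t n)
{
	size_t i;

	qsort(entries, n, sizeof(entries[0]), init_entry_cmp);
	for (i = 0; i < n; i++)
		entries[i].func(entries[i].arg);
}

/* Demo callback: append the character that arg points at to a trace. */
char trace[16];
size_t trace_len;

void
record_step(void *arg)
{
	if (trace_len < sizeof(trace) - 1)
		trace[trace_len++] = *(const char *)arg;
}
```

Entries can be listed in any order in the table; run_startup() calls them
sorted by subsystem and then by order, which is why you have to grep for the
registration points rather than read one linear main().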


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Before & After Under The Giant Lock

2007-11-25 Thread Robert Watson


On Sun, 25 Nov 2007, binto wrote:

In "The Design and Implementation of the FreeBSD Operating System", I read 
the following:


'However, most of the heavily used parts of the kernel have been moved out 
from under the giant lock, including much of the virtual memory system, the 
networking stack, and the filesystem.'


What is the difference in the "virtual memory system, the networking stack, 
and the filesystem" before, under the giant lock, and after being moved out 
from under it?


I'm interested in learning more about operating systems, especially FreeBSD.


Binto,

Most currently available operating systems began life on uniprocessor 
hardware, and therefore started out with a kernel synchronization model 
intended to address concurrency generated by interrupt handlers, sleeping on 
I/O, etc, but not true parallelism.  Typically, that synchronization model has 
involved "disabling interrupts", perhaps with "interrupt levels" to allow 
prioritization and selective preemption, along with simple sleep locks 
intended to synchronize activities such as I/O in which kernel thread sleeping 
may take place.  And, as you might guess, that's where BSD, and hence FreeBSD, 
started out.


So the first step in introducing SMP support into an operating system is often 
to introduce a "Giant lock" around the majority of the kernel, allowing 
the kernel to effectively run on only one CPU at a time.  The intent there is 
to restore the assumptions of the UP kernel despite running on SMP hardware. 
This allows user programs to run on multiple CPUs at the same time, but 
prevents kernel parallelism.  This is relatively easy to introduce in a 
kernel, as it doesn't require changing the synchronization model for the 
entire kernel, just adding the Giant lock, modifying the probing/boot code, 
dealing with interrupt forwarding, dealing with TLB shootdowns, etc.  However, 
you don't get any parallelism win for the kernel at all, so if you have 
kernel-intensive workloads, you've gained nothing but overhead.


So the next stage in SMP support is to start to modify the kernel 
synchronization model so that parts of the kernel can start to run in parallel 
on multiple CPUs, ideally leading to speedup.  For FreeBSD, the "Giant lock" 
was introduced in FreeBSD 3, and then we started to break down that lock in 
FreeBSD 5.  In FreeBSD 6, the Giant lock is gone from most of the kernel most 
of the time, and in FreeBSD 7, it's far more gone.  There are still some edge 
cases where Giant is present -- less commonly used file systems, some older 
device drivers, etc, but almost all of the time when in the steady state, 
you're not seeing Giant-protected code running.  It's worth noting that 
if you take 1/2 the kernel out from under Giant, you've improved the 
performance of the Giant-protected code as well, since it has less other code 
to contend with.
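
The difference between the two stages can be sketched with userland
pthreads.  This is an analogy, not kernel code: the kernel uses its own
locking primitives, and the subsystem names and counters below are invented
for the example.  In the first pair of functions every subsystem serializes
on one lock; in the second pair each subsystem has its own lock, so work in
the two subsystems can proceed in parallel on different CPUs.

```c
#include <pthread.h>

long net_packets;	/* Pretend network stack state. */
long vm_faults;		/* Pretend VM system state. */

/* Stage 1: a single "giant" lock covers all subsystem state. */
pthread_mutex_t giant = PTHREAD_MUTEX_INITIALIZER;

void
net_input_giant(void)
{
	pthread_mutex_lock(&giant);	/* Also excludes VM work. */
	net_packets++;
	pthread_mutex_unlock(&giant);
}

void
vm_fault_giant(void)
{
	pthread_mutex_lock(&giant);	/* Also excludes network work. */
	vm_faults++;
	pthread_mutex_unlock(&giant);
}

/* Stage 2: each subsystem protects its own state with its own lock. */
pthread_mutex_t net_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t vm_lock = PTHREAD_MUTEX_INITIALIZER;

void
net_input_fine(void)
{
	pthread_mutex_lock(&net_lock);	/* VM work is not blocked. */
	net_packets++;
	pthread_mutex_unlock(&net_lock);
}

void
vm_fault_fine(void)
{
	pthread_mutex_lock(&vm_lock);	/* Network work is not blocked. */
	vm_faults++;
	pthread_mutex_unlock(&vm_lock);
}
```

A real conversion must not mix the two schemes for the same data; the sketch
keeps both only to show them side by side.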


At this point, Giant is gradually becoming a lock around the tty, newbus, usb, 
and msdosfs code, and we're largely at diminishing returns in terms of making 
improvements in parallelism through removing Giant.  In FreeBSD 7, the focus 
was on improving parallelism rather than removing Giant, with improvements in 
locking primitives, the scheduler, and lock granularity.  For example, most of 
the improvement in MySQL performance in FreeBSD 7 can be put down to a small 
number of changes:


- Conversion to 1:1 threads from M:N threads.

- Massive efficiency improvements in the sx(9) sleep locking primitive.

- Introduction of an efficient non-sleeping rw(9) locking primitive.

- Conversion of the kernel file descriptor table lock to a lower overhead
  sx(9) primitive, as well as efficiency improvements through redoing the
  locking to distinguish read and write locking.

- Move to fine-grained locking in UNIX domain sockets.

- Significant scalability improvements in scheduling due to introducing
  the ule(4) scheduler.

In FreeBSD 8, I expect we'll see a continued focus on both locking granularity 
and improving opportunities for kernel parallelism by better distributing 
workloads over CPU pools.  This is important because the number of cores/chip 
is continuing to increase dramatically, so MP performance is going to be 
important to keep working on.  That said, the results to date have been 
extremely promising, and I anticipate that we will continue to find ways to 
better exploit multiprocessor hardware, especially in the network stack.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Before & After Under The Giant Lock

2007-11-25 Thread Robert Watson


On Sun, 25 Nov 2007, Christopher Chen wrote:


On Nov 25, 2007 12:05 PM, Christopher Chen <[EMAIL PROTECTED]> wrote:

On Nov 25, 2007 3:13 AM, Robert Watson <[EMAIL PROTECTED]> wrote:
At this point, Giant is gradually becoming a lock around the tty, newbus, 
usb, and msdosfs code, and we're largely at diminishing returns in terms 
of making improvements in parallelism through removing Giant.  In FreeBSD 
7, the focus was on improving parallelism rather than removing Giant, with 
improvements in locking primitives, the scheduler, and lock granularity. 
For example, most of the improvement in MySQL performance in FreeBSD 7 can 
be put down to a small number of changes:


- Conversion to 1:1 threads from M:N threads.


I enjoyed reading your overview of changes from FreeBSD 6 to 7 with regards 
to MP scalability, but I am a bit confused over this point--Doesn't the 
user still have the choice between libthread, which is M:N, and libthr, 
which is 1:1?


At some point during the 6.x days, it was considered advantageous to use 
libthr when running MySQL. Has the project decided to go with libthread 
after all?


Perhaps we're talking about entirely different things.


My apologies. I re-read your statement and it makes sense now.

I thought you were saying we were converting from 1:1 to M:N.

Sorry for any confusion!


No problem -- just to be clear: in 7, users can still choose between 
libpthread (m:n) and libthr (1:1), but the default is now libthr rather than 
libpthread, as libthr seemed to perform better in most if not all workloads of 
interest.  The libthr in 7.0 is an enhanced version of the libthr that was 
present in 6.x, although I don't have a list of the changes off-hand.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Before & After Under The Giant Lock

2007-11-26 Thread Robert Watson


On Sun, 25 Nov 2007, Stephen Montgomery-Smith wrote:

(Also when I run 4 threads with 2 cpus, each with hyperthreading, it goes 
2.5 to 3 times faster - surprising since hyperthreading gets quite bad press 
for its performance improvements - I should add that Linux didn't do at all 
well at taking advantage of hyperthreading, running at the same speed as 
with 2 threads.)


I've seen gradual improvements both in our ability to manage HTT and HTT 
itself.  One of the things that gave HTT a particularly bad reputation was 
that it was first introduced in the P4 Xeon CPU line from Intel, and that line 
had extortionately expensive synchronization instructions compared to either 
prior or later CPU lines.  As a result, even a small amount of synchronization 
(read: kernel locking) quickly ate any benefits of potential parallelism. 
More recent CPUs have managed to reduce "extortionate" to "relatively 
expensive", which is much more manageable.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: find_symdef() returns invalid value.

2007-11-27 Thread Robert Watson


On Tue, 27 Nov 2007, vasanth raonaik wrote:


Is anyone looking into this issue?  Please mail me for more info.


Vasanth,

Could you file a problem report using send-pr on this problem?  FreeBSD 
hackers@ has a somewhat mixed subscription, and may not catch all the relevant 
developers, and spitting it into the PR machine may help find an owner for the 
issue.


Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge



On Nov 26, 2007 5:50 PM, vasanth raonaik <[EMAIL PROTECTED]> wrote:

Hello Hackers,

find_symdef() sometimes returns an invalid value in def and a null in
defobjout. This causes any binary to receive a segmentation fault and
dump core. I have received a core for rcp because of this issue. This
issue has also been raised by others on the list.

http://lists.freebsd.org/pipermail/freebsd-current/2004-February/021698.html

I would like to know if anyone has debugged this issue. It doesn't
always happen. There is definitely a bug which needs to be fixed.
Please mail in your messages about the issue and how to fix it.

Thanks,
vasanth




Updated procstat(1)

2007-11-27 Thread Robert Watson


Dear all,

I've updated the procstat(1) kernel patch and userland tool; the updated 
version can be found at:


  http://www.watson.org/~robert/freebsd/20071127-procstat.tgz

The new version includes a number of changes from the old version, including:

- A number of bug fixes and cleanliness improvements in the layout of output,
  etc, including fixes for bugs reported by Skip Ford.

- "-a" now means "all processes", and the old -a has become -c, and the old -c
  has become -s.  I.e., "All", "Command line" and "Security" rather than
  "Args" and "Credential".

- Threads and processes are now sorted by pid and then tid.  If processes are
  specified manually by pid, they are not sorted, although their threads will
  be.

- A new "-k" has been added, which prints the kernel thread stacks for threads
  in a process (although not swapped out or actively running threads).  This
  is extremely useful for answering questions of the sort "But *why* is the
  process blocked in UMA".  It has both a simple mode (-k), which lists just
  kernel function names, and a slightly more detailed mode (-kk), which adds
  the offset into the function.

The last of these required new kernel changes, including an MD component. 
I've tested the MD parts only on i386, although I have quick hacks at what 
they should look like on amd64, arm, powerpc, sparc64, sun4v.  I don't promise 
these compile or work, but they might do.


I think procstat(1) is getting a lot closer to committable state for 8-CURRENT, 
but further feedback would be most welcome (including reports of success on 
non-i386 architectures, and possibly patches to fix them).  For FreeBSD 
developers with P4 access, you can also check out


  //depot/user/rwatson/procstat/...

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Updated procstat(1)

2007-11-28 Thread Robert Watson


On Tue, 27 Nov 2007, Wesley Shields wrote:

Here's an updated patch to sys/amd64/amd64/db_trace.c (it's a diff against 
revision 1.81).  It changes "register rbp" to be "register_t rbp" and fixes 
the extra "W" in TD_IS_SWAPPED.  The kernel built fine after these changes. 
I'll test it out tomorrow.


I've gone ahead and applied that change in Perforce, and look forward to 
hearing back on the testing.


I think procstat(1) is getting a lot closer to committable state for 
8-CURRENT, but further feedback would be most welcome (including reports of 
success on non-i386 architectures, and possibly patches to fix them).  For 
FreeBSD developers with P4 access, you can also check out


Thank you for this.  I think procstat(1) is going to be very useful.


If you can think of other process-inspection related things it could be doing, 
let me know.  The one thing I currently have in mind that I haven't made 
progress on is dumping the kernel signal state for the process (i.e., what 
signals have handlers, etc), which may be useful when debugging signal 
problems for an application.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Updated procstat(1)

2007-11-28 Thread Robert Watson


On Wed, 28 Nov 2007, Skip Ford wrote:


- "-a" now means "all processes",


Thanks. :-)  I'm a little surprised.  You seemed pretty dedicated to a 
per-process tool.


I was, but then I read your e-mail and became convinced that the first patch 
that would be submitted against procstat(1) would be a "-a" patch. :-)


I personally would change it to allow either the all flag or a list of pids, 
rather than "at least one of".  For pathname, command-line, and credential 
information, the output will likely not change between showing the pid in 
the "all" output and the "list" output so you're just outputting it twice. 
If one really wants the same pid to be output multiple times for threads, 
kstack, or file descriptors, then I'd expect "procstat -k 0 0 0 0 0" to be 
more useful for that.


I would think a mistake in usage has been made if a list of pids is 
specified along with the "all" flag.  But, no real harm is done by doing it 
the current way.


I think your argument is convincing, and have changed it so that only one of 
-a and a pidlist can be specified.  I've also tightened down the syntax 
checking on flags a bit more.


- A new "-k" has been added, which prints the kernel thread stacks for threads
  in a process (although not swapped out or actively running threads).  This
  is extremely useful for answering questions of the sort "But *why* is the
  process blocked in UMA".  It has both a simple mode (-k), which lists just
  kernel function names, and a slightly more detailed mode (-kk), which adds
  the offset into the function.


This is excellent.  Does this absolutely have to depend on DDB and KDB?


Currently, yes, as stack(9) is conditional on DDB, and the MD bits of stack(9) 
are defined in db_trace.c (and in some cases, depend on DDB definitions, such 
as DDB types, although I think not critically so).  I've also been pondering 
breaking out stack(9) from DDB but haven't done that yet.  Maybe that will be 
today's task, as I'd like -k to work without the kernel debugger, as it has 
use significantly beyond kernel debugging.



In sys/amd64/amd64/db_trace.c on line 537, change "SWWAPPED" to "SWAPPED".


Fixed, thanks.

The newly introduced function stack_save_td() doesn't panic in the MD 
powerpc code like it does for other arches.  I have no idea if this is 
correct, it just doesn't match the others.


Indeed, and I've now fixed this, thanks!

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Updated procstat(1)

2007-11-28 Thread Robert Watson


On Wed, 28 Nov 2007, Skip Ford wrote:


Skip Ford wrote:

Robert Watson wrote:

On Wed, 28 Nov 2007, Skip Ford wrote:


- "-a" now means "all processes",


Thanks. :-)  I'm a little surprised.  You seemed pretty dedicated to a 
per-process tool.


I was, but then I read your e-mail and became convinced that the first 
patch that would be submitted against procstat(1) would be a "-a" patch. 
:-)


Yep, would've happened.  Now the first patch submitted will be a "-w 
interval" patch... :-)


I couldn't resist implementing a crude interval arg just for kicks. Here's 
the output of find(1) every second.  This is so cool:


Very neat :-).  If you like this, you'll love DTrace, which allows you to do 
all sorts of things along these lines.  I'll add a -w mode, but be aware that 
if you want to do the below, what you really want is DTrace :-), which allows 
you to do things like sample kernel stack traces on the clock timer, based on 
function invocations, etc, so you can do things like say "sample all the paths 
to a particular kernel function".  Now that John is updating DTrace again, I 
hope that we'll be seeing it in the 8-CURRENT source RSN.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Updated procstat(1)

2007-11-28 Thread Robert Watson


On Wed, 28 Nov 2007, Bert JW Regeer wrote:

Have the licensing issues been resolved with regards to DTrace? This is a 
feature I was looking forward to in 7.0-RELEASE but it had been delayed 
because of the licensing.


The problems had to do with non-alignment of the licensing vs. software 
boundaries, and I believe have been addressed by moving the boundaries a bit 
(i.e., making some more DTrace data structures opaque, etc).  The key point is 
that the CDDL parts will be compartmentalized as we do for other licenses, but 
that DTrace will still be loadable as a module with a GENERIC kernel, as is 
the case with ZFS already. Unfortunately, DTrace won't ship in 7.0, but we 
believe that it can be MFC'd to RELENG_7.  I've not checked in with John 
Birrell in a few days, but when I last checked he was in the throes of 
updating the code and cleaning up the integration of the Solaris parts, so my 
hope is that we'll see CVS progress soon.  I know a lot of people are very 
eager to see this happen.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Linux executable picks up FreeBSD library over linux one and breaks

2007-12-01 Thread Robert Watson


On Sat, 1 Dec 2007, Alexander Leidinger wrote:

Have a look at the search order of libs in Linux. Correlate this with the 
fact that when a Linux binary accesses e.g. /lib/libX.so.y, the linuxulator 
first looks whether /compat/linux/lib/libX.so.y is there, and only if it 
isn't does it look for /lib/libX.so.y.


AFAIR a work around is to add a link in /compat/linux/usr/lib/librt.so.1 -> 
/lib/librt.so.1


I want to do something like this in the FC4 port, but hadn't time to do it 
and test it so far.


It sounds like the real problem is that there are some cases where we don't 
want the Linuxulator to merge the underlying and Linux views of the file 
system -- we don't want the union of /compat/linux/lib and /lib, we just want 
/compat/linux/lib?
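
A minimal sketch of the lookup described above, with a switch for the 
"overlay only" behaviour suggested here.  This is not the Linuxulator's code: 
resolve_path(), the exists_fn predicate, and the toy file system layout are all 
hypothetical and exist only to illustrate the difference between the merged 
view and the /compat/linux-only view.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate, injected so the lookup can be tried without a real fs. */
typedef bool (*exists_fn)(const char *path);

/*
 * Try the copy of `path` under `prefix` (e.g. "/compat/linux") first.
 * If it is missing, fall back to the native path unless `overlay_only`
 * is set.  Returns true and leaves the chosen path in `out` on success.
 */
static bool
resolve_path(const char *prefix, const char *path, bool overlay_only,
    exists_fn exists, char *out, size_t outlen)
{
	int n;

	assert(prefix != NULL && path != NULL && exists != NULL && out != NULL);

	n = snprintf(out, outlen, "%s%s", prefix, path);
	if (n >= 0 && (size_t)n < outlen && exists(out))
		return (true);
	if (overlay_only)
		return (false);
	n = snprintf(out, outlen, "%s", path);
	return (n >= 0 && (size_t)n < outlen && exists(out));
}

/*
 * Toy layout (an assumption for illustration): the native system has
 * /usr/lib/librt.so.1, the Linux tree only has lib/librt.so.1.
 */
static bool
toy_exists(const char *path)
{
	return (strcmp(path, "/usr/lib/librt.so.1") == 0 ||
	    strcmp(path, "/compat/linux/lib/librt.so.1") == 0);
}
```

With this toy layout, a merged lookup of /usr/lib/librt.so.1 falls through to 
the native library, while an overlay-only lookup fails, so a caller that goes 
on to try /lib/librt.so.1 ends up at /compat/linux/lib/librt.so.1.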


Robert N M Watson
Computer Laboratory
University of Cambridge


procstat(1) committed to CVS HEAD

2007-12-02 Thread Robert Watson


Dear all, (and FYI to hackers@ where I previously sought feedback):

I've now committed procstat(1) to CVS.  I've found it to be quite a helpful 
debugging tool, am particularly pleased with -k/-kk, and would welcome 
feedback and ideas on further improving it.


Robert N M Watson
Computer Laboratory
University of Cambridge

-- Forwarded message --
Date: Sun, 2 Dec 2007 23:31:46 +0000 (UTC)
From: Robert Watson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: cvs commit: src/usr.bin/procstat Makefile procstat.1 procstat.c
 procstat.h procstat_args.c procstat_basic.c procstat_bin.c
procstat_cred.c procstat_files.c procstat_kstack.c
procstat_threads.c procstat_vm.c

rwatson 2007-12-02 23:31:46 UTC

  FreeBSD src repository

  Added files:
    usr.bin/procstat     Makefile procstat.1 procstat.c procstat.h
                         procstat_args.c procstat_basic.c
                         procstat_bin.c procstat_cred.c
                         procstat_files.c procstat_kstack.c
                         procstat_threads.c procstat_vm.c
  Log:
  Add procstat(1), a process inspection utility.  This provides both some
  of the missing functionality from procfs(4) and new functionality for
  monitoring and debugging specific processes.  procstat(1) operates in
  the following modes:

-b  Display binary information for the process.
-c  Display command line arguments for the process.
-f  Display file descriptor information for the process.
-k  Display the stacks of kernel threads in the process.
-s  Display security credential information for the process.
-t  Display thread information for the process.
-v  Display virtual memory mappings for the process.

  Further revision and modes are expected.

  Testing, ideas, etc:  cognet, sam, Skip Ford, Wesley Shields

  Revision  Changes    Path
  1.1       +15 -0     src/usr.bin/procstat/Makefile (new)
  1.1       +114 -0    src/usr.bin/procstat/procstat.1 (new)
  1.1       +252 -0    src/usr.bin/procstat/procstat.c (new)
  1.1       +46 -0     src/usr.bin/procstat/procstat.h (new)
  1.1       +74 -0     src/usr.bin/procstat/procstat_args.c (new)
  1.1       +64 -0     src/usr.bin/procstat/procstat_basic.c (new)
  1.1       +68 -0     src/usr.bin/procstat/procstat_bin.c (new)
  1.1       +57 -0     src/usr.bin/procstat/procstat_cred.c (new)
  1.1       +303 -0    src/usr.bin/procstat/procstat_files.c (new)
  1.1       +198 -0    src/usr.bin/procstat/procstat_kstack.c (new)
  1.1       +138 -0    src/usr.bin/procstat/procstat_threads.c (new)
  1.1       +130 -0    src/usr.bin/procstat/procstat_vm.c (new)


Re: procstat(1) committed to CVS HEAD

2007-12-03 Thread Robert Watson


On Mon, 3 Dec 2007, Andrew Thompson wrote:


On Sun, Dec 02, 2007 at 11:38:45PM +, Robert Watson wrote:


Dear all, (and FYI to hackers@ where I previously sought feedback):

I've now committed procstat(1) to CVS.  I've found it to be quite a helpful 
debugging tool, am particularly pleased with -k/-kk, and would welcome 
feedback and ideas on further improving it.


I would like to give some feedback. I listed the threads of proc 12 which is 
intr,


# procstat -t 12
  PID    TID COMM                 CPU  PRI STATE   WCHAN
   12     13 intr                   0   40 wait    -
   12     14 intr                   0   52 wait    -
   12 100030 intr                   0   16 wait    -
  [...]
   12 100036 intr                   0   36 wait    -
   12 100037 intr                   0   24 wait    -

I had expected it to show the thread name such as 'irq14: ata0', is this 
possible (and a good thing to do)?


I just print out the 'comm' field returned by the generic sysctl, and I notice 
that top(1) with -S is now having the same problem as procstat(1).  I think 
this is a kernel bug in how we initialize or otherwise handle thread names, 
and fairly recent, as it's not present on my 7.0BETA2 box.  If I had to guess, 
it's that these are now 'true threads' under the single 'intr' proc, and that 
we're not exporting the thread name?



Great work on procstat :)


Thanks!

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: procstat(1) committed to CVS HEAD

2007-12-03 Thread Robert Watson


On Mon, 3 Dec 2007, Andrew Thompson wrote:


I would like to give some feedback. I listed the threads of proc 12 which
is intr,

# procstat -t 12
  PID    TID COMM                 CPU  PRI STATE   WCHAN
   12     13 intr                   0   40 wait    -
   12     14 intr                   0   52 wait    -
   12 100030 intr                   0   16 wait    -
  [...]
   12 100036 intr                   0   36 wait    -
   12 100037 intr                   0   24 wait    -

I had expected it to show the thread name such as 'irq14: ata0', is this 
possible (and a good thing to do)?


I just print out the 'comm' field returned by the generic sysctl, and I 
notice that top(1) with -S is now having the same problem as procstat(1). I 
think this is a kernel bug in how we initialize or otherwise handle thread 
names, and fairly recent, as it's not present on my 7.0BETA2 box. If I had 
to guess, it's that these are now 'true threads' under the single 'intr' 
proc, and that we're not exporting the thread name?


Changing to ki_ocomm gets the desired result for single and multithreaded 
processes.


I wonder if we should be renaming ki_ocomm to ki_tdcomm or ki_tdname?

Robert N M Watson
Computer Laboratory
University of Cambridge




Andrew

Index: procstat_threads.c
===
RCS file: /home/ncvs/src/usr.bin/procstat/procstat_threads.c,v
retrieving revision 1.1
diff -u -p -r1.1 procstat_threads.c
--- procstat_threads.c  2 Dec 2007 23:31:45 -   1.1
+++ procstat_threads.c  3 Dec 2007 06:06:46 -
@@ -82,8 +82,8 @@ procstat_threads(pid_t pid, struct kinfo
   kipp = &kip[i];
   printf("%5d ", pid);
   printf("%6d ", kipp->ki_tid);
-   printf("%-20s ", strlen(kipp->ki_comm) ?
-   kipp->ki_comm : "-");
+   printf("%-20s ", strlen(kipp->ki_ocomm) ?
+   kipp->ki_ocomm : "-");
   if (kipp->ki_oncpu != 255)
   printf("%3d ", kipp->ki_oncpu);
   else if (kipp->ki_lastcpu != 255)
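
A minimal, portable sketch of the name selection this patch is about.  The 
struct below is a hypothetical stand-in for the two name fields of struct 
kinfo_proc (ki_comm and ki_ocomm); the field sizes are arbitrary.  Unlike the 
patch, which prints "-" when ki_ocomm is empty, this sketch also falls back to 
the process name first; that extra step is an assumption of the example, not 
something proposed in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the name fields of struct kinfo_proc. */
struct name_fields {
	char comm[20];	/* process-wide command name (cf. ki_comm) */
	char ocomm[20];	/* per-thread name (cf. ki_ocomm in the patch) */
};

/* Prefer the per-thread name, then the process name, then "-". */
static const char *
thread_label(const struct name_fields *nf)
{
	assert(nf != NULL);
	if (nf->ocomm[0] != '\0')
		return (nf->ocomm);
	if (nf->comm[0] != '\0')
		return (nf->comm);
	return ("-");
}

/* Format the first three columns the way the diff's printf() calls do. */
static int
format_row(char *buf, size_t len, int pid, int tid,
    const struct name_fields *nf)
{
	return (snprintf(buf, len, "%5d %6d %-20s", pid, tid,
	    thread_label(nf)));
}
```

For a thread of the intr process whose per-thread name is "irq14: ata0", 
format_row() yields a row starting with the pid, the tid, and "irq14: ata0" 
rather than "intr".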




