Given that NetBSD, OpenBSD and DragonFly (as well as Solaris and maybe others) already provide closefrom(), it'd be nice and worthwhile to implement it on FreeBSD too.
The attached shar archive contains 4 possible implementations of it. One is a system call (the approach used by the other BSDs), available here as a loadable kernel module for quick testing. The other 3 are library versions. One of them doesn't currently work: FreeBSD lacks a /proc/<pid>/fd/, which I tried to emulate with /dev/fd/, both via devfs(5) and fdescfs(5), but they seem to lack some types of file descriptors... Another just does what a lot of programs do: try close() on every possible file descriptor. The last one uses sysctl().

The implementation was inspired by the DragonFly code but the semantics match Open/NetBSD's (EBADF vs EINVAL). Their code is available at:
http://www.dragonflybsd.org/cvsweb/~checkout~/src/sys/kern/kern_descrip.c
http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/src/sys/kern/kern_descrip.c

Also included in the archive is a timing test along with a regression test borrowed from OpenSSH. It was successfully built and tested on FreeBSD 6.2-STABLE. There's code to make it work in -CURRENT.

A sample run on a Pentium 4 1.7GHz:

$ make test
Trying closefrom_syscall(3) with 58976 open file descriptors
user	0.000000	sys	0.030874	total	0.030874
Trying closefrom_syscall(3) with 58976 closed file descriptors
user	0.000000	sys	0.000008	total	0.000008

Trying closefrom_sysctl(3) with 58976 open file descriptors
user	0.050941	sys	0.045333	total	0.096274
Trying closefrom_sysctl(3) with 58976 closed file descriptors
user	0.000877	sys	0.000939	total	0.001816

Trying closefrom_brute(3) with 58976 open file descriptors
user	0.037777	sys	0.043793	total	0.081570
Trying closefrom_brute(3) with 58976 closed file descriptors
user	0.026666	sys	0.046383	total	0.073049

closefrom_sysctl() has a worst-case scenario when a lot of files are open that may make it slower than closefrom_brute(). Implementations using /proc/<pid>/fd/ are also vulnerable to this.
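For reference, the brute-force approach boils down to something like the sketch below. It is not the code from the archive: closefrom_fallback() is a made-up name, and it assumes the soft RLIMIT_NOFILE limit bounds the descriptor table (the archive's closefrom_brute() uses getdtablesize() instead). It probes every slot whether or not anything is open there, which is consistent with closefrom_brute() taking about as long on closed descriptors as on open ones.

```c
#include <errno.h>
#include <limits.h>
#include <unistd.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not part of the archive: close every descriptor
 * >= lowfd by probing each slot up to the soft RLIMIT_NOFILE limit.
 */
static int
closefrom_fallback(int lowfd)
{
	struct rlimit rl;
	long maxfd, fd;

	if (lowfd < 0) {
		errno = EBADF;
		return (-1);
	}

	/* Assumption: use 1024 slots if the limit cannot be determined. */
	if (getrlimit(RLIMIT_NOFILE, &rl) == -1 ||
	    rl.rlim_cur == RLIM_INFINITY)
		maxfd = 1024;
	else if (rl.rlim_cur > (rlim_t)INT_MAX)
		maxfd = INT_MAX;
	else
		maxfd = (long)rl.rlim_cur;

	for (fd = lowfd; fd < maxfd; fd++) {
		/* EBADF only means the slot was empty; EINTR is reported. */
		if (close((int)fd) == -1 && errno == EINTR)
			return (-1);
	}
	return (0);
}
```

A caller would typically run closefrom_fallback(3) in a child process right before exec(), so that only stdin, stdout and stderr are inherited.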
With no library version guaranteed to be faster, and because of the various reasons discussed in http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/thread.html, I believe it'd be best to implement it as a system call (which can be done through fcntl() anyway). More info is included in the README.

Any ideas, suggestions?

Salutes,
Igh

#!/bin/sh
# This is a shell archive
echo x closefrom
mkdir -p closefrom > /dev/null 2>&1
echo x closefrom/Makefile
sed 's/^X//' > closefrom/Makefile << 'SHAR_END'
XSUBDIR = module test
X
X.include <bsd.subdir.mk>
SHAR_END
echo x closefrom/README
sed 's/^X//' > closefrom/README << 'SHAR_END'
XOVERVIEW
X
XThis tarball contains 4 possible implementations of closefrom().
XThe first, a system call, is located in ./module/syscall.c and is
Xavailable as a kernel module for quick testing.
X
XBoth NetBSD >= 3.0 and DragonFly >= 1.4 implement it as a system call.
XIn NetBSD, it uses the F_CLOSEM fcntl(), available since version 2.0.
X
XThe second, implemented with the kern.file sysctl(), is available
Xon both FreeBSD >= 5.0 and DragonFly >= 1.2. Dynamic memory should be
Xallocated for an array of "struct xfile" structures that describe each
Xopen file descriptor _for every running process_ in
Xthe system...! (Note: the sysctl(3) manpage should be patched to reflect
Xthe current behaviour since FreeBSD 5.0: it should mention struct xfile).
XIn my system, the size of this structure is 52 bytes, so it could fail
Xon systems that set up a larger kern.maxfiles. This function would be
Xcleaner to implement in NetBSD which has an (undocumented) kern.file2
Xthat lets you work with a specific pid instead by passing KERN_FILE_BYPID.
X
XThe third is the usual brute force approach that uses getdtablesize(),
Xused for reference on the approach most applications take.
X
XThe fourth tries to do what some implementations (including Solaris') do
Xby browsing /proc/<pid>/fd/ but using /dev/fd/. Unfortunately, it doesn't
Xwork because neither devfs(5) nor fdescfs(5) seem to include duplicated
Xfile descriptors, sockets and maybe others.
X
X-o-
X
XIt was successfully built and tested on FreeBSD 6.2-STABLE (as of
XSept 18, 2007), though code that should work on -CURRENT is present
X(namely, the new FILEDESC_S[UN]LOCK macros).
X
XTo try the implementations, run these commands:
X
Xcd module
Xmake
Xsudo make load
Xcd ..
Xcd test
Xmake
Xmake check
Xmake test
X
XFor repeated testing of any of the implementations you may run:
X./closefrom syscall
X./closefrom sysctl
X./closefrom brute
X
SHAR_END
echo x closefrom/module
mkdir -p closefrom/module > /dev/null 2>&1
echo x closefrom/test
mkdir -p closefrom/test > /dev/null 2>&1
echo x closefrom/test/closefrom.c
sed 's/^X//' > closefrom/test/closefrom.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X *    notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X *    notice, this list of conditions and the following disclaimer in the
X *    documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <dirent.h>
X#include <err.h>
X#include <errno.h>
X#include <fcntl.h>
X#include <limits.h>
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include <unistd.h>
X#include <sys/types.h>
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/resource.h>
X#include <sys/time.h>
X#include <sys/sysctl.h>
X
X#include <sys/syscall.h>
X#include <sys/module.h>
X
X#define DEBUG
X
Xstatic void
Xusage(const char *argv0)
X{
X	fprintf(stderr, "Usage: %s syscall|sysctl|brute|devfd\n"
X	    "Usage: %s check\n", argv0, argv0);
X	exit(1);
X}
X
Xstatic int (*closefrom)(int);	/* pointer to closefrom_xxx() */
X
X/*
X * LKM version of closefrom()
X */
X
Xstatic int syscall_num;
X
Xstatic void
Xfind_module(void)
X{
X	struct module_stat stat;
X	int modid;
X
X	modid = modfind("closefrom");
X	if (modid == -1)
X		err(1, "modfind(closefrom)");
X
X	stat.version = sizeof(stat);
X	if (modstat(modid, &stat) == -1)
X		err(1, "modstat()");
X
X	syscall_num = stat.data.intval;
X}
X
Xstatic int
Xclosefrom_syscall(int lowfd)
X{
X	return (syscall(syscall_num, lowfd));
X}
X
X/*
X * This version uses the kern.file sysctl()
X */
Xstatic int
Xclosefrom_sysctl(int lowfd)
X{
X	int mib[2] = { CTL_KERN, KERN_FILE };
X	struct xfile *files = NULL;
X	pid_t pid = getpid();
X	size_t fsize;
X	int i, nfiles;
X
X	if (lowfd < 0) {
X		errno = EBADF;
X		return (-1);
X	}
X
X	for (;;) {
X		if (sysctl(mib, 2, files, &fsize, NULL, 0) == -1) {
X			if (errno != ENOMEM)
X				goto bad;
X			else if (files != NULL) {
X				free(files);
X				files = NULL;
X			}
X		} else if (files == NULL) {
X			files = (struct xfile *) malloc(fsize);
X			if (files == NULL)
X				return (-1);
X		} else
X			break;
X	}
X
X	/* XXX This structure may change */
X	if (files->xf_size != sizeof(struct xfile) ||
X	    fsize % sizeof(struct xfile))
X	{
X		errno = ENOSYS;
X		goto bad;
X	}
X
X	nfiles = fsize / sizeof(struct xfile);
X
X	for (i = 0; i < nfiles; i++)
X		if (files[i].xf_pid == pid && files[i].xf_fd >= lowfd)
X			if (close(files[i].xf_fd) < 0 && errno == EINTR)
X				goto bad;
X
X	free(files);
X	return (0);
X
Xbad:
X	if (files != NULL) {
X		int save_errno = errno;
X		free(files);
X		errno = save_errno;
X	}
X	return (-1);
X}
X
X/*
X * This version iterates over all possible file descriptors >= lowfd
X */
Xstatic int
Xclosefrom_brute(int lowfd)
X{
X	int fd;
X
X	if (lowfd < 0) {
X		errno = EBADF;
X		return (-1);
X	}
X
X	for (fd = getdtablesize(); fd >= lowfd; fd--)
X		if (close(fd) < 0 && errno == EINTR)
X			return (-1);
X
X	return (0);
X}
X
X/*
X * An example implementation using /dev/fd (other systems use /proc/<pid>/fd)
X * Unfortunately, on FreeBSD, fdescfs(5) doesn't include duplicated file
X * descriptors and sockets.
X */
Xstatic int
Xclosefrom_devfd(int lowfd)
X{
X	struct dirent *d;
X	DIR *dir;
X	int fd;
X
X	if (lowfd < 0) {
X		errno = EBADF;
X		return (-1);
X	}
X
X	/*
X	 * Close lowfd so we have a spare fd to use with /dev/fd
X	 */
X	close(lowfd++);
X
X	if ((dir = opendir("/dev/fd")) == NULL)
X		return (-1);
X
X	while ((d = readdir(dir)) != NULL) {
X#ifdef DEBUG
X		printf("%s\n", d->d_name);
X#endif
X		if (d->d_name[0] == '.')
X			continue;
X		fd = atoi(d->d_name);
X		if (fd >= lowfd && fd != dirfd(dir))
X			if (close(fd) < 0 && errno == EINTR)
X				goto bad;
X	}
X
X	(void)closedir(dir);
X	return (0);
X
Xbad:
X	{
X		int save_errno = errno;
X		(void)closedir(dir);
X		errno = save_errno;
X		return (-1);
X	}
X}
X
Xstatic void
Xtime_closefrom(int lowfd)
X{
X	struct rusage ru, rux;
X	struct timeval tv;
X	double usecs, ssecs;
X
X	if (getrusage(RUSAGE_SELF, &ru) < 0)
X		err(1, "getrusage()");
X	if (closefrom(lowfd) < 0)
X		err(1, "closefrom()");
X	if (getrusage(RUSAGE_SELF, &rux) < 0)
X		err(1, "getrusage()");
X
X	timersub(&rux.ru_utime, &ru.ru_utime, &tv);
X	usecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X	printf("user\t%f\t", usecs);
X	timersub(&rux.ru_stime, &ru.ru_stime, &tv);
X	ssecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X	printf("sys\t%f\t", ssecs);
X	usecs += ssecs;
X	printf("total\t%f\n", usecs);
X}
X
Xstatic void
Xtry(int (*xclosefrom)(int), const char *str)
X{
X	int fd, lowfd, maxfd;
X
X	lowfd = dup(STDIN_FILENO);
X	maxfd = getdtablesize();
X	for (fd = 1; fd < maxfd; fd++)
X		if (dup(STDIN_FILENO) < 0)
X			break;
X
X	closefrom = xclosefrom;
X	printf("Trying %s(%d) with %d open file descriptors\n", str, lowfd, fd);
X	time_closefrom(lowfd);
X
X	printf("Trying %s(%d) with %d closed file descriptors\n", str, lowfd, fd);
X	time_closefrom(lowfd);
X	printf("\n");
X}
X
Xint test(int (*)(int));
X
Xint
Xmain(int argc, char *argv[])
X{
X	if (argv[1] == NULL)
X		usage(argv[0]);
X
X	if (!strcmp(argv[1], "check")) {
X		find_module();
X		printf("testing closefrom_syscall():\t%s\n",
X		    test(&closefrom_syscall) ? "failed" : "ok");
X		printf("testing closefrom_sysctl():\t%s\n",
X		    test(&closefrom_sysctl) ? "failed" : "ok");
X		printf("testing closefrom_brute():\t%s\n",
X		    test(&closefrom_brute) ? "failed" : "ok");
X	}
X	else if (!strcmp(argv[1], "syscall")) {
X		find_module();
X		try(&closefrom_syscall, "closefrom_syscall");
X	}
X	else if (!strcmp(argv[1], "sysctl"))
X		try(&closefrom_sysctl, "closefrom_sysctl");
X	else if (!strcmp(argv[1], "devfd"))
X		try(&closefrom_devfd, "closefrom_devfd");
X	else if (!strcmp(argv[1], "brute"))
X		try(&closefrom_brute, "closefrom_brute");
X	else
X		usage(argv[0]);
X
X	return (0);
X}
X
X/*
X * NOTE:
X * The following code was adapted from OpenSSH's
X * openbsd-compat/regress/closefromtest.c
X */
X
X/*
X * Copyright (c) 2006 Darren Tucker
X *
X * Permission to use, copy, modify, and distribute this software for any
X * purpose with or without fee is hereby granted, provided that the above
X * copyright notice and this permission notice appear in all copies.
X *
X * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
X * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
X * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
X * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
X * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
X * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
X * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
X */
X
X#define NUM_OPENS 10
X
X#define fail(str) \
X	do { printf("%s\n", (str)); \
X	return -1; } while(0)
X
Xint
Xtest(int (*xclosefrom)(int))
X{
X	int i, max, fds[NUM_OPENS];
X	char buf[512];
X
X	for (i = 0; i < NUM_OPENS; i++)
X		if ((fds[i] = open("/dev/null", O_RDONLY)) == -1)
X			exit(0);	/* can't test */
X	max = i - 1;
X
X	/* should close last fd only */
X	xclosefrom(fds[max]);
X	if (close(fds[max]) != -1)
X		fail("failed to close highest fd");
X
X	/* make sure we can still use remaining descriptors */
X	for (i = 0; i < max; i++)
X		if (read(fds[i], buf, sizeof(buf)) == -1)
X			fail("closed descriptors it should not have");
X
X	/* should close all fds */
X	xclosefrom(fds[0]);
X	for (i = 0; i < NUM_OPENS; i++)
X		if (close(fds[i]) != -1)
X			fail("failed to close from lowest fd");
X
X	return 0;
X}
SHAR_END
echo x closefrom/test/Makefile
sed 's/^X//' > closefrom/test/Makefile << 'SHAR_END'
XPROG = closefrom
XNO_MAN =
X
XCFLAGS = -Wall -O2
X
Xcheck: ${PROG}
X	@./${PROG} check
X
Xtest: ${PROG}
X	@./${PROG} syscall
X	@./${PROG} sysctl
X	@./${PROG} brute
X
X.include <bsd.prog.mk>
SHAR_END
echo x closefrom/module/Makefile
mkdir -p closefrom/module > /dev/null 2>&1
sed 's/^X//' > closefrom/module/Makefile << 'SHAR_END'
XKMOD = syscall
XSRCS = syscall.c vnode_if.h
X
XCFLAGS += -Wall
X
Xreload:
X	@${MAKE} unload
X	@${MAKE} load
X
X.include <bsd.kmod.mk>
SHAR_END
echo x closefrom/module/syscall.c
sed 's/^X//' > closefrom/module/syscall.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X *    notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X *    notice, this list of conditions and the following disclaimer in the
X *    documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/filedesc.h>
X#include <sys/kernel.h>
X#include <sys/proc.h>
X#include <sys/syscallsubr.h>
X#include <sys/sysent.h>
X#include <sys/systm.h>
X#include <sys/vnode.h>
X#include <sys/module.h>
X
X/*
X * Newer code in FreeBSD > 6.2 uses shared/exclusive locks
X */
X#ifndef FILEDESC_SLOCK
X#define FILEDESC_SLOCK FILEDESC_LOCK_FAST
X#define FILEDESC_SUNLOCK FILEDESC_UNLOCK_FAST
X#endif
X
X/*
X * kern_closefrom()
X */
Xstatic int
Xkern_closefrom(struct thread *td, int lowfd)
X{
X	struct filedesc *fdp;
X	int fd;
X
X	/*
X	 * Note: NetBSD uses EBADF and DragonFly uses (undocumented) EINVAL
X	 */
X	if (lowfd < 0)
X		return (EBADF);
X
X	fdp = td->td_proc->p_fd;
X
X	FILEDESC_SLOCK(fdp);
X	while ((fd = fdp->fd_lastfile) >= lowfd) {
X		FILEDESC_SUNLOCK(fdp);
X		if (kern_close(td, fd) == EINTR)
X			return (EINTR);
X		FILEDESC_SLOCK(fdp);
X	}
X	FILEDESC_SUNLOCK(fdp);
X
X	return (0);
X}
X
X/* closefrom() arguments */
Xstruct closefrom_args {
X	int fd;
X};
X
Xstatic int
Xclosefrom(struct thread *td, void *args)
X{
X	struct closefrom_args *uap = (struct closefrom_args *)args;
X
X	return (kern_closefrom(td, uap->fd));
X}
X
X/* closefrom() sysent[] */
Xstatic struct sysent closefrom_sysent = {
X	1,		/* number of arguments */
X	closefrom	/* implementing function */
X};
X
X/*
X * LKM stuff
X */
X
X/* offset in sysent[] where the syscall will be allocated */
Xstatic int offset = NO_SYSCALL;
X
Xstatic int
Xload(struct module *module, int cmd, void *arg)
X{
X	int error = 0;
X
X	switch (cmd) {
X	case MOD_LOAD:
X		uprintf("closefrom loaded at offset %d\n", offset);
X		break;
X
X	case MOD_UNLOAD:
X		uprintf("closefrom unloaded from offset %d\n", offset);
X		break;
X
X	default:
X		error = EOPNOTSUPP;
X		break;
X	}
X
X	return (error);
X}
X
XSYSCALL_MODULE(closefrom, &offset, &closefrom_sysent, load, NULL);
SHAR_END
exit
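
The README mentions that NetBSD builds closefrom() on top of the F_CLOSEM fcntl(). A userland wrapper along those lines might look like the sketch below. It is an illustration only: closefrom_fcntl() is a hypothetical name, F_CLOSEM is used only where <fcntl.h> happens to define it, and returning ENOSYS elsewhere is an arbitrary choice made for this example rather than documented behaviour of any system.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical wrapper, not part of the archive: close all descriptors
 * >= lowfd through the F_CLOSEM fcntl() where the platform provides it.
 */
static int
closefrom_fcntl(int lowfd)
{
	/* Same semantics as the archive: negative lowfd yields EBADF. */
	if (lowfd < 0) {
		errno = EBADF;
		return (-1);
	}
#ifdef F_CLOSEM
	return (fcntl(lowfd, F_CLOSEM, 0));
#else
	/* Assumption for this sketch: report the missing command. */
	errno = ENOSYS;
	return (-1);
#endif
}
```

On a system without F_CLOSEM a real library would more likely fall back to one of the approaches in the archive instead of failing.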
_______________________________________________ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"