Re: a BSD identd
In message , "Brian F. Feldman" writes:
>On 13 Jul 1999, Ville-Pertti Keinonen wrote:
>
>> gr...@freebsd.org (Brian F. Feldman) writes:
>>
>> > It's "out with the bad, in with the good." Pidentd code is pretty terrible.
>> > The only security concerns with my code were wrt FAKEID, and those were
>> > mostly fixed (mostly meaning that a symlink _may_ be opened, but it won't
>> > be read.) If anyone wants to audit my code for security, I invite them to.
>>
>> Did you mean to avoid reading through symlinks using the open + fstat
>> method mentioned earlier in the thread?
>
>No, I meant to avoid opening a file the user couldn't, or reading from a dev.

Why not actually store the fake ID in a symbolic link? That way you just
do a readlink(), which would be safer, neater and faster than reading a
file. A user can set up a fake ID with something like:

	ln -s "Warm-Fuzzy" .fakeid

Ian

To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-hackers" in the body of the message
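Here is a minimal sketch of the readlink() approach. It assumes a hypothetical helper named read_fakeid() and a caller that has already built the path to the user's .fakeid link. readlink() reads the link target without opening or following anything, so the FAKEID worries about opening devices or other users' files do not arise. The one subtlety is that readlink() does not NUL-terminate its result.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a fake ident reply stored as the target of
 * a symbolic link, e.g. one created with  ln -s "Warm-Fuzzy" .fakeid
 * readlink() neither opens nor follows the link, so a device or another
 * user's file can never be read through it.  Returns 0 on success and
 * -1 (with errno set) on failure.
 */
static int
read_fakeid(const char *path, char *buf, size_t bufsize)
{
	ssize_t len;

	if (bufsize < 2) {
		errno = EINVAL;
		return (-1);
	}
	/* Leave room for the terminating NUL, which readlink() omits. */
	len = readlink(path, buf, bufsize - 1);
	if (len < 0)
		return (-1);	/* no such file, or not a symlink (EINVAL) */
	if ((size_t)len >= bufsize - 1) {
		/* The target may have been truncated; refuse rather than guess. */
		errno = ENAMETOOLONG;
		return (-1);
	}
	buf[len] = '\0';
	return (0);
}
```

Anything that is not a symlink (a regular file, a directory or a device) makes readlink() fail with EINVAL, so such files are ignored rather than read.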
Re: NFS problems due to getcwd/realpath
In message , Jan Conrad writes:
>after wondering for two years why FreeBSD (2.2.x ... 3.2) might lock up
>when an NFS server is down, I think I have found one reason for that (see
>kern/12609 - I now know it doesn't belong to kern - sorry).
>
>It is the implementation of getcwd (src/lib/libc/gen/getcwd.c). When
>examining the parent dir of a mounted filesystem, getcwd lstats every
>directory entry prior to the mountpoint to find out the name of the
>mountpoint (but it would only need the inodes's device to do a rough
>check).

This should no longer be an issue with FreeBSD 3.x, as the system
normally uses the new _getcwd syscall. The old code is still in
getcwd.c, but is only used if the syscall isn't present (e.g. if
running a 3.x executable on a 2.2 system).

We use the following patch on all our 2.2-stable machines, which works
around the problem. This was submitted as PR bin/6658, but it wasn't
committed, as a backport of 3.x's _getcwd (which never occurred) was
considered to be a more appropriate change.

Ian

--- getcwd.c.orig	Tue Jun 30 15:38:44 1998
+++ getcwd.c	Tue Jun 30 15:39:08 1998
@@ -36,6 +36,7 @@
 #endif /* LIBC_SCCS and not lint */
 
 #include
+#include
 #include
 #include
 
@@ -169,7 +170,28 @@
 				if (dp->d_fileno == ino)
 					break;
 			}
-		} else
+		} else {
+			struct statfs sfs;
+			char *dirname;
+
+			/*
+			 * Try to get the directory name by using statfs on
+			 * the mount point.
+			 */
+			if (!statfs(up[3] ? up + 3 : ".", &sfs) &&
+			    (dirname = rindex(sfs.f_mntonname, '/')))
+				while((dp = readdir(dir))) {
+					if (ISDOT(dp))
+						continue;
+					bcopy(dp->d_name, bup, dp->d_namlen+1);
+					if (!strcmp(dirname + 1, dp->d_name) &&
+					    !lstat(up, &s) &&
+					    s.st_dev == dev &&
+					    s.st_ino == ino)
+						goto found;
+				}
+			rewinddir(dir);
+
 			for (;;) {
 				if (!(dp = readdir(dir)))
 					goto notfound;
@@ -187,7 +209,9 @@
 				if (s.st_dev == dev && s.st_ino == ino)
 					break;
 			}
+		}
 
+found:
 		/*
 		 * Check for length of the current name, preceding slash,
 		 * leading slash.
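The fast path in this patch reduces to a string comparison. Take the last component of the f_mntonname that statfs() reports for the directory being looked up, and lstat() the parent's entry of that name first. Below is a minimal sketch using a hypothetical helper name, with strrchr() standing in for the older rindex(). As in the patch, a match is only a candidate; it still has to be confirmed by comparing st_dev and st_ino.

```c
#include <string.h>

/*
 * Hypothetical helper isolating the fast path of the getcwd.c patch:
 * given the f_mntonname reported by statfs() for the directory being
 * looked up, and one entry name read from "..", say whether that entry
 * is the one worth lstat()ing first.  A match is only a candidate; the
 * patch still compares st_dev and st_ino before accepting it.
 */
static int
is_mountpoint_candidate(const char *mntonname, const char *d_name)
{
	const char *last = strrchr(mntonname, '/');	/* rindex() in the patch */

	if (last == NULL)
		return (0);	/* no '/': the patch skips the fast path */
	return (strcmp(last + 1, d_name) == 0);
}
```

For a filesystem mounted on /usr/home, only the entry "home" in /usr is lstat()ed on the fast path. An unresponsive NFS mount elsewhere in /usr is therefore only touched if that guess fails and the code falls back to the full scan.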
Re: Panic Kernel Dump to umass device?
In message <[EMAIL PROTECTED]>, Scott Long writes:
>You're correct that dumping is meant to be done with interrupts and task
>switching disabled. The first thing that the umass driver is missing is
>a working CAM poll handler. Without this, there is no way for command
>completions to be seen when interrupts are disabled. Beyond that, I
>somewhat suspect that the USB stack expects to be able to push command
>completion work off to worker threads, at least for some situations, and
>that also will not work in the kernel dump environment. So, there is a
>lot of work needed to make this happen.

The USB stack supports polled operations, so it's actually not too hard
to make this work. Below is a patch I had in one of my local trees that
adds a CAM poll handler to the umass driver. I've just tested this and
it does seem to make kernel dumping work, but I guess it might not be
as reliable as dumping to other devices.

Ian

Index: umass.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/dev/usb/umass.c,v
retrieving revision 1.128
diff -u -r1.128 umass.c
--- umass.c	9 Jan 2006 01:33:53 -	1.128
+++ umass.c	11 Feb 2006 12:57:43 -
@@ -2627,21 +2627,17 @@
 	}
 }
 
-/* umass_cam_poll
- *	all requests are handled through umass_cam_action, requests
- *	are never pending. So, nothing to do here.
- */
 Static void
 umass_cam_poll(struct cam_sim *sim)
 {
-#ifdef USB_DEBUG
 	struct umass_softc *sc = (struct umass_softc *) sim->softc;
 
 	DPRINTF(UDMASS_SCSI, ("%s: CAM poll\n", USBDEVNAME(sc->sc_dev)));
-#endif
 
-	/* nop */
+	usbd_set_polling(sc->sc_udev, 1);
+	usbd_dopoll(sc->iface);
+	usbd_set_polling(sc->sc_udev, 0);
 }

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Panic Kernel Dump to umass device?
In message <[EMAIL PROTECTED]>, Nate Nielsen writes:
>Thanks, that helps. It works nicely with a uhci USB controller.
>
>However when the ohci driver is in use, we crash somewhere in
>usb_transfer_complete. I'll look into this further.

You could try updating to the latest 6-stable usb code, which might
possibly help the ohci case. There were a number of quite severe ohci
issues fixed since 6.0-release that might trigger more easily when
using polling. In particular, these revisions may be of interest:

	ohci.c		1.154.2.1
	ohcivar.h	1.40.2.1
	usbdi.c		1.91.2.1

Ian
Re: contiguous memory allocation problem
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes:
>But there is one problem, that has been overlooked, and that is High speed
>isochronous transfers, which are not supported by the existing USB system. I
>don't think that the EHCI specification was designed for scatter and gather,
>when you consider this:
>
>8 transfers of 0xC00 bytes has to fit on 7 pages. If this is going to work,
>and I am right, one page has to contain two transfers. (see page 43 of
>ehci-r10.pdf)

I haven't looked into the details, but the text in section 3.3.3 seems
to suggest that EHCI is designed to not require physically contiguous
allocations here either, so the same approach of using bus_dmamap_load()
should work:

    This data structure requires the associated data buffer to be
    contiguous (relative to virtual memory), but allows the physical
    memory pages to be non-contiguous. Seven page pointers are provided
    to support the expression of 8 isochronous transfers. The seven
    pointers allow for 3 (transactions) * 1024 (maximum packet size)
    * 8 (transaction records) (24576 bytes) to be moved with this
    data structure, regardless of the alignment offset of the first
    page.

Ian
Re: contiguous memory allocation problem
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes:
>On Sunday 02 July 2006 14:05, Ian Dowse wrote:
>>    This data structure requires the associated data buffer to be
>>    contiguous (relative to virtual memory), but allows the physical
>>    memory pages to be non-contiguous. Seven page pointers are provided
>>    to support the expression of 8 isochronous transfers. The seven
>>    pointers allow for 3 (transactions) * 1024 (maximum packet size)
>>    * 8 (transaction records) (24576 bytes) to be moved with this
>>    data structure, regardless of the alignment offset of the first
>>    page.
>
>3 * 1024 bytes = 0xC00 bytes
>
>8 * 0xC00 = 0x6000 bytes maximum
>
>According to this you need "6" "EHCI pages", because "6 * 0x1000 = 0x6000".
>The seventh "EHCI page" is just there to allow one to start at any page
>offset. There is no eight "EHCI page".
>
>The only solution I see, is to have a double layer ITD. The first layer have
>the 4 first transfers activated, and the second layer have the 4 last
>transfers activated.
>
>A little more complicated, but not impossible.

The trick is that if the 0x6000 bytes are contiguous in virtual memory
then they never span more than 6 pages so one iTD is enough. i.e. you
can just do malloc(0x6000) and you don't need multi-page physically
contiguous buffers or extra memory-memory copies regardless of how the
virtual buffer maps to physical pages.

This seems to be the general extent of scatter-gather support offered
by the various USB host controllers (modulo various caveats such as
assuming pages are >= 4k, handling physical addresses > 4GB on
non-IOMMU hardware and UHCI's lack of support for mid-packet
non-contiguous page boundaries).

Ian
Re: contiguous memory allocation problem
In message <[EMAIL PROTECTED]>, Ian Dowse writes:
>The trick is that if the 0x6000 bytes are contiguous in virtual
>memory then they never span more than 6 pages so one iTD is enough.

Sorry, I meant of course 6 page boundaries, which means no more than 7
pages. This is why the 7 physical address slots in the iTD are always
enough for 8 x 3k transaction records if the 24k buffer is contiguous
in virtual memory.

Ian
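The page arithmetic can be checked directly. This is a minimal sketch with hypothetical function names, assuming 4k pages as the iTD's page pointers do. A buffer of len bytes that starts at offset off within a page touches ceil((off + len) / pagesize) pages. A virtually contiguous 0x6000-byte buffer therefore touches 6 pages when it is page-aligned and 7 for any other alignment, never more.

```c
#include <stddef.h>

/*
 * Number of pages touched by a buffer of len bytes (len > 0) that
 * starts at byte offset off; only off's position within a page matters.
 */
static size_t
pages_spanned(size_t off, size_t len, size_t pagesize)
{
	off %= pagesize;
	return ((off + len + pagesize - 1) / pagesize);
}

/* Worst case over every possible starting alignment. */
static size_t
max_pages_spanned(size_t len, size_t pagesize)
{
	size_t off, n, worst = 0;

	for (off = 0; off < pagesize; off++) {
		n = pages_spanned(off, len, pagesize);
		if (n > worst)
			worst = n;
	}
	return (worst);
}
```

With 3 * 1024 * 8 = 0x6000, max_pages_spanned(0x6000, 0x1000) is 7. That matches the seven page pointers in the iTD.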
Re: contiguous memory allocation problem
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes:
>Ok. So the solution to my problem is to use scatter and gather. I will see
>about updating my USB system to do it like that.
>
>But there is one thing I do not understand yet. When you load a page that
>physically resides above 4GB, because a computer has more than 4GB of memory,
>how does "bus_dmamap_load()" move that page down below 4GB, so that the
>32-bit USB host controllers can reach it?

What should happen is that bus_dma allocates a bounce buffer and
performs copies as required from within the bus_dmamap_sync() calls.
This is something I haven't been able to verify yet with the USB code
though, so there could easily be bugs there.

BTW, as far as I know bus_dma is also missing support for multi-segment
allocations, so for example if you ask it to allocate 16k in at most 4
segments below the 4GB mark, it will actually attempt a physically
contiguous allocation. If this were fixed it could be used by
usbd_alloc_buffer() to give directly usable buffers without contiguous
allocations.

Ian
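To make the bounce-buffer idea concrete, here is a toy userland model. It is not the bus_dma(9) API, and every name in it is hypothetical. A segment needs bouncing when its last byte lies above the highest address the device can reach (0xFFFFFFFF for a 32-bit controller). The copies happen at sync time:

- Before the device reads memory, the data is copied down into the bounce buffer.
- After the device writes memory, the result is copied back up.

In bus_dma itself, these points correspond to bus_dmamap_sync() with BUS_DMASYNC_PREWRITE and BUS_DMASYNC_POSTREAD.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy DMA map (hypothetical): a driver buffer plus an optional bounce copy. */
struct toy_map {
	void	*client_buf;	/* buffer the driver works with */
	void	*bounce_buf;	/* low-memory copy the device uses, or NULL */
	size_t	 len;
};

enum toy_sync { TOY_PREWRITE, TOY_POSTREAD };

/* True if any byte of [paddr, paddr + len) is above lowaddr (len > 0). */
static bool
needs_bounce(uint64_t paddr, uint64_t len, uint64_t lowaddr)
{
	return (paddr + len - 1 > lowaddr);
}

static void
toy_sync(struct toy_map *m, enum toy_sync op)
{
	if (m->bounce_buf == NULL)
		return;		/* device can reach client_buf directly */
	if (op == TOY_PREWRITE)	/* CPU -> device: copy data down */
		memcpy(m->bounce_buf, m->client_buf, m->len);
	else			/* device -> CPU: copy results back up */
		memcpy(m->client_buf, m->bounce_buf, m->len);
}
```

The copying is what makes bounce buffers expensive. It is also why they are easy to get subtly wrong if a driver forgets a sync call.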
Re: Confusion in acpi_sleep_machdep().
In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
>I'm trying to figure out how the acpi_sleep_machdep() code works and
>there are a couple of lines I just don't understand:
>
>	pm = vmspace_pmap(p->p_vmspace);
>	cr3 = rcr3();
>#ifdef PAE
>	load_cr3(vtophys(pm->pm_pdpt));
>#else
>	load_cr3(vtophys(pm->pm_pdir));
>#endif
>
>	page = PHYS_TO_VM_PAGE(sc->acpi_wakephys);
>	pmap_enter(pm, sc->acpi_wakephys, page,
>		   VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE, 1);
>
>First, why isn't it just using kernel_pmap ? What's all the load_cr3()
>stuff for ?
>
>Second, why is it entering the physical address sc->acpi_wakephys
>as the virtual address in the pmap ? Shouldn't it be using
>sc->acpi_wakeaddr there?
>
>Anybody know ?

I don't know the details, but acpi_sleep_machdep() sets up an identity
mapping in the current process's vmspace (hence using virtual = physical).
Lazy switching of address spaces means that cr3 may not currently refer
to the same vmspace, which would break the identity mapping, so that's
the reason for the load_cr3() calls. See revision 1.22 for a bit more
information.

Ian
Re: Mounting a CDROM in freeBSD 4.2
In message <[EMAIL PROTECTED]>, "Daniel C. Sobral" writes:
>> and you must make sure your kernel is compiled with
>> options CD9660
>
>Err... no. The kld gets autoloaded if the kernel doesn't have cd9660
>compiled-in.

The error message that is printed is misleading though, and gives the
impression that cd9660 filesystem support is missing:

	cd9660: No such file or directory

When mount(8) runs mount_cd9660, it gives it an argv[0] of the
filesystem type, i.e. 'cd9660'. That's where the cd9660 in the error
message comes from. Maybe mount_cd9660 (and other mount_* programs)
should provide a bit more information in the error message?

Ian

Index: mount_cd9660.c
===
RCS file: /home/iedowse/CVS/src/sbin/mount_cd9660/mount_cd9660.c,v
retrieving revision 1.15
diff -u -r1.15 mount_cd9660.c
--- mount_cd9660.c	1999/10/09 11:54:08	1.15
+++ mount_cd9660.c	2001/01/17 12:34:23
@@ -176,7 +176,7 @@
 		errx(1, "cd9660 filesystem is not available");
 
 	if (mount(vfc.vfc_name, mntpath, mntflags, &args) < 0)
-		err(1, NULL);
+		err(1, "%s on %s: mount", mntpath, dev);
 	exit(0);
 }
Re: Have root on vinum, one small problem..
In message <4FC7AFEB1135D4119926F87A88260D9530@TRSBS>, Chris Williams writes:
>
>Things seems to be working quite well, but there is one strange behavior which
>worries me; whenever I shut down, right after syncing I get a panic:
>
>panic: Vrele: negative ref cnt

I noticed this a while ago - it is due to inconsistent handling of
'rootvnode' in the kernel. You should find the details if you search
for 'rootvnode' in the -hackers archive. The following patch should
work around the panic by adding an extra vnode reference for rootvp:

Ian

Index: init_main.c
===
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/kern/init_main.c,v
retrieving revision 1.134.2.3
diff -u -r1.134.2.3 init_main.c
--- init_main.c	2000/09/07 19:13:36	1.134.2.3
+++ init_main.c	2001/02/02 16:01:52
@@ -456,6 +456,7 @@
 	VREF(fdp->fd_fd.fd_cdir);
 	VOP_UNLOCK(rootvnode, 0, &proc0);
 	fdp->fd_fd.fd_rdir = rootvnode;
+	VREF(fdp->fd_fd.fd_rdir);
 }
 
 SYSINIT(retrofit, SI_SUB_ROOT_FDTAB, SI_ORDER_FIRST, xxx_vfs_root_fdtab, NULL)
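The rule behind the fix is that every long-lived pointer to a vnode must own its own reference. Below is a toy model with hypothetical names; it is not kernel code. It assumes, for illustration only, that rootvnode itself, fd_cdir and fd_rdir each release one reference at shutdown, and shows how a single missing VREF() turns into a negative count.

```c
/* Toy stand-in for a vnode: only the use count matters here. */
struct toy_vnode {
	int	v_usecount;
};

/* Take a reference on behalf of a new long-lived pointer. */
static void
toy_vref(struct toy_vnode *vp)
{
	vp->v_usecount++;
}

/*
 * Drop one reference and return the new count.  A result below zero
 * means someone released a reference they never took, which is the
 * condition behind the reported panic.
 */
static int
toy_vrele(struct toy_vnode *vp)
{
	return (--vp->v_usecount);
}
```

Under this model, the reference held by rootvnode plus one VREF() for fd_cdir gives a count of 2, but three pointers each release it. The extra VREF() for fd_rdir makes the count match the number of holders.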
Re: open (vfs_syscalls.c:994) && NFS
In message <[EMAIL PROTECTED]>, Oliver Cook writes:
>After about a week there are hundreds of stuck
>httpd processes in exactly this state. It is not
>possible to attach to them, but information can
>be gleaned from a kernel backtrace:

Could you post the full output of "ps axl" on one of these machines?
In this output, search for other odd process states, especially
"vmopar", and include a gdb backtrace from these processes too.

This sounds like a problem I described in

http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=243599+249172+/usr/local/www/db/text/2000/freebsd-hackers/20001022.freebsd-hackers

(split URL is

	http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=243599+249172+
	/usr/local/www/db/text/2000/freebsd-hackers/20001022.freebsd-hackers

in case the above doesn't work)

Ian
Re: open (vfs_syscalls.c:994) && NFS
In message <[EMAIL PROTECTED]>, Oliver Cook writes:
>There are three processes stuck in vmopar. I include the backtrace
>of one of these below.

Thanks. That particular process is hanging because nfs_loadattrcache()
has noticed that the file shrunk, but it is not safe in this context
(from vm_fault) to do anything about it. A workaround for this problem
went into 4-stable at the end of last October, so upgrading to a more
recent -stable will stop these hangs.

As noted in the archived -hackers message I mentioned, there is another
related problem that still exists, but it seems to occur much less
frequently.

Ian
Re: open (vfs_syscalls.c:994) && NFS
In message <[EMAIL PROTECTED]>, Oliver Cook writes:
>However, the more noticeable problem was the processes stuck in
>nfsvin because of the broken directory entry. Have you any ideas
>as to what would be causing that particular problem which is
>plaguing our servers more than the vmopar problem?

The processes stuck in "nfsvinval" are just a side-effect of the vmopar
problem; they should go away too when you upgrade. I forget the details,
but I think the vmopar-hung process is holding some lock so any other
processes that try to access the same file hang in nfsvinval. You can
probably verify that every time there are processes stuck in "nfsvinval"
there is at least one process stuck in "vmopar".

I haven't seen any evidence of the broken directory entries you mention -
maybe you're reading too far into the struct nameidata fields in "nd".
It may be normal for some fields to be uninitialised or point at junk
data.

Ian
Re: FreeBSD 4.2 ,kernel panic.
In message <[EMAIL PROTECTED]>, Andrea writes:
>MY FreeBSD 4.2 system has begun to crash some time ago..
>fault virtual address = 0x9ec03e00

This virtual address suggests that these crashes are caused by a bug
that was fixed around two months ago. See

http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=459199+462565+/usr/local/www/db/text/2001/freebsd-bugs/20010415.freebsd-bugs

for further details; updating to a more recent -stable will solve this
issue.

Ian
UFS large directory performance
Prompted by the recent discussion about performance with large
directories, I had a go at writing some code to improve the situation
without requiring any filesystem changes. Large directories can usually
be avoided by design, but the performance hit is very annoying when it
occurs. The namei cache may help for lookups, but each create, rename
or delete operation always involves a linear sweep of the directory.

The idea of this code is to maintain a throw-away in-core data
structure for large directories, allowing all operations to be
performed quickly without the need for a linear search. The
experimental (read 'may trash your system'!) proof-of-concept patch is
available at:

	http://www.maths.tcd.ie/~iedowse/FreeBSD/dirhash.diff

The implementation uses a hash array that maps filenames to the
directory offset where the corresponding directory entry exists. A
simple spillover mechanism is used to deal with hash collisions, and
some extra summary information permits the quick location of free
space within the directory itself for create operations. The in-core
data structures have a memory requirement approximately equal to half
of the on-disk directory size.

Currently there are two sysctls that determine when directories get
hashed:

	vfs.ufs.dirhashminsize	Minimum directory on-disk size for which
				hashing should be used (default 2.5k).
	vfs.ufs.dirhashmaxmem	Maximum system-wide amount of memory to
				use for directory hashes (default 2MB).

Even on a relatively slow machine (200MHz P5), I'm seeing a file
creation speed that remains at around 1000 creations/second for
directories with more than 100,000 entries. Without this patch, I get
fewer than 20 creations per second on the same directory (in both
cases soft-updates is enabled).

To test, apply the patch, and add "options UFS_DIRHASH" to the kernel
config. Currently there are a number of features missing, and there is
a lot of code for debugging and sanity checking that may affect
performance.
The main issues I'm aware of are:

 - There is no LRU mechanism for directory hash data structures. The
   hash tables get freed when the in-core inode is recycled, but no
   attempt is made to free existing memory when the dirhashmaxmem
   limit is reached.

 - The lookup code does not optimise the case where successive offsets
   from the hash table are in the same filesystem block.

Ian
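Here is a minimal userland sketch of the central data structure, under stated assumptions. The hash function (32-bit FNV-1a) and the collision handling (linear probing, one reading of "simple spillover") are choices made here, not details taken from the patch. A small array of names stands in for the directory blocks.

Each slot holds a directory offset. A lookup hashes the name and walks forward until it finds a slot whose entry really has that name, or hits a never-used slot. A deleted entry leaves a tombstone, so later names in the same probe chain are still found.

```c
#include <stdint.h>
#include <string.h>

#define NSLOTS		64	/* power of 2, larger than the entry count */
#define SLOT_EMPTY	(-1)	/* never used: ends a probe sequence */
#define SLOT_DELETED	(-2)	/* tombstone: keep probing past it */

/* Toy "directory" (hypothetical): entry names indexed by offset. */
#define MAXENT		32
static const char *dir_names[MAXENT];

static int32_t slots[NSLOTS];

static uint32_t
fnv1a(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s != '\0') {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return (h);
}

static void
dh_init(void)
{
	for (int i = 0; i < NSLOTS; i++)
		slots[i] = SLOT_EMPTY;
}

/* Return the directory offset of name, or -1 if it is not present. */
static int32_t
dh_lookup(const char *name)
{
	uint32_t i = fnv1a(name) & (NSLOTS - 1);

	for (int n = 0; n < NSLOTS; n++, i = (i + 1) & (NSLOTS - 1)) {
		if (slots[i] == SLOT_EMPTY)
			return (-1);
		/* Verify against the "on-disk" entry; hashes can collide. */
		if (slots[i] >= 0 && strcmp(dir_names[slots[i]], name) == 0)
			return (slots[i]);
	}
	return (-1);
}

/* Record name at offset off; caller ensures it is absent and space exists. */
static void
dh_add(const char *name, int32_t off)
{
	uint32_t i = fnv1a(name) & (NSLOTS - 1);

	while (slots[i] >= 0)		/* spill over to the next free slot */
		i = (i + 1) & (NSLOTS - 1);
	slots[i] = off;
	dir_names[off] = name;
}

static void
dh_remove(const char *name)
{
	uint32_t i = fnv1a(name) & (NSLOTS - 1);

	for (int n = 0; n < NSLOTS; n++, i = (i + 1) & (NSLOTS - 1)) {
		if (slots[i] == SLOT_EMPTY)
			return;
		if (slots[i] >= 0 && strcmp(dir_names[slots[i]], name) == 0) {
			slots[i] = SLOT_DELETED;
			return;
		}
	}
}
```

In a real filesystem, the stored offset would be used to read the directory block and compare the name in the on-disk entry. Each create, rename or delete then touches only the slots along one probe chain instead of the whole directory.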
Re: UFS large directory performance
In message <[EMAIL PROTECTED]>, Matt Dillon writes:
>What are your commit plans? It looks extremely well contained,
>it could be committed to -current and then -stable a few days later
>without any destabilizing impact at all for when the option isn't
>specified.
...
>The only potential problem I see here is that you could end up
>seriously fragmenting the malloc pool you are using to allocate the
>slot arrays. And, of course, the two issues you brought up in
>regards to regularing memory use.

Thanks for the comments :-) Yes, malloc pool fragmentation is a
problem. I think that it can be addressed to some extent by using a
2-level system (an array of pointers to fixed-size arrays) instead of
a single large array, but I'm open to any better suggestions. If the
second-level array size was fixed at around 4k, that would keep the
variable-length first-level array small enough not to cause too many
fragmentation issues. The per-DIRBLKSIZ free space summary array is
probably relatively okay as it is now.

The other main issue, that of discarding old hashes when the memory
limit is reached, may be quite tricky to resolve. Any approach based
on doing this from ufsdirhash_build() is likely to become a locking
nightmare. My original idea was to have ufsdirhash_build() walk a list
of other inodes with hashes attached and free them if possible, but
that would involve locking the other inode, so things could get messy.

Ian
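The 2-level layout replaces one large slot array with a short array of pointers to fixed-size blocks. Every allocation is then either small or of a single standard size. Below is a sketch with hypothetical names. It assumes, for illustration, 256 four-byte slots per block (the size chosen later in this thread), with plain malloc() standing in for a zone allocator. Slot s lives in block s >> 8 at index s & 255.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLKSLOTS	256		/* slots per second-level block */
#define BLKSHIFT	8		/* log2(BLKSLOTS) */
#define BLKMASK		(BLKSLOTS - 1)

struct twolevel {
	int32_t	**blocks;	/* first level: short, variable length */
	int	  nblocks;
};

/* Access slot number `slot` through the two levels. */
#define TL_ENTRY(tl, slot) \
	((tl)->blocks[(slot) >> BLKSHIFT][(slot) & BLKMASK])

static void
tl_free(struct twolevel *tl)
{
	for (int i = 0; i < tl->nblocks; i++)
		free(tl->blocks[i]);
	free(tl->blocks);
	tl->blocks = NULL;
	tl->nblocks = 0;
}

/* Allocate room for at least nslots (> 0) slots; returns 0 on success. */
static int
tl_alloc(struct twolevel *tl, int nslots)
{
	tl->nblocks = (nslots + BLKSLOTS - 1) / BLKSLOTS;
	tl->blocks = calloc((size_t)tl->nblocks, sizeof(*tl->blocks));
	if (tl->blocks == NULL)
		return (-1);
	for (int i = 0; i < tl->nblocks; i++) {
		/* Every block has the same size, so a zone allocator fits. */
		tl->blocks[i] = malloc(BLKSLOTS * sizeof(int32_t));
		if (tl->blocks[i] == NULL) {
			tl->nblocks = i;	/* free only what was allocated */
			tl_free(tl);
			return (-1);
		}
	}
	return (0);
}
```

With this layout, a 1000-slot table needs one 4-pointer array plus four identical 1k blocks rather than a single 4000-byte allocation.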
Re: UFS large directory performance
In message <[EMAIL PROTECTED]>, Matt Dillon writes:
>
>I would further recommend a (dynamic) array of pointers at the first
>level as part of the summary structure. Any given array entry would
>either point to the second level array (the 512 byte allocations),
>be NULL (no second level array was necessary), or be (void *)-1 which
>would indicate that the second level array was reclaimed for other
>uses.

Nice idea, but I'm not sure I see the benefit of partially reclaiming
second-level arrays. Because it is a hash array, there isn't really the
concept of a working set; a directory that is `in use' will rarely see
many create/rename/delete operations on a small fixed set of filenames.
The lookup case is already cached elsewhere. I think an all-or-nothing
approach is likely to perform better and be simpler to implement. Even
the lazy allocation of second-level arrays is unlikely to help a lot
if the hash function does its job well.

>
>If the zone allocator is used for the second level block allocations
>it shouldn't be a problem. You can (had better be able to!) put a mutex
>around zone frees in -current.

The locking issues I could see were more in the area of finding inodes
to free hashes from. A linked list of dirhash structures could be
maintained (protected by a mutex), but to free the dirhash belonging to
an inode, the inode would probably need to be locked. That means
dereferencing dirhash->dh_inode->i_vnode and trying to lock it, so
things become complex.

Ian
Re: read(2) and ETIMEDOUT
In message <[EMAIL PROTECTED]>, Graham Barr writes:
>Also why does this happen only every few hours ? There is a lot of
>data going through these connections maybe the timer for SO_RCVTIMEO
>is not being reset.
>
>But then we have another server, with a similar number of clients and
>data through put, but it does not suffer from this problem.

I suspect that the server seeing this problem has a client that
occasionally disappears from the network, or for whatever reason fails
to respond to any packets for a long time (something like 5 or 10
minutes). I've seen blocking TCP writes return ETIMEDOUT when the
network between the client and the server goes down. In the
non-blocking case I think the following can happen:

 1) Client is connected to server.
 2) Network goes down, or client is turned off.
 3) Server performs non-blocking write() on socket.
 4) Server uses poll/select/kevent waiting for data from socket.
 5) The write operation times out because no acknowledgements have
    been received. This occurs after TCP_MAXRXTSHIFT retransmits;
    so->so_error is set to ETIMEDOUT and the connection is shut down.
    (I haven't read the code very carefully, so the details could be
    wrong.)
 6) select/poll/kevent notes the EOF condition, and says that the
    descriptor is ready to read.
 7) read() returns the real error, which is ETIMEDOUT.

I guess this should possibly be documented in read(2), but in practice
there are numerous network errors that can be returned from read().
Normal practice in single-process servers is to consider any unknown
errors from read(), write() etc. as only fatal to that client rather
than the whole server.

Ian
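The "fatal to that client only" practice can be sketched as follows, using hypothetical names. Any read error other than a transient EINTR or EAGAIN, ETIMEDOUT included, tells the caller to close this one descriptor and carry on with the rest of its select/poll/kevent loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum client_status { CLIENT_OK, CLIENT_CLOSE };

/*
 * Service one client descriptor that poll/select/kevent reported as
 * readable.  *nread gets the number of bytes placed in buf (0 if none).
 * CLIENT_CLOSE means: close this fd and keep serving everyone else.
 */
static enum client_status
service_client(int fd, char *buf, size_t bufsize, ssize_t *nread)
{
	ssize_t n = read(fd, buf, bufsize);

	*nread = n > 0 ? n : 0;
	if (n > 0)
		return (CLIENT_OK);
	if (n == 0)
		return (CLIENT_CLOSE);	/* orderly EOF from the peer */
	if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
		return (CLIENT_OK);	/* transient; try again later */
	return (CLIENT_CLOSE);		/* ETIMEDOUT, ECONNRESET, ... */
}
```

For example, with a socketpair(2), data written on one end is returned with CLIENT_OK. After the other end is closed, the next call returns CLIENT_CLOSE with *nread set to 0.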
Re: UFS large directory performance
In message <[EMAIL PROTECTED]>, Terry Lambert writes:
>
>Use a chain allocator. I would suggest using the zone
>allocator, but it has some fundamental problems that I
>don't think are really resolvable without a rewrite.

Heh, maybe, but I'm not sure I want to write a new allocator for this :-)

Based on Matt's suggestions, I implemented the 2-level approach. It
currently uses 256 slots per second-level block; these 1k blocks are
allocated using zalloc(). The variable-length first-level arrays are
still allocated with malloc, but these don't grow to more than a few
kb in size unless the directories are enormous.

There's now a simple LRU list of dirhash structures that have memory
attached, and a new function ufsdirhash_recycle() that will free up
memory when the sysctl limit is reached. Adding this required some
locking, but the problematic inode locking is avoided by leaving the
dirhash structure attached to the inode when its hash array is freed.

An updated patch is available at

	http://www.maths.tcd.ie/~iedowse/FreeBSD/dirhash.diff3

I haven't had a chance to do more than a minimal amount of testing, so
there may be many issues remaining.

Ian
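Here is a toy model of the LRU recycling described above, with hypothetical names and a fixed byte limit standing in for the vfs.ufs.dirhashmaxmem sysctl. The key property is the one mentioned in the message: recycling frees only the hash memory and leaves the small per-inode structure in place (hash set to NULL). Whoever reclaims memory therefore never has to lock the inode. Locking is omitted entirely here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Per-directory hash state (hypothetical); stays attached to its inode. */
struct dh {
	struct dh *prev, *next;	/* LRU links, valid while hash != NULL */
	void	  *hash;	/* NULL once recycled */
	size_t	   memsize;
};

/* Sentinel: lru.next is the least recently used entry. */
static struct dh lru = { &lru, &lru, NULL, 0 };
static size_t dh_mem, dh_maxmem = 4096;

static void
lru_unlink(struct dh *d)
{
	d->prev->next = d->next;
	d->next->prev = d->prev;
	d->prev = d->next = NULL;
}

static void
lru_append(struct dh *d)	/* most recently used goes at the tail */
{
	d->prev = lru.prev;
	d->next = &lru;
	lru.prev->next = d;
	lru.prev = d;
}

/* Free the oldest hashes until `need` more bytes fit under the limit. */
static int
dh_recycle(size_t need)
{
	if (need > dh_maxmem)
		return (-1);		/* could never fit; free nothing */
	while (dh_mem + need > dh_maxmem) {
		struct dh *old = lru.next;

		lru_unlink(old);
		free(old->hash);
		old->hash = NULL;	/* the struct itself stays attached */
		dh_mem -= old->memsize;
		old->memsize = 0;
	}
	return (0);
}

/* Attach a hash of `size` bytes to d (which must not have one). */
static int
dh_build(struct dh *d, size_t size)
{
	if (dh_recycle(size) != 0)
		return (-1);
	if ((d->hash = malloc(size)) == NULL)
		return (-1);
	d->memsize = size;
	dh_mem += size;
	lru_append(d);
	return (0);
}

/* Mark d as recently used, e.g. on each operation through it. */
static void
dh_touch(struct dh *d)
{
	if (d->hash != NULL) {
		lru_unlink(d);
		lru_append(d);
	}
}
```

Because a recycled directory keeps its struct dh, a later access simply finds hash == NULL and can call dh_build() again to rebuild it.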
Re: Strange request: Reading RX-50 (aka DEC Rainbow 100) disks
In message <[EMAIL PROTECTED]>, Warner Losh writes:
>I do have the options of connection the hardware up to the floppy
>controller in my desktop too :-). I have both the RX-50 drives, as
>well as a pair of TEAC FD55 drives (that do the same data rate as the
>RX-50's, with the same heads, but with only one drive per spindle and
>two read heads instead of one). Trouble is, it looks like our floppy
>driver doesn't grok single sided 400k disks :-(. That's what I'm
>looking to hack and advise on how to hack.

The fdcontrol program allows most of the parameters to be set to match
the disks, but unfortunately it cannot set the sector offset. MSDOS
disk sectors are numbered starting at 1 (the sector offset is 1), but
it was common practice with old 8-bit CP/M-type systems to choose
sector numbers starting at 0x41, 0x81 or other values.

I was attempting something similar last summer, but with disks from an
Amstrad CPC computer. I used the following patch to the fd driver and
fdcontrol to allow the sector offset to be specified along with the
other parameters. It also allows a head offset to be specified, which
is useful for reading the second side of double-sided disks that were
written as single-sided disks with a hardware switch on the side-select
line (i.e. the head number written to disk does not match the hardware
head number).

The patch below is against RELENG_4 around Jan 2000, so it will need
updating. I'm also not sure what sector offset the DEC Rainbow used -
I think I have a Rainbow boot disk here, but I'd have to dig out a
5.25 floppy drive to check :-)

Once you get the settings right, you can just dd the disk to an image
file.
Ian

Index: sys/i386/include/ioctl_fd.h
===
RCS file: /dump/FreeBSD-CVS/src/sys/i386/include/Attic/ioctl_fd.h,v
retrieving revision 1.13
diff -u -r1.13 ioctl_fd.h
--- sys/i386/include/ioctl_fd.h	1999/12/29 04:33:02	1.13
+++ sys/i386/include/ioctl_fd.h	2001/06/10 15:36:24
@@ -86,6 +86,7 @@
 struct fd_type {
 	int	sectrac;	/* sectors per track */
 	int	secsize;	/* size code for sectors */
+	int	secoff;		/* starting sector number*/
 	int	datalen;	/* data len when secsize = 0 */
 	int	gap;		/* gap len between sectors */
 	int	tracks;		/* total num of tracks */
@@ -95,6 +96,7 @@
 	int	heads;		/* number of heads */
 	int	f_gap;		/* format gap len*/
 	int	f_inter;	/* format interleave factor */
+	int	headoff;
 };
 
 #define FD_FORM   _IOW('F', 61, struct fd_formb) /* format a track */
Index: sys/isa/fd.c
===
RCS file: /dump/FreeBSD-CVS/src/sys/isa/fd.c,v
retrieving revision 1.176
diff -u -r1.176 fd.c
--- sys/isa/fd.c	2000/01/08 09:33:06	1.176
+++ sys/isa/fd.c	2001/06/10 15:52:19
@@ -125,24 +125,24 @@
 
 static struct fd_type fd_types[NUMTYPES] =
 {
-{ 21,2,0xFF,0x04,82,3444,1,FDC_500KBPS,2,0x0C,2 }, /* 1.72M in HD 3.5in */
-{ 18,2,0xFF,0x1B,82,2952,1,FDC_500KBPS,2,0x6C,1 }, /* 1.48M in HD 3.5in */
-{ 18,2,0xFF,0x1B,80,2880,1,FDC_500KBPS,2,0x6C,1 }, /* 1.44M in HD 3.5in */
-{ 15,2,0xFF,0x1B,80,2400,1,FDC_500KBPS,2,0x54,1 }, /* 1.2M in HD 5.25/3.5 */
-{ 10,2,0xFF,0x10,82,1640,1,FDC_250KBPS,2,0x2E,1 }, /* 820K in HD 3.5in */
-{ 10,2,0xFF,0x10,80,1600,1,FDC_250KBPS,2,0x2E,1 }, /* 800K in HD 3.5in */
-{ 9,2,0xFF,0x20,80,1440,1,FDC_250KBPS,2,0x50,1 }, /* 720K in HD 3.5in */
-{ 9,2,0xFF,0x2A,40, 720,1,FDC_250KBPS,2,0x50,1 }, /* 360K in DD 5.25in */
-{ 8,2,0xFF,0x2A,80,1280,1,FDC_250KBPS,2,0x50,1 }, /* 640K in DD 5.25in */
-{ 8,3,0xFF,0x35,77,1232,1,FDC_500KBPS,2,0x74,1 }, /* 1.23M in HD 5.25in */
-
-{ 18,2,0xFF,0x02,82,2952,1,FDC_500KBPS,2,0x02,2 }, /* 1.48M in HD 5.25in */
-{ 18,2,0xFF,0x02,80,2880,1,FDC_500KBPS,2,0x02,2 }, /* 1.44M in HD 5.25in */
-{ 10,2,0xFF,0x10,82,1640,1,FDC_300KBPS,2,0x2E,1 }, /* 820K in HD 5.25in */
-{ 10,2,0xFF,0x10,80,1600,1,FDC_300KBPS,2,0x2E,1 }, /* 800K in HD 5.25in */
-{ 9,2,0xFF,0x20,80,1440,1,FDC_300KBPS,2,0x50,1 }, /* 720K in HD 5.25in */
-{ 9,2,0xFF,0x23,40, 720,2,FDC_300KBPS,2,0x50,1 }, /* 360K in HD 5.25in */
-{ 8,2,0xFF,0x2A,80,1280,1,FDC_300KBPS,2,0x50,1 }, /* 640K in HD 5.25in */
+{ 21,2,1,0xFF,0x04,82,3444,1,FDC_500KBPS,2,0x0C,2 }, /* 1.72M in HD 3.5in */
+{ 18,2,1,0xFF,0x1B,82,2952,1,FDC_500KBPS,2,0x6C,1 }, /* 1.48M in HD 3.5in */
+{ 18,2,1,0xFF,0x1B,80,2880,1,FDC_500KBPS,2,0x6C,1 }, /* 1.44M in HD 3.5in */
+{ 15,2,1,0xFF,0x1B,80,2400,1,FDC_500KBPS,2,0x54,1 }, /* 1.2M in HD 5.25/3.5 */
+{ 10,2,1,0xFF,0x10,82,1640,1,FDC_250KBPS,2,0x2E,1 }, /* 820K in HD 3.5in */
+{ 10,2,1,0xFF,0x10,80,1600,1,FDC_250KBPS,2,0x2E,1 }, /* 800K in HD 3.5in */
+{ 9,2,1,0xFF,0x20,80,1440,1,FDC_250KBPS,2,0x50,1 }, /*
Re: Strange request: Reading RX-50 (aka DEC Rainbow 100) disks
In message <[EMAIL PROTECTED]>, Warner Losh writes:
>
>That's OK. The Rainbow disks have sectors numbered 1 through 10, for
>both CP/M disks and MS-DOS disks. This makes things easier to cope
>with.

Great, then no driver changes are required. I've just tried it; I found
a normal PC 5.25" drive, and I was able to read the DEC Rainbow boot
disk I have here by doing

	# fdcontrol /dev/fd1
	sectrac? []: 10
	secsize? [2]:
	datalen? [0xff]:
	gap? [0x1b]:
	tracks? [80]:
	size? []: 800
	steptrac? [1]:
	trans? []: 1
	heads? []: 1
	f_gap? [0x54]:
	f_inter? [1]:
	# hd /dev/fd1 |less

Note: The `trans' values come from the 'FDC_???KBPS' #defines in
fdreg.h. A value of 1 is 'FDC_300KBPS', which is different to the specs
you quoted, but I think the PC standard 5.25" drive runs at 360rpm
rather than 300. For a 300rpm drive you probably want a trans value of
2 (250kbps).

I just left the `gap' and `f_gap' values at their defaults; I don't
know the exact details of these fields, but I seem to remember that
they are only used during writing and formatting, so you can ignore
them for reading.

>for this project. Any thumbnail about how to add a new type of drive
>to fd.c? What parameters do I need for it?

You could add an entry to the fd_types array in fd.c, but that requires
linking the entry into a device node, so it's probably easier to just
use fdcontrol.

Ian
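Once the settings are right and the disk has been copied with dd, locating a given sector in the image is simple arithmetic. Here is a minimal sketch with hypothetical names. It assumes the image holds tracks in cylinder order with sides interleaved and sectors in numerical order. secoff is the first sector number on a track: 1 for the Rainbow, or something like 0x41 for a CP/M-style layout. Size code 2 corresponds to 512-byte sectors.

```c
#include <stdint.h>

/* Disk geometry, mirroring a few of the fdcontrol fields (hypothetical). */
struct geom {
	int	sectrac;	/* sectors per track */
	int	heads;		/* sides recorded in the image */
	int	secsize;	/* bytes per sector (size code 2 = 512) */
	int	secoff;		/* number of the first sector on a track */
};

/* Byte offset in a flat image of the sector at cylinder/head/sector. */
static int64_t
chs_to_offset(const struct geom *g, int cyl, int head, int sector)
{
	int64_t lba;

	lba = ((int64_t)cyl * g->heads + head) * g->sectrac +
	    (sector - g->secoff);
	return (lba * g->secsize);
}
```

For the Rainbow settings above (10 sectors, 1 head, 80 tracks), the image is 800 sectors of 512 bytes, or 409600 bytes. The last sector, on track 79, starts at byte 409088.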
Re: How to do proper locking
In message <[EMAIL PROTECTED]>, Hans Petter Selasky writes: >Yes, you are right, but the problem is, that for most callback systems in the >kernel, there is no mechanism that will pre-lock some custom mutex before >calling the callback. > >I am not speaking about adding lines to existing code, but to add one extra >parameter to the setup functions, where the mutex that should be locked >before calling the callback(s) can be specified. If it is NULL, Giant will be >used. > >The setup functions I have in mind are for example: "make_dev()", >"bus_setup_intr()", "callout_reset()" ... and in general all callback systems >that look like these. Note that FreeBSD's callout subsystem does already have such a mechanism. Just use callout_init_mtx() and the specified mutex will be acquired before the callback is invoked. See callout(9) for more details. Ian ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "[EMAIL PROTECTED]"
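The idea of a callout that takes a caller-supplied lock before running its handler can be sketched in userland with POSIX threads. This is not the kernel API. The names callout_sketch, callout_sketch_init(), callout_sketch_fire() and probe_cb() are all hypothetical. A plain mutex stands in for Giant when no lock is supplied, mirroring the NULL-means-Giant proposal quoted above. In the kernel itself the equivalent facility is callout_init_mtx(), described in callout(9).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for Giant: used when the caller supplies no mutex. */
static pthread_mutex_t giant_sketch = PTHREAD_MUTEX_INITIALIZER;

struct callout_sketch {
    pthread_mutex_t *mtx;      /* lock held across the callback */
    void (*func)(void *);
    void *arg;
};

static void callout_sketch_init(struct callout_sketch *c, pthread_mutex_t *mtx,
                                void (*func)(void *), void *arg)
{
    c->mtx = (mtx != NULL) ? mtx : &giant_sketch;
    c->func = func;
    c->arg = arg;
}

/* What happens when the timer fires: lock, run the callback, unlock. */
static void callout_sketch_fire(struct callout_sketch *c)
{
    assert(c->func != NULL);
    pthread_mutex_lock(c->mtx);
    c->func(c->arg);
    pthread_mutex_unlock(c->mtx);
}

/* Example callback: records whether its lock was held when it ran. */
struct probe {
    pthread_mutex_t *lock;
    int saw_locked;
};

static void probe_cb(void *arg)
{
    struct probe *p = arg;

    if (pthread_mutex_trylock(p->lock) == 0) {
        p->saw_locked = 0;     /* lock was free: not what we expect */
        pthread_mutex_unlock(p->lock);
    } else {
        p->saw_locked = 1;     /* EBUSY: the lock is held around us */
    }
}
```

The callback itself never has to lock anything. The setup call decides which mutex protects it, which is the point of the proposal.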
Re: Low umass performance with USB 2.0 ports
In message <[EMAIL PROTECTED]>, "Eygene A. Ryabinkin" writes: >> >> What is filesystem has your USB drive? > The one I was extensively testing has FAT, but I've checked the UFS2 -- >just a bit better -- 1.8 Mb/second. But you're right -- no wdrains at all. >> FreeBSD 4.x had very low performance with FAT filesystem, >> writing process spent lots of time in the wdrain state too. > Yes, it has. But here the same flash drive gives different results for >ehci and uhci devices, and the total speed of echi is lower due to wdrains: >300 Kb/sec versus 500 Kb/sec. And I sometimes write my data to the Windows >partition with FAT to my home HDD -- it has no wdrains. At least, I've not >noticed them. For flash I can. The patch in the email below may help with the wdrain state. Can you see if it makes any difference? Ian Date: Sun, 26 Jun 2005 17:42:44 BST To: Stefan Walter <[EMAIL PROTECTED]> cc: freebsd-stable@freebsd.org From: Ian Dowse <[EMAIL PROTECTED]> Subject: Re: EHCI: mtools stuck in state 'physrd' or panic OpenBSD have a workaround for problems with VIA EHCI controllers that can cause the hanging symptoms you describe. Below is a patch that implements their change in FreeBSD's driver. Could you try it to see if it helps?
Thanks, Ian Index: sys/dev/usb/ehci.c === RCS file: /dump/FreeBSD-CVS/src/sys/dev/usb/ehci.c,v retrieving revision 1.14.2.9 diff -u -r1.14.2.9 ehci.c --- sys/dev/usb/ehci.c 31 Mar 2005 19:47:11 - 1.14.2.9 +++ sys/dev/usb/ehci.c 26 Jun 2005 16:21:11 - @@ -155,6 +155,7 @@ Static voidehci_idone(struct ehci_xfer *); Static voidehci_timeout(void *); Static voidehci_timeout_task(void *); +Static voidehci_intrlist_timeout(void *); Static usbd_status ehci_allocm(struct usbd_bus *, usb_dma_t *, u_int32_t); Static voidehci_freem(struct usbd_bus *, usb_dma_t *); @@ -491,6 +492,7 @@ EOWRITE4(sc, EHCI_ASYNCLISTADDR, sqh->physaddr | EHCI_LINK_QH); usb_callout_init(sc->sc_tmo_pcd); + usb_callout_init(sc->sc_tmo_intrlist); lockinit(&sc->sc_doorbell_lock, PZERO, "ehcidb", 0, 0); @@ -694,6 +696,11 @@ ehci_check_intr(sc, ex); } + /* Schedule a callout to catch any dropped transactions. */ + if ((sc->sc_flags & EHCI_SCFLG_LOSTINTRBUG) && + !LIST_EMPTY(&sc->sc_intrhead)) + usb_callout(sc->sc_tmo_intrlist, hz, ehci_intrlist_timeout, sc); + #ifdef USB_USE_SOFTINTR if (sc->sc_softwake) { sc->sc_softwake = 0; @@ -942,6 +949,7 @@ EOWRITE4(sc, EHCI_USBINTR, sc->sc_eintrs); EOWRITE4(sc, EHCI_USBCMD, 0); EOWRITE4(sc, EHCI_USBCMD, EHCI_CMD_HCRESET); + usb_uncallout(sc->sc_tmo_intrlist, ehci_intrlist_timeout, sc); usb_uncallout(sc->sc_tmo_pcd, ehci_pcd_enable, sc); #if defined(__NetBSD__) || defined(__OpenBSD__) @@ -2701,6 +2708,30 @@ splx(s); } + +/* + * Some EHCI chips from VIA seem to trigger interrupts before writing back the + * qTD status, or miss signalling occasionally under heavy load. If the host + * machine is too fast, we we can miss transaction completion - when we scan + * the active list the transaction still seems to be active. This generally + * exhibits itself as a umass stall that never recovers. 
+ * + * We work around this behaviour by setting up this callback after any softintr + * that completes with transactions still pending, giving us another chance to + * check for completion after the writeback has taken place. + */ +void +ehci_intrlist_timeout(void *arg) +{ + ehci_softc_t *sc = arg; + int s = splusb(); + + DPRINTFN(3, ("ehci_intrlist_timeout\n")); + usb_schedsoftintr(&sc->sc_bus); + + splx(s); +} + // Static usbd_status Index: sys/dev/usb/ehci_pci.c === RCS file: /dump/FreeBSD-CVS/src/sys/dev/usb/ehci_pci.c,v retrieving revision 1.14.2.2 diff -u -r1.14.2.2 ehci_pci.c --- sys/dev/usb/ehci_pci.c 13 Jun 2005 09:00:19 - 1.14.2.2 +++ sys/dev/usb/ehci_pci.c 26 Jun 2005 16:21:11 - @@ -303,6 +303,10 @@ return ENXIO; } + /* Enable workaround for dropped interrupts as required */ + if (pci_get_vendor(self) == PCI_EHCI_VENDORID_VIA) + sc->sc_flags |= EHCI_SCFLG_LOSTINTRBUG; + /* * Find companion controllers. According to the spec they always * have lower function numbers so they should be enumerated already. Index: sys/dev/usb/ehcivar.h === RCS file: /dump/FreeBSD-CVS/src/sys/dev/u
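The effect of the workaround can be illustrated with a deterministic model in C. Everything here is hypothetical: xfer_sketch, scan_active() and handle_intr() are not driver functions. In the model, each transfer's status becomes visible at some tick after the interrupt has already fired, and a scan only counts transfers whose status has been written back. With the workaround enabled, a non-empty pending list triggers another scan one hz later, much as ehci_intrlist_timeout() schedules another soft interrupt.

```c
#include <assert.h>
#include <stdbool.h>

#define NXFER 3

/* Hypothetical transfer whose status write-back may lag the interrupt. */
struct xfer_sketch {
    int writeback_tick;   /* tick at which the controller writes status */
    bool done;            /* set once software has seen completion */
};

/* Scan the active list at tick 'now'; return how many are still pending. */
static int scan_active(struct xfer_sketch *x, int n, int now)
{
    int pending = 0;

    for (int i = 0; i < n; i++) {
        if (!x[i].done && x[i].writeback_tick <= now)
            x[i].done = true;
        if (!x[i].done)
            pending++;
    }
    return pending;
}

/*
 * Handle one interrupt at tick 'now'. With the workaround, rescan every
 * 'hz' ticks while anything is pending (bounded by max_ticks here).
 * Returns the number of transfers still pending when processing stops.
 */
static int handle_intr(struct xfer_sketch *x, int n, int now, int hz,
                       bool lostintr_workaround, int max_ticks)
{
    int pending;

    assert(hz > 0);
    pending = scan_active(x, n, now);
    while (lostintr_workaround && pending > 0 && now < max_ticks) {
        now += hz;                     /* the callout fires later */
        pending = scan_active(x, n, now);
    }
    return pending;
}
```

Without the rescan, a transfer whose status arrives late is never noticed again. In the model it simply stays pending, which corresponds to the umass stall described in the patch comment.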
Re: NFS/VM deadlock report and help request
In message <[EMAIL PROTECTED]>, Vadim Belman writes: >wmesg=0xc0233171 "vmopar", timo=0) at ../../kern/kern_synch.c:467 ... >#8 0xc01dd606 in vm_fault (map=0xdc3e7e80, vaddr=712876032, >fault_type=1 '\001', fault_flags=0) at ../../vm/vm_pager.h:130 If anyone is interested, here are a few further details from my mailbox. The patch David included appears to have solved this particular problem for us, but there is another similar problem lurking within the NFS/VM system. Ian The problem seems to originate with NFS's postop_attr information that is returned with a read or write RPC. Within a vm_fault context, the code cannot deal with vnode_pager_setsize() shrinking a vnode. The workaround in the patch below stops the nfsm_postop_attr() macro from ever shrinking a vnode. If the new size in the postop_attr information is smaller, then it just sets the nfsnode n_attrstamp to 0 to stop the wrong size getting used in the future. This change only affects postop_attr attributes; the nfsm_loadattr() macro works as normal. The change is implemented by adding a new argument to nfs_loadattrcache() called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never reduce the vnode/nfsnode size; instead it zeros n_attrstamp. --- Hmm. We used this patch for a while - it stopped those particular vmopar hangs, but another kind of deadlock has emerged (which happens with or without the patch). It seems that vinvalbuf() locks the vnode's v_interlock before calling vm_object_page_remove(). vm_object_page_remove will then lock a page i.e. vinvalbuf() [Lock v_interlock] -> vm_object_page_remove() [Lock page] If another process concurrently vm_fault's on the same vnode then it locks the page, and finishes with a vput(vp). vput() locks the interlock, so it results in: vm_fault() [Lock page] -> vput() [Lock v_interlock] This is a simple lock-ordering deadlock. Since vm_fault can keep the page locked for a considerable amount of time with NFS, this deadlock can happen quite easily. 
I'm not sure what to suggest as a solution, but keeping the v_interlock locked across a tsleep seems wrong... Any ideas? Traces below. #12 0xc02140f0 in atkbd_isa_intr (unit=0) at ../../i386/isa/atkbd_isa.c:84 #13 0xc020eceb in wait () #14 0xc01e22d3 in _unlock_things (fs=0xca6f0ef0, dealloc=0) at ../../vm/vm_fault.c:148 #15 0xc01e2b73 in vm_fault (map=0xca6d2ac0, vaddr=134766592, fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:745 #16 0xc0210252 in trap_pfault (frame=0xca6f0fbc, usermode=1, eva=134769544) at ../../i386/i386/trap.c:816 #17 0xc020fda2 in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = -1077946880, tf_esi = 1, tf_ebp = -1077947052, tf_isp = -898691100, tf_ebx = -1077946872, tf_edx = 4, tf_ecx = -1077947772, tf_eax = 2, tf_trapno = 12, tf_err = 4, tf_eip = 134769544, tf_cs = 31, tf_eflags = 66050, tf_esp = -1077947172, tf_ss = 39}) at ../../i386/i386/trap.c:358 #18 0x8086b88 in ?? () (kgdb) proc 1042 (kgdb) bt #0 mi_switch () at ../../kern/kern_synch.c:825 #1 0xc0150b4d in tsleep (ident=0xc0598534, priority=4, wmesg=0xc024d22a "vmopar", timo=0) at ../../kern/kern_synch.c:443 #2 0xc01eaec6 in vm_page_sleep (m=0xc0598534, msg=0xc024d22a "vmopar", busy=0xc0598563 "") at ../../vm/vm_page.c:1052 #3 0xc01e9aff in vm_object_page_remove (object=0xca6bac1c, start=0, end=0, clean_only=1) at ../../vm/vm_object.c:1335 #4 0xc0172a6a in vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80, p=0xca6e5a40, slpflag=256, slptimeo=0) at ../../kern/vfs_subr.c:671 #5 0xc019541c in nfs_vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80, p=0xca6e5a40, intrflg=1) at ../../nfs/nfs_bio.c:978 #6 0xc01b6859 in nfs_open (ap=0xca6f3e2c) at ../../nfs/nfs_vnops.c:490 #7 0xc01796ae in vn_open (ndp=0xca6f3f00, fmode=1, cmode=1512) at vnode_if.h:163 #8 0xc01760d9 in open (p=0xca6e5a40, uap=0xca6f3f94) at ../../kern/vfs_syscalls.c:935 #9 0xc02108bf in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 134725618, tf_esi = -1077946896, tf_ebp = -1077946944, tf_isp = -898678812, tf_ebx 
= -1077946956, tf_edx = -1077946588, tf_ecx = 134893176, tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 672042756, tf_cs = 31, tf_eflags = 514, tf_esp = -1077949296, tf_ss = 39}) at ../../i386/i386/trap.c:1100 #10 0xc01ff11c in Xint0x80_syscall () #11 0x8049d39 in ?? () - To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
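The `dontshrink' rule described above can be modelled in a few lines of C. This is a hypothetical sketch, not the patch. The names nfsnode_sketch and loadattrcache_sketch() are made up. The sketch also skips the whole update when a shrink is refused, which may differ from what nfs_loadattrcache() actually does with the other attributes.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, cut-down stand-in for the nfsnode fields involved. */
struct nfsnode_sketch {
    uint64_t n_size;       /* cached file size */
    time_t   n_attrstamp;  /* when attributes were cached; 0 = invalid */
};

/*
 * With dontshrink set (the postop_attr case), a reply may grow the cached
 * size but never shrink it. A smaller size just invalidates the cached
 * attributes so that the wrong size is not used in the future.
 */
static void loadattrcache_sketch(struct nfsnode_sketch *np, uint64_t newsize,
                                 time_t now, int dontshrink)
{
    assert(np != NULL);
    if (dontshrink && newsize < np->n_size) {
        np->n_attrstamp = 0;
        return;
    }
    np->n_size = newsize;
    np->n_attrstamp = now;
}
```

The ordinary nfsm_loadattr() path corresponds to calling this with dontshrink set to 0, where a smaller size is accepted as usual.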
Re: dhcp boot was: Re: diskless workstation
In message <[EMAIL PROTECTED]>, Doug Ambrisko writes: >| to the kernel's output. I had a look at the pxe code in >| /sys/boot/i386/libi386/pxe.c where pxeboot is built from and in >| /sys/i386/i386/autoconf.c which is the kernel side and it looks like >| they don't do anything about swap. There is a /* XXX set up swap? */ >| placeholder though. :-) > >Yep looks like you're right, I just tried it on 4.2-BETA it worked in >4.1.1. Swap is now broken ... sigh this is going to be a problem. I >guess the only thing you might be able to do in the interim is to do a >vnconfig of a file and then mount that as swap. I think the vnconfig >man pages describes this. Hopefully it works over NFS. The diskless setup we use here is based on a compiled-in MFS root rather than an NFS root, so we couldn't use the bootp code to enable NFS swap. Our solution was a modification to swapon() to enable direct swapping to NFS regular files. This results in the same swaponvp() call that the bootp code would use (at the time we implemented this, swapping over NFS via vnconfig was extremely unreliable; I think things are much better now). The patch we use is below. Ian Index: vm_swap.c === RCS file: /FreeBSD/FreeBSD-CVS/src/sys/vm/vm_swap.c,v retrieving revision 1.96 diff -u -r1.96 vm_swap.c --- vm_swap.c 2000/01/25 17:49:12 1.96 +++ vm_swap.c 2000/11/05 11:04:34 @@ -202,10 +202,14 @@ NDFREE(&nd, NDF_ONLY_PNBUF); vp = nd.ni_vp; - vn_isdisk(vp, &error); - - if (!error) + if (vn_isdisk(vp, &error)) error = swaponvp(p, vp, vp->v_rdev, 0); + else if (vp->v_type == VREG && vp->v_tag == VT_NFS) { + struct vattr attr; + error = VOP_GETATTR(vp, &attr, p->p_ucred, p); + if (!error) + error = swaponvp(p, vp, NODEV, attr.va_size/DEV_BSIZE); + } if (error) vrele(vp); To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: post-install of kernel sources, maxusers max?
In message <[EMAIL PROTECTED]>, Len Conrad writes: ># vmstat -z ... >socket 607 1050 113/196K ... >kern.ipc.maxsockets: 1064 >doesn't look like it to me. I think a few slots are reserved, so you can consider 1050 as being equal to 1064. Try putting set kern.ipc.maxsockets=4000 in /boot/loader.rc and rebooting. Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
rootvnode
It appears that the pointer to the root vnode, 'rootvnode' does not hold a corresponding vnode reference. Here's a fragment of code from start_init(): /* Get the vnode for '/'. Set p->p_fd->fd_cdir to reference it. */ if (VFS_ROOT(TAILQ_FIRST(&mountlist), &rootvnode)) panic("cannot find root vnode"); p->p_fd->fd_cdir = rootvnode; VREF(p->p_fd->fd_cdir); p->p_fd->fd_rdir = rootvnode; VOP_UNLOCK(rootvnode, 0, p); Since rootvnode is a global variable, three pointers to the root vnode are stored, but only two references are counted (one by VFS_ROOT, one by VREF). Normally this is not a problem, since proc0's fd_cdir and fd_rdir keep their references until the system is rebooted. However the code in vfs_syscalls.c's checkdirs() function assumes that rootvnode does hold a reference on the vnode: if (rootvnode == olddp) { vrele(rootvnode); VREF(newdp); rootvnode = newdp; } This bug reliably causes a panic on reboot if any filesystem has been mounted directly over /. For example, try: mount_mfs -T fd1440 none / Ctrl-Alt-Delete On -current the panic is 'vrele: missed vn_close'; on 4.1-STABLE it is 'vrele: negative ref cnt'. It occurs in dounmount() at the lines if ((coveredvp = mp->mnt_vnodecovered) != NULLVP) { coveredvp->v_mountedhere = (struct mount *)0; vrele(coveredvp); } when unmounting the second / filesystem. This occurs because checkdirs() has stolen a reference to /, so the reference count goes negative when we attempt to remove the last reference. This brings up another question: should the code reverse the changes made by checkdirs() when a filesystem is unmounted? It certainly seems to make sense to make rootvnode point to underlying vnode when the filesystem containing the current rootvnode is unmounted; I'm not sure how useful fixing up other fd_cdir/fd_rdir pointers would be. I can produce a simple patch which does the following: - vref(rootvnode) in start_init(). 
- vrele(rootvnode) if non-NULL, maybe in vfs_unmountall() - point rootvnode at underlying vnode when the filesystem containing rootvnode is unmounted. Does this sound reasonable? Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
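To make the reference counting concrete, here is a hypothetical model in C. The names vnode_sketch and the *_sketch helpers are invented for illustration. The model assumes, as the dounmount() fragment implies, that a mount holds one reference on the vnode it covers. It uses assert() in place of the kernel's vrele panic.

The model replays the sequence with the proposed vref(rootvnode) in start_init():

- proc0's fd_cdir and fd_rdir hold one reference each;
- rootvnode holds a third;
- the mount over / adds a fourth;
- checkdirs() then moves three of them to the new root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vnode reduced to its use count. */
struct vnode_sketch {
    int usecount;
};

static void vref_sketch(struct vnode_sketch *vp)
{
    vp->usecount++;
}

static void vrele_sketch(struct vnode_sketch *vp)
{
    assert(vp->usecount > 0);   /* the kernel panics here instead */
    vp->usecount--;
}

/*
 * checkdirs()-style update of one pointer (fd_cdir, fd_rdir or
 * rootvnode): drop the reference held through the old vnode and take
 * one on the new vnode.
 */
static void swap_dir_ref(struct vnode_sketch **slot,
                         struct vnode_sketch *olddp,
                         struct vnode_sketch *newdp)
{
    if (*slot == olddp) {
        vrele_sketch(*slot);
        vref_sketch(newdp);
        *slot = newdp;
    }
}
```

Without the extra vref, the count on the old root is already zero after the three swaps. The final vrele in dounmount() then trips the assertion, the model's version of the `negative ref cnt' panic.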
Re: fsck problem on large vinum volume
In message <[EMAIL PROTECTED]>, Jaye Mathisen writes: > >I have a 930GB vinum volume >However, I can't fsck it, I have to always use the alternate block. >newsfeed-inn2# fsck /dev/vinum/v-spool >** /dev/vinum/v-spool >BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE >/dev/vinum/v-spool: CANNOT FIGURE OUT FILE SYSTEM PARTITION Jaye sent me a ktrace.out for the fsck that was failing. It appears that the kernel had overshot the end of the superblock fs_csp[] array in ffs_mountfs(), since the list of pointers there extended through fs_maxcluster, fs_cpc, and fs_opostbl. This caused the mismatch between the master and alternate superblocks. The filesystem parameters were 8k/1k, and the total number of cylinder groups was 29782. fs_cssize was 29782*sizeof(struct csum) = 477184 bytes. Hence 477184/8192 = ~59 entries were being used in fs_csp, but fs_csp[] is only 31 entries long (15 on alpha). A larger block size should fix Jaye's case, but I think the correct solution is to fix the kernel so that it is not constrained by the MAXCSBUFS limit. There are a few ways to do this: - Store the fs_csp information in struct ufsmount rather than in the superblock. - Make use of the fact that the summary information is stored in one contiguous region, and update the 'fs_cs' macro to find the right offset directly. I'll have a look and see which way looks neatest. Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
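The arithmetic above can be checked with a short C sketch. It assumes that struct csum consists of four 32-bit counters (16 bytes) and that fs_cssize is rounded up to a whole fragment. Note that 29782 * 16 is 476512, so the 477184 figure appears to be that product rounded up to the 1k fragment size. HOWMANY and ROUNDUP are local stand-ins for the usual BSD macros.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the usual BSD howmany()/roundup() macros. */
#define HOWMANY(x, y)  (((x) + ((y) - 1)) / (y))
#define ROUNDUP(x, y)  (HOWMANY(x, y) * (y))

/* Assumed layout of struct csum: four 32-bit counters, 16 bytes in all. */
struct csum_sketch {
    int32_t cs_ndir, cs_nbfree, cs_nifree, cs_nffree;
};

/* Summary area size: one csum per cylinder group, rounded to a fragment. */
static uint32_t summary_bytes(uint32_t ncg, uint32_t fsize)
{
    assert(fsize > 0);
    return ROUNDUP(ncg * (uint32_t)sizeof(struct csum_sketch), fsize);
}

/* Number of fs_bsize blocks, i.e. fs_csp[] slots, the old code needed. */
static uint32_t summary_blocks(uint32_t ncg, uint32_t bsize, uint32_t fsize)
{
    assert(bsize > 0);
    return HOWMANY(summary_bytes(ncg, fsize), bsize);
}
```

With 29782 cylinder groups and 8k/1k blocks this gives 477184 bytes and 59 blocks. That is well past the 31 pointers that fs_csp[] can hold.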
Re: fsck problem on large vinum volume
[moved to -fs] In message <[EMAIL PROTECTED]>, Ian Dowse writes: > >Jaye sent me a ktrace.out for the fsck that was failing. It appears >that the kernel had overshot the end of the superblock fs_csp[] array >in ffs_mountfs(), since the list of pointers there extended through >fs_maxcluster, fs_cpc, and fs_opostbl. This caused the mismatch between >the master and alternate superblocks. > >The filesystem parameters were 8k/1k, and the total number of cylinder >groups was 29782. fs_cssize was 29782*sizeof(struct csum) = 477184 >bytes. Hence 477184/8192 = ~59 entries were being used in fs_csp, >but fs_csp[] is only 31 entries long (15 on alpha). Here is a patch which should avoid the possibility of overflowing the fs_csp[] array. The idea is that since all summary blocks are stored in one contiguous malloc'd region, there is no need to have a separate pointer to the start of each block within that region. This is achieved by simplifying the 'fs_cs' macro from fs_csp[(indx) >> (fs)->fs_csshift][(indx) & ~(fs)->fs_csmask] to fs_csp[0][indx] so that only the start of the malloc'd region is needed, and can always be placed in fs_csp[0] without the risk of overflow. I have only tested this to the extent that the kernel compiles and runs, and only on -stable. Any comments or suggestions? Ian Index: ffs/ffs_vfsops.c === RCS file: /home/iedowse/CVS/src/sys/ufs/ffs/ffs_vfsops.c,v retrieving revision 1.134 diff -u -r1.134 ffs_vfsops.c --- ffs/ffs_vfsops.c2000/12/13 10:03:52 1.134 +++ ffs/ffs_vfsops.c2001/01/07 19:04:06 @@ -365,7 +365,7 @@ { register struct vnode *vp, *nvp, *devvp; struct inode *ip; - struct csum *space; + caddr_t space; struct buf *bp; struct fs *fs, *newfs; struct partinfo dpart; @@ -432,7 +432,7 @@ * Step 3: re-read summary information from disk. 
*/ blks = howmany(fs->fs_cssize, fs->fs_fsize); - space = fs->fs_csp[0]; + space = (caddr_t)fs->fs_csp[0]; for (i = 0; i < blks; i += fs->fs_frag) { size = fs->fs_bsize; if (i + fs->fs_frag > blks) @@ -441,7 +441,8 @@ NOCRED, &bp); if (error) return (error); - bcopy(bp->b_data, fs->fs_csp[fragstoblks(fs, i)], (u_int)size); + bcopy(bp->b_data, space, (u_int)size); + space += size; brelse(bp); } /* @@ -513,7 +514,7 @@ register struct fs *fs; dev_t dev; struct partinfo dpart; - caddr_t base, space; + caddr_t space; int error, i, blks, size, ronly; int32_t *lp; struct ucred *cred; @@ -623,18 +624,18 @@ blks = howmany(size, fs->fs_fsize); if (fs->fs_contigsumsize > 0) size += fs->fs_ncg * sizeof(int32_t); - base = space = malloc((u_long)size, M_UFSMNT, M_WAITOK); + space = malloc((u_long)size, M_UFSMNT, M_WAITOK); + fs->fs_csp[0] = (struct csum *)space; for (i = 0; i < blks; i += fs->fs_frag) { size = fs->fs_bsize; if (i + fs->fs_frag > blks) size = (blks - i) * fs->fs_fsize; if ((error = bread(devvp, fsbtodb(fs, fs->fs_csaddr + i), size, cred, &bp)) != 0) { - free(base, M_UFSMNT); + free(fs->fs_csp[0], M_UFSMNT); goto out; } bcopy(bp->b_data, space, (u_int)size); - fs->fs_csp[fragstoblks(fs, i)] = (struct csum *)space; space += size; brelse(bp); bp = NULL; @@ -691,7 +692,7 @@ if (ronly == 0) { if ((fs->fs_flags & FS_DOSOFTDEP) && (error = softdep_mount(devvp, mp, fs, cred)) != 0) { - free(base, M_UFSMNT); + free(fs->fs_csp[0], M_UFSMNT); goto out; } if (fs->fs_snapinum[0] != 0) Index: ffs/fs.h === RCS file: /home/iedowse/CVS/src/sys/ufs/ffs/fs.h,v retrieving revision 1.16 diff -u -r1.16 fs.h --- ffs/fs.h2000/07/04 04:55:48 1.16 +++ ffs/fs.h2001/01/07 18:55:44 @@ -108,10 +108,10 @@ /* * The limit on the amount of summary information per file system * is defined by MAXCSBUFS. It is currently parameterized for a - * size of 128 bytes (2 million cylinder groups on machines with - * 32-bit pointers, and 1 million on 64-bit machines). 
One pointer - * is taken away to point to an array of cluster sizes that is - * computed as cylinder group
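The equivalence the patch relies on can be checked in userland.

The sketch assumes that fs_csshift and fs_csmask describe the number of struct csum entries per block: csshift is log2 of that count, and csmask is ~(count - 1), so that indx & ~csmask is the offset within a block. The real values come from newfs and are not shown in the message.

Given one contiguous allocation with fs_csp[b] pointing at the start of block b, as ffs_mountfs() used to arrange, both forms of the macro should select the same element. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct csum_sketch {
    int32_t cs_ndir, cs_nbfree, cs_nifree, cs_nffree;
};

/* Old form: fs_csp[(indx) >> csshift][(indx) & ~csmask]. */
static struct csum_sketch *old_fs_cs(struct csum_sketch **csp, int csshift,
                                     int csmask, int indx)
{
    return &csp[indx >> csshift][indx & ~csmask];
}

/* New form from the patch: fs_csp[0][indx]. */
static struct csum_sketch *new_fs_cs(struct csum_sketch **csp, int indx)
{
    return &csp[0][indx];
}

/* Returns 1 if both forms agree for every cylinder group index. */
static int schemes_agree(int ncg, int bsize)
{
    int per_block = bsize / (int)sizeof(struct csum_sketch);
    int nblocks = (ncg + per_block - 1) / per_block;
    int csshift = 0, csmask = ~(per_block - 1), ok = 1;
    struct csum_sketch *space =
        calloc((size_t)nblocks * per_block, sizeof(*space));
    struct csum_sketch **csp = malloc((size_t)nblocks * sizeof(*csp));

    assert(space != NULL && csp != NULL);
    while ((1 << csshift) < per_block)
        csshift++;
    /* One pointer per block into the single contiguous region. */
    for (int b = 0; b < nblocks; b++)
        csp[b] = space + (size_t)b * per_block;
    for (int i = 0; i < ncg; i++)
        if (old_fs_cs(csp, csshift, csmask, i) != new_fs_cs(csp, i))
            ok = 0;
    free(csp);
    free(space);
    return ok;
}
```

Since the per-block pointers are just fixed offsets into one region, only the first pointer carries any information. That is why fs_csp[0] alone is enough and the array can no longer overflow.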
Re: Swapping in diskless ? (was :Re: [hackers] Re: getting rid of sysinstall)
In message <[EMAIL PROTECTED]>, David Gilbert writes:
>Is it not possible (or has nobody done it) to swap with the current
>diskless boot?

I do remember some problem with PXE and swap, but I forget the details or whether it was resolved. The diskless setup that we have locally uses an MFS root image in the kernel instead of an NFS root, which meant that we couldn't use DHCP tags to configure swap. Our solution was a small patch that allows swapon(8) to configure direct swapping to NFS regular files. This does the same thing as the DHCP swap tags, but is much more controllable. The rc scripts can do something like:

	swap=/swap/swapfile
	rm -f $swap
	truncate -s 30M $swap
	swapon $swap

The patch (against RELENG_4) is below. Should this just be committed? We have certainly found it quite useful.

Ian

Index: vm_swap.c
===
RCS file: /FreeBSD/FreeBSD-CVS/src/sys/vm/vm_swap.c,v
retrieving revision 1.96.2.1
diff -u -r1.96.2.1 vm_swap.c
--- vm_swap.c	2000/10/13 07:13:23	1.96.2.1
+++ vm_swap.c	2001/07/13 23:12:10
@@ -202,10 +202,14 @@
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	vp = nd.ni_vp;
 
-	vn_isdisk(vp, &error);
-
-	if (!error)
+	if (vn_isdisk(vp, &error))
 		error = swaponvp(p, vp, vp->v_rdev, 0);
+	else if (vp->v_type == VREG && vp->v_tag == VT_NFS) {
+		struct vattr attr;
+		error = VOP_GETATTR(vp, &attr, p->p_ucred, p);
+		if (!error)
+			error = swaponvp(p, vp, NODEV, attr.va_size/DEV_BSIZE);
+	}
 
 	if (error)
 		vrele(vp);

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
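The rc fragment above maps onto a few POSIX calls. The size the patch passes to swaponvp() is just the file size in DEV_BSIZE units. The C sketch below rests on two assumptions: DEV_BSIZE is 512, and `30M' means 30 * 1024 * 1024 bytes. The helpers make_swapfile() and swap_nblks() are hypothetical. The final swapon(2) call is left out, since it needs root and the patched kernel.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DEV_BSIZE_SKETCH 512   /* assumed DEV_BSIZE */

/*
 * Equivalent of "rm -f $swap; truncate -s SIZE $swap": recreate a
 * (sparse) file of the requested size. Returns 0 on success, -1 on error.
 */
static int make_swapfile(const char *path, off_t size)
{
    int fd;

    assert(path != NULL);
    (void)unlink(path);                   /* like rm -f: ignore errors */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Block count the patch would pass to swaponvp(): va_size / DEV_BSIZE.
 * Returns -1 if the path is missing or not a regular file.
 */
static long swap_nblks(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1 || !S_ISREG(st.st_mode))
        return -1;
    return (long)(st.st_size / DEV_BSIZE_SKETCH);
}
```

A 30M swap file therefore corresponds to 61440 blocks of 512 bytes.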
Default retry behaviour for mount_nfs
Shortly after the TI-RPC changes in -current, the default retry behaviour for mount_nfs was changed. Previously, mount_nfs would keep retrying for a long time (~1 week) if the server didn't respond. Since revision 1.40 of mount_nfs.c, however, it gives up on non-background mounts after one attempt. I didn't back out this change in default behaviour in my later commits to this file, since it seemed like a more reasonable default. NFS filesystems listed in fstab without any options can no longer hang the boot process waiting for the server to respond, and background mounts will succeed whenever the server comes up. I subsequently MFC'd this about 3 weeks ago. What I remembered the other day is that there is a class of situations where you do want certain NFS mounts to hang the boot process if the server is down. These include cases where an NFS filesystem is critical to the boot process, so the machine will get stuck if it tries to proceed without it. The changes to mount_nfs had broken support for that situation, but I committed a fix to -current today that allows you to add `-R0' to the mount options to force mount_nfs to retry forever. So the question is this. Should I keep the new behaviour? It is probably a better default and will catch out fewer new users, but it may surprise some experienced users. Or should I revert to the traditional default, where `-R1' or `-b' are required to avoid boot-time hangs? Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Default retry behaviour for mount_nfs
In message <[EMAIL PROTECTED]>, Terry Lambert writes: >> FWIW, I vote that we rever to the traditional default and require >> -R1 or -b to avoid boot time hangs. The standard behaviour for most >> NFS implementations that I'm aware of would do this. > >I agree; people at work have bitched about this. We have a >FreeBSD NFS server that's flakey. Ok, from the small set of responses so far, it seems that the most acceptable option is to change mount_nfs to behave in the old way where it will retry forever by default even in foreground mode. Below is a proposed patch that does this. It also adds two paragraphs near the start of the manpage which describe the default behaviour and point readers at the relevant options. Comments welcome. >The other thing is that it appears to break amd behaviour. Does amd use mount_nfs(8)? I thought it did the mount syscalls directly. Ian Index: mount_nfs.8 === RCS file: /dump/FreeBSD-CVS/src/sbin/mount_nfs/mount_nfs.8,v retrieving revision 1.27 diff -u -r1.27 mount_nfs.8 --- mount_nfs.8 2001/07/19 21:11:48 1.27 +++ mount_nfs.8 2001/07/20 22:20:35 @@ -71,6 +71,28 @@ .%T "NFS: Network File System Version 3 Protocol Specification" , Appendix I. .Pp +By default, +.Nm +keeps retrying until the mount eventually succeeds. +This behaviour is intended for filesystems listed in +.Xr fstab 5 +that are critical to the boot process. +For non-critical filesystems, the +.Fl R +and +.Fl b +flags provide mechanisms to prevent the boot process from hanging +if the server is unavailable. +.Pp +If the server becomes unresponsive while an NFS filesystem is +mounted, any new or outstanding file operations on that filesystem +will hang uninterruptibly until the server comes back. +To modify this default behaviour, see the +.Fl i +and +.Fl s +flags. +.Pp The options are: .Bl -tag -width indent .It Fl 2 @@ -126,12 +148,8 @@ help, but for normal desktop clients this does not apply.) .It Fl R Set the mount retry count to the specified value. 
-A retry count of zero means to keep retrying forever. -By default, -.Nm -retries forever on background mounts (see the -.Fl b -option), and otherwise tries just once. +The default is a retry count of zero, which means to keep retrying +forever. There is a 60 second delay between each attempt. .It Fl T Use TCP transport instead of UDP. Index: mount_nfs.c === RCS file: /dump/FreeBSD-CVS/src/sbin/mount_nfs/mount_nfs.c,v retrieving revision 1.45 diff -u -r1.45 mount_nfs.c --- mount_nfs.c 2001/07/19 21:11:48 1.45 +++ mount_nfs.c 2001/07/20 21:37:19 @@ -486,7 +486,8 @@ name = *argv; if (retrycnt == -1) - retrycnt = (opflags & BGRND) ? 0 : 1; + /* The default is to keep retrying forever. */ + retrycnt = 0; if (!getnfsargs(spec, nfsargsp)) exit(1); To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
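The retry semantics discussed above can be modelled in a few lines of C. In this hypothetical sketch, retrycnt is the number of attempts and 0 means try forever, matching the manual page text. The names mount_with_retries(), default_retrycnt() and the fake server are for illustration only. The 60 second delay between attempts is omitted.

```c
#include <assert.h>

/*
 * Try 'attempt' until it succeeds or retrycnt attempts have failed;
 * retrycnt == 0 means keep trying forever. Returns 1 if mounted,
 * 0 if we gave up. *tries reports how many attempts were made.
 */
static int mount_with_retries(int retrycnt, int (*attempt)(void *), void *arg,
                              int *tries)
{
    assert(retrycnt >= 0);
    for (*tries = 1; ; (*tries)++) {
        if (attempt(arg))
            return 1;
        if (retrycnt != 0 && *tries >= retrycnt)
            return 0;
        /* mount_nfs sleeps 60 seconds here */
    }
}

/* Default when no -R was given (retrycnt == -1), before and after. */
static int default_retrycnt(int background, int patched)
{
    if (patched)
        return 0;                       /* keep retrying forever */
    return background ? 0 : 1;          /* old: foreground tries once */
}

/* Hypothetical server that answers on its up_after'th request. */
struct fake_server {
    int calls, up_after;
};

static int fake_attempt(void *arg)
{
    struct fake_server *s = arg;

    return ++s->calls >= s->up_after;
}
```

With the patch, a foreground mount of a server that comes up on the third try succeeds. Under the old default, the same mount gives up after the first attempt.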
fdisk(8) adjusting to head/cylinder boundaries
For about a year, fdisk(8) has had code that automatically adjusts partitions to begin on a head boundary and end on a cylinder boundary. This is fine in most situations, but the way it is implemented makes it awkward to override, and more importantly it is way too easy to mess up an existing partition that is not properly aligned on a head/cylinder boundary. Currently, fdisk never asks the user for confirmation of changes to the partition start and size. It just prints out a message such as WARNING: adjusting start offset of partition to 12345 to fall on a head boundary and then it immediately goes on to print out the full slice details, so that warning is easily missed. It is possible to avoid the automatic adjustment by answering "y" to the Explicitly specify beg/end address? question that refers to setting the c/h/s parameters, but if you do that, then you can't make use of the automatic c/h/s calculation. This problem bites me almost every time I use fdisk, since we have a lot of disks that have been split into multiple partitions to get around the 7 partitions/slice limit. I have always just changed the slice to end exactly where the last partition ends, so having fdisk rounding that down by a few sectors is not desirable. These disks are generally SCSI, contain only FreeBSD partitions, and the BIOSes we work with have never had problems with partitions that are not head/cylinder aligned. Below is a patch that makes fdisk request user confirmation before making any changes to the start and end of partitions. It also untangles the automatic c/h/s calculation from the start/size adjustment, and doesn't set the partition type to 0 if the adjustment fails. I haven't put a great deal of thought into the specifics of the patch, so any comments or suggestions are welcome. I just want to avoid the behaviour where carefully calculated partition parameters supplied by the user get changed automatically with only an easily- missed warning printed. 
Ian Index: fdisk.c === RCS file: /dump/FreeBSD-CVS/src/sbin/i386/fdisk/fdisk.c,v retrieving revision 1.50 diff -u -r1.50 fdisk.c --- fdisk.c 2001/07/13 16:48:56 1.50 +++ fdisk.c 2001/07/21 12:02:01 @@ -548,6 +548,7 @@ Decimal("sysid (165=FreeBSD)", partp->dp_typ, tmp); Decimal("start", partp->dp_start, tmp); Decimal("size", partp->dp_size, tmp); + sanitize_partition(partp); if (ok("Explicitly specify beg/end address ?")) { @@ -572,8 +573,6 @@ partp->dp_esect = DOSSECT(tsec,tcyl); partp->dp_ehd = thd; } else { - if (!sanitize_partition(partp)) - partp->dp_typ = 0; dos(partp->dp_start, partp->dp_size, &partp->dp_scyl, &partp->dp_ssect, &partp->dp_shd); dos(partp->dp_start + partp->dp_size - 1, partp->dp_size, @@ -1398,6 +1397,17 @@ max_end = partp->dp_start + partp->dp_size; +if (partp->dp_start % dos_sectors != 0 || + (partp->dp_start + partp->dp_size) % dos_sectors != 0) { + if (partp->dp_start % dos_sectors != 0) + warnx("WARNING: partition does not begin on a head boundary"); + if ((partp->dp_start + partp->dp_size) % dos_sectors != 0) + warnx("WARNING: partition does not end on a cylinder boundary"); + warnx("WARNING: this may confuse the BIOS or other operating systems"); + if (!ok("Correct this automatically?")) + return(1); +} + /* * Adjust start upwards, if necessary, to fall on an head boundary. 
*/ @@ -1412,9 +1422,7 @@ "ERROR: unable to adjust start of partition to fall on a head boundary"); return (0); } - warnx( -"WARNING: adjusting start offset of partition\n\ -to %u to fall on a head boundary", + warnx("WARNING: adjusting start offset of partition to %u", (u_int)(prev_head_boundary + dos_sectors)); partp->dp_start = prev_head_boundary + dos_sectors; } @@ -1434,10 +1442,7 @@ return (0); } if (adj_size != partp->dp_size) { - warnx( -"WARNING: adjusting size of partition to %u to end on a\n\ -cylinder boundary", - (u_int)adj_size); + warnx("WARNING: adjusting size of partition to %u", (u_int)adj_size); partp->dp_size = adj_size; } if (partp->dp_size == 0) { To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
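To see how a carefully chosen slice can be shifted by a few sectors, here is a simplified C sketch of head/cylinder alignment. It is not fdisk's actual sanitize_partition() logic. It rounds the start up to a multiple of the sectors per track and the end down to a multiple of the sectors per cylinder. The 63-sector, 16-head geometry used with it is an arbitrary example, and the function names are hypothetical.

```c
#include <assert.h>

/*
 * Round the start up to a head boundary and the end down to a
 * cylinder boundary. Returns 0 (leaving the partition untouched) if
 * nothing would remain, 1 otherwise.
 */
static int align_partition(unsigned long *start, unsigned long *size,
                           unsigned long sectors, unsigned long heads)
{
    unsigned long cylsecs, end, new_start, new_end;

    assert(sectors > 0 && heads > 0);
    cylsecs = sectors * heads;
    end = *start + *size;                                /* one past last */
    new_start = (*start + sectors - 1) / sectors * sectors;
    new_end = end / cylsecs * cylsecs;
    if (new_end <= new_start)
        return 0;
    *start = new_start;
    *size = new_end - new_start;
    return 1;
}

/* Nonzero if start and end already fall on head/cylinder boundaries. */
static int is_aligned(unsigned long start, unsigned long size,
                      unsigned long sectors, unsigned long heads)
{
    return start % sectors == 0 && (start + size) % (sectors * heads) == 0;
}
```

For a slice starting at 12345 with 100000 sectors, the start moves to 12348 and the size drops to 99540, so 457 sectors silently disappear from the end. That is the kind of change the patch makes the user confirm.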
Re: fdisk(8) adjusting to head/cylinder boundaries
In message <[EMAIL PROTECTED]>, Brian Dean writes: >On Sat, Jul 21, 2001 at 02:47:29PM +0100, Ian Dowse wrote: > >> Below is a patch that makes fdisk request user confirmation before >> making any changes to the start and end of partitions. > >Please allow this behaviour to be overridden by a flag that can >specified so that scripts don't suddenly stop and wait for input. Sorry, I should have mentioned this; the patch only changes the interactive case. The code to adjust the partition offsets and sizes for config file based updates using the -f option has not changed. Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: crunched binary oddity
In message <[EMAIL PROTECTED]>, Peter Pentchev writes: >On Tue, Jul 24, 2001 at 10:14:09AM -0700, Etienne de Bruin wrote: >> Greetings. I crunchgen'd newfs and linked mount_mfs to it (among many other >> progs), compiled it with success. And yet when I boot my MFS kernel and try >> to mount /tmp to mfs, boot_crunch complains that 'mfs' is not compiled into >> it? > >Could it be that it's not boot_crunch, but the kernel complaining? >What is the exact error message? When mount(8) invokes a mount_xxx program, it sets argv[0] to the name of the filesystem (ufs, mfs, nfs etc). Crunched binaries use the argv[0] name to determine which code to execute, so you need to add

	ln mount_mfs mfs

to your crunchgen config file to get this to work. Alternatively, just invoke mount_mfs directly instead of using mount(8). Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
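The argv[0] dispatch that causes this can be sketched as follows. This is not crunchgen's generated code. The names crunch_entry and crunch_lookup() are hypothetical. The tables in the usage check mirror the setup in the question, where mount_mfs is a link to newfs. Adding `ln mount_mfs mfs' to the config effectively adds an "mfs" row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One name the crunched binary answers to, and the program it runs. */
struct crunch_entry {
    const char *name;
    int prog;
};

/* Look up the basename of argv[0]; -1 means "not compiled into" it. */
static int crunch_lookup(const struct crunch_entry *tab, size_t n,
                         const char *argv0)
{
    const char *base;

    assert(argv0 != NULL);
    base = strrchr(argv0, '/');
    base = (base != NULL) ? base + 1 : argv0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, base) == 0)
            return tab[i].prog;
    return -1;
}
```

Because mount(8) passes "mfs" as argv[0], the lookup fails until that extra name exists, even though the mount_mfs code is present.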
Re: vnconfig + mount removes permission for a second
In message <[EMAIL PROTECTED]>, David Malone writes: > >When you do a mount it automatically HUP's mountd which then >re-exports NFS filesystems. I suspect what is happening is that >the the filesystem mountlist is being cleared for a moment and that >is upsetting the cp. Yes, the mountd-kernel interface for updating export lists is a bit stupid; you have to clear all exports and then add each allowed host/net one by one. Any NFS requests that come in after the exports have been deleted but before the entries have been re-added will get rejected. See PRs misc/3980 and kern/9619 for more details. I think NetBSD tried at one point to make mountd incrementally change the export list, but it turned out to be quite hard to get the logic right to keep the mountd and kernel lists in sync. I think they reverted that change eventually. This is certainly a bug that needs to be fixed; mountd should be able to build up a list of all exports for a filesystem and pass them into the kernel in one "replace export list" operation. Maybe nice'ing mountd to run at a higher priority, and/or specifying only IP addresses in /etc/exports would help things a bit now. Ian To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
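The single "replace export list" operation suggested above can be modelled in userland. This is a hypothetical sketch, not the mountd/kernel interface: export_list, export_table, export_allowed() and export_replace() are invented names. The new list is built completely before it is published through one pointer swap. A lookup therefore sees either the old list or the new one, never the empty list that clear-then-add exposes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* An immutable list of allowed hosts. */
struct export_list {
    size_t nhosts;
    const char **hosts;
};

/* The currently published list, protected by a lock. */
struct export_table {
    pthread_mutex_t lock;
    struct export_list *cur;
};

/* Check a request against whichever list is current. */
static int export_allowed(struct export_table *t, const char *host)
{
    int ok = 0;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; t->cur != NULL && i < t->cur->nhosts; i++)
        if (strcmp(t->cur->hosts[i], host) == 0)
            ok = 1;
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Publish a fully built list in one step; returns the old one to free. */
static struct export_list *export_replace(struct export_table *t,
                                          struct export_list *newlist)
{
    struct export_list *old;

    assert(t != NULL);
    pthread_mutex_lock(&t->lock);
    old = t->cur;
    t->cur = newlist;
    pthread_mutex_unlock(&t->lock);
    return old;
}
```

The hard part mentioned above, keeping the mountd and kernel views in sync, disappears here because each update replaces the whole list rather than editing it entry by entry.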
Re: problems with kvm_nlist()
In message <[EMAIL PROTECTED]>, Tabor Kelly writes: >Now that that is taken care of, would somebody mind explaining to me >what n_value represents? Is it an offset in kernel memory to retrieve >the actual data? It is the kernel virtual address of the symbol that you specified in n_name, which will be the same as an in-kernel pointer value (e.g. something like 0xc0123456). This address has no meaning in userland, but libkvm provides a kvm_read() function that does all the magic necessary to read from the kernel memory at this address. There are lots of examples of code using the libkvm interface in the FreeBSD source tree (fstat, ps, vmstat, pstat, etc.), although many of these now use sysctl to retrieve values instead. Briefly, you just kvm_read the value of the variable whose symbol address you have found, e.g. something like the code below, but you'll want to add code to deal with any errors that the kvm_* calls might return.

	struct nlist nl[] = {
		{ "nextpid" },
		{ NULL },
	};
	kvm_t *kd;
	int nextpid;

	kd = kvm_openfiles(...);
	kvm_nlist(kd, nl);
	kvm_read(kd, nl[0].n_value, &nextpid, sizeof(nextpid));
	printf("nextpid is %d\n", nextpid);

Ian
Reading files within the kernel (was Re: allocating userland space...)
In message <003101c12411$294adaa0$[EMAIL PROTECTED]>, Sansonetti Laurent writes: >Hello hackers, >I'm currently working on a kld syscall module which needs to read a config >file at startup (MOD_LOAD). >Following the advice of Eugene L. Vorokov, I tried to allocate some userland >space with mmap() to store a open_args struct, fill-it with copyout() / >subyte()... and call open with curproc on first argument. I really don't understand why people try these obscure mechanisms to read files within the kernel. There are existing kernel interfaces for accessing files that are much cleaner than these hacks. You can't use the familiar open/read/close calls, but using the vnode interface is really not that hard. Below is a simple KLD that prints /etc/motd on the console. There's not a lot involved, since vn_open(), vn_rdwr() and vn_close() do most of the hard work. The strangest part is probably setting up the nameidata structure, but even that isn't too complicated. To try it, just save the two files below in a directory and run

	make depend
	make
	kldload ./kernio.ko

(WARNING: not highly tested, so it may crash your machine!) For further reference, most of the VOP_* functions are documented in section 9 man pages. Ian Makefile -- KLDMOD= true KMOD= kernio SRCS= vnode_if.h kernio.c NOMAN= CFLAGS+= -I${.CURDIR}/..
-I/usr/src/sys .include - kernio.c -- #include #include #include #include #include #include #include #include static int kernio_example(void); static int kernio_open(int pathseg, const char *path, int flags, struct proc *p, struct vnode **vpp); static void kernio_close(struct vnode *vp, int flags, struct proc *p); static int kernio_modevent(module_t mod, int type, void *unused) { switch (type) { case MOD_LOAD: return kernio_example(); case MOD_UNLOAD: break; default: break; } return 0; } static int kernio_example(void) { struct vattr vattr; struct proc *p; struct vnode *vp; char *buf, *cp; int error, filesize, flags, pos, resid; p = curproc; flags = FREAD; buf = NULL; /* Open the file, and get its size. */ error = kernio_open(UIO_SYSSPACE, "/etc/motd", flags, p, &vp); if (error) return (error); error = VOP_GETATTR(vp, &vattr, p->p_ucred, p); if (error) goto errout; filesize = vattr.va_size; printf("file size = %d\n", filesize); /* Allocate space for the file contents. */ MALLOC(buf, char *, filesize, M_TEMP, M_WAITOK); if (buf == NULL) goto errout; /* Read in the complete file to `buf'. */ error = vn_rdwr(UIO_READ, vp, buf, filesize, 0, UIO_SYSSPACE, IO_NODELOCKED, p->p_ucred, &resid, p); if (error) goto errout; /* Silly example; print out the file line by line. 
*/ cp = buf; for (pos = 0; pos < filesize; pos++) { if (buf[pos] != '\n') continue; buf[pos] = '\0'; printf("%s\n", cp); cp = &buf[pos] + 1; } errout: if (buf != NULL) FREE(buf, M_TEMP); kernio_close(vp, flags, p); return (error); } static int kernio_open(int pathseg, const char *path, int flags, struct proc *p, struct vnode **vpp) { struct nameidata nd; struct vnode *vp; int error; NDINIT(&nd, LOOKUP, FOLLOW, pathseg, path, p); #if __FreeBSD_version < 50 error = vn_open(&nd, flags, 0); #else error = vn_open(&nd, &flags, 0); #endif if (error) return (error); NDFREE(&nd, NDF_ONLY_PNBUF); vp = nd.ni_vp; if (vp->v_type != VREG) { VOP_UNLOCK(vp, 0, p); vn_close(vp, flags, p->p_ucred, p); return (EACCES); } *vpp = vp; return (0); } static void kernio_close(struct vnode *vp, int flags, struct proc *p) { VOP_UNLOCK(vp, 0, p); vn_close(vp, flags, p->p_ucred, p); } moduledata_t kernio_mod = { "kernio", kernio_modevent, 0 }; DECLARE_MODULE(kernio, kernio_mod, SI_SUB_DRIVERS, SI_ORDER_ANY); To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Reading files within the kernel (was Re: allocating userland space...)
In message <003401c1244d$1fa6ee80$[EMAIL PROTECTED]>, Sansonetti Laurent writes: >A another stupid question, how can I do to stop the loading process in >MOD_LOAD event handler (in my case, if the cfg file doesn't exist, it should >be better to interrupt..) ? Someone else might have a better idea of how this works, but it seems to me that the best you can do is printf a descriptive error message and return a non-zero value from the module event handler function. The kernel will print the return code from the event handler on the console, and the event handler will then immediately be called with MOD_UNLOAD. It seems that the KLD is not actually unloaded in this case, and no error is returned to the kldload process, but the user can then manually unload the KLD, correct the problem and try again. That's just from a quick read of the code, so it may be wrong. Try adding printf calls to the MOD_LOAD and MOD_UNLOAD cases in the event handler, and see what happens when MOD_LOAD returns non-zero. Ian
Serious i386 interrupt mask bug in RELENG_4 (was Re: 4.4-RC NFS panic)
In message <[EMAIL PROTECTED]>, Warner Losh writes: > >I think that might be due to a bug in the shared interrupt code that >Ian Dowse sent me about earlier today. Just to add a few details: there is a bug in the update_masks() function in i386/isa/intr_machdep.c that can cause some interrupts to occur at times when they should be masked. The problem only occurs with certain configurations of shared interrupts and devices, and this code is only present in RELENG_4. The update_masks() function is called after an interrupt handler has been registered or removed. Its main job is to update the interrupt masks (tty_imask, net_imask, etc.) if necessary. For example, if IRQ11 is registered by a tty-type device, IRQ11 will be added to tty_imask so that future spltty() calls will mask IRQ11. A second job of update_masks() is to update the cached copy of the interrupt mask stored with each handler for a multiplexed interrupt. This is done via the call to update_mux_masks(). The bug is that update_masks() returns without calling update_mux_masks() in some cases where it should call it. Specifically, if a newly-added multiplexed interrupt handler has the same maskptr as another handler on the same IRQ line, that new handler doesn't get its cached mask set. For example, if a single IRQ has a USB device and a modem (tty), the second device to register its handler will get its idesc->mask set to 0 instead of the value of tty_imask, because update_mux_masks() may never be called to set it. Of course, if update_masks() is called later for some other device, that may correct the situation. Interrupt handlers are called with intr_mask[irq] or'd into the cpl to block further interrupts; for non-multiplexed interrupts, intr_mask[irq] will be set from one of the *_imask masks. However, with multiplexed interrupts only the IRQ itself (and SWI_CLOCK_MASK) is blocked, and the multiplex handler intr_mux() needs to raise the cpl further when necessary. It uses idesc->mask to control this.
When this bug occurs, idesc->mask == 0, so the device interrupt handler gets called with only the IRQ and SWI_CLOCK_MASK masked, instead of the full *_imask that it requested. Not good. On my laptop, this bug causes hangs within minutes of starting to use a pccard modem, but as should be apparent from the above, it could strike virtually anywhere that multiplexed interrupts are used. The patch below seems to solve the problem; it just causes update_masks() to update the masks unconditionally. Ian Index: intr_machdep.c === RCS file: /home/iedowse/CVS/src/sys/i386/isa/intr_machdep.c,v retrieving revision 1.29.2.2 diff -u -r1.29.2.2 intr_machdep.c --- intr_machdep.c 2000/08/16 05:35:34 1.29.2.2 +++ intr_machdep.c 2001/08/23 20:24:17 @@ -651,15 +651,9 @@ if (find_idesc(maskptr, irq) == NULL) { /* no reference to this maskptr was found in this irq's chain */ - if ((*maskptr & mask) == 0) - return; - /* the irq was included in the classes mask, remove it */ *maskptr &= ~mask; } else { /* a reference to this maskptr was found in this irq's chain */ - if ((*maskptr & mask) != 0) - return; - /* put the irq into the classes mask */ *maskptr |= mask; } /* we need to update all values in the intr_mask[irq] array */
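The bug and the fix can be modelled in userland. The model uses hypothetical simplified types: a plain unsigned int for the mask and an array for the IRQ's handler chain, not the real intr_machdep.c structures. The point is that the class mask is adjusted and then the cached per-handler masks are always refreshed. In the buggy version, the refresh was skipped when the class mask already had the right value.

```c
#include <stddef.h>

/* Hypothetical simplified stand-ins for the i386 interrupt bookkeeping. */
typedef unsigned int intrmask_t;

struct idesc {
	intrmask_t *maskptr;	/* class mask the handler asked for */
	intrmask_t mask;	/* cached copy that intr_mux() would use */
};

/* Refresh every cached copy on this IRQ's handler chain. */
static void update_mux_masks(struct idesc *chain, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		chain[i].mask = *chain[i].maskptr;
}

/*
 * Fixed shape of update_masks(): add or remove the IRQ bit in the class
 * mask, then ALWAYS fall through to update_mux_masks().  The buggy
 * version returned early when the bit was already set (or clear), so a
 * newly added handler sharing that maskptr kept a cached mask of 0.
 */
static void update_masks(intrmask_t *maskptr, intrmask_t irqbit,
    int referenced, struct idesc *chain, size_t n)
{
	if (referenced)
		*maskptr |= irqbit;
	else
		*maskptr &= ~irqbit;
	update_mux_masks(chain, n);
}
```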
Re: VM Corruption - stumped, anyone have any ideas?
> >The pointers in the last few entries of the vm_page_buckets array got >corrupted when an argument to a function that manipulated whatever was next >in ram was 0, and it turned out that it was 0 because > of some PTE flushing thing (you are the one that found it... remember?) I think I've also seen a few reports of programs exiting with "Profiling timer expired" messages with 4.4. These can be caused by stack overflows, since the p_timer[] array in struct pstats is one of the things that I think lives below the per-process kernel stack. I wonder if they are related? Stack overflows could result in corruption of local variables, after which anything could happen. That said, hardware problems are still a possibility. Ian
Re: VM Corruption - stumped, anyone have any ideas?
In message <[EMAIL PROTECTED]>, Matt Dillon writes: > >Hmm. Do we have a guard page at the base of the per process kernel >stack? As I understand it, no. In RELENG_4 there are UPAGES (== 2 on i386) pages of per-process kernel state at p->p_addr. The stack grows down from the top, and struct user (sys/user.h) sits at the bottom. According to the comment in the definition of struct user, only the first three items in struct user are valid in normal running conditions: 8192 ??? 8176p_addr So if the stack does overflow, p_timer[ITIMER_PROF] is about the first noticeable thing that gets clobbered, causing a SIGPROF signal to be delivered to the process some time later. Ian
Re: bleh. Re: ufs_rename panic
In message <[EMAIL PROTECTED]>, Matt Dillon writes: >What I've done is add a SOFTLOCKLEAF capability to namei(). If set, and >the file/directory exists, namei() will generate an extra VREF() on >the vnode and set the VSOFTLOCK flag in vp->v_flag. If the vnode already >has VSOFTLOCK set, namei() will return EINVAL. I just tried a more direct approach, which is to implement a flag at the vnode layer that is roughly equivalent to UFS's IN_RENAME flag. This keeps the changes local to vfs_syscalls.c except for the addition of a new vnode flag in vnode.h. A patch is below. It doesn't include the changes to remove IN_RENAME etc, but these could be done later anyway. The basic idea is that the rename syscall locks the source node just for long enough to mark it with VRENAME. It then keeps an extra reference on the source node so that it can clear VRENAME before returning. The syscalls unlink(), rmdir() and rename() also check for VRENAME before proceeding with the operation, and act appropriately if it is found set. One case that is not being handled well is where the target of a rename has VRENAME set; the patch just causes rename to return EINVAL, but a better approach would be to unlock everything and try again. I don't know how to deal with the case of vn_lock(fvp, ...) failing at the end of rename() either. Only lightly tested, so expect lots of bugs... 
Ian Index: sys/vnode.h === RCS file: /dump/FreeBSD-CVS/src/sys/sys/vnode.h,v retrieving revision 1.157 diff -u -r1.157 vnode.h --- sys/vnode.h 13 Sep 2001 22:52:42 - 1.157 +++ sys/vnode.h 2 Oct 2001 19:06:41 - @@ -163,8 +163,8 @@ #defineVXLOCK 0x00100 /* vnode is locked to change underlying type */ #defineVXWANT 0x00200 /* thread is waiting for vnode */ #defineVBWAIT 0x00400 /* waiting for output to complete */ +#defineVRENAME 0x00800 /* rename operation on progress */ #defineVNOSYNC 0x01000 /* unlinked, stop syncing */ -/* open for business0x01000 */ #defineVOBJBUF 0x02000 /* Allocate buffers in VM object */ #defineVCOPYONWRITE0x04000 /* vnode is doing copy-on-write */ #defineVAGE0x08000 /* Insert vnode at head of free list */ Index: kern/vfs_syscalls.c === RCS file: /dump/FreeBSD-CVS/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.206 diff -u -r1.206 vfs_syscalls.c --- kern/vfs_syscalls.c 22 Sep 2001 03:07:41 - 1.206 +++ kern/vfs_syscalls.c 2 Oct 2001 20:29:54 - @@ -1573,6 +1573,9 @@ if (vp->v_flag & VROOT) error = EBUSY; } + /* Claim that the node is already gone if it is being renamed. */ + if (vp->v_flag & VRENAME) + error = ENOENT; if (vn_start_write(nd.ni_dvp, &mp, V_NOWAIT) != 0) { NDFREE(&nd, NDF_ONLY_PNBUF); vrele(vp); @@ -2879,20 +2882,29 @@ struct mount *mp; struct vnode *tvp, *fvp, *tdvp; struct nameidata fromnd, tond; - int error; + int err1, error; bwillwrite(); - NDINIT(&fromnd, DELETE, WANTPARENT | SAVESTART, UIO_USERSPACE, - SCARG(uap, from), td); + NDINIT(&fromnd, DELETE, WANTPARENT | LOCKLEAF | SAVESTART, + UIO_USERSPACE, SCARG(uap, from), td); if ((error = namei(&fromnd)) != 0) return (error); fvp = fromnd.ni_vp; - if ((error = vn_start_write(fvp, &mp, V_WAIT | PCATCH)) != 0) { + if (fvp->v_flag & VRENAME) + /* The node is being renamed; claim it has already gone. 
*/ + error = ENOENT; + if (!error) + error = vn_start_write(fvp, &mp, V_WAIT | PCATCH); + if (error) { NDFREE(&fromnd, NDF_ONLY_PNBUF); vrele(fromnd.ni_dvp); - vrele(fvp); + vput(fvp); + fvp = NULL; goto out1; } + fvp->v_flag |= VRENAME; + vref(fvp); + VOP_UNLOCK(fvp, 0, td); NDINIT(&tond, RENAME, LOCKPARENT | LOCKLEAF | NOCACHE | SAVESTART | NOOBJ, UIO_USERSPACE, SCARG(uap, to), td); if (fromnd.ni_vp->v_type == VDIR) @@ -2929,6 +2941,10 @@ !bcmp(fromnd.ni_cnd.cn_nameptr, tond.ni_cnd.cn_nameptr, fromnd.ni_cnd.cn_namelen)) error = -1; + if (tvp != NULL && (tvp->v_flag & VRENAME)) { + /* XXX, should just unlock everything and retry. */ + error = EINVAL; + } out: if (!error) { VOP_LEASE(tdvp, td, td->td_proc->p_ucred, LEASE_WRITE); @@ -2961,6 +2977,18 @@ ASSERT_VOP_UNLOCKED(tond.ni_dvp, "rename"); ASSERT_VOP_UNLOCKED(tond.ni_vp, "rename"); out1: + if (fvp != NULL) { + /* We set the VRENAME flag a
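The VRENAME protocol in the patch can be modelled in userland to show how rename(), unlink() and rmdir() are meant to interact. This is a single-threaded sketch with a hypothetical fake_vnode type and helper names. It ignores the vnode locking, vn_start_write() and namei() steps that the real rename() code needs. Only the flag value is taken from the vnode.h part of the patch.

```c
#include <errno.h>
#include <stdbool.h>

#define VRENAME 0x00800	/* same value as in the vnode.h patch */

/* Hypothetical stand-in for struct vnode: just the fields the model needs. */
struct fake_vnode {
	int v_flag;
	int v_usecount;
	bool gone;
};

/* unlink()/rmdir(): claim the node is already gone while it is being renamed. */
static int fake_remove(struct fake_vnode *vp)
{
	if (vp->v_flag & VRENAME)
		return ENOENT;
	vp->gone = true;
	return 0;
}

/* Start of rename(): mark the source node and hold an extra reference. */
static int fake_rename_begin(struct fake_vnode *fvp)
{
	if (fvp->v_flag & VRENAME)
		return ENOENT;	/* another rename got there first */
	fvp->v_flag |= VRENAME;
	fvp->v_usecount++;	/* plays the role of vref(fvp) */
	return 0;
}

/* End of rename(): clear the flag and drop the extra reference. */
static void fake_rename_end(struct fake_vnode *fvp)
{
	fvp->v_flag &= ~VRENAME;
	fvp->v_usecount--;	/* plays the role of vrele(fvp) */
}
```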
Re: patch #3 (was Re: bleh. Re: ufs_rename panic)
In message <[EMAIL PROTECTED]>, Matt Dillon writes: > >:This seems rather large compared to Ian Dowse's version.. Are you sure that >:you're doing this the right way? Adding a whole new locking mechanism >:when the simple VRENAME flag to be enough seems like a bit of overkill.. Matt addresses the problem more completely than my patch does, so the differences in patch size and files touched are to be expected. In particular, the NFS server and unionfs code need to be changed in the same way as the syscalls, and the IN_RENAME flag can be removed from the ufs code; both of these changes are included in Matt's patch. >Ian's doesn't fix any of the filesystem semantics bugs, it only prevents >the panic from occurring. This is certainly correct, though the IN_RENAME flag in the UFS code currently has a few such semantics bugs, where EINVAL can be returned in cases that would succeed if rename() were atomic. When a vnode cannot be renamed/unlinked/rmdir'd because it is being renamed, the operation should be retried until it succeeds, sleeping as necessary. As I understand it, this is mostly dealt with by Matt's patch, but not at all by mine. >If you remove the filesystem semantics fixes from my patch you >essentially get Ian's patch except that I integrated the vnode flag >in namei/lookup whereas Ian handles it manually in the syscall code. The addition of the SOFTLOCKLEAF code is quite a major change, so it would be very useful if you could describe exactly what it does, what its semantics are, and how it fits into the rename problem. My understanding of the problem is that VOP_RENAME is unique in that it is the only VOP that must modify entries in two separate directories. To avoid deadlock, it is not possible (or at least very hard) to lock all 4 vnodes (source node, source parent, target node, target parent) before calling VOP_RENAME.
Instead, the approach taken is to lock only the target node and its parent, and have the VOP_RENAME implementation jump back and forth between locking the source and locking the target as necessary. Hence VOP_RENAME is the only VOP that must modify a node that is passed in unlocked. Because the source node and parent are not locked, the source node could be renamed or removed at any time before VOP_RENAME finally gets around to locking it and removing it. Something needs to protect the source node against being renamed/removed between the point where it is initially looked up and the point where it is finally locked. Both Matt's SOFTLOCKLEAF and the VRENAME flag are there to provide this protection. It is the fact that this problem is unique to VOP_RENAME that leads me to think that adding the generic SOFTLOCKLEAF code is overkill. The following fragment also suggests that maybe the approach doesn't actually fit in that well:

	fromnd.ni_cnd.cn_flags &= ~SOFTLOCKLEAF;	/* XXX hack */
	error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, &fromnd.ni_cnd,
	    tond.ni_dvp, tond.ni_vp, &tond.ni_cnd);
	fromnd.ni_cnd.cn_flags |= SOFTLOCKLEAF;
	NDFREE(&fromnd, NDF_ONLY_PNBUF & NDF_ONLY_SOFTLOCKLEAF);

The way that vclearsoftlock() is used to clear a flag in an unlocked vnode is also not ideal. This should probably be protected at least by v_interlock, as other flags are. The syscalls that need to be changed (rename, unlink, rmdir) could possibly use vn_* style wrapper functions to reduce the amount of code that must understand the new locking mechanism, although I'm not sure if this is practical for the NFS case. It might also be a good time to remove the WILLRELE from VOP_RENAME, which would simplify some of the surrounding code. Ian
Re: problems with recurring SIGTRAP under gdb
In message <[EMAIL PROTECTED]>, k Macy writes: >Any idea why when I insert a breakpoint I get a >SIGTRAP >and can't continue any further? Is this a bug in the I've seen this with applications that use SIGIO on stdin. If that is the case here, a workaround is to ignore the SIGIO signal while using the debugger (23 is SIGIO on FreeBSD, and (void *)1 is SIG_IGN), e.g.:

	(gdb) set $oldsigio = signal(23, (void *)1)

The signal handler can be put back later with:

	(gdb) call signal(23, (void *)$oldsigio)

Ian
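For reference, the two gdb commands do roughly what the C sketch below does. It saves the current SIGIO handler while replacing it with SIG_IGN, and later restores it. The helper and handler names are hypothetical, and the sketch uses the portable SIGIO macro instead of the FreeBSD-specific number 23.

```c
#include <signal.h>

typedef void (*sigio_handler_fn)(int);

static volatile sig_atomic_t sigio_count;

/* Example handler standing in for the application's real SIGIO handler. */
static void on_sigio(int sig)
{
	(void)sig;
	sigio_count++;
}

/* Like "set $oldsigio = signal(23, (void *)1)": ignore SIGIO, return the old handler. */
static sigio_handler_fn sigio_disable(void)
{
	return signal(SIGIO, SIG_IGN);
}

/* Like "call signal(23, (void *)$oldsigio)": reinstall the saved handler. */
static void sigio_restore(sigio_handler_fn old)
{
	(void)signal(SIGIO, old);
}
```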
Re: FreeBSD on vmware
In message <[EMAIL PROTECTED]>, Robert Watson writes: >I've had -STABLE run fine, but of late have had a lot of trouble with >-current. Userland processes during the boot sequence seem to spend a lot >of time just spinning -- it's not clear to me what the cause is, and I >haven't had time to debug. Someone mentioned on a list somewhere that vmware takes forever to emulate the cmpxchg instruction, and that using the I386_CPU version of atomic_cmpset_int() helps a lot. I noticed a major vmware slowdown with -current sometime in September, so I tried avoiding the cmpxchg instructions and things got much faster. Below is the patch I use (using this outside vmware on SMP hardware is a bad idea :-). Ian Index: atomic.h === RCS file: /dump/FreeBSD-CVS/src/sys/i386/include/atomic.h,v retrieving revision 1.21 diff -u -r1.21 atomic.h --- atomic.h 2001/10/08 20:58:24 1.21 +++ atomic.h 2001/10/09 18:35:25 @@ -111,7 +111,7 @@ * Returns 0 on failure, non-zero on success */ -#if defined(I386_CPU) +#if defined(I386_CPU) || 1 static __inline int atomic_cmpset_int(volatile u_int *dst, u_int exp, u_int src) {
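The contract of atomic_cmpset_int() (in the words of the atomic.h comment, "Returns 0 on failure, non-zero on success") can be sketched with C11 <stdatomic.h>. If *dst equals the expected value, the new value is stored. This illustrates the semantics only; it is not the FreeBSD implementation. On x86, compilers normally turn atomic_compare_exchange_strong() into a lock cmpxchg, the instruction reported to be slow under vmware. As far as I can tell, the I386_CPU variant that the patch forces on does an ordinary compare and store with interrupts disabled. That is only safe on a uniprocessor.

```c
#include <stdatomic.h>

/*
 * Same contract as atomic_cmpset_int(): if *dst == expected, store
 * newval and return non-zero; otherwise leave *dst unchanged and
 * return 0.  atomic_compare_exchange_strong() writes the observed
 * value back into `expected` on failure, which is harmless here
 * because `expected` is a local copy.
 */
static int cmpset_uint(_Atomic unsigned int *dst, unsigned int expected,
    unsigned int newval)
{
	return atomic_compare_exchange_strong(dst, &expected, newval);
}
```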
Re: hot swap with ugen
In message <[EMAIL PROTECTED]>, Srinivas Dharmasanam writes: >Hi, >I'm using the generic usb device driver ugen for controlling a USB device. >The problem is I'm having to reboot the computer each time I >disconnect/connect the device in order for FreeBSD to see the USB device. Are you running usbd (usbd_enable="YES" in /etc/rc.conf)? Ian
Re: fujitsu MO drive: DA_Q_NO_SYNC_CACHE
In message <[EMAIL PROTECTED]>, "W.Scholten" writes: >I submitted a bugreport & patch for 3.3 /4.1 a year ago, but on >installing 4.4 a while back, I found it had not been incorporated. It's in -current and -stable now. Sorry for the delay. Ian
Re: FreeBSD on vmware
In message <[EMAIL PROTECTED]>, Makoto Matsushita writes: >I really know I'm doing a stupid thing, but here is benchmark results >of both "plain" and "patched" 5-current (as of Nov/26/2001). Patched >FreeBSD is about 10% faster than before. ... but only if you spend most of your time running CPU benchmarks :-) Your results show a 50-100% speed increase for operations requiring a lot of kernel activity. Remember also that interrupts etc. cause a background rate of cmpxchg instructions that is quite high. On slower CPUs (I was using a 400MHz PII), the interrupts can soak up virtually all of the available processing capacity without the patch. I suspect this effect is responsible for the most dramatic speedups. Ian
Re: bin/32261: dump creates a dump file much larger than sum of dumped files
In message <[EMAIL PROTECTED]>, Bernd Walter writes: >> Is there any reason we don't want to truncate the file? Does O_TRUNC >> not work well if the file is a tape device or something? > >I don't expect O_TRUNC to work on devices such as tapes and disks. Well, it won't achieve anything on tapes or disk devices, but it should be completely harmless to add the O_TRUNC flag. The current behaviour, where stale data from a larger existing file is left after the end of the new dump, is likely to be unexpected and cause confusion, so it might as well be changed. I'll commit this later unless someone can think of a good reason not to. Ian
Re: bin/32261: dump creates a dump file much larger than sum of dumped files
In message <[EMAIL PROTECTED]>, Matthew Dillon writes: >Woa! That sounds like a bad idea to me. If you want to do it right >then open(), fstat(), and only if the stat says it is a regular file >do you then ftruncate(). Passing O_TRUNC to a tape device may be ignored >by us, but it's not a valid flag to pass to a tape device and we shouldn't >do it. Yeah, I guess checking the file type first makes more sense. I tend to use shell `>' redirects a lot when accessing tape devices. They unconditionally add O_TRUNC, so I know I'd be very surprised if there were side-effects! However, for dump I agree that it's best not to make such assumptions. Ian
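Matt's suggested approach (open, fstat, and ftruncate only if the target is a regular file) could look like the sketch below. The function name open_dump_output() is hypothetical, and the sketch does not reproduce dump's actual output-opening code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open a dump output file for writing, truncating it only when fstat()
 * says it is a regular file; tape and disk devices are never passed
 * O_TRUNC or ftruncate().  Returns a descriptor, or -1 with errno set.
 */
static int open_dump_output(const char *path)
{
	struct stat st;
	int fd, saved;

	fd = open(path, O_WRONLY | O_CREAT, 0666);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 ||
	    (S_ISREG(st.st_mode) && ftruncate(fd, 0) < 0)) {
		saved = errno;
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```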
Re: switching to real mode
In message <[EMAIL PROTECTED]>, John Baldwin writes: >The short form is that you need to hack the cpu_halt to call a function that >puts a stub down in low memory, and calls it. This code needs to be mapped 1:1 >so that the logical address == physical address. The first thing you will Yeah, I attempted something like this a few years ago without much success. I've just updated the code to compile on -stable, and it seems to half-work: it appears to switch to real mode successfully and clear the screen using the video BIOS, but then it just hangs. That's pretty close to what I remember it doing originally, although I think it might have worked before the VM86 stuff was enabled by default in FreeBSD. Getting this sort of code to work reliably is almost impossible... Source is at http://www.maths.tcd.ie/~iedowse/FreeBSD/diskboot/ (loading the resulting KLD immediately shuts down to real mode). Ian
Re: Hyperthreading slowdown
In message <[EMAIL PROTECTED]>, Kris Kennaway writes: >Yes, that's because (as discussed in the archives) the kernel treats >it like an extra, completely decoupled physical CPU and schedules >processes on it without further consideration. This is presumably the >cause of the slowdown, because it's only efficient to use the virtual >CPU under certain workload patterns. HTT is not magic performance >beans. Try also setting the sysctl variable "machdep.cpu_idle_hlt" to 1. It doesn't help to have idle logical CPUs spinning, since a spinning logical CPU competes for the execution resources of the physical CPU that it shares with a busy one. Ian
Re: Need review of NFS patch set for server .. missing/wrong vput() issues
In message <[EMAIL PROTECTED]>, Matthew Dillon writes: >Patch section 1 > > Here we were previously vput()ing nd.ni_vp only if error == 0. > If error is returned non-zero from namei() this would normally be > correct. However, we force error on a number of occasions after > namei() succeeds, in which case nd.ni_vp may be non-NULL and we > must release it. This fixes it so nd.ni_vp is vput()'d if it is > non-NULL whether an error is specified at this point or not. I don't think this is necessary, because the cleanup code at the end of nfsrv_mknod() catches any cases where nd.ni_vp was not released earlier. It would be harmless to add it, though. > (I believe this may have been Alexey's 'NFS hangs in inode state' > problem, which occurs if you are running innd over an NFS filesystem) Was that a client-side or server-side issue? >Patch sections 2 & 3 > > Here namei() is called only with LOCKPARENT, which means that the > leaf is not locked. So when releasing the vnodes we should not > have the if (vp == dvp) test, we should just vput() the dvp and > vrele the vp. Hmm, it seems that lookup() doesn't actually leave the parent locked in this case (it probably should), so I think the existing code is correct in that distorted sense of `correct'. The exit code in lookup() is:

	if ((cnp->cn_flags & LOCKLEAF) == 0)
		VOP_UNLOCK(dp, 0, td);
	return (0);

I tried reproducing the vp == dvp case in nfsrv_link by attempting to create a link called `/somedir/.' to an existing regular file (I did this at the protocol level; I'm not sure if you can do this easily from a normal client). Instrumentation confirmed that the code in question does get executed with vp == dvp, but I saw no problems or panics either with or without your patch (!). It seems we don't have any VFS locking assertions compiled in, even with INVARIANTS...
When I added some assertions, your patch triggered my "vput: vnode not locked" error as soon as the weird link operation was repeated, but the existing code works fine. We really need some basic locking assertions such as checking that a vnode is locked when you vput it, and checking that it isn't locked when the last reference is vrele'd. This is complicated by the fact that we have at least 3 different types of vnode locking: vop_stdlock (ufs etc), vop_sharedlock (nfs), and vop_nolock (devfs, procfs etc). Maybe a VOP_LOCKASSERT would help, because VOP_ISLOCKED isn't useful for vop_nolock filesystems. Note that there are the `options DEBUG_VFS_LOCKS' assertions, but these are used in ways that can result in false positives. Ian
Re: Need review of NFS patch set for server .. missing/wrong vput() issues
In message <[EMAIL PROTECTED]>, Ian Dowse writes: >I don't think this is necessary, because the cleanup code at the >end of nfsrv_mknod() catches any cases where nd.ni_vp was not >released earlier. It would be harmless to add it though. Oops, I missed a 'return (0);' when reading the code. You're quite correct here: the first part of the patch looks correct, and the bug it fixes could certainly cause vput's to be forgotten. I'll try to reproduce this now. It's just the vp == dvp stuff that is ok as it is. Ian
Re: Need review of NFS patch set for server .. missing/wrong vput() issues
In message <[EMAIL PROTECTED]>, Matthew Dillon writes: >Ok, cool. I'll get the commit gears started for the >first part of the patch. FYI, I was able to reproduce this and confirm that the first part of your patch fixes it. All it takes is for the mknod to fail because the name already exists, but normally this is masked by the client because it does an NFSPROC_ACCESS RPC first. Another nasty bug in nfsrv_mknod that I just spotted is that it doesn't override the S_IFMT bits of the file mode supplied by the client. It should completely ignore those bits and use only the node type it has in the `vtyp' variable. I just managed to create a node that makes ls say "Bad file descriptor" by passing in a type of NFFIFO and a mode of 0... Ian
Re: Need review of NFS patch set for server .. missing/wrong vput() issues
In message <[EMAIL PROTECTED]>, Matthew Dillon writes: >NFS fix). I think Ian's mknod tests are a no-brainer. They should >just go in, as should my mknod fix. I agree here: Matt's mknod fix and my fix for the S_IFMT mode bits corruption bug are both simple, and both bugs are effectively remotely exploitable (but only if you are running an NFS server, and generally only by hosts listed in /etc/exports). The first bug causes all processes to get stuck in state `inode', and the second causes filesystem corruption that requires a manual fsck to fix. Matt's mknod bug occurred during normal operation, but the other probably only happens with a hostile client.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsserver/nfs_serv.c
	mknod bug: revision 1.114
	S_IFMT bug: revision 1.113

>#1 Fix corruption that can occur if a RW mount is downgraded to RO >#2 Fix spl confusion that can occur in ACQUIRE_LOCK*() softupdates > routines >#3 Fix softupdates panic that can occur during heavy I/O > (see 'drain_output' calls in patch below) > >I have included Kirk's patch (for stable) below for review. It's a bit >messy so I will note that the most important fix is #3 above, and it is >a very simple and tiny portion of the below patch. I'm not so sure about these. #3 looks simple on its own, I suppose. #1 has been around for years, and although annoying, the corruption is simply that some blocks don't get freed until the next real fsck. This fix was only committed to -current yesterday, and it has already caused one problem there, so it's not looking too good from a gain vs. risk POV :-) I'm not sure about #2 either; the patch isn't too complex, but it's a bit strange.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/ffs_softdep.c
	#2: 1.104
	#3: 1.103

BTW, is the VDRAINED stuff in your patch just left over from something else? It doesn't seem to be present in -current. Ian
Re: Porting a userland NFS server
In message <[EMAIL PROTECTED]>, Daniel O'Connor writes:
>I end up with EFBIG when trying to read the .katie-server-info file, but
>if I create a file inside the view (eg echo "abc" >foo) then it can be
>read with no problem, _but_ the dump of NFS traffic doesn't show a read
>for that file.

At a guess, the server is incorrectly reporting the maximum file size. You might be able to verify this by creating a file of the same size as .katie-server-info and checking whether you get the same error.

The bug in the server is likely to be in its "fsinfo" op function. See the FSINFO3resok definition in RFC1813 for how the fsinfo reply is supposed to be formed.

Ian
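The C sketch below shows how a bad maxfilesize in the FSINFO reply could surface as EFBIG on the client. The struct carries only the maxfilesize field from RFC 1813's FSINFO3resok. `check_read_extent()` is a hypothetical client-side check, not the FreeBSD NFS client code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>

/* Subset of the RFC 1813 FSINFO3resok reply: only the field used here. */
struct fsinfo_subset {
	uint64_t maxfilesize;	/* size3: largest file size the server supports */
};

/*
 * Hypothetical client-side check: refuse a read that would reach past
 * the server's advertised maximum file size. Written to avoid overflow
 * in offset + count. Returns 0 if the read is allowed, else EFBIG.
 */
static int
check_read_extent(const struct fsinfo_subset *fi, uint64_t offset,
    uint64_t count)
{
	if (count > fi->maxfilesize || offset > fi->maxfilesize - count)
		return EFBIG;
	return 0;
}
```

With a bogus maxfilesize such as 0, every non-empty read fails this check, which would fit the EFBIG symptom.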
Re: mmap and efence
In message, Kip Macy writes:
>Looking at the source for efence this happens when mmap fails (in this case with
>ENOMEM). Looking at the man page the two possibilities are: the system has
>reached the per-process mmap limit specified in the vm.max_proc_mmap sysctl or
>insufficient memory was available. *BSD limits the maximum amount of memory that
>a process can mmap to swap+physical.

I've also found it useful to increase the value of MEMORY_CREATION_SIZE in the ElectricFence source. Setting this larger than the amount of address space the program ever uses seems to avoid the vm.max_proc_mmap limit. Maybe when ElectricFence calls mprotect() to divide up its allocated address space, each part of the split region is counted as a separate mmap.

I came across this before while debugging perl-Tk. One other issue was that the program ran fantastically slowly: a trivial script that normally starts in a fraction of a second was taking close to an hour to get there on quite fast hardware. You expect ElectricFence to make things slow, but not quite that slow :-)

Ian
Re: kernel backtrace of sleeping processes
In message <[EMAIL PROTECTED]>, Robert Watson writes:
>Sigh. Remote gdb, not ddb. I tried the usual tricks (updating $sp in
>gdb, etc) but gdb persisted in using the old frame. Nevermind. It seemed

In gdb, the "proc" command switches processes, so this should work:

	proc
	bt

Ian
Re: CPU context switching/load numbers
In message <[EMAIL PROTECTED]>, Jason Borkowsky writes:
>1. How is it my load average is over 1, but my single CPU is 85% idle?

This is quite possible due to process synchronisation, since there is no direct relationship between the load average and the percentage of time that the CPU is idle. The load average is a measure of the average number of processes that are in the "runnable" state. On a single-CPU machine, though, only one of them can actually be running at a time.

As an example, consider the case where 2 processes are each "runnable" 50% of the time, but the times are synchronised. Half of the time there are 2 runnable processes, and the other half of the time there are none. The load average will be 1.0, since the average number of runnable processes is 1. But no process is running half of the time, so the CPU is 50% idle.

Ian
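The C sketch below makes the arithmetic in that example concrete. It divides time into equal ticks and counts the runnable processes in each tick. This is a simplification: the real kernel load average is an exponentially damped average sampled periodically, not the plain mean used here. `summarise()` and `struct load_sample` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical summary of a run of equal-length ticks. */
struct load_sample {
	double load_average;	/* plain mean of runnable processes per tick */
	double idle_fraction;	/* fraction of ticks with nothing runnable */
};

/* runnable[i] is the number of processes wanting the CPU during tick i. */
static struct load_sample
summarise(const int *runnable, size_t nticks)
{
	struct load_sample s = { 0.0, 0.0 };
	size_t idle = 0;
	long total = 0;

	for (size_t i = 0; i < nticks; i++) {
		total += runnable[i];
		if (runnable[i] == 0)
			idle++;	/* one CPU is idle only if nothing is runnable */
	}
	if (nticks > 0) {
		s.load_average = (double)total / (double)nticks;
		s.idle_fraction = (double)idle / (double)nticks;
	}
	return s;
}
```

Feeding it the synchronised two-process case, {2, 2, 0, 0}, gives a load average of 1.0 with the CPU idle half of the time.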
Re: /usr/src/sys/kern/kern_sig.c
In message <[EMAIL PROTECTED]>, Marc Olzheim writes:, Marco van de Voort writes:
>While working on tha FreePascal FreeBSD port, we found a bug in the
>kernel source, that has been fixed in -CURRENT...
>Any reason why pathes 1.137 and 1.148 of kern_sig.c have not yet been
>committed to RELENG_4 ?

Are these really the revisions you mean? 1.137 is completely harmless, and 1.148 only matters if you define the undocumented option "COMPAT_SUNOS".

Ian

REV:1.148 kern_sig.c 2002/02/15 03:54:01 bde

   Fixed a typo in rev.1.65 that gave a reference to a nonexistent
   variable. This was not detected by LINT because LINT is missing
   COMPAT_SUNOS.

REV:1.137 kern_sig.c 2001/10/07 16:11:37 iedowse

   Fix a typo in do_sigaction() where sa_sigaction and sa_handler were
   confused. Since sa_sigaction and sa_handler alias each other in a
   union, the bug was completely harmless.

   This had been fixed as part of the SIGCHLD changes in revision 1.125,
   but it was reverted when they were backed out in revision 1.126.
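The rev. 1.137 log entry says the typo was harmless because the two handler fields share a union. The C check below tests that assumption on the local system. `handler_fields_alias()` is a hypothetical helper. POSIX only says the storage of sa_handler and sa_sigaction *may* overlap, so the result is platform-specific; FreeBSD and glibc both use a union.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Placeholder handler; it is stored in a struct but never installed. */
static void
on_signal(int sig)
{
	(void)sig;
}

/* Returns nonzero if sa_handler and sa_sigaction occupy the same storage. */
static int
handler_fields_alias(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	return (void *)&sa.sa_handler == (void *)&sa.sa_sigaction;
}
```

Where this returns nonzero, writing one field where the other was intended stores the same pointer bits in the same place. That is why mixing them up in do_sigaction() had no visible effect.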
Re: can't mount cdrom 4.6-RELEASE
In message <[EMAIL PROTECTED]>, jogegabsd writes:
>I just upgrade to 4.6-RELEASE.
...
># mount_cd9660 /dev/acd0c /cdrom
>/dev/acd0c: Device not configured

How did you upgrade? The device minor number for acdXc changed between 4.5 and 4.6. You need an up-to-date /dev/MAKEDEV, and you need to re-run "sh MAKEDEV acd0". If you did a buildworld, you probably forgot the mergemaster step or did it in the wrong order.

The output of "ls -l /dev/acd0c" should look something like:

crw-r-  4 root  operator  117,   0 Apr 27 20:24 /dev/acd0c

Ian
Re: How does swap work address spacewise?
In message <[EMAIL PROTECTED]>, Bernd Walter writes:
>I never saw any negative block numbers in on-disc structures.
>Now I wonder if it was just hidden behind macros.
>What is the reason to handle it that way?
>Do you have some code reference for homework?

These logical block numbers are not stored on disk. The filesystem code uses them to refer to block numbers within a file, relative to the start of the file. The on-disk format uses direct and indirect block pointers to refer to the actual filesystem blocks, and it is easy to get from an lbn to the sequence of indirection blocks needed to find the on-disk data. See ufs_getlbns() in sys/ufs/ufs/ufs_bmap.c for details.

>> These are logical block numbers, which are fragment-sized (1K typically)

(lbns are actually in block-sized units, not fragment-sized units. A single file block is always contiguous on the disk, even if it does not begin on a disk block boundary or is not a full block in size. Physical UFS block numbers (ufs_daddr_t in the code) are in fragment-sized units.)

>> Physical block numbers are 512-byte sized, with a range of 2^32
>> in -stable. This also winds up being 2TB. So increasing the fragment
>> size does not help in -stable.
>It's a proven fact that there is a 1T limit somewhere which was
>explained with physical block numbers beeing signed.

Yes, the daddr_t type is signed, so the real limit for filesystems is 1TB, I think.

Ian
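The C sketch below shows roughly how a logical block number maps onto an inode's direct and indirect pointers. It follows the spirit of ufs_getlbns() but is not a copy of it. It assumes 12 direct pointers and three levels of indirection (the traditional UFS layout). It ignores the negative lbns the real code uses to name the indirect blocks themselves. All function and macro names are hypothetical. A second helper checks the 1TB figure: 2^31 addressable 512-byte sectors is 2^40 bytes.

```c
#include <stdint.h>

#define HYP_NDADDR 12	/* direct pointers in the inode (assumed UFS value) */
#define HYP_NIADDR 3	/* single, double and triple indirect pointers */

/*
 * Returns 0 if lbn is reached through a direct pointer, 1..3 for the
 * level of indirection needed, or -1 if out of range. nindir is the
 * number of block pointers per indirect block (bsize / pointer size).
 */
static int
lbn_indirection_level(int64_t lbn, int64_t nindir)
{
	int64_t span = 1;

	if (lbn < 0)
		return -1;	/* negative lbns are not handled in this sketch */
	if (lbn < HYP_NDADDR)
		return 0;
	lbn -= HYP_NDADDR;
	for (int level = 1; level <= HYP_NIADDR; level++) {
		span *= nindir;	/* data blocks reachable via this level */
		if (lbn < span)
			return level;
		lbn -= span;
	}
	return -1;
}

/* Bytes addressable with a signed 32-bit count of 512-byte sectors. */
static uint64_t
signed32_sector_limit(void)
{
	return ((uint64_t)INT32_MAX + 1) * 512;	/* 2^31 * 2^9 = 2^40 = 1TB */
}
```

With 8K blocks and 4-byte pointers (nindir = 2048), lbn 12 is the first block reached through the single indirect block, and lbn 2060 is the first one reached through the double indirect block.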
Re: kevent and pipes interaction on 4.6-STABLE
In message <20020826225851.GA93947@gallium>, Dominic Marks writes:
>+static int kq = -1;
>+int kq, rv, idx;
>kevent(0x3,0xbfbfedbc,0x1,0x0,0x0,0x0) = 0 (0x0)
>kevent(0x809abc0,0x0,0x0,0xbfbfede0,0x8,0x0) ERR#9 'Bad file descriptor'

Look at the above 4 lines, and it is pretty clear what is going on. You don't want to hide the global `kq' behind an uninitialised local variable of the same name.

Ian
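Here is a minimal C reproduction of the shadowing mistake. The function names are hypothetical. Building with gcc's -Wshadow flag should produce a warning for the local declaration in `init_buggy()`.

```c
/* File-scope descriptor, initialised to "not open yet". */
static int kq = -1;

/* Buggy shape: the local declaration hides the global, so the
 * assignment never reaches the file-scope kq. */
static void
init_buggy(int fd)
{
	int kq;		/* shadows the global; gcc -Wshadow warns here */

	kq = fd;
	(void)kq;
}

/* Fixed shape: no local declaration, so the global is updated. */
static void
init_fixed(int fd)
{
	kq = fd;
}

static int
current_kq(void)
{
	return kq;
}
```

After `init_buggy(3)` the global is still -1. In the real program, the uninitialised local was passed to kevent() instead, hence the junk descriptor 0x809abc0 in the trace.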
Re: USB->ATA devices
In message <[EMAIL PROTECTED]>, Soeren Schmidt writes:
>It should be possible to hide the USB stuff under the ATA_* macroes
>or even just under bus_space_*.
>I need a bit more concrete details on how to call into the USB
>code, then it should be pretty easy to add...

This would be hard to do right, as the preferred way to talk to USB devices is with a request-callback model. The ATA command would need to be put into a request structure and handed to the USB device driver, and the USB driver would then call back when the request completes. There are hacks that can be used to perform the USB operations synchronously, but they generally do not handle unexpected removal of the device well at all.

There are many possible ATA/ATAPI over USB protocols, so turning the ATA request into one or more USB transfers is a bridge-specific operation. These odd protocols exist because the manufacturers of the various bridges decided to cut corners and not implement the standard USB mass storage interface.

Ian
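The request-callback model described above can be sketched in C roughly as follows. Everything here (`struct hyp_request`, `hyp_submit()`, `hyp_complete_all()`) is hypothetical and is not the FreeBSD USB API. The point is that the caller fills in a request, hands it off, and is notified later through a function pointer instead of blocking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request: wraps one ATA/ATAPI command block. */
struct hyp_request {
	uint8_t	 cmd[12];	/* command block to send via the bridge */
	size_t	 cmdlen;
	int	 status;	/* filled in on completion */
	void	(*done)(struct hyp_request *req, void *arg);
	void	*arg;		/* caller context passed back to done() */
};

#define HYP_QLEN 8
static struct hyp_request *pending[HYP_QLEN];
static size_t npending;

/* Queue a request and return at once; the caller must not wait here. */
static int
hyp_submit(struct hyp_request *req)
{
	if (npending == HYP_QLEN)
		return -1;	/* queue full */
	pending[npending++] = req;
	return 0;
}

/* Stand-in for the interrupt path: complete everything queued so far
 * with the given status and run each caller's callback. */
static void
hyp_complete_all(int status)
{
	struct hyp_request *batch[HYP_QLEN];
	size_t n = npending;

	memcpy(batch, pending, n * sizeof(batch[0]));
	npending = 0;		/* callbacks may safely resubmit */
	for (size_t i = 0; i < n; i++) {
		batch[i]->status = status;
		batch[i]->done(batch[i], batch[i]->arg);
	}
}
```

If a driver notices the device has gone away, it can complete everything still queued with an error status, so no caller is left waiting. That is the case the synchronous hacks tend to get wrong.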
Re: vmware reads disk on non-sector boundary
In message, Garance A Drosihn writes:
>I also have a partition with freebsd-current from two or three days
>ago, and all the latest versions of the ports. Every time I try to
>start vmware2 on the newer system, the hardware dies. Sometimes it
>automatically reboots, other times it freezes up and I have to
>force-reboot it (sometimes by unplugging it from the wall).

See the patch I posted in:

http://www.FreeBSD.org/cgi/getmsg.cgi?fetch=0+6285+/usr/local/www/db/text/2002/freebsd-emulation/20020908.freebsd-emulation

There may still be further issues, but it allowed me to use vmware2 on a current from a week or two ago.

Ian
Re: vmware reads disk on non-sector boundary
In message <[EMAIL PROTECTED]>, Mark Santcroos writes:
>On Thu, Oct 03, 2002 at 09:04:04AM +0100, Ian Dowse wrote:
>> There may still be further issues, but it allowed me to use vmware2
>> on a current from a week or two ago.
>
>That's only for virtual disks, and that is not where the problem is (was).
>For most people this is not a solution.

True, it won't fix the problems you reported with raw disks. But it stops vmware from instantly panicking on recent -currents, and that is the first problem you will encounter with the port. I tend to run vmware either diskless or with virtual disks, so I wouldn't notice the raw disk issues.

Ian
gdb support for kernel modules
This is something I have been meaning to investigate for a while. When gdb encounters a userland executable that uses shared libraries, it automatically adds the symbols from each library, so it seemed likely that gdb could be made to do the same thing with kernel modules. I am aware of the existence of `gdbmods' etc., but it would be nicer to have the support built into gdb.

Anyway, below is a proof-of-concept patch that does the basics. Among other things, its logic for locating the kernel module files needs a lot of work: currently it just assumes /boot/kernel/, which is almost never what you actually want. It works for debugging vmcores and live /dev/mem access, but I don't know if it can work for remote debugging. Does anybody know gdb internals well enough to comment on how this is done or suggest improvements?

Ian

# gdb -k kernel.debug /dev/mem
...
This GDB was configured as "i386-undermydesk-freebsd"...
panic messages:
---
---
warning: skipping first file (kernel)
Reading symbols from /boot/kernel/ufs.ko...done.
Loaded symbols for /boot/kernel/ufs.ko
Reading symbols from /boot/kernel/md.ko...done.
Loaded symbols for /boot/kernel/md.ko
Reading symbols from /boot/kernel/vinum.ko...done.
Loaded symbols for /boot/kernel/vinum.ko
#0  mi_switch () at ../../../kern/kern_synch.c:849
849		td->td_kse->ke_oncpu = PCPU_GET(cpuid);
(kgdb) info sharedlibrary
warning: skipping first file (kernel)
From        To          Syms Read   Shared Object Library
0xc1098dd0  0xc10bfc10  Yes         /boot/kernel/ufs.ko
0xc11e24d0  0xc11e4270  Yes         /boot/kernel/md.ko
0xc10cf940  0xc10ddc30  Yes         /boot/kernel/vinum.ko
(kgdb) proc 316
(kgdb) bt
#0  mi_switch () at ../../../kern/kern_synch.c:849
#1  0xc01b7d14 in msleep (ident=0xc1292e00, mtx=0x0, priority=76, wmesg=0x0,
    timo=0) at ../../../kern/kern_synch.c:559
#2  0xc11e3052 in md_kthread (arg=0xc1292e00) at /usr/src/sys/dev/md/md.c:578
#3  0xc019d2e5 in fork_exit (callout=0xc11e2fd0 , arg=0x0, frame=0x0)
    at ../../../kern/kern_fork.c:853
(kgdb)

Index: Makefile
===
RCS file: /dump/FreeBSD-CVS/src/gnu/usr.bin/binutils/gdb/Makefile,v
retrieving revision 1.61
diff -u -r1.61 Makefile
--- Makefile	29 Jun 2002 03:16:10 -	1.61
+++ Makefile	7 Oct 2002 10:31:41 -
@@ -37,7 +37,7 @@
 	ui-file.c ui-out.c wrapper.c cli-out.c \
 	cli-cmds.c cli-cmds.h cli-decode.c cli-decode.h cli-script.c\
 	cli-script.h cli-setshow.c cli-setshow.h cli-utils.c cli-utils.h
-XSRCS+=freebsd-uthread.c kvm-fbsd.c
+XSRCS+=freebsd-uthread.c kvm-fbsd.c solib-fbsd-kld.c
 SRCS=	init.c ${XSRCS} nm.h tm.h xm.h gdbversion.c xregex.h
 
 .if exists(${.CURDIR}/Makefile.${TARGET_ARCH})
Index: fbsd-kgdb.h
===
RCS file: /dump/FreeBSD-CVS/src/gnu/usr.bin/binutils/gdb/fbsd-kgdb.h,v
retrieving revision 1.3
diff -u -r1.3 fbsd-kgdb.h
--- fbsd-kgdb.h	18 Sep 2002 16:20:49 -	1.3
+++ fbsd-kgdb.h	6 Oct 2002 23:32:14 -
@@ -7,6 +7,7 @@
 
 extern int kernel_debugging;
 extern int kernel_writablecore;
+extern struct target_so_ops kgdb_so_ops;
 
 #define ADDITIONAL_OPTIONS \
 	{"kernel", no_argument, &kernel_debugging, 1}, \
Index: kvm-fbsd.c
===
RCS file: /dump/FreeBSD-CVS/src/gnu/usr.bin/binutils/gdb/kvm-fbsd.c,v
retrieving revision 1.42
diff -u -r1.42 kvm-fbsd.c
--- kvm-fbsd.c	18 Sep 2002 16:19:05 -	1.42
+++ kvm-fbsd.c	6 Oct 2002 23:41:56 -
@@ -56,6 +56,7 @@
 #include "bfd.h"
 #include "target.h"
 #include "gdbcore.h"
+#include "solist.h"
 
 static void
 kcore_files_info (struct target_ops *);
@@ -72,6 +73,10 @@
 
 static int xfer_umem (CORE_ADDR, char *, int, int);
 
+#ifdef SOLIB_ADD
+static int kcore_solib_add_stub (PTR);
+#endif
+
 static char *core_file;
 static kvm_t *core_kd;
 static struct pcb cur_pcb;
@@ -209,6 +214,12 @@
 
   inferior_ptid = null_ptid;	/* Avoid confusion from thread stuff. */
 
+  /* Clear out solib state while the bfd is still open. See
+     comments in clear_solib in solib.c. */
+#ifdef CLEAR_SOLIB
+  CLEAR_SOLIB ();
+#endif
+
   if (core_kd)
     {
       kvm_close (core_kd);
@@ -305,7 +316,16 @@
       printf ("---\n");
     }
 
-  if (!ontop)
+  if (ontop)
+    {
+      /* Add symbols and section mappings for any kernel modules. */
+#ifdef SOLIB_ADD
+      current_target_so_ops = &kgdb_so_ops;
+      catch_errors (kcore_solib_add_stub, &from_tty, (char *) 0,
+		    RETURN_MASK_ALL);
+#endif
+    }
+  else
     {
       warning ("you won't be able to access this core file until you terminate\n"
 	       "your %s; do ``info files''", target_longname);
@@ -651,6 +671,15 @@
   if (set_context ((CORE_ADDR) val))
     error ("invalid proc address");
 }
+
+#ifdef SOLIB_ADD
+static int
+kcore_solib_add_stub (PTR from_ttyp)
+{
+  SO
Re: gdb support for kernel modules
In message <[EMAIL PROTECTED]>, Andrew Gallatin writes:
>gdbmods does an ugly thing which is incredibly useful. It assumes
>that the modules you want to debug are sitting in your kernel build
>pool. So what it does is extract the build directory from the kernel
>(using strings), and runs a find rooted there for the module in
>question. But its a shell script, so it can get away with stuff like
>that ;)

Yes, I intend to attempt the same thing by extracting the path from version[] and using similar logic. It can probably use a list of likely locations and pick the first one where the module actually exists. GDB already has the `solib-absolute-prefix' and `solib-search-path' variables. They are of limited use for kernel modules, though, because the paths and module names you want for debugging are usually different from those that were actually loaded.

>Perhaps we could embed the build directory somewhere the elf headers
>of each kernel module (including the kernel) so that kgdb could find
>the corresponding build file with symbols. Then your (very cool)
>solib-fbsd-kld.c could easily find the kernel and modules which match
>the kernel you're debugging..

True, even having the path as a variable inside the module should be sufficient, I think. The other clever suggestion made to me was to maintain the standard r_debug* symbols in the kernel, so that a virtually unmodified gdb could extract information about the loaded modules.

Ian
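The "list of likely locations" idea could look something like the C sketch below. The caller supplies the candidate directories (for example, a build directory recovered from version[], then /boot/kernel). `find_module()` is a hypothetical name, and this is not how gdb's own solib-search-path lookup is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Try each candidate directory in order and return 0 with the full path
 * in buf for the first readable copy of the module, or -1 if none exists.
 */
static int
find_module(const char *const *dirs, size_t ndirs, const char *name,
    char *buf, size_t buflen)
{
	for (size_t i = 0; i < ndirs; i++) {
		int n = snprintf(buf, buflen, "%s/%s", dirs[i], name);

		if (n < 0 || (size_t)n >= buflen)
			continue;	/* path would not fit; skip it */
		if (access(buf, R_OK) == 0)
			return 0;	/* first readable match wins */
	}
	return -1;
}
```

Putting the build directory first means a module with debugging symbols is preferred over the stripped copy that was actually loaded from /boot.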
Re: 4.7 RELEASE crashing when transferring large files over the network
In message <[EMAIL PROTECTED]>, Al-Afu writes:
>Yes. I am using the fxp driver. Any other possiblities? Or should I take
>it easy (and stick to 4.6.2-RELEASE) until such time a fix for the fxp
>driver on 4.7-RELEASE is done?

I've checked into -stable the fxp driver change that fixes some random crashes. The bug it fixes might not be the cause of the crashes you have seen, but it would be worth trying anyway. Either cvsup to -stable, or just grab revision 1.110.2.26 of sys/dev/fxp/if_fxp.c from cvsweb and use it instead of the 4.7-RELEASE version of that file.

Note that the above revision will not fix the problem if you have "options DEVICE_POLLING" in your kernel config file. A fix for that case should appear in the next week or two, though.

Ian
Re: strange netstat output inside 4.x jails...
In message <[EMAIL PROTECTED]>, Josh Brooks writes:
>
>I run netstat -i fxp0 while _innside_ a jail:
>and then, I transfer a large file from the jail to some external host.
>The file I transferred out was 4.3 megabytes. Opkts only increased by
>1733 ... which means 2481 bytes per packet ... but ifconfig tells us:

How long did you wait after the transfer completed before checking netstat? I think the packet counts for fxp interfaces are only updated about once a second, as the driver periodically polls them from the hardware. The byte counts are updated immediately, though.

Ian
Re: panic: icmp_error: bad length
In message <[EMAIL PROTECTED]>, Patrick Soltani writes:
>In the last couple of months, upgraded to 4.6 and 4.7 using RELENG_4
>with again no errors, however, now under a light smurf attack, I get:
>
>panic: icmp_error: bad length
>Hardware: Dell PowerEdge 350, 2 built-in Intel nic cards, 256 meg of ram
>and only doing ipfw.
>The kernel is built with options BRIDGE. Don't know what other info you
>might be interested.
>
>Deeply appreciate any help or info.

Could you try to get a stack trace from the panic? There are instructions on how to set this up in the Kernel Debugging chapter of the Developers' Handbook at:

http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

Even just a list of the function names from DDB would be a good start. If possible, though, compile a debug kernel, get a full crash dump and provide the gdb stack trace.

Ian
Re: panic: icmp_error: bad length
In message <[EMAIL PROTECTED]>, Alexander Langer writes:
>Yeah, same situation here. 4.6 used to work w/o problem, 4.7 doesn't.

Great, thanks for the debugging info. The bug seems to be that icmp_error() requires the IP header fields to be in host order, but this is not the case when the IPFW code calls it on a bridged packet. Something like the patch below (untested) should fix the IPFW1 case. A similar change is needed for IPFW2.

Luigi: does this look reasonable? I'm not familiar enough with the IPFW code to know whether it is OK to modify the mbuf like this. If not, it needs to be copied first like ip_forward() does, making sure that the IP header does not end up in a shared cluster.

Ian

Index: ip_fw.c
===
RCS file: /home/iedowse/CVS/src/sys/netinet/ip_fw.c,v
retrieving revision 1.131.2.38
diff -u -r1.131.2.38 ip_fw.c
--- ip_fw.c	21 Nov 2002 01:27:30 -	1.131.2.38
+++ ip_fw.c	12 Dec 2002 00:43:22 -
@@ -1573,6 +1573,11 @@
 			break;
 		    }
 		default:	/* Send an ICMP unreachable using code */
+			/* Must convert to host order for icmp_error(). */
+			if (BRIDGED) {
+				NTOHS(ip->ip_len);
+				NTOHS(ip->ip_off);
+			}
 			icmp_error(*m, ICMP_UNREACH,
 			    f->fw_reject_code, 0L, 0);
 			*m = NULL;
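The C sketch below shows why byte order matters here. It reads the ip_len field of a header in wire (network) order twice: once correctly with ntohs(), and once as if it were already in host order. The helper names are hypothetical. On a little-endian i386 box, a length of 0x100 (256) is misread as 1. That is presumably the kind of nonsensical value that trips the length check in icmp_error().

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* ip_len occupies bytes 2-3 of the IPv4 header, in network byte order. */
static uint16_t
wire_ip_len(const unsigned char *hdr)
{
	uint16_t raw;

	memcpy(&raw, hdr + 2, sizeof(raw));	/* avoid unaligned access */
	return ntohs(raw);			/* convert to host order */
}

/* The mistake: treat the on-wire field as if it were already host order. */
static uint16_t
misread_ip_len(const unsigned char *hdr)
{
	uint16_t raw;

	memcpy(&raw, hdr + 2, sizeof(raw));
	return raw;
}
```

On a big-endian host both functions agree, so this class of bug only shows up on little-endian machines.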
Re: panic: icmp_error: bad length
In message <[EMAIL PROTECTED]>, Luigi Rizzo writes:
>the diagnosis looks reasonable, though i do not remember changing
>anything related to this between 4.6 and 4.7 so i wonder why the
>error did not appear in earlier versions of the code.

Yes, strange. Actually, it looks like the "THERE IS NO FUNCTIONAL OR EXTERNAL API CHANGE IN THIS COMMIT" commit may be to blame :-) Some fragments below.

Ian

bridge.c 1.16.2.2:
+#ifdef PFIL_HOOKS
...
-	 * before calling the firewall, swap fields the same as IP does.
-	 * here we assume the pkt is an IP one and the header is contiguous
...
-	ip = mtod(m0, struct ip *);
-	NTOHS(ip->ip_len);
-	NTOHS(ip->ip_off);

ip_fw.c 1.131.2.34:
-	if (0 && BRIDGED) { /* not yet... */
-		offset = (ntohs(ip->ip_off) & IP_OFFMASK);
+	if (BRIDGED) {	/* bridged packets are as on the wire */
+		ip_off = ntohs(ip->ip_off);
 		ip_len = ntohs(ip->ip_len);
 	} else {
Re: panic: icmp_error: bad length
In message <[EMAIL PROTECTED]>, Robert Watson writes:
>
>BTW, if this bug exists in 5.0 for the same reasons (or even different
>ones), we should try to generate a fix ASAP and get it committed.

I'll check later today whether 5.0 is affected. It is probably easy to trigger by arranging for a bridged packet with ip->ip_len=0x100 to generate an ICMP reply from a firewall rule.

Ian
Re: Sun4c as Xterminal - Problems
In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] writes:
>I'm trying to use a Sun ELC (sun4c) as an Xterminal on my FreeBSD
>system using Xkernel 2.0. I've used the old howto's from 1996
>(Philippe Regnauld) as well as NetBSD diskless howto's to set this up.
>So, does anyone have a fix for this? Back in '96-97, Luigi Rizzo and
>Mike Smith (among others) seemed to be doing this, so I'm hoping someone
>still does.

I think that sometime around 3.0, the networking code in FreeBSD stopped responding to IP broadcasts sent to the 'zero' subnet broadcast address, which in your case is 209.9.69.0.

We currently work around this on some 3.x machines by adding an alias address whose broadcast address is our subnet zero address. The alias address itself can be anything, even outside the subnet. Try something like:

	ifconfig fxp0 inet 10.0.0.1 netmask 0x broadcast 209.9.69.0 alias

Maybe the old behaviour of responding to the subnet zero address should be available via a sysctl?

Ian
Re: Sun4c as Xterminal - Problems
In message <[EMAIL PROTECTED]>, Dan Busarow writes:
>Earlier than that. 2.2.5? It prevents the machine from being used
>as part of a smurf amplifier. If you want to change the behaviour
>see
>
>icmp_bmcastecho="NO"	# respond to broadcast ping packets

This is different. The change I was referring to stops FreeBSD from recognising old-style IP broadcasts as broadcasts. If you have a network 172.16.0.0/16, then 172.16.255.255 is accepted as a broadcast address, but 172.16.0.0 is not. Diskless Sun machines attempt to use the latter, so the broadcasts get ignored.

The change is older than I thought, though. The code was #ifdef'd out back in Dec 1995 in v1.33 of sys/netinet/ip_input.c, and was removed completely in v1.48 (Oct 1996).

Ian
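The two broadcast conventions differ only in the value the host part takes. The C sketch below works with host-order addresses. The `accept_zero` flag is a hypothetical switch that models the old behaviour of accepting the all-zeros form; the function names are also hypothetical.

```c
#include <stdint.h>

/* All-ones host part, e.g. 172.16.255.255 for 172.16.0.0/16. */
static uint32_t
all_ones_broadcast(uint32_t net, uint32_t mask)
{
	return (net & mask) | ~mask;
}

/* Old-style all-zeros host part, e.g. 172.16.0.0 for 172.16.0.0/16. */
static uint32_t
zero_broadcast(uint32_t net, uint32_t mask)
{
	return net & mask;
}

/* Nonzero if dst is treated as a broadcast for this subnet. */
static int
is_subnet_broadcast(uint32_t dst, uint32_t net, uint32_t mask, int accept_zero)
{
	if (dst == all_ones_broadcast(net, mask))
		return 1;
	return accept_zero && dst == zero_broadcast(net, mask);
}
```

With `accept_zero` off, a diskless Sun's broadcast to 172.16.0.0 is not recognised, which matches the behaviour described above.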
Re: Why was rsh removed from the fixit floppy?
In message <[EMAIL PROTECTED] de>, Jan Conrad writes:
>When I cloned a new machine, I usually booted with the floppies, set up
>DOS partitions and disk label and then pulled everyting over by tar and
>rsh, thereby overwriting fstab etc. with prepared files. Worked pretty
>fast...
>
>What would you suggest how to do it?

Unless this has changed recently, the "Emergency Holographic Shell" option provides ifconfig and mount_nfs. That should let you get all the commands you need from an NFS server, without even having to wait for the fixit floppy to load :)

It's a while since I used this, but I remember doing something like:

	set -o emacs
	ifconfig fxp0 x.x.x.x netmask x.x.x.x
	mount_nfs x.x.x.x:/scratch /mnt
	/mnt/bin/ln -s /mnt/usr /usr
	/mnt/bin/mv /bin /bin.old
	/mnt/bin/mv /sbin /sbin.old
	/mnt/bin/ln -s /mnt/bin /bin
	/mnt/bin/ln -s /mnt/sbin /sbin

where /scratch on the server contains a minimal /bin, /sbin, /usr etc. The last few commands could obviously be put in a script on the server.

Ian
Re: empty lists in for
In message <[EMAIL PROTECTED]>, Chet Ramey writes:
>> for f in $$empty_list ${SUBDIRS}; do ...
>Not bad, but will break if the shell is run with the `-u' option on
>for some reason.

Ok, how about:

	for f in $$IFS ${SUBDIRS}; do ...

Ian
Re: empty lists in for
In message <[EMAIL PROTECTED]>, Warner Losh writes:
>: to
>:
>:	sh_subdirs=${SUBDIRS}; for f in $$sh_subdirs ; do ...
>
>there's lots of other workarounds, from seeing if SUBDIRS is defined,
>to using make's .foreach.

Another option is:

	for f in $$empty_list ${SUBDIRS}; do ...

where 'empty_list' is any undefined sh variable.

Ian
Re: PR #10971, not dead yet.
In message <[EMAIL PROTECTED]>, "David E. Cross" writes:
>though. Especially confusing is the following sequence of events:
>
> 41096 ypserv   CALL  select(0x10,0x8051040,0,0,0xbfbff518)
> 41096 ypserv   PSIG  SIGCHLD caught handler=0x804c75c mask=0x0 code=0x0
...
> 41096 ypserv   RET   sigreturn JUSTRETURN
> 41096 ypserv   CALL  gettimeofday(0xbfbff510,0)
> 41096 ypserv   RET   gettimeofday 0
> 41096 ypserv   CALL  read(0x1c,0x80f3fa0,0xfa0)
> 41096 ypserv   GIO   fd 28 read 4000 bytes
>
>Note that the select returned with -1, with errno set to 4, and it
>did not re-enter the select loop, but just started to read data. Also note

A quick glance at the RPC library suggests a possible reason for this sequence. There appears to be a bug in svc_{unix,tcp}.c's handling of EINTR returns from select(): the code seems to assume that a 'continue' inside a do-while loop skips the while condition. Try the patch below. Note that I don't use ypserv and I haven't checked whether ypserv uses this code, so this may have nothing to do with your problem.

Ian

Index: svc_tcp.c
===
RCS file: /home/iedowse/CVS/src/lib/libc/rpc/svc_tcp.c,v
retrieving revision 1.18
diff -u -r1.18 svc_tcp.c
--- svc_tcp.c	2000/01/27 23:06:41	1.18
+++ svc_tcp.c	2000/06/01 00:21:26
@@ -360,6 +360,7 @@
 			if (tmp1.tv_sec < 0 || !timerisset(&tmp1))
 				goto fatal_err;
 			delta = tmp1;
+			FD_CLR(sock, fds);
 			continue;
 		case 0:
 			goto fatal_err;
Index: svc_unix.c
===
RCS file: /home/iedowse/CVS/src/lib/libc/rpc/svc_unix.c,v
retrieving revision 1.7
diff -u -r1.7 svc_unix.c
--- svc_unix.c	2000/01/27 23:06:42	1.7
+++ svc_unix.c	2000/06/01 00:23:25
@@ -402,6 +402,7 @@
 			if (tmp1.tv_sec < 0 || !timerisset(&tmp1))
 				goto fatal_err;
 			delta = tmp1;
+			FD_CLR(sock, fds);
 			continue;
 		case 0:
 			goto fatal_err;
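The do-while pitfall is easy to reproduce in isolation. In the C sketch below, `fake_select()` and `wait_readable()` are hypothetical stand-ins for the select() loop in svc_tcp.c. `ready` plays the role of FD_ISSET(sock, fds), which is still set from the mask copy when select() is interrupted. `clear_on_eintr` mirrors the FD_CLR() in the patch.

```c
#include <stdbool.h>

/* Simulated select(): the first call is "interrupted" (EINTR) and leaves
 * *ready untouched; later calls report the descriptor as readable. */
static int
fake_select(int attempt, bool *ready)
{
	if (attempt == 0)
		return -1;
	*ready = true;
	return 1;
}

/* Returns how many select() calls were made before the loop exits. */
static int
wait_readable(bool clear_on_eintr)
{
	bool ready;
	int attempt = 0;

	do {
		ready = true;	/* like copying the fd mask: our fd is set */
		if (fake_select(attempt++, &ready) == -1) {
			if (clear_on_eintr)
				ready = false;	/* the FD_CLR() in the patch */
			continue;	/* jumps to the while test, not the top */
		}
	} while (!ready);
	return attempt;
}
```

Without the clear, the loop exits after the interrupted call and goes on to read (1 call). With it, the loop goes round again and waits properly (2 calls).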
Re: Possible problem in find_symdef()
In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] writes:
>[EMAIL PROTECTED]
>it hard to compile it under FreeBSD (however I can
>compile it under Linux).I get "Buss error" and coredump

It's a simple programming error: you're not initialising the pointer 'q' in main(), so your code is overwriting memory at whatever junk address ends up in q when main() is invoked. Add a

	q = malloc(sizeof(*q));

and it works. The compiler will spot this problem for you if you include the options '-Wall -O':

	> gcc -Wall -O -o q-pr q-pr.c
	q-pr.c: In function `main':
	q-pr.c:7: warning: `q' might be used uninitialized in this function

Ian
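For reference, here is the corrected pattern. The original program was not posted, so `struct record` and `make_record()` are hypothetical. The pointer gets real storage from malloc() before anything is written through it, and the allocation is checked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type standing in for whatever q pointed to. */
struct record {
	int	id;
	char	name[16];
};

/* Allocate and fill a record. The buggy version declared
 * `struct record *q;` and wrote through it without the malloc(). */
static struct record *
make_record(int id, const char *name)
{
	struct record *q = malloc(sizeof(*q));	/* sizeof(*q) follows q's type */

	if (q == NULL)
		return NULL;
	q->id = id;
	strncpy(q->name, name, sizeof(q->name) - 1);
	q->name[sizeof(q->name) - 1] = '\0';	/* always NUL-terminate */
	return q;
}
```

The caller owns the result and should free() it when done.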